I have finished pushing the changes I mentioned below.

Hopefully things will improve. Again, I am sorry for leaving Racket in
such a red state for the past couple of weeks. :)

On 10/04/2019 09:14, 'Paulo Matos' via Racket Developers wrote:
> 
> 
> On 09/04/2019 19:44, Alexis King wrote:
>> Hi Paulo,
>>
> 
> Hi Alexis,
> 
>> The work you’re doing is really cool, though I admit most of it is over my 
>> head. Thank you for putting in the time to set it all up. One thing I have 
>> noticed, however, is that the GitLab pipeline seems to almost always fail or
>> time out, which causes almost every commit on the commits page of the GitHub
>> repo[1] to be marked with a loud, red failure indicator.
>>
> 
> Thanks for your email. What you say is correct, and this is an issue
> close to my heart that I have wanted to see sorted. This email is the
> poke that will finally get me to sort it out. Apologies for not doing it
> earlier.
> 
>> I don’t understand what you’re doing well enough to say whether or not this 
>> is because something is going wrong in the CI scripts themselves or because they
>> are (correctly) detecting that Racket doesn’t currently support some of the 
>> tested architectures. But in either case, while the testing of those 
>> architectures is very nice to have, it seems extreme to cause the whole 
>> commit to be marked as a failure every time for things that (correct me if 
>> I’m wrong) seem unlikely to be changed/fixed in the immediate future.
>>
> 
> There are a few issues with compiling on other archs, and that is what
> these jobs capture. #2018 is one of the main issues and I have been
> looking at it with Matthew and Sam, but it's turning out to be a major
> pain. Other archs reveal similar behaviour.
> 
> As you say, this shouldn't cause commits to get the red-cross.
> 
>> For the Travis builds, we have a job that tests RacketCS, which currently 
>> always fails, but we have the CI configured to ignore the failure of that 
>> particular job when deciding whether or not to say the overall commit 
>> passed. Is there some way something similar could be done with the GitLab 
>> pipeline? Running all those jobs is valuable, in the same way that the 
>> RacketCS build is, it’d just be nice to avoid making the at-a-glance commit 
>> status meaningless. And just as we will surely promote the RacketCS job from 
>> an “allowed failure” to an ordinary job once it passes consistently, we 
>> would of course do the same for the various architecture jobs as well.
>>
> 
> Yes, that's partially the solution. Currently I don't have enough
> machines or AWS time to dedicate to Racket builds so I will instead do
> the following straight away:
> - Regularly failing jobs will be marked as 'can fail' until they stop
> failing, at which point I will remove the flag.
> - Move long-running jobs, or jobs for which I don't have enough machines
> available right away, to run nightly only.
> 
> In the long term I would like CI jobs to finish in a respectable time:
> <1h or even <30mins. I would like all archs tested and no failures. This
> will take some time but we'll get there.
> 
>> Thanks,
>> Alexis
>>
> 
> Thanks for the suggestions and the poke. Now I am off to make racket
> green again.
> 
>> [1]: https://github.com/racket/racket/commits/master
>>
>>> On Apr 2, 2019, at 02:59, 'Paulo Matos' via Racket Developers 
>>> <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> Short Summary: In 35d269c29 [1] I have added cross-architecture testing
>>> using virtualized qemu machines. There are problems, and we need to fix them.
>>>
>>> Long Story:
>>>
>>> For months now, I have been wishing I could get cross-arch testing done
>>> on a regular basis on Racket. Initially I had something set up privately
>>> for RISC-V but I quickly noticed that the framework could be extended to
>>> other architectures.
>>>
>>> Thanks to Sam I got permission to get gitlab.com/racket/racket set up and
>>> get things moving. It took a couple of months to get everything right,
>>> not necessarily because of inherent CI problems, but because I had to
>>> report a couple of GitLab issues first, debug qemu, and set up a few of
>>> my machines for this.
>>>
>>> The important things are:
>>> - With testing running on GitLab, people who would like to contribute
>>> CPU time to Racket can do so by setting up a GitLab runner on their
>>> machine (contact me for help). Because the free GitLab CI machines have a
>>> maximum timeout that is enough for normal testing but not enough for
>>> virtualization, I needed to add some extra machines to do these specific
>>> jobs. Besides the GitLab CI machines, we have a 4 CPU x86_64, a 16 CPU
>>> x86_64 and a rpi3 running in my server room. Of course, with more
>>> machines, more tests can run simultaneously and therefore provide
>>> quicker feedback.
>>> - Matthew pointed out to me a few archs Racket should support, so I added those:
>>>     Testing added for Racket:
>>>     Native: armv7l (running on rpi3), x86_64
>>>     Emulated: arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, s390x
>>>
>>>     Testing added for Racket CS:
>>>     Native: x86_64
>>>     Emulated: i386
>>>
>>> - There are problems. Initially, because so many of the architectures
>>> fail either to compile or to test, I assumed that this was a qemu bug.
>>> Since I am not a virtualization expert, it took me a few days and some
>>> help from the qemu people to set up an environment to debug qemu inside a
>>> chroot inside a docker container running Racket on a different arch.
>>> After some analysis, it turned out the segfault during compilation was
>>> definitely coming from Racket [5]. In a discussion with Matthew, he
>>> proposed that I disable generational GC to ease debugging of the
>>> problem. It turns out that disabling it caused the SIGSEGV not to occur
>>> any more. So, at this point I think we are in the realm of a problem in
>>> Racket. I haven't gotten to the bottom of this yet, but hopefully when I
>>> do we can get all the lights green in the cross-arch testing.
>>>
>>> There are a few things I would like to do in the future, like running
>>> benchmarks on a regular basis on Racket and RacketCS and having these
>>> displayed on a dashboard, but these will come later. First I would like
>>> to look into these failures, which might be related to #2018 [2] and
>>> #1749 [3].
>>>
>>> Lastly, this is another opportunity to help fix some Racket issues and
>>> get involved. If you are into different archs, debugging, and
>>> contributing, take a look at the logs coming out of the pipelines [4].
>>>
>>> If you need some help or clarification on any of this, let me know.
>>>
>>> [1]
>>> https://github.com/racket/racket/commit/35d269c29eee6f6f7f3f83ea6f01b92ae1db180a
>>> [2] https://github.com/racket/racket/issues/2018
>>> [3] https://github.com/racket/racket/issues/1749
>>> [4] https://gitlab.com/racket/racket/pipelines/
>>> [5] https://gitlab.com/racket/racket/-/jobs/188658454
>>>
>>> -- 
>>> Paulo Matos
>>
> 

-- 
Paulo Matos
