I have finished pushing the changes I mentioned below. Hopefully things will improve. Again, I am sorry for leaving Racket in such a red state for the past couple of weeks. :)
On 10/04/2019 09:14, 'Paulo Matos' via Racket Developers wrote:
>
>
> On 09/04/2019 19:44, Alexis King wrote:
>> Hi Paulo,
>>
>
> Hi Alexis,
>
>> The work you’re doing is really cool, though I admit most of it is over my
>> head. Thank you for putting in the time to set it all up. One thing I have
>> noticed, however, is that the GitLab pipeline seems to almost always fail or
>> time out, which causes almost every commit on the commits page of the GitHub
>> repo[1] to be marked with a loud, red failure indicator.
>>
>
> Thanks for your email. What you say is correct, and this is an issue
> close to my heart that I have wanted to see sorted. Finally, this email is
> the poke that will get me to sort these out. Apologies that I haven't done
> it earlier.
>
>> I don’t understand what you’re doing well enough to say whether or not this
>> is because something is going wrong in the CI scripts themselves or because
>> they are (correctly) detecting that Racket doesn’t currently support some of
>> the tested architectures. But in either case, while the testing of those
>> architectures is very nice to have, it seems extreme to cause the whole
>> commit to be marked as a failure every time for things that (correct me if
>> I’m wrong) seem unlikely to be changed/fixed in the immediate future.
>>
>
> There are a few issues with compiling on other archs, and that's what
> these jobs capture. #2018 is one of the main issues and I have been looking
> at it with Matthew and Sam, but it's turning out to be a major pain. Other
> archs reveal similar behaviour.
>
> As you say, this shouldn't cause commits to get the red cross.
>
>> For the Travis builds, we have a job that tests RacketCS, which currently
>> always fails, but we have the CI configured to ignore the failure of that
>> particular job when deciding whether or not to say the overall commit
>> passed. Is there some way something similar could be done with the GitLab
>> pipeline?
>> Running all those jobs is valuable, in the same way that the RacketCS
>> build is; it’d just be nice to avoid making the at-a-glance commit status
>> meaningless. And just as we will surely promote the RacketCS job from an
>> “allowed failure” to an ordinary job once it passes consistently, we
>> would of course do the same for the various architecture jobs as well.
>>
>
> Yes, that's partially the solution. Currently I don't have enough
> machines or AWS time to dedicate to Racket builds, so I will instead do
> the following straight away:
> - Regularly failing jobs will be marked as 'can fail' until they don't
>   fail anymore, and then I will remove the flag.
> - Move long-running jobs, or jobs for which I don't have enough machines
>   available straight away, to run nightly only.
>
> In the long term I would like CI jobs to finish in a respectable time:
> <1h or even <30mins. I would like all archs tested and no failures. This
> will take some time but we'll get there.
>
>> Thanks,
>> Alexis
>>
>
> Thanks for the suggestions and the poke. Now I am off to make racket
> green again.
>
>> [1]: https://github.com/racket/racket/commits/master
>>
>>> On Apr 2, 2019, at 02:59, 'Paulo Matos' via Racket Developers wrote:
>>>
>>> Hello,
>>>
>>> Short Summary: I have added in 35d269c29 [1] cross-architectural testing
>>> using virtualized qemu machines. There are problems - we need to fix those.
>>>
>>> Long Story:
>>>
>>> For months now, I have been wishing I could get cross-arch testing done
>>> on a regular basis on Racket. Initially I had something set up privately
>>> for RISC-V but I quickly noticed that the framework could be extended to
>>> other architectures.
>>>
>>> Thanks to Sam I got permission to get gitlab.com/racket/racket set up and
>>> get things moving. It took a couple of months to get everything right.
>>> Not necessarily due to inherent CI problems, but I had to report a couple
>>> of GitLab issues first, debug qemu as well, and set up a few of my
>>> machines for this.
>>>
>>> The important things are:
>>> - with testing running on GitLab, people who would like to contribute
>>> CPU time to Racket can do so by setting up a GitLab runner on said
>>> machine (contact me for help). Because GitLab CI free machines have a
>>> maximum timeout that's enough for normal testing but not enough for
>>> virtualization, I needed to add some extra machines to do these specific
>>> jobs. Besides the GitLab CI machines, we have a 4 CPU x86_64, a 16 CPU
>>> x86_64 and a rpi3 running in my server room. Of course, with more
>>> machines, more tests can run simultaneously and therefore provide
>>> quicker feedback.
>>> - Matthew pointed out to me a few archs Racket should support so I added
>>> those:
>>> Testing added for Racket:
>>> Native: armv7l (running on rpi3), x86_64
>>> Emulated: arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el,
>>> s390x
>>>
>>> Testing added for Racket CS:
>>> Native: x86_64
>>> Emulated: i386
>>>
>>> - There are problems, and initially, because so many of the architectures
>>> fail either to compile or to test, I assumed that this was a qemu bug.
>>> Since I am not a virtualization expert it took me a few days and some
>>> help from the qemu people to set up an environment to debug qemu inside a
>>> chroot inside a docker container running racket in a different arch.
>>> After some analysis, it turned out the segfault during compilation was
>>> definitely coming from Racket [5]. In a discussion with Matthew he
>>> proposed I could disable generational GC to ease debugging of the
>>> problem. Turns out disabling it caused the SIGSEGV not to occur any
>>> more. So, at this point I think we are in the realm of a problem in
>>> Racket.
>>> I haven't gotten to the bottom of this yet, but hopefully when I
>>> do we can get all the lights green in the cross-arch testing.
>>>
>>> There are a few things I would like to do in the future, like running
>>> benchmarks on a regular basis on Racket and RacketCS and having these
>>> displayed on a dashboard, but these will come later. First I would like
>>> to look into these failures, which might be related to #2018 [2] and
>>> #1749 [3].
>>>
>>> Lastly, this is another opportunity to help fix some Racket issues and
>>> get involved. If you are into different archs, debugging and
>>> contributing, take a look at the logs coming out of the pipelines [4].
>>>
>>> If you need some help or clarification on any of this, let me know.
>>>
>>> [1]
>>> https://github.com/racket/racket/commit/35d269c29eee6f6f7f3f83ea6f01b92ae1db180a
>>> [2] https://github.com/racket/racket/issues/2018
>>> [3] https://github.com/racket/racket/issues/1749
>>> [4] https://gitlab.com/racket/racket/pipelines/
>>> [5] https://gitlab.com/racket/racket/-/jobs/188658454
>>>
>>> --
>>> Paulo Matos
>>
>

--
Paulo Matos
