I have finished pushing the changes I mentioned below. Hopefully things will improve. Again, I am sorry for leaving Racket in such a red state for the past couple of weeks. :)
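For context, the "can fail" mechanism discussed in the thread below is normally expressed in GitLab CI with allow_failure, and nightly-only jobs are normally restricted to scheduled pipelines. The snippet below is only a minimal sketch of that idea: the job names and the run-emulated-tests.sh script are illustrative placeholders, not the actual job definitions in the Racket repository.

    # Sketch only: job names and the test script are hypothetical,
    # not the actual Racket CI configuration.
    test-mips64el:
      stage: test
      allow_failure: true        # job may fail without marking the commit red
      script:
        - ./run-emulated-tests.sh mips64el   # hypothetical helper script

    test-s390x:
      stage: test
      only:
        - schedules              # run only from a scheduled (nightly) pipeline
      script:
        - ./run-emulated-tests.sh s390x      # hypothetical helper script

Once an architecture builds and tests reliably, removing the allow_failure: line promotes it back to a required job, just as Alexis describes for the RacketCS job on Travis.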
On 10/04/2019 09:14, 'Paulo Matos' via Racket Developers wrote:
>
>
> On 09/04/2019 19:44, Alexis King wrote:
>> Hi Paulo,
>>
>
> Hi Alexis,
>
>> The work you're doing is really cool, though I admit most of it is
>> over my head. Thank you for putting in the time to set it all up. One
>> thing I have noticed, however, is that the GitLab pipeline seems to
>> almost always fail or time out, which causes almost every commit on
>> the commits page of the GitHub repo [1] to be marked with a loud, red
>> failure indicator.
>>
>
> Thanks for your email. What you say is correct, and this is an issue
> close to my heart that I have wanted to see sorted. This email is
> finally the poke that will get me to sort it out. Apologies for not
> doing it earlier.
>
>> I don't understand what you're doing well enough to say whether this
>> is because something is going wrong in the CI scripts themselves or
>> because they are (correctly) detecting that Racket doesn't currently
>> support some of the tested architectures. But in either case, while
>> testing those architectures is very nice to have, it seems extreme to
>> mark the whole commit as a failure every time for things that
>> (correct me if I'm wrong) seem unlikely to be changed/fixed in the
>> immediate future.
>>
>
> There are a few issues with compiling on other archs; that's what
> these jobs capture. #2018 is one of the main issues, and I have been
> looking at it with Matthew and Sam, but it's turning out to be a major
> pain. Other archs reveal similar behaviour.
>
> As you say, this shouldn't cause commits to get the red cross.
>
>> For the Travis builds, we have a job that tests RacketCS, which
>> currently always fails, but we have the CI configured to ignore the
>> failure of that particular job when deciding whether or not to say
>> the overall commit passed. Is there some way something similar could
>> be done with the GitLab pipeline? Running all those jobs is valuable,
>> in the same way that the RacketCS build is; it'd just be nice to
>> avoid making the at-a-glance commit status meaningless. And just as
>> we will surely promote the RacketCS job from an "allowed failure" to
>> an ordinary job once it passes consistently, we would of course do
>> the same for the various architecture jobs as well.
>>
>
> Yes, that's part of the solution. Currently I don't have enough
> machines or AWS time to dedicate to Racket builds, so I will instead
> do the following straight away:
> - Regularly failing jobs will be marked as 'can fail' until they no
>   longer fail, at which point I will remove the flag.
> - Long-running jobs, or jobs for which I don't immediately have enough
>   machines available, will be moved to run nightly only.
>
> In the long term I would like CI jobs to finish in a respectable time:
> under 1 hour, or even under 30 minutes. I would like all archs tested
> and no failures. This will take some time, but we'll get there.
>
>> Thanks,
>> Alexis
>>
>
> Thanks for the suggestions and the poke. Now I am off to make Racket
> green again.
>
>> [1]: https://github.com/racket/racket/commits/master
>>
>>> On Apr 2, 2019, at 02:59, 'Paulo Matos' via Racket Developers
>>> <racket-dev@googlegroups.com> wrote:
>>>
>>> Hello,
>>>
>>> Short Summary: in 35d269c29 [1] I have added cross-architecture
>>> testing using virtualized qemu machines. There are problems - we
>>> need to fix them.
>>>
>>> Long Story:
>>>
>>> For months now, I have been wishing I could get cross-arch testing
>>> done on a regular basis for Racket.
>>> Initially I had something set up privately for RISC-V, but I
>>> quickly noticed that the framework could be extended to other
>>> architectures.
>>>
>>> Thanks to Sam, I got permission to get gitlab.com/racket/racket set
>>> up and get things moving. It took a couple of months to get
>>> everything right - not necessarily due to inherent CI problems, but
>>> because I had to report a couple of GitLab issues first, debug qemu
>>> as well, and set up a few of my machines for this.
>>>
>>> The important things are:
>>> - With testing running on GitLab, people who would like to
>>>   contribute CPU time to Racket can do so by setting up a GitLab
>>>   runner on their machine (contact me for help). Because the free
>>>   GitLab CI machines have a maximum timeout that's enough for normal
>>>   testing but not enough for virtualization, I needed to add some
>>>   extra machines to run these specific jobs. Besides the GitLab CI
>>>   machines, we have a 4-CPU x86_64, a 16-CPU x86_64, and an rpi3
>>>   running in my server room. Of course, with more machines, more
>>>   tests can run simultaneously and therefore provide quicker
>>>   feedback.
>>> - Matthew pointed out a few archs Racket should support, so I added
>>>   those:
>>>   Testing added for Racket:
>>>     Native: armv7l (running on rpi3), x86_64
>>>     Emulated: arm64, armel, armhf, i386, mips, mips64el, mipsel,
>>>       ppc64el, s390x
>>>   Testing added for Racket CS:
>>>     Native: x86_64
>>>     Emulated: i386
>>> - There are problems. Initially, because so many of the
>>>   architectures fail either to compile or to test, I assumed this
>>>   was a qemu bug. Since I am not a virtualization expert, it took me
>>>   a few days and some help from the qemu people to set up an
>>>   environment to debug qemu inside a chroot inside a docker
>>>   container running Racket on a different arch. After some analysis,
>>>   it turned out the segfault during compilation was definitely
>>>   coming from Racket [5]. In a discussion with Matthew, he proposed
>>>   that I disable the generational GC to ease debugging of the
>>>   problem. It turns out that disabling it made the SIGSEGV no longer
>>>   occur. So, at this point I think we are in the realm of a problem
>>>   in Racket. I haven't gotten to the bottom of this yet, but
>>>   hopefully when I do we can get all the lights green in the
>>>   cross-arch testing.
>>>
>>> There are a few things I would like to do in the future, like
>>> running benchmarks on a regular basis for Racket and RacketCS and
>>> having them displayed on a dashboard, but those will come later.
>>> First I would like to look into these failures, which might be
>>> related to #2018 [2] and #1749 [3].
>>>
>>> Lastly, this is another opportunity to help fix some Racket issues
>>> and get involved. If you are into different archs, debugging, and
>>> contributing, take a look at the logs coming out of the
>>> pipelines [4].
>>>
>>> If you need some help or clarification on any of this, let me know.
>>>
>>> [1] https://github.com/racket/racket/commit/35d269c29eee6f6f7f3f83ea6f01b92ae1db180a
>>> [2] https://github.com/racket/racket/issues/2018
>>> [3] https://github.com/racket/racket/issues/1749
>>> [4] https://gitlab.com/racket/racket/pipelines/
>>> [5] https://gitlab.com/racket/racket/-/jobs/188658454
>>>
>>> --
>>> Paulo Matos
>>
>

--
Paulo Matos