On 09/04/2019 19:44, Alexis King wrote:
> Hi Paulo,
> 

Hi Alexis,

> The work you’re doing is really cool, though I admit most of it is over my 
> head. Thank you for putting in the time to set it all up. One thing I have 
> noticed, however, is that the GitLab pipeline seems to almost always fail or 
> timeout, which causes almost every commit on the commits page of the GitHub 
> repo[1] to be marked with a loud, red failure indicator.
> 

Thanks for your email. What you say is correct, and this is an issue
close to my heart that I have wanted to see sorted for a while. Your
email is the poke that will finally get me to sort it out. Apologies
for not doing it earlier.

> I don’t understand what you’re doing well enough to say whether or not this 
> is because something is going wrong in the CI scripts itself or because they 
> are (correctly) detecting that Racket doesn’t currently support some of the 
> tested architectures. But in either case, while the testing of those 
> architectures is very nice to have, it seems extreme to cause the whole 
> commit to be marked as a failure every time for things that (correct me if 
> I’m wrong) seem unlikely to be changed/fixed in the immediate future.
> 

There are a few issues with compiling on other archs, and that is what
these jobs capture. #2018 is one of the main ones; I have been looking
at it with Matthew and Sam, but it is turning out to be a major pain.
Other archs show similar behaviour.

As you say, though, this shouldn't cause commits to get the red cross.

> For the Travis builds, we have a job that tests RacketCS, which currently 
> always fails, but we have the CI configured to ignore the failure of that 
> particular job when deciding whether or not to say the overall commit passed. 
> Is there some way something similar could be done with the GitLab pipeline? 
> Running all those jobs is valuable, in the same way that the RacketCS build 
> is, it’d just be nice to avoid making the at-a-glance commit status 
> meaningless. And just as we will surely promote the RacketCS job from an 
> “allowed failure” to an ordinary job once it passes consistently, we would of 
> course do the same for the various architecture jobs as well.
> 

Yes, that's part of the solution. Currently I don't have enough
machines or AWS time to dedicate to Racket builds, so I will do the
following straight away:
- Mark regularly failing jobs as 'can fail' (allowed failure), and
remove the flag once they stop failing.
- Move long-running jobs, and jobs for which I don't have enough
machines available right now, to nightly runs only (see the sketch
after this list).

In the long term I would like CI jobs to finish in a respectable time,
under an hour or even under 30 minutes, with all archs tested and no
failures. This will take some time, but we'll get there.

> Thanks,
> Alexis
>

Thanks for the suggestions and the poke. Now I am off to make Racket
green again.

> [1]: https://github.com/racket/racket/commits/master
> 
>> On Apr 2, 2019, at 02:59, 'Paulo Matos' via Racket Developers 
>> <[email protected]> wrote:
>>
>> Hello,
>>
>> Short Summary: In 35d269c29 [1], I have added cross-architecture testing
>> using virtualized qemu machines. There are problems; we need to fix those.
>>
>> Long Story:
>>
>> For months now, I have been wishing I could get cross-arch testing done
>> on a regular basis for Racket. Initially I had something set up privately
>> for RISC-V, but I quickly noticed that the framework could be extended to
>> other architectures.
>>
>> Thanks to Sam, I got permission to set up gitlab.com/racket/racket and
>> get things moving. It took a couple of months to get everything right,
>> not necessarily due to inherent CI problems, but because I had to report
>> a couple of Gitlab issues first, debug qemu as well, and set up a few of
>> my machines for this.
>>
>> The important things are:
>> - with testing running on gitlab, people who would like to contribute
>> CPU time to Racket can do so by setting up a gitlab runner on their
>> machine (contact me for help). Because the free Gitlab CI machines have a
>> maximum timeout that's enough for normal testing but not enough for
>> virtualization, I needed to add some extra machines for these specific
>> jobs. Besides the Gitlab CI machines, we have a 4 CPU x86_64, a 16 CPU
>> x86_64 and a rpi3 running in my server room. Of course, with more
>> machines, more tests can run simultaneously and provide quicker feedback.
>> - Matthew pointed out to me a few archs Racket should support, so I added those:
>>      Testing added for Racket:
>>      Native: armv7l (running on rpi3), x86_64
>>      Emulated: arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, 
>> s390x
>>
>>      Testing added for Racket CS:
>>      Native: x86_64
>>      Emulation: i386
>>
>> - There are problems: initially, because so many of the architectures
>> failed either to compile or to test, I assumed this was a qemu bug.
>> Since I am not a virtualization expert, it took me a few days and some
>> help from the qemu people to set up an environment to debug qemu inside a
>> chroot inside a docker container running racket on a different arch.
>> After some analysis, it turned out the segfault during compilation was
>> definitely coming from Racket [5]. In a discussion with Matthew, he
>> proposed disabling the generational GC to ease debugging of the
>> problem. It turns out that disabling it makes the sigsegv no longer
>> occur. So, at this point I think we are in the realm of a problem in
>> Racket. I haven't gotten to the bottom of this yet, but hopefully when I
>> do we can get all the lights green in the cross-arch testing.
>>
>> There are a few things I would like to do in the future, like running
>> benchmarks on a regular basis on Racket and RacketCS and having the
>> results displayed on a dashboard, but these will come later. First I
>> would like to look into these failures, which might be related to
>> #2018 [2] and #1749 [3].
>>
>> Lastly, this is another opportunity to help fix some Racket issues and
>> get involved. If you are into different archs, debugging, and
>> contributing, take a look at the logs coming out of the pipelines [4].
>>
>> If you need some help or clarification on any of this, let me know.
>>
>> [1]
>> https://github.com/racket/racket/commit/35d269c29eee6f6f7f3f83ea6f01b92ae1db180a
>> [2] https://github.com/racket/racket/issues/2018
>> [3] https://github.com/racket/racket/issues/1749
>> [4] https://gitlab.com/racket/racket/pipelines/
>> [5] https://gitlab.com/racket/racket/-/jobs/188658454
>>
>> -- 
>> Paulo Matos
> 

-- 
Paulo Matos
