Re: Acknowledgement of CI instability

2024-07-23 Thread Moritz Angermann
Hi,

As I’ve now arrived in Seoul and have my Macs back, I can bring two more
M1 Macs back online today. However, they are hooked up to a serviced
apartment’s internet connection, which might not be the fastest.

Best,
 Moritz

On Tue, 23 Jul 2024 at 11:36 PM, Sam Derbyshire 
wrote:

> Hi all,
>
> The GHC team would like to acknowledge some ongoing CI instabilities in
> the GHC project:
>
>   - regular failures of the i386 job, possibly related to the bump to
> debian 12
> <https://gitlab.haskell.org/ghc/ghc/-/commit/203830065b81fe29003c1640a354f11661ffc604>
> (although the job still failed occasionally before this),
>   - flakiness of the MultiLayerModulesDefsGhciReload test causing the
> fedora33-release job to fail,
>   - lack of availability of darwin runners, causing aarch64-darwin and
> x86_64-darwin jobs to time out.
>
> These issues are currently causing a string of Marge batch failures,
> holding up several MRs.
>
> We are currently short on resources for addressing problems with CI, so
> please bear with us while we sort the situation out. In the meantime, feel
> free to let us know of any other CI issues that are impacting your work on
> GHC.
>
> Best,
>
> Sam
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Acknowledgement of CI instability

2024-07-23 Thread Sam Derbyshire
Hi all,

The GHC team would like to acknowledge some ongoing CI instabilities in the
GHC project:

  - regular failures of the i386 job, possibly related to the bump to
debian 12
<https://gitlab.haskell.org/ghc/ghc/-/commit/203830065b81fe29003c1640a354f11661ffc604>
(although the job still failed occasionally before this),
  - flakiness of the MultiLayerModulesDefsGhciReload test causing the
fedora33-release job to fail,
  - lack of availability of darwin runners, causing aarch64-darwin and
x86_64-darwin jobs to time out.

These issues are currently causing a string of Marge batch failures,
holding up several MRs.

We are currently short on resources for addressing problems with CI, so
please bear with us while we sort the situation out. In the meantime, feel
free to let us know of any other CI issues that are impacting your work on
GHC.

Best,

Sam
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI stuck?

2024-07-12 Thread Sylvain Henry

Hi Simon,

There was an MR failing with the JS job [1,2]. I fixed it an hour ago,
so it should pass now.


Sylvain

[1] https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13025#note_575683
[2] https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12991#note_575777


On 12/07/2024 09:58, Simon Peyton Jones wrote:

Dear GHC devs

Is GHC's CI stuck in some way?  My !12928 has been scheduled by Marge 
over 10 times now, and each time the commit has failed. Ten seems...  
a lot.


Thanks

Simon

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI stuck?

2024-07-12 Thread Simon Peyton Jones
Dear GHC devs

Is GHC's CI stuck in some way?  My !12928 has been scheduled by Marge over
10 times now, and each time the commit has failed.  Ten seems...  a lot.

Thanks

Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


GitLab CI happenings in July

2023-07-03 Thread Bryan Richter via ghc-devs
Hello,

This is the first (and, perhaps, last[1]) monthly update on GitLab CI. This
month in particular deserves its own little email for the following reasons:

1. Some of the Darwin runners recently spontaneously self-upgraded,
introducing toolchain changes that broke CI. All the fixes are now on GHC's
master branch. This leaves us with two choices:

A) Re-enable the affected runners now. All current patches will have a 50%
chance of failing CI because the old problems are still present. Users (you)
can rebase your patches to avoid this problem.

B) Wait before re-enabling the runners. All in-flight MRs have a better
chance of getting green CI and/or organically getting rebased for other
reasons. However, Darwin capacity would remain at 50% for longer, slowing
down all pipelines.

My current plan is to wait one week before re-enabling the runners, but
ultimately it's not my call. Opinions welcome.

2. I will be on vacation from July 17 to July 28 (weeks 29 and 30). Please
tell Marge to be good while I am away.

3. GitLab was recently upgraded. Please do not be alarmed by any UI changes.

Enjoy!

-Bryan

[1]: Intuitively, I like the idea of giving monthly updates about GitLab
CI. But I don't know if it will be practical or valuable. I'll take a look
in a month to see if there's anything notable to write about again.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI *sad face*

2023-06-30 Thread Bryan Richter via ghc-devs
Final update: both problems are solved!

Now we just need to wait for the wheels of time to do their magic. The
final patch is still waiting to get picked up and merged.

The queue for CI jobs is still a bit longer than usual right now, but I
think it's legitimate. There are simply more open MRs on GitLab than usual,
which is a good thing. (Darwin jobs aren't the source of the bottleneck.)

-Bryan

P.S. A quick shoutout to Marge for preventing two patches that merged
cleanly but created invalid results from making their way into the master
branch.

On Wed, 28 Jun 2023 at 09:20, Bryan Richter 
wrote:

> Nice!
>
> Other good news is that I lost track of all the Mac runners we actually
> have, and our current capacity is actually 3/6 rather than 1/4.
>
> On Wed, 28 Jun 2023 at 09:15, Rodrigo Mesquita <
> rodrigo.m.mesqu...@gmail.com> wrote:
>
>> The root of the second problem was !10723, which started failing on its
>> own pipeline after being rebased.
>> I’m pushing a fix.
>>
>> - Rodrigo
>>
>> On 28 Jun 2023, at 06:41, Bryan Richter via ghc-devs <
>> ghc-devs@haskell.org> wrote:
>>
>> Two things are negatively impacting GHC CI right now:
>>
>> Darwin runner capacity is down to one machine, since the other three are
>> paused. The problem and solution are known[1], but until the fix is
>> implemented in GHC, expect pipelines to get backed up. I will work on a
>> patch this morning
>>
>> [1]: https://gitlab.haskell.org/ghc/ghc/-/issues/23561
>>
>> The other problem is one I just noticed, and I don't have any good info
>> about it yet. The symptom is that Marge batch merges are failing reliably.
>> Three patches that do fine individually somehow cause a type error in the
>> hadrian-ghc-in-ghci job when combined[2]. The only clue is the error
>> itself, which complains of an out-of-scope data constructor
>> "ArchJavaScript" in the file compiler/GHC/Driver/Main.hs. A cursory look at
>> the individual patches doesn't shed any light. I just rebased all of them
>> to see if I can shake the error out of them that way. Any knowledge that
>> can be brought to bear would be appreciated
>>
>> [2]:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10745#note_507418
>>
>> -Bryan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI *sad face*

2023-06-28 Thread Bryan Richter via ghc-devs
Nice!

Other good news is that I lost track of all the Mac runners we actually
have, and our current capacity is actually 3/6 rather than 1/4.

On Wed, 28 Jun 2023 at 09:15, Rodrigo Mesquita 
wrote:

> The root of the second problem was !10723, which started failing on its
> own pipeline after being rebased.
> I’m pushing a fix.
>
> - Rodrigo
>
> On 28 Jun 2023, at 06:41, Bryan Richter via ghc-devs 
> wrote:
>
> Two things are negatively impacting GHC CI right now:
>
> Darwin runner capacity is down to one machine, since the other three are
> paused. The problem and solution are known[1], but until the fix is
> implemented in GHC, expect pipelines to get backed up. I will work on a
> patch this morning
>
> [1]: https://gitlab.haskell.org/ghc/ghc/-/issues/23561
>
> The other problem is one I just noticed, and I don't have any good info
> about it yet. The symptom is that Marge batch merges are failing reliably.
> Three patches that do fine individually somehow cause a type error in the
> hadrian-ghc-in-ghci job when combined[2]. The only clue is the error
> itself, which complains of an out-of-scope data constructor
> "ArchJavaScript" in the file compiler/GHC/Driver/Main.hs. A cursory look at
> the individual patches doesn't shed any light. I just rebased all of them
> to see if I can shake the error out of them that way. Any knowledge that
> can be brought to bear would be appreciated
>
> [2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10745#note_507418
>
> -Bryan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI *sad face*

2023-06-28 Thread Rodrigo Mesquita
The root of the second problem was !10723, which started failing on its own 
pipeline after being rebased.
I’m pushing a fix.

- Rodrigo

> On 28 Jun 2023, at 06:41, Bryan Richter via ghc-devs  
> wrote:
> 
> Two things are negatively impacting GHC CI right now:
> 
> Darwin runner capacity is down to one machine, since the other three are 
> paused. The problem and solution are known[1], but until the fix is 
> implemented in GHC, expect pipelines to get backed up. I will work on a patch 
> this morning
> 
> [1]: https://gitlab.haskell.org/ghc/ghc/-/issues/23561
> 
> The other problem is one I just noticed, and I don't have any good info about 
> it yet. The symptom is that Marge batch merges are failing reliably. Three 
> patches that do fine individually somehow cause a type error in the 
> hadrian-ghc-in-ghci job when combined[2]. The only clue is the error itself, 
> which complains of an out-of-scope data constructor "ArchJavaScript" in the 
> file compiler/GHC/Driver/Main.hs. A cursory look at the individual patches 
> doesn't shed any light. I just rebased all of them to see if I can shake the 
> error out of them that way. Any knowledge that can be brought to bear would 
> be appreciated
> 
> [2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10745#note_507418
> 
> -Bryan

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI *sad face*

2023-06-27 Thread Bryan Richter via ghc-devs
Two things are negatively impacting GHC CI right now:

Darwin runner capacity is down to one machine, since the other three are
paused. The problem and solution are known[1], but until the fix is
implemented in GHC, expect pipelines to get backed up. I will work on a
patch this morning

[1]: https://gitlab.haskell.org/ghc/ghc/-/issues/23561

The other problem is one I just noticed, and I don't have any good info
about it yet. The symptom is that Marge batch merges are failing reliably.
Three patches that do fine individually somehow cause a type error in the
hadrian-ghc-in-ghci job when combined[2]. The only clue is the error
itself, which complains of an out-of-scope data constructor
"ArchJavaScript" in the file compiler/GHC/Driver/Main.hs. A cursory look at
the individual patches doesn't shed any light. I just rebased all of them
to see if I can shake the error out of them that way. Any knowledge that
can be brought to bear would be appreciated

[2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10745#note_507418

-Bryan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI

2023-03-19 Thread Bryan Richter via ghc-devs
Hoo boy... now I've fixed an even *bigger* problem that is pretty
embarrassing.

https://gitlab.haskell.org/ghc/ghc-perf-import/-/commit/67238099e9c3478ba591080c6e582985c62b83c0

It's time to give this service a proper testsuite.

On Sun, 19 Mar 2023 at 14:42, Bryan Richter 
wrote:

> I did find that some jobs were being retried repeatedly. I have deployed a
> workaround to prevent this from continuing. The problem was related to
> T18623 behaving strangely, and I've opened
> https://gitlab.haskell.org/ghc/ghc/-/issues/23139 so GHC devs can have a
> look at it.
>
> On Sun, 19 Mar 2023 at 13:27, Bryan Richter 
> wrote:
>
>> I'm back at my computer and am investigating. I don't see the problem I
>> feared, but I do see some anomalies. I'll update once it's back to normal.
>>
>> On Sat, 18 Mar 2023 at 17:55, Bryan Richter 
>> wrote:
>>
>>> I'm away from my computer for the day, but yes there were some jobs that
>>> got stuck in a restart loop. See
>>> https://gitlab.haskell.org/ghc/ghc/-/issues/23094#note_487426 .
>>> Unfortunately I don't know if there are others, but I did fix the root
>>> cause of that particular loop.
>>>
>>> On Sat, 18 Mar 2023, 15.06 Sam Derbyshire, 
>>> wrote:
>>>
>>>> I think there's a problem with jobs restarting, on my renamer MR
>>>> <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8686> there were
>>>> 5 full pipelines running at once. I had to cancel some of them, but also it
>>>> seems some got cancelled by some new CI pipelines restarting.
>>>>
>>>> On Sat, 18 Mar 2023 at 13:59, Simon Peyton Jones <
>>>> simon.peytonjo...@gmail.com> wrote:
>>>>
>>>>> All GHC CI pipelines seem stalled, sadly
>>>>>
>>>>> e.g.
>>>>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines
>>>>>
>>>>> Can someone unglue it?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI

2023-03-19 Thread Bryan Richter via ghc-devs
I did find that some jobs were being retried repeatedly. I have deployed a
workaround to prevent this from continuing. The problem was related to
T18623 behaving strangely, and I've opened
https://gitlab.haskell.org/ghc/ghc/-/issues/23139 so GHC devs can have a
look at it.

On Sun, 19 Mar 2023 at 13:27, Bryan Richter 
wrote:

> I'm back at my computer and am investigating. I don't see the problem I
> feared, but I do see some anomalies. I'll update once it's back to normal.
>
> On Sat, 18 Mar 2023 at 17:55, Bryan Richter 
> wrote:
>
>> I'm away from my computer for the day, but yes there were some jobs that
>> got stuck in a restart loop. See
>> https://gitlab.haskell.org/ghc/ghc/-/issues/23094#note_487426 .
>> Unfortunately I don't know if there are others, but I did fix the root
>> cause of that particular loop.
>>
>> On Sat, 18 Mar 2023, 15.06 Sam Derbyshire, 
>> wrote:
>>
>>> I think there's a problem with jobs restarting, on my renamer MR
>>> <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8686> there were 5
>>> full pipelines running at once. I had to cancel some of them, but also it
>>> seems some got cancelled by some new CI pipelines restarting.
>>>
>>> On Sat, 18 Mar 2023 at 13:59, Simon Peyton Jones <
>>> simon.peytonjo...@gmail.com> wrote:
>>>
>>>> All GHC CI pipelines seem stalled, sadly
>>>>
>>>> e.g.
>>>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines
>>>>
>>>> Can someone unglue it?
>>>>
>>>> Thanks!
>>>>
>>>> Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI

2023-03-19 Thread Bryan Richter via ghc-devs
I'm back at my computer and am investigating. I don't see the problem I
feared, but I do see some anomalies. I'll update once it's back to normal.

On Sat, 18 Mar 2023 at 17:55, Bryan Richter 
wrote:

> I'm away from my computer for the day, but yes there were some jobs that
> got stuck in a restart loop. See
> https://gitlab.haskell.org/ghc/ghc/-/issues/23094#note_487426 .
> Unfortunately I don't know if there are others, but I did fix the root
> cause of that particular loop.
>
> On Sat, 18 Mar 2023, 15.06 Sam Derbyshire, 
> wrote:
>
>> I think there's a problem with jobs restarting, on my renamer MR
>> <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8686> there were 5
>> full pipelines running at once. I had to cancel some of them, but also it
>> seems some got cancelled by some new CI pipelines restarting.
>>
>> On Sat, 18 Mar 2023 at 13:59, Simon Peyton Jones <
>> simon.peytonjo...@gmail.com> wrote:
>>
>>> All GHC CI pipelines seem stalled, sadly
>>>
>>> e.g. https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines
>>>
>>> Can someone unglue it?
>>>
>>> Thanks!
>>>
>>> Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI

2023-03-18 Thread Bryan Richter via ghc-devs
I'm away from my computer for the day, but yes there were some jobs that
got stuck in a restart loop. See
https://gitlab.haskell.org/ghc/ghc/-/issues/23094#note_487426 .
Unfortunately I don't know if there are others, but I did fix the root
cause of that particular loop.

On Sat, 18 Mar 2023, 15.06 Sam Derbyshire,  wrote:

> I think there's a problem with jobs restarting, on my renamer MR
> <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8686> there were 5
> full pipelines running at once. I had to cancel some of them, but also it
> seems some got cancelled by some new CI pipelines restarting.
>
> On Sat, 18 Mar 2023 at 13:59, Simon Peyton Jones <
> simon.peytonjo...@gmail.com> wrote:
>
>> All GHC CI pipelines seem stalled, sadly
>>
>> e.g. https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines
>>
>> Can someone unglue it?
>>
>> Thanks!
>>
>> Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI

2023-03-18 Thread Sam Derbyshire
I think there's a problem with jobs restarting, on my renamer MR
<https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8686> there were 5
full pipelines running at once. I had to cancel some of them, but also it
seems some got cancelled by some new CI pipelines restarting.

On Sat, 18 Mar 2023 at 13:59, Simon Peyton Jones <
simon.peytonjo...@gmail.com> wrote:

> All GHC CI pipelines seem stalled, sadly
>
> e.g. https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines
>
> Can someone unglue it?
>
> Thanks!
>
> Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI

2023-03-18 Thread Simon Peyton Jones
All GHC CI pipelines seem stalled, sadly

e.g. https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10123/pipelines

Can someone unglue it?

Thanks!

Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Coordination on FreeBSD CI, default WinIO and Info Tables profiling work

2023-03-15 Thread Hécate

Hi everyone,

I have created topical aggregators of tickets that go beyond the rhythm
of releases (a.k.a. "epics") for the following topics:


* Info Tables Profiling: https://gitlab.haskell.org/groups/ghc/-/epics/3

* Setting WinIO "on" by default: 
https://gitlab.haskell.org/groups/ghc/-/epics/4


* FreeBSD CI revival: https://gitlab.haskell.org/groups/ghc/-/epics/5

These epics have no deadline and their purpose is to track the evolution 
of our workload for certain "big" tasks that go beyond a single ticket.


They are also useful as an (albeit imprecise) tool to help determine,
after the fact, the magnitude of a project and the effort it took. This
will certainly be helpful for future estimations.


And finally, their prime purpose is to enable more awareness of our
co-contributors' work, so that we all get a better sense of what it takes
to do certain things. :)


Please do feel free to create your own for projects that are not fit for 
a single milestone (or are not related to release milestones at all).



Cheers,
Hécate

--
Hécate ✨
: @TechnoEmpress
IRC: Hecate
WWW: https://glitchbra.in
RUN: BSD

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Overnight CI failures

2022-09-30 Thread Bryan Richter via ghc-devs
(Adding ghc-devs)

Are these fragile tests?

1. T14346 got a "bad file descriptor" on Darwin
2. linker_unload got some gold errors on Linux

Neither of these have been reported to me before, so I don't know much
about them. Nor have I looked deeply (or at all) at the tests themselves,
yet.

On Thu, Sep 29, 2022 at 3:37 PM Simon Peyton Jones <
simon.peytonjo...@gmail.com> wrote:

> Bryan
>
> These failed overnight
>
> On !8897
>
>- https://gitlab.haskell.org/ghc/ghc/-/jobs/1185519
>- https://gitlab.haskell.org/ghc/ghc/-/jobs/1185520
>
> I think it's extremely unlikely that this had anything to do with my patch.
>
> Simon
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Cheng Shao
When hadrian builds the binary-dist job, invoking tar and xz is
already the last step and there'll be no other ongoing jobs. But I do
agree with reverting; this minor optimization I proposed has caused
more trouble than it's worth :/

On Thu, Sep 29, 2022 at 9:25 AM Bryan Richter  wrote:
>
> Matthew pointed out that the build system already parallelizes jobs, so it's 
> risky to force parallelization of any individual job. That means I should 
> just revert.
>
> On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao  wrote:
>>
>> I believe we can either modify ci.sh to disable parallel compression
>> for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
>> XZ_OPT=-9 for i386.
>>
>> On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter  
>> wrote:
>> >
>> > Aha: while i386-linux-deb9-validate sets no extra XZ options, 
>> > nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
>> >
>> > A revert would fix the problem, but presumably so would tweaking that 
>> > option. Does anyone have information that would lead to a better decision 
>> > here?
>> >
>> >
>> > On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
>> >>
>> >> Sure, in which case pls revert it. Apologies for the impact, though
>> >> I'm still a bit curious, the i386 job did pass in the original MR.
>> >>
>> >> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  
>> >> wrote:
>> >> >
>> >> > Yep, it seems to mostly be xz that is running out of memory. (All 
>> >> > recent builds that I sampled, but not all builds through all time.) 
>> >> > Thanks for pointing it out!
>> >> >
>> >> > I can revert the change.
>> >> >
>> >> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>> >> >>
>> >> >> Hi Bryan,
>> >> >>
>> >> >> This may be an unintended fallout of !8940. Would you try starting an
>> >> >> i386 pipeline with it reversed to see if it solves the issue, in which
>> >> >> case we should revert or fix it in master?
>> >> >>
>> >> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>> >> >>  wrote:
>> >> >> >
>> >> >> > Hi all,
>> >> >> >
>> >> >> > For the past week or so, nightly-i386-linux-deb9-validate has been 
>> >> >> > failing consistently.
>> >> >> >
>> >> >> > They show up on the failure dashboard because the logs contain the 
>> >> >> > phrase "Cannot allocate memory".
>> >> >> >
>> >> >> > I haven't looked yet to see if they always fail in the same place, 
>> >> >> > but I'll do that soon. The first example I looked at, however, has 
>> >> >> > the line "xz: (stdin): Cannot allocate memory", so it's not GHC 
>> >> >> > (alone) causing the problem.
>> >> >> >
>> >> >> > As a consequence of showing up on the dashboard, the jobs get 
>> >> >> > restarted. Since they fail consistently, they keep getting 
>> >> >> > restarted. Since the jobs keep getting restarted, the pipelines stay 
>> >> >> > alive. When I checked just now, there were 8 nightly runs still 
>> >> >> > running. :) Thus I'm going to cancel the still-running 
>> >> >> > nightly-i386-linux-deb9-validate jobs and let the pipelines die in 
>> >> >> > peace. You can still find all examples of failed jobs on the 
>> >> >> > dashboard:
>> >> >> >
>> >> >> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >> >> >
>> >> >> > To prevent future problems, it would be good if someone could help 
>> >> >> > me look into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Bryan Richter via ghc-devs
Matthew pointed out that the build system already parallelizes jobs, so
it's risky to force parallelization of any individual job. That means I
should just revert.

On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao  wrote:

> I believe we can either modify ci.sh to disable parallel compression
> for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
> XZ_OPT=-9 for i386.
>
> On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter 
> wrote:
> >
> > Aha: while i386-linux-deb9-validate sets no extra XZ options,
> nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
> >
> > A revert would fix the problem, but presumably so would tweaking that
> option. Does anyone have information that would lead to a better decision
> here?
> >
> >
> > On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
> >>
> >> Sure, in which case pls revert it. Apologies for the impact, though
> >> I'm still a bit curious, the i386 job did pass in the original MR.
> >>
> >> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter 
> wrote:
> >> >
> >> > Yep, it seems to mostly be xz that is running out of memory. (All
> recent builds that I sampled, but not all builds through all time.) Thanks
> for pointing it out!
> >> >
> >> > I can revert the change.
> >> >
> >> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao 
> wrote:
> >> >>
> >> >> Hi Bryan,
> >> >>
> >> >> This may be an unintended fallout of !8940. Would you try starting an
> >> >> i386 pipeline with it reversed to see if it solves the issue, in
> which
> >> >> case we should revert or fix it in master?
> >> >>
> >> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
> >> >>  wrote:
> >> >> >
> >> >> > Hi all,
> >> >> >
> >> >> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >> >> >
> >> >> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >> >> >
> >> >> > I haven't looked yet to see if they always fail in the same place,
> but I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >> >> >
> >> >> > As a consequence of showing up on the dashboard, the jobs get
> restarted. Since they fail consistently, they keep getting restarted. Since
> the jobs keep getting restarted, the pipelines stay alive. When I checked
> just now, there were 8 nightly runs still running. :) Thus I'm going to
> cancel the still-running nightly-i386-linux-deb9-validate jobs and let the
> pipelines die in peace. You can still find all examples of failed jobs on
> the dashboard:
> >> >> >
> >> >> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >> >> >
> >> >> > To prevent future problems, it would be good if someone could help
> me look into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
I believe we can either modify ci.sh to disable parallel compression
for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
XZ_OPT=-9 for i386.
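
For illustration, the second option could look roughly like the sketch
below; the Job type and helper here are hypothetical and do not reflect
the actual code in .gitlab/gen_ci.hs.

-- Purely illustrative sketch; field and function names are made up.
import qualified Data.Map.Strict as M

data Arch = I386 | Amd64 deriving (Eq, Show)

data Job = Job
  { jobArch :: Arch
  , jobVars :: M.Map String String  -- environment handed to ci.sh
  } deriving Show

-- Nightly jobs normally request maximum xz compression; leave the default
-- on i386, where "xz -9" can run out of memory.
setNightlyXz :: Job -> Job
setNightlyXz job
  | jobArch job == I386 = job
  | otherwise           = job { jobVars = M.insert "XZ_OPT" "-9" (jobVars job) }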

On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter  wrote:
>
> Aha: while i386-linux-deb9-validate sets no extra XZ options, 
> nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
>
> A revert would fix the problem, but presumably so would tweaking that option. 
> Does anyone have information that would lead to a better decision here?
>
>
> On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
>>
>> Sure, in which case pls revert it. Apologies for the impact, though
>> I'm still a bit curious, the i386 job did pass in the original MR.
>>
>> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  
>> wrote:
>> >
>> > Yep, it seems to mostly be xz that is running out of memory. (All recent 
>> > builds that I sampled, but not all builds through all time.) Thanks for 
>> > pointing it out!
>> >
>> > I can revert the change.
>> >
>> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>> >>
>> >> Hi Bryan,
>> >>
>> >> This may be an unintended fallout of !8940. Would you try starting an
>> >> i386 pipeline with it reversed to see if it solves the issue, in which
>> >> case we should revert or fix it in master?
>> >>
>> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>> >>  wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > For the past week or so, nightly-i386-linux-deb9-validate has been 
>> >> > failing consistently.
>> >> >
>> >> > They show up on the failure dashboard because the logs contain the 
>> >> > phrase "Cannot allocate memory".
>> >> >
>> >> > I haven't looked yet to see if they always fail in the same place, but 
>> >> > I'll do that soon. The first example I looked at, however, has the line 
>> >> > "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing 
>> >> > the problem.
>> >> >
>> >> > As a consequence of showing up on the dashboard, the jobs get 
>> >> > restarted. Since they fail consistently, they keep getting restarted. 
>> >> > Since the jobs keep getting restarted, the pipelines stay alive. When I 
>> >> > checked just now, there were 8 nightly runs still running. :) Thus I'm 
>> >> > going to cancel the still-running nightly-i386-linux-deb9-validate jobs 
>> >> > and let the pipelines die in peace. You can still find all examples of 
>> >> > failed jobs on the dashboard:
>> >> >
>> >> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >> >
>> >> > To prevent future problems, it would be good if someone could help me 
>> >> > look into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Aha: while i386-linux-deb9-validate sets no extra XZ options,
*nightly*-i386-linux-deb9-validate
(the failing job) sets "XZ_OPT = 9".

A revert would fix the problem, but presumably so would tweaking that
option. Does anyone have information that would lead to a better decision
here?


On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:

> Sure, in which case pls revert it. Apologies for the impact, though
> I'm still a bit curious, the i386 job did pass in the original MR.
>
> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter 
> wrote:
> >
> > Yep, it seems to mostly be xz that is running out of memory. (All recent
> builds that I sampled, but not all builds through all time.) Thanks for
> pointing it out!
> >
> > I can revert the change.
> >
> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
> >>
> >> Hi Bryan,
> >>
> >> This may be an unintended fallout of !8940. Would you try starting an
> >> i386 pipeline with it reversed to see if it solves the issue, in which
> >> case we should revert or fix it in master?
> >>
> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
> >>  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >> >
> >> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >> >
> >> > I haven't looked yet to see if they always fail in the same place,
> but I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >> >
> >> > As a consequence of showing up on the dashboard, the jobs get
> restarted. Since they fail consistently, they keep getting restarted. Since
> the jobs keep getting restarted, the pipelines stay alive. When I checked
> just now, there were 8 nightly runs still running. :) Thus I'm going to
> cancel the still-running nightly-i386-linux-deb9-validate jobs and let the
> pipelines die in peace. You can still find all examples of failed jobs on
> the dashboard:
> >> >
> >> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >> >
> >> > To prevent future problems, it would be good if someone could help me
> look into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Sure, in which case pls revert it. Apologies for the impact, though
I'm still a bit curious, the i386 job did pass in the original MR.

On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  wrote:
>
> Yep, it seems to mostly be xz that is running out of memory. (All recent 
> builds that I sampled, but not all builds through all time.) Thanks for 
> pointing it out!
>
> I can revert the change.
>
> On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>>
>> Hi Bryan,
>>
>> This may be an unintended fallout of !8940. Would you try starting an
>> i386 pipeline with it reversed to see if it solves the issue, in which
>> case we should revert or fix it in master?
>>
>> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>>  wrote:
>> >
>> > Hi all,
>> >
>> > For the past week or so, nightly-i386-linux-deb9-validate has been failing 
>> > consistently.
>> >
>> > They show up on the failure dashboard because the logs contain the phrase 
>> > "Cannot allocate memory".
>> >
>> > I haven't looked yet to see if they always fail in the same place, but 
>> > I'll do that soon. The first example I looked at, however, has the line 
>> > "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the 
>> > problem.
>> >
>> > As a consequence of showing up on the dashboard, the jobs get restarted. 
>> > Since they fail consistently, they keep getting restarted. Since the jobs 
>> > keep getting restarted, the pipelines stay alive. When I checked just now, 
>> > there were 8 nightly runs still running. :) Thus I'm going to cancel the 
>> > still-running nightly-i386-linux-deb9-validate jobs and let the pipelines 
>> > die in peace. You can still find all examples of failed jobs on the 
>> > dashboard:
>> >
>> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >
>> > To prevent future problems, it would be good if someone could help me look 
>> > into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Yep, it seems to mostly be xz that is running out of memory. (All recent
builds that I sampled, but not all builds through all time.) Thanks for
pointing it out!

I can revert the change.

On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:

> Hi Bryan,
>
> This may be an unintended fallout of !8940. Would you try starting an
> i386 pipeline with it reversed to see if it solves the issue, in which
> case we should revert or fix it in master?
>
> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>  wrote:
> >
> > Hi all,
> >
> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >
> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >
> > I haven't looked yet to see if they always fail in the same place, but
> I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >
> > As a consequence of showing up on the dashboard, the jobs get restarted.
> Since they fail consistently, they keep getting restarted. Since the jobs
> keep getting restarted, the pipelines stay alive. When I checked just now,
> there were 8 nightly runs still running. :) Thus I'm going to cancel the
> still-running nightly-i386-linux-deb9-validate jobs and let the pipelines
> die in peace. You can still find all examples of failed jobs on the
> dashboard:
> >
> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >
> > To prevent future problems, it would be good if someone could help me
> look into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Hi Bryan,

This may be an unintended fallout of !8940. Would you try starting an
i386 pipeline with it reversed to see if it solves the issue, in which
case we should revert or fix it in master?

On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
 wrote:
>
> Hi all,
>
> For the past week or so, nightly-i386-linux-deb9-validate has been failing 
> consistently.
>
> They show up on the failure dashboard because the logs contain the phrase 
> "Cannot allocate memory".
>
> I haven't looked yet to see if they always fail in the same place, but I'll 
> do that soon. The first example I looked at, however, has the line "xz: 
> (stdin): Cannot allocate memory", so it's not GHC (alone) causing the problem.
>
> As a consequence of showing up on the dashboard, the jobs get restarted. 
> Since they fail consistently, they keep getting restarted. Since the jobs 
> keep getting restarted, the pipelines stay alive. When I checked just now, 
> there were 8 nightly runs still running. :) Thus I'm going to cancel the 
> still-running nightly-i386-linux-deb9-validate jobs and let the pipelines die 
> in peace. You can still find all examples of failed jobs on the dashboard:
>
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>
> To prevent future problems, it would be good if someone could help me look 
> into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Hi all,

For the past week or so, nightly-i386-linux-deb9-validate has been failing
consistently.

They show up on the failure dashboard because the logs contain the phrase
"Cannot allocate memory".

I haven't looked yet to see if they always fail in the same place, but I'll
do that soon. The first example I looked at, however, has the line "xz:
(stdin): Cannot allocate memory", so it's not GHC (alone) causing the
problem.

As a consequence of showing up on the dashboard, the jobs get restarted.
Since they fail consistently, they keep getting restarted. Since the jobs
keep getting restarted, the pipelines stay alive. When I checked just now,
there were 8 nightly runs still running. :) Thus I'm going to cancel the
still-running nightly-i386-linux-deb9-validate jobs and let the pipelines
die in peace. You can still find all examples of failed jobs on the
dashboard:

https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate

To prevent future problems, it would be good if someone could help me look
into this. Otherwise I'll just disable the job. :(
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


GHC compiler perf CI

2022-09-11 Thread Simon Peyton Jones
Dear devs

I used to use the build *x86_64-linux-deb10-int_native-validate* as the
place to look for compiler/bytes-allocated changes in perf/compiler.  But
now it doesn't show those results any more, only runtime/bytes-allocated in
perf/should_run.


   - Should we not run perf/compiler in every build?
   - Why has it gone from the build above?
   - Which build should I look at to see perf/compiler data?

Thanks

Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI failures

2022-09-05 Thread Ben Gamari
Ben Gamari  writes:

> Simon Peyton Jones  writes:
>
>> Matthew, Ben, Bryan
>>
>> CI is failing in in "lint-ci-config"..
>>
>> See https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8916
>> or https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7847
>>
> I'll investigate.
>
I believe this should be fixed by !8943. Perhaps you could try rebasing
on top of this?

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI failures

2022-09-05 Thread Ben Gamari
Simon Peyton Jones  writes:

> Matthew, Ben, Bryan
>
> CI is failing in in "lint-ci-config"..
>
> See https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8916
> or https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7847
>
I'll investigate.

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI failures

2022-09-05 Thread Simon Peyton Jones
Matthew, Ben, Bryan

CI is failing in in "lint-ci-config"..

See https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8916
or https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7847

What's up?

Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Tracking intermittently failing CI jobs

2022-07-12 Thread Bryan Richter via ghc-devs

Hello again,

Thanks to everyone who pointed out spurious failures over the last few 
weeks. Here's the current state of affairs and some discussion on next 
steps.


*Dashboard*

I made a dashboard for tracking spurious failures:

https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2

I created this for three reasons:

1. Keep tabs on new occurrences of spurious failures
2. Understand which problems are causing the most issues
3. Measure the effectiveness of any intervention

The dashboard still needs development, but it can already be used to 
show that the number of "Cannot connect to Docker daemon" failures has 
been reduced.


*Characterizing and Fixing Failures*

I have preliminary results on a few failure types. For instance, I used 
the "docker" type of failure to bootstrap the dashboard. Along with 
"Killed with signal 9", it seems to indicate a problem with the CI 
runner itself.


To look more deeply into these types of runner-system failures, *I will 
need more access*. If you are responsible for some runners and you're 
comfortable giving me shell access, you can find my public ssh key at 
https://gitlab.haskell.org/-/snippets/5546. (Posted as a snippet so at 
least you know the key comes from somebody who can access my GitLab 
account. Other secure means of communication are listed at 
https://keybase.io/chreekat.) Send me a message if you do so.


Besides runner problems, there are spurious failures that may have more 
to do with the CI code, itself. They include some problem with 
environment variables and (probably) some issue with console buffering. 
Neither of these are being tracked on the dashboard yet. Many other 
problems are yet to be explored at all.



*Next Steps*

The theme for the next steps is finalizing the dashboard and 
characterizing more failures.


 * Track more failure types on the dashboard
 * Improve the process of backfilling failure data on the dashboard
 * Include more metadata (like project id!) on the dashboard so it's
   easier to zoom on failures
 * Document the dashboard and the processes that populate it for posterity
 * Diagnose runner-system failures (if accessible)
 * Continue exploring other failure types
 * Fix failures omg!?

The list of next steps is currently heavy on finalizing the dashboard 
and light on fixing spurious failures. I know that might be frustrating. 
My justification is that CI is a complex hardware/software/human system 
under continuous operation where most of the low-hanging fruit have already 
been plucked. It's time to get serious. :) My goal is to make spurious 
failures surprising rather than commonplace. This is the best way I know 
to achieve that.


Thanks again for helping me with this goal. :)


-Bryan

P.S. If you're interested, I've been posting updates like this one on 
Discourse:


https://discourse.haskell.org/search?q=DevOps%20Weekly%20Log%20%23haskell-foundation%20order%3Alatest_topic


On 18/05/2022 13:25, Bryan wrote:

Hi all,

I'd like to get some data on weird CI failures. Before clicking 
"retry" on a spurious failure, please paste the url for your job into 
the spreadsheet you'll find linked at 
https://gitlab.haskell.org/ghc/ghc/-/issues/21591.


Sorry for the slight misdirection. I wanted the spreadsheet to be 
world-writable, which means I don't want its url floating around in 
too many places. Maybe you can bookmark it if CI is causing you too 
much trouble. :)


-Bryan

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Tracking intermittently failing CI jobs

2022-05-18 Thread Bryan
Hi all,

I'd like to get some data on weird CI failures. Before clicking "retry" on a 
spurious failure, please paste the url for your job into the spreadsheet you'll 
find linked at https://gitlab.haskell.org/ghc/ghc/-/issues/21591.

Sorry for the slight misdirection. I wanted the spreadsheet to be 
world-writable, which means I don't want its url floating around in too many 
places. Maybe you can bookmark it if CI is causing you too much trouble. :)

-Bryan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Windows CI instability

2022-04-01 Thread Matthew Pickering
Hi all,

Currently the Windows CI is experiencing a high amount of
instability, so if your patch fails for this reason, don't worry.
We are attempting to fix it.

Cheers,

Matt
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


White space in CI

2022-02-01 Thread Simon Peyton Jones
 Devs

As you'll see from this pipeline record
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7105/pipelines

CI consistently fails once a single commit has trailing whitespace, even if
it is fixed in a subsequent commit

   - dce2054d <https://gitlab.haskell.org/ghc/ghc/-/commit/dce2054d44ea60bdde6409050284fbbcc227457a> introduced trailing whitespace
   - 6411223c <https://gitlab.haskell.org/ghc/ghc/-/commit/6411223cd3977c92d01b09b55a455d8d86adde1d> removed it again.
   - but all subsequent pipelines fail

This came as a big surprise.  It doesn't make sense to lint each individual
commit.  Let's just lint the final version!  (I will squash them in due
course, but I didn't want to lose my work-in-progress history.)

Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI: Choice of base commit for perf comparisons

2021-12-22 Thread Joachim Breitner
Thanks! I like it when my feature suggestions are implemented even before I 
voice them ;-)

22.12.2021 14:13:24 Richard Eisenberg :

> It seems to be that this thought is in the air right now. This was done just 
> a few days ago: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7184
> 
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7231 also looks relevant.
> 
> Richard
> 
>> On Dec 22, 2021, at 7:19 AM, Joachim Breitner  
>> wrote:
>> 
>> Hi,
>> 
>> the new (or “new”?) handling of perf numbers, where CI just magically
>> records and compares them, without us having to manually edit the
>> `all.T` files, is a big improvement, thanks!
>> 
>> However, I found the choice of the base commit to compare against
>> unhelpful. Assume master is at commit M, and I start a feature branch
>> and MR with commit A. CI runs, and tells me about a performance
>> regressions, and CI is red. I now fix the issue and push commit B to
>> the branch. CI runs, but it picks A to compare against, and now it is
>> red because of a seemingly unexpected performance improvement!
>> 
>> I would have expected that all CI runs for this MR to compare the
>> performance against the base branch on master, and to look for perf
>> change notices in all commit messages in between.
>> 
>> I see these advantages:
>> 
>> * The reported perf changes correspond to the changes shown on the MR
>>   page
>> * Green CI = the MR is ready (after squashing)
>> * CI will have numbers for the base commit more reliably
>>   (else, if I push commit C quickly after B, then the job for B might
>>   be cancelled and Ci will report changes of C against A instead of B,
>>   which is unexpected).
>> 
>> I have used this logic of reporting perf changes (or any other
>> “differential CI”) against the base branch in the Motoko project and it
>> was quite natural.
>> 
>> Would it be desirable and possible for us here, too?
>> 
>> 
>> (A possible rebuttal might be: we don’t push new commits to feature
>> branches, but always squash and rebase, as that’s what we have to do
>> before merging anyways. If that’s the case then ok, although I
>> generally lean to having chronological commits on feature branches and
>> a nice squashed commit on master.)
>> 
>> Cheers,
>> Joachim
>> 
>> 
>> -- 
>> Joachim Breitner
>> m...@joachim-breitner.de
>> http://www.joachim-breitner.de/
>> 
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI: Choice of base commit for perf comparisons

2021-12-22 Thread Richard Eisenberg
It seems to be that this thought is in the air right now. This was done just a 
few days ago: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7184

https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7231 also looks relevant.

Richard

> On Dec 22, 2021, at 7:19 AM, Joachim Breitner  
> wrote:
> 
> Hi,
> 
> the new (or “new”?) handling of perf numbers, where CI just magically
> records and compares them, without us having to manually edit the
> `all.T` files, is a big improvement, thanks!
> 
> However, I found the choice of the base commit to compare against
> unhelpful. Assume master is at commit M, and I start a feature branch
> and MR with commit A. CI runs, and tells me about a performance
> regressions, and CI is red. I now fix the issue and push commit B to
> the branch. CI runs, but it picks A to compare against, and now it is
> red because of a seemingly unexpected performance improvement!
> 
> I would have expected that all CI runs for this MR to compare the
> performance against the base branch on master, and to look for perf
> change notices in all commit messages in between.
> 
> I see these advantages:
> 
> * The reported perf changes correspond to the changes shown on the MR 
>   page
> * Green CI = the MR is ready (after squashing)
> * CI will have numbers for the base commit more reliably
>   (else, if I push commit C quickly after B, then the job for B might
>   be cancelled and Ci will report changes of C against A instead of B,
>   which is unexpected).
> 
> I have used this logic of reporting perf changes (or any other
> “differential CI”) against the base branch in the Motoko project and it
> was quite natural.
> 
> Would it be desirable and possible for us here, too?
> 
> 
> (A possible rebuttal might be: we don’t push new commits to feature
> branches, but always squash and rebase, as that’s what we have to do
> before merging anyways. If that’s the case then ok, although I
> generally lean to having chronological commits on feature branches and
> a nice squashed commit on master.)
> 
> Cheers,
> Joachim
> 
> 
> -- 
> Joachim Breitner
>  m...@joachim-breitner.de
>  http://www.joachim-breitner.de/
> 

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI: Choice of base commit for perf comparisons

2021-12-22 Thread Joachim Breitner
Hi,

the new (or “new”?) handling of perf numbers, where CI just magically
records and compares them, without us having to manually edit the
`all.T` files, is a big improvement, thanks!

However, I found the choice of the base commit to compare against
unhelpful. Assume master is at commit M, and I start a feature branch
and MR with commit A. CI runs, and tells me about a performance
regressions, and CI is red. I now fix the issue and push commit B to
the branch. CI runs, but it picks A to compare against, and now it is
red because of a seemingly unexpected performance improvement!

I would have expected that all CI runs for this MR to compare the
performance against the base branch on master, and to look for perf
change notices in all commit messages in between.

I see these advantages:

 * The reported perf changes correspond to the changes shown on the MR 
   page
 * Green CI = the MR is ready (after squashing)
 * CI will have numbers for the base commit more reliably
   (else, if I push commit C quickly after B, then the job for B might
   be cancelled and CI will report changes of C against A instead of B,
   which is unexpected).

I have used this logic of reporting perf changes (or any other
“differential CI”) against the base branch in the Motoko project and it
was quite natural.

Would it be desirable and possible for us here, too?


(A possible rebuttal might be: we don’t push new commits to feature
branches, but always squash and rebase, as that’s what we have to do
before merging anyways. If that’s the case then ok, although I
generally lean to having chronological commits on feature branches and
a nice squashed commit on master.)

Cheers,
Joachim


-- 
Joachim Breitner
  m...@joachim-breitner.de
  http://www.joachim-breitner.de/

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI build failures

2021-07-27 Thread Gergő Érdi
Thanks, this is all great news

On Tue, Jul 27, 2021, 21:56 Ben Gamari  wrote:

> ÉRDI Gergő  writes:
>
> > Hi,
> >
> > I'm seeing three build failures in CI:
> >
> Hi,
>
> > 1. On perf-nofib, it fails with:
> >
> Don't worry about this one for the moment. This job is marked as
> accepting of failure for a reason (hence the job state being an orange
> exclamation mark rather than a red X).
>
> > == make boot -j --jobserver-fds=3,4 --no-print-directory;
> >   in /builds/cactus/ghc/nofib/real/smallpt
> > 
> > /builds/cactus/ghc/ghc/bin/ghc  -M -dep-suffix "" -dep-makefile .depend
> > -osuf o -O2 -Wno-tabs -Rghc-timing -H32m -hisuf hi
> -packageunboxed-ref
> > -rtsopts smallpt.hs
> > : cannot satisfy -package unboxed-ref
> >  (use -v for more information)
> >
> > (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743141#L1465)
> >
> > 2. On validate-x86_64-darwin, pretty much every test fails because of
> the
> > following extra stderr output:
> >
> > +
> > +:
> > +warning: Couldn't figure out C compiler information!
> > +     Make sure you're using GNU gcc, or clang
> >
> > (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743129#L3655)
> >
> Yes, this will be fixed by !6162 once I get it passing CI.
>
> > 3. On validate-x86_64-linux-deb9-integer-simple, T11545 fails on memory
> > consumption:
> >
> > Unexpected stat failures:
> > perf/compiler/T11545.run  T11545 [stat decreased from
> x86_64-linux-deb9-integer-simple-validate baseline @
> > 5f3991c7cab8ccc9ab8daeebbfce57afbd9acc33] (normal)
> >
> This test appears to be quite sensitive to environment. I suspect we
> should further increase its acceptance window to avoid this sort of
> spurious failure.
>
> Cheers,
>
> Cheers,
>
> - Ben
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI build failures

2021-07-27 Thread Ben Gamari
ÉRDI Gergő  writes:

> Hi,
>
> I'm seeing three build failures in CI:
>
Hi,

> 1. On perf-nofib, it fails with:
>
Don't worry about this one for the moment. This job is marked as
accepting of failure for a reason (hence the job state being an orange
exclamation mark rather than a red X).

> == make boot -j --jobserver-fds=3,4 --no-print-directory;
>   in /builds/cactus/ghc/nofib/real/smallpt
> 
> /builds/cactus/ghc/ghc/bin/ghc  -M -dep-suffix "" -dep-makefile .depend 
> -osuf o -O2 -Wno-tabs -Rghc-timing -H32m -hisuf hi -packageunboxed-ref 
> -rtsopts smallpt.hs
> : cannot satisfy -package unboxed-ref
>  (use -v for more information)
>
> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743141#L1465)
>
> 2. On validate-x86_64-darwin, pretty much every test fails because of the 
> following extra stderr output:
>
> +
> +:
> +warning: Couldn't figure out C compiler information!
> + Make sure you're using GNU gcc, or clang
>
> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743129#L3655)
>
Yes, this will be fixed by !6162 once I get it passing CI.

> 3. On validate-x86_64-linux-deb9-integer-simple, T11545 fails on memory 
> consumption:
>
> Unexpected stat failures:
> perf/compiler/T11545.run  T11545 [stat decreased from 
> x86_64-linux-deb9-integer-simple-validate baseline @ 
> 5f3991c7cab8ccc9ab8daeebbfce57afbd9acc33] (normal)
>
This test appears to be quite sensitive to environment. I suspect we
should further increase its acceptance window to avoid this sort of
spurious failure.

Cheers,

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI build failures

2021-07-27 Thread Gergő Érdi
The other two are resilient to restarts.

On Tue, Jul 27, 2021, 18:49 Moritz Angermann 
wrote:

> You can safely ignore the x86_64-darwin failure. I can get you the juicy
> details over a beverage some time. It boils down to some odd behavior using
> rosetta2 on AArch64 Mac mini’s to build x86_64 GHCs. There is a fix
> somewhere from Ben, so it’s just a question of time until it’s properly
> fixed.
>
> The other two I’m afraid I have no idea. I’ll see to restart them. (You
> can’t ?)
>
> On Tue 27. Jul 2021 at 18:10, ÉRDI Gergő  wrote:
>
>> Hi,
>>
>> I'm seeing three build failures in CI:
>>
>> 1. On perf-nofib, it fails with:
>>
>> == make boot -j --jobserver-fds=3,4 --no-print-directory;
>>   in /builds/cactus/ghc/nofib/real/smallpt
>> 
>> /builds/cactus/ghc/ghc/bin/ghc  -M -dep-suffix "" -dep-makefile .depend
>> -osuf o -O2 -Wno-tabs -Rghc-timing -H32m -hisuf hi
>> -packageunboxed-ref
>> -rtsopts smallpt.hs
>> : cannot satisfy -package unboxed-ref
>>  (use -v for more information)
>>
>> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743141#L1465)
>>
>> 2. On validate-x86_64-darwin, pretty much every test fails because of the
>> following extra stderr output:
>>
>> +
>> +:
>> +warning: Couldn't figure out C compiler information!
>> + Make sure you're using GNU gcc, or clang
>>
>> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743129#L3655)
>>
>> 3. On validate-x86_64-linux-deb9-integer-simple, T11545 fails on memory
>> consumption:
>>
>> Unexpected stat failures:
>> perf/compiler/T11545.run  T11545 [stat decreased from
>> x86_64-linux-deb9-integer-simple-validate baseline @
>> 5f3991c7cab8ccc9ab8daeebbfce57afbd9acc33] (normal)
>>
>> This one is interesting because there is already a commit that is
>> supposed
>> to fix this:
>>
>> commit efaad7add092c88eab46e00a9f349d4675bbee06
>> Author: Matthew Pickering 
>> Date:   Wed Jul 21 10:03:42 2021 +0100
>>
>>  Stop ug_boring_info retaining a chain of old CoreExpr
>>
>>  [...]
>>
>>  -
>>  Metric Decrease:
>>  T11545
>>  -
>>
>> But still, it's failing.
>>
>> Can someone kick these build setups please?
>>
>> --
>>
>>.--= ULLA! =-.
>> \ http://gergo.erdi.hu   \
>>  `---= ge...@erdi.hu =---'
>> ___
>> ghc-devs mailing list
>> ghc-devs@haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI build failures

2021-07-27 Thread Moritz Angermann
You can safely ignore the x86_64-darwin failure. I can get you the juicy
details over a beverage some time. It boils down to some odd behavior using
rosetta2 on AArch64 Mac mini’s to build x86_64 GHCs. There is a fix
somewhere from Ben, so it’s just a question of time until it’s properly
fixed.

The other two I'm afraid I have no idea about. I'll see about restarting
them. (You can't?)

On Tue 27. Jul 2021 at 18:10, ÉRDI Gergő  wrote:

> Hi,
>
> I'm seeing three build failures in CI:
>
> 1. On perf-nofib, it fails with:
>
> == make boot -j --jobserver-fds=3,4 --no-print-directory;
>   in /builds/cactus/ghc/nofib/real/smallpt
> 
> /builds/cactus/ghc/ghc/bin/ghc  -M -dep-suffix "" -dep-makefile .depend
> -osuf o -O2 -Wno-tabs -Rghc-timing -H32m -hisuf hi -packageunboxed-ref
> -rtsopts smallpt.hs
> : cannot satisfy -package unboxed-ref
>  (use -v for more information)
>
> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743141#L1465)
>
> 2. On validate-x86_64-darwin, pretty much every test fails because of the
> following extra stderr output:
>
> +
> +:
> +warning: Couldn't figure out C compiler information!
> + Make sure you're using GNU gcc, or clang
>
> (e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743129#L3655)
>
> 3. On validate-x86_64-linux-deb9-integer-simple, T11545 fails on memory
> consumption:
>
> Unexpected stat failures:
> perf/compiler/T11545.run  T11545 [stat decreased from
> x86_64-linux-deb9-integer-simple-validate baseline @
> 5f3991c7cab8ccc9ab8daeebbfce57afbd9acc33] (normal)
>
> This one is interesting because there is already a commit that is supposed
> to fix this:
>
> commit efaad7add092c88eab46e00a9f349d4675bbee06
> Author: Matthew Pickering 
> Date:   Wed Jul 21 10:03:42 2021 +0100
>
>  Stop ug_boring_info retaining a chain of old CoreExpr
>
>  [...]
>
>  -
>  Metric Decrease:
>  T11545
>  -
>
> But still, it's failing.
>
> Can someone kick these build setups please?
>
> --
>
>.--= ULLA! =-.
> \ http://gergo.erdi.hu   \
>  `---= ge...@erdi.hu =---'
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI build failures

2021-07-27 Thread ÉRDI Gergő

Hi,

I'm seeing three build failures in CI:

1. On perf-nofib, it fails with:

== make boot -j --jobserver-fds=3,4 --no-print-directory;
 in /builds/cactus/ghc/nofib/real/smallpt

/builds/cactus/ghc/ghc/bin/ghc  -M -dep-suffix "" -dep-makefile .depend 
-osuf o -O2 -Wno-tabs -Rghc-timing -H32m -hisuf hi -packageunboxed-ref 
-rtsopts smallpt.hs

: cannot satisfy -package unboxed-ref
(use -v for more information)

(e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743141#L1465)

2. On validate-x86_64-darwin, pretty much every test fails because of the 
following extra stderr output:


+
+:
+warning: Couldn't figure out C compiler information!
+ Make sure you're using GNU gcc, or clang

(e.g. https://gitlab.haskell.org/cactus/ghc/-/jobs/743129#L3655)

3. On validate-x86_64-linux-deb9-integer-simple, T11545 fails on memory 
consumption:


Unexpected stat failures:
   perf/compiler/T11545.run  T11545 [stat decreased from x86_64-linux-deb9-integer-simple-validate baseline @ 
5f3991c7cab8ccc9ab8daeebbfce57afbd9acc33] (normal)


This one is interesting because there is already a commit that is supposed 
to fix this:


commit efaad7add092c88eab46e00a9f349d4675bbee06
Author: Matthew Pickering 
Date:   Wed Jul 21 10:03:42 2021 +0100

Stop ug_boring_info retaining a chain of old CoreExpr

[...]

-
Metric Decrease:
T11545
-

But still, it's failing.

Can someone kick these build setups please?

--

  .--= ULLA! =-.
   \ http://gergo.erdi.hu   \
`---= ge...@erdi.hu =---'
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


FYI: Darwin CI currently broken for forks

2021-07-19 Thread Matthew Pickering
Hi all,

There is a configuration issue with the darwin builders which has
meant that for the last 6 days CI has been broken if you have pushed
from a fork because the majority of darwin builders are only
configured to work with branches pushed to the main project. These
failures manifest as timeout errors
(https://gitlab.haskell.org/blamario/ghc/-/jobs/733244).

Hopefully this can be resolved in the coming days.

Cheers,

Matt
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI Status Update

2021-06-17 Thread Ben Gamari
Ben Gamari  writes:

> Hi all,
>
Hi all,

At this point CI should be fully functional on the stable and master
branches again. However, do note that older base commits may refer to
Docker images that are sadly no longer available. Such cases can be
resolved by simply rebasing.

Cheers,

- Ben



___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI Status Update

2021-06-15 Thread Ben Gamari
Hi all,

As you may have realized, CI has been a bit of a disaster over the last
few days. It appears that this is just the most recent chapter in our
on-going troubles with Docker image storage, this time due to an outage of
our upstream storage service [1]. Davean and I have started to implement
a plan to migrate away from DreamObjects back to local storage.
Unfortunately to complete this migration we need DreamObjects to come
back online; I am currently waiting until this occurs. Further updates
will come as the situation develops.

Cheers,

- Ben

[1] https://www.dreamhoststatus.com/


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


[CI] macOS builds

2021-06-05 Thread Moritz Angermann
Hi there!

You might have seen failed, stuck or pending darwin builds. The CI
builders we were generously donated have ~250GB of disk space (which should
be absolutely adequate for what we do), but macOS Big Sur does some odd
reservation of 200GB in /System/Volumes/Data; this is despite automatic
updates being disabled and Time Machine being disabled.

It used to happen only when the system was expecting an update to be
performed and the 200GB were freed after the update was done. After the
latest update to 11.4, however, it seems to have not freed that space. This
leaves the CI machine with ~50GB for the system + build tools + gitlab
checkouts and builds, and they frequently run out of space :-/

If someone knows how to prevent the system from doing stupid stuff like
this (my hunch is it's keeping a backup of the system pre-update, for
disaster recovery), please come forward; my google searches haven't
revealed anything useful yet.

I have filed a TSI with Apple (still had a few on my developer account),
but I don't expect them to come back to me before the end of June. Next
week is WWDC, and there will be a massive backlog of issues that queued up
leading up to, and during the WWDC.  I've also only had very marginal
success with them resolving issues that were not "you wrote this program
wrong".

If everything fails, maybe the solution is to attach some USB-C SSDs to the
Macs and have gitlab builds run exclusively on those disks. I'm a bit
concerned about performance, but we would have to see.

Any ideas are welcome; please also feel free to hit me up on
libera.chat #ghc or the Haskell Foundation Slack.

Cheers,
 Moritz
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Darwin CI Status

2021-05-20 Thread Matthew Pickering
Thanks Moritz for that update.

The latest is that currently darwin CI is disabled and the merge train
is unblocked (*choo choo*).

I am testing Moritz's patches to speed-up CI and will merge them in
shortly to get darwin coverage back.

Cheers,

Matt

On Wed, May 19, 2021 at 9:46 AM Moritz Angermann
 wrote:
>
> Matt has access to the M1 builder in my closet now. The darwin performance 
> issue
> is mainly there since BigSur, and (afaik) primarily due to the amount of 
> DYLD_LIBRARY_PATH's
> we pass to GHC invocations. The system linker spends the majority of the time 
> in the
> kernel stat'ing and getelements (or some similar directory) call for each and 
> every possible
> path.
>
> Switching to hadrian will cut down the time from ~5hs to ~2hs. At some point 
> we had make
> builds <90min by just killing all DYLD_LIBRARY_PATH logic we ever had, but 
> that broke
> bindists.
>
> The CI has time values attached and some summary at the end right now, which 
> highlights
> time spent in the system and in user mode. This is up to 80% sys, 20% user, 
> and went to
> something like 20% sys, 80% user after nuking all DYLD_LIBRARY_PATH's, with 
> hadrian it's
> closer to ~25% sys, 75% user.
>
> Of note, this is mostly due to time spent during the *test-suite*, not the 
> actual build. For the
> actual build make and hadrian are comparable, though I've seen hadrian to 
> oddly have a
> much higher variance in how long it takes to *build* ghc, whereas the make 
> build was more
> consistent.
>
> The test-suite quite notoriously calls GHC *a lot of times*, which makes any 
> linker issue due
> to DYLD_LIBRARY_PATH (and similar lookups) much worse.
>
> If we would finally split building and testing, we'd see this more clearly I 
> believe. Maybe this
> is motivation enough for someone to come forward to break build/test into two 
> CI steps?
>
> Cheers,
>  Moritz
>
> On Wed, May 19, 2021 at 4:14 PM Matthew Pickering 
>  wrote:
>>
>> Hi all,
>>
>> The darwin pipelines are gumming up the merge pipeline as they are
>> taking over 4 hours to complete on average.
>>
>> I am going to disable them -
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5785
>>
>> Please can someone give me access to one of the M1 builders so I can
>> debug why the tests are taking so long. Once I have fixed the issue
>> then I will enable the pipelines.
>>
>> Cheers,
>>
>> Matt
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Darwin CI Status

2021-05-19 Thread Moritz Angermann
Matt has access to the M1 builder in my closet now. The darwin performance
issue has mainly been there since Big Sur, and is (afaik) primarily due to
the number of DYLD_LIBRARY_PATH entries we pass to GHC invocations. The
system linker spends the majority of its time in the kernel, stat'ing and
making getelements (or some similar directory) calls for each and every
possible path.

Switching to hadrian will cut down the time from ~5 hrs to ~2 hrs. At some
point we had make builds <90 min by just killing all DYLD_LIBRARY_PATH
logic we ever had, but that broke bindists.

The CI has time values attached and some summary at the end right now,
which highlights time spent in the system and in user mode. This was up to
80% sys, 20% user, and went to something like 20% sys, 80% user after
nuking all the DYLD_LIBRARY_PATH entries; with hadrian it's closer to
~25% sys, 75% user.

Of note, this is mostly due to time spent during the *test-suite*, not the
actual build. For the actual build, make and hadrian are comparable, though
I've seen hadrian oddly have a much higher variance in how long it takes to
*build* ghc, whereas the make build was more consistent.

The test-suite quite notoriously calls GHC *a lot of times*, which makes
any linker issue due
to DYLD_LIBRARY_PATH (and similar lookups) much worse.
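Back-of-the-envelope, the blow-up looks roughly like this (all three numbers
below are invented, just to show the shape of the problem):

  -- stat calls ~= directories on the search path
  --             * dylibs looked up per GHC invocation
  --             * GHC invocations made by the testsuite
  candidateStats :: Int -> Int -> Int -> Int
  candidateStats dirs dylibs invocations = dirs * dylibs * invocations

  -- candidateStats 40 30 7000 == 8400000 path probes, nearly all of them
  -- kernel time, which is what the sys% numbers above reflect.
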

If we finally split building and testing, we'd see this more clearly, I
believe. Maybe this is motivation enough for someone to come forward and
break build/test into two CI steps?

Cheers,
 Moritz

On Wed, May 19, 2021 at 4:14 PM Matthew Pickering <
matthewtpicker...@gmail.com> wrote:

> Hi all,
>
> The darwin pipelines are gumming up the merge pipeline as they are
> taking over 4 hours to complete on average.
>
> I am going to disable them -
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5785
>
> Please can someone give me access to one of the M1 builders so I can
> debug why the tests are taking so long. Once I have fixed the issue
> then I will enable the pipelines.
>
> Cheers,
>
> Matt
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Darwin CI Status

2021-05-19 Thread Matthew Pickering
Hi all,

The darwin pipelines are gumming up the merge pipeline as they are
taking over 4 hours to complete on average.

I am going to disable them -
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5785

Please can someone give me access to one of the M1 builders so I can
debug why the tests are taking so long. Once I have fixed the issue
then I will enable the pipelines.

Cheers,

Matt
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: HLint in the GHC CI, an eight-months retrospective

2021-03-26 Thread Hécate

Hi Richard,

I am sorry, I have indeed forgotten one of the most important parts of my 
email. :)


The Hadrian rules are

lint:base
lint:compiler


You can invoke them as simply as:


./hadrian/build lint:base




You need to have a recent version of HLint in your PATH. If you use 
ghc.nix, this should be taken care of for you.


Hope it clarified things!

Cheers,
Hécate

On 25 March 2021 at 21:39:15, Richard Eisenberg  wrote:


Thanks for this update! Glad to know this effort is going well.

One quick question: suppose I am editing something in `base`. My 
understanding is that my edit will be linted. How can I run hlint locally 
so that I can easily respond to trouble before CI takes a crack? And where 
would I learn this information (that is, how to run hlint locally)?


Thanks!
Richard


On Mar 25, 2021, at 11:19 AM, Hécate  wrote:

Hello fellow devs,

this email is an activity report on the integration of the HLint[0] tool in 
the Continuous Integration (CI) pipelines.


On Jul. 5, 2020 I opened a discussion ticket[1] on the topic of code 
linting in the several components of the GHC code-base. It has served as a 
reference anchor for the Merge Requests (MR) that stemmed from it, and 
allowed us to refine our expectations and processes. If you are not 
acquainted with its content, I invite you to read the whole conversation.


Subsequently, several Hadrian lint rules have been integrated in the 
following months, in order to run HLint on targeted components of the GHC 
repository (the base library, the compiler code-base, etc).
Being satisfied with the state of the rules we applied to the code-base, 
such as removing extraneous pragmata and keywords, it was decided to 
integrate the base library linting rule in the CI. This was five months 
ago, in September[2], and I am happy to report that developer friction has 
been so far minimal.
In parallel to this work on the base library, I took care of cleaning-up 
the compiler, and harmonised the various micro coding styles that have 
emerged quite organically during the decades of development that are behind 
us (I never realised how many variations of the same ten lines of pragmata 
could coexist in the same folders).
Upon feedback from stakeholders of this sub-code base, the rules file was 
altered to better suit their development needs, such as not removing 
extraneous `do` keywords, as they are useful to introduce a block in which 
debug statements can be easily inserted.


Since today, the linting of the compiler code-base has been integrated in 
our CI pipelines, without further burdening our CI times.
Things seem to run smoothly, and I welcome comments and requests of any 
kind related to this area of our code quality process.


Regarding our future plans, there has been a discussion about integrating 
such a linting mechanism for our C code-base, in the RTS. Nothing is 
formally established yet, so I would be grateful if people who have 
experience and wisdom about it can chime in to contribute to the 
discussion: https://gitlab.haskell.org/ghc/ghc/-/issues/19437.


And I would like to say that I am overall very thankful for the involvement 
of the people who have been giving us feedback and have been reviewing the 
resulting MRs.


Have a very nice day,
Hécate

---
[0]: https://github.com/ndmitchell/hlint
[1]: https://gitlab.haskell.org/ghc/ghc/-/issues/18424
[2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4147

--
Hécate ✨
: @TechnoEmpress
IRC: Uniaika
WWW: https://glitchbra.in
RUN: BSD

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: HLint in the GHC CI, an eight-months retrospective

2021-03-25 Thread Richard Eisenberg
Thanks for this update! Glad to know this effort is going well.

One quick question: suppose I am editing something in `base`. My understanding 
is that my edit will be linted. How can I run hlint locally so that I can 
easily respond to trouble before CI takes a crack? And where would I learn this 
information (that is, how to run hlint locally)?

Thanks!
Richard

> On Mar 25, 2021, at 11:19 AM, Hécate  wrote:
> 
> Hello fellow devs,
> 
> this email is an activity report on the integration of the HLint[0] tool in 
> the Continuous Integration (CI) pipelines.
> 
> On Jul. 5, 2020 I opened a discussion ticket[1] on the topic of code linting 
> in the several components of the GHC code-base. It has served as a reference 
> anchor for the Merge Requests (MR) that stemmed from it, and allowed us to 
> refine our expectations and processes. If you are not acquainted with its 
> content, I invite you to read the whole conversation.
> 
> Subsequently, several Hadrian lint rules have been integrated in the 
> following months, in order to run HLint on targeted components of the GHC 
> repository (the base library, the compiler code-base, etc).
> Being satisfied with the state of the rules we applied to the code-base, such 
> as removing extraneous pragmata and keywords, it was decided to integrate the 
> base library linting rule in the CI. This was five months ago, in 
> September[2], and I am happy to report that developer friction has been so 
> far minimal.
> In parallel to this work on the base library, I took care of cleaning-up the 
> compiler, and harmonised the various micro coding styles that have emerged 
> quite organically during the decades of development that are behind us (I 
> never realised how many variations of the same ten lines of pragmata could 
> coexist in the same folders).
> Upon feedback from stakeholders of this sub-code base, the rules file was 
> altered to better suit their development needs, such as not removing 
> extraneous `do` keywords, as they are useful to introduce a block in which 
> debug statements can be easily inserted.
> 
> Since today, the linting of the compiler code-base has been integrated in our 
> CI pipelines, without further burdening our CI times.
> Things seem to run smoothly, and I welcome comments and requests of any kind 
> related to this area of our code quality process.
> 
> Regarding our future plans, there has been a discussion about integrating 
> such a linting mechanism for our C code-base, in the RTS. Nothing is formally 
> established yet, so I would be grateful if people who have experience and 
> wisdom about it can chime in to contribute to the discussion: 
> https://gitlab.haskell.org/ghc/ghc/-/issues/19437.
> 
> And I would like to say that I am overall very thankful for the involvement 
> of the people who have been giving us feedback and have been reviewing the 
> resulting MRs.
> 
> Have a very nice day,
> Hécate
> 
> ---
> [0]: https://github.com/ndmitchell/hlint
> [1]: https://gitlab.haskell.org/ghc/ghc/-/issues/18424
> [2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4147
> 
> -- 
> Hécate ✨
> : @TechnoEmpress
> IRC: Uniaika
> WWW: https://glitchbra.in
> RUN: BSD
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


HLint in the GHC CI, an eight-months retrospective

2021-03-25 Thread Hécate

Hello fellow devs,

this email is an activity report on the integration of the HLint[0] tool 
in the Continuous Integration (CI) pipelines.


On Jul. 5, 2020 I opened a discussion ticket[1] on the topic of code 
linting in the several components of the GHC code-base. It has served as 
a reference anchor for the Merge Requests (MR) that stemmed from it, and 
allowed us to refine our expectations and processes. If you are not 
acquainted with its content, I invite you to read the whole conversation.


Subsequently, several Hadrian lint rules have been integrated in the 
following months, in order to run HLint on targeted components of the 
GHC repository (the base library, the compiler code-base, etc).
Being satisfied with the state of the rules we applied to the code-base, 
such as removing extraneous pragmata and keywords, it was decided to 
integrate the base library linting rule in the CI. This was five months 
ago, in September[2], and I am happy to report that developer friction 
has been so far minimal.
In parallel to this work on the base library, I took care of cleaning-up 
the compiler, and harmonised the various micro coding styles that have 
emerged quite organically during the decades of development that are 
behind us (I never realised how many variations of the same ten lines of 
pragmata could coexist in the same folders).
Upon feedback from stakeholders of this sub-code base, the rules file 
was altered to better suit their development needs, such as not removing 
extraneous `do` keywords, as they are useful to introduce a block in 
which debug statements can be easily inserted.


Since today, the linting of the compiler code-base has been integrated 
in our CI pipelines, without further burdening our CI times.
Things seem to run smoothly, and I welcome comments and requests of any 
kind related to this area of our code quality process.


Regarding our future plans, there has been a discussion about 
integrating such a linting mechanism for our C code-base, in the RTS. 
Nothing is formally established yet, so I would be grateful if people 
who have experience and wisdom about it can chime in to contribute to 
the discussion: https://gitlab.haskell.org/ghc/ghc/-/issues/19437.


And I would like to say that I am overall very thankful for the 
involvement of the people who have been giving us feedback and have been 
reviewing the resulting MRs.


Have a very nice day,
Hécate

---
[0]: https://github.com/ndmitchell/hlint
[1]: https://gitlab.haskell.org/ghc/ghc/-/issues/18424
[2]: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4147

--
Hécate ✨
: @TechnoEmpress
IRC: Uniaika
WWW: https://glitchbra.in
RUN: BSD

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-24 Thread Andreas Klebinger

> What about the case where the rebase *lessens* the improvement? That
is, you're expecting these 10 cases to improve, but after a rebase, only
1 improves. That's news! But a blanket "accept improvements" won't tell you.

I don't think that scenario currently triggers a CI failure. So this
wouldn't really change.

As I understand it the current logic is:

* Run tests
* Check if any cross the metric thresholds set in the test.
* If so check if that test is allowed to cross the threshold.

I believe we don't check that all benchmarks listed with an expected
in/decrease actually do so.
It would also be hard to do so reasonably without making it even harder
to push MRs through CI.
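For reference, my mental model of that check is roughly the following. This
is a simplification, not the actual testsuite driver (which lives in
testsuite/driver); the types and names are made up:

  data Change = Change
    { testName :: String
    , relDelta :: Double   -- (new - baseline) / baseline
    }

  -- A perf test fails CI only if it crosses its per-test window *and* the
  -- change is not listed under "Metric Increase:"/"Metric Decrease:" in the
  -- commit message.
  failsCI :: Double -> [String] -> Change -> Bool
  failsCI window accepted c =
    abs (relDelta c) > window && testName c `notElem` accepted

  -- failsCI 0.02 ["T11545"] (Change "T11545" (-0.05))  ==  False
  -- failsCI 0.02 []         (Change "T9233"    0.035)  ==  True
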

Andreas

On 24/03/2021 at 13:08, Richard Eisenberg wrote:

What about the case where the rebase *lessens* the improvement? That is, you're expecting 
these 10 cases to improve, but after a rebase, only 1 improves. That's news! But a 
blanket "accept improvements" won't tell you.

I'm not hard against this proposal, because I know precise tracking has its own 
costs. Just wanted to bring up another scenario that might be factored in.

Richard


On Mar 24, 2021, at 7:44 AM, Andreas Klebinger  wrote:

After the idea of letting marge accept unexpected perf improvements and
looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
which failed because of a single test, for a single build flavour
crossing the
improvement threshold where CI fails after rebasing I wondered.

When would accepting an unexpected perf improvement ever backfire?

In practice I either have a patch that I expect to improve performance
for some things
so I want to accept whatever gains I get. Or I don't expect improvements
so it's *maybe*
worth failing CI for in case I optimized away some code I shouldn't or
something of that
sort.

How could this be actionable? Perhaps having a set of indicators for CI of
"Accept allocation decreases"
"Accept residency decreases"

Would be saner. I have personally *never* gotten value out of the
requirement
to list the individual tests that improve. Usually a whole lot of them do.
Some cross
the threshold so I add them. If I'm unlucky I have to rebase and a new
one might
make it across the threshold.

Being able to accept improvements (but not regressions) wholesale might be a
reasonable alternative.

Opinions?

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-24 Thread Moritz Angermann
Yes, this is exactly one of the issues that marge might run into as well:
the aggregate ends up performing differently from the individual ones. Now
we have marge to ensure that at least the aggregate builds together, which
is the whole point of these merge trains: not to end up in a situation
where two patches that are fine on their own produce a broken merged
state that doesn't build anymore.

Now we have marge to ensure every commit is buildable. Next we should run
regression tests on all commits on master (and that includes each and
every one that marge brings into master). Then we have visualisation that
tells us how performance metrics go up/down over time, and we can drill
down into commits if they yield interesting results either way.

Now let's say you had a commit that should have made GHC 50% faster across
the board, but somehow, after being aggregated with other patches, this
didn't happen anymore. We'd still expect this to somehow show up in each
of the individual commits on master, right?

On Wed, Mar 24, 2021 at 8:09 PM Richard Eisenberg  wrote:

> What about the case where the rebase *lessens* the improvement? That is,
> you're expecting these 10 cases to improve, but after a rebase, only 1
> improves. That's news! But a blanket "accept improvements" won't tell you.
>
> I'm not hard against this proposal, because I know precise tracking has
> its own costs. Just wanted to bring up another scenario that might be
> factored in.
>
> Richard
>
> > On Mar 24, 2021, at 7:44 AM, Andreas Klebinger 
> wrote:
> >
> > After the idea of letting marge accept unexpected perf improvements and
> > looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
> > which failed because of a single test, for a single build flavour
> > crossing the
> > improvement threshold where CI fails after rebasing I wondered.
> >
> > When would accepting an unexpected perf improvement ever backfire?
> >
> > In practice I either have a patch that I expect to improve performance
> > for some things
> > so I want to accept whatever gains I get. Or I don't expect improvements
> > so it's *maybe*
> > worth failing CI for in case I optimized away some code I shouldn't or
> > something of that
> > sort.
> >
> > How could this be actionable? Perhaps having a set of indicators for CI of
> > "Accept allocation decreases"
> > "Accept residency decreases"
> >
> > Would be saner. I have personally *never* gotten value out of the
> > requirement
> > to list the individual tests that improve. Usually a whole lot of them do.
> > Some cross
> > the threshold so I add them. If I'm unlucky I have to rebase and a new
> > one might
> > make it across the threshold.
> >
> > Being able to accept improvements (but not regressions) wholesale might
> be a
> > reasonable alternative.
> >
> > Opinions?
> >
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-24 Thread Richard Eisenberg
What about the case where the rebase *lessens* the improvement? That is, you're 
expecting these 10 cases to improve, but after a rebase, only 1 improves. 
That's news! But a blanket "accept improvements" won't tell you.

I'm not hard against this proposal, because I know precise tracking has its own 
costs. Just wanted to bring up another scenario that might be factored in.

Richard

> On Mar 24, 2021, at 7:44 AM, Andreas Klebinger  
> wrote:
> 
> After the idea of letting marge accept unexpected perf improvements and
> looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
> which failed because of a single test, for a single build flavour
> crossing the
> improvement threshold where CI fails after rebasing I wondered.
> 
> When would accepting an unexpected perf improvement ever backfire?
> 
> In practice I either have a patch that I expect to improve performance
> for some things
> so I want to accept whatever gains I get. Or I don't expect improvements
> so it's *maybe*
> worth failing CI for in case I optimized away some code I shouldn't or
> something of that
> sort.
> 
> How could this be actionable? Perhaps having a set of indicators for CI of
> "Accept allocation decreases"
> "Accept residency decreases"
> 
> Would be saner. I have personally *never* gotten value out of the
> requirement
> to list the individual tests that improve. Usually a whole lot of them do.
> Some cross
> the threshold so I add them. If I'm unlucky I have to rebase and a new
> one might
> make it across the threshold.
> 
> Being able to accept improvements (but not regressions) wholesale might be a
> reasonable alternative.
> 
> Opinions?
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-24 Thread Andreas Klebinger

After the idea of letting marge accept unexpected perf improvements, and
after looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
(which failed after rebasing because a single test, for a single build
flavour, crossed the improvement threshold), I wondered:

When would accepting an unexpected perf improvement ever backfire?

In practice I either have a patch that I expect to improve performance
for some things, so I want to accept whatever gains I get. Or I don't
expect improvements, so it's *maybe* worth failing CI for in case I
optimized away some code I shouldn't have, or something of that sort.

How could this be actionable? Perhaps having a set of indicators for CI
such as "Accept allocation decreases" and "Accept residency decreases"
would be saner. I have personally *never* gotten value out of the
requirement to list the individual tests that improve. Usually a whole
lot of them do. Some cross the threshold so I add them. If I'm unlucky
I have to rebase and a new one might make it across the threshold.

Being able to accept improvements (but not regressions) wholesale might be a
reasonable alternative.

Opinions?
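
As a sketch of what I mean (simplified, and only for metrics where smaller
is better, like allocations; the names are made up):

  data Verdict = Pass | UnexpectedRegression Double
    deriving Show

  -- Keep the per-test window for regressions, but accept any improvement
  -- wholesale instead of requiring it to be listed in the commit message.
  judge :: Double    -- baseline value (base commit)
        -> Double    -- new value (this MR)
        -> Double    -- window, e.g. 0.02 for 2%
        -> Verdict
  judge baseline new window
    | change <= 0      = Pass                        -- improvement: always fine
    | change <= window = Pass                        -- within the usual window
    | otherwise        = UnexpectedRegression change
    where
      change = (new - baseline) / baseline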

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-03-18 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

> > We need to do something about this, and I'd advocate for just not making 
> > stats fail with marge.
>
> Generally I agree. One point you don’t mention is that our perf tests
> (which CI forces us to look at assiduously) are often pretty weird
> cases. So there is at least a danger that these more exotic cases will
> stand in the way of (say) a perf improvement in the typical case.
>
> But “not making stats fail” is a bit crude.   Instead how about
>
To be clear, the proposal isn't to accept stats failures for merge request
validation jobs. I believe Moritz was merely suggesting that we accept
such failures in marge-bot validations (that is, the pre-merge
validation done on batches of merge requests).

In my opinion this is reasonable since we know that all of the MRs in
the batch do not individually regress. While it's possible that
interactions between two or more MRs result in a qualitative change in
performance, it seems quite unlikely. What is far *more* likely (and
what we see regularly) is that the cumulative effect of a batch of
improving patches pushes the batch's overall stat change out of the
acceptance threshold. This is quite annoying as it dooms the entire
batch.
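
To illustrate the accumulation effect with made-up numbers: relative changes
compose multiplicatively, so two MRs that each improve a metric by well under
a 2% window can still push the batch past it.

  batchChange :: [Double] -> Double
  batchChange deltas = product (map (1 +) deltas) - 1

  -- batchChange [-0.015, -0.012]  ~=  -0.0268
  -- i.e. a 2.7% decrease: each MR is comfortably inside a +/-2% window,
  -- the batch is not.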

For this reason, I think we should at the very least accept stat
improvements during Marge validations (as you suggest). I agree that we
probably want a batch to fail if two patches accumulate to form a
regression, even if the two passed CI individually.

>   * We already have per-benchmark windows. If the stat falls outside
>   the window, we fail. You are effectively saying “widen all windows
>   to infinity”. If something makes a stat 10 times worse, I think we
>   *should* fail. But 10% worse? Maybe we should accept and look later
>   as you suggest. So I’d argue for widening the windows rather than
>   disabling them completely.
>
Yes, I agree.
>
>   * If we did that we’d need good instrumentation to spot steps and
>   drift in perf, as you say. An advantage is that since the perf
>   instrumentation runs only on committed master patches, not on every
>   CI, it can cost more. In particular , it could run a bunch of
>   “typical” tests, including nofib and compiling Cabal or other
>   libraries.
>
We already have the beginnings of such instrumentation.

> The big danger is that by relieving patch authors from worrying about
> perf drift, it’ll end up in the lap of the GHC HQ team. If it’s hard
> for the author of a single patch (with which she is intimately
> familiar) to work out why it’s making some test 2% worse, imagine how
> hard, and demotivating, it’d be for Ben to wonder why 50 patches (with
> which he is unfamiliar) are making some test 5% worse.
>
Yes, I absolutely agree with this. I would very much like to avoid
having to do this sort of post-hoc investigation any more than
necessary.

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-18 Thread Ben Gamari
Karel Gardas  writes:

> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> Now that isn't really an issue anyway I think. The question is rather is
>> 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still around system noise even on lightly loaded workstation.
> Not sure if CI is not run on some shared cloud resources where it may be
> even higher.
>
I think when we say "performance" we should be clear about what we are
referring to. Currently, GHC does not measure instructions/cycles/time.
We only measure allocations and residency. These are significantly more
deterministic than time measurements, even on cloud hardware.

I do think that eventually we should start to measure a broader spectrum
of metrics, but this is something that can be done on dedicated hardware
as a separate CI job.

> I've done a simple experiment of pinning ghc compiling ghc-cabal and I've
> been able to "speed" it up by 5-10% on W-2265.
>
Do note that once we switch to Hadrian ghc-cabal will vanish entirely
(since Hadrian implements its functionality directly).

> Also following this CI/performance regs discussion I'm not entirely sure
> if  this is not just a witch-hunt hurting/beating mostly most active GHC
> developers. Another idea may be to give up on CI doing perf reg testing
> at all and invest saved resources into proper investigation of
> GHC/Haskell programs performance. Not sure, if this would not be more
> beneficial longer term.
>
I don't think this would be beneficial. It's much easier to prevent a
regression from getting into the tree than it is to find and
characterise it after it has been merged.

> Just one random number thrown to the ring. Linux's perf claims that
> nearly every second L3 cache access on the example above ends with cache
> miss. Is it a good number or bad number? See stats below (perf stat -d
> on ghc with +RTS -T -s -RTS').
>
It is very hard to tell; it sounds bad but it is not easy to know why or
whether it is possible to improve. This is one of the reasons why I have
been trying to improve sharing within GHC recently; reducing residency should
improve cache locality.

Nevertheless, the difficulty interpreting architectural events is why I
generally only use `perf` for differential measurements.

Cheers,

- Ben



___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-18 Thread John Ericson
My guess is most of the "noise" is not run time, but the compiled code 
changing in hard to predict ways.


https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1776/diffs for 
example was a very small PR that took *months* of on-off work to get the 
metrics tests passing. In the end, binding `is_boot` twice helped a bit, 
and dumb luck helped a little bit more. No matter how you analyze that, 
that's a lot of pain for what's manifestly a performance-irrelevant MR 
--- no one is writing 10,000 default methods or whatever it would take to 
make this micro-optimizing worth it!


Perhaps this is an extreme example, but my rough sense is that it's not 
an isolated outlier.


John

On 3/18/21 1:39 PM, davean wrote:
I left the wiggle room for things like longer wall time causing more 
time events in the IO Manager/RTS which can be a thermal/HW issue.

They're small and indirect though

-davean

On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf wrote:


To be clear: All performance tests that run as part of CI measure
allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible
between compiles of the same worktree and not impacted by thermal
issues/hardware at all.

On Thu, 18 Mar 2021 at 18:09, davean wrote:

That really shouldn't be near system noise for a well
constructed performance test. You might be seeing things like
thermal issues, etc though - good benchmarking is a serious
subject.
Also we're not talking wall clock tests, we're talking
specific metrics. The machines do tend to be bare metal, but
many of these are entirely CPU performance independent, memory
timing independent, etc. Well not quite but that's a longer
discussion.

The investigation of Haskell code performance is a very good
thing to do BTW, but you'd still want to avoid regressions in
the improvements you made. How well we can do that and the
cost of it is the primary issue here.

-davean


On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas wrote:

On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway I think. The
question is rather is
> 2% a large enough regression to worry about? 5%? 10%?

5-10% is still around system noise even on lightly loaded
workstation.
Not sure if CI is not run on some shared cloud resources
where it may be
even higher.

I've done a simple experiment of pinning ghc compiling ghc-cabal and
I've been able to "speed" it up by 5-10% on W-2265.

Also following this CI/performance regs discussion I'm not entirely sure
if this is not just a witch-hunt hurting/beating mostly most active GHC
developers. Another idea may be to give up on CI doing perf reg testing
at all and invest saved resources into proper investigation of
GHC/Haskell programs performance. Not sure, if this would not be more
beneficial longer term.

Just one random number thrown to the ring. Linux's perf claims that
nearly every second L3 cache access on the example above ends with cache
miss. Is it a good number or bad number? See stats below (perf stat -d
on ghc with +RTS -T -s -RTS').

Good luck to anybody working on that!

Karel


Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
  61,020,836,136 bytes allocated in the heap
   5,229,185,608 bytes copied during GC
     301,742,768 bytes maximum residency (19 sample(s))
       3,533,000 bytes maximum slop
             840 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2012 colls,     0 par    5.725s   5.731s     0.0028s    0.1267s
  Gen  1        19 colls,     0 par    1.695s   1.696s     0.0893s    0.2636s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   27.849s  ( 32.163s elapsed)
  GC      time    7.419s  (  7.427s elapsed)
  EXIT    time    0.000s  (  0.010s elapsed)
  Total   time   35.269s  ( 39.601s elapsed)

  Alloc rate    2,191,122,004 bytes per MUT second

Re: On CI

2021-03-18 Thread davean
I left the wiggle room for things like longer wall time causing more time
events in the IO Manager/RTS which can be a thermal/HW issue.
They're small and indirect though

-davean

On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf  wrote:

> To be clear: All performance tests that run as part of CI measure
> allocations only. No wall clock time.
> Those measurements are (mostly) deterministic and reproducible between
> compiles of the same worktree and not impacted by thermal issues/hardware
> at all.
>
> On Thu, 18 Mar 2021 at 18:09, davean wrote:
>
>> That really shouldn't be near system noise for a well constructed
>> performance test. You might be seeing things like thermal issues, etc
>> though - good benchmarking is a serious subject.
>> Also we're not talking wall clock tests, we're talking specific metrics.
>> The machines do tend to be bare metal, but many of these are entirely CPU
>> performance independent, memory timing independent, etc. Well not quite but
>> that's a longer discussion.
>>
>> The investigation of Haskell code performance is a very good thing to do
>> BTW, but you'd still want to avoid regressions in the improvements you
>> made. How well we can do that and the cost of it is the primary issue here.
>>
>> -davean
>>
>>
>> On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
>> wrote:
>>
>>> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>>> > Now that isn't really an issue anyway I think. The question is rather
>>> is
>>> > 2% a large enough regression to worry about? 5%? 10%?
>>>
>>> 5-10% is still around system noise even on lightly loaded workstation.
>>> Not sure if CI is not run on some shared cloud resources where it may be
>>> even higher.
>>>
>>> I've done a simple experiment of pinning ghc compiling ghc-cabal and I've
>>> been able to "speed" it up by 5-10% on W-2265.
>>>
>>> Also following this CI/performance regs discussion I'm not entirely sure
>>> if  this is not just a witch-hunt hurting/beating mostly most active GHC
>>> developers. Another idea may be to give up on CI doing perf reg testing
>>> at all and invest saved resources into proper investigation of
>>> GHC/Haskell programs performance. Not sure, if this would not be more
>>> beneficial longer term.
>>>
>>> Just one random number thrown to the ring. Linux's perf claims that
>>> nearly every second L3 cache access on the example above ends with cache
>>> miss. Is it a good number or bad number? See stats below (perf stat -d
>>> on ghc with +RTS -T -s -RTS').
>>>
>>> Good luck to anybody working on that!
>>>
>>> Karel
>>>
>>>
>>> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>>>   61,020,836,136 bytes allocated in the heap
>>>5,229,185,608 bytes copied during GC
>>>  301,742,768 bytes maximum residency (19 sample(s))
>>>3,533,000 bytes maximum slop
>>>  840 MiB total memory in use (0 MB lost due to fragmentation)
>>>
>>>  Tot time (elapsed)  Avg pause  Max
>>> pause
>>>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
>>> 0.1267s
>>>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
>>> 0.2636s
>>>
>>>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>>>
>>>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>>>
>>>   INITtime0.000s  (  0.000s elapsed)
>>>   MUT time   27.849s  ( 32.163s elapsed)
>>>   GC  time7.419s  (  7.427s elapsed)
>>>   EXITtime0.000s  (  0.010s elapsed)
>>>   Total   time   35.269s  ( 39.601s elapsed)
>>>
>>>   Alloc rate2,191,122,004 bytes per MUT second
>>>
>>>   Productivity  79.0% of total user, 81.2% of total elapsed
>>>
>>>
>>>  Performance counter stats for
>>> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
>>> -hide-all-packages -package ghc-prim -package base -package binary
>>> -package array -package transformers -package time -package containers
>>> -package bytestring -package deepseq -package process -package pretty
>>> -package directory -package filepath -package template-haskell -package
>>> unix --make utils/ghc-cabal/Main.hs -o
>>> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
>>> -fno-warn-unused-imports

Re: On CI

2021-03-18 Thread Sebastian Graf
To be clear: All performance tests that run as part of CI measure
allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible between
compiles of the same worktree and not impacted by thermal issues/hardware
at all.
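
As a small standalone illustration (this is not the testsuite driver, just an
example of reading the same kind of counter the perf tests rely on): bytes
allocated can be read via GHC.Stats and, for a given build of a given
program, it barely changes from run to run, unlike wall-clock time.

  -- Compile with: ghc -O Alloc.hs -rtsopts
  -- Run with:     ./Alloc +RTS -T
  import GHC.Stats (getRTSStats, getRTSStatsEnabled, RTSStats(..))
  import Control.Monad (unless)

  main :: IO ()
  main = do
    enabled <- getRTSStatsEnabled
    unless enabled (error "run with +RTS -T to enable RTS stats")
    print (length (show (product [1 .. 5000 :: Integer])))  -- do some allocation
    stats <- getRTSStats
    putStrLn ("allocated_bytes = " ++ show (allocated_bytes stats))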

On Thu, 18 Mar 2021 at 18:09, davean wrote:

> That really shouldn't be near system noise for a well constructed
> performance test. You might be seeing things like thermal issues, etc
> though - good benchmarking is a serious subject.
> Also we're not talking wall clock tests, we're talking specific metrics.
> The machines do tend to be bare metal, but many of these are entirely CPU
> performance independent, memory timing independent, etc. Well not quite but
> that's a longer discussion.
>
> The investigation of Haskell code performance is a very good thing to do
> BTW, but you'd still want to avoid regressions in the improvements you
> made. How well we can do that and the cost of it is the primary issue here.
>
> -davean
>
>
> On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
> wrote:
>
>> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> > Now that isn't really an issue anyway I think. The question is rather is
>> > 2% a large enough regression to worry about? 5%? 10%?
>>
>> 5-10% is still around system noise even on lightly loaded workstation.
>> Not sure if CI is not run on some shared cloud resources where it may be
>> even higher.
>>
>> I've done a simple experiment of pinning ghc compiling ghc-cabal and I've
>> been able to "speed" it up by 5-10% on W-2265.
>>
>> Also following this CI/performance regs discussion I'm not entirely sure
>> if  this is not just a witch-hunt hurting/beating mostly most active GHC
>> developers. Another idea may be to give up on CI doing perf reg testing
>> at all and invest saved resources into proper investigation of
>> GHC/Haskell programs performance. Not sure, if this would not be more
>> beneficial longer term.
>>
>> Just one random number thrown to the ring. Linux's perf claims that
>> nearly every second L3 cache access on the example above ends with cache
>> miss. Is it a good number or bad number? See stats below (perf stat -d
>> on ghc with +RTS -T -s -RTS').
>>
>> Good luck to anybody working on that!
>>
>> Karel
>>
>>
>> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>>   61,020,836,136 bytes allocated in the heap
>>5,229,185,608 bytes copied during GC
>>  301,742,768 bytes maximum residency (19 sample(s))
>>3,533,000 bytes maximum slop
>>  840 MiB total memory in use (0 MB lost due to fragmentation)
>>
>>  Tot time (elapsed)  Avg pause  Max
>> pause
>>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
>> 0.1267s
>>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
>> 0.2636s
>>
>>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>>
>>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>>
>>   INITtime0.000s  (  0.000s elapsed)
>>   MUT time   27.849s  ( 32.163s elapsed)
>>   GC  time7.419s  (  7.427s elapsed)
>>   EXITtime0.000s  (  0.010s elapsed)
>>   Total   time   35.269s  ( 39.601s elapsed)
>>
>>   Alloc rate2,191,122,004 bytes per MUT second
>>
>>   Productivity  79.0% of total user, 81.2% of total elapsed
>>
>>
>>  Performance counter stats for
>> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
>> -hide-all-packages -package ghc-prim -package base -package binary
>> -package array -package transformers -package time -package containers
>> -package bytestring -package deepseq -package process -package pretty
>> -package directory -package filepath -package template-haskell -package
>> unix --make utils/ghc-cabal/Main.hs -o
>> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
>> -fno-warn-unused-imports -fno-warn-warnings-deprecations
>> -DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
>> bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
>> -ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
>> -ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
>> libraries/text/cbits/cbits.c -Ilibraries/text/include
>> -ilibraries/parsec/src +RTS -T -s -RTS':
>>
>>  39,632.99 msec task-clock#0.999 CPUs
>> utilized
>> 17,191  context-switches  #0.434 K/sec
>>
>>  0

Re: On CI

2021-03-18 Thread davean
That really shouldn't be near system noise for a well-constructed
performance test. You might be seeing things like thermal issues, etc.,
though - good benchmarking is a serious subject.
Also, we're not talking about wall-clock tests; we're talking about specific
metrics. The machines do tend to be bare metal, but many of these metrics
are entirely independent of CPU performance, memory timing, etc. Well, not
quite, but that's a longer discussion.

Investigating Haskell code performance is a very good thing to do, BTW, but
you'd still want to avoid regressing the improvements you make. How well we
can do that, and at what cost, is the primary issue here.

-davean


On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
wrote:

> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> > Now that isn't really an issue anyway I think. The question is rather is
> > 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still around system noise even on lightly loaded workstation.
> Not sure if CI is not run on some shared cloud resources where it may be
> even higher.
>
> I've done simple experiment of pining ghc compiling ghc-cabal and I've
> been able to "speed" it up by 5-10% on W-2265.
>
> Also following this CI/performance regs discussion I'm not entirely sure
> if  this is not just a witch-hunt hurting/beating mostly most active GHC
> developers. Another idea may be to give up on CI doing perf reg testing
> at all and invest saved resources into proper investigation of
> GHC/Haskell programs performance. Not sure, if this would not be more
> beneficial longer term.
>
> Just one random number thrown to the ring. Linux's perf claims that
> nearly every second L3 cache access on the example above ends with cache
> miss. Is it a good number or bad number? See stats below (perf stat -d
> on ghc with +RTS -T -s -RTS').
>
> Good luck to anybody working on that!
>
> Karel
>
>
> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>   61,020,836,136 bytes allocated in the heap
>5,229,185,608 bytes copied during GC
>  301,742,768 bytes maximum residency (19 sample(s))
>3,533,000 bytes maximum slop
>  840 MiB total memory in use (0 MB lost due to fragmentation)
>
>  Tot time (elapsed)  Avg pause  Max
> pause
>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
> 0.1267s
>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
> 0.2636s
>
>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>
>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>
>   INITtime0.000s  (  0.000s elapsed)
>   MUT time   27.849s  ( 32.163s elapsed)
>   GC  time7.419s  (  7.427s elapsed)
>   EXITtime0.000s  (  0.010s elapsed)
>   Total   time   35.269s  ( 39.601s elapsed)
>
>   Alloc rate2,191,122,004 bytes per MUT second
>
>   Productivity  79.0% of total user, 81.2% of total elapsed
>
>
>  Performance counter stats for
> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
> -hide-all-packages -package ghc-prim -package base -package binary
> -package array -package transformers -package time -package containers
> -package bytestring -package deepseq -package process -package pretty
> -package directory -package filepath -package template-haskell -package
> unix --make utils/ghc-cabal/Main.hs -o
> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
> -fno-warn-unused-imports -fno-warn-warnings-deprecations
> -DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
> bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
> -ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
> -ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
> libraries/text/cbits/cbits.c -Ilibraries/text/include
> -ilibraries/parsec/src +RTS -T -s -RTS':
>
>  39,632.99 msec task-clock#0.999 CPUs
> utilized
> 17,191  context-switches  #0.434 K/sec
>
>  0  cpu-migrations#0.000 K/sec
>
>899,930  page-faults   #0.023 M/sec
>
>177,636,979,975  cycles#4.482 GHz
>   (87.54%)
>181,945,795,221  instructions  #1.02  insn per
> cycle   (87.59%)
> 34,033,574,511  branches  #  858.718 M/sec
>   (87.42%)
>  1,664,969,299  branch-misses #4.89% of all
> branches  (87.48%)
> 41,522,737,426  L1-dcache-loads   # 1047.681 M/sec
>   (87.53%)
>  2,675,319,939  L1-dcache-load-misses #6.44% of 

Re: On CI

2021-03-17 Thread Karel Gardas
On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway I think. The question is rather is
> 2% a large enough regression to worry about? 5%? 10%?

5-10% is still around system noise, even on a lightly loaded workstation.
I'm not sure whether CI runs on shared cloud resources, where the noise may
be even higher.

I've done a simple experiment of pinning ghc while compiling ghc-cabal, and
I've been able to "speed" it up by 5-10% on a W-2265.

Also, following this CI/performance-regression discussion, I'm not entirely
sure this isn't just a witch-hunt that mostly hurts the most active GHC
developers. Another idea may be to give up on CI doing perf regression
testing at all and to invest the saved resources into proper investigation
of the performance of GHC and of Haskell programs. I'm not sure whether that
wouldn't be more beneficial in the longer term.

Just one random number thrown into the ring: Linux's perf claims that
nearly every second L3 cache access in the example above ends in a cache
miss. Is that a good number or a bad one? See the stats below ('perf stat -d'
on ghc with '+RTS -T -s -RTS').

Good luck to anybody working on that!

Karel


Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
  61,020,836,136 bytes allocated in the heap
   5,229,185,608 bytes copied during GC
 301,742,768 bytes maximum residency (19 sample(s))
   3,533,000 bytes maximum slop
 840 MiB total memory in use (0 MB lost due to fragmentation)

 Tot time (elapsed)  Avg pause  Max
pause
  Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
0.1267s
  Gen  119 colls, 0 par1.695s   1.696s 0.0893s
0.2636s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INITtime0.000s  (  0.000s elapsed)
  MUT time   27.849s  ( 32.163s elapsed)
  GC  time7.419s  (  7.427s elapsed)
  EXITtime0.000s  (  0.010s elapsed)
  Total   time   35.269s  ( 39.601s elapsed)

  Alloc rate2,191,122,004 bytes per MUT second

  Productivity  79.0% of total user, 81.2% of total elapsed


 Performance counter stats for
'/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
-hide-all-packages -package ghc-prim -package base -package binary
-package array -package transformers -package time -package containers
-package bytestring -package deepseq -package process -package pretty
-package directory -package filepath -package template-haskell -package
unix --make utils/ghc-cabal/Main.hs -o
utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
-fno-warn-unused-imports -fno-warn-warnings-deprecations
-DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
-ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
-ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
libraries/text/cbits/cbits.c -Ilibraries/text/include
-ilibraries/parsec/src +RTS -T -s -RTS':

 39,632.99 msec task-clock#0.999 CPUs
utilized
17,191  context-switches  #0.434 K/sec

 0  cpu-migrations#0.000 K/sec

   899,930  page-faults   #0.023 M/sec

   177,636,979,975  cycles#4.482 GHz
  (87.54%)
   181,945,795,221  instructions  #1.02  insn per
cycle   (87.59%)
34,033,574,511  branches  #  858.718 M/sec
  (87.42%)
 1,664,969,299  branch-misses #4.89% of all
branches  (87.48%)
41,522,737,426  L1-dcache-loads   # 1047.681 M/sec
  (87.53%)
 2,675,319,939  L1-dcache-load-misses #6.44% of all
L1-dcache hits(87.48%)
   372,370,395  LLC-loads #9.395 M/sec
  (87.49%)
   173,614,140  LLC-load-misses   #   46.62% of all
LL-cache hits (87.46%)

  39.663103602 seconds time elapsed

  38.288158000 seconds user
   1.358263000 seconds sys


Re: On CI

2021-03-17 Thread Merijn Verstraaten
On 17 Mar 2021, at 16:16, Andreas Klebinger  wrote:
> 
> While I fully agree with this. We should *always* want to know if a small 
> syntetic benchmark regresses by a lot.
> Or in other words we don't want CI to accept such a regression for us ever, 
> but the developer of a patch should need to explicitly ok it.
> 
> Otherwise we just slow down a lot of seldom-used code paths by a lot.
> 
> Now that isn't really an issue anyway I think. The question is rather is 2% a 
> large enough regression to worry about? 5%? 10%?

You probably want a sliding window anyway. Having N 1.8% regressions in a row
can still slow things down a lot, while a 3% regression after a 5% improvement
is probably fine.
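
A rough sketch of that sliding-window idea (the window size, the drift budget
and the function name are all invented):

    -- Per-commit relative changes of one metric, e.g. +0.018 for a 1.8%
    -- regression, -0.05 for a 5% improvement.  A batch is rejected if any
    -- window of k consecutive commits compounds to more than the allowed
    -- drift, even when every individual commit stays under the per-commit
    -- threshold.
    exceedsDrift :: Int -> Double -> [Double] -> Bool
    exceedsDrift k allowed changes = any tooMuch (windows k changes)
      where
        tooMuch w = product (map (1 +) w) > 1 + allowed
        windows n xs
          | length xs < n = []
          | otherwise     = take n xs : windows n (drop 1 xs)

    -- Five 1.8% regressions in a row blow a 5% drift budget:
    --   exceedsDrift 5 0.05 (replicate 5 0.018)  ==  True
    -- A 3% regression right after a 5% improvement does not:
    --   exceedsDrift 2 0.05 [-0.05, 0.03]        ==  False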

- Merijn




Re: On CI

2021-03-17 Thread Andreas Klebinger

> I'd be quite happy to accept a 25% regression on T9872c if it yielded
a 1% improvement on compiling Cabal. T9872 is very very very strange!
(Maybe if *all* the T9872 tests regressed, I'd be more worried.)

While I fully agree with this, we should *always* want to know if a
small synthetic benchmark regresses by a lot.
Or, in other words, we never want CI to accept such a regression for us;
the developer of a patch should need to explicitly OK it.

Otherwise we just slow down a lot of seldom-used code paths by a lot.

Now, that isn't really an issue anyway, I think. The question is rather:
is 2% a large enough regression to worry about? 5%? 10%?

Cheers,
Andreas

On 17/03/2021 at 14:39, Richard Eisenberg wrote:




On Mar 17, 2021, at 6:18 AM, Moritz Angermann
 wrote:

But what do we expect of patch authors? Right now if five people
write patches to GHC, and each of them eventually manage to get their
MRs green, after a long review, they finally see it assigned to
marge, and then it starts failing? Their patch on its own was fine,
but their aggregate with other people's code leads to regressions? So
we now expect all patch authors together to try to figure out what
happened? Figuring out why something regressed is hard enough, and we
only have a very few people who are actually capable of debugging
this. Thus I believe it would end up with Ben, Andreas, Matthiew,
Simon, ... or someone else from GHC HQ anyway to figure out why it
regressed, be it in the Review Stage, or dissecting a marge
aggregate, or on master.


I have previously posted against the idea of allowing Marge to accept
regressions... but the paragraph above is sadly convincing. Maybe
Simon is right about opening up the windows to, say, be 100% (which
would catch a 10x regression) instead of infinite, but I'm now
convinced that Marge should be very generous in allowing regressions
-- provided we also have some way of monitoring drift over time.

Separately, I've been concerned for some time about the peculiarity of
our perf tests. For example, I'd be quite happy to accept a 25%
regression on T9872c if it yielded a 1% improvement on compiling
Cabal. T9872 is very very very strange! (Maybe if *all* the T9872
tests regressed, I'd be more worried.) I would be very happy to learn
that some more general, representative tests are included in our
examinations.

Richard



Re: On CI

2021-03-17 Thread John Ericson
Yes, I think the counterpoint of "automating what Ben does", so people 
besides Ben can do it, is very important. In this case, I think a good 
thing we could do is asynchronously build more of master post-merge, 
such as using the perf stats to automatically bisect anything that is 
fishy, including within marge bot roll-ups, which wouldn't be built by 
the regular workflow anyway.
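
As a sketch of what that post-merge automation could look for, here is a toy
scan over per-commit metrics (the commit ids and numbers are made up; a real
version would sit on top of the perf database and drive an actual bisection):

    type Commit = String

    -- Given the history of one metric on master (oldest first), report the
    -- first commit at which the metric jumped by more than the given
    -- fraction relative to its predecessor.
    firstJump :: Double -> [(Commit, Double)] -> Maybe (Commit, Double, Double)
    firstJump threshold history =
      case [ (c2, m1, m2)
           | ((_, m1), (c2, m2)) <- zip history (drop 1 history)
           , m2 > m1 * (1 + threshold) ] of
        []      -> Nothing
        (x : _) -> Just x

    -- firstJump 0.02 [("a1", 100), ("b2", 101), ("c3", 110), ("d4", 111)]
    --   == Just ("c3", 101.0, 110.0)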


I also agree with Sebastian that the overfit, overly synthetic nature of 
our current tests, plus the sketchy way we have ignored drift, makes the 
current approach worth abandoning in any event. The fact that the gold 
standard must include tests of larger, "real world" code, which 
unfortunately takes longer to build, is, I think, another point in favour 
of this asynchronous approach: we trade MR latency for stat latency, but 
we better utilize our build machines and get better stats, and when a 
human has to fix something a few days later, they have a much better 
foundation to start their investigation.


Finally, I agree with SPJ that, for fairness and sustainability's sake, 
the people investigating issues after the fact should ideally be the MR 
authors, and definitely, definitely not Ben. But I hope that better 
stats, nice-looking graphs, and maybe a system to automatically ping MR 
authors will make perf debugging much more accessible, enabling that 
goal.


John

On 3/17/21 9:47 AM, Sebastian Graf wrote:
Re: Performance drift: I opened 
https://gitlab.haskell.org/ghc/ghc/-/issues/17658 
 a while ago with 
an idea of how to measure drift a bit better.
It's basically an automatically checked version of "Ben stares at 
performance reports every two weeks and sees that T9872 has regressed 
by 10% since 9.0"


Maybe we can have Marge check for drift and each individual MR for 
incremental perf regressions?


Sebastian

On Wed, 17 Mar 2021 at 14:40, Richard Eisenberg  wrote:





On Mar 17, 2021, at 6:18 AM, Moritz Angermann
 wrote:

But what do we expect of patch authors? Right now if five people
write patches to GHC, and each of them eventually manage to get
their MRs green, after a long review, they finally see it
assigned to marge, and then it starts failing? Their patch on its
own was fine, but their aggregate with other people's code leads
to regressions? So we now expect all patch authors together to
try to figure out what happened? Figuring out why something
regressed is hard enough, and we only have a very few people who
are actually capable of debugging this. Thus I believe it would
end up with Ben, Andreas, Matthiew, Simon, ... or someone else
from GHC HQ anyway to figure out why it regressed, be it in the
Review Stage, or dissecting a marge aggregate, or on master.


I have previously posted against the idea of allowing Marge to
accept regressions... but the paragraph above is sadly convincing.
Maybe Simon is right about opening up the windows to, say, be 100%
(which would catch a 10x regression) instead of infinite, but I'm
now convinced that Marge should be very generous in allowing
regressions -- provided we also have some way of monitoring drift
over time.

Separately, I've been concerned for some time about the
peculiarity of our perf tests. For example, I'd be quite happy to
accept a 25% regression on T9872c if it yielded a 1% improvement
on compiling Cabal. T9872 is very very very strange! (Maybe if
*all* the T9872 tests regressed, I'd be more worried.) I would be
very happy to learn that some more general, representative tests
are included in our examinations.

Richard


Re: On CI

2021-03-17 Thread Sebastian Graf
Re: Performance drift: I opened
https://gitlab.haskell.org/ghc/ghc/-/issues/17658 a while ago with an idea
of how to measure drift a bit better.
It's basically an automatically checked version of "Ben stares at
performance reports every two weeks and sees that T9872 has regressed by
10% since 9.0"

Maybe we can have Marge check for drift and each individual MR for
incremental perf regressions?

Sebastian

On Wed, 17 Mar 2021 at 14:40, Richard Eisenberg  wrote:

>
>
> On Mar 17, 2021, at 6:18 AM, Moritz Angermann 
> wrote:
>
> But what do we expect of patch authors? Right now if five people write
> patches to GHC, and each of them eventually manage to get their MRs green,
> after a long review, they finally see it assigned to marge, and then it
> starts failing? Their patch on its own was fine, but their aggregate with
> other people's code leads to regressions? So we now expect all patch
> authors together to try to figure out what happened? Figuring out why
> something regressed is hard enough, and we only have a very few people who
> are actually capable of debugging this. Thus I believe it would end up with
> Ben, Andreas, Matthiew, Simon, ... or someone else from GHC HQ anyway to
> figure out why it regressed, be it in the Review Stage, or dissecting a
> marge aggregate, or on master.
>
>
> I have previously posted against the idea of allowing Marge to accept
> regressions... but the paragraph above is sadly convincing. Maybe Simon is
> right about opening up the windows to, say, be 100% (which would catch a
> 10x regression) instead of infinite, but I'm now convinced that Marge
> should be very generous in allowing regressions -- provided we also have
> some way of monitoring drift over time.
>
> Separately, I've been concerned for some time about the peculiarity of our
> perf tests. For example, I'd be quite happy to accept a 25% regression on
> T9872c if it yielded a 1% improvement on compiling Cabal. T9872 is very
> very very strange! (Maybe if *all* the T9872 tests regressed, I'd be more
> worried.) I would be very happy to learn that some more general,
> representative tests are included in our examinations.
>
> Richard


Re: On CI

2021-03-17 Thread Richard Eisenberg


> On Mar 17, 2021, at 6:18 AM, Moritz Angermann  
> wrote:
> 
> But what do we expect of patch authors? Right now if five people write 
> patches to GHC, and each of them eventually manage to get their MRs green, 
> after a long review, they finally see it assigned to marge, and then it 
> starts failing? Their patch on its own was fine, but their aggregate with 
> other people's code leads to regressions? So we now expect all patch authors 
> together to try to figure out what happened? Figuring out why something 
> regressed is hard enough, and we only have a very few people who are actually 
> capable of debugging this. Thus I believe it would end up with Ben, Andreas, 
> Matthiew, Simon, ... or someone else from GHC HQ anyway to figure out why it 
> regressed, be it in the Review Stage, or dissecting a marge aggregate, or on 
> master.

I have previously posted against the idea of allowing Marge to accept 
regressions... but the paragraph above is sadly convincing. Maybe Simon is 
right about opening up the windows to, say, be 100% (which would catch a 10x 
regression) instead of infinite, but I'm now convinced that Marge should be 
very generous in allowing regressions -- provided we also have some way of 
monitoring drift over time.

Separately, I've been concerned for some time about the peculiarity of our perf 
tests. For example, I'd be quite happy to accept a 25% regression on T9872c if 
it yielded a 1% improvement on compiling Cabal. T9872 is very very very 
strange! (Maybe if *all* the T9872 tests regressed, I'd be more worried.) I 
would be very happy to learn that some more general, representative tests are 
included in our examinations.

Richard


Re: On CI

2021-03-17 Thread Moritz Angermann
I am not advocating dropping perf tests during merge requests; I just want
them not to be fatal for marge batches. Yes, this means that a bunch of
unrelated merge requests could each be fine with respect to the perf checks
per merge request, while the aggregate fails perf, and then the next MR
against the merged aggregate will start failing. Even that is a pretty
bad situation, imo.

I honestly don't have a good answer; I just see marge work on batches, over
and over and over again, just to fail. Eventually marge should figure out a
subset of the merges that fits into the perf window, but that might be after
10 tries, so after up to ~30+ hours, which means there won't be any merge
requests landing in GHC for 30 hours. I find that rather unacceptable.

I think we need better visualisation of perf regressions that happen on
master. Ben has some WIP for this, and I think John said there might be
some way to add a nice (maybe Reflex) UI to it.  If we can see regressions
on master easily, and go from "oh, at this point in time GHC got worse" to
"this is the commit", we might be able to figure it out.

But what do we expect of patch authors? Right now if five people write
patches to GHC, and each of them eventually manage to get their MRs green,
after a long review, they finally see it assigned to marge, and then it
starts failing? Their patch on its own was fine, but their aggregate with
other people's code leads to regressions? So we now expect all patch
authors together to try to figure out what happened? Figuring out why
something regressed is hard enough, and we only have a very few people who
are actually capable of debugging this. Thus I believe it would end up with
Ben, Andreas, Matthiew, Simon, ... or someone else from GHC HQ anyway to
figure out why it regressed, be it in the Review Stage, or dissecting a
marge aggregate, or on master.

Thus I believe in most cases we'd have to look at the regressions anyway,
and right now we just make working on GHC a rather depressing job in a
roundabout way. Increasing the barrier to entry by also requiring everyone
to have absolutely stellar perf-regression-debugging skills is quite a
challenge.

There is also the question of whether our synthetic benchmarks actually
measure real-world performance. Do the micro-benchmarks translate into the
same regressions in, say, building aeson, vector, or Cabal? The latter is
what most practitioners care about more than the micro-benchmarks.

Again, I'm absolutely not in favour of GHC regressing, it's slow enough as
it is. I just think CI should be assisting us and not holding development
back.

Cheers,
 Moritz

On Wed, Mar 17, 2021 at 5:54 PM Spiwack, Arnaud 
wrote:

> Ah, so it was really two identical pipelines (one for the branch where
> Margebot batches commits, and one for the MR that Margebot creates before
> merging). That's indeed a non-trivial amount of purely wasted
> computer-hours.
>
> Taking a step back, I am inclined to agree with the proposal of not
> checking stat regressions in Margebot. My high-level opinion on this is
> that perf tests don't actually test the right thing. Namely, they don't
> prevent performance drift over time (if a given test is allowed to degrade
> by 2% every commit, it can take a 100% performance hit in just 35 commits).
> While it is important to measure performance, and to avoid too egregious
> performance degradation in a given commit, it's usually performance over
> time which matters. I don't really know how to apply it to collaborative
> development, and help maintain healthy performance. But flagging
> performance regressions in MRs, while not making them block batched merges
> sounds like a reasonable compromise.
>
>
> On Wed, Mar 17, 2021 at 9:34 AM Moritz Angermann <
> moritz.angerm...@gmail.com> wrote:
>
>> *why* is a very good question. The MR fixing it is here:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
>>
>> On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
>> wrote:
>>
>>> Then I have a question: why are there two pipelines running on each
>>> merge batch?
>>>
>>> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
>>> moritz.angerm...@gmail.com> wrote:
>>>
>>>> No it wasn't. It was about the stat failures described in the next
>>>> paragraph. I could have been more clear about that. My apologies!
>>>>
>>>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud <
>>>> arnaud.spiw...@tweag.io> wrote:
>>>>
>>>>>
>>>>> and if either of both (see below) failed, marge's merge would fail as
>>>>>> well.
>>>>>>
>>>>>
>>>>> Re: “see below” is this referring to a missing part of your email?
>>>>>
>>>>


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Ah, so it was really two identical pipelines (one for the branch where
Margebot batches commits, and one for the MR that Margebot creates before
merging). That's indeed a non-trivial amount of purely wasted
computer-hours.

Taking a step back, I am inclined to agree with the proposal of not
checking stat regressions in Margebot. My high-level opinion on this is
that perf tests don't actually test the right thing. Namely, they don't
prevent performance drift over time (if a given test is allowed to degrade
by 2% every commit, it can take a 100% performance hit in just 35 commits).
While it is important to measure performance, and to avoid too egregious a
performance degradation in any given commit, it's usually performance over
time which matters. I don't really know how to apply this to collaborative
development and help maintain healthy performance. But flagging
performance regressions in MRs, while not making them block batched merges,
sounds like a reasonable compromise.
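
(For reference, the compounding arithmetic behind those numbers: 1.02^35 is
approximately 2.0, so thirty-five successive 2% regressions do indeed roughly
double the metric.)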


On Wed, Mar 17, 2021 at 9:34 AM Moritz Angermann 
wrote:

> *why* is a very good question. The MR fixing it is here:
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
>
> On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
> wrote:
>
>> Then I have a question: why are there two pipelines running on each merge
>> batch?
>>
>> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
>> moritz.angerm...@gmail.com> wrote:
>>
>>> No it wasn't. It was about the stat failures described in the next
>>> paragraph. I could have been more clear about that. My apologies!
>>>
>>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
>>> wrote:
>>>

 and if either of both (see below) failed, marge's merge would fail as
> well.
>

 Re: “see below” is this referring to a missing part of your email?

>>>


RE: On CI

2021-03-17 Thread Simon Peyton Jones via ghc-devs
We need to do something about this, and I'd advocate for just not making stats 
fail with marge.

Generally I agree.   One point you don’t mention is that our perf tests (which 
CI forces us to look at assiduously) are often pretty weird cases.  So there is 
at least a danger that these more exotic cases will stand in the way of (say) a 
perf improvement in the typical case.

But “not making stats fail” is a bit crude.   Instead, how about:

  *   Always accept stat improvements.

  *   We already have per-benchmark windows.  If the stat falls outside the
window, we fail.  You are effectively saying “widen all windows to infinity”.
If something makes a stat 10 times worse, I think we *should* fail.  But 10%
worse?  Maybe we should accept and look later, as you suggest.  So I’d argue
for widening the windows rather than disabling them completely (see the
sketch after this list).

  *   If we did that we’d need good instrumentation to spot steps and drift in
perf, as you say.  An advantage is that since the perf instrumentation runs
only on committed master patches, not on every CI run, it can cost more.  In
particular, it could run a bunch of “typical” tests, including nofib and
compiling Cabal or other libraries.
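
To make that concrete, here is a minimal sketch of such an asymmetric check
(the verdict type, the function and the thresholds are invented for
illustration; this is not the testsuite's actual acceptance logic):

    data Verdict = Accept | AcceptWithNote | Fail
      deriving (Eq, Show)

    -- 'warnAt' is roughly today's tight per-test window (e.g. 0.02 for 2%);
    -- 'failAt' is the widened window (e.g. 9.0, i.e. a 10x blow-up).
    judge :: Double -> Double -> Double -> Double -> Verdict
    judge warnAt failAt baseline measured
      | measured <= baseline = Accept          -- improvements always pass
      | relative <= warnAt   = Accept          -- within the ordinary window
      | relative <= failAt   = AcceptWithNote  -- land it, flag it on the dashboard
      | otherwise            = Fail            -- a 10x blow-up still fails
      where
        relative = measured / baseline - 1

    -- judge 0.02 9.0 1.0e9 1.1e9   ==  AcceptWithNote   (10% worse)
    -- judge 0.02 9.0 1.0e9 1.2e10  ==  Fail             (12x worse)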

The big danger is that by relieving patch authors from worrying about perf 
drift, it’ll end up in the lap of the GHC HQ team.  If it’s hard for the author 
of a single patch (with which she is intimately familiar) to work out why it’s 
making some test 2% worse, imagine how hard, and demotivating, it’d be for Ben 
to wonder why 50 patches (with which he is unfamiliar) are making some test 5% 
worse.

I’m not sure how to address this problem.   At least we should make it clear 
that patch authors are expected to engage *actively* in a conversation about 
why their patch is making something worse, even after it lands.

Simon

From: ghc-devs  On Behalf Of Moritz Angermann
Sent: 17 March 2021 03:00
To: ghc-devs 
Subject: On CI

Hi there!

Just a quick update on our CI situation. Ben, John, Davean and I have been
discussion on CI yesterday, and what we can do about it, as well as some
minor notes on why we are frustrated with it. This is an open invitation to 
anyone who in earnest wants to work on CI. Please come forward and help!
We'd be glad to have more people involved!

First the good news, over the last few weeks we've seen we *can* improve
CI performance quite substantially. And the goal is now to have MR go through
CI within at most 3hs.  There are some ideas on how to make this even faster,
especially on wide (high core count) machines; however that will take a bit more
time.

Now to the more thorny issue: Stat failures.  We do not want GHC to regress,
and I believe everyone is on board with that mission.  Yet we have just 
witnessed a train of marge trials all fail due to a -2% regression in a few 
tests. Thus we've been blocking getting stuff into master for at least another 
day. This is (in my opinion) not acceptable! We just had five days of nothing 
working because master was broken and subsequently all CI pipelines kept 
failing. We have thus effectively wasted a week. While we can mitigate the 
latter part by enforcing marge for all merges to master (and with faster 
pipeline turnaround times this might be more palatable than with 9-12h 
turnaround times -- when you need to get something done! ha!), but that won't 
help us with issues where marge can't find a set of buildable MRs, because she 
just keeps hitting a combination of MRs that somehow together increase or 
decrease metrics.

We have three knobs to adjust:
- Make GHC build faster / make the testsuite run faster.
  There is some rather interesting work going on about parallelizing (earlier)
  during builds. We've also seen that we've wasted enormous amounts of
  time during darwin builds in the kernel, because of a bug in the testdriver.
- Use faster hardware.
  We've seen that just this can cut windows build times from 220min to 80min.
- Reduce the amount of builds.
  We used to build two pipelines for each marge merge, and if either of both
  (see below) failed, marge's merge would fail as well. So not only did we build
  twice as much as we needed, we also increased our chances to hit bogous
  build failures by 2.

We need to do something about this, and I'd advocate for just not making stats 
fail with marge. Build errors of course, but stat failures, no. And then have a 
separate dashboard (and Ben has some old code lying around for this, which 
someone would need to pick up and polish, ...), that tracks GHC's Performance 
for each commit to master, with easy access from the dashboard to the offending 
commit. We will also need to consider the implications of synthetic micro 
benchmarks, as opposed to say building Cabal or other packages, that reflect 
more real-world experience of users using GHC.

I will try to provide a data driven report on GHC's CI on a bi-weekly or month 
(we will have to see what the costs for writing it up

Re: On CI

2021-03-17 Thread Moritz Angermann
*why* is a very good question. The MR fixing it is here:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275

On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
wrote:

> Then I have a question: why are there two pipelines running on each merge
> batch?
>
> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
> moritz.angerm...@gmail.com> wrote:
>
>> No it wasn't. It was about the stat failures described in the next
>> paragraph. I could have been more clear about that. My apologies!
>>
>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
>> wrote:
>>
>>>
>>> and if either of both (see below) failed, marge's merge would fail as
 well.

>>>
>>> Re: “see below” is this referring to a missing part of your email?
>>>
>>


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Then I have a question: why are there two pipelines running on each merge
batch?

On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann 
wrote:

> No it wasn't. It was about the stat failures described in the next
> paragraph. I could have been more clear about that. My apologies!
>
> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
> wrote:
>
>>
>> and if either of both (see below) failed, marge's merge would fail as
>>> well.
>>>
>>
>> Re: “see below” is this referring to a missing part of your email?
>>
>


Re: On CI

2021-03-17 Thread Moritz Angermann
No it wasn't. It was about the stat failures described in the next
paragraph. I could have been more clear about that. My apologies!

On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
wrote:

>
> and if either of both (see below) failed, marge's merge would fail as well.
>>
>
> Re: “see below” is this referring to a missing part of your email?
>


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
> and if either of both (see below) failed, marge's merge would fail as well.
>

Re: “see below” is this referring to a missing part of your email?


On CI

2021-03-16 Thread Moritz Angermann
Hi there!

Just a quick update on our CI situation. Ben, John, Davean and I had a
discussion on CI yesterday: what we can do about it, as well as some
minor notes on why we are frustrated with it. This is an open invitation to
anyone who in earnest wants to work on CI. Please come forward and help!
We'd be glad to have more people involved!

First the good news: over the last few weeks we've seen we *can* improve
CI performance quite substantially, and the goal is now to have an MR go
through CI within at most 3hs.  There are some ideas on how to make this
even faster, especially on wide (high core count) machines; however, that
will take a bit more time.

Now to the more thorny issue: stat failures.  We do not want GHC to regress,
and I believe everyone is on board with that mission.  Yet we have just
witnessed a train of marge trials all failing due to a -2% regression in a
few tests, and thus we've been blocking getting stuff into master for at
least another day. This is (in my opinion) not acceptable! We just had five
days of nothing working because master was broken and subsequently all CI
pipelines kept failing; we have thus effectively wasted a week. We can
mitigate the latter part by enforcing marge for all merges to master
(and with faster pipeline turnaround times this might be more palatable
than with 9-12h turnaround times -- when you need to get something done!
ha!), but that won't help us with issues where marge can't find a set of
buildable MRs, because she just keeps hitting a combination of MRs that
somehow together increase or decrease metrics.

We have three knobs to adjust:
- Make GHC build faster / make the testsuite run faster.
  There is some rather interesting work going on about parallelizing
(earlier)
  during builds. We've also seen that we've wasted enormous amounts of
  time during darwin builds in the kernel, because of a bug in the
testdriver.
- Use faster hardware.
  We've seen that just this can cut windows build times from 220min to
80min.
- Reduce the amount of builds.
  We used to build two pipelines for each marge merge, and if either of both
  (see below) failed, marge's merge would fail as well. So not only did we
build
  twice as much as we needed, we also doubled our chances of hitting bogus
  build failures.

We need to do something about this, and I'd advocate for just not making
stats fail with marge. Build errors, of course, should fail; stat failures,
no. We should then have a separate dashboard (Ben has some old code lying
around for this, which someone would need to pick up and polish, ...) that
tracks GHC's performance for each commit to master, with easy access from
the dashboard to the offending commit. We will also need to consider the
implications of synthetic micro-benchmarks, as opposed to, say, building
Cabal or other packages, which reflect more of the real-world experience of
users of GHC.

Going forward, I will try to provide a data-driven report on GHC's CI on a
bi-weekly or monthly basis (we will have to see what the cost of writing it
up is, and how useful it turns out to be). My sincere hope is that it will
help us better understand our CI situation, instead of just having some
vague complaints about it.

Cheers,
 Moritz


Re: On CI

2021-02-22 Thread John Ericson
I agree one should be able to get most of the testing value from stage1. 
And the tooling team at IOHK has done some work in 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3652 to allow a 
stage 1 compiler to be tested. That's a very important first step!


But TH and GHCi require either iserv (the external interpreter) or a 
compiler whose own ABI matches the ABI of the code it produces (for the 
internal interpreter), and ideally we should test both. I think doing a 
--freeze1 stage2 build *in addition* to the stage1 build would work in the 
majority of cases, and that would allow us to incrementally build and 
test both. Remember that iserv uses the ghc library and needs to be 
ABI-compatible with the stage1 compiler that is using it, so it is less of 
a panacea than it might seem for ABI changes, as opposed to mere cross 
compilation.


I opened https://github.com/ghc-proposals/ghc-proposals/issues/162 for 
an ABI-agnostic interpreter that would give stage1 alone a third way to do 
GHCi and TH, unconditionally. This would also allow TH to be used safely 
in GHC itself, but for the purposes of this discussion, it's nice 
to make testing more reliable without the --freeze1 stage 2 gamble.


Bottom line is, yes, building stage 2 from a freshly-built stage 1 will 
invalidate any cache, and so we should avoid that.


John

On 2/22/21 8:42 AM, Spiwack, Arnaud wrote:
Let me know if I'm talking nonsense, but I believe that we are 
building both stages for each architecture and flavour. Do we need to 
build two stages everywhere? What stops us from building a single 
stage? And if anything, what can we change to get into a situation 
where we can?


Quite better than reusing build incrementally, is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs
 wrote:


Incremental CI can cut multiple hours to < mere minutes,
especially with the test suite being embarrassingly parallel.
There simply no way optimizations to the compiler independent from
sharing a cache between CI runs can get anywhere close to that
return on investment.

I rather agree with this.  I don’t think there is much low-hanging
fruit on compile times, aside from coercion-zapping which we are
working on anyway.  If we got a 10% reduction in compile time we’d
be over the moon, but our users would barely notice.

To get truly substantial improvements (a factor of 2 or 10) I
think we need to do less compiling – hence incremental CI.


Simon

From: ghc-devs  On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs 
Subject: Re: On CI

I'm not opposed to some effort going into this, but I would
strongly opposite putting all our effort there. Incremental CI can
cut multiple hours to < mere minutes, especially with the test
suite being embarrassingly parallel. There simply no way
optimizations to the compiler independent from sharing a cache
between CI runs can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC
performance know what's hurting them most. For example, after
non-incrementality, the next slowest thing is linking, which
is...not done by GHC! But all that is a separate conversation.)

John

On 2/19/21 2:42 PM, Richard Eisenberg wrote:

There are some good ideas here, but I want to throw out
another one: put all our effort into reducing compile times.
There is a loud plea to do this on Discourse

<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
and it would both solve these CI problems and also help
everyone else.

This isn't to say to stop exploring the ideas here. But since
time is mostly fixed, tackling compilation times in general
may be the best way out of this. Ben's survey of other
projects (thanks!) shows that we're way, way behind in how
long our CI takes to run.

Richard



On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:

Recompilation avoidance

    I think in order to cache more in CI, we first have to
invest some time in fixing recompilation avoidance in our
bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one
line of *comment* in the compiler causes

  * a (

Re: On CI

2021-02-22 Thread Spiwack, Arnaud
Let me know if I'm talking nonsense, but I believe that we are building
both stages for each architecture and flavour. Do we need to build two
stages everywhere? What stops us from building a single stage? And if
anything, what can we change to get into a situation where we can?

Even better than reusing builds incrementally is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs <
ghc-devs@haskell.org> wrote:

> Incremental CI can cut multiple hours to < mere minutes, especially with
> the test suite being embarrassingly parallel. There simply no way
> optimizations to the compiler independent from sharing a cache between CI
> runs can get anywhere close to that return on investment.
>
> I rather agree with this.  I don’t think there is much low-hanging fruit
> on compile times, aside from coercion-zapping which we are working on
> anyway.  If we got a 10% reduction in compile time we’d be over the moon,
> but our users would barely notice.
>
>
>
> To get truly substantial improvements (a factor of 2 or 10) I think we
> need to do less compiling – hence incremental CI.
>
>
> Simon
>
>
>
> From: ghc-devs  On Behalf Of John Ericson
> Sent: 22 February 2021 05:53
> To: ghc-devs 
> Subject: Re: On CI
>
>
>
> I'm not opposed to some effort going into this, but I would strongly
> opposite putting all our effort there. Incremental CI can cut multiple
> hours to < mere minutes, especially with the test suite being
> embarrassingly parallel. There simply no way optimizations to the compiler
> independent from sharing a cache between CI runs can get anywhere close to
> that return on investment.
>
> (FWIW, I'm also skeptical that the people complaining about GHC
> performance know what's hurting them most. For example, after
> non-incrementality, the next slowest thing is linking, which is...not done
> by GHC! But all that is a separate conversation.)
>
> John
>
> On 2/19/21 2:42 PM, Richard Eisenberg wrote:
>
> There are some good ideas here, but I want to throw out another one: put
> all our effort into reducing compile times. There is a loud plea to do this
> on Discourse
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
> and it would both solve these CI problems and also help everyone else.
>
>
>
> This isn't to say to stop exploring the ideas here. But since time is
> mostly fixed, tackling compilation times in general may be the best way out
> of this. Ben's survey of other projects (thanks!) shows that we're way, way
> behind in how long our CI takes to run.
>
>
>
> Richard
>
>
>
> On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:
>
>
>
> Recompilation avoidance
>
>
>
> I think in order to cache more in CI, we first have to invest some time in
> fixing recompilation avoidance in our bootstrapped build system.
>
>
>
> I just tested on a hadrian perf ticky build: Adding one line of *comment*
> in the compiler causes
>
>- a (pretty slow, yet negligible) rebuild of the stage1 compiler
>- 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It
>doesn't depend in any way on the change I made)
>- apparent full rebuild the libraries
>- apparent full rebuild of the stage2 compiler
>
> That took 17 minutes, a full build takes ~45minutes. So there definitely
> is some caching going on, but not nearly as much as there could be.
>
> I know there have been great and boring efforts on compiler determinism in
> the past, but either it's not good enough or our build system needs fixing.
>
> I think a good first step to assert would be to make sure that the hash of
> the stage1 compiler executable doesn't change if I only change a comment.
>
> I'm aware there probably is stuff going on, like embedding configure dates
> in interface files and executables, that would need to go, but if possible
> this would be a huge improvement.
>
>
>
> On the other hand, we can simply tack on a [skip ci] to the commit
> message, as I did for
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F4975=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691130329

RE: On CI

2021-02-22 Thread Simon Peyton Jones via ghc-devs
Incremental CI can cut multiple hours to < mere minutes, especially with the 
test suite being embarrassingly parallel. There simply no way optimizations to 
the compiler independent from sharing a cache between CI runs can get anywhere 
close to that return on investment.
I rather agree with this.  I don't think there is much low-hanging fruit on 
compile times, aside from coercion-zapping which we are working on anyway.  If 
we got a 10% reduction in compile time we'd be over the moon, but our users 
would barely notice.

To get truly substantial improvements (a factor of 2 or 10) I think we need to 
do less compiling - hence incremental CI.

Simon

From: ghc-devs  On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs 
Subject: Re: On CI


I'm not opposed to some effort going into this, but I would strongly opposite 
putting all our effort there. Incremental CI can cut multiple hours to < mere 
minutes, especially with the test suite being embarrassingly parallel. There 
simply no way optimizations to the compiler independent from sharing a cache 
between CI runs can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC performance 
know what's hurting them most. For example, after non-incrementality, the next 
slowest thing is linking, which is...not done by GHC! But all that is a 
separate conversation.)

John
On 2/19/21 2:42 PM, Richard Eisenberg wrote:
There are some good ideas here, but I want to throw out another one: put all 
our effort into reducing compile times. There is a loud plea to do this on 
Discourse<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
 and it would both solve these CI problems and also help everyone else.

This isn't to say to stop exploring the ideas here. But since time is mostly 
fixed, tackling compilation times in general may be the best way out of this. 
Ben's survey of other projects (thanks!) shows that we're way, way behind in 
how long our CI takes to run.

Richard


On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:

Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in 
fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one line of *comment* in 
the compiler causes

  *   a (pretty slow, yet negligible) rebuild of the stage1 compiler
  *   2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It 
doesn't depend in any way on the change I made)
  *   apparent full rebuild the libraries
  *   apparent full rebuild of the stage2 compiler
That took 17 minutes, a full build takes ~45minutes. So there definitely is 
some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in the 
past, but either it's not good enough or our build system needs fixing.
I think a good first step to assert would be to make sure that the hash of the 
stage1 compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates in 
interface files and executables, that would need to go, but if possible this 
would be a huge improvement.

On the other hand, we can simply tack on a [skip ci] to the commit message, as 
I did for 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F4975=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691130329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=bgT0LeZXjF%2BMklzctvZL6WaVpaddN7%2FSpojcEXGXv7Q%3D=0>.
 Variants like [skip tests] or [frontend] could help to identify which tests to 
run by default.

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI 
turnaround time including tests is generally 25 minutes (~15 minutes for the 
build) for a complete pipeline, testing 6 different OSes and configurations in 
parallel: 
https://github.com/leanprover/lean4/actions/workflows/ci.yml<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fleanprover%2Flean4%2Factions%2Fworkflows%2Fci.yml=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691140326%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik

Re: On CI

2021-02-21 Thread John Ericson
I'm not opposed to some effort going into this, but I would strongly 
oppose putting all our effort there. Incremental CI can cut multiple 
hours down to mere minutes, especially with the test suite being 
embarrassingly parallel. There is simply no way that optimizations to the 
compiler, independent of sharing a cache between CI runs, can get 
anywhere close to that return on investment.


(FWIW, I'm also skeptical that the people complaining about GHC 
performance know what's hurting them most. For example, after 
non-incrementality, the next slowest thing is linking, which is...not 
done by GHC! But all that is a separate conversation.)


John

On 2/19/21 2:42 PM, Richard Eisenberg wrote:
There are some good ideas here, but I want to throw out another one: 
put all our effort into reducing compile times. There is a loud plea 
to do this on Discourse 
<https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24>, 
and it would both solve these CI problems and also help everyone else.


This isn't to say to stop exploring the ideas here. But since time is 
mostly fixed, tackling compilation times in general may be the best 
way out of this. Ben's survey of other projects (thanks!) shows that 
we're way, way behind in how long our CI takes to run.


Richard

On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:


Recompilation avoidance

I think in order to cache more in CI, we first have to invest some 
time in fixing recompilation avoidance in our bootstrapped build system.


I just tested on a hadrian perf ticky build: Adding one line of 
*comment* in the compiler causes


  * a (pretty slow, yet negligible) rebuild of the stage1 compiler
  * 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS?
It doesn't depend in any way on the change I made)
  * apparent full rebuild the libraries
  * apparent full rebuild of the stage2 compiler

That took 17 minutes, a full build takes ~45minutes. So there 
definitely is some caching going on, but not nearly as much as there 
could be.
I know there have been great and boring efforts on compiler 
determinism in the past, but either it's not good enough or our build 
system needs fixing.
I think a good first step to assert would be to make sure that the 
hash of the stage1 compiler executable doesn't change if I only 
change a comment.
I'm aware there probably is stuff going on, like embedding configure 
dates in interface files and executables, that would need to go, but 
if possible this would be a huge improvement.


On the other hand, we can simply tack on a [skip ci] to the commit 
message, as I did for 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975 
<https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975>. Variants 
like [skip tests] or [frontend] could help to identify which tests to 
run by default.


Lean

I had a chat with a colleague about how they do CI for Lean. 
Apparently, CI turnaround time including tests is generally 25 
minutes (~15 minutes for the build) for a complete pipeline, testing 
6 different OSes and configurations in parallel: 
https://github.com/leanprover/lean4/actions/workflows/ci.yml 
<https://github.com/leanprover/lean4/actions/workflows/ci.yml>
They utilise ccache to cache the clang-based C++-backend, so that 
they only have to re-run the front- and middle-end. In effect, they 
take advantage of the fact that the "function" clang, in contrast to 
the "function" stage1 compiler, stays the same.
It's hard to achieve that for GHC, where a complete compiler pipeline 
comes as one big, fused "function": An external tool can never be 
certain that a change to Parser.y could not affect the CodeGen phase.


Inspired by Lean, the following is a bit inconcrete and imaginary, 
but maybe we could make it so that compiler phases "sign" parts of 
the interface file with the binary hash of the respective 
subcomponents of the phase?
E.g., if all the object files that influence CodeGen (that will later 
be linked into the stage1 compiler) result in a hash of 0xdeadbeef 
before and after the change to Parser.y, we know we can stop 
recompiling Data.List with the stage1 compiler when we see that the 
IR passed to CodeGen didn't change, because the last compile did 
CodeGen with a stage1 compiler with the same hash 0xdeadbeef. The 
0xdeadbeef hash is a proxy for saying "the function CodeGen stayed 
the same", so we can reuse its cached outputs.
Of course, that is utopian without a tool that does the "taint 
analysis" of which modules in GHC influence CodeGen. Probably just 
including all the transitive dependencies of GHC.CmmToAsm suffices, 
but probably that's too crude already. For another example, a change 
to GHC.Utils.Unique would probably entail a full rebuild of the 
compiler because it basically affects all compiler phases.
There are probably parallels with recompilation avoidance in a language with staged meta-programming.

Re: On CI

2021-02-19 Thread Richard Eisenberg
There are some good ideas here, but I want to throw out another one: put all 
our effort into reducing compile times. There is a loud plea to do this on 
Discourse 
<https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24>,
 and it would both solve these CI problems and also help everyone else.

This isn't to say we should stop exploring the ideas here. But since time is mostly 
fixed, tackling compilation times in general may be the best way out of this. 
Ben's survey of other projects (thanks!) shows that we're way, way behind in 
how long our CI takes to run.

Richard

> On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:
> 
> Recompilation avoidance
> 
> I think in order to cache more in CI, we first have to invest some time in 
> fixing recompilation avoidance in our bootstrapped build system.
> 
> I just tested on a hadrian perf ticky build: Adding one line of *comment* in 
> the compiler causes
> a (pretty slow, yet negligible) rebuild of the stage1 compiler
> 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It doesn't 
> depend in any way on the change I made)
> apparent full rebuild of the libraries
> apparent full rebuild of the stage2 compiler
> That took 17 minutes; a full build takes ~45 minutes. So there definitely is 
> some caching going on, but not nearly as much as there could be.
> I know there have been great and boring efforts on compiler determinism in 
> the past, but either it's not good enough or our build system needs fixing.
> I think a good first step would be to assert that the hash of the stage1 
> compiler executable doesn't change if I only change a comment.
> I'm aware there probably is stuff going on, like embedding configure dates in 
> interface files and executables, that would need to go, but if possible this 
> would be a huge improvement.
> 
> On the other hand, we can simply tack on a [skip ci] to the commit message, 
> as I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975. Variants like 
> [skip tests] or [frontend] could help to identify which tests to run by 
> default.
> 
> Lean
> 
> I had a chat with a colleague about how they do CI for Lean. Apparently, CI 
> turnaround time including tests is generally 25 minutes (~15 minutes for the 
> build) for a complete pipeline, testing 6 different OSes and configurations 
> in parallel: https://github.com/leanprover/lean4/actions/workflows/ci.yml
> They utilise ccache to cache the clang-based C++-backend, so that they only 
> have to re-run the front- and middle-end. In effect, they take advantage of 
> the fact that the "function" clang, in contrast to the "function" stage1 
> compiler, stays the same.
> It's hard to achieve that for GHC, where a complete compiler pipeline comes 
> as one big, fused "function": An external tool can never be certain that a 
> change to Parser.y could not affect the CodeGen phase.
> 
> Inspired by Lean, the following is a bit abstract and imaginary, but maybe 
> we could make it so that compiler phases "sign" parts of the interface file 
> with the binary hash of the respective subcomponents of the phase?
> E.g., if all the object files that influence CodeGen (that will later be 
> linked into the stage1 compiler) result in a hash of 0xdeadbeef before and 
> after the change to Parser.y, we know we can stop recompiling Data.List with 
> the stage1 compiler when we see that the IR passed to CodeGen didn't change, 
> because the last compile did CodeGen with a stage1 compiler with the same 
> hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the function 
> CodeGen stayed the same", so we can reuse its cached outputs.
> Of course, that is utopian without a tool that does the "taint analysis" of 
> which modules in GHC influence CodeGen. Probably just including all the 
> transitive dependencies of GHC.CmmToAsm suffices, but probably that's too 
> crude already. For another example, a change to GHC.Utils.Unique would 
> probably entail a full rebuild of the compiler because it basically affects 
> all compiler phases.
> There are probably parallels with recompilation avoidance in a language with 
> staged meta-programming.
> 
> On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via ghc-devs 
> <ghc-devs@haskell.org> wrote:
> Doing "optimistic caching" like you suggest sounds very promising. A way to 
> regain more robustness would be as follows.
> If the build fails while building the libraries or the stage2 compiler, this 
> might be a false negative due to the optimistic caching. Th

Re: On CI

2021-02-19 Thread Sebastian Graf
Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in
fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one line of *comment*
in the compiler causes

   - a (pretty slow, yet negligible) rebuild of the stage1 compiler
   - 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It
   doesn't depend in any way on the change I made)
   - apparent full rebuild of the libraries
   - apparent full rebuild of the stage2 compiler

That took 17 minutes; a full build takes ~45 minutes. So there definitely is
some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in
the past, but either it's not good enough or our build system needs fixing.
I think a good first step would be to assert that the hash of the stage1
compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates
in interface files and executables, that would need to go, but if possible
this would be a huge improvement.
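
As a rough illustration of that first step, the check could be as simple as
building stage1 twice around a comment-only change and comparing hashes. The
hadrian invocation, the stage1 binary path and the module touched below are
assumptions for the sake of the sketch, not the actual layout of a GHC
checkout:

    #!/usr/bin/env python3
    # Minimal sketch: build stage1, hash it, add a comment-only change,
    # rebuild, and compare. Paths and the hadrian target are assumed.
    import hashlib, subprocess, sys

    STAGE1 = "_build/stage1/bin/ghc"               # assumed output path
    PROBE  = "compiler/GHC/Core/Opt/Simplify.hs"   # any compiler module

    def build():
        subprocess.run(["hadrian/build", "-j", STAGE1], check=True)

    def sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    build()
    before = sha256(STAGE1)

    with open(PROBE, "a") as f:
        f.write("\n-- determinism probe, comment only\n")

    build()
    after = sha256(STAGE1)

    print("before:", before)
    print("after: ", after)
    sys.exit(0 if before == after else 1)

If that exit code can be made reliably zero, caching stage1 (and everything
built with it) across CI jobs becomes much more attractive.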

On the other hand, we can simply tack on a [skip ci] to the commit message,
as I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975.
Variants like [skip tests] or [frontend] could help to identify which tests
to run by default.
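
To sketch how such tags could be interpreted on the CI side (the tag names and
the mapping to test selections below are invented for illustration, not an
existing GHC convention):

    # Sketch: read the head commit message and decide what to build and test.
    # The tags and the test selections are invented for illustration.
    import subprocess

    def head_commit_message():
        return subprocess.run(["git", "log", "-1", "--pretty=%B"],
                              check=True, capture_output=True,
                              text=True).stdout.lower()

    def plan(msg):
        if "[skip ci]" in msg:
            return {"build": False, "tests": []}
        if "[skip tests]" in msg:
            return {"build": True, "tests": []}
        if "[frontend]" in msg:
            # Hypothetical subset of the testsuite for frontend-only changes.
            return {"build": True,
                    "tests": ["testsuite/tests/parser",
                              "testsuite/tests/rename",
                              "testsuite/tests/typecheck"]}
        return {"build": True, "tests": ["everything"]}

    if __name__ == "__main__":
        print(plan(head_commit_message()))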

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI
turnaround time including tests is generally 25 minutes (~15 minutes for
the build) for a complete pipeline, testing 6 different OSes and
configurations in parallel:
https://github.com/leanprover/lean4/actions/workflows/ci.yml
They utilise ccache to cache the clang-based C++-backend, so that they only
have to re-run the front- and middle-end. In effect, they take advantage of
the fact that the "function" clang, in contrast to the "function" stage1
compiler, stays the same.
It's hard to achieve that for GHC, where a complete compiler pipeline comes
as one big, fused "function": An external tool can never be certain that a
change to Parser.y could not affect the CodeGen phase.

Inspired by Lean, the following is a bit abstract and imaginary, but
maybe we could make it so that compiler phases "sign" parts of the
interface file with the binary hash of the respective subcomponents of the
phase?
E.g., if all the object files that influence CodeGen (that will later be
linked into the stage1 compiler) result in a hash of 0xdeadbeef before and
after the change to Parser.y, we know we can stop recompiling Data.List
with the stage1 compiler when we see that the IR passed to CodeGen didn't
change, because the last compile did CodeGen with a stage1 compiler with
the same hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the
function CodeGen stayed the same", so we can reuse its cached outputs.
Of course, that is utopian without a tool that does the "taint analysis" of
which modules in GHC influence CodeGen. Probably just including all the
transitive dependencies of GHC.CmmToAsm suffices, but probably that's too
crude already. For another example, a change to GHC.Utils.Unique would
probably entail a full rebuild of the compiler because it basically affects
all compiler phases.
There are probably parallels with recompilation avoidance in a language
with staged meta-programming.
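
To make the hashing idea a little more concrete, here is a toy sketch of
keying a phase's cached outputs on the hash of the object files that make up
that phase. Which object files belong to which phase (the "taint analysis"
above) is exactly the part that is hand-waved away here:

    # Toy sketch of "sign a phase with the hash of its subcomponents" and use
    # that hash, together with the hash of the IR fed to the phase, as a
    # cache key. The attribution of object files to phases is assumed.
    import hashlib, os, shutil

    def phase_hash(object_files):
        h = hashlib.sha256()
        for path in sorted(object_files):
            with open(path, "rb") as f:
                h.update(f.read())
        return h.hexdigest()

    def lookup(cache_dir, phase, phase_h, input_h):
        path = os.path.join(cache_dir, f"{phase}-{phase_h}-{input_h}")
        return path if os.path.exists(path) else None

    def record(cache_dir, phase, phase_h, input_h, produced):
        os.makedirs(cache_dir, exist_ok=True)
        shutil.copyfile(produced,
                        os.path.join(cache_dir, f"{phase}-{phase_h}-{input_h}"))

If the CodeGen hash and the IR fed to it are both unchanged, the cached object
file can be reused instead of re-running the backend.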

On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via ghc-devs <
ghc-devs@haskell.org> wrote:

> Doing "optimistic caching" like you suggest sounds very promising. A way
> to regain more robustness would be as follows.
> If the build fails while building the libraries or the stage2 compiler,
> this might be a false negative due to the optimistic caching. Therefore,
> evict the "optimistic caches" and restart building the libraries. That way
> we can validate that the build failure was a true build failure and not
> just due to the aggressive caching scheme.
>
> Just my 2p
>
> Josef
>
> --
> *From:* ghc-devs  on behalf of Simon Peyton
> Jones via ghc-devs 
> *Sent:* Friday, February 19, 2021 8:57 AM
> *To:* John Ericson ; ghc-devs <
> ghc-devs@haskell.org>
> *Subject:* RE: On CI
>
>
>1. Building and testing happen together. When tests fail
>spuriously, we also have to rebuild GHC in addition to re-running the
>tests. That's pure waste.
>https://gitlab.haskell.org/ghc/ghc/-/issues/13897
>

Re: On CI

2021-02-19 Thread Josef Svenningsson via ghc-devs
Doing "optimistic caching" like you suggest sounds very promising. A way to 
regain more robustness would be as follows.
If the build fails while building the libraries or the stage2 compiler, this 
might be a false negative due to the optimistic caching. Therefore, evict the 
"optimistic caches" and restart building the libraries. That way we can 
validate that the build failure was a true build failure and not just due to 
the aggressive caching scheme.

Just my 2p

Josef


From: ghc-devs  on behalf of Simon Peyton Jones 
via ghc-devs 
Sent: Friday, February 19, 2021 8:57 AM
To: John Ericson ; ghc-devs 

Subject: RE: On CI


  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.

I don’t get this.  We have to build GHC before we can test it, don’t we?

2.  We don't cache between jobs.

This is, I think, the big one.   We endlessly build the exact same binaries.

There is a problem, though.  If we make *any* change in GHC, even a trivial 
refactoring, its binary will change slightly.  So now any caching build system 
will assume that anything built by that GHC must be rebuilt – we can’t use the 
cached version.  That includes all the libraries and the stage2 compiler.  So 
caching can save all the preliminaries (building the initial Cabal, and large 
chunk of stage1, since they are built with the same bootstrap compiler) but 
after that we are dead.

I don’t know any robust way out of this.  That small change in the source code 
of GHC might be trivial refactoring, or it might introduce a critical 
mis-compilation which we really want to see in its build products.

However, for smoke-testing MRs, on every architecture, we could perhaps cut 
corners.  (Leaving Marge to do full diligence.)  For example, we could declare 
that if we have the result of compiling library module X.hs with the stage1 GHC 
in the last full commit in master, then we can re-use that build product rather 
than compiling X.hs with the MR’s slightly modified stage1 GHC.  That *might* 
be wrong; but it’s usually right.

Anyway, there are big wins to be had here.

Simon







From: ghc-devs  On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs 
Subject: Re: On CI



I am also wary of us deferring checking whole platforms and what not. I 
think that's just kicking the can down the road, and will result in more 
variance and uncertainty. It might be alright for those authoring PRs, but it 
will make Ben's job keeping the system running even more grueling.

Before getting into these complex trade-offs, I think we should focus on the 
cornerstone issue that CI isn't incremental.

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.
  2.  We don't cache between jobs. Shake and Make do not enforce dependency 
soundness, nor cache-correctness when the build plan itself changes, and this 
has made it hard/impossible to do safely. Naively this only helps with stage 
1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 
builds, both can be incremental. Yes, this is also lossy, but I only see it 
leading to false failures, not false acceptances (if we can also test the stage 
1 build), so I consider it safe. MRs that only work with a slow full build 
because of ABI changes can so indicate.

The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.

John
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-02-19 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

>>   1. Building and testing happen together. When tests fail
>>   spuriously, we also have to rebuild GHC in addition to re-running
>>   the tests. That's pure waste.
>>   https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more
>>   or less.

> I don't get this.  We have to build GHC before we can test it, don't we?

>> 2.  We don't cache between jobs.

> This is, I think, the big one.   We endlessly build the exact same binaries.
> There is a problem, though. If we make *any* change in GHC, even a
> trivial refactoring, its binary will change slightly. So now any
> caching build system will assume that anything built by that GHC must
> be rebuilt - we can't use the cached version. That includes all the
> libraries and the stage2 compiler. So caching can save all the
> preliminaries (building the initial Cabal, and large chunk of stage1,
> since they are built with the same bootstrap compiler) but after that
> we are dead.
>
> I don't know any robust way out of this. That small change in the
> source code of GHC might be trivial refactoring, or it might introduce
> a critical mis-compilation which we really want to see in its build
> products.
>
> However, for smoke-testing MRs, on every architecture, we could
> perhaps cut corners. (Leaving Marge to do full diligence.) For
> example, we could declare that if we have the result of compiling
> library module X.hs with the stage1 GHC in the last full commit in
> master, then we can re-use that build product rather than compiling
> X.hs with the MR's slightly modified stage1 GHC. That *might* be
> wrong; but it's usually right.
>
The question is: what happens if it *is* wrong?

There are three answers here:

 a. Allowing the build pipeline to pass despite a build/test failure
eliminates most of the benefit of running the job to begin with as
allow-failure jobs tend to be ignored.

 b. Making the pipeline fail leaves the contributor to pick up the pieces of a
failure that they may or may not be responsible for, which sounds
frustrating indeed.

 c. Retry the build, but this time from scratch. This is a tantalizing option
but carries the risk that we end up doing *more* work than we do now
(namely, if all jobs end up running both builds)

The only tenable option here in my opinion is (c). It's ugly, but may be
viable.
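
A small sketch of what (c) could look like as a wrapper around the build step;
the build command and the cache location are placeholders, since they depend
on how the caching would actually be implemented:

    # Sketch of option (c): try the optimistically cached build first; on
    # failure, evict the cache and retry from scratch so a failure is never
    # blamed on stale build products. Commands and paths are placeholders.
    import shutil, subprocess, sys

    CACHE_DIR = "_build"                         # placeholder cache location
    BUILD_CMD = ["hadrian/build", "-j", "test"]  # placeholder build+test step

    def run():
        return subprocess.run(BUILD_CMD).returncode

    rc = run()
    if rc != 0:
        print("cached build failed; evicting cache and rebuilding from scratch")
        shutil.rmtree(CACHE_DIR, ignore_errors=True)
        rc = run()

    sys.exit(rc)

The worst case is indeed two full builds, which is why this only pays off if
the cached path succeeds most of the time.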

Cheers,

- Ben



signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-02-19 Thread Simon Peyton Jones via ghc-devs
  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.
I don't get this.  We have to build GHC before we can test it, don't we?
2.  We don't cache between jobs.
This is, I think, the big one.   We endlessly build the exact same binaries.
There is a problem, though.  If we make *any* change in GHC, even a trivial 
refactoring, its binary will change slightly.  So now any caching build system 
will assume that anything built by that GHC must be rebuilt - we can't use the 
cached version.  That includes all the libraries and the stage2 compiler.  So 
caching can save all the preliminaries (building the initial Cabal, and large 
chunk of stage1, since they are built with the same bootstrap compiler) but 
after that we are dead.
I don't know any robust way out of this.  That small change in the source code 
of GHC might be trivial refactoring, or it might introduce a critical 
mis-compilation which we really want to see in its build products.
However, for smoke-testing MRs, on every architecture, we could perhaps cut 
corners.  (Leaving Marge to do full diligence.)  For example, we could declare 
that if we have the result of compiling library module X.hs with the stage1 GHC 
in the last full commit in master, then we can re-use that build product rather 
than compiling X.hs with the MR's slightly modified stage1 GHC.  That *might* 
be wrong; but it's usually right.
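
As a toy sketch of that corner-cutting rule (reuse a module's build product
from the last full master build whenever its source is unchanged, even though
the MR's stage1 compiler differs slightly); the cache layout and the hashing
are invented for the example, and the whole thing is knowingly unsound, which
is why Marge would still do full diligence:

    # Toy sketch: reuse X.hs's object file from the last full master build if
    # the source is unchanged, skipping recompilation with the MR's stage1.
    # Cache layout and the notion of "unchanged" are deliberately simplistic.
    import hashlib, os, shutil

    def source_hash(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def maybe_reuse(module_src, object_out, master_cache):
        cached = os.path.join(master_cache, source_hash(module_src) + ".o")
        if os.path.exists(cached):
            shutil.copyfile(cached, object_out)   # reuse master's build product
            return True
        return False                              # fall back to a real compile
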
Anyway, there are big wins to be had here.
Simon



From: ghc-devs  On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs 
Subject: Re: On CI


I am also wary of us deferring checking whole platforms and what not. I 
think that's just kicking the can down the road, and will result in more 
variance and uncertainty. It might be alright for those authoring PRs, but it 
will make Ben's job keeping the system running even more grueling.

Before getting into these complex trade-offs, I think we should focus on the 
cornerstone issue that CI isn't incremental.

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.
  2.  We don't cache between jobs. Shake and Make do not enforce dependency 
soundness, nor cache-correctness when the build plan itself changes, and this 
has made it hard/impossible to do safely. Naively this only helps with stage 
1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 
builds, both can be incremental. Yes, this is also lossy, but I only see it 
leading to false failures, not false acceptances (if we can also test the stage 
1 build), so I consider it safe. MRs that only work with a slow full build 
because of ABI changes can so indicate.
The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.

John
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread John Ericson
I am also wary of us deferring checking whole platforms and what not. 
I think that's just kicking the can down the road, and will result in 
more variance and uncertainty. It might be alright for those authoring 
PRs, but it will make Ben's job keeping the system running even more 
grueling.


Before getting into these complex trade-offs, I think we should focus on 
the cornerstone issue that CI isn't incremental.


1. Building and testing happen together. When tests fail spuriously,
   we also have to rebuild GHC in addition to re-running the tests.
   That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897
   tracks this more or less.
2. We don't cache between jobs. Shake and Make do not enforce
   dependency soundness, nor cache-correctness when the build plan
   itself changes, and this has made it hard/impossible to do safely.
   Naively this only helps with stage 1 and not stage 2, but if we have
   separate stage 1 and --freeze1 stage 2 builds, both can be
   incremental. Yes, this is also lossy, but I only see it leading to
   false failures, not false acceptances (if we can also test the stage
   1 build), so I consider it safe. MRs that only work with a slow full
   build because of ABI changes can so indicate.

The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.


John

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread Ben Gamari
Moritz Angermann  writes:

> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able to
> build x86_64-darwin at acceptable speeds. Although on Big Sur only.
>
> The aarch64-Linux story is a bit constrained by the availability of powerful
> and fast CI machines but probably bearable for the time being. I doubt anyone
> really
> looks at those jobs anyway as they are permitted to fail.

For the record, I look at this once in a while to make sure that they
haven't broken (and usually pick off one or two failures in the
process).

> If aarch64 would become a bottleneck, I’d be inclined to just disable
> them. With the NCG soon this will likely become much more bearable as
> well, even though we might want to run the nightly llvm builds.
>
> To be frank, I don’t see 9.2 happening in two weeks with the current CI.
>
I'm not sure what you mean. Is this in reference to your own 9.2-slated
work or the release as a whole?

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread Ben Gamari
Apologies for the latency here. This thread has required a fair amount of
reflection.

Sebastian Graf  writes:

> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
>- Run "full-build" stage builds only on Marge MRs. Then we can assign to
>Marge much earlier, but probably have to do a bit more of (manual)
>bisecting of spoiled Marge batches.
>   - I hope this gets rid of a bit of the friction of small MRs. I
>   recently caught myself wanting to do a bunch of small, independent, but
>   related changes as part of the same MR, simply because it's such a 
> hassle
>   to post them in individual MRs right now and also because it
>   steals so much CI capacity.
>
>- Regular MRs should still have the ability to easily run individual
>builds of what is now the "full-build" stage, similar to how we can run
>optional "hackage" builds today. This is probably useful to pin down the
>reason for a spoiled Marge batch.


I am torn here. For most of my non-trivial patches I personally don't
mind long turnarounds: I walk away and return a day later to see whether
anything failed. Spurious failures due to fragile tests make this a bit
tiresome, but this is a problem that we are gradually solving (by fixing
bugs and marking tests as fragile).

However, I agree that small MRs are currently rather painful. On the
other hand, diagnosing failed Marge batches is *also* rather tiresome. I
am worried that by deferring full validation of MRs we will only
exacerbate this problem. Furthermore, I worry that by deferring full
validation we run the risk of rather *increasing* the MR turnaround
time, since there are entire classes of issues that wouldn't be caught
until the MR made it to Marge.

Ultimately it's unclear to me whether this proposal would help or hurt.
Nevertheless, I am willing to try it. However, if we go this route we
should consider what can be done to reduce the incidence of failed Marge
batches.

One problem that I'm particularly worried about is that of tests with
OS-dependent expected output (e.g. `$test_name.stdout-mingw32`). I find
that people (understandably) forget to update these when updating test
output. I suspect that this will be a frequent source of failed Marge
batches if we defer full validation. I can see a few ways that would
mitigate this:

 * eliminate platform-dependent output files
 * introduce a linter that fails if a patch touches a test with
   platform-dependent output but doesn't update all of its output files
   (a rough sketch follows after this list)
 * always run the full-build stage on MRs that touch tests with
   platform-dependent output files
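
A rough sketch of that linter (how the changed files are obtained and the set
of platform suffixes are assumptions made for the example):

    # Sketch of the lint: if an MR touches an expected-output file that has
    # platform-specific variants, require that every existing variant of that
    # test's output was touched too. Diff base and suffixes are assumptions.
    import subprocess, sys
    from pathlib import Path

    SUFFIXES = ("-mingw32", "-darwin", "-linux")   # assumed platform suffixes

    def changed_files():
        out = subprocess.run(
            ["git", "diff", "--name-only", "origin/master...HEAD"],
            check=True, capture_output=True, text=True).stdout
        return {Path(p) for p in out.splitlines() if p}

    def base_output(path):
        name = path.name
        for suf in SUFFIXES:
            if name.endswith(suf):
                name = name[: -len(suf)]
        if ".stdout" in name or ".stderr" in name:
            return path.with_name(name)
        return None

    changed = changed_files()
    bases = {b for b in (base_output(p) for p in changed) if b is not None}

    problems = []
    for base in bases:
        variants = [base] + list(base.parent.glob(base.name + "-*"))
        for v in variants:
            if v.exists() and v not in changed:
                problems.append(f"{base}: variant {v} was not updated")

    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)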

Regardless of whether we implement Sebastian's proposal, one smaller
measure we could implement to help the problem of small MRs is to
introduce some sort of mechanism to mark MRs as "trivial" (e.g. a label
or a commit/MR description keyword), which results in the `full-build`
being skipped for that MR. Perhaps this would be helpful?


> Another frustrating aspect is that if you want to merge an n-sized chain of
> dependent changes individually, you have to
>
>- Open an MR for each change (initially the last change will be
>comprised of n commits)
>- Review first change, turn pipeline green   (A)
>- Assign to Marge, wait for batch to be merged   (B)
>- Review second change, turn pipeline green
>- Assign to Marge, wait for batch to be merged
>- ... and so on ...
>
> Note that this (A) incurs many context switches for the dev and the latency of
> *at least* one run of CI.
> And then (B) incurs the latency of *at least* one full-build, if you're
> lucky and the batch succeeds. I've recently seen batches that were
> resubmitted by Marge at least 5 times due to spurious CI failures and
> timeouts. I think this is a huge factor for latency.
>
> Although after (A), I should just pop the patch off my mental stack,
> that isn't particularly true, because Marge keeps on reminding me when a
> stack fails or succeeds, both of which require at least some attention from
> me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
> change.
>
> Maybe we can also learn from other projects like Rust, GCC or clang, which
> I haven't had a look at yet.
>
I did a bit of digging on this.

 * Rust: It appears that Rust's CI scheme is somewhat similar to what
   you proposed above. They do relatively minimal validation of MRs
   (e.g. https://github.com/rust-lang/rust/runs/1905017693),
   with a full-validation for merges
   (e.g. https://github.com/rust-lang-ci/rust/runs/1925049948). The latter
   usually takes between 3 and 4 hours, with some jobs taking 5 hours.

 * GCC: As far as I can tell, gcc doesn't actually have any (functional)
   continuous integration. Discussions with contr

Re: On CI

2021-02-18 Thread Moritz Angermann
I'm glad to report that my math was off. But it was off only because I
assumed that we'd successfully build all
windows configurations, which we of course don't. Thus some builds fail
faster.

Sylvain also provided a windows machine temporarily, until it expired.
This led to a slew of new windows wibbles.
The CI script Ben wrote, and generously used to help set up the new
builder, seems to assume an older Git install,
and thus a path was broken, which thanks to GitLab led to the brilliant
error of just stalling.
Next up, because we use msys2's pacman to provision the windows builders,
and pacman essentially gives us
symbols for packages to install, we ended up getting a newer autoconf onto
the new builder (and I assume this
will happen with any other builders we add as well). This new autoconf
(which I've also ran into on the M1s) doesn't
like our configure.ac/aclocal.m4 anymore and barfs; I wasn't able to figure
out how to force pacman to install an
older version and *not* give it some odd version suffix (which prevents it
from working as a drop-in replacement).

In any case we *must* update our autoconf files. So I guess the time is now.


On Wed, Feb 17, 2021 at 6:58 PM Moritz Angermann 
wrote:

> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able to
> build x86_64-darwin at acceptable speeds. Although on Big Sur only.
>
> The aarch64-Linux story is a bit constrained by the availability of powerful
> and fast CI machines but probably bearable for the time being. I doubt anyone
> really
> looks at those jobs anyway as they are permitted to fail. If aarch64 would
> become a bottleneck, I’d be inclined to just disable them. With the NCG
> soon this will likely become much more bearable as well, even though we
> might want to run the nightly llvm builds.
>
> To be frank, I don’t see 9.2 happening in two weeks with the current CI.
>
> If we subtract aarch64-linux and windows builds we could probably do a
> full run in less than three hours maybe even less. And that is mostly
> because we have a serialized pipeline. I have discussed some ideas with Ben
> on prioritizing the first few stages by the faster ci machines to
> effectively fail fast and provide feedback.
>
> But yes. Working on ghc right now is quite painful due to long and
> unpredictable CI times.
>
> Cheers,
>  Moritz
>
> On Wed, 17 Feb 2021 at 6:31 PM, Sebastian Graf 
> wrote:
>
>> Hi Moritz,
>>
>> I, too, had my gripes with CI turnaround times in the past. Here's a
>> somewhat radical proposal:
>>
>>- Run "full-build" stage builds only on Marge MRs. Then we can assign
>>to Marge much earlier, but probably have to do a bit more of (manual)
>>bisecting of spoiled Marge batches.
>>   - I hope this gets rid of a bit of the friction of small MRs. I
>>   recently caught myself wanting to do a bunch of small, independent, but
>>   related changes as part of the same MR, simply because it's such a 
>> hassle
>>   to post them in individual MRs right now and also because it steals so 
>> much
>>   CI capacity.
>>- Regular MRs should still have the ability to easily run individual
>>builds of what is now the "full-build" stage, similar to how we can run
>>optional "hackage" builds today. This is probably useful to pin down the
>>reason for a spoiled Marge batch.
>>- The CI capacity we free up can probably be used to run a perf build
>>(such as the fedora release build) on the "build" stage (the one where we
>>currently run stack-hadrian-build and the validate-deb9-hadrian build), in
>>parallel.
>>- If we decide against the latter, a micro-optimisation could be to
>>cache the build artifacts of the "lint-base" build and continue the build
>>in the validate-deb9-hadrian build of the "build" stage.
>>
>> The usefulness of this approach depends on how many MRs cause metric
>> changes on different architectures.
>>
>> Another frustrating aspect is that if you want to merge an n-sized chain
>> of dependent changes individually, you have to
>>
>>- Open an MR for each change (initially the last change will be
>>comprised of n commits)
>>- Review first change, turn pipeline green   (A)
>>- Assign to Marge, wait for batch to be merged   (B)
>>- Review second change, turn pipeline green
>>- Assign to Marge, wait for batch to be merged
>>- ... and so on ...
>>
>> Note that (A) incurs many context switches for the dev and the latency of
>> *at least* one run of CI.
>> And then (B) in

Re: On CI

2021-02-17 Thread Moritz Angermann
At this point I believe we have ample Linux build capacity. Darwin looks
pretty good as well; the ~4 M1s we have should in principle also be able to
build x86_64-darwin at acceptable speeds. Although on Big Sur only.

The aarch64-Linux story is a bit constrained by the availability of powerful
and fast CI machines but probably bearable for the time being. I doubt anyone
really
looks at those jobs anyway as they are permitted to fail. If aarch64 would
become a bottleneck, I’d be inclined to just disable them. With the NCG
soon this will likely become much more bearable as well, even though we
might want to run the nightly llvm builds.

To be frank, I don’t see 9.2 happening in two weeks with the current CI.

If we subtract aarch64-linux and windows builds we could probably do a full
run in less than three hours maybe even less. And that is mostly because we
have a serialized pipeline. I have discussed some ideas with Ben on
prioritizing the first few stages by the faster ci machines to effectively
fail fast and provide feedback.

But yes. Working on ghc right now is quite painful due to long and
unpredictable CI times.

Cheers,
 Moritz

On Wed, 17 Feb 2021 at 6:31 PM, Sebastian Graf  wrote:

> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
>- Run "full-build" stage builds only on Marge MRs. Then we can assign
>to Marge much earlier, but probably have to do a bit more of (manual)
>bisecting of spoiled Marge batches.
>   - I hope this gets rid of a bit of the friction of small MRs. I
>   recently caught myself wanting to do a bunch of small, independent, but
>   related changes as part of the same MR, simply because it's such a 
> hassle
>   to post them in individual MRs right now and also because it steals so 
> much
>   CI capacity.
>- Regular MRs should still have the ability to easily run individual
>builds of what is now the "full-build" stage, similar to how we can run
>optional "hackage" builds today. This is probably useful to pin down the
>reason for a spoiled Marge batch.
>- The CI capacity we free up can probably be used to run a perf build
>(such as the fedora release build) on the "build" stage (the one where we
>currently run stack-hadrian-build and the validate-deb9-hadrian build), in
>parallel.
>- If we decide against the latter, a micro-optimisation could be to
>cache the build artifacts of the "lint-base" build and continue the build
>in the validate-deb9-hadrian build of the "build" stage.
>
> The usefulness of this approach depends on how many MRs cause metric
> changes on different architectures.
>
> Another frustrating aspect is that if you want to merge an n-sized chain
> of dependent changes individually, you have to
>
>- Open an MR for each change (initially the last change will be
>comprised of n commits)
>- Review first change, turn pipeline green   (A)
>- Assign to Marge, wait for batch to be merged   (B)
>- Review second change, turn pipeline green
>    - Assign to Marge, wait for batch to be merged
>- ... and so on ...
>
> Note that (A) incurs many context switches for the dev and the latency of
> *at least* one run of CI.
> And then (B) incurs the latency of *at least* one full-build, if you're
> lucky and the batch succeeds. I've recently seen batches that were
> resubmitted by Marge at least 5 times due to spurious CI failures and
> timeouts. I think this is a huge factor for latency.
>
> Although after (A), I should just pop the patch off my mental stack,
> that isn't particularly true, because Marge keeps on reminding me when a
> stack fails or succeeds, both of which require at least some attention from
> me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
> change.
>
> Maybe we can also learn from other projects like Rust, GCC or clang, which
> I haven't had a look at yet.
>
> Cheers,
> Sebastian
>
On Wed, 17 Feb 2021 at 09:11, Moritz Angermann <
moritz.angerm...@gmail.com> wrote:
>
>> Friends,
>>
>> I've been looking at CI recently again, as I was facing CI turnaround
>> times of 9-12hs; and this just keeps dragging out and making progress hard.
>>
>> The pending pipeline currently has 2 darwin, and 15 windows builds
>> waiting. Windows builds on average take ~220minutes. We have five builders,
>> so we can expect this queue to be done in ~660 minutes assuming perfect
>> scheduling and good performance. That is 11hs! The next windows build can
>> be started in 11hs. Please check my math and tell me I'm wrong!
>>
>> If you submit a MR today, with some luck, you'l

Re: On CI

2021-02-17 Thread Sebastian Graf
Hi Moritz,

I, too, had my gripes with CI turnaround times in the past. Here's a
somewhat radical proposal:

   - Run "full-build" stage builds only on Marge MRs. Then we can assign to
   Marge much earlier, but probably have to do a bit more of (manual)
   bisecting of spoiled Marge batches.
  - I hope this gets rid of a bit of the friction of small MRs. I
  recently caught myself wanting to do a bunch of small, independent, but
  related changes as part of the same MR, simply because it's such a hassle
  to post them in individual MRs right now and also because it
steals so much
  CI capacity.
   - Regular MRs should still have the ability to easily run individual
   builds of what is now the "full-build" stage, similar to how we can run
   optional "hackage" builds today. This is probably useful to pin down the
   reason for a spoiled Marge batch.
   - The CI capacity we free up can probably be used to run a perf build
   (such as the fedora release build) on the "build" stage (the one where we
   currently run stack-hadrian-build and the validate-deb9-hadrian build), in
   parallel.
   - If we decide against the latter, a micro-optimisation could be to
   cache the build artifacts of the "lint-base" build and continue the build
   in the validate-deb9-hadrian build of the "build" stage.

The usefulness of this approach depends on how many MRs cause metric
changes on different architectures.

Another frustrating aspect is that if you want to merge an n-sized chain of
dependent changes individually, you have to

   - Open an MR for each change (initially the last change will be
   comprised of n commits)
   - Review first change, turn pipeline green   (A)
   - Assign to Marge, wait for batch to be merged   (B)
   - Review second change, turn pipeline green
   - Assign to Marge, wait for batch to be merged
   - ... and so on ...

Note that (A) incurs many context switches for the dev and the latency of
*at least* one run of CI.
And then (B) incurs the latency of *at least* one full-build, if you're
lucky and the batch succeeds. I've recently seen batches that were
resubmitted by Marge at least 5 times due to spurious CI failures and
timeouts. I think this is a huge factor for latency.

Although after (A), I should just pop the patch off my mental stack,
that isn't particularly true, because Marge keeps on reminding me when a
stack fails or succeeds, both of which require at least some attention from
me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
change.

Maybe we can also learn from other projects like Rust, GCC or clang, which
I haven't had a look at yet.

Cheers,
Sebastian

On Wed, 17 Feb 2021 at 09:11, Moritz Angermann <
moritz.angerm...@gmail.com> wrote:

> Friends,
>
> I've been looking at CI recently again, as I was facing CI turnaround
> times of 9-12hs; and this just keeps dragging out and making progress hard.
>
> The pending pipeline currently has 2 darwin, and 15 windows builds
> waiting. Windows builds on average take ~220minutes. We have five builders,
> so we can expect this queue to be done in ~660 minutes assuming perfect
> scheduling and good performance. That is 11hs! The next windows build can
> be started in 11hs. Please check my math and tell me I'm wrong!
>
> If you submit a MR today, with some luck, you'll be able to know if it
> will be mergeable some time tomorrow. At which point you can assign it to
> marge, and marge, if you are lucky and the set of patches she tries to
> merge together is mergeable, will merge your work into master probably some
> time on Friday. If a job fails, well you have to start over again.
>
> What are our options here? Ben has been pretty clear about not wanting a
> broken commit for windows to end up in the tree, and I'm there with him.
>
> Cheers,
>  Moritz
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


On CI

2021-02-17 Thread Moritz Angermann
Friends,

I've been looking at CI recently again, as I was facing CI turnaround times
of 9-12hs; and this just keeps dragging out and making progress hard.

The pending pipeline currently has 2 darwin, and 15 windows builds waiting.
Windows builds on average take ~220minutes. We have five builders, so we
can expect this queue to be done in ~660 minutes assuming perfect
scheduling and good performance. That is 11hs! The next windows build can
be started in 11hs. Please check my math and tell me I'm wrong!
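
The arithmetic checks out under those assumptions; spelled out, using the
numbers quoted above:

    # Sanity check of the queue estimate above.
    queued = 15          # pending Windows builds
    minutes_each = 220   # average Windows build time
    builders = 5

    total = queued * minutes_each / builders
    print(f"~{total:.0f} minutes, i.e. ~{total / 60:.0f} hours")  # ~660 min, ~11 h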

If you submit a MR today, with some luck, you'll be able to know if it will
be mergeable some time tomorrow. At which point you can assign it to marge,
and marge, if you are lucky and the set of patches she tries to merge
together is mergeable, will merge your work into master probably some time
on Friday. If a job fails, well you have to start over again.

What are our options here? Ben has been pretty clear about not wanting a
broken commit for windows to end up in the tree, and I'm there with him.

Cheers,
 Moritz
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Reduction in Windows CI capacity

2021-02-08 Thread Ben Gamari
tl;dr. GHC's CI capacity will be reduced due to a loss of sponsorship,
   particularly in Windows runner capacity. Help wanted in finding
   additional capacity.


Hi all,

For many years Google X has generously donated Google Compute Engine
resources to GHC's CI infrastructure. We all owe a debt of gratitude to
Google X for providing us with what has undoubtedly amounted to tens of
thousands of dollars of computational capacity over the years. I would
especially like to thank Greg Steuck, whose advocacy made this donation
possible.

Of course, organizational priorities understandably change and Google X
will be unable to continue their sponsorship in the future. This puts us
in a bit of a tricky situation; Google Compute Engine is currently the
home of nearly 100 cores worth of CI capacity, including:

  * roughly 20% of our x86-64/Linux capacity
  * our only x86-64/FreeBSD runner
  * all five of our x86-64 Windows runners

While the Linux runners are fairly easy to replace, the Windows capacity
is a bit harder since Windows cloud capacity is quite expensive (IIRC
nearly half of the cost of our Windows GCE instances is put towards the
license).

In the short term I can cover for some of this lost capacity by bringing
up a Windows runner using our generous donation from Packet [1].
However, I am extremely wary of outspending our welcome on Packet's
infrastructure and therefore we will need to accept a small reduction in
capacity for a bit while we work out a more sustainable path forward. We
will have to see how things go but it may be necessary to disable the
Windows jobs on (non-Marge) merge request validation pipelines.

I am looking into various options for again reaching our previous
capacity, but this is an area where you might be able to help:

 * Familiarity with Windows licensing. Unfortunately the details of Windows
   licensing for virtualization purposes are a bit tricky. I suspect
   that the cheapest way forward is a single Windows Server license on a
   large machine but if you are familiar with Windows licensing in this
   setting, please do be in touch.

 * Providing Windows licenses. If you know of an organization
   that may be able to donate Windows licenses either in-kind or via
   financial support, please do be in touch.

 * Providing Windows cloud instances. If you know of an organization
   that may be able to donate Windows cloud instances, do holler.

As always, we welcome any hardware or cloud instance contributions. Do
be in touch if you may be in an position to help out.

Cheers,

- Ben


[1] https://www.packet.com/


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Allowing Windows CI to fail

2020-02-03 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

> Ben
>
> This sounds like a good decision to me, thanks.
>
> Is there a possibility to have a slow CI-on-windows job (not part of
> the "this must pass before merging" step), which will slowly, but
> reliably, fail if the Windows build fails? E.g. does it help to make
> the build be 100% sequential?
>
Sadly that won't fix the underlying problem.

> Or is there currently no way to build GHC at all on Windows in a way
> that won't fail? (That would be surprising to me. Until relatively
> recently I was *only* building on Windows.)
>
There is no way to build GHC that won't have a chance of failing. Indeed
Phyx and I also find it quite surprising how the probability of failure
seems to be higher now than in the past. However, we also both agree
that the status quo, when it works, works only accidentally (if the
win32 API documentation is to be believed).

What is especially intriguing is the fact that mingw32 gnu make should
also be affected by the same `exec` issue that we are struggling with, yet
it does none of the job object headstands that we are doing and still
*appears* to be quite reliable. Tamar had a hypothesis for why this
might be that he will test when he has time.

Cheers,

- Ben



signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Allowing Windows CI to fail

2020-02-03 Thread Simon Peyton Jones via ghc-devs
Ben

This sounds like a good decision to me, thanks.

Is there a possibility to have a slow CI-on-windows job (not part of the "this 
must pass before merging" step), which will slowly, but reliably, fail if the 
Windows build fails? E.g. does it help to make the build be 100% sequential?

Or is there currently no way to build GHC at all on Windows in a way that won't 
fail?  (That would be surprising to me.  Until relatively recently I was *only* 
building on Windows.)

Simon

| -Original Message-
| From: ghc-devs  On Behalf Of Ben Gamari
| Sent: 03 February 2020 16:03
| To: GHC developers 
| Subject: Allowing Windows CI to fail
| 
| Hi everyone,
| 
| After multiple weeks of effort struggling to get Windows CI into a stable
| condition I'm sorry to say that we're going to need to revert to allowing
| it to fail for a bit longer. The status quo is essentially holding up the
| entire merge queue and we still seem quite far from resolving the issues.
| 
| I have summarised the current state-of-play in #1. In short, the gcc
| toolchain likely can't be used reliably on Windows due to its ubiquitous
| use of `exec`, which cannot be reliably implemented on Windows.
| 
| Switching to LLVM as our native toolchain was my (initially promising)
| last-ditch attempt at avoiding this issue but sadly this looks to be a long
| road. My current attempt is stuck on an inscrutable loader error.
| 
| For the short-term, I am afraid I have run out of time for this effort.
| My current plan is to merge what I can from my wip/windows-ci branch but
| again enable the Windows CI jobs' allow_failure flag so that its unreliable
| nature doesn't hold up otherwise-passing CI jobs.
| 
| While it's unfortunate that we still lack reliable CI on Windows, I think
| the efforts of the last few weeks were quite worthwhile. We now
| have:
| 
|  * A much better understanding of the issues affecting us on Windows
|  * Significantly better documentation and automation for producing our
|    mingw toolchain artifacts
|  * better scripting for setting up Windows CI runners
|  * fixed several bugs in the ghc-jailbreak library used to work
|    around the Windows MAX_PATH limitation
| 
| Many thanks to Tamar Christina for his many hours of patient help.
| Without him, GHC's Windows support would be in significantly worse shape
| than it is.
| 
| Users of GHC should note that the CI issues we are struggling with *do
| not* affect compiled code. These bugs manifest only as (rare) failed
| compilations (particularly when building GHC itself); however, once
| compilation succeeds the program that results is correct and reliable.
| 
| Cheers,
| 
| - Ben
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Allowing Windows CI to fail

2020-02-03 Thread Ben Gamari
Hi everyone,

After multiple weeks of effort struggling to get Windows CI into a stable
condition I'm sorry to say that we're going to need to revert to
allowing it to fail for a bit longer. The status quo is essentially
holding up the entire merge queue and we still seem quite far from
resolving the issues.

I have summarised the current state-of-play in #1. In short, the
gcc toolchain likely can't be used reliably on Windows due to its
ubiquitous use of `exec`, which cannot be reliably implemented on
Windows.

Switching to LLVM as our native toolchain was my (initially promising)
last-ditch attempt at avoiding this issue but sadly this looks to be a
long road. My current attempt is stuck on an inscrutable loader error.

For the short-term, I am afraid I have run out of time for this effort.
My current plan is to merge what I can from my wip/windows-ci branch but
again enable the Windows CI jobs' allow_failure flag so that its
unreliable nature doesn't hold up otherwise-passing CI jobs.

While it's unfortunate that we still lack reliable CI on Windows,
I think the efforts of the last few weeks were quite worthwhile. We now
have:

 * A much better understanding of the issues affecting us on Windows
 * Significantly better documentation and automation for producing our
   mingw toolchain artifacts
 * better scripting for setting up Windows CI runners
 * fixed several bugs in the ghc-jailbreak library used to work
   around the Windows MAX_PATH limitation

Many thanks to Tamar Christina for his many hours of patient help.
Without him, GHC's Windows support would be in significantly worse shape
than it is.

Users of GHC should note that the CI issues we are struggling with *do
not* affect compiled code. These bugs manifest only as (rare) failed
compilations (particularly when building GHC itself); however, once
compilation succeeds the program that results is correct and reliable.

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


CI fails in git submodule update --init --recursive for 386-linux-deb9

2019-10-13 Thread Roland Senn
Hi 

After pushing some commits, while running a CI pipeline, I persistently
get the following error for the validate-i386-linux-deb9 step:

$ git submodule update --init --recursive
fatal: Unable to create
'/builds/RolandSenn/ghc/.git/modules/libraries/Cabal/index.lock': File
exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Unable to checkout '63331c95ed15cc7e3d83850d308dc3a86a8c3c76' in
submodule path 'libraries/Cabal'

See https://gitlab.haskell.org/RolandSenn/ghc/pipelines/11311 .

How can I fix this? I have no access to this 386 linux machine, so I'm
unable to delete the file.

Roland
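
For persistent runner checkouts, one common workaround for this class of
failure is to clear stale Git lock files before the submodule update runs; a
sketch of such a cleanup step (it assumes nothing else touches the checkout
while the job runs):

    # Sketch of a pre-build cleanup step for a persistent CI checkout: remove
    # stale *.lock files a crashed git process may have left under .git, then
    # run the submodule update. Only safe if the checkout is otherwise idle.
    import os, subprocess

    def remove_stale_locks(repo_root="."):
        git_dir = os.path.join(repo_root, ".git")
        for dirpath, _dirs, files in os.walk(git_dir):
            for name in files:
                if name.endswith(".lock"):
                    path = os.path.join(dirpath, name)
                    print("removing stale lock:", path)
                    os.remove(path)

    if __name__ == "__main__":
        remove_stale_locks()
        subprocess.run(["git", "submodule", "update", "--init", "--recursive"],
                       check=True)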
  
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Is your MR failing CI test T9630 or haddock.base?

2019-05-27 Thread David Eichmann

Try rebasing!

Due to some unfortunate circumstances the performance tests (T9630 and 
haddock.base) became fragile. This should be fixed, but you need to 
rebase off of the latest master (at least 
c931f2561207aa06f1750827afbb68fbee241c6f) for the tests to pass.


Happy Hacking,

David Eichmann

--
David Eichmann, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com

Registered in England & Wales, OC335890
118 Wymering Mansions, Wymering Road, London W9 2NF, England

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI on forked projects: Darwin woes

2019-05-15 Thread Carter Schonwald
Yeah.  That’s my current theory.  It doesn’t help that the queue length
isn’t visible

On Mon, May 13, 2019 at 8:43 AM Ben Gamari  wrote:

> Carter Schonwald  writes:
>
> > Cool.  I recommend irc and devs list plus urls / copies of error
> messages.
> >
> > Hard to debug timeout if we don’t have the literal url or error messages
> shared !
> >
> For what it's worth I suspect these timeouts are simply due to the fact
> that we are somewhat lacking in Darwin builder capacity. There are
> rarely fewer than five builds queued to run on our two Darwin machines
> and this number can sometimes spike to much higher than the machines can
> run in the 10-hour build timeout.
>
> Cheers,
>
> - Ben
>
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI on forked projects: Darwin woes

2019-05-13 Thread Ben Gamari
Carter Schonwald  writes:

> Cool.  I recommend irc and devs list plus urls / copies of error messages.
>
> Hard to debug timeout if we don’t have the literal url or error messages 
> shared !
>
For what it's worth I suspect these timeouts are simply due to the fact
that we are somewhat lacking in Darwin builder capacity. There are
rarely fewer than five builds queued to run on our two Darwin machines
and this number can sometimes spike to much higher than the machines can
run in the 10-hour build timeout.

Cheers,

- Ben



signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: CI on forked projects: Darwin woes

2019-05-12 Thread Carter Schonwald
Cool.  I recommend irc and devs list plus urls / copies of error messages.

Hard to debug timeout if we don’t have the literal url or error messages shared 
!

-Carter


From: Kevin Buhr 
Sent: Sunday, May 12, 2019 11:01 AM
To: Carter Schonwald
Cc: Iavor Diatchki
Subject: Re: CI on forked projects: Darwin woes

Thanks!  I'll send a note if it starts happening again.


On 5/12/19 7:23 AM, Carter Schonwald wrote:
>
[ . . . ]
> Next time you hit a failure could you share with the devs list and or
> #ghc irc ?

--
Kevin Buhr 

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

