Re: Overnight CI failures

2022-09-30 Thread Bryan Richter via ghc-devs
(Adding ghc-devs)

Are these fragile tests?

1. T14346 got a "bad file descriptor" on Darwin
2. linker_unload got some gold errors on Linux

Neither of these has been reported to me before, so I don't know much
about them. Nor have I looked deeply (or at all) at the tests themselves
yet.

On Thu, Sep 29, 2022 at 3:37 PM Simon Peyton Jones <
simon.peytonjo...@gmail.com> wrote:

> Bryan
>
> These failed overnight
>
> On !8897
>
>- https://gitlab.haskell.org/ghc/ghc/-/jobs/1185519
>- https://gitlab.haskell.org/ghc/ghc/-/jobs/1185520
>
> I think it's extremely unlikely that this had anything to do with my patch.
>
> Simon
>


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Cheng Shao
When hadrian builds the binary-dist job, invoking tar and xz is
already the last step and there'll be no other ongoing jobs. But I do
agree with reverting: this minor optimization I proposed has caused
more trouble than it's worth :/
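(For a sense of why the -9 preset bites specifically on the 32-bit jobs, here
is a rough back-of-the-envelope sketch in Haskell. The memory figure is
approximate, taken from the xz(1) documentation, which lists roughly 674 MiB
of compressor memory for -9; the thread counts are purely illustrative. A
32-bit process simply cannot map several such encoders at once.)

  -- Rough arithmetic only; the per-thread figure is approximate.
  xzMemMiB :: Int -> Int
  xzMemMiB threads = threads * 700   -- ~700 MiB of encoder state per -9 thread

  main :: IO ()
  main = mapM_ (\t -> putStrLn (show t ++ " thread(s): ~"
                                ++ show (xzMemMiB t) ++ " MiB"))
               [1, 2, 4]             -- 700, 1400, 2800 MiB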

On Thu, Sep 29, 2022 at 9:25 AM Bryan Richter  wrote:
>
> Matthew pointed out that the build system already parallelizes jobs, so it's 
> risky to force parallelization of any individual job. That means I should 
> just revert.
>
> On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao  wrote:
>>
>> I believe we can either modify ci.sh to disable parallel compression
>> for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
>> XZ_OPT=-9 for i386.
>>
>> On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter  
>> wrote:
>> >
>> > Aha: while i386-linux-deb9-validate sets no extra XZ options, 
>> > nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
>> >
>> > A revert would fix the problem, but presumably so would tweaking that 
>> > option. Does anyone have information that would lead to a better decision 
>> > here?
>> >
>> >
>> > On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
>> >>
>> >> Sure, in which case pls revert it. Apologies for the impact, though
>> >> I'm still a bit curious, the i386 job did pass in the original MR.
>> >>
>> >> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  
>> >> wrote:
>> >> >
>> >> > Yep, it seems to mostly be xz that is running out of memory. (All 
>> >> > recent builds that I sampled, but not all builds through all time.) 
>> >> > Thanks for pointing it out!
>> >> >
>> >> > I can revert the change.
>> >> >
>> >> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>> >> >>
>> >> >> Hi Bryan,
>> >> >>
>> >> >> This may be an unintended fallout of !8940. Would you try starting an
>> >> >> i386 pipeline with it reversed to see if it solves the issue, in which
>> >> >> case we should revert or fix it in master?
>> >> >>
>> >> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>> >> >>  wrote:
>> >> >> >
>> >> >> > Hi all,
>> >> >> >
>> >> >> > For the past week or so, nightly-i386-linux-deb9-validate has been 
>> >> >> > failing consistently.
>> >> >> >
>> >> >> > They show up on the failure dashboard because the logs contain the 
>> >> >> > phrase "Cannot allocate memory".
>> >> >> >
>> >> >> > I haven't looked yet to see if they always fail in the same place, 
>> >> >> > but I'll do that soon. The first example I looked at, however, has 
>> >> >> > the line "xz: (stdin): Cannot allocate memory", so it's not GHC 
>> >> >> > (alone) causing the problem.
>> >> >> >
>> >> >> > As a consequence of showing up on the dashboard, the jobs get 
>> >> >> > restarted. Since they fail consistently, they keep getting 
>> >> >> > restarted. Since the jobs keep getting restarted, the pipelines stay 
>> >> >> > alive. When I checked just now, there were 8 nightly runs still 
>> >> >> > running. :) Thus I'm going to cancel the still-running 
>> >> >> > nightly-i386-linux-deb9-validate jobs and let the pipelines die in 
>> >> >> > peace. You can still find all examples of failed jobs on the 
>> >> >> > dashboard:
>> >> >> >
>> >> >> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >> >> >
>> >> >> > To prevent future problems, it would be good if someone could help 
>> >> >> > me look into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Bryan Richter via ghc-devs
Matthew pointed out that the build system already parallelizes jobs, so
it's risky to force parallelization of any individual job. That means I
should just revert.

On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao  wrote:

> I believe we can either modify ci.sh to disable parallel compression
> for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
> XZ_OPT=-9 for i386.
>
> On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter 
> wrote:
> >
> > Aha: while i386-linux-deb9-validate sets no extra XZ options,
> nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
> >
> > A revert would fix the problem, but presumably so would tweaking that
> option. Does anyone have information that would lead to a better decision
> here?
> >
> >
> > On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
> >>
> >> Sure, in which case pls revert it. Apologies for the impact, though
> >> I'm still a bit curious, the i386 job did pass in the original MR.
> >>
> >> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter 
> wrote:
> >> >
> >> > Yep, it seems to mostly be xz that is running out of memory. (All
> recent builds that I sampled, but not all builds through all time.) Thanks
> for pointing it out!
> >> >
> >> > I can revert the change.
> >> >
> >> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao 
> wrote:
> >> >>
> >> >> Hi Bryan,
> >> >>
> >> >> This may be an unintended fallout of !8940. Would you try starting an
> >> >> i386 pipeline with it reversed to see if it solves the issue, in
> which
> >> >> case we should revert or fix it in master?
> >> >>
> >> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
> >> >>  wrote:
> >> >> >
> >> >> > Hi all,
> >> >> >
> >> >> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >> >> >
> >> >> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >> >> >
> >> >> > I haven't looked yet to see if they always fail in the same place,
> but I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >> >> >
> >> >> > As a consequence of showing up on the dashboard, the jobs get
> restarted. Since they fail consistently, they keep getting restarted. Since
> the jobs keep getting restarted, the pipelines stay alive. When I checked
> just now, there were 8 nightly runs still running. :) Thus I'm going to
> cancel the still-running nightly-i386-linux-deb9-validate jobs and let the
> pipelines die in peace. You can still find all examples of failed jobs on
> the dashboard:
> >> >> >
> >> >> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >> >> >
> >> >> > To prevent future problems, it would be good if someone could help
> me look into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
I believe we can either modify ci.sh to disable parallel compression
for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
XZ_OPT=-9 for i386.
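(As a sketch of the second option, something along these lines could live in
the job generator. This is a hypothetical, self-contained Haskell fragment,
not the real gen_ci.hs API, and the variable names are invented for
illustration.)

  -- Only emit the memory-hungry -9 preset for 64-bit architectures.
  data Arch = I386 | Amd64 | AArch64 deriving (Eq, Show)

  xzVariables :: Arch -> [(String, String)]
  xzVariables I386 = []                  -- fall back to xz's default preset
  xzVariables _    = [("XZ_OPT", "-9")]  -- 64-bit jobs can afford the big dictionary

  nightlyVariables :: Arch -> [(String, String)]
  nightlyVariables arch = ("NIGHTLY", "1") : xzVariables arch

  main :: IO ()
  main = mapM_ (print . nightlyVariables) [I386, Amd64]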

On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter  wrote:
>
> Aha: while i386-linux-deb9-validate sets no extra XZ options, 
> nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
>
> A revert would fix the problem, but presumably so would tweaking that option. 
> Does anyone have information that would lead to a better decision here?
>
>
> On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:
>>
>> Sure, in which case pls revert it. Apologies for the impact, though
>> I'm still a bit curious, the i386 job did pass in the original MR.
>>
>> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  
>> wrote:
>> >
>> > Yep, it seems to mostly be xz that is running out of memory. (All recent 
>> > builds that I sampled, but not all builds through all time.) Thanks for 
>> > pointing it out!
>> >
>> > I can revert the change.
>> >
>> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>> >>
>> >> Hi Bryan,
>> >>
>> >> This may be an unintended fallout of !8940. Would you try starting an
>> >> i386 pipeline with it reversed to see if it solves the issue, in which
>> >> case we should revert or fix it in master?
>> >>
>> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>> >>  wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > For the past week or so, nightly-i386-linux-deb9-validate has been 
>> >> > failing consistently.
>> >> >
>> >> > They show up on the failure dashboard because the logs contain the 
>> >> > phrase "Cannot allocate memory".
>> >> >
>> >> > I haven't looked yet to see if they always fail in the same place, but 
>> >> > I'll do that soon. The first example I looked at, however, has the line 
>> >> > "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing 
>> >> > the problem.
>> >> >
>> >> > As a consequence of showing up on the dashboard, the jobs get 
>> >> > restarted. Since they fail consistently, they keep getting restarted. 
>> >> > Since the jobs keep getting restarted, the pipelines stay alive. When I 
>> >> > checked just now, there were 8 nightly runs still running. :) Thus I'm 
>> >> > going to cancel the still-running nightly-i386-linux-deb9-validate jobs 
>> >> > and let the pipelines die in peace. You can still find all examples of 
>> >> > failed jobs on the dashboard:
>> >> >
>> >> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >> >
>> >> > To prevent future problems, it would be good if someone could help me 
>> >> > look into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Aha: while i386-linux-deb9-validate sets no extra XZ options,
*nightly*-i386-linux-deb9-validate
(the failing job) sets "XZ_OPT = 9".

A revert would fix the problem, but presumably so would tweaking that
option. Does anyone have information that would lead to a better decision
here?


On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao  wrote:

> Sure, in which case pls revert it. Apologies for the impact, though
> I'm still a bit curious, the i386 job did pass in the original MR.
>
> On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter 
> wrote:
> >
> > Yep, it seems to mostly be xz that is running out of memory. (All recent
> builds that I sampled, but not all builds through all time.) Thanks for
> pointing it out!
> >
> > I can revert the change.
> >
> > On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
> >>
> >> Hi Bryan,
> >>
> >> This may be an unintended fallout of !8940. Would you try starting an
> >> i386 pipeline with it reversed to see if it solves the issue, in which
> >> case we should revert or fix it in master?
> >>
> >> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
> >>  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >> >
> >> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >> >
> >> > I haven't looked yet to see if they always fail in the same place,
> but I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >> >
> >> > As a consequence of showing up on the dashboard, the jobs get
> restarted. Since they fail consistently, they keep getting restarted. Since
> the jobs keep getting restarted, the pipelines stay alive. When I checked
> just now, there were 8 nightly runs still running. :) Thus I'm going to
> cancel the still-running nightly-i386-linux-deb9-validate jobs and let the
> pipelines die in peace. You can still find all examples of failed jobs on
> the dashboard:
> >> >
> >> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >> >
> >> > To prevent future problems, it would be good if someone could help me
> look into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Sure, in which case please revert it. Apologies for the impact, though
I'm still a bit curious: the i386 job did pass in the original MR.

On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter  wrote:
>
> Yep, it seems to mostly be xz that is running out of memory. (All recent 
> builds that I sampled, but not all builds through all time.) Thanks for 
> pointing it out!
>
> I can revert the change.
>
> On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:
>>
>> Hi Bryan,
>>
>> This may be an unintended fallout of !8940. Would you try starting an
>> i386 pipeline with it reversed to see if it solves the issue, in which
>> case we should revert or fix it in master?
>>
>> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>>  wrote:
>> >
>> > Hi all,
>> >
>> > For the past week or so, nightly-i386-linux-deb9-validate has been failing 
>> > consistently.
>> >
>> > They show up on the failure dashboard because the logs contain the phrase 
>> > "Cannot allocate memory".
>> >
>> > I haven't looked yet to see if they always fail in the same place, but 
>> > I'll do that soon. The first example I looked at, however, has the line 
>> > "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the 
>> > problem.
>> >
>> > As a consequence of showing up on the dashboard, the jobs get restarted. 
>> > Since they fail consistently, they keep getting restarted. Since the jobs 
>> > keep getting restarted, the pipelines stay alive. When I checked just now, 
>> > there were 8 nightly runs still running. :) Thus I'm going to cancel the 
>> > still-running nightly-i386-linux-deb9-validate jobs and let the pipelines 
>> > die in peace. You can still find all examples of failed jobs on the 
>> > dashboard:
>> >
>> > https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>> >
>> > To prevent future problems, it would be good if someone could help me look 
>> > into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Yep, it seems to mostly be xz that is running out of memory. (All recent
builds that I sampled, but not all builds through all time.) Thanks for
pointing it out!

I can revert the change.

On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao  wrote:

> Hi Bryan,
>
> This may be an unintended fallout of !8940. Would you try starting an
> i386 pipeline with it reversed to see if it solves the issue, in which
> case we should revert or fix it in master?
>
> On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
>  wrote:
> >
> > Hi all,
> >
> > For the past week or so, nightly-i386-linux-deb9-validate has been
> failing consistently.
> >
> > They show up on the failure dashboard because the logs contain the
> phrase "Cannot allocate memory".
> >
> > I haven't looked yet to see if they always fail in the same place, but
> I'll do that soon. The first example I looked at, however, has the line
> "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the
> problem.
> >
> > As a consequence of showing up on the dashboard, the jobs get restarted.
> Since they fail consistently, they keep getting restarted. Since the jobs
> keep getting restarted, the pipelines stay alive. When I checked just now,
> there were 8 nightly runs still running. :) Thus I'm going to cancel the
> still-running nightly-i386-linux-deb9-validate jobs and let the pipelines
> die in peace. You can still find all examples of failed jobs on the
> dashboard:
> >
> >
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
> >
> > To prevent future problems, it would be good if someone could help me
> look into this. Otherwise I'll just disable the job. :(


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Hi Bryan,

This may be an unintended fallout of !8940. Would you try starting an
i386 pipeline with it reverted to see if it solves the issue, in which
case we should revert or fix it in master?

On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs
 wrote:
>
> Hi all,
>
> For the past week or so, nightly-i386-linux-deb9-validate has been failing 
> consistently.
>
> They show up on the failure dashboard because the logs contain the phrase 
> "Cannot allocate memory".
>
> I haven't looked yet to see if they always fail in the same place, but I'll 
> do that soon. The first example I looked at, however, has the line "xz: 
> (stdin): Cannot allocate memory", so it's not GHC (alone) causing the problem.
>
> As a consequence of showing up on the dashboard, the jobs get restarted. 
> Since they fail consistently, they keep getting restarted. Since the jobs 
> keep getting restarted, the pipelines stay alive. When I checked just now, 
> there were 8 nightly runs still running. :) Thus I'm going to cancel the 
> still-running nightly-i386-linux-deb9-validate jobs and let the pipelines die 
> in peace. You can still find all examples of failed jobs on the dashboard:
>
> https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate
>
> To prevent future problems, it would be good if someone could help me look 
> into this. Otherwise I'll just disable the job. :(


Re: Darwin CI Status

2021-05-20 Thread Matthew Pickering
Thanks Moritz for that update.

The latest is that currently darwin CI is disabled and the merge train
is unblocked (*choo choo*).

I am testing Moritz's patches to speed up CI and will merge them in
shortly to get darwin coverage back.

Cheers,

Matt

On Wed, May 19, 2021 at 9:46 AM Moritz Angermann
 wrote:
>
> Matt has access to the M1 builder in my closet now. The darwin performance 
> issue
> is mainly there since BigSur, and (afaik) primarily due to the amount of 
> DYLD_LIBRARY_PATH's
> we pass to GHC invocations. The system linker spends the majority of the time 
> in the
> kernel stat'ing and getelements (or some similar directory) call for each and 
> every possible
> path.
>
> Switching to hadrian will cut down the time from ~5hs to ~2hs. At some point 
> we had make
> builds <90min by just killing all DYLD_LIBRARY_PATH logic we ever had, but 
> that broke
> bindists.
>
> The CI has time values attached and some summary at the end right now, which 
> highlights
> time spent in the system and in user mode. This is up to 80% sys, 20% user, 
> and went to
> something like 20% sys, 80% user after nuking all DYLD_LIBRARY_PATH's, with 
> hadrian it's
> closer to ~25% sys, 75% user.
>
> Of note, this is mostly due to time spent during the *test-suite*, not the 
> actual build. For the
> actual build make and hadrian are comparable, though I've seen hadrian to 
> oddly have a
> much higher variance in how long it takes to *build* ghc, whereas the make 
> build was more
> consistent.
>
> The test-suite quite notoriously calls GHC *a lot of times*, which makes any 
> linker issue due
> to DYLD_LIBRARY_PATH (and similar lookups) much worse.
>
> If we would finally split building and testing, we'd see this more clearly I 
> believe. Maybe this
> is motivation enough for someone to come forward to break build/test into two 
> CI steps?
>
> Cheers,
>  Moritz
>
> On Wed, May 19, 2021 at 4:14 PM Matthew Pickering 
>  wrote:
>>
>> Hi all,
>>
>> The darwin pipelines are gumming up the merge pipeline as they are
>> taking over 4 hours to complete on average.
>>
>> I am going to disable them -
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5785
>>
>> Please can someone give me access to one of the M1 builders so I can
>> debug why the tests are taking so long. Once I have fixed the issue
>> then I will enable the pipelines.
>>
>> Cheers,
>>
>> Matt


Re: Darwin CI Status

2021-05-19 Thread Moritz Angermann
Matt has access to the M1 builder in my closet now. The darwin performance
issue has mainly been there since Big Sur, and is (afaik) primarily due to
the number of DYLD_LIBRARY_PATH entries we pass to GHC invocations. The
system linker spends the majority of its time in the kernel, stat'ing and
making getelements (or some similar) directory calls for each and every
possible path.

Switching to hadrian will cut down the time from ~5 hrs to ~2 hrs. At some
point we had make builds <90 min by just killing all the DYLD_LIBRARY_PATH
logic we ever had, but that broke bindists.

The CI has time values attached and some summary at the end right now,
which highlights time spent in the system and in user mode. This is up to
80% sys, 20% user, and went to something like 20% sys, 80% user after
nuking all DYLD_LIBRARY_PATH entries; with hadrian it's closer to
~25% sys, 75% user.

Of note, this is mostly due to time spent during the *test-suite*, not the
actual build. For the actual build, make and hadrian are comparable, though
I've seen hadrian oddly have a much higher variance in how long it takes to
*build* ghc, whereas the make build was more consistent.

The test-suite quite notoriously calls GHC *a lot of times*, which makes
any linker issue due
to DYLD_LIBRARY_PATH (and similar lookups) much worse.
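(To make the multiplication concrete, an illustration-only Haskell sketch;
all three numbers below are invented, the point is just that the kernel-side
path probing scales with invocations * search paths * libraries.)

  probeCount :: Int -> Int -> Int -> Int
  probeCount ghcInvocations searchPaths dylibs =
    ghcInvocations * searchPaths * dylibs   -- one stat-style lookup per combination

  main :: IO ()
  main = print (probeCount 7000 30 40)      -- ~8.4 million lookups for made-up inputs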

If we finally split building and testing, we'd see this more clearly, I
believe. Maybe this is motivation enough for someone to come forward and
break build/test into two CI steps?

Cheers,
 Moritz

On Wed, May 19, 2021 at 4:14 PM Matthew Pickering <
matthewtpicker...@gmail.com> wrote:

> Hi all,
>
> The darwin pipelines are gumming up the merge pipeline as they are
> taking over 4 hours to complete on average.
>
> I am going to disable them -
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5785
>
> Please can someone give me access to one of the M1 builders so I can
> debug why the tests are taking so long. Once I have fixed the issue
> then I will enable the pipelines.
>
> Cheers,
>
> Matt
>


Re: On CI

2021-03-24 Thread Andreas Klebinger

> What about the case where the rebase *lessens* the improvement? That
is, you're expecting these 10 cases to improve, but after a rebase, only
1 improves. That's news! But a blanket "accept improvements" won't tell you.

I don't think that scenario currently triggers a CI failure. So this
wouldn't really change.

As I understand it the current logic is:

* Run tests
* Check if any cross the metric thresholds set in the test.
* If so check if that test is allowed to cross the threshold.

I believe we don't check that all benchmarks listed with an expected
in/decrease actually do so.
It would also be hard to do so reasonably without making it even harder
to push MRs through CI.
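(For reference, a small Haskell sketch of that logic as described above; the
types and names here are hypothetical, not the actual testsuite driver. A
metric change only fails CI when it crosses the test's window and the
crossing is not in a direction the test is listed as expecting.)

  data Expectation = ExpectedIncrease | ExpectedDecrease | NoChangeExpected

  failsCI :: Double       -- allowed window, e.g. 0.02 for 2%
          -> Double       -- relative change, e.g. +0.05 for a 5% increase
          -> Expectation
          -> Bool
  failsCI window change expectation
    | abs change <= window = False          -- within the window: always fine
    | otherwise = case expectation of
        ExpectedIncrease -> change < 0      -- crossed, but only increases were declared
        ExpectedDecrease -> change > 0
        NoChangeExpected -> True

  main :: IO ()
  main = print (failsCI 0.02 (-0.05) NoChangeExpected)  -- True: a surprise -5% still fails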

Andreas

On 24/03/2021 at 13:08, Richard Eisenberg wrote:

What about the case where the rebase *lessens* the improvement? That is, you're expecting 
these 10 cases to improve, but after a rebase, only 1 improves. That's news! But a 
blanket "accept improvements" won't tell you.

I'm not hard against this proposal, because I know precise tracking has its own 
costs. Just wanted to bring up another scenario that might be factored in.

Richard


On Mar 24, 2021, at 7:44 AM, Andreas Klebinger  wrote:

After the idea of letting marge accept unexpected perf improvements and
looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
which failed because of a single test, for a single build flavour
crossing the
improvement threshold where CI fails after rebasing I wondered.

When would accepting a unexpected perf improvement ever backfire?

In practice I either have a patch that I expect to improve performance
for some things
so I want to accept whatever gains I get. Or I don't expect improvements
so it's *maybe*
worth failing CI for in case I optimized away some code I shouldn't or
something of that
sort.

How could this be actionable? Perhaps having a set of indicator for CI of
"Accept allocation decreases"
"Accept residency decreases"

Would be saner. I have personally *never* gotten value out of the
requirement
to list the indivial tests that improve. Usually a whole lot of them do.
Some cross
the threshold so I add them. If I'm unlucky I have to rebase and a new
one might
make it across the threshold.

Being able to accept improvements (but not regressions) wholesale might be a
reasonable alternative.

Opinions?



Re: On CI

2021-03-24 Thread Moritz Angermann
Yes, this is exactly one of the issues that marge might run into as well:
the aggregate ends up performing differently from the individual ones. Now
we have marge to ensure that at least the aggregate builds together, which
is the whole point of these merge trains: not to end up in a situation
where two patches that are fine on their own produce a broken merged state
that doesn't build anymore.

Now we have marge to ensure every commit is buildable. Next we should run
regression tests on all commits on master (and that includes each and every
one that marge brings into master). Then we have visualisation that tells
us how performance metrics go up/down over time, and we can drill down into
commits if they yield interesting results in either direction.

Now let's say you had a commit that should have made GHC 50% faster across
the board, but somehow, after aggregation with other patches, this didn't
happen anymore. We'd still expect this to show up in each of the individual
commits on master, right?

On Wed, Mar 24, 2021 at 8:09 PM Richard Eisenberg  wrote:

> What about the case where the rebase *lessens* the improvement? That is,
> you're expecting these 10 cases to improve, but after a rebase, only 1
> improves. That's news! But a blanket "accept improvements" won't tell you.
>
> I'm not hard against this proposal, because I know precise tracking has
> its own costs. Just wanted to bring up another scenario that might be
> factored in.
>
> Richard
>
> > On Mar 24, 2021, at 7:44 AM, Andreas Klebinger 
> wrote:
> >
> > After the idea of letting marge accept unexpected perf improvements and
> > looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
> > which failed because of a single test, for a single build flavour
> > crossing the
> > improvement threshold where CI fails after rebasing I wondered.
> >
> > When would accepting a unexpected perf improvement ever backfire?
> >
> > In practice I either have a patch that I expect to improve performance
> > for some things
> > so I want to accept whatever gains I get. Or I don't expect improvements
> > so it's *maybe*
> > worth failing CI for in case I optimized away some code I shouldn't or
> > something of that
> > sort.
> >
> > How could this be actionable? Perhaps having a set of indicator for CI of
> > "Accept allocation decreases"
> > "Accept residency decreases"
> >
> > Would be saner. I have personally *never* gotten value out of the
> > requirement
> > to list the indivial tests that improve. Usually a whole lot of them do.
> > Some cross
> > the threshold so I add them. If I'm unlucky I have to rebase and a new
> > one might
> > make it across the threshold.
> >
> > Being able to accept improvements (but not regressions) wholesale might
> be a
> > reasonable alternative.
> >
> > Opinions?
> >


Re: On CI

2021-03-24 Thread Richard Eisenberg
What about the case where the rebase *lessens* the improvement? That is, you're 
expecting these 10 cases to improve, but after a rebase, only 1 improves. 
That's news! But a blanket "accept improvements" won't tell you.

I'm not hard against this proposal, because I know precise tracking has its own 
costs. Just wanted to bring up another scenario that might be factored in.

Richard

> On Mar 24, 2021, at 7:44 AM, Andreas Klebinger  
> wrote:
> 
> After the idea of letting marge accept unexpected perf improvements and
> looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759
> which failed because of a single test, for a single build flavour
> crossing the
> improvement threshold where CI fails after rebasing I wondered.
> 
> When would accepting a unexpected perf improvement ever backfire?
> 
> In practice I either have a patch that I expect to improve performance
> for some things
> so I want to accept whatever gains I get. Or I don't expect improvements
> so it's *maybe*
> worth failing CI for in case I optimized away some code I shouldn't or
> something of that
> sort.
> 
> How could this be actionable? Perhaps having a set of indicator for CI of
> "Accept allocation decreases"
> "Accept residency decreases"
> 
> Would be saner. I have personally *never* gotten value out of the
> requirement
> to list the indivial tests that improve. Usually a whole lot of them do.
> Some cross
> the threshold so I add them. If I'm unlucky I have to rebase and a new
> one might
> make it across the threshold.
> 
> Being able to accept improvements (but not regressions) wholesale might be a
> reasonable alternative.
> 
> Opinions?
> 


Re: On CI

2021-03-24 Thread Andreas Klebinger

After the idea of letting marge accept unexpected perf improvements, and
after looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759,
which failed because a single test, for a single build flavour, crossed the
improvement threshold after rebasing and made CI fail, I wondered:

When would accepting an unexpected perf improvement ever backfire?

In practice I either have a patch that I expect to improve performance for
some things, so I want to accept whatever gains I get; or I don't expect
improvements, so it's *maybe* worth failing CI for, in case I optimized
away some code I shouldn't have, or something of that sort.

How could this be actionable? Perhaps having a set of indicators for CI
such as
"Accept allocation decreases"
"Accept residency decreases"
would be saner. I have personally *never* gotten value out of the
requirement to list the individual tests that improve. Usually a whole lot
of them do. Some cross the threshold, so I add them. If I'm unlucky I have
to rebase, and a new one might make it across the threshold.

Being able to accept improvements (but not regressions) wholesale might be a
reasonable alternative.

Opinions?
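(A minimal Haskell sketch of the policy being proposed here, with invented
names and allocations assumed as the metric: improvements are accepted
wholesale, and only regressions beyond the per-test window fail.)

  acceptMetric :: Double    -- allowed regression window, e.g. 0.02 for 2%
               -> Double    -- baseline value (e.g. bytes allocated)
               -> Double    -- measured value
               -> Bool
  acceptMetric window baseline measured
    | measured <= baseline = True           -- any improvement is fine
    | otherwise            = (measured - baseline) / baseline <= window

  main :: IO ()
  main = mapM_ print
    [ acceptMetric 0.02 100 80     -- 20% improvement: accepted
    , acceptMetric 0.02 100 101    -- 1% regression: within the window
    , acceptMetric 0.02 100 110    -- 10% regression: rejected
    ]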



RE: On CI

2021-03-18 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

> > We need to do something about this, and I'd advocate for just not making 
> > stats fail with marge.
>
> Generally I agree. One point you don’t mention is that our perf tests
> (which CI forces us to look at assiduously) are often pretty weird
> cases. So there is at least a danger that these more exotic cases will
> stand in the way of (say) a perf improvement in the typical case.
>
> But “not making stats fail” is a bit crude.   Instead how about
>
To be clear, the proposal isn't to accept stats failures for merge request
validation jobs. I believe Moritz was merely suggesting that we accept
such failures in marge-bot validations (that is, the pre-merge
validation done on batches of merge requests).

In my opinion this is reasonable since we know that all of the MRs in
the batch do not individually regress. While it's possible that
interactions between two or more MRs result in a qualitative change in
performance, it seems quite unlikely. What is far *more* likely (and
what we see regularly) is that the cumulative effect of a batch of
improving patches pushes the batches' overall stat change out of the
acceptance threshold. This is quite annoying as it dooms the entire
batch.

For this reason, I think we should at very least accept stat
improvements during Marge validations (as you suggest). I agree that we
probably want a batch to fail if two patches accumulate to form a
regression, even if the two passed CI individually.
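(A small Haskell sketch of that batch-level rule, a hypothetical helper
rather than marge-bot code, treating per-MR relative changes as roughly
additive for simplicity: a net improvement is always accepted, while
individually passing regressions that sum past the window doom the batch.)

  batchAcceptable :: Double      -- allowed window, e.g. 0.02 for 2%
                  -> [Double]    -- per-MR relative changes for one metric
                  -> Bool
  batchAcceptable window deltas = sum deltas <= window

  main :: IO ()
  main = mapM_ print
    [ batchAcceptable 0.02 [-0.05, 0.01]     -- net improvement: accepted
    , batchAcceptable 0.02 [0.015, 0.012]    -- two passing MRs sum to 2.7%: rejected
    ]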

>   * We already have per-benchmark windows. If the stat falls outside
>   the window, we fail. You are effectively saying “widen all windows
>   to infinity”. If something makes a stat 10 times worse, I think we
>   *should* fail. But 10% worse? Maybe we should accept and look later
>   as you suggest. So I’d argue for widening the windows rather than
>   disabling them completely.
>
Yes, I agree.
>
>   * If we did that we’d need good instrumentation to spot steps and
>   drift in perf, as you say. An advantage is that since the perf
>   instrumentation runs only on committed master patches, not on every
>   CI, it can cost more. In particular , it could run a bunch of
>   “typical” tests, including nofib and compiling Cabal or other
>   libraries.
>
We already have the beginnings of such instrumentation.

> The big danger is that by relieving patch authors from worrying about
> perf drift, it’ll end up in the lap of the GHC HQ team. If it’s hard
> for the author of a single patch (with which she is intimately
> familiar) to work out why it’s making some test 2% worse, imagine how
> hard, and demotivating, it’d be for Ben to wonder why 50 patches (with
> which he is unfamiliar) are making some test 5% worse.
>
Yes, I absolutely agree with this. I would very much like to avoid
having to do this sort of post-hoc investigation any more than
necessary.

Cheers,

- Ben




Re: On CI

2021-03-18 Thread Ben Gamari
Karel Gardas  writes:

> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> Now that isn't really an issue anyway I think. The question is rather is
>> 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still around system noise even on lightly loaded workstation.
> Not sure if CI is not run on some shared cloud resources where it may be
> even higher.
>
I think when we say "performance" we should be clear about what we are
referring to. Currently, GHC does not measure instructions/cycles/time.
We only measure allocations and residency. These are significantly more
deterministic than time measurements, even on cloud hardware.

I do think that eventually we should start to measure a broader spectrum
of metrics, but this is something that can be done on dedicated hardware
as a separate CI job.

> I've done simple experiment of pining ghc compiling ghc-cabal and I've
> been able to "speed" it up by 5-10% on W-2265.
>
Do note that once we switch to Hadrian ghc-cabal will vanish entirely
(since Hadrian implements its functionality directly).

> Also following this CI/performance regs discussion I'm not entirely sure
> if  this is not just a witch-hunt hurting/beating mostly most active GHC
> developers. Another idea may be to give up on CI doing perf reg testing
> at all and invest saved resources into proper investigation of
> GHC/Haskell programs performance. Not sure, if this would not be more
> beneficial longer term.
>
I don't think this would be beneficial. It's much easier to prevent a
regression from getting into the tree than it is to find and
characterise it after it has been merged.

> Just one random number thrown to the ring. Linux's perf claims that
> nearly every second L3 cache access on the example above ends with cache
> miss. Is it a good number or bad number? See stats below (perf stat -d
> on ghc with +RTS -T -s -RTS').
>
It is very hard to tell; it sounds bad but it is not easy to know why or
whether it is possible to improve. This is one of the reasons why I have
been trying to improve sharing within GHC recently; reducing residency should
improve cache locality.

Nevertheless, the difficulty interpreting architectural events is why I
generally only use `perf` for differential measurements.

Cheers,

- Ben





Re: On CI

2021-03-18 Thread John Ericson
My guess is most of the "noise" is not run time, but the compiled code
changing in hard-to-predict ways.

https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1776/diffs for
example was a very small PR that took *months* of on-off work to get
passing metrics tests. In the end, binding `is_boot` twice helped a bit,
and dumb luck helped a little bit more. No matter how you analyze that,
that's a lot of pain for what's manifestly a performance-irrelevant MR
--- no one is writing 10,000 default methods or whatever would make this
micro-optimization worth it!


Perhaps this is an extreme example, but my rough sense is that it's not 
an isolated outlier.


John

On 3/18/21 1:39 PM, davean wrote:
I left the wiggle room for things like longer wall time causing more 
time events in the IO Manager/RTS which can be a thermal/HW issue.

They're small and indirect though

-davean

On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf wrote:


To be clear: All performance tests that run as part of CI measure
allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible
between compiles of the same worktree and not impacted by thermal
issues/hardware at all.

On Thu, Mar 18, 2021 at 18:09, davean wrote:

That really shouldn't be near system noise for a well
constructed performance test. You might be seeing things like
thermal issues, etc though - good benchmarking is a serious
subject.
Also we're not talking wall clock tests, we're talking
specific metrics. The machines do tend to be bare metal, but
many of these are entirely CPU performance independent, memory
timing independent, etc. Well not quite but that's a longer
discussion.

The investigation of Haskell code performance is a very good
thing to do BTW, but you'd still want to avoid regressions in
the improvements you made. How well we can do that and the
cost of it is the primary issue here.

-davean


On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas wrote:

On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway I think. The
question is rather is
> 2% a large enough regression to worry about? 5%? 10%?

5-10% is still around system noise even on lightly loaded
workstation.
Not sure if CI is not run on some shared cloud resources
where it may be
even higher.

I've done simple experiment of pining ghc compiling
ghc-cabal and I've
been able to "speed" it up by 5-10% on W-2265.

Also following this CI/performance regs discussion I'm not
entirely sure
if  this is not just a witch-hunt hurting/beating mostly
most active GHC
developers. Another idea may be to give up on CI doing
perf reg testing
at all and invest saved resources into proper investigation of
GHC/Haskell programs performance. Not sure, if this would
not be more
beneficial longer term.

Just one random number thrown to the ring. Linux's perf
claims that
nearly every second L3 cache access on the example above
ends with cache
miss. Is it a good number or bad number? See stats below
(perf stat -d
on ghc with +RTS -T -s -RTS').

Good luck to anybody working on that!

Karel


Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
  61,020,836,136 bytes allocated in the heap
   5,229,185,608 bytes copied during GC
     301,742,768 bytes maximum residency (19 sample(s))
       3,533,000 bytes maximum slop
             840 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2012 colls,     0 par    5.725s   5.731s     0.0028s    0.1267s
  Gen  1        19 colls,     0 par    1.695s   1.696s     0.0893s    0.2636s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   27.849s  ( 32.163s elapsed)
  GC      time    7.419s  (  7.427s elapsed)
  EXIT    time    0.000s  (  0.010s elapsed)
  Total   time   35.269s  ( 39.601s elapsed)

  Alloc rate    2,191,122,004 bytes per MUT second

  Productivity  79.0% of total user, 81.2% of total elapsed

Re: On CI

2021-03-18 Thread davean
I left the wiggle room for things like longer wall time causing more time
events in the IO Manager/RTS, which can be a thermal/HW issue.
They're small and indirect, though.

-davean

On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf  wrote:

> To be clear: All performance tests that run as part of CI measure
> allocations only. No wall clock time.
> Those measurements are (mostly) deterministic and reproducible between
> compiles of the same worktree and not impacted by thermal issues/hardware
> at all.
>
> On Thu, Mar 18, 2021 at 18:09, davean  wrote:
>
>> That really shouldn't be near system noise for a well constructed
>> performance test. You might be seeing things like thermal issues, etc
>> though - good benchmarking is a serious subject.
>> Also we're not talking wall clock tests, we're talking specific metrics.
>> The machines do tend to be bare metal, but many of these are entirely CPU
>> performance independent, memory timing independent, etc. Well not quite but
>> that's a longer discussion.
>>
>> The investigation of Haskell code performance is a very good thing to do
>> BTW, but you'd still want to avoid regressions in the improvements you
>> made. How well we can do that and the cost of it is the primary issue here.
>>
>> -davean
>>
>>
>> On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
>> wrote:
>>
>>> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>>> > Now that isn't really an issue anyway I think. The question is rather
>>> is
>>> > 2% a large enough regression to worry about? 5%? 10%?
>>>
>>> 5-10% is still around system noise even on lightly loaded workstation.
>>> Not sure if CI is not run on some shared cloud resources where it may be
>>> even higher.
>>>
>>> I've done simple experiment of pining ghc compiling ghc-cabal and I've
>>> been able to "speed" it up by 5-10% on W-2265.
>>>
>>> Also following this CI/performance regs discussion I'm not entirely sure
>>> if  this is not just a witch-hunt hurting/beating mostly most active GHC
>>> developers. Another idea may be to give up on CI doing perf reg testing
>>> at all and invest saved resources into proper investigation of
>>> GHC/Haskell programs performance. Not sure, if this would not be more
>>> beneficial longer term.
>>>
>>> Just one random number thrown to the ring. Linux's perf claims that
>>> nearly every second L3 cache access on the example above ends with cache
>>> miss. Is it a good number or bad number? See stats below (perf stat -d
>>> on ghc with +RTS -T -s -RTS').
>>>
>>> Good luck to anybody working on that!
>>>
>>> Karel
>>>
>>>
>>> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>>>   61,020,836,136 bytes allocated in the heap
>>>5,229,185,608 bytes copied during GC
>>>  301,742,768 bytes maximum residency (19 sample(s))
>>>3,533,000 bytes maximum slop
>>>  840 MiB total memory in use (0 MB lost due to fragmentation)
>>>
>>>  Tot time (elapsed)  Avg pause  Max
>>> pause
>>>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
>>> 0.1267s
>>>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
>>> 0.2636s
>>>
>>>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>>>
>>>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>>>
>>>   INITtime0.000s  (  0.000s elapsed)
>>>   MUT time   27.849s  ( 32.163s elapsed)
>>>   GC  time7.419s  (  7.427s elapsed)
>>>   EXITtime0.000s  (  0.010s elapsed)
>>>   Total   time   35.269s  ( 39.601s elapsed)
>>>
>>>   Alloc rate2,191,122,004 bytes per MUT second
>>>
>>>   Productivity  79.0% of total user, 81.2% of total elapsed
>>>
>>>
>>>  Performance counter stats for
>>> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
>>> -hide-all-packages -package ghc-prim -package base -package binary
>>> -package array -package transformers -package time -package containers
>>> -package bytestring -package deepseq -package process -package pretty
>>> -package directory -package filepath -package template-haskell -package
>>> unix --make utils/ghc-cabal/Main.hs -o
>>> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
>>> -fno-warn-unused-imports -fno-warn-warnings-deprecations
>>> -DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
>>> bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
>>> -ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
>>> -ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
>>> libraries/text/cbits/cbits.c -Ilibraries/text/include
>>> -ilibraries/parsec/src +RTS -T -s -RTS':
>>>
>>>  39,632.99 msec task-clock#0.999 CPUs
>>> utilized
>>> 17,191  context-switches  #0.434 K/sec
>>>
>>>  0  cpu-migrations#0.000 K/sec
>>>
>>>899,930  page-faults   #0.023 M/sec
>>>
>>>177,636,979,975  cycles#

Re: On CI

2021-03-18 Thread Sebastian Graf
To be clear: All performance tests that run as part of CI measure
allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible between
compiles of the same worktree and not impacted by thermal issues/hardware
at all.

On Thu, Mar 18, 2021 at 18:09, davean  wrote:

> That really shouldn't be near system noise for a well constructed
> performance test. You might be seeing things like thermal issues, etc
> though - good benchmarking is a serious subject.
> Also we're not talking wall clock tests, we're talking specific metrics.
> The machines do tend to be bare metal, but many of these are entirely CPU
> performance independent, memory timing independent, etc. Well not quite but
> that's a longer discussion.
>
> The investigation of Haskell code performance is a very good thing to do
> BTW, but you'd still want to avoid regressions in the improvements you
> made. How well we can do that and the cost of it is the primary issue here.
>
> -davean
>
>
> On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
> wrote:
>
>> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> > Now that isn't really an issue anyway I think. The question is rather is
>> > 2% a large enough regression to worry about? 5%? 10%?
>>
>> 5-10% is still around system noise even on lightly loaded workstation.
>> Not sure if CI is not run on some shared cloud resources where it may be
>> even higher.
>>
>> I've done simple experiment of pining ghc compiling ghc-cabal and I've
>> been able to "speed" it up by 5-10% on W-2265.
>>
>> Also following this CI/performance regs discussion I'm not entirely sure
>> if  this is not just a witch-hunt hurting/beating mostly most active GHC
>> developers. Another idea may be to give up on CI doing perf reg testing
>> at all and invest saved resources into proper investigation of
>> GHC/Haskell programs performance. Not sure, if this would not be more
>> beneficial longer term.
>>
>> Just one random number thrown to the ring. Linux's perf claims that
>> nearly every second L3 cache access on the example above ends with cache
>> miss. Is it a good number or bad number? See stats below (perf stat -d
>> on ghc with +RTS -T -s -RTS').
>>
>> Good luck to anybody working on that!
>>
>> Karel
>>
>>
>> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>>   61,020,836,136 bytes allocated in the heap
>>5,229,185,608 bytes copied during GC
>>  301,742,768 bytes maximum residency (19 sample(s))
>>3,533,000 bytes maximum slop
>>  840 MiB total memory in use (0 MB lost due to fragmentation)
>>
>>  Tot time (elapsed)  Avg pause  Max
>> pause
>>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
>> 0.1267s
>>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
>> 0.2636s
>>
>>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>>
>>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>>
>>   INITtime0.000s  (  0.000s elapsed)
>>   MUT time   27.849s  ( 32.163s elapsed)
>>   GC  time7.419s  (  7.427s elapsed)
>>   EXITtime0.000s  (  0.010s elapsed)
>>   Total   time   35.269s  ( 39.601s elapsed)
>>
>>   Alloc rate2,191,122,004 bytes per MUT second
>>
>>   Productivity  79.0% of total user, 81.2% of total elapsed
>>
>>
>>  Performance counter stats for
>> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
>> -hide-all-packages -package ghc-prim -package base -package binary
>> -package array -package transformers -package time -package containers
>> -package bytestring -package deepseq -package process -package pretty
>> -package directory -package filepath -package template-haskell -package
>> unix --make utils/ghc-cabal/Main.hs -o
>> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
>> -fno-warn-unused-imports -fno-warn-warnings-deprecations
>> -DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
>> bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
>> -ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
>> -ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
>> libraries/text/cbits/cbits.c -Ilibraries/text/include
>> -ilibraries/parsec/src +RTS -T -s -RTS':
>>
>>  39,632.99 msec task-clock#0.999 CPUs
>> utilized
>> 17,191  context-switches  #0.434 K/sec
>>
>>  0  cpu-migrations#0.000 K/sec
>>
>>899,930  page-faults   #0.023 M/sec
>>
>>177,636,979,975  cycles#4.482 GHz
>>   (87.54%)
>>181,945,795,221  instructions  #1.02  insn per
>> cycle   (87.59%)
>> 34,033,574,511  branches  #  858.718 M/sec
>>   (87.42%)
>>  1,664,969,299  branch-misses #4.89% of all
>> branches  (87.48%)
>> 41,522,737,426   

Re: On CI

2021-03-18 Thread davean
That really shouldn't be near system noise for a well-constructed
performance test. You might be seeing things like thermal issues, etc.,
though - good benchmarking is a serious subject.
Also, we're not talking about wall clock tests; we're talking about
specific metrics. The machines do tend to be bare metal, but many of these
are entirely CPU-performance independent, memory-timing independent, etc.
Well, not quite, but that's a longer discussion.

The investigation of Haskell code performance is a very good thing to do
BTW, but you'd still want to avoid regressions in the improvements you
made. How well we can do that and the cost of it is the primary issue here.

-davean


On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas 
wrote:

> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> > Now that isn't really an issue anyway I think. The question is rather is
> > 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still around system noise even on lightly loaded workstation.
> Not sure if CI is not run on some shared cloud resources where it may be
> even higher.
>
> I've done simple experiment of pining ghc compiling ghc-cabal and I've
> been able to "speed" it up by 5-10% on W-2265.
>
> Also following this CI/performance regs discussion I'm not entirely sure
> if  this is not just a witch-hunt hurting/beating mostly most active GHC
> developers. Another idea may be to give up on CI doing perf reg testing
> at all and invest saved resources into proper investigation of
> GHC/Haskell programs performance. Not sure, if this would not be more
> beneficial longer term.
>
> Just one random number thrown to the ring. Linux's perf claims that
> nearly every second L3 cache access on the example above ends with cache
> miss. Is it a good number or bad number? See stats below (perf stat -d
> on ghc with +RTS -T -s -RTS').
>
> Good luck to anybody working on that!
>
> Karel
>
>
> Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
>   61,020,836,136 bytes allocated in the heap
>5,229,185,608 bytes copied during GC
>  301,742,768 bytes maximum residency (19 sample(s))
>3,533,000 bytes maximum slop
>  840 MiB total memory in use (0 MB lost due to fragmentation)
>
>  Tot time (elapsed)  Avg pause  Max
> pause
>   Gen  0  2012 colls, 0 par5.725s   5.731s 0.0028s
> 0.1267s
>   Gen  119 colls, 0 par1.695s   1.696s 0.0893s
> 0.2636s
>
>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>
>   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
>
>   INITtime0.000s  (  0.000s elapsed)
>   MUT time   27.849s  ( 32.163s elapsed)
>   GC  time7.419s  (  7.427s elapsed)
>   EXITtime0.000s  (  0.010s elapsed)
>   Total   time   35.269s  ( 39.601s elapsed)
>
>   Alloc rate2,191,122,004 bytes per MUT second
>
>   Productivity  79.0% of total user, 81.2% of total elapsed
>
>
>  Performance counter stats for
> '/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
> -hide-all-packages -package ghc-prim -package base -package binary
> -package array -package transformers -package time -package containers
> -package bytestring -package deepseq -package process -package pretty
> -package directory -package filepath -package template-haskell -package
> unix --make utils/ghc-cabal/Main.hs -o
> utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
> -fno-warn-unused-imports -fno-warn-warnings-deprecations
> -DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
> bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
> -ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
> -ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
> libraries/text/cbits/cbits.c -Ilibraries/text/include
> -ilibraries/parsec/src +RTS -T -s -RTS':
>
>  39,632.99 msec task-clock#0.999 CPUs
> utilized
> 17,191  context-switches  #0.434 K/sec
>
>  0  cpu-migrations#0.000 K/sec
>
>899,930  page-faults   #0.023 M/sec
>
>177,636,979,975  cycles#4.482 GHz
>   (87.54%)
>181,945,795,221  instructions  #1.02  insn per
> cycle   (87.59%)
> 34,033,574,511  branches  #  858.718 M/sec
>   (87.42%)
>  1,664,969,299  branch-misses #4.89% of all
> branches  (87.48%)
> 41,522,737,426  L1-dcache-loads   # 1047.681 M/sec
>   (87.53%)
>  2,675,319,939  L1-dcache-load-misses #6.44% of all
> L1-dcache hits(87.48%)
>372,370,395  LLC-loads #9.395 M/sec
>   (87.49%)
>173,614,140  LLC-load-misses   #   46.62% of all
> LL-cache hits (87.46%)
>
>   39.663103602 seconds time elapsed
>
>   38.288158000 

Re: On CI

2021-03-17 Thread Karel Gardas
On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway I think. The question is rather is
> 2% a large enough regression to worry about? 5%? 10%?

5-10% is still around system noise, even on a lightly loaded workstation.
I'm not sure whether CI runs on shared cloud resources, where the noise may
be even higher.

I've done a simple experiment of pinning GHC while compiling ghc-cabal, and
I've been able to "speed" it up by 5-10% on a W-2265.

Also, following this CI/performance-regression discussion, I'm not entirely
sure whether this isn't just a witch-hunt that mostly hurts the most active
GHC developers. Another idea may be to give up on perf-regression testing
in CI altogether and invest the saved resources into a proper investigation
of GHC/Haskell program performance. I'm not sure whether that wouldn't be
more beneficial in the longer term.

Just one random number thrown into the ring: Linux's perf claims that
nearly every second L3 cache access in the example above ends in a cache
miss. Is that a good number or a bad one? See the stats below (perf stat -d
on ghc with +RTS -T -s -RTS).

Good luck to anybody working on that!

Karel


Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
  61,020,836,136 bytes allocated in the heap
   5,229,185,608 bytes copied during GC
     301,742,768 bytes maximum residency (19 sample(s))
       3,533,000 bytes maximum slop
             840 MiB total memory in use (0 MB lost due to fragmentation)

                                       Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2012 colls,     0 par    5.725s   5.731s     0.0028s    0.1267s
  Gen  1        19 colls,     0 par    1.695s   1.696s     0.0893s    0.2636s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   27.849s  ( 32.163s elapsed)
  GC      time    7.419s  (  7.427s elapsed)
  EXIT    time    0.000s  (  0.010s elapsed)
  Total   time   35.269s  ( 39.601s elapsed)

  Alloc rate    2,191,122,004 bytes per MUT second

  Productivity  79.0% of total user, 81.2% of total elapsed


 Performance counter stats for
'/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
-hide-all-packages -package ghc-prim -package base -package binary
-package array -package transformers -package time -package containers
-package bytestring -package deepseq -package process -package pretty
-package directory -package filepath -package template-haskell -package
unix --make utils/ghc-cabal/Main.hs -o
utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
-fno-warn-unused-imports -fno-warn-warnings-deprecations
-DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
-ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
-ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
libraries/text/cbits/cbits.c -Ilibraries/text/include
-ilibraries/parsec/src +RTS -T -s -RTS':

         39,632.99 msec task-clock              #    0.999 CPUs utilized
            17,191      context-switches        #    0.434 K/sec
                 0      cpu-migrations          #    0.000 K/sec
           899,930      page-faults             #    0.023 M/sec
   177,636,979,975      cycles                  #    4.482 GHz                     (87.54%)
   181,945,795,221      instructions            #    1.02  insn per cycle          (87.59%)
    34,033,574,511      branches                #  858.718 M/sec                   (87.42%)
     1,664,969,299      branch-misses           #    4.89% of all branches         (87.48%)
    41,522,737,426      L1-dcache-loads         # 1047.681 M/sec                   (87.53%)
     2,675,319,939      L1-dcache-load-misses   #    6.44% of all L1-dcache hits   (87.48%)
       372,370,395      LLC-loads               #    9.395 M/sec                   (87.49%)
       173,614,140      LLC-load-misses         #   46.62% of all LL-cache hits    (87.46%)

  39.663103602 seconds time elapsed

  38.288158000 seconds user
   1.358263000 seconds sys
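
For reference, the "nearly every second access" reading is just the ratio of
the two LLC counters above; a one-liner confirms it (plain arithmetic, nothing
GHC-specific):

    -- Quick sanity check of the LLC figures quoted above: misses / loads.
    main :: IO ()
    main = print (173614140 / 372370395 :: Double)   -- ~0.466, i.e. the 46.62% miss rate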
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Merijn Verstraaten
On 17 Mar 2021, at 16:16, Andreas Klebinger  wrote:
> 
> While I fully agree with this. We should *always* want to know if a small 
> syntetic benchmark regresses by a lot.
> Or in other words we don't want CI to accept such a regression for us ever, 
> but the developer of a patch should need to explicitly ok it.
> 
> Otherwise we just slow down a lot of seldom-used code paths by a lot.
> 
> Now that isn't really an issue anyway I think. The question is rather is 2% a 
> large enough regression to worry about? 5%? 10%?

You probably want a sliding window anyway. Having N 1.8% regressions in a row 
can still slow things down a lot, while a 3% regression after a 5% improvement 
is probably fine.
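
As a rough sketch (invented names, not GHC's actual perf-test driver), such a
sliding-window check could judge the cumulative change over the last N commits
instead of each commit's delta in isolation:

    type Delta = Double   -- per-commit relative change, e.g. +0.018 for a 1.8% regression

    -- Cumulative factor over a window of deltas: (1+d1)*(1+d2)*...
    cumulative :: [Delta] -> Double
    cumulative = product . map (1 +)

    -- Fail if the last n commits together regress the metric by more than 'limit'
    -- (e.g. 0.05 for 5%), even if every single commit stayed inside its own window.
    windowExceeded :: Int -> Double -> [Delta] -> Bool
    windowExceeded n limit deltas = cumulative (take n deltas) > 1 + limit

    -- Five 1.8% regressions in a row trip a 5% window-of-5 check (~9.3% total),
    -- while a 3% regression after a 5% improvement does not (about -2% total).
    main :: IO ()
    main = do
      print (windowExceeded 5 0.05 (replicate 5 0.018))   -- True
      print (windowExceeded 5 0.05 [0.03, -0.05])         -- False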

- Merijn


signature.asc
Description: Message signed with OpenPGP
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Andreas Klebinger

> I'd be quite happy to accept a 25% regression on T9872c if it yielded
a 1% improvement on compiling Cabal. T9872 is very very very strange!
(Maybe if *all* the T9872 tests regressed, I'd be more worried.)

While I fully agree with this, we should *always* want to know if a
small synthetic benchmark regresses by a lot.
Or in other words: we don't want CI to ever accept such a regression for
us; the developer of a patch should need to explicitly OK it.

Otherwise we just slow down a lot of seldom-used code paths by a lot.

Now, that isn't really an issue anyway, I think. The question is rather:
is 2% a large enough regression to worry about? 5%? 10%?

Cheers,
Andreas

On 17/03/2021 at 14:39, Richard Eisenberg wrote:




On Mar 17, 2021, at 6:18 AM, Moritz Angermann
mailto:moritz.angerm...@gmail.com>> wrote:

But what do we expect of patch authors? Right now if five people
write patches to GHC, and each of them eventually manage to get their
MRs green, after a long review, they finally see it assigned to
marge, and then it starts failing? Their patch on its own was fine,
but their aggregate with other people's code leads to regressions? So
we now expect all patch authors together to try to figure out what
happened? Figuring out why something regressed is hard enough, and we
only have a very few people who are actually capable of debugging
this. Thus I believe it would end up with Ben, Andreas, Matthiew,
Simon, ... or someone else from GHC HQ anyway to figure out why it
regressed, be it in the Review Stage, or dissecting a marge
aggregate, or on master.


I have previously posted against the idea of allowing Marge to accept
regressions... but the paragraph above is sadly convincing. Maybe
Simon is right about opening up the windows to, say, be 100% (which
would catch a 10x regression) instead of infinite, but I'm now
convinced that Marge should be very generous in allowing regressions
-- provided we also have some way of monitoring drift over time.

Separately, I've been concerned for some time about the peculiarity of
our perf tests. For example, I'd be quite happy to accept a 25%
regression on T9872c if it yielded a 1% improvement on compiling
Cabal. T9872 is very very very strange! (Maybe if *all* the T9872
tests regressed, I'd be more worried.) I would be very happy to learn
that some more general, representative tests are included in our
examinations.

Richard

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread John Ericson
Yes, I think the counterpoint of "automating what Ben does", so that people 
besides Ben can do it, is very important. In this case, I think a good 
thing we could do is asynchronously build more of master post-merge, 
such as using the perf stats to automatically bisect anything that is 
fishy, including within marge-bot roll-ups, which wouldn't be built by 
the regular workflow anyway.
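
A rough sketch of that automatic bisection (hypothetical types; 'measure'
stands for "build this commit post-merge and read one perf metric", and it
assumes a single step change between the endpoints of the range):

    module PerfBisect where

    type Commit = String

    -- Binary-search a range of master commits (ordered oldest to newest) for the
    -- first one whose measurement exceeds the baseline taken just before the range.
    firstBad :: Monad m => (Commit -> m Double) -> Double -> [Commit] -> m (Maybe Commit)
    firstBad _       _        []  = pure Nothing
    firstBad measure baseline [c] = do
      v <- measure c
      pure (if v > baseline then Just c else Nothing)
    firstBad measure baseline commits = do
      let (older, newer) = splitAt (length commits `div` 2) commits
      v <- measure (last older)
      if v > baseline
        then firstBad measure baseline older   -- the step is in the older half
        else firstBad measure baseline newer   -- the step landed later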


I also agree with Sebastian that the overfit, overly-synthetic nature of 
our current tests, plus the sketchy way we have ignored drift, makes the 
current approach worth abandoning in any event. The fact that the gold 
standard must include tests of larger, "real world" code, which 
unfortunately takes longer to build, is I think also a point in favour of 
this asynchronous approach: we trade MR latency for stat latency, but 
better utilize our build machines and get better stats, and when a human 
has to fix something a few days later, they have a much better foundation 
to start their investigation.


Finally, I agree with SPJ that for fairness and sustainability's sake, 
the people investigating issues after the fact should ideally be the MR 
authors, and definitely, definitely not Ben. But I hope that better 
stats, nice-looking graphs, and maybe a system to automatically ping MR 
authors will make perf debugging much more accessible, enabling that 
goal.


John

On 3/17/21 9:47 AM, Sebastian Graf wrote:
Re: Performance drift: I opened 
https://gitlab.haskell.org/ghc/ghc/-/issues/17658 
 a while ago with 
an idea of how to measure drift a bit better.
It's basically an automatically checked version of "Ben stares at 
performance reports every two weeks and sees that T9872 has regressed 
by 10% since 9.0"


Maybe we can have Marge check for drift and each individual MR for 
incremental perf regressions?


Sebastian

Am Mi., 17. März 2021 um 14:40 Uhr schrieb Richard Eisenberg 
mailto:r...@richarde.dev>>:





On Mar 17, 2021, at 6:18 AM, Moritz Angermann
mailto:moritz.angerm...@gmail.com>>
wrote:

But what do we expect of patch authors? Right now if five people
write patches to GHC, and each of them eventually manage to get
their MRs green, after a long review, they finally see it
assigned to marge, and then it starts failing? Their patch on its
own was fine, but their aggregate with other people's code leads
to regressions? So we now expect all patch authors together to
try to figure out what happened? Figuring out why something
regressed is hard enough, and we only have a very few people who
are actually capable of debugging this. Thus I believe it would
end up with Ben, Andreas, Matthiew, Simon, ... or someone else
from GHC HQ anyway to figure out why it regressed, be it in the
Review Stage, or dissecting a marge aggregate, or on master.


I have previously posted against the idea of allowing Marge to
accept regressions... but the paragraph above is sadly convincing.
Maybe Simon is right about opening up the windows to, say, be 100%
(which would catch a 10x regression) instead of infinite, but I'm
now convinced that Marge should be very generous in allowing
regressions -- provided we also have some way of monitoring drift
over time.

Separately, I've been concerned for some time about the
peculiarity of our perf tests. For example, I'd be quite happy to
accept a 25% regression on T9872c if it yielded a 1% improvement
on compiling Cabal. T9872 is very very very strange! (Maybe if
*all* the T9872 tests regressed, I'd be more worried.) I would be
very happy to learn that some more general, representative tests
are included in our examinations.

Richard
___
ghc-devs mailing list
ghc-devs@haskell.org 
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Sebastian Graf
Re: Performance drift: I opened
https://gitlab.haskell.org/ghc/ghc/-/issues/17658 a while ago with an idea
of how to measure drift a bit better.
It's basically an automatically checked version of "Ben stares at
performance reports every two weeks and sees that T9872 has regressed by
10% since 9.0"

Maybe we can have Marge check for drift and each individual MR for
incremental perf regressions?

Sebastian

On Wed, 17 Mar 2021 at 14:40, Richard Eisenberg <
r...@richarde.dev> wrote:

>
>
> On Mar 17, 2021, at 6:18 AM, Moritz Angermann 
> wrote:
>
> But what do we expect of patch authors? Right now if five people write
> patches to GHC, and each of them eventually manage to get their MRs green,
> after a long review, they finally see it assigned to marge, and then it
> starts failing? Their patch on its own was fine, but their aggregate with
> other people's code leads to regressions? So we now expect all patch
> authors together to try to figure out what happened? Figuring out why
> something regressed is hard enough, and we only have a very few people who
> are actually capable of debugging this. Thus I believe it would end up with
> Ben, Andreas, Matthiew, Simon, ... or someone else from GHC HQ anyway to
> figure out why it regressed, be it in the Review Stage, or dissecting a
> marge aggregate, or on master.
>
>
> I have previously posted against the idea of allowing Marge to accept
> regressions... but the paragraph above is sadly convincing. Maybe Simon is
> right about opening up the windows to, say, be 100% (which would catch a
> 10x regression) instead of infinite, but I'm now convinced that Marge
> should be very generous in allowing regressions -- provided we also have
> some way of monitoring drift over time.
>
> Separately, I've been concerned for some time about the peculiarity of our
> perf tests. For example, I'd be quite happy to accept a 25% regression on
> T9872c if it yielded a 1% improvement on compiling Cabal. T9872 is very
> very very strange! (Maybe if *all* the T9872 tests regressed, I'd be more
> worried.) I would be very happy to learn that some more general,
> representative tests are included in our examinations.
>
> Richard
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Richard Eisenberg


> On Mar 17, 2021, at 6:18 AM, Moritz Angermann  
> wrote:
> 
> But what do we expect of patch authors? Right now if five people write 
> patches to GHC, and each of them eventually manage to get their MRs green, 
> after a long review, they finally see it assigned to marge, and then it 
> starts failing? Their patch on its own was fine, but their aggregate with 
> other people's code leads to regressions? So we now expect all patch authors 
> together to try to figure out what happened? Figuring out why something 
> regressed is hard enough, and we only have a very few people who are actually 
> capable of debugging this. Thus I believe it would end up with Ben, Andreas, 
> Matthiew, Simon, ... or someone else from GHC HQ anyway to figure out why it 
> regressed, be it in the Review Stage, or dissecting a marge aggregate, or on 
> master.

I have previously posted against the idea of allowing Marge to accept 
regressions... but the paragraph above is sadly convincing. Maybe Simon is 
right about opening up the windows to, say, be 100% (which would catch a 10x 
regression) instead of infinite, but I'm now convinced that Marge should be 
very generous in allowing regressions -- provided we also have some way of 
monitoring drift over time.

Separately, I've been concerned for some time about the peculiarity of our perf 
tests. For example, I'd be quite happy to accept a 25% regression on T9872c if 
it yielded a 1% improvement on compiling Cabal. T9872 is very very very 
strange! (Maybe if *all* the T9872 tests regressed, I'd be more worried.) I 
would be very happy to learn that some more general, representative tests are 
included in our examinations.

Richard
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Moritz Angermann
I am not advocating dropping perf tests during merge requests; I just want
them not to be fatal for marge batches. Yes, this means that a bunch of
unrelated merge requests could each be fine with respect to the per-merge-
request perf checks, but the aggregate might fail perf.  And then the next
MR against the merged aggregate will start failing. Even that is a pretty
bad situation, IMO.

I honestly don't have a good answer; I just see marge work on batches, over
and over and over again, just to fail. Eventually marge should figure out a
subset of the merges that fits into the perf window, but that might be after
10 tries, so after up to ~30+ hours, which means there won't be any merge
request landing in GHC for 30 hours. I find that rather unacceptable.

I think we need better visualisation of perf regressions that happen on
master. Ben has some WIP for this, and I think John said there might be
some way to add a nice (maybe reflex) UI to it.  If we can see regressions
on master easily, and go from "at this point in time GHC got worse" to
"this is the commit", we might be able to figure it out.

But what do we expect of patch authors? Right now if five people write
patches to GHC, and each of them eventually manage to get their MRs green,
after a long review, they finally see it assigned to marge, and then it
starts failing? Their patch on its own was fine, but their aggregate with
other people's code leads to regressions? So we now expect all patch
authors together to try to figure out what happened? Figuring out why
something regressed is hard enough, and we only have a very few people who
are actually capable of debugging this. Thus I believe it would end up with
Ben, Andreas, Matthiew, Simon, ... or someone else from GHC HQ anyway to
figure out why it regressed, be it in the Review Stage, or dissecting a
marge aggregate, or on master.

Thus I believe in most cases we'd have to look at the regressions anyway,
and right now we are just making working on GHC a rather depressing job, in
a convoluted way. Increasing the barrier to entry by also requiring everyone
to have absolutely stellar perf-regression-hunting skills is quite a
challenge.

There is also the question of whether our synthetic benchmarks actually
measure real-world performance. Do the micro-benchmarks translate into the
same regressions in, say, building aeson, vector, or Cabal? The latter is
what most practitioners care about more than the micro-benchmarks.

Again, I'm absolutely not in favour of GHC regressing; it's slow enough as
it is. I just think CI should be assisting us, not holding development
back.

Cheers,
 Moritz

On Wed, Mar 17, 2021 at 5:54 PM Spiwack, Arnaud 
wrote:

> Ah, so it was really two identical pipelines (one for the branch where
> Margebot batches commits, and one for the MR that Margebot creates before
> merging). That's indeed a non-trivial amount of purely wasted
> computer-hours.
>
> Taking a step back, I am inclined to agree with the proposal of not
> checking stat regressions in Margebot. My high-level opinion on this is
> that perf tests don't actually test the right thing. Namely, they don't
> prevent performance drift over time (if a given test is allowed to degrade
> by 2% every commit, it can take a 100% performance hit in just 35 commits).
> While it is important to measure performance, and to avoid too egregious
> performance degradation in a given commit, it's usually performance over
> time which matters. I don't really know how to apply it to collaborative
> development, and help maintain healthy performance. But flagging
> performance regressions in MRs, while not making them block batched merges
> sounds like a reasonable compromise.
>
>
> On Wed, Mar 17, 2021 at 9:34 AM Moritz Angermann <
> moritz.angerm...@gmail.com> wrote:
>
>> *why* is a very good question. The MR fixing it is here:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
>>
>> On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
>> wrote:
>>
>>> Then I have a question: why are there two pipelines running on each
>>> merge batch?
>>>
>>> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
>>> moritz.angerm...@gmail.com> wrote:
>>>
 No it wasn't. It was about the stat failures described in the next
 paragraph. I could have been more clear about that. My apologies!

 On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud <
 arnaud.spiw...@tweag.io> wrote:

>
> and if either of both (see below) failed, marge's merge would fail as
>> well.
>>
>
> Re: “see below” is this referring to a missing part of your email?
>

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Ah, so it was really two identical pipelines (one for the branch where
Margebot batches commits, and one for the MR that Margebot creates before
merging). That's indeed a non-trivial amount of purely wasted
computer-hours.

Taking a step back, I am inclined to agree with the proposal of not
checking stat regressions in Margebot. My high-level opinion on this is
that perf tests don't actually test the right thing. Namely, they don't
prevent performance drift over time (if a given test is allowed to degrade
by 2% every commit, it can take a 100% performance hit in just 35 commits).
While it is important to measure performance, and to avoid too egregious
performance degradation in a given commit, it's usually performance over
time which matters. I don't really know how to apply it to collaborative
development and help maintain healthy performance. But flagging
performance regressions in MRs, while not making them block batched merges,
sounds like a reasonable compromise.
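
The compounding is easy to check with plain arithmetic (nothing GHC-specific
here):

    -- A 2% allowance per commit roughly doubles the cost after ~35 commits.
    main :: IO ()
    main = do
      print (1.02 ^^ (35 :: Int))   -- ~2.0, i.e. a ~100% cumulative regression
      print (logBase 1.02 2)        -- ~35: commits needed to double at 2% each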


On Wed, Mar 17, 2021 at 9:34 AM Moritz Angermann 
wrote:

> *why* is a very good question. The MR fixing it is here:
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
>
> On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
> wrote:
>
>> Then I have a question: why are there two pipelines running on each merge
>> batch?
>>
>> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
>> moritz.angerm...@gmail.com> wrote:
>>
>>> No it wasn't. It was about the stat failures described in the next
>>> paragraph. I could have been more clear about that. My apologies!
>>>
>>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
>>> wrote:
>>>

 and if either of both (see below) failed, marge's merge would fail as
> well.
>

 Re: “see below” is this referring to a missing part of your email?

>>>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-03-17 Thread Simon Peyton Jones via ghc-devs
We need to do something about this, and I'd advocate for just not making stats 
fail with marge.

Generally I agree.   One point you don’t mention is that our perf tests (which 
CI forces us to look at assiduously) are often pretty weird cases.  So there is 
at least a danger that these more exotic cases will stand in the way of (say) a 
perf improvement in the typical case.

But “not making stats fail” is a bit crude.   Instead how about:

  *   Always accept stat improvements.

  *   We already have per-benchmark windows.  If the stat falls outside the 
window, we fail.  You are effectively saying “widen all windows to infinity”.  
If something makes a stat 10 times worse, I think we *should* fail.  But 10% 
worse?  Maybe we should accept it and look later, as you suggest.   So I’d 
argue for widening the windows rather than disabling them completely (a sketch 
of such a policy follows below).

  *   If we did that we’d need good instrumentation to spot steps and drift in 
perf, as you say.  An advantage is that since the perf instrumentation runs 
only on committed master patches, not on every CI run, it can cost more.  In 
particular, it could run a bunch of “typical” tests, including nofib and 
compiling Cabal or other libraries.
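
As a sketch, the widened-window policy could look like this (invented names and 
thresholds, not the actual testsuite driver): improvements always pass, 
regressions inside the widened window pass but are recorded for the drift 
instrumentation, and only a large step fails the pipeline.

    data Verdict = Pass | PassButRecord Double | Fail Double
      deriving Show

    judge :: Double   -- widened window, e.g. 1.0 for +100%
          -> Double   -- baseline metric
          -> Double   -- measured metric
          -> Verdict
    judge window baseline measured
      | change <= 0      = Pass                  -- always accept improvements
      | change <= window = PassButRecord change  -- tolerate, but log for later analysis
      | otherwise        = Fail change           -- a step this large should still block
      where
        change = measured / baseline - 1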

The big danger is that by relieving patch authors from worrying about perf 
drift, it’ll end up in the lap of the GHC HQ team.  If it’s hard for the author 
of a single patch (with which she is intimately familiar) to work out why it’s 
making some test 2% worse, imagine how hard, and demotivating, it’d be for Ben 
to wonder why 50 patches (with which he is unfamiliar) are making some test 5% 
worse.

I’m not sure how to address this problem.   At least we should make it clear 
that patch authors are expected to engage *actively* in a conversation about 
why their patch is making something worse, even after it lands.

Simon

From: ghc-devs  On Behalf Of Moritz Angermann
Sent: 17 March 2021 03:00
To: ghc-devs 
Subject: On CI

Hi there!

Just a quick update on our CI situation. Ben, John, Davean and I have been
discussion on CI yesterday, and what we can do about it, as well as some
minor notes on why we are frustrated with it. This is an open invitation to 
anyone who in earnest wants to work on CI. Please come forward and help!
We'd be glad to have more people involved!

First the good news, over the last few weeks we've seen we *can* improve
CI performance quite substantially. And the goal is now to have MR go through
CI within at most 3hs.  There are some ideas on how to make this even faster,
especially on wide (high core count) machines; however that will take a bit more
time.

Now to the more thorny issue: Stat failures.  We do not want GHC to regress,
and I believe everyone is on board with that mission.  Yet we have just 
witnessed a train of marge trials all fail due to a -2% regression in a few 
tests. Thus we've been blocking getting stuff into master for at least another 
day. This is (in my opinion) not acceptable! We just had five days of nothing 
working because master was broken and subsequently all CI pipelines kept 
failing. We have thus effectively wasted a week. While we can mitigate the 
latter part by enforcing marge for all merges to master (and with faster 
pipeline turnaround times this might be more palatable than with 9-12h 
turnaround times -- when you need to get something done! ha!), but that won't 
help us with issues where marge can't find a set of buildable MRs, because she 
just keeps hitting a combination of MRs that somehow together increase or 
decrease metrics.

We have three knobs to adjust:
- Make GHC build faster / make the testsuite run faster.
  There is some rather interesting work going on about parallelizing (earlier)
  during builds. We've also seen that we've wasted enormous amounts of
  time during darwin builds in the kernel, because of a bug in the testdriver.
- Use faster hardware.
  We've seen that just this can cut windows build times from 220min to 80min.
- Reduce the amount of builds.
  We used to build two pipelines for each marge merge, and if either of both
  (see below) failed, marge's merge would fail as well. So not only did we build
  twice as much as we needed, we also increased our chances to hit bogous
  build failures by 2.

We need to do something about this, and I'd advocate for just not making stats 
fail with marge. Build errors of course, but stat failures, no. And then have a 
separate dashboard (and Ben has some old code lying around for this, which 
someone would need to pick up and polish, ...), that tracks GHC's Performance 
for each commit to master, with easy access from the dashboard to the offending 
commit. We will also need to consider the implications of synthetic micro 
benchmarks, as opposed to say building Cabal or other packages, that reflect 
more real-world experience of users using GHC.

I will try to provide a data driven report on GHC's CI on a bi-weekly or month 
(we will have to see what the costs for writing it up, and the 

Re: On CI

2021-03-17 Thread Moritz Angermann
*why* is a very good question. The MR fixing it is here:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275

On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud 
wrote:

> Then I have a question: why are there two pipelines running on each merge
> batch?
>
> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
> moritz.angerm...@gmail.com> wrote:
>
>> No it wasn't. It was about the stat failures described in the next
>> paragraph. I could have been more clear about that. My apologies!
>>
>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
>> wrote:
>>
>>>
>>> and if either of both (see below) failed, marge's merge would fail as
 well.

>>>
>>> Re: “see below” is this referring to a missing part of your email?
>>>
>>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Then I have a question: why are there two pipelines running on each merge
batch?

On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann 
wrote:

> No it wasn't. It was about the stat failures described in the next
> paragraph. I could have been more clear about that. My apologies!
>
> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
> wrote:
>
>>
>> and if either of both (see below) failed, marge's merge would fail as
>>> well.
>>>
>>
>> Re: “see below” is this referring to a missing part of your email?
>>
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Moritz Angermann
No it wasn't. It was about the stat failures described in the next
paragraph. I could have been more clear about that. My apologies!

On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud 
wrote:

>
> and if either of both (see below) failed, marge's merge would fail as well.
>>
>
> Re: “see below” is this referring to a missing part of your email?
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-03-17 Thread Spiwack, Arnaud
> and if either of both (see below) failed, marge's merge would fail as well.
>

Re: “see below” is this referring to a missing part of your email?
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-22 Thread John Ericson
I agree one should be able to get most of the testing value from stage1. 
And the tooling team at IOHK has done some work in 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3652 to allow a 
stage 1 compiler to be tested. That's a very important first step!


But TH and GHCi require either iserv (the external interpreter) or a 
compiler whose own ABI matches the ABI of its output, for the internal 
interpreter, and ideally we should test both. I think doing a --freeze1 
stage2 build *in addition* to the stage1 build would work in the 
majority of cases, and that would allow us to incrementally build and 
test both. Remember that iserv uses the ghc library, and needs to be 
ABI-compatible with the stage1 compiler that is using it, so it is less 
of a panacea than it might seem for ABI changes, as opposed to mere 
cross compilation.


I opened https://github.com/ghc-proposals/ghc-proposals/issues/162 for 
an ABI-agnostic interpreter that would allow stage1 alone to do GHCi and 
TH a third way, unconditionally. This would also allow TH to be used 
safely in GHC itself, but for the purposes of this discussion, it's nice 
to make testing more reliable without the --freeze1 stage 2 gamble.


Bottom line is, yes, building stage 2 from a freshly-built stage 1 will 
invalidate any cache, and so we should avoid that.


John

On 2/22/21 8:42 AM, Spiwack, Arnaud wrote:
Let me know if I'm talking nonsense, but I believe that we are 
building both stages for each architecture and flavour. Do we need to 
build two stages everywhere? What stops us from building a single 
stage? And if anything, what can we change to get into a situation 
where we can?


Quite better than reusing build incrementally, is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs 
mailto:ghc-devs@haskell.org>> wrote:


Incremental CI can cut multiple hours to < mere minutes,
especially with the test suite being embarrassingly parallel.
There simply no way optimizations to the compiler independent from
sharing a cache between CI runs can get anywhere close to that
return on investment.

I rather agree with this.  I don’t think there is much low-hanging
fruit on compile times, aside from coercion-zapping which we are
working on anyway.  If we got a 10% reduction in compile time we’d
be over the moon, but our users would barely notice.

To get truly substantial improvements (a factor of 2 or 10) I
think we need to do less compiling – hence incremental CI.


Simon

*From:*ghc-devs mailto:ghc-devs-boun...@haskell.org>> *On Behalf Of *John Ericson
*Sent:* 22 February 2021 05:53
*To:* ghc-devs mailto:ghc-devs@haskell.org>>
    *Subject:* Re: On CI

I'm not opposed to some effort going into this, but I would
strongly opposite putting all our effort there. Incremental CI can
cut multiple hours to < mere minutes, especially with the test
suite being embarrassingly parallel. There simply no way
optimizations to the compiler independent from sharing a cache
between CI runs can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC
performance know what's hurting them most. For example, after
non-incrementality, the next slowest thing is linking, which
is...not done by GHC! But all that is a separate conversation.)

John

On 2/19/21 2:42 PM, Richard Eisenberg wrote:

There are some good ideas here, but I want to throw out
another one: put all our effort into reducing compile times.
There is a loud plea to do this on Discourse

<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
and it would both solve these CI problems and also help
everyone else.

This isn't to say to stop exploring the ideas here. But since
time is mostly fixed, tackling compilation times in general
may be the best way out of this. Ben's survey of other
projects (thanks!) shows that we're way, way behind in how
long our CI takes to run.

Richard



On Feb 19, 2021, at 7:20 AM, Sebastian Graf
mailto:sgraf1...@gmail.com>> wrote:

Recompilation avoidance

I think in order to cache more in CI, we first have to
invest some time in fixing recompilation avoidance in our
bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one
line of *comment* in the compiler causes

  * a (

Re: On CI

2021-02-22 Thread Spiwack, Arnaud
Let me know if I'm talking nonsense, but I believe that we are building
both stages for each architecture and flavour. Do we need to build two
stages everywhere? What stops us from building a single stage? And if
anything, what can we change to get into a situation where we can?

Even better than reusing builds incrementally is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs <
ghc-devs@haskell.org> wrote:

> Incremental CI can cut multiple hours to < mere minutes, especially with
> the test suite being embarrassingly parallel. There simply no way
> optimizations to the compiler independent from sharing a cache between CI
> runs can get anywhere close to that return on investment.
>
> I rather agree with this.  I don’t think there is much low-hanging fruit
> on compile times, aside from coercion-zapping which we are working on
> anyway.  If we got a 10% reduction in compile time we’d be over the moon,
> but our users would barely notice.
>
>
>
> To get truly substantial improvements (a factor of 2 or 10) I think we
> need to do less compiling – hence incremental CI.
>
>
> Simon
>
>
>
> *From:* ghc-devs  *On Behalf Of *John
> Ericson
> *Sent:* 22 February 2021 05:53
> *To:* ghc-devs 
> *Subject:* Re: On CI
>
>
>
> I'm not opposed to some effort going into this, but I would strongly
> opposite putting all our effort there. Incremental CI can cut multiple
> hours to < mere minutes, especially with the test suite being
> embarrassingly parallel. There simply no way optimizations to the compiler
> independent from sharing a cache between CI runs can get anywhere close to
> that return on investment.
>
> (FWIW, I'm also skeptical that the people complaining about GHC
> performance know what's hurting them most. For example, after
> non-incrementality, the next slowest thing is linking, which is...not done
> by GHC! But all that is a separate conversation.)
>
> John
>
> On 2/19/21 2:42 PM, Richard Eisenberg wrote:
>
> There are some good ideas here, but I want to throw out another one: put
> all our effort into reducing compile times. There is a loud plea to do this
> on Discourse
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
> and it would both solve these CI problems and also help everyone else.
>
>
>
> This isn't to say to stop exploring the ideas here. But since time is
> mostly fixed, tackling compilation times in general may be the best way out
> of this. Ben's survey of other projects (thanks!) shows that we're way, way
> behind in how long our CI takes to run.
>
>
>
> Richard
>
>
>
> On Feb 19, 2021, at 7:20 AM, Sebastian Graf  wrote:
>
>
>
> Recompilation avoidance
>
>
>
> I think in order to cache more in CI, we first have to invest some time in
> fixing recompilation avoidance in our bootstrapped build system.
>
>
>
> I just tested on a hadrian perf ticky build: Adding one line of *comment*
> in the compiler causes
>
>- a (pretty slow, yet negligible) rebuild of the stage1 compiler
>- 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It
>doesn't depend in any way on the change I made)
>- apparent full rebuild the libraries
>- apparent full rebuild of the stage2 compiler
>
> That took 17 minutes, a full build takes ~45minutes. So there definitely
> is some caching going on, but not nearly as much as there could be.
>
> I know there have been great and boring efforts on compiler determinism in
> the past, but either it's not good enough or our build system needs fixing.
>
> I think a good first step to assert would be to make sure that the hash of
> the stage1 compiler executable doesn't change if I only change a comment.
>
> I'm aware there probably is stuff going on, like embedding configure dates
> in interface files and executables, that would need to go, but if possible
> this would be a huge improvement.
>
>
>
> On the other hand, we can simply tack on a [skip ci] to the commit
> message, as I did for
> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F4975=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691130329

RE: On CI

2021-02-22 Thread Simon Peyton Jones via ghc-devs
Incremental CI can cut multiple hours to < mere minutes, especially with the 
test suite being embarrassingly parallel. There simply no way optimizations to 
the compiler independent from sharing a cache between CI runs can get anywhere 
close to that return on investment.
I rather agree with this.  I don't think there is much low-hanging fruit on 
compile times, aside from coercion-zapping which we are working on anyway.  If 
we got a 10% reduction in compile time we'd be over the moon, but our users 
would barely notice.

To get truly substantial improvements (a factor of 2 or 10) I think we need to 
do less compiling - hence incremental CI.

Simon

From: ghc-devs  On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs 
Subject: Re: On CI


I'm not opposed to some effort going into this, but I would strongly opposite 
putting all our effort there. Incremental CI can cut multiple hours to < mere 
minutes, especially with the test suite being embarrassingly parallel. There 
simply no way optimizations to the compiler independent from sharing a cache 
between CI runs can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC performance 
know what's hurting them most. For example, after non-incrementality, the next 
slowest thing is linking, which is...not done by GHC! But all that is a 
separate conversation.)

John
On 2/19/21 2:42 PM, Richard Eisenberg wrote:
There are some good ideas here, but I want to throw out another one: put all 
our effort into reducing compile times. There is a loud plea to do this on 
Discourse<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdiscourse.haskell.org%2Ft%2Fcall-for-ideas-forming-a-technical-agenda%2F1901%2F24=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691120329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=1CV0MEVUZpbAbmKAWTIiqLgjft7IbN%2BCSnvB3W3iX%2FU%3D=0>,
 and it would both solve these CI problems and also help everyone else.

This isn't to say to stop exploring the ideas here. But since time is mostly 
fixed, tackling compilation times in general may be the best way out of this. 
Ben's survey of other projects (thanks!) shows that we're way, way behind in 
how long our CI takes to run.

Richard


On Feb 19, 2021, at 7:20 AM, Sebastian Graf 
mailto:sgraf1...@gmail.com>> wrote:

Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in 
fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one line of *comment* in 
the compiler causes

  *   a (pretty slow, yet negligible) rebuild of the stage1 compiler
  *   2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It 
doesn't depend in any way on the change I made)
  *   apparent full rebuild the libraries
  *   apparent full rebuild of the stage2 compiler
That took 17 minutes, a full build takes ~45minutes. So there definitely is 
some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in the 
past, but either it's not good enough or our build system needs fixing.
I think a good first step to assert would be to make sure that the hash of the 
stage1 compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates in 
interface files and executables, that would need to go, but if possible this 
would be a huge improvement.

On the other hand, we can simply tack on a [skip ci] to the commit message, as 
I did for 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F4975=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691130329%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000=bgT0LeZXjF%2BMklzctvZL6WaVpaddN7%2FSpojcEXGXv7Q%3D=0>.
 Variants like [skip tests] or [frontend] could help to identify which tests to 
run by default.

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI 
turnaround time including tests is generally 25 minutes (~15 minutes for the 
build) for a complete pipeline, testing 6 different OSes and configurations in 
parallel: 
https://github.com/leanprover/lean4/actions/workflows/ci.yml<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fleanprover%2Flean4%2Factions%2Fworkflows%2Fci.yml=04%7C01%7Csimonpj%40microsoft.com%7C9d7043627f5042598e5b08d8d6f648c4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637495701691140326%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik

Re: On CI

2021-02-21 Thread John Ericson
language with staged meta-programming.


Am Fr., 19. Feb. 2021 um 11:42 Uhr schrieb Josef Svenningsson via 
ghc-devs mailto:ghc-devs@haskell.org>>:


Doing "optimistic caching" like you suggest sounds very
promising. A way to regain more robustness would be as follows.
If the build fails while building the libraries or the stage2
compiler, this might be a false negative due to the optimistic
caching. Therefore, evict the "optimistic caches" and restart
building the libraries. That way we can validate that the build
failure was a true build failure and not just due to the
aggressive caching scheme.

Just my 2p

Josef


*From:* ghc-devs mailto:ghc-devs-boun...@haskell.org>> on behalf of Simon Peyton
Jones via ghc-devs mailto:ghc-devs@haskell.org>>
*Sent:* Friday, February 19, 2021 8:57 AM
    *To:* John Ericson mailto:john.ericson@obsidian.systems>>; ghc-devs
mailto:ghc-devs@haskell.org>>
*Subject:* RE: On CI

 1. Building and testing happen together. When tests failure
spuriously, we also have to rebuild GHC in addition to
re-running the tests. That's pure waste.
https://gitlab.haskell.org/ghc/ghc/-/issues/13897

<https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897=04%7c01%7csimo...@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc==0>
tracks this more or less.

I don’t get this.  We have to build GHC before we can test it,
don’t we?
2 .  We don't cache between jobs.
This is, I think, the big one.   We endlessly build the exact
same binaries.
There is a problem, though.  If we make **any** change in GHC,
even a trivial refactoring, its binary will change slightly.  So
now any caching build system will assume that anything built by
that GHC must be rebuilt – we can’t use the cached version.  That
includes all the libraries and the stage2 compiler.  So caching
can save all the preliminaries (building the initial Cabal, and
large chunk of stage1, since they are built with the same
bootstrap compiler) but after that we are dead.
I don’t know any robust way out of this.  That small change in
the source code of GHC might be trivial refactoring, or it might
introduce a critical mis-compilation which we really want to see
in its build products.
However, for smoke-testing MRs, on every architecture, we could
perhaps cut corners. (Leaving Marge to do full diligence.)  For
example, we could declare that if we have the result of compiling
library module X.hs with the stage1 GHC in the last full commit
in master, then we can re-use that build product rather than
compiling X.hs with the MR’s slightly modified stage1 GHC.  That
**might** be wrong; but it’s usually right.
Anyway, there are big wins to be had here.
Simon

*From:*ghc-devs mailto:ghc-devs-boun...@haskell.org>> *On Behalf Of *John Ericson
*Sent:* 19 February 2021 03:19
*To:* ghc-devs mailto:ghc-devs@haskell.org>>
*Subject:* Re: On CI

I am also wary of us to deferring checking whole platforms and
what not. I think that's just kicking the can down the road, and
will result in more variance and uncertainty. It might be alright
for those authoring PRs, but it will make Ben's job keeping the
system running even more grueling.

Before getting into these complex trade-offs, I think we should
focus on the cornerstone issue that CI isn't incremental.

 1. Building and testing happen together. When tests failure
spuriously, we also have to rebuild GHC in addition to
re-running the tests. That's pure waste.
https://gitlab.haskell.org/ghc/ghc/-/issues/13897

<https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897=04%7c01%7csimo...@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc==0>
tracks this more or less.
 2. We don't cache between jobs. Shake and Make do not enforce
dependency soundness, nor cache-correctness when the build
plan itself changes, and this had made this hard/impossible
to do safely. Naively this only helps with stage 1 and not
stage 2, but if we have separate stage 1 and --freeze1 stage
2 builds, both can be incremental. Yes, this is also lossy,
but I only see it lea

Re: On CI

2021-02-19 Thread Richard Eisenberg
> Therefore, evict the 
> "optimistic caches" and restart building the libraries. That way we can 
> validate that the build failure was a true build failure and not just due to 
> the aggressive caching scheme.
> 
> Just my 2p
> 
> Josef
> 
> From: ghc-devs  <mailto:ghc-devs-boun...@haskell.org>> on behalf of Simon Peyton Jones via 
> ghc-devs mailto:ghc-devs@haskell.org>>
> Sent: Friday, February 19, 2021 8:57 AM
> To: John Ericson ; ghc-devs 
> mailto:ghc-devs@haskell.org>>
> Subject: RE: On CI
>  
> Building and testing happen together. When tests failure spuriously, we also 
> have to rebuild GHC in addition to re-running the tests. That's pure waste. 
> https://gitlab.haskell.org/ghc/ghc/-/issues/13897 
> <https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897=04%7c01%7csimo...@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc==0>
>  tracks this more or less.
> I don’t get this.  We have to build GHC before we can test it, don’t we?
> 2 .  We don't cache between jobs.
> This is, I think, the big one.   We endlessly build the exact same binaries.
> There is a problem, though.  If we make *any* change in GHC, even a trivial 
> refactoring, its binary will change slightly.  So now any caching build 
> system will assume that anything built by that GHC must be rebuilt – we can’t 
> use the cached version.  That includes all the libraries and the stage2 
> compiler.  So caching can save all the preliminaries (building the initial 
> Cabal, and large chunk of stage1, since they are built with the same 
> bootstrap compiler) but after that we are dead.
> I don’t know any robust way out of this.  That small change in the source 
> code of GHC might be trivial refactoring, or it might introduce a critical 
> mis-compilation which we really want to see in its build products. 
> However, for smoke-testing MRs, on every architecture, we could perhaps cut 
> corners.  (Leaving Marge to do full diligence.)  For example, we could 
> declare that if we have the result of compiling library module X.hs with the 
> stage1 GHC in the last full commit in master, then we can re-use that build 
> product rather than compiling X.hs with the MR’s slightly modified stage1 
> GHC.  That *might* be wrong; but it’s usually right.
> Anyway, there are big wins to be had here.
> Simon
>  
>  
>  
> From: ghc-devs  <mailto:ghc-devs-boun...@haskell.org>> On Behalf Of John Ericson
> Sent: 19 February 2021 03:19
> To: ghc-devs mailto:ghc-devs@haskell.org>>
> Subject: Re: On CI
>  
> I am also wary of us to deferring checking whole platforms and what not. I 
> think that's just kicking the can down the road, and will result in more 
> variance and uncertainty. It might be alright for those authoring PRs, but it 
> will make Ben's job keeping the system running even more grueling.
> 
> Before getting into these complex trade-offs, I think we should focus on the 
> cornerstone issue that CI isn't incremental.
> 
> Building and testing happen together. When tests failure spuriously, we also 
> have to rebuild GHC in addition to re-running the tests. That's pure waste. 
> https://gitlab.haskell.org/ghc/ghc/-/issues/13897 
> <https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897=04%7c01%7csimo...@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc==0>
>  tracks this more or less.
> We don't cache between jobs. Shake and Make do not enforce dependency 
> soundness, nor cache-correctness when the build plan itself changes, and this 
> had made this hard/impossible to do safely. Naively this only helps with 
> stage 1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 
> 2 builds, both can be incremental. Yes, this is also lossy, but I only see it 
> leading to false failures not false acceptances (if we can also test the 
> stage 1 one), so I consider it safe. MRs that only work with a slow full 
> build because ABI can so indicate.
> The second, main part is quite hard to tackle, but I strongly believe 
> incrementality is what we need most, and what we should remain focused on.
> John
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org <mailto:ghc-devs@haskell.org>
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs 
> <http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs>
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-19 Thread Sebastian Graf
Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in
fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one line of *comment*
in the compiler causes

   - a (pretty slow, yet negligible) rebuild of the stage1 compiler
   - 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It
   doesn't depend in any way on the change I made)
   - an apparent full rebuild of the libraries
   - an apparent full rebuild of the stage2 compiler

That took 17 minutes, a full build takes ~45minutes. So there definitely is
some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in
the past, but either it's not good enough or our build system needs fixing.
I think a good first step to assert would be to make sure that the hash of
the stage1 compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates
in interface files and executables, that would need to go, but if possible
this would be a huge improvement.
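
That first step is easy to check mechanically; a tiny sketch (the two paths
are whatever builds of the stage1 binary you want to compare, nothing
hadrian-specific):

    import GHC.Fingerprint (getFileHash)
    import System.Environment (getArgs)

    -- Compare two builds of the stage1 ghc binary. If only a comment changed,
    -- the hashes should match; if they differ, everything built with stage1
    -- (libraries, stage2) will be considered out of date.
    main :: IO ()
    main = do
      [oldGhc, newGhc] <- getArgs
      old <- getFileHash oldGhc
      new <- getFileHash newGhc
      putStrLn $ if old == new
        then "stage1 binary is bit-identical: downstream caches stay valid"
        else "stage1 binary changed: expect libraries and stage2 to rebuild"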

On the other hand, we can simply tack on a [skip ci] to the commit message,
as I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975.
Variants like [skip tests] or [frontend] could help to identify which tests
to run by default.

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI
turnaround time including tests is generally 25 minutes (~15 minutes for
the build) for a complete pipeline, testing 6 different OSes and
configurations in parallel:
https://github.com/leanprover/lean4/actions/workflows/ci.yml
They utilise ccache to cache the clang-based C++-backend, so that they only
have to re-run the front- and middle-end. In effect, they take advantage of
the fact that the "function" clang, in contrast to the "function" stage1
compiler, stays the same.
It's hard to achieve that for GHC, where a complete compiler pipeline comes
as one big, fused "function": An external tool can never be certain that a
change to Parser.y could not affect the CodeGen phase.

Inspired by Lean, the following is a bit inconcrete and imaginary, but
maybe we could make it so that compiler phases "sign" parts of the
interface file with the binary hash of the respective subcomponents of the
phase?
E.g., if all the object files that influence CodeGen (that will later be
linked into the stage1 compiler) result in a hash of 0xdeadbeef before and
after the change to Parser.y, we know we can stop recompiling Data.List
with the stage1 compiler when we see that the IR passed to CodeGen didn't
change, because the last compile did CodeGen with a stage1 compiler with
the same hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the
function CodeGen stayed the same", so we can reuse its cached outputs.
Of course, that is utopian without a tool that does the "taint analysis" of
which modules in GHC influence CodeGen. Probably just including all the
transitive dependencies of GHC.CmmToAsm suffices, but probably that's too
crude already. For another example, a change to GHC.Utils.Unique would
probably entail a full rebuild of the compiler because it basically affects
all compiler phases.
There are probably parallels with recompilation avoidance in a language
with staged meta-programming.

On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via ghc-devs <
ghc-devs@haskell.org> wrote:

> Doing "optimistic caching" like you suggest sounds very promising. A way
> to regain more robustness would be as follows.
> If the build fails while building the libraries or the stage2 compiler,
> this might be a false negative due to the optimistic caching. Therefore,
> evict the "optimistic caches" and restart building the libraries. That way
> we can validate that the build failure was a true build failure and not
> just due to the aggressive caching scheme.
>
> Just my 2p
>
> Josef
>
> --
> *From:* ghc-devs  on behalf of Simon Peyton
> Jones via ghc-devs 
> *Sent:* Friday, February 19, 2021 8:57 AM
> *To:* John Ericson ; ghc-devs <
> ghc-devs@haskell.org>
> *Subject:* RE: On CI
>
>
>    1. Building and testing happen together. When tests fail
>    spuriously, we also have to rebuild GHC in addition to re-running the
>    tests. That's pure waste.
>    https://gitlab.haskell.org/ghc/ghc/-/issues/13897

Re: On CI

2021-02-19 Thread Josef Svenningsson via ghc-devs
Doing "optimistic caching" like you suggest sounds very promising. A way to 
regain more robustness would be as follows.
If the build fails while building the libraries or the stage2 compiler, this 
might be a false negative due to the optimistic caching. Therefore, evict the 
"optimistic caches" and restart building the libraries. That way we can 
validate that the build failure was a true build failure and not just due to 
the aggressive caching scheme.
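
A minimal sketch of that evict-and-retry fallback; the hadrian invocation and
the _build cache location are assumptions purely for illustration:

```haskell
-- Sketch of the evict-and-retry fallback: if the optimistically cached build
-- fails, wipe the cache and rebuild from scratch before declaring failure.
import System.Exit (ExitCode (..), exitWith)
import System.Process (system)

main :: IO ()
main = do
  -- Assumed commands/paths; the real CI would reuse stale stage1 artifacts here.
  cached <- system "hadrian/build --flavour=validate"
  case cached of
    ExitSuccess   -> putStrLn "optimistically cached build succeeded"
    ExitFailure _ -> do
      putStrLn "cached build failed; evicting caches and retrying from scratch"
      _ <- system "rm -rf _build"
      exitWith =<< system "hadrian/build --flavour=validate"
```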

Just my 2p

Josef


From: ghc-devs  on behalf of Simon Peyton Jones 
via ghc-devs 
Sent: Friday, February 19, 2021 8:57 AM
To: John Ericson ; ghc-devs 

Subject: RE: On CI


  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. 
https://gitlab.haskell.org/ghc/ghc/-/issues/13897
 tracks this more or less.

I don’t get this.  We have to build GHC before we can test it, don’t we?

2.  We don't cache between jobs.

This is, I think, the big one.   We endlessly build the exact same binaries.

There is a problem, though.  If we make *any* change in GHC, even a trivial 
refactoring, its binary will change slightly.  So now any caching build system 
will assume that anything built by that GHC must be rebuilt – we can’t use the 
cached version.  That includes all the libraries and the stage2 compiler.  So 
caching can save all the preliminaries (building the initial Cabal, and large 
chunk of stage1, since they are built with the same bootstrap compiler) but 
after that we are dead.

I don’t know any robust way out of this.  That small change in the source code 
of GHC might be trivial refactoring, or it might introduce a critical 
mis-compilation which we really want to see in its build products.

However, for smoke-testing MRs, on every architecture, we could perhaps cut 
corners.  (Leaving Marge to do full diligence.)  For example, we could declare 
that if we have the result of compiling library module X.hs with the stage1 GHC 
in the last full commit in master, then we can re-use that build product rather 
than compiling X.hs with the MR’s slightly modified stage1 GHC.  That *might* 
be wrong; but it’s usually right.

Anyway, there are big wins to be had here.

Simon







From: ghc-devs  On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs 
Subject: Re: On CI



I am also wary of us deferring checking whole platforms and what not. I 
think that's just kicking the can down the road, and will result in more 
variance and uncertainty. It might be alright for those authoring PRs, but it 
will make Ben's job keeping the system running even more grueling.

Before getting into these complex trade-offs, I think we should focus on the 
cornerstone issue that CI isn't incremental.

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. 
https://gitlab.haskell.org/ghc/ghc/-/issues/13897
 tracks this more or less.
  2.  We don't cache between jobs. Shake and Make do not enforce dependency 
soundness, nor cache-correctness when the build plan itself changes, and this 
has made it hard/impossible to do safely. Naively this only helps with stage 
1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 
builds, both can be incremental. Yes, this is also lossy, but I only see it 
leading to false failures not false acceptances (if we can also test the stage 
1 one), so I consider it safe. MRs that only work with a slow full build 
because of ABI changes can indicate as much.

The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.

John
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-02-19 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

>>   1. Building and testing happen together. When tests fail
>>   spuriously, we also have to rebuild GHC in addition to re-running
>>   the tests. That's pure waste.
>>   https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more
>>   or less.

> I don't get this.  We have to build GHC before we can test it, don't we?

>> 2.  We don't cache between jobs.

> This is, I think, the big one.   We endlessly build the exact same binaries.
> There is a problem, though. If we make *any* change in GHC, even a
> trivial refactoring, its binary will change slightly. So now any
> caching build system will assume that anything built by that GHC must
> be rebuilt - we can't use the cached version. That includes all the
> libraries and the stage2 compiler. So caching can save all the
> preliminaries (building the initial Cabal, and large chunk of stage1,
> since they are built with the same bootstrap compiler) but after that
> we are dead.
>
> I don't know any robust way out of this. That small change in the
> source code of GHC might be trivial refactoring, or it might introduce
> a critical mis-compilation which we really want to see in its build
> products.
>
> However, for smoke-testing MRs, on every architecture, we could
> perhaps cut corners. (Leaving Marge to do full diligence.) For
> example, we could declare that if we have the result of compiling
> library module X.hs with the stage1 GHC in the last full commit in
> master, then we can re-use that build product rather than compiling
> X.hs with the MR's slightly modified stage1 GHC. That *might* be
> wrong; but it's usually right.
>
The question is: what happens if it *is* wrong?

There are three answers here:

 a. Allowing the build pipeline to pass despite a build/test failure
eliminates most of the benefit of running the job to begin with as
allow-failure jobs tend to be ignored.

 b. Making the pipeline fail leaves the contributor to pick up the pieces of a
failure that they may or may not be responsible for, which sounds
frustrating indeed.

 c. Retry the build, but this time from scratch. This is a tantalizing option
but carries the risk that we end up doing *more* work than we do now
(namely, if all jobs end up running both builds).

The only tenable option here in my opinion is (c). It's ugly, but may be
viable.

Cheers,

- Ben



___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: On CI

2021-02-19 Thread Simon Peyton Jones via ghc-devs
  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. 
https://gitlab.haskell.org/ghc/ghc/-/issues/13897
 tracks this more or less.
I don't get this.  We have to build GHC before we can test it, don't we?
2.  We don't cache between jobs.
This is, I think, the big one.   We endlessly build the exact same binaries.
There is a problem, though.  If we make *any* change in GHC, even a trivial 
refactoring, its binary will change slightly.  So now any caching build system 
will assume that anything built by that GHC must be rebuilt - we can't use the 
cached version.  That includes all the libraries and the stage2 compiler.  So 
caching can save all the preliminaries (building the initial Cabal, and large 
chunk of stage1, since they are built with the same bootstrap compiler) but 
after that we are dead.
I don't know any robust way out of this.  That small change in the source code 
of GHC might be trivial refactoring, or it might introduce a critical 
mis-compilation which we really want to see in its build products.
However, for smoke-testing MRs, on every architecture, we could perhaps cut 
corners.  (Leaving Marge to do full diligence.)  For example, we could declare 
that if we have the result of compiling library module X.hs with the stage1 GHC 
in the last full commit in master, then we can re-use that build product rather 
than compiling X.hs with the MR's slightly modified stage1 GHC.  That *might* 
be wrong; but it's usually right.
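
A tiny sketch of that corner-cutting rule; the file names and cache layout are
invented for illustration, and the real decision would of course live inside
the build system:

```haskell
-- Sketch: reuse X.o from the last full master build when X.hs itself is
-- unchanged, accepting that the MR's slightly different stage1 GHC *might*
-- have produced different code.
import qualified Data.ByteString.Lazy as BL
import System.Directory (copyFile, doesFileExist)
import System.FilePath (replaceExtension)

reuseOrRecompile :: FilePath   -- ^ X.hs in the MR's checkout
                 -> FilePath   -- ^ X.hs as of the last full master build
                 -> FilePath   -- ^ cached X.o from that master build
                 -> IO ()      -- ^ how to recompile with the MR's stage1
                 -> IO ()
reuseOrRecompile src cachedSrc cachedObj recompile = do
  haveCache <- doesFileExist cachedObj
  same <- if haveCache
            then (==) <$> BL.readFile src <*> BL.readFile cachedSrc
            else pure False
  if same
    then copyFile cachedObj (replaceExtension src "o")  -- optimistic reuse
    else recompile                                      -- fall back to a real compile

main :: IO ()
main = reuseOrRecompile
         "libraries/base/Data/List.hs"
         "_cache/master/libraries/base/Data/List.hs"
         "_cache/master/libraries/base/Data/List.o"
         (putStrLn "recompiling with the MR's stage1 GHC")
```
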
Anyway, there are big wins to be had here.
Simon



From: ghc-devs  On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs 
Subject: Re: On CI


I am also wary of us deferring checking whole platforms and what not. I 
think that's just kicking the can down the road, and will result in more 
variance and uncertainty. It might be alright for those authoring PRs, but it 
will make Ben's job keeping the system running even more grueling.

Before getting into these complex trade-offs, I think we should focus on the 
cornerstone issue that CI isn't incremental.

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. 
https://gitlab.haskell.org/ghc/ghc/-/issues/13897
 tracks this more or less.
  2.  We don't cache between jobs. Shake and Make do not enforce dependency 
soundness, nor cache-correctness when the build plan itself changes, and this 
has made it hard/impossible to do safely. Naively this only helps with stage 
1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 
builds, both can be incremental. Yes, this is also lossy, but I only see it 
leading to false failures not false acceptances (if we can also test the stage 
1 one), so I consider it safe. MRs that only work with a slow full build 
because of ABI changes can indicate as much.
The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.

John
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread John Ericson
I am also wary of us deferring checking whole platforms and what not. 
I think that's just kicking the can down the road, and will result in 
more variance and uncertainty. It might be alright for those authoring 
PRs, but it will make Ben's job keeping the system running even more 
grueling.


Before getting into these complex trade-offs, I think we should focus on 
the cornerstone issue that CI isn't incremental.


1. Building and testing happen together. When tests fail spuriously,
   we also have to rebuild GHC in addition to re-running the tests.
   That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897
   tracks this more or less.
2. We don't cache between jobs. Shake and Make do not enforce
   dependency soundness, nor cache-correctness when the build plan
   itself changes, and this has made it hard/impossible to do safely.
   Naively this only helps with stage 1 and not stage 2, but if we have
   separate stage 1 and --freeze1 stage 2 builds, both can be
   incremental. Yes, this is also lossy, but I only see it leading to
   false failures not false acceptances (if we can also test the stage
   1 one), so I consider it safe. MRs that only work with a slow full
   build because of ABI changes can indicate as much.

The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.


John

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread Ben Gamari
Moritz Angermann  writes:

> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able to
> build x86_64-darwin at acceptable speeds. Although on Big Sur only.
>
> The aarch64-Linux story is a bit constrained by the availability of powerful
> and fast CI machines, but probably bearable for the time being. I doubt
> anyone really looks at those jobs anyway as they are permitted to fail.

For the record, I look at this once in a while to make sure that they
haven't broken (and usually pick off one or two failures in the
process).

> If aarch64 were to become a bottleneck, I’d be inclined to just disable
> them. With the NCG coming soon this will likely become much more bearable as
> well, even though we might want to run the nightly llvm builds.
>
> To be frank, I don’t see 9.2 happening in two weeks with the current CI.
>
I'm not sure what you mean. Is this in reference to your own 9.2-slated
work or the release as a whole?

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: On CI

2021-02-18 Thread Ben Gamari
Apologies for the latency here. This thread has required a fair amount of
reflection.

Sebastian Graf  writes:

> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
>- Run "full-build" stage builds only on Marge MRs. Then we can assign to
>Marge much earlier, but probably have to do a bit more of (manual)
>bisecting of spoiled Marge batches.
>   - I hope this gets rid of a bit of the friction of small MRs. I
>   recently caught myself wanting to do a bunch of small, independent, but
>   related changes as part of the same MR, simply because it's such a 
> hassle
>   to post them in individual MRs right now and also because it
>   steals so much CI capacity.
>
>- Regular MRs should still have the ability to easily run individual
>builds of what is now the "full-build" stage, similar to how we can run
>optional "hackage" builds today. This is probably useful to pin down the
>reason for a spoiled Marge batch.


I am torn here. For most of my non-trivial patches I personally don't
mind long turnarounds: I walk away and return a day later to see whether
anything failed. Spurious failures due to fragile tests make this a bit
tiresome, but this is a problem that we are gradually solving (by fixing
bugs and marking tests as fragile).

However, I agree that small MRs are currently rather painful. On the
other hand, diagnosing failed Marge batches is *also* rather tiresome. I
am worried that by deferring full validation of MRs we will only
exacerbate this problem. Furthermore, I worry that by deferring full
validation we run the risk of rather *increasing* the MR turnaround
time, since there are entire classes of issues that wouldn't be caught
until the MR made it to Marge.

Ultimately it's unclear to me whether this proposal would help or hurt.
Nevertheless, I am willing to try it. However, if we go this route we
should consider what can be done to reduce the incidence of failed Marge
batches.

One problem that I'm particularly worried about is that of tests with
OS-dependent expected output (e.g. `$test_name.stdout-mingw32`). I find
that people (understandably) forget to update these when updating test
output. I suspect that this will be a frequent source of failed Marge
batches if we defer full validation. I can see a few ways that would
mitigate this:

 * eliminate platform-dependent output files
 * introduce a linter that fails if a patch touches a test with
   platform-dependent output without touching all of its output files
   (a rough sketch follows after this list)
 * always run the full-build stage on MRs that touch tests with
   platform-dependent output files
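
A rough sketch of such a linter; the hard-coded input and the file-naming
assumptions are for illustration only, and the real thing would read the
changed files from git:

```haskell
-- Rough lint sketch: if a patch touches T1234.stdout but leaves an existing
-- T1234.stdout-mingw32 (or similar platform variant) untouched, fail.
import Control.Monad (unless)
import Data.List (isInfixOf, isPrefixOf)
import System.Directory (listDirectory)
import System.Exit (exitFailure)
import System.FilePath (takeDirectory, takeFileName, (</>))

-- | Platform-specific siblings of a changed expected-output file that the
-- patch does not also touch.
missingSiblings :: [FilePath] -> FilePath -> IO [FilePath]
missingSiblings changed out = do
  entries <- listDirectory (takeDirectory out)
  let base = takeFileName out
      isVariant f = (base ++ "-") `isPrefixOf` f        -- e.g. "T1234.stdout-mingw32"
      variants = [takeDirectory out </> f | f <- entries, isVariant f]
  pure [v | v <- variants, v `notElem` changed]

main :: IO ()
main = do
  -- Assumed input: files touched by the MR, e.g. from
  -- `git diff --name-only origin/master...HEAD`.
  let changed = ["testsuite/tests/typecheck/should_run/T1234.stdout"]
      outputs = [f | f <- changed, ".stdout" `isInfixOf` f || ".stderr" `isInfixOf` f]
  missing <- concat <$> mapM (missingSiblings changed) outputs
  unless (null missing) $ do
    mapM_ (putStrLn . ("platform-specific output not updated: " ++)) missing
    exitFailure
```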

Regardless of whether we implement Sebastian's proposal, one smaller
measure we could implement to help the problem of small MRs is to
introduce some sort of mechanism to mark MRs as "trivial" (e.g. a label
or a commit/MR description keyword), which results in the `full-build`
being skipped for that MR. Perhaps this would be helpful?


> Another frustrating aspect is that if you want to merge an n-sized chain of
> dependent changes individually, you have to
>
>- Open an MR for each change (initially the last change will be
>comprised of n commits)
>- Review first change, turn pipeline green   (A)
>- Assign to Marge, wait for batch to be merged   (B)
>- Review second change, turn pipeline green
>- Assign to Marge, wait for batch to be merged
>- ... and so on ...
>
> Note that this (A) incurs many context switches for the dev and the latency of
> *at least* one run of CI.
> And then (B) incurs the latency of *at least* one full-build, if you're
> lucky and the batch succeeds. I've recently seen batches that were
> resubmitted by Marge at least 5 times due to spurious CI failures and
> timeouts. I think this is a huge factor for latency.
>
> Although after (A), I should just pop the patch off my mental stack,
> that isn't particularly true, because Marge keeps on reminding me when a
> stack fails or succeeds, both of which require at least some attention from
> me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
> change.
>
> Maybe we can also learn from other projects like Rust, GCC or clang, which
> I haven't had a look at yet.
>
I did a bit of digging on this.

 * Rust: It appears that Rust's CI scheme is somewhat similar to what
   you proposed above. They do relatively minimal validation of MRs
   (e.g. https://github.com/rust-lang/rust/runs/1905017693),
   with a full-validation for merges
   (e.g. https://github.com/rust-lang-ci/rust/runs/1925049948). The latter
   usually takes between 3 and 4 hours, with some jobs taking 5 hours.

 * GCC: As far as I can tell, gcc doesn't actually have any (functional)
   continuous integration. Discussions with contributors suggest that
   some companies that employ contributors might have their own private
   infrastructure, but I don't believe there is anything public.

 * LLVM: I can't work out 

Re: On CI

2021-02-18 Thread Moritz Angermann
I'm glad to report that my math was off. But it was off only because I
assumed that we'd successfully build all
windows configurations, which we of course don't. Thus some builds fail
faster.

Sylvain also provided a windows machine temporarily, until it expired.
This led to a slew of new windows wibbles.
The CI script Ben wrote, and generously used to help set up the new
builder, seems to assume an older Git install,
and thus a path was broken which, thanks to gitlab, led to the brilliant
error of just stalling.
Next up, because we use msys2's pacman to provision the windows builders,
and pacman essentially gives us
symbols for packages to install, we ended up getting a newer autoconf onto
the new builder (and I assume this
will happen with any other builders we add as well). This new autoconf
(which I've also run into on the M1s) doesn't
like our configure.ac/aclocal.m4 anymore and barfs; I wasn't able to figure
out how to force pacman to install an
older version and *not* give it some odd version suffix (which prevents it
from working as a drop-in replacement).

In any case we *must* update our autoconf files. So I guess the time is now.


On Wed, Feb 17, 2021 at 6:58 PM Moritz Angermann 
wrote:

> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able to
> build x86_64-darwin at acceptable speeds. Although on Big Sur only.
>
> The aarch64-Linux story is a bit constrained by the availability of powerful
> and fast CI machines, but probably bearable for the time being. I doubt
> anyone really looks at those jobs anyway as they are permitted to fail. If
> aarch64 were to become a bottleneck, I’d be inclined to just disable them.
> With the NCG coming soon this will likely become much more bearable as well,
> even though we might want to run the nightly llvm builds.
>
> To be frank, I don’t see 9.2 happening in two weeks with the current CI.
>
> If we subtract aarch64-linux and windows builds we could probably do a
> full run in less than three hours maybe even less. And that is mostly
> because we have a serialized pipeline. I have discussed some ideas with Ben
> on prioritizing the first few stages by the faster ci machines to
> effectively fail fast and provide feedback.
>
> But yes. Working on ghc right now is quite painful due to long and
> unpredictable CI times.
>
> Cheers,
>  Moritz
>
> On Wed, 17 Feb 2021 at 6:31 PM, Sebastian Graf 
> wrote:
>
>> Hi Moritz,
>>
>> I, too, had my gripes with CI turnaround times in the past. Here's a
>> somewhat radical proposal:
>>
>>- Run "full-build" stage builds only on Marge MRs. Then we can assign
>>to Marge much earlier, but probably have to do a bit more of (manual)
>>bisecting of spoiled Marge batches.
>>   - I hope this gets rid of a bit of the friction of small MRs. I
>>   recently caught myself wanting to do a bunch of small, independent, but
>>   related changes as part of the same MR, simply because it's such a 
>> hassle
>>   to post them in individual MRs right now and also because it steals so 
>> much
>>   CI capacity.
>>- Regular MRs should still have the ability to easily run individual
>>builds of what is now the "full-build" stage, similar to how we can run
>>optional "hackage" builds today. This is probably useful to pin down the
>>reason for a spoiled Marge batch.
>>- The CI capacity we free up can probably be used to run a perf build
>>(such as the fedora release build) on the "build" stage (the one where we
>>currently run stack-hadrian-build and the validate-deb9-hadrian build), in
>>parallel.
>>- If we decide against the latter, a micro-optimisation could be to
>>cache the build artifacts of the "lint-base" build and continue the build
>>in the validate-deb9-hadrian build of the "build" stage.
>>
>> The usefulness of this approach depends on how many MRs cause metric
>> changes on different architectures.
>>
>> Another frustrating aspect is that if you want to merge an n-sized chain
>> of dependent changes individually, you have to
>>
>>- Open an MR for each change (initially the last change will be
>>comprised of n commits)
>>- Review first change, turn pipeline green   (A)
>>- Assign to Marge, wait for batch to be merged   (B)
>>- Review second change, turn pipeline green
>>- Assign to Marge, wait for batch to be merged
>>- ... and so on ...
>>
>> Note that (A) incurs many context switches for the dev and the latency of
>> *at least* one run of CI.
>> And then (B) incurs the latency of *at least* one full-build, if you're
>> lucky and the batch succeeds. I've recently seen batches that were
>> resubmitted by Marge at least 5 times due to spurious CI failures and
>> timeouts. I think this is a huge factor for latency.
>>
>> Although after (A), I should just pop the patch off my mental stack,
>> that isn't particularly true, because Marge keeps on reminding me when a
>> 

Re: On CI

2021-02-17 Thread Moritz Angermann
At this point I believe we have ample Linux build capacity. Darwin looks
pretty good as well; the ~4 M1s we have should in principle also be able to
build x86_64-darwin at acceptable speeds. Although on Big Sur only.

The aarch64-Linux story is a bit constrained by the availability of powerful
and fast CI machines, but probably bearable for the time being. I doubt anyone
really looks at those jobs anyway as they are permitted to fail. If aarch64
were to become a bottleneck, I’d be inclined to just disable them. With the NCG
coming soon this will likely become much more bearable as well, even though we
might want to run the nightly llvm builds.

To be frank, I don’t see 9.2 happening in two weeks with the current CI.

If we subtract aarch64-linux and windows builds we could probably do a full
run in less than three hours maybe even less. And that is mostly because we
have a serialized pipeline. I have discussed some ideas with Ben on
prioritizing the first few stages by the faster ci machines to effectively
fail fast and provide feedback.

But yes. Working on ghc right now is quite painful due to long and
unpredictable CI times.

Cheers,
 Moritz

On Wed, 17 Feb 2021 at 6:31 PM, Sebastian Graf  wrote:

> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
>- Run "full-build" stage builds only on Marge MRs. Then we can assign
>to Marge much earlier, but probably have to do a bit more of (manual)
>bisecting of spoiled Marge batches.
>   - I hope this gets rid of a bit of the friction of small MRs. I
>   recently caught myself wanting to do a bunch of small, independent, but
>   related changes as part of the same MR, simply because it's such a 
> hassle
>   to post them in individual MRs right now and also because it steals so 
> much
>   CI capacity.
>- Regular MRs should still have the ability to easily run individual
>builds of what is now the "full-build" stage, similar to how we can run
>optional "hackage" builds today. This is probably useful to pin down the
>reason for a spoiled Marge batch.
>- The CI capacity we free up can probably be used to run a perf build
>(such as the fedora release build) on the "build" stage (the one where we
>currently run stack-hadrian-build and the validate-deb9-hadrian build), in
>parallel.
>- If we decide against the latter, a micro-optimisation could be to
>cache the build artifacts of the "lint-base" build and continue the build
>in the validate-deb9-hadrian build of the "build" stage.
>
> The usefulness of this approach depends on how many MRs cause metric
> changes on different architectures.
>
> Another frustrating aspect is that if you want to merge an n-sized chain
> of dependent changes individually, you have to
>
>- Open an MR for each change (initially the last change will be
>comprised of n commits)
>- Review first change, turn pipeline green   (A)
>- Assign to Marge, wait for batch to be merged   (B)
>- Review second change, turn pipeline green
>- Assign to Marge, wait for batch to be merged
>- ... and so on ...
>
> Note that (A) incurs many context switches for the dev and the latency of
> *at least* one run of CI.
> And then (B) incurs the latency of *at least* one full-build, if you're
> lucky and the batch succeeds. I've recently seen batches that were
> resubmitted by Marge at least 5 times due to spurious CI failures and
> timeouts. I think this is a huge factor for latency.
>
> Although after (A), I should just pop the patch off my mental stack,
> that isn't particularly true, because Marge keeps on reminding me when a
> stack fails or succeeds, both of which require at least some attention from
> me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
> change.
>
> Maybe we can also learn from other projects like Rust, GCC or clang, which
> I haven't had a look at yet.
>
> Cheers,
> Sebastian
>
> On Wed, 17 Feb 2021 at 09:11, Moritz Angermann <
> moritz.angerm...@gmail.com> wrote:
>
>> Friends,
>>
>> I've been looking at CI recently again, as I was facing CI turnaround
>> times of 9-12hs; and this just keeps dragging out and making progress hard.
>>
>> The pending pipeline currently has 2 darwin, and 15 windows builds
>> waiting. Windows builds on average take ~220 minutes. We have five builders,
>> so we can expect this queue to be done in ~660 minutes assuming perfect
>> scheduling and good performance. That is 11hs! The next windows build can
>> be started in 11hs. Please check my math and tell me I'm wrong!
>>
>> If you submit a MR today, with some luck, you'll be able to know if it
>> will be mergeable some time tomorrow. At which point you can assign it to
>> marge, and marge, if you are lucky and the set of patches she tries to
>> merge together is mergeable, will merge your work into master probably some
>> time on Friday. If a job fails, well you have to start over again.
>>
>> What 

Re: On CI

2021-02-17 Thread Sebastian Graf
Hi Moritz,

I, too, had my gripes with CI turnaround times in the past. Here's a
somewhat radical proposal:

   - Run "full-build" stage builds only on Marge MRs. Then we can assign to
   Marge much earlier, but probably have to do a bit more of (manual)
   bisecting of spoiled Marge batches.
  - I hope this gets rid of a bit of the friction of small MRs. I
  recently caught myself wanting to do a bunch of small, independent, but
  related changes as part of the same MR, simply because it's such a hassle
  to post them in individual MRs right now and also because it
steals so much
  CI capacity.
   - Regular MRs should still have the ability to easily run individual
   builds of what is now the "full-build" stage, similar to how we can run
   optional "hackage" builds today. This is probably useful to pin down the
   reason for a spoiled Marge batch.
   - The CI capacity we free up can probably be used to run a perf build
   (such as the fedora release build) on the "build" stage (the one where we
   currently run stack-hadrian-build and the validate-deb9-hadrian build), in
   parallel.
   - If we decide against the latter, a micro-optimisation could be to
   cache the build artifacts of the "lint-base" build and continue the build
   in the validate-deb9-hadrian build of the "build" stage.

The usefulness of this approach depends on how many MRs cause metric
changes on different architectures.

Another frustrating aspect is that if you want to merge an n-sized chain of
dependent changes individually, you have to

   - Open an MR for each change (initially the last change will be
   comprised of n commits)
   - Review first change, turn pipeline green   (A)
   - Assign to Marge, wait for batch to be merged   (B)
   - Review second change, turn pipeline green
   - Assign to Marge, wait for batch to be merged
   - ... and so on ...

Note that (A) incurs many context switches for the dev and the latency of
*at least* one run of CI.
And then (B) incurs the latency of *at least* one full-build, if you're
lucky and the batch succeeds. I've recently seen batches that were
resubmitted by Marge at least 5 times due to spurious CI failures and
timeouts. I think this is a huge factor for latency.

Although after (A), I should just pop the patch off my mental stack,
that isn't particularly true, because Marge keeps on reminding me when a
stack fails or succeeds, both of which require at least some attention from
me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
change.

Maybe we can also learn from other projects like Rust, GCC or clang, which
I haven't had a look at yet.

Cheers,
Sebastian

On Wed, 17 Feb 2021 at 09:11, Moritz Angermann <
moritz.angerm...@gmail.com> wrote:

> Friends,
>
> I've been looking at CI recently again, as I was facing CI turnaround
> times of 9-12hs; and this just keeps dragging out and making progress hard.
>
> The pending pipeline currently has 2 darwin, and 15 windows builds
> waiting. Windows builds on average take ~220 minutes. We have five builders,
> so we can expect this queue to be done in ~660 minutes assuming perfect
> scheduling and good performance. That is 11hs! The next windows build can
> be started in 11hs. Please check my math and tell me I'm wrong!
>
> If you submit a MR today, with some luck, you'll be able to know if it
> will be mergeable some time tomorrow. At which point you can assign it to
> marge, and marge, if you are lucky and the set of patches she tries to
> merge together is mergeable, will merge your work into master probably some
> time on Friday. If a job fails, well you have to start over again.
>
> What are our options here? Ben has been pretty clear about not wanting a
> broken commit for windows to end up in the tree, and I'm there with him.
>
> Cheers,
>  Moritz
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Marge: "CI is taking too long"

2019-01-25 Thread Ben Gamari
Richard Eisenberg  writes:

> Marge has complained that
> https://gitlab.haskell.org/rae/ghc/-/jobs/17206 is taking too long.
> And indeed it seems stuck.
>
Indeed currently CI is a bit backed up. There are a few reasons for
this:

 * I am currently in the middle of a (now two-day-long) internet outage
   meaning a non-trivial fraction (roughly half) of our builder capacity
   is off-line. I have been reassured that service will be restored by 7
   PM today.

 * there has been a significant number of patches recently, especially
   since I have recently migrated a number of patches from Phabricator.

Consequently it doesn't surprise me that CI is taking a while.

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: GitLab CI for patches across submodules

2019-01-06 Thread Simon Jakobi via ghc-devs
On Sat, 5 Jan 2019 at 22:18, Ben Gamari wrote:

> However, we can certainly use the upstream repo during CI builds.
>
> I have opened !78 which should hopefully fix this. Perhaps you could
> rebase on top of this and check?

Thanks, Ben, that works for me.

What I hadn't realized before is that having my haddock commit in my
Gitlab fork (sjakobi/haddock) apparently also makes it accessible through
ghc/haddock.
What is my-branch in sjakobi/haddock is sjakobi/my-branch in ghc/haddock.

Cheers,
Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: GitLab CI for patches across submodules

2019-01-05 Thread Ben Gamari
Simon Jakobi via ghc-devs  writes:

> Hi,
>
> I just tried to use GitLab CI to validate a GHC patch including changes to
> Haddock: https://gitlab.haskell.org/sjakobi/ghc/pipelines/842
>
> The problem is that the CI script tries to find my Haddock commit at
> https://gitlab.haskell.org/ghc/haddock. But that repo doesn't even allow
> merge request.
>
> Should the submodule origin for util/haddock maybe point at
> https://github.com/haskell/haddock instead?
>
In general we want to ensure that only *.haskell.org hosts are relied on
during normal builds (since some users build artifacts under very
restrictive sandbox conditions). However, we can certainly use the
upstream repo during CI builds.

I have opened !78 which should hopefully fix this. Perhaps you could
rebase on top of this and check?

Cheers,

- Ben


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs