Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Cheng Shao
When hadrian builds the binary-dist job, invoking tar and xz is
already the last step and there'll be no other ongoing jobs. But I do
agree with reverting; this minor optimization I proposed has caused
more trouble than it's worth :/

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Bryan Richter via ghc-devs
Matthew pointed out that the build system already parallelizes jobs, so
it's risky to force parallelization of any individual job. That means I
should just revert.



Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
I believe we can either modify ci.sh to disable parallel compression
for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
XZ_OPT=-9 for i386.
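
As a rough illustration of the two options above, here is a minimal Python sketch of choosing xz options per architecture. The function name `xz_opt_for`, the `nightly` flag, the `mitigation` strings, and the architecture names are all hypothetical and are not taken from ci.sh or gen_ci.hs. The sketch also assumes that parallel compression is requested through xz's `-T0` flag; only `-9` (preset) and `-T` (thread count) are real xz options.

```python
def xz_opt_for(arch: str, nightly: bool, mitigation: str) -> str:
    """Build a hypothetical XZ_OPT string for a CI job.

    mitigation is either
      "no-threads":    keep -9 but compress single-threaded on i386, or
      "no-max-preset": keep threaded compression but drop -9 on i386.
    """
    if mitigation not in ("no-threads", "no-max-preset"):
        raise ValueError(f"unknown mitigation: {mitigation}")

    is_i386 = arch == "i386"
    flags = []

    # Nightly jobs ask for the strongest preset, unless that is the
    # setting being disabled for i386.
    if nightly and not (is_i386 and mitigation == "no-max-preset"):
        flags.append("-9")

    # -T0 lets xz pick one thread per core; -T1 forces a single thread.
    if is_i386 and mitigation == "no-threads":
        flags.append("-T1")
    else:
        flags.append("-T0")

    return " ".join(flags)


if __name__ == "__main__":
    for mitigation in ("no-threads", "no-max-preset"):
        print(mitigation, "->", repr(xz_opt_for("i386", True, mitigation)))
```

Either branch changes only the i386 jobs; other architectures keep whatever the nightly configuration already asks for.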



Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Aha: while i386-linux-deb9-validate sets no extra XZ options,
*nightly*-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".

A revert would fix the problem, but presumably so would tweaking that
option. Does anyone have information that would lead to a better decision
here?
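
To make the trade-off concrete, here is a back-of-the-envelope Python sketch of why the preset matters once compression is threaded. The per-preset figures are the compressor memory requirements listed in the xz(1) man page. The 8-thread count and the 3 GiB address-space limit for a 32-bit process are assumptions for illustration, not measurements from the CI runners. Threaded mode also needs extra buffers per thread, so the estimate is probably a lower bound.

```python
# Approximate compressor memory per thread (MiB) for each xz preset,
# as listed in the xz(1) man page.
COMPRESS_MEM_MIB = {
    0: 3, 1: 9, 2: 17, 3: 32, 4: 48,
    5: 94, 6: 94, 7: 186, 8: 370, 9: 674,
}

# Assumed usable address space of a 32-bit process (hypothetical figure).
ASSUMED_LIMIT_MIB = 3 * 1024


def estimated_xz_memory_mib(preset: int, threads: int) -> int:
    """Rough lower bound: per-thread encoder memory times thread count."""
    return COMPRESS_MEM_MIB[preset] * threads


def fits_in_address_space(preset: int, threads: int,
                          limit_mib: int = ASSUMED_LIMIT_MIB) -> bool:
    """True if the rough estimate stays under the assumed limit."""
    return estimated_xz_memory_mib(preset, threads) <= limit_mib


if __name__ == "__main__":
    threads = 8  # assumed core count, for illustration only
    for preset in (6, 9):
        need = estimated_xz_memory_mib(preset, threads)
        ok = fits_in_address_space(preset, threads)
        print(f"preset -{preset}, {threads} threads: ~{need} MiB, fits: {ok}")
```

Under these assumptions the default preset (-6) needs roughly 752 MiB with 8 threads, while -9 needs roughly 5392 MiB, which would be consistent with only the nightly job failing.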




Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Sure, in which case please revert it. Apologies for the impact. I'm
still a bit curious, though: the i386 job did pass in the original MR.



Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Yep, it seems to mostly be xz that is running out of memory. (That holds
for all the recent builds I sampled, though not for every build through
all time.) Thanks for pointing it out!

I can revert the change.



Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Hi Bryan,

This may be unintended fallout from !8940. Would you try starting an
i386 pipeline with it reverted to see if that solves the issue? If it
does, we should revert or fix it in master.



Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Hi all,

For the past week or so, nightly-i386-linux-deb9-validate has been failing
consistently.

The failures show up on the failure dashboard because the logs contain the
phrase "Cannot allocate memory".

I haven't looked yet to see if they always fail in the same place, but I'll
do that soon. The first example I looked at, however, has the line "xz:
(stdin): Cannot allocate memory", so it's not GHC (alone) causing the
problem.
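
The attribution step described above (reading the program name off the failing log line) can be sketched in Python as follows. This is only an illustration: the function names and the regular expression are hypothetical and are not how the dashboard is implemented. The sketch assumes the failing tool prefixes its message with its own name, as in the xz line quoted above.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches lines such as "xz: (stdin): Cannot allocate memory" and
# captures the leading program name.
OOM_PATTERN = re.compile(
    r"^(?P<program>[\w.+-]+): .*Cannot allocate memory"
)


def oom_culprit(log_line: str) -> Optional[str]:
    """Return the program named on an out-of-memory line, if any."""
    match = OOM_PATTERN.match(log_line.strip())
    return match.group("program") if match else None


def count_oom_culprits(log_lines: Iterable[str]) -> Counter:
    """Tally which programs reported 'Cannot allocate memory'."""
    culprits = (oom_culprit(line) for line in log_lines)
    return Counter(name for name in culprits if name is not None)


if __name__ == "__main__":
    sample = [
        "Creating binary distribution...",
        "xz: (stdin): Cannot allocate memory",
        "tar: Child returned status 1",
    ]
    print(count_oom_culprits(sample))
```

Running a tally like this over several job logs would show whether the failures always come from the same program.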

As a consequence of showing up on the dashboard, the jobs get restarted.
Since they fail consistently, they keep getting restarted. Since the jobs
keep getting restarted, the pipelines stay alive. When I checked just now,
there were 8 nightly runs still running. :) Thus I'm going to cancel the
still-running nightly-i386-linux-deb9-validate jobs and let the pipelines
die in peace. You can still find all examples of failed jobs on the
dashboard:

https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate

To prevent future problems, it would be good if someone could help me look
into this. Otherwise I'll just disable the job. :(