Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
When hadrian builds the binary-dist job, invoking tar and xz is already the last step and there'll be no other ongoing jobs. But I do agree with reverting; this minor optimization I proposed has caused more trouble than it's worth :/

On Thu, Sep 29, 2022 at 9:25 AM Bryan Richter wrote:
> Matthew pointed out that the build system already parallelizes jobs, so it's
> risky to force parallelization of any individual job. That means I should
> just revert.

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
Matthew pointed out that the build system already parallelizes jobs, so it's risky to force parallelization of any individual job. That means I should just revert.

On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao wrote:
> I believe we can either modify ci.sh to disable parallel compression
> for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
> XZ_OPT=-9 for i386.
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
I believe we can either modify ci.sh to disable parallel compression for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable XZ_OPT=-9 for i386.

On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter wrote:
> Aha: while i386-linux-deb9-validate sets no extra XZ options,
> nightly-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".
>
> A revert would fix the problem, but presumably so would tweaking that option.
> Does anyone have information that would lead to a better decision here?
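The two alternatives proposed above can be compared side by side. The following is only a sketch of the policy, not what ci.sh or gen_ci.hs actually do: the function name, the strategy labels, and the assumption that nightly jobs pass `-9` while the compression step passes `-T<cores>` are all hypothetical. `XZ_OPT`, `-9`, and `-T` themselves are real xz settings.

```python
def xz_opt(arch: str, nightly: bool, cores: int, strategy: str) -> str:
    """Build an XZ_OPT value under one of the two proposed workarounds.

    strategy "serial":         keep -9, but compress single-threaded on i386
    strategy "default-preset": keep threading, but drop -9 on i386
    (Both labels are made up for this sketch.)
    """
    is_i386 = arch == "i386"
    level = ["-9"] if nightly else []   # ASSUMPTION: only nightly jobs use -9
    threads = [f"-T{cores}"]            # ASSUMPTION: threaded by default

    if is_i386 and strategy == "serial":
        threads = ["-T1"]
    elif is_i386 and strategy == "default-preset":
        level = []

    return " ".join(level + threads)


if __name__ == "__main__":
    for strategy in ("serial", "default-preset"):
        print(strategy, "->", xz_opt("i386", True, 8, strategy))
```

Either way, other architectures keep their existing options; only the i386 value changes.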
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
Aha: while i386-linux-deb9-validate sets no extra XZ options, *nightly*-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = 9".

A revert would fix the problem, but presumably so would tweaking that option. Does anyone have information that would lead to a better decision here?

On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao wrote:
> Sure, in which case pls revert it. Apologies for the impact, though
> I'm still a bit curious, the i386 job did pass in the original MR.
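A rough back-of-the-envelope check may explain why the validate job passed in the original MR while the nightly job fails. The per-preset figures below are the approximate single-threaded compressor memory requirements listed in the xz(1) man page. The 3 GiB address-space limit and the thread counts are assumptions for illustration (the runner's core count is not stated in this thread), and real multi-threaded usage is higher than this simple product, so treat the result as a lower bound.

```python
# Approximate compressor memory per preset in MiB, single-threaded,
# as listed in the xz(1) man page.
XZ_COMPRESS_MEM_MIB = {
    0: 3, 1: 9, 2: 17, 3: 32, 4: 48,
    5: 94, 6: 94, 7: 186, 8: 370, 9: 674,
}

# ASSUMPTION: a 32-bit process gets roughly 3 GiB of usable address space.
I386_ADDRESS_SPACE_MIB = 3 * 1024


def estimated_xz_mem_mib(preset: int, threads: int) -> int:
    """Lower-bound estimate: per-thread encoder memory times thread count."""
    return XZ_COMPRESS_MEM_MIB[preset] * threads


def fits_in_i386(preset: int, threads: int) -> bool:
    """True if the estimate stays below the assumed 32-bit address space."""
    return estimated_xz_mem_mib(preset, threads) < I386_ADDRESS_SPACE_MIB


if __name__ == "__main__":
    # Preset 6 is the xz default (no extra options); 9 is the nightly setting.
    for preset in (6, 9):
        for threads in (1, 4, 8):  # hypothetical thread counts
            mib = estimated_xz_mem_mib(preset, threads)
            verdict = "fits" if fits_in_i386(preset, threads) else "too big"
            print(f"-{preset} -T{threads}: ~{mib} MiB ({verdict})")
```

Under these assumptions, the default preset stays well under the limit even with 8 threads, whereas `-9` with 8 threads would need more than 5 GiB, which a 32-bit process cannot allocate.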
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
Sure, in which case pls revert it. Apologies for the impact, though I'm still a bit curious, the i386 job did pass in the original MR.

On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter wrote:
> Yep, it seems to mostly be xz that is running out of memory. (All recent
> builds that I sampled, but not all builds through all time.) Thanks for
> pointing it out!
>
> I can revert the change.
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
Yep, it seems to mostly be xz that is running out of memory. (All recent builds that I sampled, but not all builds through all time.) Thanks for pointing it out!

I can revert the change.

On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao wrote:
> Hi Bryan,
>
> This may be an unintended fallout of !8940. Would you try starting an
> i386 pipeline with it reversed to see if it solves the issue, in which
> case we should revert or fix it in master?
Re: Consistent CI failure in job nightly-i386-linux-deb9-validate
Hi Bryan,

This may be an unintended fallout of !8940. Would you try starting an i386 pipeline with it reversed to see if it solves the issue, in which case we should revert or fix it in master?

On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs wrote:
> Hi all,
>
> For the past week or so, nightly-i386-linux-deb9-validate has been failing
> consistently.
>
> [...]
Consistent CI failure in job nightly-i386-linux-deb9-validate
Hi all,

For the past week or so, nightly-i386-linux-deb9-validate has been failing consistently.

The failures show up on the failure dashboard because the logs contain the phrase "Cannot allocate memory".

I haven't looked yet to see if they always fail in the same place, but I'll do that soon. The first example I looked at, however, has the line "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the problem.

As a consequence of showing up on the dashboard, the jobs get restarted. Since they fail consistently, they keep getting restarted. Since the jobs keep getting restarted, the pipelines stay alive. When I checked just now, there were 8 nightly runs still running. :) Thus I'm going to cancel the still-running nightly-i386-linux-deb9-validate jobs and let the pipelines die in peace. You can still find all examples of failed jobs on the dashboard:

https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2=now-90d=now=5m=cannot_allocate

To prevent future problems, it would be good if someone could help me look into this. Otherwise I'll just disable the job. :(
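Checking whether the jobs always fail in the same place could be done by tallying which program reports the allocation failure in each job log. The sketch below is one possible way to do that; the function name and the sample log are hypothetical, and only the "xz: (stdin): Cannot allocate memory" line is taken from the message above.

```python
import re
from collections import Counter

# Matches lines shaped like "prog: ...: Cannot allocate memory",
# for example: xz: (stdin): Cannot allocate memory
OOM_LINE = re.compile(r"^(?P<prog>[\w.+-]+): .*Cannot allocate memory")


def oom_culprits(log_text: str) -> Counter:
    """Count allocation failures in a job log, grouped by reporting program."""
    culprits = Counter()
    for line in log_text.splitlines():
        if "Cannot allocate memory" not in line:
            continue
        match = OOM_LINE.match(line.strip())
        culprits[match.group("prog") if match else "unknown"] += 1
    return culprits


if __name__ == "__main__":
    # Hypothetical log excerpt; only the xz line comes from a real job.
    sample = "\n".join([
        "some earlier build output",
        "xz: (stdin): Cannot allocate memory",
        "Cannot allocate memory",
    ])
    print(oom_culprits(sample))
```

Running this over the logs of all failed jobs would show whether xz accounts for every failure or only the recent ones.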