Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
On 05/23/2017 10:22 AM, Sylvestre Ledru wrote: > Looks similar to #862360? That was an arm64 issue; terminix was failing on armhf instead. Cheers, Julien
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
Looks similar to #862360? According to https://buildd.debian.org/status/logs.php?pkg=terminix=armhf the last 3 failures all happened on hartmann. S
On 23/05/2017 at 10:16, Matthias Klumpp wrote:
> [...]
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
Cc Sylvestre Ledru, as he maintains LLVM and might know best about changes made to the LLVM toolchain in Debian.

Yesterday I uploaded an LDC to unstable with no changes except that it is now built against LLVM 4.0. With that version, the bug did not happen at all on the buildds. To be really certain it was gone, I used the harris porterbox again to check whether it now compiles the exact same version of Terminix correctly, and indeed it does. Then I tried to build Terminix with the exact LDC version that was previously in Stretch, and the bug didn't show either (4 builds in a row, just to be sure; before, the bug *always* happened on harris). I still had a manually compiled LDC on that machine from earlier attempts to debug the issue, last compiled against LLVM 3.8, and building with that no longer showed the bug either.

So LDC 1.1.1 built with LLVM 3.8, 3.9 or 4.0 in Stretch and Sid no longer shows this bug. When jcristau removed LDC from Stretch (yes, I am still not happy with the amount of non-communication that went on here!), the copy in there was actually already working, because something else in the toolchain had changed and resolved the issue.

Of course, this might still be a bug in LDC that simply isn't triggered anymore because something else has changed. But given how much work went into hunting for the issue in LDC, including the code where the crash actually happens, I think it's justified to assume that this is not actually a bug in LDC at all.

So, what's broken?
LLVM 3.9 and 3.8 in Stretch have received changes recently, but I fail to see anything in the changelog that could have affected this bug:

```
llvm-toolchain-3.9 (1:3.9.1-8) unstable; urgency=medium

  * Really fix "use versioned symbols" for llvm
    Thanks to Julien Cristau for the patch (Closes: #849098)

 -- Sylvestre Ledru  Tue, 25 Apr 2017 15:10:10 +0200

llvm-toolchain-3.9 (1:3.9.1-7) unstable; urgency=medium

  * Limit the archs where the ocaml binding is built
    Should fix the FTBFS
    Currently amd64 arm64 armel armhf i386

 -- Sylvestre Ledru  Sat, 15 Apr 2017 12:03:30 +0200

llvm-toolchain-3.9 (1:3.9.1-6) unstable; urgency=medium

  * Upload in unstable
  * Bring back ocaml. Thanks to Cyril Soldani (Closes: #858626)

 -- Sylvestre Ledru  Fri, 14 Apr 2017 10:02:03 +0200

llvm-toolchain-3.9 (1:3.9.1-6~exp2) experimental; urgency=medium

  * Add override_dh_makeshlibs for the libllvm or liblldb versions
    Thanks to Julien Cristau for the patch
  * change the min version of the libclang1 symbols to 1:3.9.1-6~
  * Fix the symlink on scan-build-py

 -- Sylvestre Ledru  Tue, 28 Mar 2017 06:32:40 +0200

llvm-toolchain-3.9 (1:3.9.1-6~exp1) experimental; urgency=medium

  [ Rebecca N. Palmer ]
  * Allow '!pointer' in OpenCL (Closes: #857623)
  * Add missing liblldb symlink (Closes: #857683)
  * Use versioned symbols (Closes: #848368)

 -- Sylvestre Ledru  Sun, 19 Mar 2017 10:12:03 +0100

llvm-toolchain-3.9 (1:3.9.1-5) unstable; urgency=medium

  * Fix the incorrect symlink to scan-build-py (Closes: #856869)

 -- Sylvestre Ledru  Sun, 12 Mar 2017 10:01:10 +0100
```

There were also GCC updates, and quite a bit of other stuff has changed as well. However, LDC now compiles the code correctly without having been rebuilt itself, so I think it's safe to rule out a GCC bug. GCC is only used to build the C++ parts of LDC, and a wrong-codegen bug there would have persisted in the binaries.
I'm not exactly sure where to go from here. Unless some major revelation about this bug happens, I am very inclined to just close it in a few weeks (if something like this happens again, we can file a new bug).

@Sylvestre: I know it's a long shot, but do you happen to know of anything in LLVM in Stretch that could recently have altered the codegen in any way? From the changelogs it doesn't really look like it, but maybe I am missing something. Context on this bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857085#41

Cheers,
Matthias
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
2017-05-22 13:32 GMT+02:00 Matthias Klumpp: > [...] > Sorry, I think I screwed up here (I thought the expiration date was > 25 May for some reason). I learned just now that jcristau force-hinted LDC out yesterday, even though he knew I had already committed a workaround (dropping armhf from Terminix) to t-p-u. I am not happy that I wasn't at least notified about that step beforehand (it would have been so easy on IRC!). To make the best of a bad situation: once we have a solution for this, I'll backport the packages that got dropped (about 8-10 of them), provided the fix is suitable for backports (if it's resolved by upgrading to a newer LLVM, we might have a problem). Cheers, Matthias -- I welcome VSRE emails. See http://vsre.info/
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
Regarding the issue: the cause remains elusive to me. A bug is also occurring on Debian's i386 buildds, and it is apparently the same thing that's happening on armhf. However, I failed to reproduce it locally: I built LDC 8 times in a row in a pristine Sid chroot without a single failure, while it always FTBFS in Debian unstable. I have no idea what could cause the bug. I even read the LDC code section where this issue occurs several times, checking for potential 64-bit <-> 32-bit pitfalls, but the code looks fine to me. I don't really know how to debug this further. The most puzzling thing is that this bug seems to be Debian-specific, as no other distro shipping LDC has this issue (no issue in Fedora, openSUSE, etc.). As soon as there is a solution, I think I could upload LDC, Tilix and the AppStream Generator, together with their dependencies, to Stretch backports. Cheers, Matthias
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
2017-05-22 7:17 GMT+02:00 Petter Reinholdtsen: > I was sad to discover that ldc and all its dependencies were removed from > testing today because of this issue. > > I guess no-one succeeded in figuring out what went wrong here? Eww, I completely forgot to reply to the bug report yesterday to defer the autoremoval a little, because I was originally planning to just remove Terminix on armhf. This is really bad because I need LDC on the AppStream DSA machine. Without it in Stretch, I cannot build the AppStream generator on the appstream.d.o machine and have to do a manual binary upload, which really sucks. Unfortunately, I don't think policy leaves any room for getting LDC back into testing, especially since the version in unstable is now a different, newer one due to my attempts to debug this issue there (and the release team won't like that). Sorry, I think I screwed up here (I thought the expiration date was 25 May for some reason). Cheers, Matthias
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
I was sad to discover that ldc and all its dependencies were removed from testing today because of this issue. I guess no-one succeeded in figuring out what went wrong here? -- Happy hacking Petter Reinholdtsen
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
2017-04-25 8:27 GMT+02:00 Iain Buclaw: > [...] > If running in gdb makes the problem go away, have you tried turning on > core dumps in buildd? Yes, without useful results: https://github.com/ldc-developers/ldc/issues/2022#issuecomment-288481397
Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault
On 24 April 2017 at 20:04, Matthias Klumpp wrote:
> Hi!
>
> Here is a short summary of the bug:
>
> 1) It only happens on Debian buildds; Fedora and Arch are not affected. I examined the build logs and the environments are relatively similar, a notable difference being that Fedora builds with GCC 7.
>
> 2) The issue itself is an infinite loop in `TemplateInstance::needsCodegen`, which shouldn't be possible, and upstream has no idea why it happens. Debugging the issue is really hard.
>
> 3) Suddenly we started to get the same bug reproducibly on the i386 buildds as well, for LDC itself, see [1]. There, the bug occurs in the bootstrap process, which uses an older compiler that has worked fine for many Debian releases. This suggests that the actual issue might be in a different portion of the code, or that something else has changed that is now triggering the issue.
>
> 4) Compiling LDC in an i386 chroot on an arm64 host always works without any issues; the problem appears only on the buildd (which itself is an amd64 host...).
>
> 5) The same applies to the crash on armhf, which never happens when cross-compiling, only on a real armhf machine.
>
> 6) The problem is not LDC being miscompiled during bootstrap, and compiling with LLVM 3.8 instead of 3.9 doesn't change anything either.
>
> 7) The crash disappears when running in GDB (or valgrind).
>
> I am really out of ideas on this. Upstream suggested some valgrinding and gdb-ing, but doing that is very cumbersome: apart from a Debian buildd, the only place where I can reproduce this bug is an armhf porterbox (and armhf is really slow...), and as soon as you run the application in gdb the crash vanishes.
>
> Creating a minimized testcase takes multiple weeks on armhf, and our attempt ultimately failed a few weeks back. We did uncover another bug in the process, which has since been resolved, so at least something good came out of it.
>
> Any help would be highly appreciated!
>
> Cheers,
> Matthias Klumpp
>
> [1]: https://buildd.debian.org/status/fetch.php?pkg=ldc=i386=1%3A1.1.1-3=1493055455=0

If running in gdb makes the problem go away, have you tried turning on core dumps in buildd?

--
Iain.
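Iain's suggestion can be sketched as a small Python wrapper. This is an illustration only and not part of any Debian buildd tooling. The hypothetical helper `run_with_core_dumps` raises the child's soft `RLIMIT_CORE` up to its hard limit just before exec, so a crashing `ldc2` has a chance to leave a core file, and it reports which signal killed the child. It assumes a POSIX host. Where the dump actually ends up depends on the host's `kernel.core_pattern` setting, which the sketch does not touch.

```python
import resource
import signal
import subprocess
from typing import Optional, Sequence, Tuple


def _raise_core_limit() -> None:
    # Runs in the child between fork() and exec(): lift the soft core-size
    # limit to the hard limit. An unprivileged process cannot exceed the
    # hard limit, so this is as far as it can go without root.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(argv: Sequence[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: run argv with core dumps enabled.

    Returns the exit status and, if the child was killed by a signal,
    that signal's name (for example "SIGSEGV"); otherwise None.
    """
    # preexec_fn changes the limit only in the child, so the calling
    # process keeps its own limits.
    proc = subprocess.run(list(argv), preexec_fn=_raise_core_limit)
    if proc.returncode < 0:
        # subprocess reports death-by-signal as a negative return code.
        return proc.returncode, signal.Signals(-proc.returncode).name
    return proc.returncode, None
```

For example, `run_with_core_dumps(["ldc2", "-c", "app.d"])`, with `app.d` as a placeholder source file, would return a negative status together with `"SIGSEGV"` if the compiler segfaulted. Setting the limit in the child rather than the parent keeps the wrapper from changing the limits of whatever process invokes it.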