Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-23 Thread Julien Cristau
On 05/23/2017 10:22 AM, Sylvestre Ledru wrote:
> Looks similar to #862360?

That was an arm64 issue, terminix was failing on armhf instead.

Cheers,
Julien



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-23 Thread Sylvestre Ledru
Looks similar to #862360?
According to
https://buildd.debian.org/status/logs.php?pkg=terminix&arch=armhf
the last 3 failures are only on hartmann

S


On 23/05/2017 at 10:16, Matthias Klumpp wrote:
> [...]



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-23 Thread Matthias Klumpp
Cc Sylvestre Ledru as he maintains LLVM and might know best about
changes done in the LLVM toolchain in Debian.

I uploaded an LDC to unstable yesterday with no changes except that its
LLVM dependency was changed, so that it builds against LLVM 4.0. With
that version, the bug did not happen at all on the buildds.
To be really certain it was gone, I used the harris porterbox again to
see if it compiles the exact version of Terminix correctly now, and
indeed it does.
Then, I tried to build Terminix with the exact LDC version from
Stretch before, and the bug also didn't show (4 builds in a row, just
to be sure - the bug did *always* happen on harris before). I had a
manually compiled version of LDC on that machine still, from previous
attempts to debug the issue, that was compiled with LLVM 3.8 last, and
building with that also didn't show the bug anymore.
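
A check like "4 builds in a row" can be scripted. The following is only a
minimal sketch in Python, not what was actually run on harris: the name
count_failures and the placeholder command are assumptions made for
illustration. It runs a command several times and counts how often it
fails, and how often the direct child was killed by SIGSEGV.

```python
import signal
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return (failures, segfaults)."""
    failures = 0
    segfaults = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
        # On POSIX, a negative return code means the direct child was
        # terminated by that signal. A crash in a grandchild (for example a
        # compiler started by a build tool) only shows up as a plain
        # non-zero exit status of the build tool.
        if result.returncode == -signal.SIGSEGV:
            segfaults += 1
    return failures, segfaults


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real build invocation.
    build_cmd = [sys.executable, "-c", "pass"]
    print(count_failures(build_cmd, runs=4))
```

A run without failures only shows that the crash did not trigger in those
attempts. It does not prove that the underlying cause is gone.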

So, LDC 1.1.1 built with LLVM 3.8, 3.9 and 4.0 in Stretch and Sid does
not actually show this bug anymore. When jcristau removed LDC from
Stretch (yes, I am still not happy with the amount of
non-communication that was going on here!), the copy in there was
actually already working, because something else in the toolchain
changed and resolved the issue.

So, this of course might be a bug in LDC that now just doesn't get
triggered anymore because something else has changed, but given the
amount of work put in this bug to find the issue in LDC and the code
where this bug actually happens in LDC, I think it's justified to
assume that this is not actually a bug in LDC at all.

So, what's broken? LLVM 3.9 and 3.8 in Stretch received changes
lately, but I fail to see anything in the changelog that would have
impacted this bug at all:

```
llvm-toolchain-3.9 (1:3.9.1-8) unstable; urgency=medium

  * Really fix "use versioned symbols" for llvm
    Thanks to Julien Cristau for the patch (Closes: #849098)

 -- Sylvestre Ledru   Tue, 25 Apr 2017 15:10:10 +0200

llvm-toolchain-3.9 (1:3.9.1-7) unstable; urgency=medium

  * Limit the archs where the ocaml binding is built
    Should fix the FTBFS
    Currently amd64 arm64 armel armhf i386

 -- Sylvestre Ledru   Sat, 15 Apr 2017 12:03:30 +0200

llvm-toolchain-3.9 (1:3.9.1-6) unstable; urgency=medium

  * Upload in unstable
  * Bring back ocaml. Thanks to Cyril Soldani (Closes: #858626)

 -- Sylvestre Ledru   Fri, 14 Apr 2017 10:02:03 +0200

llvm-toolchain-3.9 (1:3.9.1-6~exp2) experimental; urgency=medium

  * Add override_dh_makeshlibs for the libllvm or liblldb versions
    Thanks to Julien Cristau for the patch
  * change the min version of the libclang1 symbols to 1:3.9.1-6~
  * Fix the symlink on scan-build-py

 -- Sylvestre Ledru   Tue, 28 Mar 2017 06:32:40 +0200

llvm-toolchain-3.9 (1:3.9.1-6~exp1) experimental; urgency=medium

  [ Rebecca N. Palmer ]
  * Allow '!pointer' in OpenCL (Closes: #857623)
  * Add missing liblldb symlink (Closes: #857683)
  * Use versioned symbols (Closes: #848368)

 -- Sylvestre Ledru   Sun, 19 Mar 2017 10:12:03 +0100

llvm-toolchain-3.9 (1:3.9.1-5) unstable; urgency=medium

  * Fix the incorrect symlink to scan-build-py (Closes: #856869)

 -- Sylvestre Ledru   Sun, 12 Mar 2017 10:01:10 +0100
```

There were also GCC updates, and quite a bit of other stuff has
changed as well, but since LDC now compiles the code correctly without
being recompiled itself, I think it's safe to rule out any bug in GCC
(as that's only used to build the C++ parts of LDC, and a
wrong-codegen bug would have persisted in the binaries).

Not exactly sure where to go from here, but unless some major
revelation about this bug happens, I am very inclined to just close it
in a few weeks (and in case something like this happens again, we can
file a new bug).

@Sylvestre: I know it's a long shot, but do you maybe know about
anything in LLVM that could have altered the codegen in any way,
recently in Stretch? From the changelogs, it doesn't really look like
it, but maybe I am missing something. Context on this bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857085#41

Cheers,
Matthias



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-22 Thread Matthias Klumpp
2017-05-22 13:32 GMT+02:00 Matthias Klumpp :
> [...]
> Sorry, I think I screwed up here (I thought the expiration date was
> 25.May for some reason).

I learned just now that jcristau force-hinted LDC out yesterday,
although he knew that I already committed a workaround (dropping armhf
from Terminix) to t-p-u. I am not happy that I wasn't at least
notified about that step beforehand (would've been so easy on IRC!).

Making the best out of a bad situation, once we have a solution for
this, I'll make a backport of the stuff that got dropped (about 8-10
packages), if the fix is suitable for backports (if it's resolved by
upgrading to a newer LLVM, we might have a problem).

Cheers,
Matthias

-- 
I welcome VSRE emails. See http://vsre.info/



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-22 Thread Matthias Klumpp
Regarding the issue: The cause for it remains elusive to me. There is
a bug happening on Debian's i386 buildds which apparently is the same
thing that's happening on the armhf architecture, but I failed to
reproduce it locally (built LDC 8 times in a row in a pristine Sid
chroot with no failure, while it always FTBFS in Debian unstable).

I have no idea what could cause the bug. I even read the LDC code
section where this issue occurs a few times to check for potential
64-bit <-> 32-bit fallacies, but the code looks fine to me. I
don't really know how to debug this further. The most puzzling thing
is that this particular bug seems to be Debian specific, as no other
distro shipping LDC has this issue (no issue in Fedora, OpenSUSE,
etc.).
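
For readers unfamiliar with the kind of 64-bit <-> 32-bit fallacy meant
here, this is a small illustration in Python. It is a hypothetical
example and is not taken from the LDC code: it only shows how a value
that is fine in 64 bits changes once it is stored in a 32-bit integer.
The helper names are made up.

```python
import ctypes


def as_uint32(value):
    """Store value in an unsigned 32-bit integer, as a 32-bit size_t would."""
    return ctypes.c_uint32(value).value


def as_int32(value):
    """Store value in a signed 32-bit integer."""
    return ctypes.c_int32(value).value


# Hypothetical 64-bit quantities that do not fit into 32 bits.
big_offset = 2**32 + 5
big_length = 2**31

# The upper bits are silently dropped, or the sign flips.
truncated_offset = as_uint32(big_offset)
wrapped_length = as_int32(big_length)
```

Code that is only ever exercised on 64-bit hosts can hide such
truncation, which is why 32-bit architectures like armhf and i386 are
the usual suspects for it.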

As soon as there is a solution to this issue, I think I could upload
LDC, Tilix, and AppStream Generator as well as their dependencies to
Stretch backports.

Cheers,
Matthias



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-22 Thread Matthias Klumpp
2017-05-22 7:17 GMT+02:00 Petter Reinholdtsen :
> I was sad to discover that ldc and all its dependencies were removed from
> testing today because of this issue.
>
> I guess no-one succeeded in figuring out what went wrong here?

Eww, I completely forgot to reply to the bug report yesterday to defer
the autoremoval a little, because I was originally planning to just
remove Terminix on armhf. This is really bad because I need LDC on the
AppStream DSA machine, it not being in Stretch means I can not build
the AppStream generator on the appstream.d.o machine and have to do a
manual binary upload, which does really suck a lot.

Unfortunately I don't think policy leaves any room for getting LDC
back into testing, especially since the version in unstable is a
different, new one now due to my attempts to debug this issue in
unstable (and the release team won't like that).

Sorry, I think I screwed up here (I thought the expiration date was
25.May for some reason).

Cheers,
Matthias

-- 
I welcome VSRE emails. See http://vsre.info/



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-05-21 Thread Petter Reinholdtsen
I was sad to discover that ldc and all its dependencies were removed from
testing today because of this issue.

I guess no-one succeeded in figuring out what went wrong here?

-- 
Happy hacking
Petter Reinholdtsen



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-04-25 Thread Matthias Klumpp
2017-04-25 8:27 GMT+02:00 Iain Buclaw :
> [...]
> If running in gdb makes the problem go away, have you tried turning on
> core dumps in buildd?

Yes, without useful results:
https://github.com/ldc-developers/ldc/issues/2022#issuecomment-288481397



Bug#857085: [Pkg-d-devel] Bug#857085: terminix FTBFS on armhf: Error executing /usr/bin/ldc2: Segmentation fault

2017-04-25 Thread Iain Buclaw
On 24 April 2017 at 20:04, Matthias Klumpp  wrote:
> Hi!
>
> Here is a short summary on the bug:
>
> 1) It only happens on Debian buildds, Fedora and Arch are not
> affected. I examined the build logs and the environments are
> relatively similar, with a notable difference for Fedora being that
> they build with GCC 7
>
> 2) The issue itself is an infinite loop in
> `TemplateInstance::needsCodegen` which shouldn't be possible, and
> upstream has no idea why it happens. Debugging the issue is really
> hard.
>
> 3) Suddenly we started to get the same bug reproducibly on i386
> buildds as well for LDC itself, see [1]. There, the bug is in the
> bootstrap process, which uses an older compiler that worked fine
> already for many Debian releases. This suggests that the actual issue
> might be in a different portion of the code, or something else has
> changed that is now triggering the issue.
>
> 4) Compiling LDC in an i386 chroot on an arm64 host always works
> without any issues - the problem appears only on the buildd (which
> itself is an amd64 host...).
>
> 5) The same applies to the crash on armhf which never happens when
> cross-compiling, but only on a real armhf machine.
>
> 6) The problem is not LDC being miscompiled during bootstrap, nor does
> compiling with LLVM 3.8 instead of 3.9 change anything.
>
> 7) The crash disappears when running in GDB (or valgrind)
>
> I am really out of ideas on this - upstream suggested some valgrinding
> and gdb-ing, but doing that is very cumbersome as the only place where
> I can reproduce this bug that isn't a Debian buildd is an armhf
> porterbox (and armhf is really slow...), and as soon as you run the
> application in gdb the crash vanishes.
>
> Creating a minimized testcase would take multiple weeks on armhf, and
> ultimately failed a few weeks back - but we uncovered another bug in
> the process, which was resolved meanwhile, so at least something good
> came out of it.
>
> Any help would be highly appreciated!
> Cheers,
> Matthias Klumpp
>
> [1]: 
> https://buildd.debian.org/status/fetch.php?pkg=ldc&arch=i386&ver=1%3A1.1.1-3&stamp=1493055455&raw=0
>

If running in gdb makes the problem go away, have you tried turning on
core dumps in buildd?
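
In case it is useful, here is one way to turn on core dumps for a single
process without attaching a debugger. This is a minimal sketch in Python
using the standard resource module. The function names are made up, and
whether a buildd allows raising the limit at all is an assumption.

```python
import resource
import subprocess
import sys


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    Meant to run in the child process, just before the command starts.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(cmd):
    """Run cmd with core dumps permitted and return its exit status."""
    return subprocess.run(cmd, preexec_fn=allow_core_dumps).returncode


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the crashing invocation.
    status = run_with_core_dumps([sys.executable, "-c", "pass"])
    print("exit status:", status)
```

If the hard limit is 0, no core file will be written either way. Where
the core file ends up depends on the kernel's core_pattern setting on
that machine.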

--
Iain.