Bug#903790: ncurses: parallel build failure

2018-08-19 Thread Thomas Dickey
On Sat, Aug 11, 2018 at 10:02:38AM +0200, Sven Joachim wrote:
> Control: tags -1 - unreproducible
> Control: forwarded -1 https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html
...
> The dependencies in debian/rules are indeed correct, but now I have
> figured out what the problem is: the upstream build system causes the
> libraries to be relinked when make is run again, e.g. when debian/rules
> runs "make install" to install them into debian/tmp.  I have reported
> this upstream at
> https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html.

I improved the "--disable-relink" option to address this (seems to work
for me, except with my Fedora test-package which loses its "provides"
information).

-- 
Thomas E. Dickey 
https://invisible-island.net
ftp://ftp.invisible-island.net




Bug#903790: ncurses: parallel build failure

2018-08-11 Thread Sven Joachim
Control: tags -1 - unreproducible
Control: forwarded -1 https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html

On 15.07.2018 at 12:30, Sven Joachim wrote:

> On 14.07.2018 at 22:43, Helmut Grohne wrote:
>
>> Source: ncurses
>> Version: 6.1+20180210-4
>> User: helm...@debian.org
>> Usertags: rebootstrap
>> Tags: unreproducible
>>
>> Since very recently, I see a weird build failure for ncurses.
>
>> Unfortunately, I lost the relevant build logs, but let me try to
>> give you as much data as I still have.
>>
>> Thus far, I've seen the failure twice. The linker complains about a
>> truncated libmenuw.so.
>
>> Presumably it happens during the objdir-test
>> build and linking libmenuw.so appears to happen concurrently.
>
> Hard to imagine how this could happen, considering that the test
> programs' configure script is only supposed to be run after the wide
> libraries have been built already, according to the dependencies in
> debian/rules.

The dependencies in debian/rules are indeed correct, but now I have
figured out what the problem is: the upstream build system causes the
libraries to be relinked when make is run again, e.g. when debian/rules
runs "make install" to install them into debian/tmp.  I have reported
this upstream at
https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html.

Since the advent of R³ support in dpkg/debhelper and the reorganization
of the build-arch/build-indep targets in ncurses 6.1+20180210-4, it is
possible that the binary-indep and build-arch targets are run in
parallel.  In particular, the install-indep and build-test targets can
run in parallel, and then you have a race condition due to the relinking
of the libraries on which the test programs depend.
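
As a rough illustration of this kind of race (not the actual ncurses
build logic, which is not shown in this thread), the sketch below
rewrites a stand-in "library" file while an observer looks at it midway.
The function names, the fake payload and the temp-file-plus-rename
variant are all assumptions made for the example; it only shows why a
reader can see a truncated file when an existing output is rewritten in
place.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def looks_like_elf(path):
    """Return True if the file starts with the 4-byte ELF magic."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def relink_in_place(path, payload, observe):
    """Hypothetical relink: truncate the existing file, then write it again.

    `observe` is called while the file is only partly written, standing in
    for a concurrent linker run that reads the library at that moment.
    """
    with open(path, "wb") as f:  # opening with "wb" truncates immediately
        f.write(payload[:2])
        f.flush()
        observe()
        f.write(payload[2:])


def relink_atomically(path, payload, observe):
    """Hypothetical alternative: write a temp file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(payload[:2])
        f.flush()
        observe()  # the old, complete file is still visible under `path`
        f.write(payload[2:])
    os.replace(tmp, path)


def demo():
    """Return {strategy: (valid while relinking, valid afterwards)}."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        lib = os.path.join(d, "libmenuw.so")
        payload = ELF_MAGIC + b" fake library contents"
        with open(lib, "wb") as f:
            f.write(payload)
        for name, relink in (("in_place", relink_in_place),
                             ("atomic", relink_atomically)):
            seen = []
            relink(lib, payload, lambda: seen.append(looks_like_elf(lib)))
            results[name] = (seen[0], looks_like_elf(lib))
    return results


if __name__ == "__main__":
    print(demo())  # {'in_place': (False, True), 'atomic': (True, True)}
```

In the in-place case the observer sees a file that is not a valid
library, which matches the "File format not recognized" symptom in the
log; whether the real relink step behaves exactly like this is not
established here.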

Until a proper fix is available, a possible workaround is to build the
test programs in the build-indep target (untested):

diff --git a/debian/rules b/debian/rules
index ff1bd307..1bf824bc 100755
--- a/debian/rules
+++ b/debian/rules
@@ -323,10 +323,10 @@ $(objdir-test)/config.status: build-wide config.guess-stamp
 		PKG_CONFIG_LIBDIR=$(wobjdir)/usr/lib/$(DEB_HOST_MULTIARCH)/pkgconfig \
 		$(relsrcdir)/test/configure $(CONFARGS-TEST)
 
-build-indep: build-normal build-wide
+build-indep: build-normal build-wide build-test
 	touch $@
 
-build-arch build: build-indep build-static build-wide-static build-test \
+build-arch build: build-indep build-static build-wide-static \
 	  build-legacy build-wide-legacy $(build_64) $(build_32)
 	touch $@
 

Cheers,
   Sven


Bug#903790: ncurses: parallel build failure

2018-07-21 Thread Sven Joachim
On 20.07.2018 at 12:59, Helmut Grohne wrote:

> On Sun, Jul 15, 2018 at 12:30:22PM +0200, Sven Joachim wrote:
>> Something like this, perhaps?
>> 
>> ,----
>> | make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
>> | gcc -g -O2
>> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
>> | -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
>> | -L/<>/ncurses-6.1+20180210/obj-wide/lib
>> | -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
>> | -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
>> | -Wdate-time -D_FORTIFY_SOURCE=2
>> | -I/<>/ncurses-6.1+20180210/obj-wide/include
>> | -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
>> | -O2
>> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
>> | -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
>> | -ltinfo -lutil -lm
>> | /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
>> | collect2: error: ld returned 1 exit status
>> | Makefile:302: recipe for target 'cardfile' failed
>> | make[1]: *** [cardfile] Error 1
>> | make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
>> | debian/rules:443: recipe for target 'build-test' failed
>> `----
>
> Yes, that's what I was seeing. So that's the same thing and we can
> essentially rule out hardware defects.
>
>> It would be necessary to have access to the build tree on a failure in
>> order to debug the problem, build logs alone will most likely be
>> useless.  Since make currently treats -Otarget as -Oline, they have
>> become almost incomprehensible anyway. :-/
>
> My attempts at reproducing it with sbuild (native and cross) were
> fruitless, but running rebootstrap on tmpfs while enabling a parallel
> build should do. That was quite reliable on a small sample for me. Do
> you want to try yourself? Would you like me to prepare and share a tree?

I tried building ncurses on a tmpfs a few times, but it always
succeeded.  So if you can tar up a build tree where it failed, that
would be appreciated.

Cheers,
   Sven



Bug#903790: ncurses: parallel build failure

2018-07-20 Thread Helmut Grohne
Hi Sven,

On Sun, Jul 15, 2018 at 12:30:22PM +0200, Sven Joachim wrote:
> Something like this, perhaps?
> 
> ,----
> | make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
> | gcc -g -O2
> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
> | -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
> | -L/<>/ncurses-6.1+20180210/obj-wide/lib
> | -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
> | -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
> | -Wdate-time -D_FORTIFY_SOURCE=2
> | -I/<>/ncurses-6.1+20180210/obj-wide/include
> | -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
> | -O2
> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
> | -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
> | -ltinfo -lutil -lm
> | /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
> | collect2: error: ld returned 1 exit status
> | Makefile:302: recipe for target 'cardfile' failed
> | make[1]: *** [cardfile] Error 1
> | make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
> | debian/rules:443: recipe for target 'build-test' failed
> `----

Yes, that's what I was seeing. So that's the same thing and we can
essentially rule out hardware defects.

> It would be necessary to have access to the build tree on a failure in
> order to debug the problem, build logs alone will most likely be
> useless.  Since make currently treats -Otarget as -Oline, they have
> become almost incomprehensible anyway. :-/

My attempts at reproducing it with sbuild (native and cross) were
fruitless, but running rebootstrap on tmpfs while enabling a parallel
build should do. That was quite reliable on a small sample for me. Do
you want to try yourself? Would you like me to prepare and share a tree?

> Let's keep it open for now.  I have to make an upload anyway, then we
> will see what happens on the buildds.

Given my attempts at reproducing, I doubt that any buildd will reproduce
it.

Helmut



Bug#903790: ncurses: parallel build failure

2018-07-15 Thread Sven Joachim
On 14.07.2018 at 22:43, Helmut Grohne wrote:

> Source: ncurses
> Version: 6.1+20180210-4
> User: helm...@debian.org
> Usertags: rebootstrap
> Tags: unreproducible
>
> Hi Sven,
>
> Since very recently, I see a weird build failure for ncurses. I guess
> that it is related to Ben's make-dfsg upload, but I cannot tell for sure
> yet.

That's quite possible: with make 4.2.1-1, ncurses had lost some
parallelism support, and Ben's upload restored that (see #890430).

> Unfortunately, I lost the relevant build logs, but let me try to
> give you as much data as I still have.
>
> Thus far, I've seen the failure twice. The linker complains about a
> truncated libmenuw.so.

Something like this, perhaps?

,----
| make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
| gcc -g -O2
| -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
| -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
| -L/<>/ncurses-6.1+20180210/obj-wide/lib
| -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
| -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
| -Wdate-time -D_FORTIFY_SOURCE=2
| -I/<>/ncurses-6.1+20180210/obj-wide/include
| -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
| -O2
| -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
| -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
| -ltinfo -lutil -lm
| /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
| collect2: error: ld returned 1 exit status
| Makefile:302: recipe for target 'cardfile' failed
| make[1]: *** [cardfile] Error 1
| make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
| debian/rules:443: recipe for target 'build-test' failed
`----

Some eight weeks ago doko reported such a failure in Ubuntu cosmic to me
in private mail.  Unfortunately I did not save the full build log
either, and the URL doko gave me now has a log from a successful build.

> Presumably it happens during the objdir-test
> build and linking libmenuw.so appears to happen concurrently.

Hard to imagine how this could happen, considering that the test
programs' configure script is only supposed to be run after the wide
libraries have been built already, according to the dependencies in
debian/rules.

> So what can I do? I think documenting the symptoms with this bug is
> vaguely useful. I suggest leaving the bug open for maybe two weeks to
> accumulate more detail (if any). If it turns out that nobody else can
> reproduce parallel build failures, the bug should be closed.

It would be necessary to have access to the build tree on a failure in
order to debug the problem, build logs alone will most likely be
useless.  Since make currently treats -Otarget as -Oline, they have
become almost incomprehensible anyway. :-/

> Hope this vague report helps. If not, please do close it.

Let's keep it open for now.  I have to make an upload anyway, then we
will see what happens on the buildds.

Cheers,
   Sven



Bug#903790: ncurses: parallel build failure

2018-07-14 Thread Helmut Grohne
Source: ncurses
Version: 6.1+20180210-4
User: helm...@debian.org
Usertags: rebootstrap
Tags: unreproducible

Hi Sven,

Since very recently, I see a weird build failure for ncurses. I guess
that it is related to Ben's make-dfsg upload, but I cannot tell for sure
yet. Unfortunately, I lost the relevant build logs, but let me try to
give you as much data as I still have.

Thus far, I've seen the failure twice. The linker complains about a
truncated libmenuw.so. Presumably it happens during the objdir-test
build and linking libmenuw.so appears to happen concurrently.

Settings that reproduced the issue:
 * cross build for m68k
 * DEB_BUILD_OPTIONS="nocheck noddebs parallel=9"
 * DEB_BUILD_PROFILES=nobiarch (shouldn't matter for m68k)
 * Everything on tmpfs.
 * SCHED_BATCH

If I set parallel=1 in the very same (rebootstrap) setting, I cannot
reproduce it. Since jenkins.d.n sets parallel=1, you cannot see it there
at all.
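
For readers unfamiliar with the parallel=N setting: packaging
conventionally reads it from DEB_BUILD_OPTIONS to pick the number of
make jobs. A minimal sketch of that parsing, assuming whitespace-separated
options; the function name parallel_jobs is hypothetical and is not
taken from the ncurses debian/rules:

```python
def parallel_jobs(build_options):
    """Return N from a 'parallel=N' entry in a DEB_BUILD_OPTIONS string.

    Falls back to 1 (a serial build) when no valid entry is present.
    """
    for opt in build_options.split():
        name, sep, value = opt.partition("=")
        if name == "parallel" and sep and value.isdigit():
            return int(value)
    return 1


if __name__ == "__main__":
    print(parallel_jobs("nocheck noddebs parallel=9"))  # 9
    print(parallel_jobs("nocheck noddebs parallel=1"))  # 1
```

With the settings listed above this yields 9 jobs, versus 1 for the
jenkins.d.n configuration, which is consistent with the failure only
showing up in the former.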

I then tried reproducing it outside rebootstrap using sbuild, but no
matter what I tried, I couldn't reproduce it there. Turning parallel up
or down didn't help, but I don't know how reliable the issue is.

During my testing, I once ran into a native, parallel build that hung.
In the hung state, there were two make processes each consuming CPU
indefinitely. Attaching to them with strace indicated that they weren't
doing any syscalls.

I somewhat suspect that the fault lies not with ncurses but with
make-dfsg, though I cannot prove that suspicion.

So what can I do? I think documenting the symptoms with this bug is
vaguely useful. I suggest leaving the bug open for maybe two weeks to
accumulate more detail (if any). If it turns out that nobody else can
reproduce parallel build failures, the bug should be closed.

Hope this vague report helps. If not, please do close it.

Helmut