Bug#903790: ncurses: parallel build failure
On Sat, Aug 11, 2018 at 10:02:38AM +0200, Sven Joachim wrote:
> Control: tags -1 - unreproducible
> Control: forwarded -1
> https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html
...
> The dependencies in debian/rules are indeed correct, but now I have
> figured out what the problem is: the upstream build system causes the
> libraries to be relinked when make is run again, e.g. when debian/rules
> runs "make install" to install them into debian/tmp. I have reported
> this upstream at
> https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html.

I improved the "--disable-relink" option to address this (seems to work
for me, except with my Fedora test-package which loses its "provides"
information).

-- 
Thomas E. Dickey
https://invisible-island.net
ftp://ftp.invisible-island.net
Bug#903790: ncurses: parallel build failure
Control: tags -1 - unreproducible
Control: forwarded -1 https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html

On 15.07.2018 at 12:30, Sven Joachim wrote:

> On 14.07.2018 at 22:43, Helmut Grohne wrote:
>
>> Source: ncurses
>> Version: 6.1+20180210-4
>> User: helm...@debian.org
>> Usertags: rebootstrap
>> Tags: unreproducible
>>
>> Since very recently, I see a weird build failure for ncurses.
>
>> Unfortunately, I lost the relevant build logs, but let me try to
>> give you as much data as I still have.
>>
>> Thus far, I've seen the failure twice. The linker complains about a
>> truncated libmenuw.so.
>
>> Presumably it happens during the objdir-test
>> build and linking libmenuw.so appears to happen concurrently.
>
> Hard to imagine how this could happen, considering that the test
> programs' configure script is only supposed to be run after the wide
> libraries have been built already, according to the dependencies in
> debian/rules.

The dependencies in debian/rules are indeed correct, but now I have
figured out what the problem is: the upstream build system causes the
libraries to be relinked when make is run again, e.g. when debian/rules
runs "make install" to install them into debian/tmp. I have reported
this upstream at
https://lists.gnu.org/archive/html/bug-ncurses/2018-08/msg00011.html.

Since the advent of R³ support in dpkg/debhelper and the reorganization
of the build-arch/build-indep targets in ncurses 6.1+20180210-4, it is
possible that the binary-indep and build-arch targets are run in
parallel. In particular, the install-indep and build-test targets can
run in parallel, and then you have a race condition due to the relinking
of the libraries on which the test programs depend.

Until a proper fix is available, a possible workaround is to build the
test programs in the build-indep target (untested):

diff --git a/debian/rules b/debian/rules
index ff1bd307..1bf824bc 100755
--- a/debian/rules
+++ b/debian/rules
@@ -323,10 +323,10 @@ $(objdir-test)/config.status: build-wide config.guess-stamp
 		PKG_CONFIG_LIBDIR=$(wobjdir)/usr/lib/$(DEB_HOST_MULTIARCH)/pkgconfig \
 		$(relsrcdir)/test/configure $(CONFARGS-TEST)
 
-build-indep: build-normal build-wide
+build-indep: build-normal build-wide build-test
 	touch $@
 
-build-arch build: build-indep build-static build-wide-static build-test \
+build-arch build: build-indep build-static build-wide-static \
 	build-legacy build-wide-legacy $(build_64) $(build_32)
 	touch $@

Cheers,
       Sven
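
The race described in the message above (one make job relinking a library while another job links a test program against it) can be illustrated with a small, deterministic Python sketch. It is only an illustration under stated assumptions: the file name `libdemo.so`, the byte string `LIBRARY_IMAGE`, and all function names are hypothetical, and the thread does not say how the ncurses makefiles write the library. The write-then-rename variant is a generic technique shown for contrast, not the fix adopted upstream (which, per the thread, was an improved "--disable-relink" option).

```python
import os
import tempfile

# Hypothetical stand-in for a freshly linked shared library (not real ELF data).
LIBRARY_IMAGE = b"\x7fELF" + b"\x00" * 4092


def relink_in_place(path, image, observe):
    """Rewrite `path` in place and call `observe` halfway through.

    Opening with "wb" truncates the existing file first, so a concurrent
    reader can see an empty or partial file.
    """
    half = len(image) // 2
    with open(path, "wb") as out:
        out.write(image[:half])
        out.flush()
        seen = observe(path)  # models the other make job reading the file
        out.write(image[half:])
    return seen


def relink_atomically(path, image, observe):
    """Write to a temporary file, then rename it over `path`.

    Until the rename, readers of `path` still see the complete old file.
    """
    half = len(image) // 2
    tmp = path + ".tmp"
    with open(tmp, "wb") as out:
        out.write(image[:half])
        out.flush()
        seen = observe(path)
        out.write(image[half:])
    os.replace(tmp, path)
    return seen


def size_seen_by_reader(path):
    """Model the second make job: report how many bytes are readable now."""
    with open(path, "rb") as lib:
        return len(lib.read())


def demo():
    """Return the sizes a concurrent reader sees under both strategies."""
    with tempfile.TemporaryDirectory() as workdir:
        lib = os.path.join(workdir, "libdemo.so")
        with open(lib, "wb") as out:
            out.write(LIBRARY_IMAGE)
        in_place = relink_in_place(lib, LIBRARY_IMAGE, size_seen_by_reader)
        atomic = relink_atomically(lib, LIBRARY_IMAGE, size_seen_by_reader)
        return in_place, atomic, len(LIBRARY_IMAGE)
```

Calling `demo()` returns the number of bytes the simulated reader saw during the in-place rewrite, the number it saw during the write-then-rename variant, and the full size. A short read in the first case corresponds to the "truncated libmenuw.so" symptom reported in this bug.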
Bug#903790: ncurses: parallel build failure
On 20.07.2018 at 12:59, Helmut Grohne wrote:

> On Sun, Jul 15, 2018 at 12:30:22PM +0200, Sven Joachim wrote:
>> Something like this, perhaps?
>>
>> ,
>> | make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
>> | gcc -g -O2
>> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
>> | -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
>> | -L/<>/ncurses-6.1+20180210/obj-wide/lib
>> | -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
>> | -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
>> | -Wdate-time -D_FORTIFY_SOURCE=2
>> | -I/<>/ncurses-6.1+20180210/obj-wide/include
>> | -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
>> | -O2
>> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
>> | -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
>> | -ltinfo -lutil -lm
>> | /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
>> | collect2: error: ld returned 1 exit status
>> | Makefile:302: recipe for target 'cardfile' failed
>> | make[1]: *** [cardfile] Error 1
>> | make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
>> | debian/rules:443: recipe for target 'build-test' failed
>> `
>
> Yes, that's what I was seeing. So that's the same thing and we can
> essentially rule out hardware defects.
>
>> It would be necessary to have access to the build tree on a failure in
>> order to debug the problem, build logs alone will most likely be
>> useless. Since make currently treats -Otarget as -Oline, they have
>> become almost incomprehensible anyway. :-/
>
> My attempts at reproducing it with sbuild (native and cross) were
> fruitless, but running rebootstrap on tmpfs while enabling a parallel
> build should do. That was quite reliable on a small sample for me. Do
> you want to try yourself? Would you like me to prepare and share a tree?

I tried building ncurses on a tmpfs a few times, but it always
succeeded. So if you can tar up a build tree where it failed, that
would be appreciated.

Cheers,
       Sven
Bug#903790: ncurses: parallel build failure
Hi Sven,

On Sun, Jul 15, 2018 at 12:30:22PM +0200, Sven Joachim wrote:
> Something like this, perhaps?
>
> ,
> | make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
> | gcc -g -O2
> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
> | -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
> | -L/<>/ncurses-6.1+20180210/obj-wide/lib
> | -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
> | -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
> | -Wdate-time -D_FORTIFY_SOURCE=2
> | -I/<>/ncurses-6.1+20180210/obj-wide/include
> | -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
> | -O2
> | -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
> | -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
> | -ltinfo -lutil -lm
> | /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
> | collect2: error: ld returned 1 exit status
> | Makefile:302: recipe for target 'cardfile' failed
> | make[1]: *** [cardfile] Error 1
> | make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
> | debian/rules:443: recipe for target 'build-test' failed
> `

Yes, that's what I was seeing. So that's the same thing and we can
essentially rule out hardware defects.

> It would be necessary to have access to the build tree on a failure in
> order to debug the problem, build logs alone will most likely be
> useless. Since make currently treats -Otarget as -Oline, they have
> become almost incomprehensible anyway. :-/

My attempts at reproducing it with sbuild (native and cross) were
fruitless, but running rebootstrap on tmpfs while enabling a parallel
build should do. That was quite reliable on a small sample for me. Do
you want to try yourself? Would you like me to prepare and share a tree?

> Let's keep it open for now. I have to make an upload anyway, then we
> will see what happens on the buildds.

Given my attempts at reproducing, I doubt that any buildd will reproduce
it.

Helmut
Bug#903790: ncurses: parallel build failure
On 14.07.2018 at 22:43, Helmut Grohne wrote:

> Source: ncurses
> Version: 6.1+20180210-4
> User: helm...@debian.org
> Usertags: rebootstrap
> Tags: unreproducible
>
> Hi Sven,
>
> Since very recently, I see a weird build failure for ncurses. I guess
> that it is related to Ben's make-dfsg upload, but I cannot tell for sure
> yet.

That's quite possible, with make 4.2.1-1 ncurses had lost some
parallelism support and Ben's upload restored that (see #890430).

> Unfortunately, I lost the relevant build logs, but let me try to
> give you as much data as I still have.
>
> Thus far, I've seen the failure twice. The linker complains about a
> truncated libmenuw.so.

Something like this, perhaps?

,
| make[1]: Entering directory '/<>/ncurses-6.1+20180210/obj-test'
| gcc -g -O2
| -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
| -Wformat -Werror=format-security -o cardfile ../obj-test/cardfile.o
| -L/<>/ncurses-6.1+20180210/obj-wide/lib
| -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -I. -I../test
| -I../test -DHAVE_CONFIG_H -DDATA_DIR=\"/usr/share/ncurses-examples\"
| -Wdate-time -D_FORTIFY_SOURCE=2
| -I/<>/ncurses-6.1+20180210/obj-wide/include
| -D_DEFAULT_SOURCE -D_DEFAULT_SOURCE -D_GNU_SOURCE -D_GNU_SOURCE -g
| -O2
| -fdebug-prefix-map=/<>/ncurses-6.1+20180210=. -fstack-protector-strong
| -Wformat -Werror=format-security -lformw -lmenuw -lpanelw -lncursesw
| -ltinfo -lutil -lm
| /<>/ncurses-6.1+20180210/obj-wide/lib/libmenuw.so: file not recognized: File format not recognized
| collect2: error: ld returned 1 exit status
| Makefile:302: recipe for target 'cardfile' failed
| make[1]: *** [cardfile] Error 1
| make[1]: Leaving directory '/<>/ncurses-6.1+20180210/obj-test'
| debian/rules:443: recipe for target 'build-test' failed
`

Some eight weeks ago doko reported such a failure in Ubuntu cosmic to me
in private mail. Unfortunately I did not save the full build log
either, and the URL doko gave me now has a log from a successful build.

> Presumably it happens during the objdir-test
> build and linking libmenuw.so appears to happen concurrently.

Hard to imagine how this could happen, considering that the test
programs' configure script is only supposed to be run after the wide
libraries have been built already, according to the dependencies in
debian/rules.

> So what can I do? I think documenting the symptoms with this bug is
> vaguely useful. I suggest leaving the bug open for maybe two weeks to
> accumulate more detail (if any). If it turns out that nobody else can
> reproduce parallel build failures, the bug should be closed.

It would be necessary to have access to the build tree on a failure in
order to debug the problem, build logs alone will most likely be
useless. Since make currently treats -Otarget as -Oline, they have
become almost incomprehensible anyway. :-/

> Hope this vague report helps. If not, please do close it.

Let's keep it open for now. I have to make an upload anyway, then we
will see what happens on the buildds.

Cheers,
       Sven
Bug#903790: ncurses: parallel build failure
Source: ncurses
Version: 6.1+20180210-4
User: helm...@debian.org
Usertags: rebootstrap
Tags: unreproducible

Hi Sven,

Since very recently, I see a weird build failure for ncurses. I guess
that it is related to Ben's make-dfsg upload, but I cannot tell for sure
yet. Unfortunately, I lost the relevant build logs, but let me try to
give you as much data as I still have.

Thus far, I've seen the failure twice. The linker complains about a
truncated libmenuw.so. Presumably it happens during the objdir-test
build and linking libmenuw.so appears to happen concurrently.

Settings that reproduced the issue:

 * cross build for m68k
 * DEB_BUILD_OPTIONS="nocheck noddebs parallel=9"
 * DEB_BUILD_PROFILES=nobiarch (shouldn't matter for m68k)
 * Everything on tmpfs.
 * SCHED_BATCH

If I set parallel=1 in the very same (rebootstrap) setting, I cannot
reproduce it. Since jenkins.d.n sets parallel=1, you cannot see it there
at all.

I then tried reproducing it outside rebootstrap using sbuild, but no
matter what I tried, I couldn't reproduce it there. Turning parallel up
or down didn't help, but I don't know how reliable the issue is.

During my testing, I once ran into a native, parallel build that hung.
In the hung state, there were two make processes each consuming CPU
indefinitely. Attaching to them with strace indicated that they weren't
doing any syscalls. I somewhat suspect that there is no failure on
ncurses' side and that the make-dfsg side is broken, but I cannot prove
that suspicion.

So what can I do? I think documenting the symptoms with this bug is
vaguely useful. I suggest leaving the bug open for maybe two weeks to
accumulate more detail (if any). If it turns out that nobody else can
reproduce parallel build failures, the bug should be closed.

Hope this vague report helps. If not, please do close it.

Helmut