Bug#851790: installation-reports: DNS not working
On 2017-01-23 17:30, Wookey wrote:
> On 2017-01-19 11:04 +0100, Cyril Brulebois wrote:
> > Steve McIntyre (2017-01-19):
> > > On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> > > >
> > > >The workarounds are to make sure the chroots are up-to-date (which should
> > > >be the case now on the build daemons). Another alternative would be to
> > > >avoid copying a library in mklibs if it is already present in the image.
> > > >That might break if some very strict dependencies are used, though
> > > >I guess the way the udebs are downloaded, they should always have the
> > > >same or a newer version than in the chroot.
> > >
> > > Thanks for the explanation - it's appreciated!
> >
> > Yeah, thanks for the confirmation.
>
> OK. I tested today's image (2017-01-23 04:56) and the install went
> through OK, so we are back in sync and this issue is gone for now. It should
> probably be retitled to something about library sync/using host libs
> and left open until it's fixed properly.

I pushed a patch a few days ago that should fix the issue. Well, I
don't know if it should be considered a fix or a hack, but at least
it looks less like a hack than the existing code...

The long-term solution is clearly to fully get rid of mklibs. That should
wait until after stretch though, as it requires new udebs from some
packages and thus some coordination.

Aurelien

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
Bug#851790: installation-reports: DNS not working
On 2017-01-19 11:04 +0100, Cyril Brulebois wrote:
> Steve McIntyre (2017-01-19):
> > On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> > >
> > >The workarounds are to make sure the chroots are up-to-date (which should
> > >be the case now on the build daemons). Another alternative would be to
> > >avoid copying a library in mklibs if it is already present in the image.
> > >That might break if some very strict dependencies are used, though
> > >I guess the way the udebs are downloaded, they should always have the
> > >same or a newer version than in the chroot.
> >
> > Thanks for the explanation - it's appreciated!
>
> Yeah, thanks for the confirmation.

OK. I tested today's image (2017-01-23 04:56) and the install went
through OK, so we are back in sync and this issue is gone for now. It should
probably be retitled to something about library sync/using host libs
and left open until it's fixed properly.

Wookey
--
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/
Bug#851790: installation-reports: DNS not working
Steve McIntyre (2017-01-19):
> On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> >On 2017-01-19 01:53, Cyril Brulebois wrote:
> >
> >> It's been a while since I last looked at/understood mklibs stuff though,
> >> feel free to fix my suspicions/conclusions.
> >
> >The long term solution is to package all the libraries into udeb
> >packages. That way we can simply get rid of the mklibs pass.
> >
> >The workarounds are to make sure the chroots are up-to-date (which should
> >be the case now on the build daemons). Another alternative would be to
> >avoid copying a library in mklibs if it is already present in the image.
> >That might break if some very strict dependencies are used, though
> >I guess the way the udebs are downloaded, they should always have the
> >same or a newer version than in the chroot.
>
> Thanks for the explanation - it's appreciated!

Yeah, thanks for the confirmation.

> Is there anything we could do to fail the build if versions are out of
> sync, rather than let a broken build through?

Well, I think Aurélien mentioned it: ensure chroots are up-to-date.
Tweaking the buildscript might do the trick, I suppose. AFAIUI, the build
isn't broken every time there's a divergence in versions anyway; you're
sometimes (un)lucky.

Can't really devote time right now to investigating the “let's not copy
stuff over if it's already present” suggestion…

KiBi.
Bug#851790: installation-reports: DNS not working
On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
>On 2017-01-19 01:53, Cyril Brulebois wrote:
>
>> It's been a while since I last looked at/understood mklibs stuff though,
>> feel free to fix my suspicions/conclusions.
>
>The long term solution is to package all the libraries into udeb
>packages. That way we can simply get rid of the mklibs pass.
>
>The workarounds are to make sure the chroots are up-to-date (which should
>be the case now on the build daemons). Another alternative would be to
>avoid copying a library in mklibs if it is already present in the image.
>That might break if some very strict dependencies are used, though
>I guess the way the udebs are downloaded, they should always have the
>same or a newer version than in the chroot.

Thanks for the explanation - it's appreciated!

Is there anything we could do to fail the build if versions are out of
sync, rather than let a broken build through?

--
Steve McIntyre, Cambridge, UK.                                st...@einval.com
< Aardvark> I dislike C++ to start with. C++11 just seems to be
            handing rope-creating factories for users to hang multiple
            instances of themselves.
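One way to picture such a check: a minimal sketch, in Python, of a pre-build guard that refuses to continue when the host's libc packages and the libc6-udeb going into the image carry different versions. The function name, the exception class and the choice to pass versions in as plain strings are assumptions made for illustration; nothing in this thread says debian-installer's build scripts contain anything like it. Exact string equality is used because, per the analysis in this thread, libresolv.so.2 and libnss_dns.so.2 apparently need to come from the same glibc build.

```python
class VersionMismatch(Exception):
    """Raised when host packages and the udeb are not the same version."""


def check_in_sync(host_versions, udeb_version):
    """Fail loudly if any host package version differs from the udeb version.

    host_versions: dict mapping package name -> installed version string
                   (hypothetical input; how it is collected is up to the caller).
    udeb_version:  version string of the libc6-udeb that will be unpacked.
    """
    # Collect every host package whose version is not identical to the udeb's.
    stale = {pkg: ver for pkg, ver in host_versions.items() if ver != udeb_version}
    if stale:
        details = ", ".join(f"{pkg}={ver}" for pkg, ver in sorted(stale.items()))
        raise VersionMismatch(
            f"host packages out of sync with udeb {udeb_version}: {details}"
        )


if __name__ == "__main__":
    # The combination reported as broken in this thread: -8 on the host, -9 udeb.
    try:
        check_in_sync({"libc6": "2.24-8", "libc6-dev": "2.24-8"}, "2.24-9")
    except VersionMismatch as exc:
        print(f"build aborted: {exc}")
```

On a real build host the installed versions could be read with something like `dpkg-query -W -f='${Version}' libc6`; how the udeb version is obtained is left open here.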
Bug#851790: installation-reports: DNS not working
On 2017-01-19 01:53, Cyril Brulebois wrote:
> Cyril Brulebois (2017-01-19):
> > Summing up things from IRC:
> >  - in an up-to-date sid chroot (both development version and minimal one
> >    with daily-build script): no DNS issues with the generated mini.iso
> >    (amd64, tested within stable's kvm on amd64). My image was also
> >    tested successfully by Steve, so not a setup issue.
> >
> >  - I can reproduce the issue with dailies from 2017-01-18, and from
> >    2017-01-19 (I picked it in advance during its build on barriere).
> >
> >  - I can reproduce the issue in the above mentioned sid chroot if
> >    I downgrade these packages to testing's version (-9 → -8):
> >      sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8
> >      libc-dev-bin=2.24-8
> >
> >  - The older set of libc* packages is installed in barriere's current
> >    amd64 sid chroot, too.
> >
> >  - Steve could only produce broken images when building locally, until
> >    he upgraded his libc* packages as well.
>
> And since Steve was wondering how we could release something for Stretch
> RC 1 with that… I've just confirmed that using -8 .deb with a -8 .udeb
> (reversioned as -9+hack so that its being dropped under localudebs makes
> it take precedence over sid's .udeb) leads to an image that's working
> fine. And AFAICT that's the combination we had at the time.
>
> I suspect the implementation change through the following patch in -9:
>   glibc-2.24/debian/patches/any/cvs-resolv-internal-qtype.diff

Indeed, I haven't done any test, but looking at the code, it shows that
the changes assume that libresolv.so.2 and libnss_dns.so.2 stay in sync.

> plus 283e7a294275a7da53258600deaaafbbec6b96c1 in debian-installer.git is
> what is triggering the issue? (Explaining why host's libc .deb have an
> impact on the built images.)

Indeed, this has been done to avoid crashes when libc6-udeb is
re-installed at the beginning of the installer process, which often
caused crashes in case of version mismatch (because the libnss*
libraries were in a different package). Now libc6-udeb is unpacked while
building the image, but given we still have half a dozen libraries
without the corresponding udeb, mklibs copies them onto the image, as
well as their dependencies, which presumably include libresolv.so.2.

> It's been a while since I last looked at/understood mklibs stuff though,
> feel free to fix my suspicions/conclusions.

The long term solution is to package all the libraries into udeb
packages. That way we can simply get rid of the mklibs pass.

The workarounds are to make sure the chroots are up-to-date (which should
be the case now on the build daemons). Another alternative would be to
avoid copying a library in mklibs if it is already present in the image.
That might break if some very strict dependencies are used, though
I guess the way the udebs are downloaded, they should always have the
same or a newer version than in the chroot.

Aurelien

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
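The "avoid copying a library if it is already present in the image" alternative can be sketched roughly as follows, in Python. This is not mklibs code: the function name, its arguments and the library name `libexample.so.1` are hypothetical, and the check is by file name only, so it deliberately ignores the strict-dependency caveat mentioned above.

```python
from pathlib import Path


def libraries_to_copy(needed, image_lib_dir):
    """Return the libraries that still have to be copied from the host.

    needed:        library file names the image's binaries depend on.
    image_lib_dir: library directory inside the image tree, already
                   populated from unpacked udebs.

    A library that the image already provides is skipped, so the copy
    coming from the udeb is never replaced by the host's copy.
    """
    image_lib_dir = Path(image_lib_dir)
    return [name for name in needed if not (image_lib_dir / name).exists()]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as image_lib:
        # Pretend libc6-udeb already shipped libresolv.so.2 into the image.
        (Path(image_lib) / "libresolv.so.2").touch()
        # libexample.so.1 stands in for a library that has no udeb yet.
        print(libraries_to_copy(["libresolv.so.2", "libexample.so.1"], image_lib))
```

With this filter, only the library that has no udeb would be taken from the chroot, which is the behaviour the suggestion is aiming for.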
Bug#851790: installation-reports: DNS not working
Cyril Brulebois (2017-01-19):
> Summing up things from IRC:
>  - in an up-to-date sid chroot (both development version and minimal one
>    with daily-build script): no DNS issues with the generated mini.iso
>    (amd64, tested within stable's kvm on amd64). My image was also
>    tested successfully by Steve, so not a setup issue.
>
>  - I can reproduce the issue with dailies from 2017-01-18, and from
>    2017-01-19 (I picked it in advance during its build on barriere).
>
>  - I can reproduce the issue in the above mentioned sid chroot if
>    I downgrade these packages to testing's version (-9 → -8):
>      sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8
>      libc-dev-bin=2.24-8
>
>  - The older set of libc* packages is installed in barriere's current
>    amd64 sid chroot, too.
>
>  - Steve could only produce broken images when building locally, until
>    he upgraded his libc* packages as well.

And since Steve was wondering how we could release something for Stretch
RC 1 with that… I've just confirmed that using -8 .deb with a -8 .udeb
(reversioned as -9+hack so that its being dropped under localudebs makes
it take precedence over sid's .udeb) leads to an image that's working
fine. And AFAICT that's the combination we had at the time.

I suspect the implementation change through the following patch in -9:
  glibc-2.24/debian/patches/any/cvs-resolv-internal-qtype.diff

plus 283e7a294275a7da53258600deaaafbbec6b96c1 in debian-installer.git is
what is triggering the issue? (Explaining why host's libc .deb have an
impact on the built images.)

It's been a while since I last looked at/understood mklibs stuff though,
feel free to fix my suspicions/conclusions.

KiBi.
Bug#851790: installation-reports: DNS not working
Cyril Brulebois (2017-01-19):
> You can try downgrading the kernel, but I'd usually look at glibc first
> for DNS related issues.

Summing up things from IRC:
 - in an up-to-date sid chroot (both development version and minimal one
   with daily-build script): no DNS issues with the generated mini.iso
   (amd64, tested within stable's kvm on amd64). My image was also
   tested successfully by Steve, so not a setup issue.

 - I can reproduce the issue with dailies from 2017-01-18, and from
   2017-01-19 (I picked it in advance during its build on barriere).

 - I can reproduce the issue in the above mentioned sid chroot if
   I downgrade these packages to testing's version (-9 → -8):
     sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8 libc-dev-bin=2.24-8

 - The older set of libc* packages is installed in barriere's current
   amd64 sid chroot, too.

 - Steve could only produce broken images when building locally, until
   he upgraded his libc* packages as well.

→ Copying glibc@packages for the time being, even if I suspect this will
likely disappear entirely once -9 propagates…

KiBi.
Bug#851790: installation-reports: DNS not working
Steve McIntyre (2017-01-18):
> For the sake of completeness, I can confirm that the same problem
> shows up when running on a different network too. I also see this
> using the oldest amd64 daily I can grab (from 2017-01-17) which has
> the 4.9 kernel too.
>
> I'll see if I can debug this, but I'm not sure where to look straight
> away.

You can try downgrading the kernel, but I'd usually look at glibc first
for DNS related issues.

KiBi.
Bug#851790: installation-reports: DNS not working
On Wed, Jan 18, 2017 at 06:43:33PM +, Wookey wrote:
>Package: installation-reports
>Severity: grave
>Tags: d-i
>Justification: renders package unusable
>
>Dear Maintainer,
>
>The current installer, with the new 4.9 kernel, is unable to resolve
>domains, so is quite seriously broken.

hosts, not domains, but yes...

>This was noted during install on an arm64 gigabyte MP30-AR1
>desktop/server, when choose-mirror failed, but it soon became clear
>that DNS was not working.
>
>Testing on an x86 VM with the same daily image (18th Jan 2017) found
>the same problem. Going back to the rc1 installer image (4.8 kernel)
>it works OK.
>
>Tests showed that the network came up fine and things are pingable by IP, but
>not name:
># ping wookware.org
>ping: bad address 'wookware.org'
># ping 93.93.131.118
>PING 93.93.131.118 (93.93.131.118): 56 data bytes
>64 bytes from 93.93.131.118: seq=0 ttl=50 time=19.892 ms
>
>similarly the failing line from the choose-mirror log works if an address is
>inserted:
>Jan 18 17:04:11 choose-mirror[31201]: DEBUG: command: wget --no-verbose
>http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - |
>grep -E '^(Suite|Codename|Architectures):'
>
>Jan 18 17:04:11 choose-mirror[31201]: WARNING **: mirror does not support the
>specified release (stretch)
>
># wget --no-verbose
>http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - |
>grep -E '^(Suite|Codename|Architectures):'
>wget: unable to resolve host address 'debian-mirror.cambridge.arm.com'
>
># wget --no-verbose http://10.1.194.51/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
>Suite: testing
>Codename: stretch
>Architectures: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
>2017-01-18 17:47:12 URL:http://10.1.194.51/debian/dists/stretch/Release
>[177979/177979] -> "-" [1]
>
>resolv.conf is as expected:
>search cambridge.arm.com
>nameserver 10.1.2.24
>nameserver 10.1.2.23
>
>(adding nameserver 8.8.8.8 makes no difference)
>
>Watching packets go by when doing a VM install it is clear that the
>local DNS server returns the correct response, but this is being ignored
>or lost by the D-I initrd.

For the sake of completeness, I can confirm that the same problem
shows up when running on a different network too. I also see this
using the oldest amd64 daily I can grab (from 2017-01-17) which has
the 4.9 kernel too.

I'll see if I can debug this, but I'm not sure where to look straight
away.

--
Steve McIntyre, Cambridge, UK.                                st...@einval.com
"Managing a volunteer open source project is a lot like herding
 kittens, except the kittens randomly appear and disappear because they
 have day jobs." -- Matt Mackall
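The "reachable by IP but not by name" distinction in the report can be turned into a small repeatable check. The following is only a sketch, in Python, under the assumption that a Python interpreter is available on the system being tested (it is not part of the D-I initrd); the function name and the injectable `resolve` parameter are made up for illustration. It goes through the C library's `getaddrinfo`, which is the layer under suspicion here.

```python
import socket


def classify_lookup(hostname, resolve=socket.getaddrinfo):
    """Classify a name lookup as resolved or failed.

    Returns ("resolved", [addresses]) when the resolver answers, or
    ("dns-failure", reason) when it raises socket.gaierror. The resolve
    parameter exists only so the function can be exercised without a network.
    """
    try:
        results = resolve(hostname, None)
    except socket.gaierror as exc:
        return ("dns-failure", str(exc))
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    addresses = sorted({entry[4][0] for entry in results})
    return ("resolved", addresses)


if __name__ == "__main__":
    # Host name taken from the report; the outcome depends on the local resolver.
    print(classify_lookup("wookware.org"))
```

A "dns-failure" result for a name, on a machine where the corresponding IP address still answers pings, points at the resolver path rather than at the network.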
Bug#851790: installation-reports: DNS not working
Package: installation-reports
Severity: grave
Tags: d-i
Justification: renders package unusable

Dear Maintainer,

The current installer, with the new 4.9 kernel, is unable to resolve
domains, so is quite seriously broken.

This was noted during install on an arm64 gigabyte MP30-AR1
desktop/server, when choose-mirror failed, but it soon became clear
that DNS was not working.

Testing on an x86 VM with the same daily image (18th Jan 2017) found
the same problem. Going back to the rc1 installer image (4.8 kernel)
it works OK.

Tests showed that the network came up fine and things are pingable by IP, but
not name:
# ping wookware.org
ping: bad address 'wookware.org'
# ping 93.93.131.118
PING 93.93.131.118 (93.93.131.118): 56 data bytes
64 bytes from 93.93.131.118: seq=0 ttl=50 time=19.892 ms

Similarly, the failing line from the choose-mirror log works if an address is
inserted:
Jan 18 17:04:11 choose-mirror[31201]: DEBUG: command: wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
Jan 18 17:04:11 choose-mirror[31201]: WARNING **: mirror does not support the specified release (stretch)

# wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
wget: unable to resolve host address 'debian-mirror.cambridge.arm.com'

# wget --no-verbose http://10.1.194.51/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
Suite: testing
Codename: stretch
Architectures: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
2017-01-18 17:47:12 URL:http://10.1.194.51/debian/dists/stretch/Release [177979/177979] -> "-" [1]

resolv.conf is as expected:
search cambridge.arm.com
nameserver 10.1.2.24
nameserver 10.1.2.23

(adding nameserver 8.8.8.8 makes no difference)

Watching packets go by when doing a VM install it is clear that the
local DNS server returns the correct response, but this is being
ignored or lost by the D-I initrd.

Attached is an strace of
  strace ping wookware.org > /tmp/tracelog 2>&1

That gets a response from the server OK, but then goes on to ask the
other one. A working strace then uses the provided IP address. So
there is nothing obviously going wrong there.

-- Package-specific info:

Boot method: USB
Image version: http://gemmei.acc.umu.se/cdimage/daily-builds/daily/arch-latest/arm64/iso-cd/debian-testing-arm64-netinst.iso
Date:

Machine: Gigabyte MP30-AR1, (and x86 VM)
Partitions: (default guided LVM, with separate /home chosen)

Base System Installation Checklist:
[O] = OK, [E] = Error (please elaborate below), [ ] = didn't try it

Initial boot:           [O]
Detect network card:    [O]
Configure network:      [O]
Detect CD:              [O]
Load installer modules: [O]
Clock/timezone setup:   [O]
User/password setup:    [O]
Detect hard drives:     [O]
Partition hard drives:  [O]
Install base system:    [E]
Install tasks:          [ ]
Install boot loader:    [ ]
Overall install:        [ ]

Comments/Problems:

Worked as expected until DNS needed

--

Please make sure that the hardware-summary log file, and any other
installation logs that you think would be useful are attached to this
report. Please compress large files using gzip.

Once you have filled out this report, mail it to sub...@bugs.debian.org.

Removed automatic info, as this report was written on a different machine
from the install.
execve("/bin/ping", ["ping", "wookware.org"], [/* 14 vars */]) = 0
brk(NULL) = 0xc9893000
faccessat(AT_FDCWD, "/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x86677000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/aarch64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/aarch64", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/aarch64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/aarch64", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=360, ...}, 0) = 0
openat(AT_FDCWD,