Bug#851790: installation-reports: DNS not working

2017-01-23 Thread Aurelien Jarno
On 2017-01-23 17:30, Wookey wrote:
> On 2017-01-19 11:04 +0100, Cyril Brulebois wrote:
> > Steve McIntyre  (2017-01-19):
> > > On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> > > >
> > > >The workaround is to make sure the chroots are up-to-date (which should
> > > >be the case now on the build daemons). Another alternative would be to
> > > >avoid copying a library in mklibs if it is already present in the image.
> > > >That might break if some very strict dependencies are used, though
> > > >I guess the way the udebs are downloaded, they should always have the
> > > >same or a newer version than in the chroot.
> > > 
> > > Thanks for the explanation - it's appreciated!
> > 
> > Yeah, thanks for the confirmation.
> 
> OK. I tested today's image (2017-01-23 04:56) and the install went
> through OK, so we are back in sync and this issue is gone for now. It should
> probably be retitled to something about library sync/using host libs
> and left open until it's fixed properly.

I pushed a patch a few days ago that should fix the issue. Well, I
don't know whether it should be considered a fix or a hack, but at least
it looks less of a hack than the existing code...

The long-term solution is clearly to get rid of mklibs entirely. That should
wait until after stretch though, as it requires new udebs from some
packages and thus some coordination.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net




Bug#851790: installation-reports: DNS not working

2017-01-23 Thread Wookey
On 2017-01-19 11:04 +0100, Cyril Brulebois wrote:
> Steve McIntyre  (2017-01-19):
> > On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> > >
> > >The workaround is to make sure the chroots are up-to-date (which should
> > >be the case now on the build daemons). Another alternative would be to
> > >avoid copying a library in mklibs if it is already present in the image.
> > >That might break if some very strict dependencies are used, though
> > >I guess the way the udebs are downloaded, they should always have the
> > >same or a newer version than in the chroot.
> > 
> > Thanks for the explanation - it's appreciated!
> 
> Yeah, thanks for the confirmation.

OK. I tested today's image (2017-01-23 04:56) and the install went
through OK, so we are back in sync and this issue is gone for now. It should
probably be retitled to something about library sync/using host libs
and left open until it's fixed properly.

Wookey
-- 
Principal hats:  Linaro, Debian, Wookware, ARM
http://wookware.org/




Bug#851790: installation-reports: DNS not working

2017-01-19 Thread Cyril Brulebois
Steve McIntyre  (2017-01-19):
> On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
> >On 2017-01-19 01:53, Cyril Brulebois wrote:
> >
> >> It's been a while since I last looked at/understood mklibs stuff though,
> >> feel free to fix my suspicions/conclusions.
> >
> >The long term solution is to package all the libraries into udeb
> >packages. That way we can simply get rid of the mklibs pass.
> >
> >The workaround is to make sure the chroots are up-to-date (which should
> >be the case now on the build daemons). Another alternative would be to
> >avoid copying a library in mklibs if it is already present in the image.
> >That might break if some very strict dependencies are used, though
> >I guess the way the udebs are downloaded, they should always have the
> >same or a newer version than in the chroot.
> 
> Thanks for the explanation - it's appreciated!

Yeah, thanks for the confirmation.

> Is there anything we could do to fail the build if versions are out of
> sync, rather than let a broken build through?

Well, I think Aurélien mentioned it: ensure chroots are up-to-date.
Tweaking the buildscript might do the trick, I suppose. AFAIUI, the
build isn't broken every time there's a divergence in versions anyway;
you're sometimes (un)lucky.

Can't really devote time right now to investigating the “let's not copy
stuff over if it's already present” suggestion…


KiBi.




Bug#851790: installation-reports: DNS not working

2017-01-19 Thread Steve McIntyre
On Thu, Jan 19, 2017 at 08:57:54AM +0100, Aurelien Jarno wrote:
>On 2017-01-19 01:53, Cyril Brulebois wrote:
>
>> It's been a while since I last looked at/understood mklibs stuff though,
>> feel free to fix my suspicions/conclusions.
>
>The long term solution is to package all the libraries into udeb
>packages. That way we can simply get rid of the mklibs pass.
>
>The workaround is to make sure the chroots are up-to-date (which should
>be the case now on the build daemons). Another alternative would be to
>avoid copying a library in mklibs if it is already present in the image.
>That might break if some very strict dependencies are used, though
>I guess the way the udebs are downloaded, they should always have the
>same or a newer version than in the chroot.

Thanks for the explanation - it's appreciated!

Is there anything we could do to fail the build if versions are out of
sync, rather than let a broken build through?
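
A minimal sketch of such a check, assuming the build is allowed to stop early when the libc6 .deb in the build chroot and the libc6-udeb going into the image carry different versions. The function names and the strict-equality policy are assumptions made for illustration and are not part of debian-installer; the only external commands used are dpkg-query -W and dpkg-deb -f.

```python
import subprocess


def installed_version(package):
    """Version of a .deb installed in the build chroot, via dpkg-query."""
    out = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def udeb_file_version(udeb_path):
    """Version field of a downloaded .udeb file, via dpkg-deb."""
    out = subprocess.run(
        ["dpkg-deb", "-f", udeb_path, "Version"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def versions_in_sync(host_version, udeb_version):
    # Assumption: any difference counts as out of sync. This is stricter
    # than necessary, since not every divergence produces a broken image.
    return host_version == udeb_version


def require_in_sync(host_version, udeb_version):
    """Raise RuntimeError instead of letting a mixed build through."""
    if not versions_in_sync(host_version, udeb_version):
        raise RuntimeError(
            f"libc6 .deb is {host_version} but libc6-udeb is {udeb_version}"
        )
```

A build script could call require_in_sync(installed_version("libc6"), udeb_file_version(path)), where path is wherever the udeb was downloaded (a hypothetical location here), and abort on the exception.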

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
< Aardvark> I dislike C++ to start with. C++11 just seems to be
handing rope-creating factories for users to hang multiple
instances of themselves.



Bug#851790: installation-reports: DNS not working

2017-01-19 Thread Aurelien Jarno
On 2017-01-19 01:53, Cyril Brulebois wrote:
> Cyril Brulebois  (2017-01-19):
> > Summing up things from IRC:
> >  - in an up-to-date sid chroot (both development version and minimal one
> >    with daily-build script): no DNS issues with the generated mini.iso
> >    (amd64, tested within stable's kvm on amd64). My image was also
> >    tested successfully by Steve, so not a setup issue.
> > 
> >  - I can reproduce the issue with dailies from 2017-01-18, and from
> >    2017-01-19 (I picked it in advance during its build on barriere).
> > 
> >  - I can reproduce the issue in the above mentioned sid chroot if
> >    I downgrade these packages to testing's version (-9 → -8):
> >      sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8 libc-dev-bin=2.24-8
> > 
> >  - The older set of libc* packages is installed in barriere's current
> >    amd64 sid chroot, too.
> > 
> >  - Steve could only produce broken images when building locally, until
> >    he upgraded his libc* packages as well.
> 
> And since Steve was wondering how we could release something for Stretch
> RC 1 with that… I've just confirmed that using -8 .deb with a -8 .udeb
> (reversioned as -9+hack so that its being dropped under localudebs makes
> it take precedence over sid's .udeb) leads to an image that's working
> fine. And AFAICT that's the combination we had at the time.
> 
> I suspect the implementation change through the following patch in -9:
> glibc-2.24/debian/patches/any/cvs-resolv-internal-qtype.diff

Indeed. I haven't done any testing, but looking at the code, the
changes assume that libresolv.so.2 and libnss_dns.so.2 stay in sync.

> plus 283e7a294275a7da53258600deaaafbbec6b96c1 in debian-installer.git is
> what is triggering the issue? (Explaining why host's libc .deb have an
> impact on the built images.)

Indeed, this was done to avoid crashes when libc6-udeb is
re-installed at the beginning of the installer process, which often caused
crashes in case of a version mismatch (because the libnss* libraries were
in a different package).

Now libc6-udeb is unpacked while building the image, but given that we still
have half a dozen libraries without a corresponding udeb,
mklibs copies them onto the image, along with their dependencies, which
presumably include libresolv.so.2.

> It's been a while since I last looked at/understood mklibs stuff though,
> feel free to fix my suspicions/conclusions.

The long term solution is to package all the libraries into udeb
packages. That way we can simply get rid of the mklibs pass.

The workaround is to make sure the chroots are up-to-date (which should
be the case now on the build daemons). Another alternative would be to
avoid copying a library in mklibs if it is already present in the image.
That might break if some very strict dependencies are used, though
I guess the way the udebs are downloaded, they should always have the
same or a newer version than in the chroot.
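
The "avoid copying a library if it is already present in the image" alternative could look roughly like the following. This is only a sketch under stated assumptions: it is not mklibs's actual code, and the function name and the flat directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def copy_missing_libraries(host_lib_dir, image_lib_dir, names):
    """Copy each named library from the host into the image unless the
    image already has a file of that name (e.g. unpacked from a udeb).

    Returns the list of names that were actually copied.
    """
    host = Path(host_lib_dir)
    image = Path(image_lib_dir)
    copied = []
    for name in names:
        target = image / name
        if target.exists():
            # Keep the copy that came from the udeb; do not overwrite it
            # with the host chroot's (possibly older) version.
            continue
        shutil.copy2(host / name, target)
        copied.append(name)
    return copied
```

With this policy a libresolv.so.2 that was unpacked from libc6-udeb would stay in place, and only libraries that have no udeb would be taken from the chroot. As noted above, that could still break if very strict dependencies are used.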

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net




Bug#851790: installation-reports: DNS not working

2017-01-18 Thread Cyril Brulebois
Cyril Brulebois  (2017-01-19):
> Summing up things from IRC:
>  - in an up-to-date sid chroot (both development version and minimal one
>    with daily-build script): no DNS issues with the generated mini.iso
>    (amd64, tested within stable's kvm on amd64). My image was also
>    tested successfully by Steve, so not a setup issue.
> 
>  - I can reproduce the issue with dailies from 2017-01-18, and from
>    2017-01-19 (I picked it in advance during its build on barriere).
> 
>  - I can reproduce the issue in the above mentioned sid chroot if
>    I downgrade these packages to testing's version (-9 → -8):
>      sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8 libc-dev-bin=2.24-8
> 
>  - The older set of libc* packages is installed in barriere's current
>    amd64 sid chroot, too.
> 
>  - Steve could only produce broken images when building locally, until
>    he upgraded his libc* packages as well.

And since Steve was wondering how we could release something for Stretch
RC 1 with that… I've just confirmed that using -8 .deb with a -8 .udeb
(reversioned as -9+hack so that its being dropped under localudebs makes
it take precedence over sid's .udeb) leads to an image that's working
fine. And AFAICT that's the combination we had at the time.

I suspect that the implementation change made by the following patch in -9:
glibc-2.24/debian/patches/any/cvs-resolv-internal-qtype.diff

plus 283e7a294275a7da53258600deaaafbbec6b96c1 in debian-installer.git is
what triggers the issue. (That would explain why the host's libc .deb has an
impact on the built images.)

It's been a while since I last looked at/understood mklibs stuff though,
feel free to fix my suspicions/conclusions.


KiBi.




Bug#851790: installation-reports: DNS not working

2017-01-18 Thread Cyril Brulebois
Cyril Brulebois  (2017-01-19):
> You can try downgrading the kernel, but I'd usually look at glibc first
> for DNS related issues.

Summing up things from IRC:
 - in an up-to-date sid chroot (both development version and minimal one
   with daily-build script): no DNS issues with the generated mini.iso
   (amd64, tested within stable's kvm on amd64). My image was also
   tested successfully by Steve, so not a setup issue.

 - I can reproduce the issue with dailies from 2017-01-18, and from
   2017-01-19 (I picked it in advance during its build on barriere).

 - I can reproduce the issue in the above mentioned sid chroot if
   I downgrade these packages to testing's version (-9 → -8):
     sudo apt-get install libc6:amd64=2.24-8 libc6-dev:amd64=2.24-8 libc-dev-bin=2.24-8

 - The older set of libc* packages is installed in barriere's current
   amd64 sid chroot, too.

 - Steve could only produce broken images when building locally, until
   he upgraded his libc* packages as well.


→ Copying glibc@packages for the time being, even though I suspect this will
  likely disappear entirely once -9 propagates…


KiBi.




Bug#851790: installation-reports: DNS not working

2017-01-18 Thread Cyril Brulebois
Steve McIntyre  (2017-01-18):
> For the sake of completeness, I can confirm that the same problem
> shows up when running on a different network too. I also see this
> using the oldest amd64 daily I can grab (from 2017-01-17) which has
> the 4.9 kernel too.
> 
> I'll see if I can debug this, but I'm not sure where to look straight
> away.

You can try downgrading the kernel, but I'd usually look at glibc first
for DNS related issues.


KiBi.




Bug#851790: installation-reports: DNS not working

2017-01-18 Thread Steve McIntyre
On Wed, Jan 18, 2017 at 06:43:33PM +, Wookey wrote:
>Package: installation-reports
>Severity: grave
>Tags: d-i
>Justification: renders package unusable
>
>Dear Maintainer,
>
>The current installer, with the new 4.9 kernel, is unable to resolve
>domains, so is quite seriously broken.

hosts, not domains, but yes...

>This was noted during install on an arm64 gigabyte MP30-AR1
>desktop/server, when choose-mirror failed, but it soon became clear
>that DNS was not working.
>
>Testing on an x86 VM with the same daily image (18th Jan 2017) found
>the same problem. Going back to the rc1 installer image (4.8 kernel)
>it works OK.
>
>Tests showed that the network came up fine and things are pingable by IP, but not name:
># ping wookware.org
>ping: bad address 'wookware.org'
># ping 93.93.131.118
>PING 93.93.131.118 (93.93.131.118): 56 data bytes
>64 bytes from 93.93.131.118: seq=0 ttl=50 time=19.892 ms
>
>Similarly, the failing line from the choose-mirror log works if an address is inserted:
>Jan 18 17:04:11 choose-mirror[31201]: DEBUG: command: wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
>
>Jan 18 17:04:11 choose-mirror[31201]: WARNING **: mirror does not support the specified release (stretch)
>
># wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
>wget: unable to resolve host address 'debian-mirror.cambridge.arm.com'
>
># wget --no-verbose http://10.1.194.51/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
>Suite: testing
>Codename: stretch
>Architectures: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
>2017-01-18 17:47:12 URL:http://10.1.194.51/debian/dists/stretch/Release [177979/177979] -> "-" [1]
>
>resolv.conf is as expected:
>search cambridge.arm.com
>nameserver 10.1.2.24
>nameserver 10.1.2.23
>
>(adding nameserver 8.8.8.8 makes no difference)
>
>Watching packets go by when doing a VM install it is clear that the
>local DNS server returns the correct response, but this is being ignored
>or lost by the D-I initrd.

For the sake of completeness, I can confirm that the same problem
shows up when running on a different network too. I also see this
using the oldest amd64 daily I can grab (from 2017-01-17) which has
the 4.9 kernel too.

I'll see if I can debug this, but I'm not sure where to look straight
away.

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
"Managing a volunteer open source project is a lot like herding
 kittens, except the kittens randomly appear and disappear because they
 have day jobs." -- Matt Mackall



Bug#851790: installation-reports: DNS not working

2017-01-18 Thread Wookey
Package: installation-reports
Severity: grave
Tags: d-i
Justification: renders package unusable

Dear Maintainer,

The current installer, with the new 4.9 kernel, is unable to resolve
domains, so is quite seriously broken.

This was noted during install on an arm64 gigabyte MP30-AR1
desktop/server, when choose-mirror failed, but it soon became clear
that DNS was not working.

Testing on an x86 VM with the same daily image (18th Jan 2017) found
the same problem. Going back to the rc1 installer image (4.8 kernel)
it works OK.

Tests showed that the network came up fine and things are pingable by IP, but not name:
# ping wookware.org
ping: bad address 'wookware.org'
# ping 93.93.131.118
PING 93.93.131.118 (93.93.131.118): 56 data bytes
64 bytes from 93.93.131.118: seq=0 ttl=50 time=19.892 ms

Similarly, the failing line from the choose-mirror log works if an address is inserted:
Jan 18 17:04:11 choose-mirror[31201]: DEBUG: command: wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'

Jan 18 17:04:11 choose-mirror[31201]: WARNING **: mirror does not support the specified release (stretch)

# wget --no-verbose http://debian-mirror.cambridge.arm.com/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
wget: unable to resolve host address 'debian-mirror.cambridge.arm.com'

# wget --no-verbose http://10.1.194.51/debian/dists/stretch/Release -O - | grep -E '^(Suite|Codename|Architectures):'
Suite: testing
Codename: stretch
Architectures: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
2017-01-18 17:47:12 URL:http://10.1.194.51/debian/dists/stretch/Release [177979/177979] -> "-" [1]

resolv.conf is as expected:
search cambridge.arm.com
nameserver 10.1.2.24
nameserver 10.1.2.23

(adding nameserver 8.8.8.8 makes no difference)

Watching packets go by when doing a VM install it is clear that the
local DNS server returns the correct response, but this is being ignored
or lost by the D-I initrd.

Attached is the output of: strace ping wookware.org > /tmp/tracelog 2>&1

That gets a response from the server OK, but then goes on to ask the
other one. A working strace then uses the provided IP address. So
there is nothing obviously going wrong there.
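
The "reachable by IP but not by name" diagnosis above can be sketched as a small check, assuming a system where Python is available (which the installer environment itself is not). The function name and the result strings are hypothetical; the resolver and connector are injectable so that the logic can be exercised without a network.

```python
import socket


def classify(hostname, ip_address, port=80, timeout=5.0,
             resolve=socket.getaddrinfo,
             connect=socket.create_connection):
    """Tell a name-resolution failure apart from a general network failure."""
    try:
        # Reachability by address: needs no DNS lookup at all.
        connect((ip_address, port), timeout).close()
        ip_ok = True
    except OSError:
        ip_ok = False
    try:
        # Name lookup through the C library resolver.
        resolve(hostname, port)
        name_ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        name_ok = False

    if ip_ok and name_ok:
        return "ok"
    if ip_ok:
        return "name resolution broken, network reachable"
    if name_ok:
        return "name resolves, address unreachable"
    return "network unreachable"
```

For the case in this report, classify("wookware.org", "93.93.131.118") would be expected to fall into the "name resolution broken, network reachable" branch, assuming a TCP connection to port 80 on that address succeeds the way the ping did.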

-- Package-specific info:

Boot method: USB
Image version: http://gemmei.acc.umu.se/cdimage/daily-builds/daily/arch-latest/arm64/iso-cd/debian-testing-arm64-netinst.iso
Date: 

Machine: Gigabyte MP30-AR1, (and x86 VM)
Partitions: (default guided LVM, with separate /home chosen)


Base System Installation Checklist:
[O] = OK, [E] = Error (please elaborate below), [ ] = didn't try it

Initial boot:           [O]
Detect network card:    [O]
Configure network:      [O]
Detect CD:              [O]
Load installer modules: [O]
Clock/timezone setup:   [O]
User/password setup:    [O]
Detect hard drives:     [O]
Partition hard drives:  [O]
Install base system:    [E]
Install tasks:          [ ]
Install boot loader:    [ ]
Overall install:        [ ]

Comments/Problems:

Worked as expected until DNS needed

-- 

Please make sure that the hardware-summary log file, and any other
installation logs that you think would be useful are attached to this
report. Please compress large files using gzip.

Once you have filled out this report, mail it to sub...@bugs.debian.org.

Removed automatic info, as this report was written on a different machine from the install.
execve("/bin/ping", ["ping", "wookware.org"], [/* 14 vars */]) = 0
brk(NULL)   = 0xc9893000
faccessat(AT_FDCWD, "/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x86677000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/aarch64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/aarch64", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/tls", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/aarch64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu/aarch64", 0xc66b2660, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/lib/aarch64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=360, ...}, 0) = 0
openat(AT_FDCWD,