Bug#778720: bind9: hangs / crashes on mips after some time (Was: Bug#839010: bug still there in jessie (mips))
Control: retitle -1 bind9: hangs / crashes on mips after some time
Control: severity -1 grave

[Moving to the main MIPS bug]

Hi,

On 12/04/17 15:37, James Cowgill wrote:
> On 11/04/17 17:46, Steve Arnold wrote:
>> On Wed, 5 Apr 2017 21:11:43 +0100
>> James Cowgill wrote:
>>> On 05/04/17 20:31, Steve Arnold wrote:
>>>> This is still a problem for mips/mipsel but stretch has the
>>>> upstream fixes. Can you please add the stretch bind9 packages
>>>> to jessie-backports? I'm building it now on edgerouter (albeit
>>>> slowly) but a lot of other people running on this hardware could
>>>> benefit from the fixes.
>>>>
>>>> Thanks in advance...
>>>
>>> This bug should already be fixed in jessie. Do you have the latest
>>> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?
>>
>> That version has the worst of it, at least it's not consistent when
>> it fails (different file names, etc). After updating all the way
>> to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
>> it just takes a few more hours before it dies:
>>
>> 11-Apr-2017 05:41:03.304 general:
>> critical: ../../../lib/dns/rbtdb.c:9788:
>> INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header))
>> failed 11-Apr-2017 05:41:03.305 general: critical: exiting (due to
>> assertion failure)
>
> I've managed to reproduce this fairly reliably (usually within a minute)
> by sending massive amounts of DNS queries to bind9.
>
> The only MIPS specific bug I am aware of is #778720 which might be
> causing this. There is a patch here which you can try and I'll also have
> a look and see if I can fix it:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778720#15
>
> If that's not the cause, I'll open a new bug for this.

As discussed in #839010 and after looking at this a bit more, I think
this bug should be release-critical for stretch. Although I can get
bind9 to start working on MIPS, it usually crashes after a few minutes
with it more likely to crash under load.

The bug is related to atomics issues in libisc on MIPS. There was a
patch in this bug to fix the MIPS assembly bits, but while that does fix
the hangs it isn't enough to fix the crashes. I created a test program
which tests the rwlock from libisc and managed to get a situation where
2 threads were both inside the same exclusive lock which obviously
should not happen. I expect the weak memory ordering of MIPS has
something to do with it (which would explain why this doesn't happen on
x86 and other arches).

I'll look at this over the coming week, but as a workaround, passing
"--disable-atomic" on MIPS does resolve the crashes presumably at a
performance cost. I also note that the arm* and s390x have no atomics
code at all so it's not like MIPS will be the only arch without atomics
support.

Thanks,
James
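For readers who want to try the same kind of check, here is a minimal sketch of a test that detects two threads inside one exclusive lock at the same time. It is not the test program mentioned above and it does not use libisc: the lock is a stand-in C11 spinlock, and the names `excl_lock`, `excl_unlock` and `run_exclusion_test` are made up for illustration. The explicit acquire/release orderings on the lock are the kind of barrier a weakly ordered architecture needs; to test a real lock, the two stand-in functions would be replaced with calls into the lock under test.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Stand-in exclusive lock (an assumption, NOT libisc's rwlock). */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static atomic_int inside;      /* threads currently in the critical section */
static atomic_int violations;  /* times a thread entered while another was inside */

static void excl_lock(void) {
    /* Acquire ordering: later accesses cannot move before taking the lock. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        sched_yield();
}

static void excl_unlock(void) {
    /* Release ordering: earlier accesses cannot move after dropping the lock. */
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        excl_lock();
        /* If the previous value is non-zero, someone else is also inside. */
        if (atomic_fetch_add(&inside, 1) != 0)
            atomic_fetch_add(&violations, 1);
        atomic_fetch_sub(&inside, 1);
        excl_unlock();
    }
    return NULL;
}

/* Returns the number of observed violations, or -1 if a thread failed to start. */
int run_exclusion_test(int nthreads, int iters) {
    pthread_t tids[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    atomic_store(&inside, 0);
    atomic_store(&violations, 0);

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &iters) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? atomic_load(&violations) : -1;
}
```

Calling `run_exclusion_test(4, 20000)` from a small `main` (built with `-pthread`) should return 0 for a correct lock; a non-zero result means mutual exclusion was broken at least once. A clean run does not prove the lock correct, since races of this kind may only show up under sustained load.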
Bug#839010: bug still there in jessie (mips)
Hi,

On 11/04/17 17:46, Steve Arnold wrote:
> On Wed, 5 Apr 2017 21:11:43 +0100
> James Cowgill wrote:
>
>> Hi,
>>
>> On 05/04/17 20:31, Steve Arnold wrote:
>>> This is still a problem for mips/mipsel but stretch has the
>>> upstream fixes. Can you please add the stretch bind9 packages
>>> to jessie-backports? I'm building it now on edgerouter (albeit
>>> slowly) but a lot of other people running on this hardware could
>>> benefit from the fixes.
>>>
>>> Thanks in advance...
>>
>> This bug should already be fixed in jessie. Do you have the latest
>> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?
>
> That version has the worst of it, at least it's not consistent when
> it fails (different file names, etc). After updating all the way
> to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
> it just takes a few more hours before it dies:
>
> 11-Apr-2017 05:41:03.304 general:
> critical: ../../../lib/dns/rbtdb.c:9788:
> INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header))
> failed 11-Apr-2017 05:41:03.305 general: critical: exiting (due to
> assertion failure)

I've managed to reproduce this fairly reliably (usually within a minute)
by sending massive amounts of DNS queries to bind9.

The only MIPS specific bug I am aware of is #778720 which might be
causing this. There is a patch here which you can try and I'll also have
a look and see if I can fix it:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778720#15

If that's not the cause, I'll open a new bug for this.

> It does not fail like this on x86 or ARM, but the other hosts I
> have to compare are running Gentoo (mainly hardened profile) and we
> still provide libdb4.8 (which is what bind links against on Gentoo
> instead of libdb5.3-whatever on debian).

#778720 is Debian specific which would explain why Gentoo works fine
(even on MIPS).

Thanks,
James
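The message above does not say which tool or query mix was used to generate the load. As a rough sketch only, a test server you control could be loaded with a small UDP sender like the one below. The function names `build_query` and `flood_queries` are invented for this example, and it sends the same A/IN question repeatedly, which may exercise the server differently from whatever was actually used.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Encode a minimal A/IN query for `name` into buf.
 * Returns the packet length, or 0 if the name is invalid or buf too small. */
size_t build_query(uint8_t *buf, size_t cap, uint16_t id, const char *name) {
    assert(buf != NULL && name != NULL);
    size_t nlen = strlen(name);
    /* 12-byte header + encoded name (at most nlen + 2) + QTYPE/QCLASS. */
    if (nlen == 0 || nlen > 253 || cap < 12 + nlen + 2 + 4)
        return 0;

    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;                      /* flags: recursion desired */
    buf[5] = 1;                         /* QDCOUNT = 1 */

    size_t pos = 12;
    const char *p = name;
    while (*p) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63)
            return 0;                   /* empty or oversized label */
        buf[pos++] = (uint8_t)len;      /* length-prefixed label */
        memcpy(buf + pos, p, len);
        pos += len;
        p += len;
        if (*p == '.')
            p++;
    }
    buf[pos++] = 0;                     /* root label terminates the name */
    buf[pos++] = 0; buf[pos++] = 1;     /* QTYPE  = A  */
    buf[pos++] = 0; buf[pos++] = 1;     /* QCLASS = IN */
    return pos;
}

/* Send `count` queries to ip:port over UDP without waiting for replies.
 * Returns how many were sent, or -1 on setup failure. */
long flood_queries(const char *ip, uint16_t port, const char *name, long count) {
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    uint8_t pkt[512];
    long sent = 0;
    for (long i = 0; i < count; i++) {
        size_t n = build_query(pkt, sizeof pkt, (uint16_t)i, name);
        if (n == 0)
            break;
        if (sendto(fd, pkt, n, 0, (struct sockaddr *)&dst, sizeof dst) == (ssize_t)n)
            sent++;
    }
    close(fd);
    return sent;
}
```

For example, `flood_queries("127.0.0.1", 53, "example.com", 1000000)` would send a million queries to a local test instance. Running several copies in parallel, or varying the queried name, is probably closer to a realistic load.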
Bug#839010: bug still there in jessie (mips)
On Wed, 5 Apr 2017 21:11:43 +0100
James Cowgill wrote:

> Hi,
>
> On 05/04/17 20:31, Steve Arnold wrote:
> > This is still a problem for mips/mipsel but stretch has the
> > upstream fixes. Can you please add the stretch bind9 packages
> > to jessie-backports? I'm building it now on edgerouter (albeit
> > slowly) but a lot of other people running on this hardware could
> > benefit from the fixes.
> >
> > Thanks in advance...
>
> This bug should already be fixed in jessie. Do you have the latest
> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?

That version has the worst of it, at least it's not consistent when
it fails (different file names, etc). After updating all the way
to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
it just takes a few more hours before it dies:

11-Apr-2017 05:41:03.304 general:
critical: ../../../lib/dns/rbtdb.c:9788:
INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header))
failed 11-Apr-2017 05:41:03.305 general: critical: exiting (due to
assertion failure)

It does not fail like this on x86 or ARM, but the other hosts I
have to compare are running Gentoo (mainly hardened profile) and we
still provide libdb4.8 (which is what bind links against on Gentoo
instead of libdb5.3-whatever on debian).

If you have an easy way to downgrade libdb or some other fix for
bind, that would be awesome, otherwise I am running out of ideas...

Thanks,
Steve
Bug#839010: bug still there in jessie (mips)
Hi,

On 05/04/17 20:31, Steve Arnold wrote:
> This is still a problem for mips/mipsel but stretch has the
> upstream fixes. Can you please add the stretch bind9 packages to
> jessie-backports? I'm building it now on edgerouter (albeit
> slowly) but a lot of other people running on this hardware could
> benefit from the fixes.
>
> Thanks in advance...

This bug should already be fixed in jessie. Do you have the latest
version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?

James
Bug#839010: bug still there in jessie (mips)
This is still a problem for mips/mipsel but stretch has the upstream
fixes. Can you please add the stretch bind9 packages to
jessie-backports? I'm building it now on edgerouter (albeit slowly) but
a lot of other people running on this hardware could benefit from the
fixes.

Thanks in advance...

-- 
Stephen L. Arnold
Principal Scientist / System Architect
sarn...@vctlabs.com
Vanguard Computer Technology Labs, Inc.
http://www.vctlabs.com
81 David Love Pl #212    mobile: (805) 863-8299
Goleta, CA 93117         lab: (805) 683-3503