Bug#778720: bind9: hangs / crashes on mips after some time (Was: Bug#839010: bug still there in jessie (mips))

2017-04-17 Thread James Cowgill
Control: retitle -1 bind9: hangs / crashes on mips after some time
Control: severity -1 grave

[Moving to the main MIPS bug]

Hi,

On 12/04/17 15:37, James Cowgill wrote:
> On 11/04/17 17:46, Steve Arnold wrote:
>> On Wed, 5 Apr 2017 21:11:43 +0100
>> James Cowgill  wrote:
>>> On 05/04/17 20:31, Steve Arnold wrote:
>>>> This is still a problem for mips/mipsel but stretch has the
>>>> upstream fixes.  Can you please add the stretch bind9 packages
>>>> to jessie-backports?  I'm building it now on edgerouter (albeit
>>>> slowly) but a lot of other people running on this hardware could
>>>> benefit from the fixes.
>>>>
>>>> Thanks in advance...
>>>
>>> This bug should already be fixed in jessie. Do you have the latest
>>> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?
>>
>> That version has the worst of it, at least it's not consistent when
>> it fails (different file names, etc).  After updating all the way
>> to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
>> it just takes a few more hours before it dies:
>>
>> 11-Apr-2017 05:41:03.304 general: critical: ../../../lib/dns/rbtdb.c:9788: INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header)) failed
>> 11-Apr-2017 05:41:03.305 general: critical: exiting (due to assertion failure)
> 
> I've managed to reproduce this fairly reliably (usually within a minute)
> by sending massive amounts of DNS queries to bind9.
> 
> The only MIPS specific bug I am aware of is #778720 which might be
> causing this. There is a patch here which you can try and I'll also have
> a look and see if I can fix it:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778720#15
> 
> If that's not the cause, I'll open a new bug for this.

As discussed in #839010, and after looking at this a bit more, I think
this bug should be release-critical for stretch. Although I can get
bind9 to start working on MIPS, it usually crashes after a few minutes,
and it is more likely to crash under load.

The bug is related to atomics issues in libisc on MIPS. There was a
patch in this bug to fix the MIPS assembly bits, but while that does fix
the hangs it isn't enough to fix the crashes. I created a test program
which tests the rwlock from libisc and managed to get a situation where
2 threads were both inside the same exclusive lock which obviously
should not happen. I expect the weak memory ordering of MIPS has
something to do with it (which would explain why this doesn't happen on
x86 and other arches).
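
For readers who want to see what such a check looks like, here is a
minimal sketch of the invariant being tested. It is not the test program
mentioned above: that one exercised the libisc rwlock on MIPS hardware,
whereas this stand-in uses Python's threading.Lock, and the function name
count_exclusion_violations is made up for illustration. Because of
Python's interpreter lock it cannot reproduce a memory-ordering bug; it
only shows the "never more than one thread inside" check.

```python
import threading


def count_exclusion_violations(lock, threads=4, iterations=10_000):
    """Return how many times a thread found company inside the lock.

    `lock` is any context manager that is supposed to give exclusive
    access. A correct exclusive lock must always yield 0.
    """
    inside = 0      # number of threads currently in the critical section
    violations = 0  # times the exclusivity invariant was seen broken

    def worker():
        nonlocal inside, violations
        for _ in range(iterations):
            with lock:
                inside += 1
                if inside != 1:
                    # Another thread is inside the same exclusive lock.
                    violations += 1
                inside -= 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return violations


if __name__ == "__main__":
    print(count_exclusion_violations(threading.Lock()))
```

Any non-zero result would correspond to the situation described above,
where two threads hold the same exclusive lock at once.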

I'll look at this over the coming week, but as a workaround, passing
"--disable-atomic" on MIPS does resolve the crashes, presumably at a
performance cost. I also note that arm* and s390x have no atomics code
at all, so MIPS would not be the only arch without atomics support.

Thanks,
James





Bug#839010: bug still there in jessie (mips)

2017-04-12 Thread James Cowgill
Hi,

On 11/04/17 17:46, Steve Arnold wrote:
> On Wed, 5 Apr 2017 21:11:43 +0100
> James Cowgill  wrote:
> 
>> Hi,
>>
>> On 05/04/17 20:31, Steve Arnold wrote:
>>> This is still a problem for mips/mipsel but stretch has the
>>> upstream fixes.  Can you please add the stretch bind9 packages
>>> to jessie-backports?  I'm building it now on edgerouter (albeit
>>> slowly) but a lot of other people running on this hardware could
>>> benefit from the fixes.
>>>
>>> Thanks in advance...  
>>
>> This bug should already be fixed in jessie. Do you have the latest
>> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?
> 
> That version has the worst of it, at least it's not consistent when
> it fails (different file names, etc).  After updating all the way
> to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
> it just takes a few more hours before it dies:
> 
> 11-Apr-2017 05:41:03.304 general: critical: ../../../lib/dns/rbtdb.c:9788: INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header)) failed
> 11-Apr-2017 05:41:03.305 general: critical: exiting (due to assertion failure)

I've managed to reproduce this fairly reliably (usually within a minute)
by sending massive amounts of DNS queries to bind9.
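
A rough sketch of this kind of load generator follows. It is an
assumption about how one might do it, not the tool actually used here:
the server address 127.0.0.1:53, the random names under example.com, and
the helper names build_query and flood are all hypothetical. It builds
plain RFC 1035 A queries by hand and fires them over UDP without waiting
for replies.

```python
import random
import socket
import struct


def build_query(name, qtype=1, query_id=None):
    """Build a minimal DNS query packet (RFC 1035) for `name`."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def flood(server, port, count):
    """Send `count` queries for random names; replies are ignored."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            # Hypothetical names; pick a zone the server under test handles.
            name = "host%d.example.com" % random.randint(0, 999999)
            sock.sendto(build_query(name), (server, port))
            sent += 1
    return sent


if __name__ == "__main__":
    # Assumed target: a bind9 instance listening on localhost.
    print(flood("127.0.0.1", 53, 100_000))
```

Only point something like this at a server you control.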

The only MIPS-specific bug I am aware of is #778720, which might be
causing this. There is a patch in that bug which you can try, and I'll
also have a look to see if I can fix it:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778720#15

If that's not the cause, I'll open a new bug for this.

> It does not fail like this on x86 or ARM, but the other hosts I
> have to compare are running Gentoo (mainly hardened profile) and we
> still provide libdb4.8 (which is what bind links against on Gentoo
> instead of libdb5.3-whatever on debian).

#778720 is Debian specific which would explain why Gentoo works fine
(even on MIPS).

Thanks,
James





Bug#839010: bug still there in jessie (mips)

2017-04-11 Thread Steve Arnold
On Wed, 5 Apr 2017 21:11:43 +0100
James Cowgill  wrote:

> Hi,
> 
> On 05/04/17 20:31, Steve Arnold wrote:
> > This is still a problem for mips/mipsel but stretch has the
> > upstream fixes.  Can you please add the stretch bind9 packages
> > to jessie-backports?  I'm building it now on edgerouter (albeit
> > slowly) but a lot of other people running on this hardware could
> > benefit from the fixes.
> > 
> > Thanks in advance...  
> 
> This bug should already be fixed in jessie. Do you have the latest
> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?

That version has the worst of it, at least it's not consistent when
it fails (different file names, etc).  After updating all the way
to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
it just takes a few more hours before it dies:

11-Apr-2017 05:41:03.304 general: critical: ../../../lib/dns/rbtdb.c:9788: INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header)) failed
11-Apr-2017 05:41:03.305 general: critical: exiting (due to assertion failure)

It does not fail like this on x86 or ARM, but the other hosts I
have to compare are running Gentoo (mainly hardened profile) and we
still provide libdb4.8 (which is what bind links against on Gentoo
instead of libdb5.3-whatever on debian).

If you have an easy way to downgrade libdb or some other fix for
bind, that would be awesome, otherwise I am running out of ideas...

Thanks, Steve





Bug#839010: bug still there in jessie (mips)

2017-04-05 Thread James Cowgill
Hi,

On 05/04/17 20:31, Steve Arnold wrote:
> This is still a problem for mips/mipsel but stretch has the
> upstream fixes.  Can you please add the stretch bind9 packages to
> jessie-backports?  I'm building it now on edgerouter (albeit
> slowly) but a lot of other people running on this hardware could
> benefit from the fixes.
> 
> Thanks in advance...

This bug should already be fixed in jessie. Do you have the latest
version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?

James






Bug#839010: bug still there in jessie (mips)

2017-04-05 Thread Steve Arnold
This is still a problem for mips/mipsel but stretch has the
upstream fixes.  Can you please add the stretch bind9 packages to
jessie-backports?  I'm building it now on edgerouter (albeit
slowly) but a lot of other people running on this hardware could
benefit from the fixes.

Thanks in advance...

-- 
Stephen L. Arnold
Principal Scientist / System Architect   sarn...@vctlabs.com
Vanguard Computer Technology Labs, Inc.   http://www.vctlabs.com
81 David Love Pl #212   mobile:  (805) 863-8299
Goleta, CA 93117   lab:  (805) 683-3503