Control: retitle -1 bind9: hangs / crashes on mips after some time
Control: severity -1 grave

[Moving to the main MIPS bug]

Hi,

On 12/04/17 15:37, James Cowgill wrote:
> On 11/04/17 17:46, Steve Arnold wrote:
>> On Wed, 5 Apr 2017 21:11:43 +0100
>> James Cowgill <jcowg...@debian.org> wrote:
>>> On 05/04/17 20:31, Steve Arnold wrote:
>>>> This is still a problem for mips/mipsel but stretch has the
>>>> upstream fixes.  Can you please add the stretch bind9 packages
>>>> to jessie-backports?  I'm building it now on edgerouter (albeit
>>>> slowly) but a lot of other people running on this hardware could
>>>> benefit from the fixes.
>>>>
>>>> Thanks in advance...  
>>>
>>> This bug should already be fixed in jessie. Do you have the latest
>>> version from jessie-security (1:9.9.5.dfsg-9+deb8u10)?
>>
>> That version has the worst of it, at least it's not consistent when
>> it fails (different file names, etc).  After updating all the way
>> to 9.10.4-P5 (plus bumping libdb) it still has the INSIST failure;
>> it just takes a few more hours before it dies:
>>
>> 11-Apr-2017 05:41:03.304 general: critical: ../../../lib/dns/rbtdb.c:9788: INSIST((rbtdb->rdatasets[header->node->locknum]).head != (header)) failed
>> 11-Apr-2017 05:41:03.305 general: critical: exiting (due to assertion failure)
> 
> I've managed to reproduce this fairly reliably (usually within a minute)
> by sending massive amounts of DNS queries to bind9.
> 
> The only MIPS specific bug I am aware of is #778720 which might be
> causing this. There is a patch here which you can try and I'll also have
> a look and see if I can fix it:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778720#15
> 
> If that's not the cause, I'll open a new bug for this.
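
For anyone wanting to try a similar reproduction, here is a minimal, hypothetical C sketch of a load generator. It hand-builds a standard DNS query packet (type A, class IN, recursion desired) and sends a burst of them over UDP without waiting for replies. The names `build_dns_query` and `send_query_burst` are illustrative only; this is not the tool used above. Point it only at a server you control.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes a DNS query for `name` (QTYPE A, QCLASS IN,
   RD flag set) into buf. Returns the packet length, or 0 if the name is
   malformed or buf is too small. */
static size_t build_dns_query(uint8_t *buf, size_t cap, uint16_t id,
                              const char *name) {
    size_t n = 0;
    if (cap < 12)
        return 0;
    buf[n++] = id >> 8; buf[n++] = id & 0xff;   /* ID */
    buf[n++] = 0x01; buf[n++] = 0x00;           /* flags: RD only */
    buf[n++] = 0; buf[n++] = 1;                 /* QDCOUNT = 1 */
    memset(buf + n, 0, 6); n += 6;              /* ANCOUNT, NSCOUNT, ARCOUNT */
    /* QNAME: each dot-separated label is prefixed by its length */
    const char *p = name;
    while (*p) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63 || n + 1 + len > cap)
            return 0;
        buf[n++] = (uint8_t)len;
        memcpy(buf + n, p, len); n += len;
        p += len;
        if (*p == '.')
            p++;
    }
    if (n - 12 + 1 > 255 || n + 5 > cap)        /* QNAME limit, room for tail */
        return 0;
    buf[n++] = 0;                               /* root label */
    buf[n++] = 0; buf[n++] = 1;                 /* QTYPE = A */
    buf[n++] = 0; buf[n++] = 1;                 /* QCLASS = IN */
    return n;
}

/* Hypothetical helper: sends `count` queries for `name` to ip:port over UDP,
   fire-and-forget. Returns the number of datagrams sent, or -1 on setup error. */
static long send_query_burst(const char *ip, uint16_t port, const char *name,
                             long count) {
    uint8_t pkt[512];
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    long sent = 0;
    for (long i = 0; i < count; i++) {
        size_t len = build_dns_query(pkt, sizeof pkt, (uint16_t)i, name);
        if (len == 0)
            break;
        if (sendto(fd, pkt, len, 0, (struct sockaddr *)&dst, sizeof dst)
            == (ssize_t)len)
            sent++;
    }
    close(fd);
    return sent;
}
```

Calling `send_query_burst("127.0.0.1", 53, "example.com", 1000000)` from a few processes at once against a local bind9 instance roughly approximates a heavy query load.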

As discussed in #839010, and after looking at this a bit more, I think
this bug should be release-critical for stretch. Although I can get
bind9 to start on MIPS, it usually crashes after a few minutes, and it
is more likely to crash under load.

The bug is related to problems with the atomic operations in libisc on
MIPS. There was a patch in this bug to fix the MIPS assembly bits, but
while that does fix the hangs, it isn't enough to fix the crashes. I
wrote a test program that exercises the rwlock from libisc and managed
to reach a state where two threads were inside the same lock in
exclusive mode at the same time, which obviously should never happen. I
suspect the weak memory ordering of MIPS has something to do with it,
which would explain why this doesn't happen on x86 and other arches.
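
The test program itself isn't attached here. As a rough illustration of the kind of check involved, the sketch below hammers a lock from several threads and counts how often more than one thread is inside the critical section at once. It assumes C11 atomics and POSIX threads; the lock under test is a hypothetical test-and-set spinlock standing in for an exclusive acquisition of libisc's rwlock, not libisc code. Its acquire/release orderings are the kind of constraint a weakly ordered CPU such as MIPS only honours when the code asks for them explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock under test: a test-and-set spinlock standing in for an
   exclusive (write) acquisition of the real rwlock. */
typedef struct {
    atomic_bool held;
} demo_spinlock;

static void demo_lock(demo_spinlock *l) {
    /* acquire: accesses in the critical section cannot move above this */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        sched_yield();
}

static void demo_unlock(demo_spinlock *l) {
    /* release: accesses in the critical section cannot move below this */
    atomic_store_explicit(&l->held, false, memory_order_release);
}

struct stress_ctx {
    demo_spinlock lock;
    atomic_int inside;       /* threads currently in the critical section */
    atomic_long violations;  /* entries made while another thread was inside */
    long counter;            /* plain variable, protected only by the lock */
    long iters;
};

static void *stress_worker(void *arg) {
    struct stress_ctx *c = arg;
    for (long i = 0; i < c->iters; i++) {
        demo_lock(&c->lock);
        if (atomic_fetch_add(&c->inside, 1) != 0)
            atomic_fetch_add(&c->violations, 1);
        c->counter++;
        atomic_fetch_sub(&c->inside, 1);
        demo_unlock(&c->lock);
    }
    return NULL;
}

/* Runs nthreads workers for iters iterations each. Returns the number of
   mutual-exclusion violations (-1 on bad input or thread failure) and stores
   the final plain counter, which should be nthreads * iters, in *counter_out. */
static long stress_lock(int nthreads, long iters, long *counter_out) {
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct stress_ctx c;
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    atomic_init(&c.lock.held, false);
    atomic_init(&c.inside, 0);
    atomic_init(&c.violations, 0);
    c.counter = 0;
    c.iters = iters;
    int created = 0;
    while (created < nthreads &&
           pthread_create(&tids[created], NULL, stress_worker, &c) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    *counter_out = c.counter;
    return created == nthreads ? atomic_load(&c.violations) : -1;
}
```

With a correct lock, `stress_lock(4, 100000, &counter)` should report zero violations and leave `counter` at 400000; a nonzero violation count or a lost increment means the lock is not providing mutual exclusion on that machine.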

I'll look into this over the coming week. As a workaround, passing
"--disable-atomic" to configure on MIPS does resolve the crashes,
presumably at a performance cost. I also note that arm* and s390x have
no atomics code at all, so MIPS would not be the only arch without
atomics support.
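
To illustrate why disabling atomics is expected to cost performance rather than correctness, here is a hypothetical C sketch of a counter built two ways behind an illustrative compile-time switch, `DEMO_USE_ATOMICS` (not a real libisc macro). It assumes, without checking libisc's sources here, that the non-atomic build falls back to a pthread mutex, which works on any architecture with working pthreads but serializes every update through a lock.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Illustrative switch only; libisc uses its own configure-time macros. */
#ifndef DEMO_USE_ATOMICS
#define DEMO_USE_ATOMICS 0
#endif

typedef struct {
#if DEMO_USE_ATOMICS
    atomic_long value;      /* updated with one atomic read-modify-write */
#else
    pthread_mutex_t mu;     /* every update takes and releases this mutex */
    long value;
#endif
} demo_counter;

static void demo_counter_init(demo_counter *c) {
#if DEMO_USE_ATOMICS
    atomic_init(&c->value, 0);
#else
    pthread_mutex_init(&c->mu, NULL);
    c->value = 0;
#endif
}

/* Adds delta and returns the value held before the addition. */
static long demo_counter_add(demo_counter *c, long delta) {
#if DEMO_USE_ATOMICS
    return atomic_fetch_add_explicit(&c->value, delta, memory_order_acq_rel);
#else
    pthread_mutex_lock(&c->mu);
    long old = c->value;
    c->value += delta;
    pthread_mutex_unlock(&c->mu);
    return old;
#endif
}

static void demo_counter_destroy(demo_counter *c) {
#if DEMO_USE_ATOMICS
    (void)c;
#else
    pthread_mutex_destroy(&c->mu);
#endif
}
```

Building with `-DDEMO_USE_ATOMICS=1` selects the lock-free path; the default (0) corresponds to the assumed mutex fallback.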

Thanks,
James
