In fact, I have run into a very strange problem.
My C++ program calls HDFS's interfaces via JNI, but the calling threads are all
blocked on the same Java object lock. I obtained the state of the process with
jstack: all threads are waiting to lock the object (0x00000006b30b3be8),
but no thread is holding it. Does anybody have any clues?
The attachment is the jstack output from that time.
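
For reference, I captured the dump with something along these lines, where
<pid> just stands in for my program's process id:

  jstack -l <pid> > jstack.log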

On Wed, Feb 15, 2017 at 11:45 PM, Gil Tene <g...@azul.com> wrote:

> Don't know if this is the same bug. RHEL 7 kernels have included fixes for
> this since some time in 2015.
>
> While one of my first courses of action when I see a suspicious FUTEX_WAIT
> hang situation is still to check kernel versions to rule this out (since
> this bug has wasted a bunch of our time in the past), keep in mind that not
> all things stuck in FUTEX_WAIT are futex_wait kernel bugs. The most likely
> explanations are usually actual application logic bugs involving actual
> deadlock or starvation.
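>
> A quick first check is just the kernel version, compared against your
> distro's errata for the futex fix, e.g. something along these lines on an
> RPM-based box (package query syntax will differ elsewhere):
>
>   uname -r
>   rpm -q --changelog kernel | grep -i futex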
>
> Does attaching to and detaching from the process with gdb move it forward?
> [the original bug was a missed wakeup, and an attach/detach would "kick"
> the futex out of its slumber once]
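>
> [a minimal sketch of that "kick", assuming gdb is available and <pid> is the
> stuck process:
>
>   gdb -p <pid>
>   (gdb) detach
>   (gdb) quit
>
> if the process starts making progress again after the detach, that points at
> the missed-wakeup kernel bug rather than an application-level deadlock]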
>
> On Wednesday, February 15, 2017 at 6:33:45 AM UTC-8, Will Foster wrote:
>>
>>
>>
>> On Tuesday, February 14, 2017 at 4:01:52 PM UTC, Allen Reese wrote:
>>>
>>> This bug report seems to have a way to reproduce it:
>>> https://bugs.centos.org/view.php?id=8371
>>>
>>> Hope that helps.
>>>
>>> --Allen Reese
>>>
>>>
>>
>> I also see this on the latest CentOS 7.3 with Logstash. I've disabled huge
>> pages via
>> transparent_hugepage=never
>>
>> in grub.
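>>
>> For the record, I set it roughly like this on CentOS 7 (paths and the
>> grub2-mkconfig target may differ on other setups):
>>
>>   # /etc/default/grub
>>   GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
>>
>>   # regenerate the grub config and reboot
>>   grub2-mkconfig -o /boot/grub2/grub.cfg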
>>
>> Here's what I get from strace against logstash (never fully comes up to
>> listen on TCP/5044)
>>
>> [root@host-01 ~]# strace -p 1292
>> Process 1292 attached
>> futex(0x7f80eff8a9d0, FUTEX_WAIT, 1312, NULL
>>
>>
>> I am hitting this issue on Logstash 5.2.1-1 while trying to upgrade my 
>> Ansible
>> playbooks <https://github.com/sadsfae/ansible-elk/issues/16> to the
>> latest ES versions.
>>
>>
>>
>>>
>>> ------------------------------
>>> *From:* Longchao Dong <donglo...@gmail.com>
>>> *To:* mechanical-sympathy <mechanica...@googlegroups.com>
>>> *Sent:* Monday, February 13, 2017 1:55 AM
>>> *Subject:* Re: Linux futex_wait() bug... [Yes. You read that right.
>>> UPDATE to LATEST PATCHES NOW].
>>>
>>> How does one reproduce this issue? Is it possible to show us the method? I
>>> am also working on a strange pthread_cond_wait issue, but I'm not sure
>>> whether it is related to this one.
>>>
>>> On Wednesday, May 20, 2015 at 8:16:12 AM UTC+8, manis...@gmail.com
>>> wrote:
>>>
>>> I bumped into this error a couple of months back when using CentOS 6.6 on a
>>> 32-core Dell server. After many days of debugging, I realized it was a
>>> CentOS 6.6 bug and moved back to 6.5; since then, no such issues have
>>> been seen.
>>> I am able to reproduce this issue within 15 minutes of heavy load on my
>>> multi-threaded C code.
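>>>
>>> The load is roughly this shape (a stripped-down sketch, not my actual code;
>>> the names are made up): many threads hammering a condition variable, which
>>> on Linux sits on top of futex wait/wake.
>>>
>>> #include <pthread.h>
>>>
>>> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>>> static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
>>> static int ready = 0;
>>>
>>> static void *worker(void *arg)   /* many of these */
>>> {
>>>     for (;;) {
>>>         pthread_mutex_lock(&lock);
>>>         while (!ready)
>>>             pthread_cond_wait(&cond, &lock);  /* FUTEX_WAIT under the hood */
>>>         ready = 0;
>>>         pthread_mutex_unlock(&lock);
>>>     }
>>>     return NULL;
>>> }
>>>
>>> static void *producer(void *arg) /* one of these, signalling in a tight loop */
>>> {
>>>     for (;;) {
>>>         pthread_mutex_lock(&lock);
>>>         ready = 1;
>>>         pthread_cond_signal(&cond);  /* the wakeup that can get lost */
>>>         pthread_mutex_unlock(&lock);
>>>     }
>>>     return NULL;
>>> }
>>>
>>> int main(void)
>>> {
>>>     pthread_t t[32];
>>>     for (int i = 0; i < 31; i++)
>>>         pthread_create(&t[i], NULL, worker, NULL);
>>>     pthread_create(&t[31], NULL, producer, NULL);
>>>     pthread_join(t[0], NULL);    /* runs forever unless it hangs */
>>>     return 0;
>>> }
>>>
>>> Built with gcc -pthread. When the problem hits, some workers never come back
>>> from pthread_cond_wait (they sit in FUTEX_WAIT) even though the producer
>>> keeps signalling.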
>>>
>>> On Wednesday, May 13, 2015 at 3:37:32 PM UTC-7, Gil Tene wrote:
>>>
>>> We had this one bite us hard and scare the %$^! out of us, so I figured
>>> I'd share the fear...
>>>
>>> The Linux futex_wait call has been broken for about a year (in upstream
>>> since 3.14, around Jan 2014), and has just recently been fixed (in upstream
>>> 3.18, around October 2014). More importantly, this breakage seems to have
>>> been backported into major distros (e.g. into RHEL 6.6 and its cousins,
>>> released in October 2014), and the fix for it has only recently been
>>> backported (e.g. RHEL 6.6.z and cousins have the fix).
>>>
>>> The impact of this kernel bug is very simple: user processes can
>>> deadlock and hang in seemingly impossible situations. A futex wait call
>>> (and anything using a futex wait) can stay blocked forever, even though it
>>> had been properly woken up by someone. Thread.park() in Java may stay
>>> parked. Etc. If you are lucky you may also find soft lockup messages in
>>> your dmesg logs. If you are not that lucky (like us, for example), you'll
>>> spend a couple of months of someone's time trying to find the fault in your
>>> code, when there is nothing there to find.
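>>>
>>> To make the failure mode concrete, here is a stripped-down sketch (mine,
>>> purely illustrative, not taken from the kernel or from anyone's real code)
>>> of the pattern that gets hurt, at the raw futex level:
>>>
>>> #define _GNU_SOURCE
>>> #include <linux/futex.h>
>>> #include <sys/syscall.h>
>>> #include <unistd.h>
>>> #include <pthread.h>
>>> #include <stdatomic.h>
>>> #include <stdio.h>
>>>
>>> static atomic_int word = 0;               /* the futex word */
>>>
>>> static void *waiter(void *arg)
>>> {
>>>     while (atomic_load(&word) == 0)       /* sleep only while the word is 0 */
>>>         syscall(SYS_futex, &word, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
>>>     puts("woken");
>>>     return NULL;
>>> }
>>>
>>> int main(void)
>>> {
>>>     pthread_t t;
>>>     pthread_create(&t, NULL, waiter, NULL);
>>>     sleep(1);
>>>     atomic_store(&word, 1);
>>>     /* on a broken kernel this wake of a *private* futex can be missed,
>>>        leaving the waiter parked in FUTEX_WAIT forever */
>>>     syscall(SYS_futex, &word, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
>>>     pthread_join(t, NULL);
>>>     return 0;
>>> }
>>>
>>> (pthread mutexes/condvars, Java's Thread.park(), etc. all boil down to this
>>> wait/wake pair, which is why the hang can show up just about anywhere.)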
>>>
>>> This behavior seems to regularly appear in the wild on Haswell servers
>>> (all the machines where we have had customers hit it in the field and in
>>> labs have been Haswells), and since Haswell servers are basically what you
>>> get if you buy a new machine now, or run on the cool new Amazon EC2/GCE/Azure
>>> stuff, you are bound to experience some interesting behavior. I don't know
>>> of anyone that will see this as a good thing for production systems. Except
>>> for maybe Netflix (maybe we should call this the linux fumonkey).
>>>
>>> The commit for the *fix* is here:
>>> https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0
>>>
>>> The commit explanation says that it fixes
>>> https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbcefc999e70f2843ae8306db
>>> (presumably the bug introduced with that change), which was made in Jan of
>>> 2014 into 3.14. That 3.14 code added logic to avoid taking a lock if the
>>> code knows that there are no waiters. It documents (pretty elaborately) how
>>> "…thus preventing tasks sleeping forever if wakers don't acknowledge all
>>> possible waiters" with logic that explains how memory barriers guarantee
>>> the correct order (see paragraph at line 141), which includes the statement
>>> "this is done by the barriers in get_futex_key_refs(), through either ihold
>>> or atomic_inc, depending on the futex type." (this assumption is the actual
>>> bug). The assumption is further reinforced by the fact that the change
>>> added a comment to every call to get_futex_key_refs() in the code that
>>> says "/* implies MB (B) */".
>>>
>>> The problem was that get_futex_key_refs() did NOT imply a memory
>>> barrier. It only included a memory barrier for two explicit cases in a
>>> switch statement that checks the futex type, but it did not have a default
>>> case handler, and therefore did not apply a memory barrier for other futex
>>> types. Like private futexes. Which are a very commonly used type of futex.
>>>
>>> The fix is simple: an added default case for the switch that just has an
>>> explicit smp_mb() in it. There was a missing memory barrier in the wakeup
>>> path, and now (hopefully) it's not missing any more...
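>>>
>>> From memory (the commit itself is authoritative), the patched switch in
>>> get_futex_key_refs() ends up looking roughly like this:
>>>
>>> switch (key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)) {
>>> case FUT_OFF_INODE:
>>>     ihold(key->shared.inode);   /* implies MB (B) */
>>>     break;
>>> case FUT_OFF_MMSHARED:
>>>     futex_get_mm(key);          /* implies MB (B) */
>>>     break;
>>> default:
>>>     smp_mb();                   /* explicit MB (B), covers private futexes */
>>> }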
>>>
>>> So let's be clear: *RHEL 6.6 (and CentOS 6.6, and Scientific Linux
>>> 6.6) are certainly broken on Haswell servers.* It is likely that
>>> recent versions of other distros are too (SLES, Ubuntu, Debian, Oracle Linux,
>>> etc.). *The good news is that fixes are out there (including 6.6.z)*.
>>> But the bad news is that there is not much chatter saying "if you have a
>>> Haswell system, get to version X now". For some reason, people seem not to
>>> have noticed this or raised the alarm. We certainly haven't seen much
>>> "INSTALL PATCHES NOW" fear mongering. And we really need it, so *I'm
>>> hoping this posting will start a panic*.
>>>
>>> Bottom line: the bug is very real, but it probably only appeared in the
>>> 3.14 upstream version (and distro versions that had backported
>>> https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbcefc999e70f2843ae8306db,
>>> presumably after Jan 2014). The bug was fixed in 3.18 in October 2014,
>>> but backports probably took a while (and some may still be pending). I know
>>> for a fact that RHEL 6.6.z has the fix. I don't know about other distro
>>> families and versions (yet), but if someone else does, please post
>>> (including when it was broken, and when it was fixed).
>>>
>>> Note: I would like to profusely thank @aplokhotnyuk
>>> <https://twitter.com/search?f=realtime&q=giltene%20latest%20patches&src=typd>.
>>> His tweet
>>> <https://twitter.com/search?f=realtime&q=giltene%20latest%20patches&src=typd>
>>>  originally
>>> alerted me to the bug's existence, and started us down the path of figuring
>>> out the what/why/where/when behind it. Why this is not being shouted in
>>> the streets is a mystery to me, and scary in its own right. We were lucky
>>> enough that I had a "that looks suspiciously familiar" moment when I read
>>> that tweet, and that I put 3.14 and 1.618 together and thought enough to
>>> ask "Umm... have we only been seeing this bug on Haswell servers?".
>>>
>>>
>>>
>>> Without @aplokhotnyuk's tweet we'd probably still be searching for the
>>> nonexistent bugs in our own locking code... And since the tweet originated
>>> from another discussion on this group, it presents a rare "posting and
>>> reading twitter actually helps us solve bugs sometimes" example.
>>>
>>>
>>>
>>>
>



-- 
Best Regards,
董隆超


Attachment: jstack.log
Description: Binary data
