Hi Nikolay,

Thanks for your reply on this.

I checked the stack traces for many of the stuck threads. Looks like all
of them are stuck at the same point:
[<ffffffff810031f2>] exit_to_usermode_loop+0x72/0xd0
[<ffffffff81003c16>] prepare_exit_to_usermode+0x26/0x30
[<ffffffff818390e5>] retint_user+0x8/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
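
For reference, here is roughly how the per-thread stacks can be gathered;
the PID below is only a placeholder for the stuck daemon, and reading
/proc/<pid>/task/<tid>/stack typically needs root:

  pid=12345   # placeholder: PID of the stuck daemon process
  for t in /proc/$pid/task/*; do
      echo "=== thread ${t##*/} ==="
      cat "$t/stack"
  done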

Seems like they are stuck in a tight loop in exit_to_usermode_loop.
FWIW, we started seeing this issue after switching to the nodatacow
btrfs mount option; previously we were running with the default
datacow behaviour. However, this also coincides with the fairly heavy
unlink load that we've been putting the system under.
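
For completeness, the filesystem is mounted along these lines (the device
and mountpoint names below are only illustrative):

  # /dev/drbd0 and /mnt/data are placeholders for our actual DRBD device
  # and mountpoint; nodatacow disables data COW for newly created files
  mount -o nodatacow /dev/drbd0 /mnt/data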

Please let me know if there is anything else you can think of, based
on the above data.

Regards,
Shyam


On Thu, Mar 15, 2018 at 12:59 PM, Nikolay Borisov <nbori...@suse.com> wrote:
>
>
> On 15.03.2018 09:23, Shyam Prasad N wrote:
>> Hi,
>>
>> Our servers run some daemons that schedule many real-time threads.
>> These threads serve the client nodes by performing I/O on top of a set
>> of disks, configured as DRBD pairs with disks on peer servers for high
>> availability of data. Btrfs is the filesystem configured on top of
>> DRBD.
>>
>> While testing high availability under fairly high load, we have noticed
>> the following behaviour a couple of times: when the server that was
>> killed comes back up and the DRBD disks start resyncing, a performance
>> hit is generally expected at the peer node that has now taken over the
>> service. However, the real-time threads (mentioned above) on the active
>> node end up hogging the CPUs. As part of debugging the issue, we tried
>> to force a core dump of these threads using SIGABRT, but they were not
>> responding to any signals. Only after using real-time throttling (to
>> reduce real-time CPU usage to 50%) and waiting around for a few minutes
>> were we able to force a core dump. However, the corefile generated
>> didn't have much useful info (I think it was a partial/corrupted core
>> dump).
>>
>> Based on the above behaviour (signals not being picked up), it looks
>> to me like all these threads were likely stuck inside some system
>> call. And since the majority of the system calls made by these threads
>> are VFS calls on btrfs, I suspect these threads may have been stuck in
>> some I/O; specifically, based on the CPU usage, in some spinlock (I'm
>> open to suggestions of other possibilities). That is the reason I'm
>> posting on this mailing list.
>
> When you have a bunch of those threads, get a dump of the stacks of all
> sleeping tasks with "echo w > /proc/sysrq-trigger".
>
>>
>> Is there a known bug which might have caused this? The kernel version
>> we're using is 4.4.0.
>
> This is a rather old kernel; you should at least be using the latest
> 4.4.y stable kernel. BTRFS is a moving target and a lot of improvements
> are made every release, so I'd suggest trying 4.14 on at least one
> offending machine.
>
>> If we go for a kernel upgrade, what are the chances of not seeing this
>> behaviour again?
>>
>> Or is my analysis of the problem entirely wrong? My feeling is that
>> this may be some issue with using Btrfs when it doesn't get a response
>> from DRBD quickly enough.
>
> Feelings don't count for anything. Next time this happens, extract a
> stack trace from the offending threads, i.e. by sampling /proc/[pid of
> hogging thread]/stack. Furthermore, if we assume that btrfs is indeed
> not getting responses fast enough, this means most clients should
> really be stuck in I/O sleep and not doing any processing.
>
>
>> Because we have been using ext4 on top of DRBD for a long time, and
>> have never seen such issues during HA tests there.
>>



-- 
-Shyam
