Public bug reported:

We're running Ubuntu 16.04.4, mdadm v3.3 and kernel 4.13.0-36 (Ubuntu package
linux-image-generic-hwe-16.04).
We have created a RAID10 array from 22 960GB SSDs [1]. The problem we're
experiencing is that /usr/share/mdadm/checkarray
(executed by cron, included in the mdadm package) results in a (soft?)
deadlock: load on the node spikes to 500-700 and all I/O operations
are blocked for a period of time. We can see traces like these [2] in
our kernel log.

For example, the array ends up stuck in a state like this:

test@os-node1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4] dm-19[5] dm-17[3]
                    dm-16[21] dm-15[20] dm-14[2] dm-13[19] dm-12[18] dm-11[17]
                    dm-10[16] dm-9[15] dm-8[14] dm-7[13] dm-6[12] dm-5[11] dm-4[10] dm-3[1] dm-2[0]
      10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUUUUUUUUUU]
      [===>.................]  check = 19.0% (1965748032/10313171968) finish=1034728.8min speed=134K/sec
      bitmap: 0/39 pages [0KB], 131072KB chunk
unused devices: <none>

The only solution is to hard reboot the node. We found that it
doesn't happen on an idle array; we have to generate significant load
(10 VMs running fio [3] with 500GB HDDs) to reproduce the issue.
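
To make the stalled state in the /proc/mdstat output above easier to track
over time, here is a minimal Python sketch that extracts the progress line and
recomputes the remaining time from the reported speed. It assumes the counters
in parentheses are 1 KiB blocks and that the line has the layout shown above;
the names parse_progress and SAMPLE are illustrative only. At 134K/sec the
reported finish of 1034728.8 minutes is roughly 718 days, so the check is
effectively not progressing.

```python
import re

# Progress line printed by md while a check/resync/recovery is running, e.g.
#   [===>....]  check = 19.0% (1965748032/10313171968) finish=1034728.8min speed=134K/sec
# \s* also matches newlines, so mail-wrapped output still parses.
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_progress(mdstat_text):
    """Return a dict for the first sync progress line found, or None."""
    m = PROGRESS_RE.search(mdstat_text)
    if m is None:
        return None
    done, total, speed = int(m["done"]), int(m["total"]), int(m["speed"])
    remaining = total - done  # assumed to be in 1 KiB blocks
    return {
        "action": m["action"],
        "percent": float(m["pct"]),
        "done_kib": done,
        "total_kib": total,
        "speed_kib_s": speed,
        "reported_finish_min": float(m["finish"]),
        # Recomputed from the rounded speed, so it only approximates
        # the kernel's own estimate.
        "estimated_finish_min": remaining / speed / 60 if speed else float("inf"),
    }


# Status line copied from the report above (including the mail line wrap).
SAMPLE = (
    "      [===>.................]  check = 19.0% (1965748032/10313171968) \n"
    "finish=1034728.8min speed=134K/sec\n"
)

if __name__ == "__main__":
    print(parse_progress(SAMPLE))
```

Running this periodically against the real /proc/mdstat and logging the result
would show whether done_kib still advances while the load is high.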

Has anyone experienced similar issues? Do you have any suggestions on how to
better troubleshoot this and identify whether the disks or the software layer
is responsible for this behavior?
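
As one possible starting point for the troubleshooting asked about above, the
following is a minimal sketch that samples the md sysfs attributes for the
array (sync_action, sync_completed, sync_speed, sync_speed_min,
sync_speed_max) so they can be logged next to the kernel traces. It assumes
the array is md1 and that sysfs is mounted at /sys; read_md_state and its
parameters are hypothetical names, and the script only reads files.

```python
from pathlib import Path

# md sysfs attributes relevant to a running check; this sketch only reads them.
ATTRS = (
    "sync_action",     # e.g. "check" while checkarray is running, "idle" otherwise
    "sync_completed",  # sectors done / total, or "none"
    "sync_speed",      # current speed in K/sec, or "none"
    "sync_speed_min",
    "sync_speed_max",
)


def read_md_state(array="md1", sysfs_root="/sys/block"):
    """Return {attribute: value or None} for one md array."""
    md_dir = Path(sysfs_root) / array / "md"
    state = {}
    for name in ATTRS:
        try:
            state[name] = (md_dir / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable (wrong array name, not root, ...).
            state[name] = None
    return state


if __name__ == "__main__":
    for key, value in read_md_state().items():
        print(f"{key}: {value}")
```

If sync_action still reads "check" while sync_speed stays near sync_speed_min
under load, that would point at throttling of the check; writing "idle" to
sync_action is the documented way to stop a running check, although whether
that succeeds once I/O is already blocked is untested here.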

[1] http://www.samsung.com/us/dell/pdfs/PM1633a_Flyer_2016_v4.pdf
[2] https://gist.github.com/haad/09213bab1bc30a00c7d255c0bc60897b
[3] https://github.com/axboe/fio

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1776159

Title:
  mdadm raid soft lock-ups ubuntu kernel 4.13.0-36

Status in linux package in Ubuntu:
  Incomplete
