On 08/10/2012 05:02 AM, Mel Gorman wrote:
On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:


Ok, this is an untested hack and I expect it would drop allocation success
rates again under load (but not as much). Can you test again and see what
effect, if any, it has please?

---8<---
mm: compaction: back out if contended

---

<snip>

Initial testing with this patch looks very good from
my perspective; CPU utilization stays reasonable,
write-out rate stays high, no signs of stress.
Here's an example after ~10 minutes under my test load:


Hmmm, I wonder if I should have tested this patch longer,
in view of the trouble I ran into testing the new patch?
See below.


Excellent, so it is contention that is the problem.

<SNIP>
I'll continue testing tomorrow to be sure nothing
shows up after continued testing.

If this passes your allocation success rate testing,
I'm happy with this performance for 3.6 - if not, I'll
be happy to test any further patches.


It does impair allocation success rates as I expected (they're still ok
but not as high as I'd like) so I implemented the following instead. It
attempts to back off when contention is detected or compaction is taking
too long. It does not back off as quickly as the first prototype did so
I'd like to see if it addresses your problem or not.
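
To illustrate the idea only (this is not the patch), here is a minimal
user-space sketch in Python of backing off under contention. An async
caller gives up as soon as it finds the lock held or its time budget is
spent; a sync caller waits for the lock instead. The names try_compact,
sync and budget_s are made up for this illustration, and the kernel code
differs.

```python
import threading
import time


def try_compact(lock, work_items, sync, budget_s=0.01):
    """Run work_items under lock; return (items_done, aborted).

    Hypothetical illustration: async callers (sync=False) abort when the
    lock is contended or the time budget is exceeded; sync callers block.
    """
    start = time.monotonic()
    done = 0
    for item in work_items:
        # Async callers stop once the work has taken too long.
        if not sync and time.monotonic() - start > budget_s:
            return done, True
        # Non-blocking attempt first, so contention can be detected.
        if not lock.acquire(blocking=False):
            if not sync:
                return done, True   # contended: back out
            lock.acquire()          # sync callers wait their turn
        try:
            item()
            done += 1
        finally:
            lock.release()
    return done, False


if __name__ == "__main__":
    zone_lock = threading.Lock()
    print(try_compact(zone_lock, [lambda: None] * 3, sync=False, budget_s=10))
```

The trade-off is the one described above: the sooner an async caller
backs out, the less time is spent on the lock, but the less work gets
done per attempt.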

I really appreciate getting the chance to test out
your patchset.


I appreciate that you have a workload that demonstrates the problem and
will test patches. I will not abuse this and hope to keep the revisions
to a minimum.

Thanks.

---8<---
mm: compaction: Abort async compaction if locks are contended or taking too long


Hmmm, while testing this patch, a couple of my servers got
stuck after ~30 minutes or so, like this:

[ 2515.869936] INFO: task ceph-osd:30375 blocked for more than 120 seconds.
[ 2515.876630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2515.884447] ceph-osd        D 0000000000000000     0 30375      1 0x00000000
[ 2515.891531]  ffff8802e1a99e38 0000000000000082 ffff88056b38e298 ffff8802e1a99fd8
[ 2515.899013]  ffff8802e1a98010 ffff8802e1a98000 ffff8802e1a98000 ffff8802e1a98000
[ 2515.906482]  ffff8802e1a99fd8 ffff8802e1a98000 ffff880697d31700 ffff8802e1a84500
[ 2515.913968] Call Trace:
[ 2515.916433]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2515.921417]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2515.927938]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2515.934195]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2515.940934]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2515.946244]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
[ 2515.951640]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2515.957646] INFO: task ceph-osd:95698 blocked for more than 120 seconds.
[ 2515.964330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2515.972141] ceph-osd        D 0000000000000000     0 95698      1 0x00000000
[ 2515.979223]  ffff8802b049fe38 0000000000000082 ffff88056b38e2a0 ffff8802b049ffd8
[ 2515.986700]  ffff8802b049e010 ffff8802b049e000 ffff8802b049e000 ffff8802b049e000
[ 2515.994176]  ffff8802b049ffd8 ffff8802b049e000 ffff8809832ddc00 ffff880611592e00
[ 2516.001653] Call Trace:
[ 2516.004111]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.009072]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.015589]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2516.021861]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2516.028555]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2516.033859]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
[ 2516.039248]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2516.045248] INFO: task ceph-osd:95699 blocked for more than 120 seconds.
[ 2516.051934] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2516.059753] ceph-osd        D 0000000000000000     0 95699      1 0x00000000
[ 2516.066832]  ffff880c022d3dc8 0000000000000082 ffff880c022d2000 ffff880c022d3fd8
[ 2516.074302]  ffff880c022d2010 ffff880c022d2000 ffff880c022d2000 ffff880c022d2000
[ 2516.081784]  ffff880c022d3fd8 ffff880c022d2000 ffff8806224cc500 ffff88096b64dc00
[ 2516.089254] Call Trace:
[ 2516.091702]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.096656]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.103176]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2516.109443]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2516.116134]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2516.121442]  [<ffffffff8111362e>] vm_mmap_pgoff+0x6e/0xb0
[ 2516.126861]  [<ffffffff8112486a>] sys_mmap_pgoff+0x18a/0x190
[ 2516.132552]  [<ffffffff8124bd6e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[ 2516.138985]  [<ffffffff81006b22>] sys_mmap+0x22/0x30
[ 2516.143945]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2516.149949] INFO: task ceph-osd:95816 blocked for more than 120 seconds.
[ 2516.156632] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2516.164444] ceph-osd        D 0000000000000000     0 95816      1 0x00000000
[ 2516.171521]  ffff880332991e38 0000000000000082 ffff880332991de8 ffff880332991fd8
[ 2516.178992]  ffff880332990010 ffff880332990000 ffff880332990000 ffff880332990000
[ 2516.186466]  ffff880332991fd8 ffff880332990000 ffff880697d31700 ffff880a92c32e00
[ 2516.193937] Call Trace:
[ 2516.196396]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.201354]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.207886]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2516.214138]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2516.220843]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2516.226145]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
[ 2516.231548]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2516.237545] INFO: task ceph-osd:95838 blocked for more than 120 seconds.
[ 2516.244248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2516.252067] ceph-osd        D 0000000000000000     0 95838      1 0x00000000
[ 2516.259159]  ffff8803f8281e38 0000000000000082 ffff88056b38e2a8 ffff8803f8281fd8
[ 2516.266627]  ffff8803f8280010 ffff8803f8280000 ffff8803f8280000 ffff8803f8280000
[ 2516.274094]  ffff8803f8281fd8 ffff8803f8280000 ffff8809a45f8000 ffff880691d41700
[ 2516.281573] Call Trace:
[ 2516.284028]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.289000]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.295513]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2516.301764]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2516.308450]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2516.313753]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
[ 2516.319157]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2516.325154] INFO: task ceph-osd:95861 blocked for more than 120 seconds.
[ 2516.331844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2516.339665] ceph-osd        D 0000000000000000     0 95861      1 0x00000000
[ 2516.346742]  ffff8805026e9e38 0000000000000082 ffff88056b38e2a0 ffff8805026e9fd8
[ 2516.354221]  ffff8805026e8010 ffff8805026e8000 ffff8805026e8000 ffff8805026e8000
[ 2516.361698]  ffff8805026e9fd8 ffff8805026e8000 ffff880611592e00 ffff880948df0000
[ 2516.369174] Call Trace:
[ 2516.371623]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.376582]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.383149]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
[ 2516.389404]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
[ 2516.396091]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
[ 2516.401397]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
[ 2516.406818]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
[ 2516.412868] INFO: task ceph-osd:95899 blocked for more than 120 seconds.
[ 2516.419557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2516.427371] ceph-osd        D 0000000000000000     0 95899      1 0x00000000
[ 2516.434466]  ffff8801eaa9dd50 0000000000000082 0000000000000000 ffff8801eaa9dfd8
[ 2516.442020]  ffff8801eaa9c010 ffff8801eaa9c000 ffff8801eaa9c000 ffff8801eaa9c000
[ 2516.449594]  ffff8801eaa9dfd8 ffff8801eaa9c000 ffff8800865e5c00 ffff8802b356c500
[ 2516.457079] Call Trace:
[ 2516.459534]  [<ffffffff8147fded>] schedule+0x5d/0x60
[ 2516.464519]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
[ 2516.471044]  [<ffffffff81480b95>] rwsem_down_read_failed+0x15/0x17
[ 2516.477222]  [<ffffffff8124bca4>] call_rwsem_down_read_failed+0x14/0x30
[ 2516.483830]  [<ffffffff8147ee07>] ? down_read+0x37/0x40
[ 2516.489050]  [<ffffffff81484c49>] do_page_fault+0x239/0x4a0
[ 2516.494627]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 2516.501143]  [<ffffffff8148154f>] page_fault+0x1f/0x30
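
FWIW, reports like the ones above can be tallied by the rwsem slow path
each task is stuck in. A rough sketch, assuming the log is available as
plain text in the format shown; summarize_hung_tasks is a hypothetical
helper, not an existing tool:

```python
import re
from collections import Counter

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG = re.compile(r"INFO: task (\S+):(\d+) blocked for more than \d+ seconds")
# "[<address>] symbol+0x..." with an optional "? " marking unreliable frames.
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x")


def summarize_hung_tasks(log_text):
    """Count hung-task reports by the rwsem_down_*_failed frame they wait in."""
    counts = Counter()
    in_report = False
    for line in log_text.splitlines():
        if HUNG.search(line):
            in_report = True        # a new report starts here
            continue
        frame = FRAME.search(line)
        if not in_report or frame is None or frame.group(1):
            continue                # outside a report, or an unreliable frame
        name = frame.group(2)
        if name.startswith("rwsem_down_") and name.endswith("_failed"):
            counts[name] += 1
            in_report = False       # count each report once
    return counts


if __name__ == "__main__":
    import sys
    for name, n in summarize_hung_tasks(sys.stdin.read()).items():
        print(n, name)
```

Feeding it the console log on stdin prints one line per wait type, which
makes it easy to see how many tasks are waiting for write versus read.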


I tried to capture a perf trace while this was going on, but it
never completed.  "ps" on this system reports lots of kernel threads
and some user-space stuff, but hangs part way through - no ceph
executables in the output, oddly.

I can retest your earlier patch for a longer period, to
see if it does the same thing, or I can do some other thing
if you tell me what it is.

Also, FWIW I sorted a little through SysRq-T output from such
a system; these bits looked interesting:

[ 3663.685097] INFO: rcu_sched self-detected stall on CPU { 17}  (t=60000 jiffies)
[ 3663.685099] sending NMI to all CPUs:
[ 3663.685101] NMI backtrace for cpu 0
[ 3663.685102] CPU 0 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685138]
[ 3663.685140] Pid: 100027, comm: ceph-osd Not tainted 3.5.0-00019-g472719a 
#221 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685142] RIP: 0010:[<ffffffff81480ed5>]  [<ffffffff81480ed5>] 
_raw_spin_lock_irqsave+0x45/0x60
[ 3663.685148] RSP: 0018:ffff880a08191898  EFLAGS: 00000012
[ 3663.685149] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000c5
[ 3663.685149] RDX: 00000000000000bf RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685150] RBP: ffff880a081918a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685151] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685152] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685153] FS:  00007fff90ae0700(0000) GS:ffff880627c00000(0000) 
knlGS:0000000000000000
[ 3663.685154] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685155] CR2: ffffffffff600400 CR3: 00000002b8fbe000 CR4: 00000000000007f0
[ 3663.685156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685157] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685158] Process ceph-osd (pid: 100027, threadinfo ffff880a08190000, task 
ffff880a9a29ae00)
[ 3663.685158] Stack:
[ 3663.685159]  000000000000130a 0000000000000000 ffff880a08191948 
ffffffff8111a760
[ 3663.685162]  ffffffff81a13420 0000000000000009 ffffea000004c240 
0000000000000000
[ 3663.685165]  ffff88063fffcba0 000000003fffcb98 ffff880a08191a18 
0000000000001600
[ 3663.685168] Call Trace:
[ 3663.685169]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685173]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685175]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685178]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685180]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685182]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685187]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685190]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685192]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685195]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685199]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685202]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685205]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685208]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685211]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685213] Code: 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 00 f0 66 
0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 0f b6 13 <38> d1 75 f7 
5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66 2e 0f 1f
[ 3663.685238] NMI backtrace for cpu 3
[ 3663.685239] CPU 3 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685273]
[ 3663.685274] Pid: 101503, comm: ceph-osd Not tainted 3.5.0-00019-g472719a 
#221 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685276] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685280] RSP: 0018:ffff8806bce17898  EFLAGS: 00000006
[ 3663.685280] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000cb
[ 3663.685281] RDX: 00000000000000c5 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685282] RBP: ffff8806bce178a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685283] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685284] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685285] FS:  00007fffc8e60700(0000) GS:ffff880627c60000(0000) 
knlGS:0000000000000000
[ 3663.685286] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685287] CR2: ffffffffff600400 CR3: 00000002cbd8c000 CR4: 00000000000007e0
[ 3663.685287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685288] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685289] Process ceph-osd (pid: 101503, threadinfo ffff8806bce16000, task 
ffff880c06580000)
[ 3663.685290] Stack:
[ 3663.685290]  0000000000001212 0000000000000000 ffff8806bce17948 
ffffffff8111a760
[ 3663.685294]  ffff8806244d5c00 0000000000000009 ffffea0000048440 
0000000000000000
[ 3663.685297]  ffff88063fffcba0 000000003fffcb98 ffff8806bce17a18 
0000000000001600
[ 3663.685300] Call Trace:
[ 3663.685301]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685304]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685306]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685308]  [<ffffffff814018c4>] ? ip_finish_output+0x274/0x300
[ 3663.685311]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685314]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685316]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685319]  [<ffffffff813b655b>] ? release_sock+0x6b/0x80
[ 3663.685322]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685325]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685327]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685330]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685332]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685335]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685337]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685340]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.685343]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.685347]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685349]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685352] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.685378] NMI backtrace for cpu 6
[ 3663.685379] CPU 6 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core[ 3663.685402] Uhhuh. NMI received for unknown reason 3d on CPU 3.
[ 3663.685403]  mpt2sas[ 3663.685404] Do you have a strange power saving mode 
enabled?
[ 3663.685405]  scsi_transport_sas[ 3663.685406] Dazed and confused, but trying 
to continue
[ 3663.685407]  raid_class scsi_mod cxgb4 i2c_i801 i2c_core lpc_ich mfd_core 
ehci_hcd uhci_hcd i7core_edac edac_core dm_mod ioatdma nfs nfs_acl auth_rpcgss 
fscache lockd sunrpc broadcom tg3 bnx2 igb dca e1000 [last unloaded: 
scsi_wait_scan]
[ 3663.685420]
[ 3663.685422] Pid: 102943, comm: ceph-osd Not tainted 3.5.0-00019-g472719a 
#221 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685424] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685430] RSP: 0018:ffff88065c111898  EFLAGS: 00000006
[ 3663.685430] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d9
[ 3663.685431] RDX: 00000000000000c5 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685432] RBP: ffff88065c1118a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685433] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685433] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685434] FS:  00007fffc693b700(0000) GS:ffff880c3fc00000(0000) 
knlGS:0000000000000000
[ 3663.685435] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685436] CR2: ffffffffff600400 CR3: 000000048d1b1000 CR4: 00000000000007e0
[ 3663.685437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685439] Process ceph-osd (pid: 102943, threadinfo ffff88065c110000, task 
ffff880737b9ae00)
[ 3663.685439] Stack:
[ 3663.685440]  0000000000001d31 0000000000000000 ffff88065c111948 
ffffffff8111a760
[ 3663.685444]  ffff8806245b2e00 ffff88065c1118c8 0000000000000006 
0000000000000000
[ 3663.685447]  ffff88063fffcba0 000000003fffcb98 ffff88065c111a18 
0000000000002000
[ 3663.685450] Call Trace:
[ 3663.685451]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685455]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685458]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685460]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685462]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685464]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685469]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685471]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685474]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685477]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685481]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685483]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685487]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685490]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.685493]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.685497]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685500]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685502] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.685527] NMI backtrace for cpu 1
[ 3663.685528] CPU 1 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685562]
[ 3663.685563] Pid: 30029, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685565] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685569] RSP: 0018:ffff880563ae1898  EFLAGS: 00000006
[ 3663.685569] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d6
[ 3663.685570] RDX: 00000000000000c5 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685571] RBP: ffff880563ae18a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685572] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685573] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685574] FS:  00007fffe86c9700(0000) GS:ffff880627c20000(0000) 
knlGS:0000000000000000
[ 3663.685575] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685576] CR2: ffffffffff600400 CR3: 00000002cc584000 CR4: 00000000000007e0
[ 3663.685577] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685577] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685578] Process ceph-osd (pid: 30029, threadinfo ffff880563ae0000, task 
ffff880563adc500)
[ 3663.685579] Stack:
[ 3663.685579]  000000000000167f 0000000000000000 ffff880563ae1948 
ffffffff8111a760
[ 3663.685583]  ffff88063fffcc38 ffff88063fffcb98 000000000000256b 
0000000000000000
[ 3663.685586]  ffff88063fffcba0 0000000000000004 ffff880563ae1a18 
0000000000001a00
[ 3663.685589] Call Trace:
[ 3663.685590]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685593]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685595]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685597]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685599]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685601]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685604]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685607]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685609]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685612]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685614]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685616]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685619]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685621]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.685623]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.685626]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685628]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685630] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.685656] NMI backtrace for cpu 12
[ 3663.685656] CPU 12 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685687]
[ 3663.685688] Pid: 97037, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685690] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685693] RSP: 0018:ffff880092839898  EFLAGS: 00000016
[ 3663.685694] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d4
[ 3663.685694] RDX: 00000000000000c5 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685695] RBP: ffff8800928398a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685696] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685697] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685698] FS:  00007fffcb183700(0000) GS:ffff880627cc0000(0000) 
knlGS:0000000000000000
[ 3663.685699] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685700] CR2: ffffffffff600400 CR3: 0000000411741000 CR4: 00000000000007e0
[ 3663.685701] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685702] Uhhuh. NMI received for unknown reason 3d on CPU 6.
[ 3663.685703] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685704] Do you have a strange power saving mode enabled?
[ 3663.685705] Process ceph-osd (pid: 97037, threadinfo ffff880092838000, task 
ffff8805d127dc00)
[ 3663.685706] Dazed and confused, but trying to continue
[ 3663.685707] Stack:
[ 3663.685707]  000000000000358a 0000000000000000 ffff880092839948 
ffffffff8111a760
[ 3663.685711]  ffff8806245c4500 ffff8800928398c8 000000000000000c 
0000000000000000
[ 3663.685714]  ffff88063fffcba0 000000003fffcb98 ffff880092839a18 
0000000000003800
[ 3663.685717] Call Trace:
[ 3663.685717]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685720]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685722]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685724]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685727]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685729]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685731]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685734]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685736]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685738]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685740]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685743]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685745]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685747]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.685749]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.685752]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685754]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685756] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.685781] NMI backtrace for cpu 14
[ 3663.685782] CPU 14 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685815]
[ 3663.685816] Pid: 97590, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685818] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685821] RSP: 0018:ffff8803f97a9898  EFLAGS: 00000002
[ 3663.685822] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000c6
[ 3663.685823] RDX: 00000000000000c5 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685823] RBP: ffff8803f97a98a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685824] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685825] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685826] FS:  00007fffca577700(0000) GS:ffff880627d00000(0000) 
knlGS:0000000000000000
[ 3663.685827] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685828] CR2: ffffffffff600400 CR3: 00000002e0986000 CR4: 00000000000007e0
[ 3663.685828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685829] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685830] Process ceph-osd (pid: 97590, threadinfo ffff8803f97a8000, task 
ffff88045554c500)
[ 3663.685831] Stack:
[ 3663.685831]  0000000000001cc3 0000000000000000 ffff8803f97a9948 
ffffffff8111a760
[ 3663.685834]  ffff8806245d8000 ffff8803f97a98c8 000000000000000e 
0000000000000000
[ 3663.685838]  ffff88063fffcba0 000000003fffcb98 ffff8803f97a9a18 
0000000000002000
[ 3663.685841] Call Trace:
[ 3663.685842]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685844]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685847]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685849]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685851]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685853]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685856]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685859]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685861]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685864]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685866]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685868]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685871]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685873]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.685875]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.685878]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685880]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.685882] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.685907] NMI backtrace for cpu 2
[ 3663.685908] CPU 2 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.685939]
[ 3663.685941] Pid: 100053, comm: ceph-osd Not tainted 3.5.0-00019-g472719a 
#221 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.685943] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>] 
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.685946] RSP: 0018:ffff8808da685898  EFLAGS: 00000012
[ 3663.685947] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d3
[ 3663.685948] RDX: 00000000000000c6 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.685948] RBP: ffff8808da6858a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.685949] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.685950] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.685951] FS:  00007fff92c01700(0000) GS:ffff880627c40000(0000) 
knlGS:0000000000000000
[ 3663.685952] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.685953] CR2: ffffffffff600400 CR3: 00000002b8fbe000 CR4: 00000000000007e0
[ 3663.685954] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.685954] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.685955] Process ceph-osd (pid: 100053, threadinfo ffff8808da684000, task 
ffff880a05a92e00)
[ 3663.685956] Stack:
[ 3663.685956]  000000000000119b 0000000000000000 ffff8808da685948 
ffffffff8111a760
[ 3663.685959]  ffff8806244d4500 ffff8808da6858c8 0000000000000002 
0000000000000000
[ 3663.685962]  ffff88063fffcba0 000000003fffcb98 ffff8808da685a18 
0000000000001400
[ 3663.685966] Call Trace:
[ 3663.685966]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.685969]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.685971]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.685973]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.685976]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.685978]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.685981]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.685983]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.685986]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.685988]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.685990]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.685992]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.685995]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.685997]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.685999]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.686001] Code: ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 
00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 <0f> b6 13 38 
d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66
[ 3663.686028] NMI backtrace for cpu 11
[ 3663.686028] CPU 11 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.686062]
[ 3663.686064] Pid: 97756, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.686066] RIP: 0010:[<ffffffff81480ed5>]  [<ffffffff81480ed5>] 
_raw_spin_lock_irqsave+0x45/0x60
[ 3663.686069] RSP: 0018:ffff880b11ecd898  EFLAGS: 00000006
[ 3663.686070] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d8
[ 3663.686070] RDX: 00000000000000c6 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.686071] RBP: ffff880b11ecd8a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.686072] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.686073] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.686074] FS:  00007ffff36df700(0000) GS:ffff880c3fca0000(0000) 
knlGS:0000000000000000
[ 3663.686075] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.686076] CR2: ffffffffff600400 CR3: 00000002cae55000 CR4: 00000000000007e0
[ 3663.686077] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.686078] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.686079] Process ceph-osd (pid: 97756, threadinfo ffff880b11ecc000, task 
ffff880a79a51700)
[ 3663.686079] Stack:
[ 3663.686080]  0000000000001b3e 0000000000000000 ffff880b11ecd948 
ffffffff8111a760
[ 3663.686083]  ffff8806245c2e00 ffff880b11ecd8c8 000000000000000b 
0000000000000000
[ 3663.686086]  ffff88063fffcba0 000000003fffcb98 ffff880b11ecda18 
0000000000001e00
[ 3663.686089] Call Trace:
[ 3663.686090]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.686093]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.686095]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.686097]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.686099]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.686102]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.686105]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.686107]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.686110]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.686112]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.686114]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.686117]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.686119]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.686121]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.686124]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.686126]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.686129] Code: 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 00 f0 66 
0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 0f b6 13 <38> d1 75 f7 
5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66 2e 0f 1f
[ 3663.686155] NMI backtrace for cpu 20
[ 3663.686155] CPU 20 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode 
serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en 
mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod 
ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca 
e1000 [last unloaded: scsi_wait_scan]
[ 3663.686189]
[ 3663.686190] Pid: 97755, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.686193] RIP: 0010:[<ffffffff81480ed5>]  [<ffffffff81480ed5>] 
_raw_spin_lock_irqsave+0x45/0x60
[ 3663.686196] RSP: 0018:ffff88066d5af898  EFLAGS: 00000002
[ 3663.686196] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000cd
[ 3663.686197] RDX: 00000000000000c6 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.686198] RBP: ffff88066d5af8a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.686199] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.686199] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.686200] Uhhuh. NMI received for unknown reason 2d on CPU 11.
[ 3663.686201] FS:  00007ffff3ee0700(0000) GS:ffff880c3fd00000(0000) 
knlGS:0000000000000000
[ 3663.686202] Do you have a strange power saving mode enabled?
[ 3663.686203] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.686203] Dazed and confused, but trying to continue
[ 3663.686204] CR2: ffffffffff600400 CR3: 00000002cae55000 CR4: 00000000000007e0
[ 3663.686205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.686206] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.686207] Process ceph-osd (pid: 97755, threadinfo ffff88066d5ae000, task 
ffff880a79a52e00)
[ 3663.686207] Stack:
[ 3663.686208]  0000000000001cbf 0000000000000000 ffff88066d5af948 
ffffffff8111a760
[ 3663.686211]  ffff8806245e9700 ffff88066d5af8c8 0000000000000014 
0000000000000000
[ 3663.686214]  ffff88063fffcba0 000000003fffcb98 ffff88066d5afa18 
0000000000002000
[ 3663.686217] Call Trace:
[ 3663.686218]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.686221]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.686223]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.686225]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.686228]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.686230]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.686233]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.686236]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.686238]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.686240]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.686243]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.686245]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.686247]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.686250]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.686252]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.686254]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.686257]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.686259] Code: 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 01 00 00 f0 66 
0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 f3 90 0f b6 13 <38> d1 75 f7 
5b 4c 89 e0 41 5c c9 c3 66 66 66 66 66 66 2e 0f 1f
[ 3663.686284] NMI backtrace for cpu 13
[ 3663.686285] CPU 13 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm 
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 
dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net 
macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm 
crc32c_intel[ 3663.686300] Uhhuh. NMI received for unknown reason 2d on CPU 12.
[ 3663.686300]  ghash_clmulni_intel[ 3663.686301] Do you have a strange power 
saving mode enabled?
[ 3663.686301]  aesni_intel[ 3663.686302] Dazed and confused, but trying to 
continue
[ 3663.686302]  cryptd aes_x86_64 microcode serio_raw pcspkr ata_piix libata 
button mlx4_ib ib_mad ib_core mlx4_en mlx4_core mpt2sas scsi_transport_sas 
raid_class scsi_mod cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd 
i7core_edac edac_core dm_mod ioatdma nfs nfs_acl auth_rpcgss fscache lockd 
sunrpc broadcom tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[ 3663.686318]
[ 3663.686319] Pid: 98427, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 3663.686321] RIP: 0010:[<ffffffff81480ed0>]  [<ffffffff81480ed0>] 
_raw_spin_lock_irqsave+0x40/0x60
[ 3663.686324] RSP: 0018:ffff880356409898  EFLAGS: 00000016
[ 3663.686324] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000d2
[ 3663.686325] RDX: 00000000000000c6 RSI: 000000000000015a RDI: ffff88063fffcb00
[ 3663.686326] RBP: ffff8803564098a8 R08: 0000000000000000 R09: 0000000000000000
[ 3663.686327] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
[ 3663.686327] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
[ 3663.686328] FS:  00007fffc794b700(0000) GS:ffff880627ce0000(0000) 
knlGS:0000000000000000
[ 3663.686329] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3663.686330] CR2: ffffffffff600400 CR3: 00000002bc512000 CR4: 00000000000007e0
[ 3663.686331] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3663.686332] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3663.686333] Process ceph-osd (pid: 98427, threadinfo ffff880356408000, task 
ffff880027de5c00)
[ 3663.686333] Stack:
[ 3663.686333]  0000000000001061 0000000000000000 ffff880356409948 
ffffffff8111a760
[ 3663.686337]  ffff8806245c5c00 ffff8803564098c8 000000000000000d 
0000000000000000
[ 3663.686340]  ffff88063fffcba0 000000003fffcb98 ffff880356409a18 
0000000000001400
[ 3663.686343] Call Trace:
[ 3663.686343]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
[ 3663.686346]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
[ 3663.686348]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
[ 3663.686350]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
[ 3663.686352]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
[ 3663.686354]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
[ 3663.686357]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
[ 3663.686360]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
[ 3663.686362]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
[ 3663.686364]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
[ 3663.686366]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
[ 3663.686368]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
[ 3663.686370]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
[ 3663.686373]  [<ffffffff8106707d>] ? up_write+0x1d/0x20
[ 3663.686375]  [<ffffffff81113656>] ? vm_mmap_pgoff+0x96/0xb0
[ 3663.686377]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 3663.686379]  [<ffffffff8148154f>] page_fault+0x1f/0x30
[ 3663.686381] Code: 6a c5 ff 65 48 8b 14 25 48 b7 00 00 83 82 44 e0 ff ff 01 ba 00 
01 00 00 f0 66 0f c1 13 89 d1 66 c1 e9 08 38 d1 74 0d 0f 1f 40 00 <f3> 90 0f b6 
13 38 d1 75 f7 5b 4c 89 e0 41 5c c9 c3 66 66 66 66
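
For anyone skimming a dump like the one above, here is a minimal sketch that groups the text by its "NMI backtrace for cpu N" headers and reports which symbol each CPU's RIP was in. The helper name rip_symbols_by_cpu and the regular expressions are assumptions made for illustration, not part of any kernel tooling; the sample lines are copied from the traces above, including the mail client's line wrap after the RIP addresses.

```python
import re
from collections import Counter

# Hypothetical helper for illustration only; not part of any kernel tool.
CPU_RE = re.compile(r"NMI backtrace for cpu (\d+)")
# The mail client wraps the RIP line, so let the match span the line break.
RIP_RE = re.compile(r"RIP:.*?\]\s+(\w+)\+0x", re.S)


def rip_symbols_by_cpu(log_text):
    """Map each CPU number to the symbol its RIP pointed into."""
    result = {}
    # re.split with one capture group yields [preamble, cpu, body, cpu, body, ...]
    parts = CPU_RE.split(log_text)
    for cpu, body in zip(parts[1::2], parts[2::2]):
        match = RIP_RE.search(body)
        if match:
            result[int(cpu)] = match.group(1)
    return result


# Sample lines copied from the dump above (two CPUs only).
sample = """\
[ 3663.685907] NMI backtrace for cpu 2
[ 3663.685943] RIP: 0010:[<ffffffff81480ed2>]  [<ffffffff81480ed2>]
_raw_spin_lock_irqsave+0x42/0x60
[ 3663.686028] NMI backtrace for cpu 11
[ 3663.686066] RIP: 0010:[<ffffffff81480ed5>]  [<ffffffff81480ed5>]
_raw_spin_lock_irqsave+0x45/0x60
"""

symbols = rip_symbols_by_cpu(sample)
print(Counter(symbols.values()))
```

Run over the complete traces shown above, a tally like this would make it easy to see that each of those CPUs was in _raw_spin_lock_irqsave, reached from isolate_migratepages_range.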


Please let me know what I can do next to help sort this out.

Thanks -- Jim
