On 07/15/2015 01:17 AM, Jeya Ganesh Babu Jegatheesan wrote: > Hi, > > We have a Openstack + Ceph cluster based on Giant release. We use ceph for > the VMs volumes including the boot volumes. Under load, we see the write > access to the volumes stuck from within the VM. The same would work after a > VM reboot. The issue is seen with and without rbd cache. Let me know if this > is some known issue and any way to debug further. The ceph cluster itself > seems to be clean. We have currently disabled scrub and deep scrub. 'ceph -s' > output as below. >
Are you seeing slow requests in the system? Are any of the disks under the OSDs 100% busy or close to it? Btw, the amount of PGs is rather high. You are at 88, while the formula recommends: num_osd * 100 / 3 = 14k (cluster total) Wido > cluster eaaeaa55-a8e7-4531-a5eb-03d73028b59d > health HEALTH_WARN noscrub,nodeep-scrub flag(s) set > monmap e71: 9 mons at > {gngsvc009a=10.163.43.1:6789/0,gngsvc009b=10.163.43.2:6789/0,gngsvc010a=10.163.43.5:6789/0,gngsvc010b=10.163.43.6:6789/0,gngsvc011a=10.163.43.9:6789/0,gngsvc011b=10.163.43.10:6789/0,gngsvc011c=10.163.43.11:6789/0,gngsvm010d=10.163.43.8:6789/0,gngsvm011d=10.163.43.12:6789/0}, > election epoch 22246, quorum 0,1,2,3,4,5,6,7,8 > gngsvc009a,gngsvc009b,gngsvc010a,gngsvc010b,gngsvm010d,gngsvc011a,gngsvc011b,gngsvc011c,gngsvm011d > osdmap e54600: 425 osds: 425 up, 425 in > flags noscrub,nodeep-scrub > pgmap v13257438: 37620 pgs, 4 pools, 134 TB data, 35289 kobjects > 402 TB used, 941 TB / 1344 TB avail > 37620 active+clean > client io 94059 kB/s rd, 313 MB/s wr, 4623 op/s > > > The traces we see in the VM's kernel are as below. > > [ 1080.552901] INFO: task jbd2/vdb-8:813 blocked for more than 120 seconds. > [ 1080.553027] Tainted: GF O 3.13.0-34-generic > #60~precise1-Ubuntu > [ 1080.553157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 1080.553295] jbd2/vdb-8 D ffff88003687e3e0 0 813 2 > 0x00000000 > [ 1080.553298] ffff880444fadb48 0000000000000002 ffff880455114440 > ffff880444fadfd8 > [ 1080.553302] 0000000000014440 0000000000014440 ffff88044a9317f0 > ffff88044b7917f0 > [ 1080.553303] ffff880444fadb48 ffff880455114cd8 ffff88044b7917f0 > ffffffff811fc670 > [ 1080.553307] Call Trace: > [ 1080.553309] [<ffffffff811fc670>] ? __wait_on_buffer+0x30/0x30 > [ 1080.553311] [<ffffffff8175b8b9>] schedule+0x29/0x70 > [ 1080.553313] [<ffffffff8175b98f>] io_schedule+0x8f/0xd0 > [ 1080.553315] [<ffffffff811fc67e>] sleep_on_buffer+0xe/0x20 > [ 1080.553316] [<ffffffff8175c052>] __wait_on_bit+0x62/0x90 > [ 1080.553318] [<ffffffff811fc670>] ? __wait_on_buffer+0x30/0x30 > [ 1080.553320] [<ffffffff8175c0fc>] out_of_line_wait_on_bit+0x7c/0x90 > [ 1080.553322] [<ffffffff810aff70>] ? wake_atomic_t_function+0x40/0x40 > [ 1080.553324] [<ffffffff811fc66e>] __wait_on_buffer+0x2e/0x30 > [ 1080.553326] [<ffffffff8129806b>] > jbd2_journal_commit_transaction+0x136b/0x1520 > [ 1080.553329] [<ffffffff810a1f75>] ? sched_clock_local+0x25/0x90 > [ 1080.553331] [<ffffffff8109a7b8>] ? finish_task_switch+0x128/0x170 > [ 1080.553333] [<ffffffff8107891f>] ? try_to_del_timer_sync+0x4f/0x70 > [ 1080.553334] [<ffffffff8129c5d8>] kjournald2+0xb8/0x240 > [ 1080.553336] [<ffffffff810afef0>] ? __wake_up_sync+0x20/0x20 > [ 1080.553338] [<ffffffff8129c520>] ? commit_timeout+0x10/0x10 > [ 1080.553340] [<ffffffff8108fa79>] kthread+0xc9/0xe0 > [ 1080.553343] [<ffffffff8108f9b0>] ? flush_kthread_worker+0xb0/0xb0 > [ 1080.553346] [<ffffffff8176827c>] ret_from_fork+0x7c/0xb0 > [ 1080.553349] [<ffffffff8108f9b0>] ? flush_kthread_worker+0xb0/0xb0 > > Thanks, > Jeyaganesh. > > > > _______________________________________________ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on _______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com