Hi ceph-users,

Hoping to get some help with a tricky problem. I have a RHEL 7.1 VM guest
(the host machine is also RHEL 7.1) whose root disk is an RBD image presented
from Ceph 0.94.2-0 via libvirt.

The VM also has a second RBD, used for storage, presented from the same Ceph
cluster, also via libvirt.
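
For reference, the storage disk is attached with a libvirt definition along
these lines (the pool/image name, monitor hosts and secret UUID below are
placeholders, not my real values):

  <disk type='network' device='disk'>
    <!-- qemu rbd driver; raw image, the cache mode is one of the things I've been changing -->
    <driver name='qemu' type='raw' cache='none'/>
    <auth username='libvirt'>
      <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
    </auth>
    <!-- image served from the same ceph cluster as the root disk -->
    <source protocol='rbd' name='rbd/nas1-storage'>
      <host name='ceph-mon1' port='6789'/>
      <host name='ceph-mon2' port='6789'/>
      <host name='ceph-mon3' port='6789'/>
    </source>
    <target dev='vdb' bus='virtio'/>
  </disk>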

The VM boots fine, with no apparent issues on the root RBD. I am able to
mount the storage disk in the VM and create a file system on it, and I can
even transfer small files to it. But when I try to transfer a moderately
sized file, e.g. anything over 1GB, the transfer slows to a grinding halt,
eventually the whole system locks up, and the kernel messages below are
generated.

I have googled around for *similar* issues, but haven't come across any
solid advice or a fix. So far I have tried modifying the libvirt disk cache
settings (example further down), using the latest mainline kernel (4.2+),
and different file systems (ext4, xfs, zfs); all produce similar results. I
suspect it may be network related, because when I was using the mainline
kernel and transferring some files to the storage disk, this message came up
and the transfer seemed to stop at the same time:

Sep  1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument

I think the key piece of troubleshooting information may be that transfers
of files under 1GB seem to be OK.
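
For completeness, the libvirt cache setting I mentioned changing is the
cache attribute on the storage disk's <driver> element. This is just an
illustrative line rather than my exact config; libvirt accepts none,
writethrough, writeback, directsync and unsafe here:

  <!-- cache mode on the disk's driver element: none | writethrough | writeback | directsync | unsafe -->
  <driver name='qemu' type='raw' cache='none'/>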

Any ideas would be appreciated.

Cheers,
Raf


Sep  1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
Sep  1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  1 16:04:15 nas1-rds kernel: kworker/u8:1    D ffff88023fd93680     0    60      2 0x00000000
Sep  1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
Sep  1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
Sep  1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
Sep  1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
Sep  1 16:04:15 nas1-rds kernel: Call Trace:
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140