Hi Jan,

Thanks for the advice; it hit the nail on the head.

I checked the limits and watched the number of open file descriptors; as the
count reached the soft limit (1024), the transfer ground to a halt and the VM
started locking up.
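
For reference, I was watching the count with something along these lines (the
pgrep pattern is just what matched my guest; adjust it to yours):

pid=$(pgrep -f qemu-kvm | head -1)         # PID of the QEMU process for the guest
watch -n 1 "echo /proc/$pid/fd/* | wc -w"  # open fd count, refreshed every second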

After your reply I also did some more googling and found another old thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/026187.html

I increased max_files in qemu.conf and restarted libvirtd and the VM (as per
Dan's solution in the thread above), and now it happily copies files of any
size to the rbd. I confirmed the fd count now climbs past the previous soft
limit of 1024 as well.
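
For anyone who hits this later, the change was essentially the following
(32768 is just the value I picked; size it to your own workload):

# /etc/libvirt/qemu.conf
max_files = 32768   # max open files per qemu process; if unset, qemu inherits
                    # the default limit (the 1024 soft limit I was hitting)

systemctl restart libvirtd   # then shut down and start the VM so it applies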

Thanks again!!
Raf

On 2 September 2015 at 18:44, Jan Schermer <j...@schermer.cz> wrote:

> 1) Take a look at the number of file descriptors the QEMU process is
> using; I think you are over the limits.
>
> pid=$(pidof qemu-kvm)          # or however you find your QEMU process's PID
>
> cat /proc/$pid/limits          # check the soft/hard "Max open files" limits
> echo /proc/$pid/fd/* | wc -w   # count the fds currently open
>
> 2) Jumbo frames may be the cause; are they enabled on the rest of the
> network? In any case, get rid of NetworkManager ASAP and set the MTU
> manually, though it looks like your NIC might not support jumbo frames.
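>
> For example (just a sketch, assuming the eth0 from your log and RHEL-style
> ifcfg files):
>
> ip link set dev eth0 mtu 9000                                   # apply now
> echo 'MTU="9000"' >> /etc/sysconfig/network-scripts/ifcfg-eth0  # persist
>
> But only do that once the switches and the other hosts in the path have
> jumbo frames enabled as well.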
>
> Jan
>
>
>
> > On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lo...@monash.edu> wrote:
> >
> > Hi ceph-users,
> >
> > Hoping to get some help with a tricky problem. I have a RHEL 7.1 VM guest (host machine also RHEL 7.1) with its root disk presented from ceph 0.94.2-0 (rbd) using libvirt.
> >
> > The VM also has a second rbd for storage, presented from the same ceph cluster, also using libvirt.
> >
> > The VM boots fine, with no apparent issues on the OS root rbd. I am able to mount the storage disk in the VM and create a file system, and I can even transfer small files to it. But when I try to transfer moderately sized files, e.g. greater than 1GB, the transfer slows to a grinding halt, eventually locks up the whole system, and generates the kernel messages below.
> >
> > I have googled some *similar* issues, but haven't come across any solid advice/fix. So far I have tried modifying the libvirt disk cache settings, using the latest mainline kernel (4.2+), and different file systems (ext4, xfs, zfs); all produce similar results. I suspect it may be network related, as when I was using the mainline kernel and transferring some files to the storage disk, this message came up and the transfer seemed to stop at the same time:
> >
> > Sep  1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
> >
> > I think the key piece of troubleshooting info may be that it seems to be OK for files under 1GB.
> >
> > Any ideas would be appreciated.
> >
> > Cheers,
> > Raf
> >
> >
> > Sep  1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
> > Sep  1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep  1 16:04:15 nas1-rds kernel: kworker/u8:1    D ffff88023fd93680     0    60      2 0x00000000
> > Sep  1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
> > Sep  1 16:04:15 nas1-rds kernel: Call Trace:
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140


-- 
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
