We don't have thousands, but these RBDs are in a pool backed by roughly 600 OSDs.

I can see the fd count climb well past 10k, closer to 15k, when I use a
decent number of RBDs (e.g. 16 or 32), and it seems to increase with the
size of the file I write. It's almost 30k when writing a 50GB file across
that number of OSDs.
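
Roughly how I'm watching those counts, in case anyone wants to compare (a
quick sketch - it assumes one qemu process per VM, and <vm-name> is just a
placeholder for however your qemu process is named):

  pid=$(pgrep -f "qemu.*<vm-name>" | head -n 1)
  watch -n 5 "ls /proc/$pid/fd | wc -l"     # open fds, sampled every 5 seconds
  grep "open files" /proc/$pid/limits       # soft/hard limits currently in effect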

The change in qemu.conf worked for me, on RHEL 7.1 with systemd.
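
For reference, this is roughly what that change looks like (a sketch - the
value is just an example, and 32768 may be more than you need):

  # /etc/libvirt/qemu.conf
  # raise the open-file limit libvirt applies to each qemu process it starts
  max_files = 32768

  systemctl restart libvirtd
  # then restart the affected VM(s) so the new limit is picked up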


On 3 September 2015 at 19:46, Jan Schermer <j...@schermer.cz> wrote:

> You're like the 5th person here (including me) that was hit by this.
>
> Could I get some input from someone using Ceph with RBD and thousands of
> OSDs? How high did you have to go?
>
> I only have ~200 OSDs and I had to bump the limit up to 10000 for VMs that
> have multiple volumes attached - that doesn't seem right. I understand this
> is the effect of striping a volume across multiple PGs, but shouldn't this
> be more limited, or somehow garbage collected?
>
> And to dig deeper - I suppose there will be one connection from QEMU to
> OSD for each NCQ queue? Or how does this work? blk-mq will likely be
> different again... Or is it decoupled from the virtio side of things by the
> RBD cache, if that's enabled?
>
> Anyway, out of the box, at least on OpenStack installations:
> 1) anyone with more than a few OSDs should really bump this limit up by
> default (see the sketch after this list).
> 2) librbd should handle this situation gracefully by recycling
> connections, instead of hanging
> 3) at least we should get a warning somewhere (in the libvirt/qemu log) -
> I don't think anything is logged when the issue hits.
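>
> For (1), something along these lines is what I mean on systemd-based hosts
> (just a sketch - the drop-in path and value are examples, and depending on
> the libvirt version the qemu processes may still need max_files set in
> qemu.conf as well):
>
>   # /etc/systemd/system/libvirtd.service.d/limits.conf
>   [Service]
>   LimitNOFILE=65536
>
>   systemctl daemon-reload && systemctl restart libvirtd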
>
> Should I make tickets for this?
>
> Jan
>
> On 03 Sep 2015, at 02:57, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>
> Hi Jan,
>
> Thanks for the advice - you hit the nail on the head.
>
> I checked the limits and watched the number of fds, and as it reached the
> soft limit (1024) the transfer came to a grinding halt and the VM started
> locking up.
>
> After your reply I also did some more googling and found another old
> thread:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/026187.html
>
> I increased max_files in qemu.conf and restarted libvirtd and the VM (as
> per Dan's solution in the thread above), and now it seems happy copying
> files of any size to the RBD. I also confirmed the fd count is now going
> past the previous soft limit of 1024.
>
> Thanks again!!
> Raf
>
> On 2 September 2015 at 18:44, Jan Schermer <j...@schermer.cz> wrote:
>
>> 1) Take a look at the number of file descriptors the QEMU process is
>> using; I think you are over the limits.
>>
>> # find the PID of the qemu process for the affected VM
>> pid=$(pidof qemu-kvm)   # or however your qemu binary is named
>>
>> cat /proc/$pid/limits          # check the "Max open files" soft/hard limits
>> echo /proc/$pid/fd/* | wc -w   # count the currently open file descriptors
>>
>> 2) Jumbo frames may be the cause - are they enabled on the rest of the
>> network? In any case, get rid of NetworkManager ASAP and set the MTU
>> manually, though it looks like your NIC might not support them.
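>>
>> For example, something like this on RHEL 7 (a sketch - it assumes the
>> interface is eth0 and that the NICs and switches along the path actually
>> support 9000-byte frames):
>>
>>   ip link set dev eth0 mtu 9000   # takes effect immediately, not persistent
>>   # to persist it, add MTU=9000 to /etc/sysconfig/network-scripts/ifcfg-eth0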
>>
>> Jan
>>
>>
>>
>> > On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>> >
>> > Hi ceph-users,
>> >
>> > Hoping to get some help with a tricky problem. I have a RHEL 7.1 VM
>> guest (the host machine is also RHEL 7.1) with its root disk presented
>> from Ceph 0.94.2-0 (RBD) using libvirt.
>> >
>> > The VM also has a second RBD for storage, presented from the same Ceph
>> cluster, also using libvirt.
>> >
>> > The VM boots fine, with no apparent issues on the OS root RBD. I am able
>> to mount the storage disk in the VM and create a file system. I can even
>> transfer small files to it. But when I try to transfer moderately sized
>> files, e.g. greater than 1GB, it slows to a grinding halt, eventually
>> locks up the whole system, and generates the kernel messages below.
>> >
>> > I have googled some *similar* issues, but haven't come across any solid
>> advice/fix. So far I have tried modifying the libvirt disk cache settings,
>> using the latest mainline kernel (4.2+), and different file systems (ext4,
>> xfs, zfs) - all produce similar results. I suspect it may be network
>> related: when I was using the mainline kernel and transferring some files
>> to the storage disk, this message came up and the transfer seemed to stop
>> at the same time:
>> >
>> > Sep  1 15:31:22 nas1-rds NetworkManager[724]: <error>
>> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set():
>> sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22)
>> Invalid argument
>> >
>> > I think maybe the key piece of troubleshooting info is that it seems to
>> be OK for files under 1GB.
>> >
>> > Any ideas would be appreciated.
>> >
>> > Cheers,
>> > Raf
>> >
>> >
>> > Sep  1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for
>> more than 120 seconds.
>> > Sep  1 16:04:15 nas1-rds kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Sep  1 16:04:15 nas1-rds kernel: kworker/u8:1    D ffff88023fd93680
>>  0    60      2 0x00000000
>> > Sep  1 16:04:15 nas1-rds kernel: Workqueue: writeback
>> bdi_writeback_workfn (flush-252:80)
>> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046
>> ffff8802313c4440 ffff880230c13fd8
>> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8
>> ffff8802313c4440 ffff88023fd93f48
>> > Sep  1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08
>> ffffe8ffffd80ec0 ffff88022e827590
>> > Sep  1 16:04:15 nas1-rds kernel: Call Trace:
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>]
>> io_schedule+0x9d/0x130
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ?
>> wake_up_bit+0x30/0x30
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>]
>> blk_mq_get_tag+0xbf/0xf0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>]
>> __blk_mq_alloc_request+0x1b/0x1f0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>]
>> blk_mq_map_request+0x181/0x1e0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>]
>> blk_sq_make_request+0x9a/0x380
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ?
>> generic_make_request_checks+0x24f/0x380
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>]
>> generic_make_request+0xe2/0x130
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>]
>> submit_bio+0x71/0x150
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>]
>> ext4_io_submit+0x25/0x50 [ext4]
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>]
>> ext4_bio_write_page+0x159/0x2e0 [ext4]
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>]
>> mpage_submit_page+0x5d/0x80 [ext4]
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>]
>> mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>]
>> ext4_writepages+0x733/0xd60 [ext4]
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>]
>> do_writepages+0x1e/0x40
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>]
>> __writeback_single_inode+0x40/0x220
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>]
>> writeback_sb_inodes+0x25e/0x420
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>]
>> __writeback_inodes_wb+0x9f/0xd0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>]
>> wb_writeback+0x263/0x2f0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>]
>> bdi_writeback_workfn+0x1cc/0x460
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>]
>> process_one_work+0x17b/0x470
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>]
>> worker_thread+0x11b/0x400
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ?
>> rescuer_thread+0x400/0x400
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ?
>> kthread_create_on_node+0x140/0x140
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>]
>> ret_from_fork+0x7c/0xb0
>> > Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ?
>> kthread_create_on_node+0x140/0x140
>> >
>> >
>>
>>
>
>
> --
> Rafael Lopez
> Data Storage Administrator
> Servers & Storage (eSolutions)
> +61 3 990 59118
>
>
>


-- 
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
