I can almost guarantee what you're seeing is PG subfolder splitting. When a subfolder in a PG reaches a certain number of objects, filestore splits it into 16 new subfolders, and that housekeeping is expensive. Every cluster I manage sees blocked requests and OSDs getting marked down while this is happening. To stop the OSDs from being marked down, I increase osd_heartbeat_grace until they no longer get marked down during the splitting. Based on your email, 5 minutes looks like a good starting point. The blocked requests will still persist, but at least the OSDs won't keep getting marked down and adding peering to the headache.
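In case it helps, this is roughly how I raise it. osd_heartbeat_grace itself is a standard option, but the 300 second value is only a starting point I'd pick based on the "grace 46.444312" in your log; tune it until the flapping stops:

    # ceph.conf -- the monitors consult this value too when they evaluate
    # failure reports, so I set it in [global] rather than only [osd]
    [global]
    osd_heartbeat_grace = 300

    # or change it at runtime without restarting the daemons:
    ceph tell osd.* injectargs '--osd_heartbeat_grace 300'
    ceph tell mon.ceph1 injectargs '--osd_heartbeat_grace 300'   # repeat for each mon,
                                                                 # or restart the mons after
                                                                 # editing ceph.conf

I only leave it that high while the splitting is going on, since a long grace also delays detection of OSDs that are genuinely dead.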
In 10.2.5 and 0.94.9, a way was added to take an OSD offline and tell it to split the subfolders of its PGs. I haven't done this myself yet, but plan to figure it out the next time I come across this sort of behavior; I've put a rough sketch of what I think the command looks like below your quoted mail.

On Wed, Apr 12, 2017 at 8:55 AM Jogi Hofmüller <j...@mur.at> wrote:
> Dear all,
>
> we run a small cluster [1] that is exclusively used for virtualisation
> (kvm/libvirt). Recently we started to run into performance problems
> (slow requests, failing OSDs) for no *obvious* reason (at least not for
> us).
>
> We do nightly snapshots of VM images and keep the snapshots for 14
> days. Currently we run 8 VMs in the cluster.
>
> At first it looked like the problem was related to snapshotting images
> of VMs that were up and running (respectively deleting the snapshots
> after 14 days). So we changed the procedure to first suspend the VM and
> then snapshot its image(s). Snapshots are made at 4 am.
>
> When we removed *all* the old snapshots (the ones done of running VMs)
> the cluster suddenly behaved 'normal' again, but after two days of
> creating snapshots (not deleting any) of suspended VMs, the slow
> requests started again (although by far not as frequently as before).
>
> This morning we experienced subsequent failures (e.g. osd.2
> IPv4:6800/1621 failed (2 reporters from different hosts after 49.976472
> >= grace 46.444312)) of 4 of our 6 OSDs, resulting in HEALTH_WARN with
> up to about 20% of PGs active+undersized+degraded or stale+active+clean
> or remapped+peering. No OSD failure lasted longer than 4 minutes. After
> 15 minutes everything was back to normal again. The noise started at
> 6:25 am, a time when cron.daily scripts run here.
>
> We have no clue what could have caused this behavior :( There seems to
> be no shortage of resources (CPU, RAM, network) that would explain what
> happened, but maybe we did not look in the right places. So any hint on
> where to look/what to look for would be greatly appreciated :)
>
> [1] cluster setup
>
> Three nodes: ceph1, ceph2, ceph3
>
> ceph1 and ceph2
>
> 1x Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz
> 32 GB RAM
> RAID1 for OS
> 1x Intel 530 Series SSD (120GB) for journals
> 3x WDC WD2500BUCT-63TWBY0 for OSDs (1TB)
> 2x Gbit Ethernet bonded (802.3ad) on HP 2920 stack
>
> ceph3
>
> virtual machine
> 1 CPU
> 4 GB RAM
>
> Software
>
> Debian GNU/Linux Jessie (8.7)
> Kernel 3.16
> ceph 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
>
> Ceph Services
>
> 3 Monitors: ceph1, ceph2, ceph3
>
> 6 OSDs: ceph1 (3), ceph2 (3)
>
> Regards,
> --
> J.Hofmüller
>
> Nisiti
> - Abie Nathan, 1927-2008
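And here is the offline split I mentioned above. Treat it as a sketch only, since I haven't run it myself yet: as far as I know the relevant op is ceph-objectstore-tool's apply-layout-settings (the feature added in 0.94.9/10.2.5), the OSD id 2 and the pool name 'rbd' are just placeholders, and you should check ceph-objectstore-tool --help on your build before trusting any of it.

    # Sketch only -- untested by me.  Do one OSD at a time, outside your
    # snapshot window.
    ceph osd set noout
    systemctl stop ceph-osd@2            # or however you stop OSDs on your setup

    # As I understand it, this re-applies the directory layout implied by the
    # current filestore_merge_threshold / filestore_split_multiple settings,
    # so raise those in ceph.conf first if you want the folders pre-split
    # beyond the defaults.
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-2 \
        --journal-path /var/lib/ceph/osd/ceph-2/journal \
        --op apply-layout-settings \
        --pool rbd                       # placeholder: whichever pool holds your VM images

    systemctl start ceph-osd@2
    ceph osd unset noout

If anyone on the list has already done this in production, corrections are welcome.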