Splitting PGs is an expensive operation. You want to slow it down, not overwhelm the OSDs.
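In practice that means leaving the mgr throttle in place and keeping backfill gentle rather than opening everything up. A minimal sketch - the values shown are just the Nautilus defaults, tune to taste:

    ceph config set mgr target_max_misplaced_ratio 0.05
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-sleep-hdd 0.1'

With the misplaced ratio back at 5%, the mgr steps pgp_num up in small increments instead of remapping the whole pool at once.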
> On Mar 21, 2020, at 5:46 AM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
>
> Each node has 64 GB RAM, so it should be enough (12 OSDs = 48 GB used).
>
>> On 21/03/2020 13.14, XuYun wrote:
>> Bluestore requires more than 4 GB of memory per OSD - do you have enough memory?
>>
>>> On Mar 21, 2020, at 8:09 PM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
>>>
>>> Hello,
>>>
>>> I have a Ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable).
>>>
>>> 4 nodes, each with 11 HDDs, 1 SSD, and a 10 Gbit network.
>>>
>>> The cluster was a fresh, empty install. We filled it with data (small objects) using RGW.
>>>
>>> The cluster is now used for testing, so no client was using it during the admin operations described below.
>>>
>>> After a while (7 TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to spread the data better. To speed this operation up, I set
>>>
>>>     ceph config set mgr target_max_misplaced_ratio 1
>>>
>>> so that the whole cluster would rebalance as quickly as it could.
>>>
>>> I have 3 issues/questions below:
>>>
>>> 1)
>>>
>>> I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging
>>>
>>>     heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150
>>>
>>> After a while the OSDs were back, so I continued with my tests.
>>>
>>> My question: was increasing the number of PGs with the maximum target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it this way? I had no problem with such an increase before, but the cluster configuration was slightly different and it was running Luminous.
>>>
>>> 2)
>>>
>>> The rebalance was still slow, so I increased the number of backfills
>>>
>>>     ceph tell osd.* injectargs "--osd-max-backfills 10"
>>>
>>> and reduced the recovery sleep time
>>>
>>>     ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"
>>>
>>> After a few hours I noticed that some of my OSDs had restarted during recovery. In the log I can see:
>>>
>>>     2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>>     2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>>     2020-03-21 06:41:36.780 7fe1da154700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>>     2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
>>>     2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
>>>     2020-03-21 06:41:36.888 7fe1e7769700  1 osd.7 3574 start_waiting_for_healthy
>>>
>>> I watched the network graphs, and network utilization was low during recovery (the 10 Gbit link was not saturated).
>>>
>>> So can heavy IOPS on an OSD also cause heartbeat operations to time out? I thought the OSD uses separate threads and that HDD stalls would not affect heartbeats to other OSDs and the MONs. It looks like that is not true.
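(The 15 and 150 in those messages are seconds from the OSD's internal op-thread watchdog - osd_op_thread_timeout, default 15, and osd_op_thread_suicide_timeout, default 150 - not from the network heartbeat itself. As far as I know, though, an OSD whose internal heartbeat map is unhealthy also stops answering peer pings, so its peers report it down to the mon, which matches the "wrongly marked me down" line. To confirm where those numbers come from on a live cluster - osd.7 here is just the daemon from the log above:

    ceph config show-with-defaults osd.7 | egrep 'osd_op_thread(_suicide)?_timeout'

Raising those timeouts only hides the symptom; the real fix is throttling recovery so the disks stop stalling.)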
>>> 3)
>>>
>>> After the OSD was wrongly marked down, I can see that the cluster has degraded objects. There were no degraded objects before that:
>>>
>>>     Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized
>>>
>>> Does this mean the OSD disconnection caused the degradation? How is that possible when no OSD was lost? The data should still be on that OSD, and after peering everything should be OK. With Luminous I had no such problem - after the OSD came back up, degraded objects were recovered/found within a few seconds and the cluster was healthy again in seconds.
>>>
>>> Thank you very much for any additional info. I can perform any additional tests you recommend, because the cluster is used for testing purposes now.
>>>
>>> With regards
>>> Jan Pekar
>>>
>>> --
>>> ============
>>> Ing. Jan Pekař
>>> jan.pe...@imatic.cz
>>> ----
>>> Imatic | Jagellonská 14 | Praha 3 | 130 00
>>> http://www.imatic.cz | +420326555326
>>> ============
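(On the degraded-object question: while an OSD is marked down its PGs go undersized, and anything written in that window - including recovery and backfill writes from the ongoing rebalance - lands on one copy fewer, so those objects are counted as degraded until the OSD rejoins, peers, and catches up on the PG log. The counts should drain to zero on their own; a couple of standard commands to watch that happen:

    ceph -s
    ceph pg ls degraded

If the degraded count sticks around after recovery finishes, that would be worth a separate report.)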