I am not sure, but perhaps setting nodown/noout could help the deep-scrub finish? - Mehmet
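
A minimal sketch of that suggestion, assuming "nodown/out" refers to the `nodown` and `noout` cluster flags. The `CEPH_CMD` variable, the two function names, and the PG id argument are hypothetical and only here for illustration. The flags stop the monitors from marking OSDs down or out; they do not change the suicide timeout itself, so whether the scrub then completes is untested here.

```shell
#!/bin/sh
# Sketch only: hold the nodown/noout flags while one PG is deep-scrubbed.
# CEPH_CMD is an illustrative knob; set CEPH_CMD="echo ceph" for a dry run.
CEPH_CMD="${CEPH_CMD:-ceph}"

scrub_with_flags() {
    pgid="$1"   # hypothetical argument: the PG whose deep-scrub keeps timing out
    # Keep the monitors from marking OSDs down or out during the scrub.
    $CEPH_CMD osd set nodown
    $CEPH_CMD osd set noout
    # Request a deep-scrub of the placement group.
    $CEPH_CMD pg deep-scrub "$pgid"
}

clear_flags() {
    # Unset the flags afterwards; leaving them set can mask real OSD failures.
    $CEPH_CMD osd unset nodown
    $CEPH_CMD osd unset noout
}
```

Call `scrub_with_flags` with the affected PG id, wait for the scrub to end, then call `clear_flags`.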
On 15 August 2017 16:01:57 CEST, Andreas Calminder <andreas.calmin...@klarna.com> wrote:
>Hi,
>I got hit with OSD suicide timeouts while deep-scrub runs on a
>specific PG. There's a RH article
>(https://access.redhat.com/solutions/2127471) suggesting changing
>osd_scrub_thread_suicide_timeout from 60s to a higher value. The problem
>is that the article is for Hammer, and osd_scrub_thread_suicide_timeout
>doesn't exist when running
>ceph daemon osd.34 config show
>Also, the default timeout (60s) mentioned in the article doesn't really
>match the suicide timeout in the logs:
>
>2017-08-15 15:39:37.512216 7fb293137700  1 heartbeat_map is_healthy
>'OSD::osd_op_tp thread 0x7fb231adf700' had suicide timed out after 150
>2017-08-15 15:39:37.518543 7fb293137700 -1 common/HeartbeatMap.cc: In
>function 'bool ceph::HeartbeatMap::_check(const
>ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fb293137700
>time 2017-08-15 15:39:37.512230
>common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>
>The suicide timeout (150) does match
>osd_op_thread_suicide_timeout; however, when I try changing it I get:
>ceph daemon osd.34 config set osd_op_thread_suicide_timeout 300
>{
>    "success": "osd_op_thread_suicide_timeout = '300' (unchangeable) "
>}
>
>And the deep-scrub still hits the suicide timeout after 150 seconds,
>just like before.
>
>The cluster is left with osd.34 flapping. Is there any way to let the
>deep-scrub finish and get out of the infinite deep-scrub loop?
>
>Regards,
>Andreas
>_______________________________________________
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com