Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

Mehmet Tue, 15 Aug 2017 13:45:04 -0700

I am not sure, but perhaps setting the nodown/noout flags could help the deep-scrub finish?
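
A minimal sketch of what I mean, assuming the standard "ceph osd set" and "ceph osd unset" flag commands. The CEPH variable and the two function names are only illustrative. By default the script just prints the commands; set CEPH=ceph to run them against a cluster. Note that these flags only change how the monitors mark OSDs down or out, not the OSD's internal thread timeout, so this may or may not be enough.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of executing them.
# Export CEPH=ceph to actually run them (hypothetical helper variable).
CEPH="${CEPH:-echo ceph}"

# Stop the monitors from marking OSDs down or out while the scrub runs.
set_flags() {
    $CEPH osd set nodown
    $CEPH osd set noout
}

# Remove the flags again once the deep-scrub has finished.
unset_flags() {
    $CEPH osd unset nodown
    $CEPH osd unset noout
}

set_flags
```

Once the scrub is done, call unset_flags so the cluster goes back to normal failure detection.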

- Mehmet


On 15 August 2017 16:01:57 CEST, Andreas Calminder
<andreas.calmin...@klarna.com> wrote:
>Hi,
>I got hit with OSD suicide timeouts while a deep-scrub runs on a
>specific PG. There's a RH article
>(https://access.redhat.com/solutions/2127471) suggesting changing
>'osd_scrub_thread_suicide_timeout' from 60s to a higher value. The problem
>is that the article is for Hammer, and osd_scrub_thread_suicide_timeout
>doesn't exist when running
>ceph daemon osd.34 config show
>and the default timeout (60s) suggested in the article doesn't really
>match the suicide timeout in the logs:
>
>2017-08-15 15:39:37.512216 7fb293137700  1 heartbeat_map is_healthy
>'OSD::osd_op_tp thread 0x7fb231adf700' had suicide timed out after 150
>2017-08-15 15:39:37.518543 7fb293137700 -1 common/HeartbeatMap.cc: In
>function 'bool ceph::HeartbeatMap::_check(const
>ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fb293137700
>time 2017-08-15 15:39:37.512230
>common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>
>The suicide timeout (150) does match the
>osd_op_thread_suicide_timeout, however when I try changing this I get:
>ceph daemon osd.34 config set osd_op_thread_suicide_timeout 300
>{
>    "success": "osd_op_thread_suicide_timeout = '300' (unchangeable) "
>}
>
>And the deep-scrub still hits the suicide timeout after 150 seconds, just like
>before.
>
>The cluster is left with osd.34 flapping. Is there any way to let the
>deep-scrub finish and get out of the infinite deep-scrub loop?
>
>Regards,
>Andreas
>_______________________________________________
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

