Thanks Sage. We will provide a patch based on this. Thanks, Guang
On Aug 20, 2014, at 11:19 PM, Sage Weil <sw...@redhat.com> wrote:

> On Wed, 20 Aug 2014, Guang Yang wrote:
>> Thanks Greg.
>> On Aug 20, 2014, at 6:09 AM, Gregory Farnum <g...@inktank.com> wrote:
>>
>>> On Mon, Aug 18, 2014 at 11:30 PM, Guang Yang <yguan...@outlook.com> wrote:
>>>> Hi ceph-devel,
>>>> David (cc'ed) reported a bug (http://tracker.ceph.com/issues/9128) which
>>>> we came across in our test cluster during our failure testing. Basically,
>>>> the way to reproduce it was to leave one OSD daemon down and in for a day
>>>> while keeping write traffic going. When the OSD daemon was started again,
>>>> it hit the suicide timeout and killed itself.
>>>>
>>>> After some analysis (details in the bug), David found that the op thread
>>>> was busy searching for missing objects, and as the volume of objects to
>>>> search grows, the thread is expected to be busy for that long; please
>>>> refer to the bug for detailed logs.
>>>
>>> Can you talk a little more about what's going on here? At a quick
>>> naive glance, I'm not seeing why leaving an OSD down and in should
>>> require work based on the amount of write traffic. Perhaps if the rest
>>> of the cluster was changing mappings?
>>
>> We increased the down-to-out time interval from 5 minutes to 2 days to
>> avoid migrating data back and forth, which could increase latency, so we
>> plan to mark OSDs out manually. To verify this, we are testing some
>> boundary cases, such as leaving an OSD down and in for about a day;
>> however, when we try to bring it up again, it always fails because it
>> hits the suicide timeout.
>
> Looking at the log snippet I see the PG had log range
>
>   5481'28667,5646'34066
>
> which is ~5500 log events. The default max is 10k. search_for_missing is
> basically going to iterate over this list and check if the object is
> present locally.
>
> If that's slow enough to trigger a suicide (which it seems to be), the fix
> is simple: as Greg says, we just need to make it probe the internal
> heartbeat code to indicate progress. In most contexts this is done by
> passing a ThreadPool::TPHandle &handle into each method and then calling
> handle.reset_tp_timeout() on each iteration. The same needs to be done for
> search_for_missing...
>
> sage
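
For anyone following along: the interval Guang mentions should be the
mon osd down out interval option. The snippet below is only meant to show
what a 2-day setting would look like in ceph.conf, not a recommendation:

    [mon]
        # default is 300 seconds (5 minutes); 172800 seconds = 2 days
        mon osd down out interval = 172800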
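
To make the fix concrete, here is a minimal, self-contained sketch of the
pattern Sage describes: the long-running loop pokes a heartbeat handle on
every iteration so the watchdog keeps seeing progress instead of declaring
the thread hung. HeartbeatHandle and reset_timeout() below are stand-ins
for Ceph's ThreadPool::TPHandle and reset_tp_timeout(), and the loop is a
stand-in for search_for_missing walking the PG log; this is not the actual
patch, just an illustration of the technique.

    #include <chrono>
    #include <iostream>
    #include <string>
    #include <vector>

    // Stand-in for ThreadPool::TPHandle: records the last time the worker
    // reported progress so a watchdog can decide whether it is stuck.
    class HeartbeatHandle {
      std::chrono::steady_clock::time_point last_progress_;
    public:
      HeartbeatHandle() : last_progress_(std::chrono::steady_clock::now()) {}

      // Called by the worker on each unit of work (cf. reset_tp_timeout()).
      void reset_timeout() {
        last_progress_ = std::chrono::steady_clock::now();
      }

      // Called by the watchdog: has the worker been silent past the grace period?
      bool timed_out(std::chrono::seconds grace) const {
        return std::chrono::steady_clock::now() - last_progress_ > grace;
      }
    };

    // Stand-in for search_for_missing: iterate the (possibly ~5500-entry)
    // missing list, poking the heartbeat once per object checked.
    void search_for_missing(const std::vector<std::string> &missing,
                            HeartbeatHandle &handle) {
      for (const auto &oid : missing) {
        handle.reset_timeout();  // the essence of the fix: mark progress per iteration
        (void)oid;               // ... check whether the object is present locally (elided) ...
      }
    }

    int main() {
      HeartbeatHandle handle;
      std::vector<std::string> missing(5500, "obj");  // roughly the log range from the bug
      search_for_missing(missing, handle);
      std::cout << "timed out? " << std::boolalpha
                << handle.timed_out(std::chrono::seconds(150)) << "\n";
    }

The real change would be the analogous one inside the OSD: thread the
existing ThreadPool::TPHandle through search_for_missing and call
handle.reset_tp_timeout() in its per-object loop, as Sage outlines above.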