On Tue, Aug 4, 2015 at 3:23 PM, GuangYang <[email protected]> wrote:
> Thanks to Sage, Yehuda and Sam for the quick replies.
>
> Given the discussion so far, could I summarize it into the following
> bullet points:
>
> 1> The first step we would like to pursue is to implement the following
> mechanism to avoid infinite waiting on the radosgw side:
>   1.1. radosgw - send the OP with a *fast_fail* flag
>   1.2. OSD - reply with -EAGAIN if the PG is *inactive* and the
>        *fast_fail* flag is set
>   1.3. radosgw - upon receiving -EAGAIN, retry until a timeout interval
>        is reached (possibly with some back-off?), and if it eventually
>        fails, convert -EAGAIN to some other error code and pass it to
>        the upper layer.
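To make the three quoted steps concrete, here is a minimal client-side
sketch of step 1.3 in C++. Nothing in it exists in librados today:
send_op_fast_fail() is a hypothetical placeholder for "issue the rados op
with the proposed flag set", and -ETIMEDOUT is only an illustrative choice
for the converted error code, not a decided-upon one.

    #include <algorithm>
    #include <cerrno>
    #include <chrono>
    #include <thread>

    // Hypothetical placeholder for "issue the rados op with the proposed
    // fast_fail-style flag set". Neither the flag nor this helper exists
    // in librados today; a real implementation would return the OSD's
    // reply code (-EAGAIN meaning "PG inactive, op was not performed").
    int send_op_fast_fail()
    {
      return -EAGAIN;  // stub so the sketch compiles
    }

    // Step 1.3: retry on -EAGAIN with exponential back-off until a
    // deadline, then convert -EAGAIN into a different error so the upper
    // layer can tell "PG never became active" apart from other failures.
    int do_op_with_retry(std::chrono::milliseconds timeout)
    {
      using clock = std::chrono::steady_clock;
      const auto deadline = clock::now() + timeout;
      auto backoff = std::chrono::milliseconds(50);

      for (;;) {
        int r = send_op_fast_fail();
        if (r != -EAGAIN)
          return r;                      // success, or a non-retryable error
        if (clock::now() + backoff > deadline)
          return -ETIMEDOUT;             // give up and remap the error
        std::this_thread::sleep_for(backoff);
        backoff = std::min(backoff * 2, std::chrono::milliseconds(1000));
      }
    }

Capping the back-off keeps the loop responsive once the PG does become
active again, while the deadline bounds how long a single stuck PG can
hold a radosgw worker thread.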
I'm not crazy about the 'fast_fail' name; maybe we can come up with a term
that describes it better. Also, I'm not 100% sure that EAGAIN is the error
we want to see. Maybe the flag on the request could specify which error
code to return in this case? I think it's a good plan to start with; we
can adjust things later.

> 2> In terms of management of radosgw's worker threads, I think we either
> pursue Sage's proposal (which could linearly increase the time it takes
> for all worker threads to become stuck, depending on how many threads we
> expand to), or simply try sharding the work queue (for which we already
> have some basic building blocks)?

The problem that I see with that proposal (missed it earlier, only seeing
it now) is that when the threads actually wake up the system could become
unusable. In any case, it's probably a lower priority at this point; we
could rethink this area later.

Yehuda

> Can I start working on a patch for <1>, and then <2> as a lower priority?
>
> Thanks,
> Guang
> ----------------------------------------
>> Date: Tue, 4 Aug 2015 10:14:06 -0700
>> Subject: Re: radosgw - stuck ops
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]; [email protected]; [email protected];
>> [email protected]
>>
>> On Tue, Aug 4, 2015 at 10:03 AM, Sage Weil <[email protected]> wrote:
>>> On Tue, 4 Aug 2015, Yehuda Sadeh-Weinraub wrote:
>>>> On Tue, Aug 4, 2015 at 9:55 AM, Sage Weil <[email protected]> wrote:
>>>>>> One solution that I can think of is to determine before the
>>>>>> read/write whether the pg we're about to access is healthy (or has
>>>>>> been unhealthy for a short period of time), and if not, to cancel
>>>>>> the request before sending the operation. This could mitigate the
>>>>>> problem you're seeing at the expense of availability in some cases.
>>>>>> We'd need to have a way to query pg health through librados, which
>>>>>> we don't have right now afaik.
>>>>>> Sage / Sam, does that make sense, and/or is it possible?
>>>>>
>>>>> This seems mostly impossible because we don't know ahead of time
>>>>> which PG(s) a request is going to touch (it'll generally be a lot of
>>>>> them)?
>>>>>
>>>>
>>>> Barring pgls() and such, each rados request that radosgw produces will
>>>> only touch a single pg, right?
>>>
>>> Oh, yeah. I thought you meant before each RGW request. If it's at the
>>> rados level then yeah, you could avoid stuck pgs, although I think a
>>> better approach would be to make the OSD reply with -EAGAIN in that
>>> case so that you know the op didn't happen. There would still be cases
>>> (though more rare) where you weren't sure if the op happened or not
>>> (e.g., when you send to osd A, it goes down, you resend to osd B, and
>>> then you get EAGAIN/timeout).
>>
>> If done on the client side then we should only make it apply to the
>> first request sent. Is it actually a problem if the osd triggered the
>> error?
>>
>>>
>>> What would you do when you get that failure/timeout, though? Is it
>>> practical to abort the rgw request handling completely?
>>>
>>
>> It should be like any error that happens during the transaction
>> (e.g., client disconnection).
>>
>> Yehuda
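Returning to point <2> above (sharding the work queue): the sketch below
is illustrative only, not the existing RGW work queue code, and the names
in it are made up. Requests are hashed by a stable key (e.g. bucket or
object name) onto one of N independent queues, each drained by its own
small thread pool, so ops stuck behind a single inactive PG can only
exhaust one shard's threads rather than the whole radosgw worker pool.

    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    class ShardedWorkQueue {
      struct Shard {
        std::mutex lock;
        std::condition_variable cond;
        std::queue<std::function<void()>> work;
      };
      std::vector<Shard> shards;
      std::vector<std::thread> workers;
      std::atomic<bool> stopping{false};

     public:
      ShardedWorkQueue(size_t num_shards, size_t threads_per_shard)
          : shards(num_shards) {
        for (size_t s = 0; s < num_shards; ++s)
          for (size_t t = 0; t < threads_per_shard; ++t)
            workers.emplace_back([this, s] { run(shards[s]); });
      }

      ~ShardedWorkQueue() {
        stopping = true;
        for (Shard& sh : shards) {
          // take the shard lock so the notify can't race a worker that is
          // between its predicate check and blocking in wait()
          std::lock_guard<std::mutex> l(sh.lock);
          sh.cond.notify_all();
        }
        for (std::thread& t : workers)
          t.join();
      }

      // Route by a stable key so related requests land on the same shard.
      void enqueue(const std::string& key, std::function<void()> fn) {
        Shard& sh = shards[std::hash<std::string>{}(key) % shards.size()];
        {
          std::lock_guard<std::mutex> l(sh.lock);
          sh.work.push(std::move(fn));
        }
        sh.cond.notify_one();
      }

     private:
      void run(Shard& sh) {
        for (;;) {
          std::function<void()> fn;
          {
            std::unique_lock<std::mutex> l(sh.lock);
            sh.cond.wait(l, [&] { return stopping || !sh.work.empty(); });
            if (stopping && sh.work.empty())
              return;
            fn = std::move(sh.work.front());
            sh.work.pop();
          }
          fn();  // an op blocked on an inactive PG only ties up this shard
        }
      }
    };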
