Thanks Sam for the detailed explanation, it is very clear and helpful. When that happened there was no client traffic, but we did have recovery traffic during that time, and I think that should have the same effect.

Thanks,
Guang
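For context, the min_size knob discussed below is an ordinary per-pool setting; a minimal sketch of checking and raising it for an 8+3 EC pool (the pool name is a placeholder):

    ceph osd pool get <ecpool> min_size
    ceph osd pool set <ecpool> min_size 9

With min_size 9, the PG stops accepting writes once it is down to 8 chunks, at the cost of staying inactive until enough OSDs return or min_size is lowered again.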
----------------------------------------
> Date: Thu, 13 Nov 2014 11:29:23 -0800
> Subject: Re: PG down
> From: sam.j...@inktank.com
> To: yguan...@outlook.com
> CC: ceph-devel@vger.kernel.org
>
> Right, if you think about it, any objects written during the time
> without 1,2,3 really do require 4 to recover. You can reduce the risk
> of this by setting min_size to something greater than 8, but you also
> won't be able to recover with fewer than min_size, so if you set
> min_size to 9 and lose 1,2,3, you won't have lost data, but you won't
> be able to recover until you reduce min_size. It's mainly there so
> that you won't accept writes during a brief outage which brings you
> down to 8. Note, I think you could have marked osd 8 lost and then
> marked the unrecoverable objects lost.
> -Sam
>
> On Thu, Nov 13, 2014 at 11:20 AM, GuangYang <yguan...@outlook.com> wrote:
>> Thanks Sam for the quick response. Just want to make sure I understand it
>> correctly:
>>
>> If we have [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] and all of 1, 2, 3 are down,
>> the PG stays active since we are using 8 + 3, but once 4 is also down, the
>> PG cannot become active again unless we bring 4 back up, even if we bring
>> up 1, 2, 3. Is my understanding correct here?
>>
>> Thanks,
>> Guang
>>
>> ----------------------------------------
>>> Date: Thu, 13 Nov 2014 09:06:27 -0800
>>> Subject: Re: PG down
>>> From: sam.j...@inktank.com
>>> To: yguan...@outlook.com
>>> CC: ceph-devel@vger.kernel.org
>>>
>>> It looks like the acting set went down to the minimum allowable size and
>>> went active with osd 8. At that point you needed every member of that
>>> acting set to go active later on to avoid losing writes. You can
>>> prevent this by setting a min_size above the number of data chunks.
>>> -Sam
>>>
>>> On Thu, Nov 13, 2014 at 4:15 AM, GuangYang <yguan...@outlook.com> wrote:
>>>> Hi Sam,
>>>> Yesterday one PG went down in our cluster and I am confused by the PG
>>>> state. I am not sure if it is a bug (or an issue that has already been
>>>> fixed, as I see a couple of related fixes in giant); it would be nice if
>>>> you could help take a look.
>>>>
>>>> Here is what happened:
>>>>
>>>> We are using an EC pool with 8 data chunks and 3 coding chunks. Say the
>>>> PG has up/acting set [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. One OSD in the
>>>> set went down and came back up, which triggered PG recovery. However,
>>>> during recovery the primary OSD crashed due to a corrupted file chunk,
>>>> then another OSD became primary, started recovering, and crashed as
>>>> well, and so on, until 4 OSDs in the set were down and the PG was marked
>>>> down.
>>>>
>>>> After that, we left the OSD with the corrupted data down and started all
>>>> the other crashed OSDs. We expected the PG to become active; however, it
>>>> stayed down with the following query information:
>>>>
>>>> { "state": "down+remapped+inconsistent+peering",
>>>>   "epoch": 4469,
>>>>   "up": [
>>>>         377,
>>>>         107,
>>>>         328,
>>>>         263,
>>>>         395,
>>>>         467,
>>>>         352,
>>>>         475,
>>>>         333,
>>>>         37,
>>>>         380],
>>>>   "acting": [
>>>>         2147483647,
>>>>         107,
>>>>         328,
>>>>         263,
>>>>         395,
>>>>         2147483647,
>>>>         352,
>>>>         475,
>>>>         333,
>>>>         37,
>>>>         380],
>>>> ...
>>>>         377]}],
>>>>   "probing_osds": [
>>>>         "37(9)",
>>>>         "107(1)",
>>>>         "263(3)",
>>>>         "328(2)",
>>>>         "333(8)",
>>>>         "352(6)",
>>>>         "377(0)",
>>>>         "380(10)",
>>>>         "395(4)",
>>>>         "467(5)",
>>>>         "475(7)"],
>>>>   "blocked": "peering is blocked due to down osds",
>>>>   "down_osds_we_would_probe": [
>>>>         8],
>>>>   "peering_blocked_by": [
>>>>         { "osd": 8,
>>>>           "current_lost_at": 0,
>>>>           "comment": "starting or marking this osd lost may let us proceed"}]},
>>>>   { "name": "Started",
>>>>     "enter_time": "2014-11-12 10:12:23.067369"}],
>>>> }
>>>>
>>>> Here osd.8 is the one with the corrupted data.
>>>>
>>>> The way we worked around the issue was to set norecover and start osd.8,
>>>> get the PG active, remove the corrupted object (via rados), and then
>>>> unset norecover, after which things became clean again. But the most
>>>> confusing part is that even with only osd.8 down, the PG could not
>>>> become active.
>>>>
>>>> We are using firefly v0.80.4.
>>>>
>>>> Thanks,
>>>> Guang
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
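For completeness, the workaround described in the thread maps roughly to the following command sequence (pool and object names are placeholders, and how osd.8 is started depends on the deployment; a sysvinit-style invocation is shown since this is a firefly-era cluster):

    ceph osd set norecover          # pause recovery so the corrupted chunk on osd.8 is not read
    service ceph start osd.8        # bring the down OSD back so the PG can peer and go active
    rados -p <ecpool> rm <object>   # delete the object whose chunk is corrupted
    ceph osd unset norecover        # resume normal recovery

The alternative Sam mentions would be along the lines of marking the OSD lost so peering can proceed without it:

    ceph osd lost 8 --yes-i-really-mean-it

after which any objects that can no longer be recovered would also have to be marked lost (via ceph pg <pgid> mark_unfound_lost; the modes available depend on the release).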