Thanks Sam for the detailed explanation. It is very clear and helpful.

When that happened there was no client traffic, but we did have recovery 
traffic during that time, and I think the same reasoning applies to recovery writes.

Thanks,
Guang

----------------------------------------
> Date: Thu, 13 Nov 2014 11:29:23 -0800
> Subject: Re: PG down
> From: sam.j...@inktank.com
> To: yguan...@outlook.com
> CC: ceph-devel@vger.kernel.org
>
> Right, if you think about it, any objects written during the time
> without 1,2,3 really do require 4 to recover. You can reduce the risk
> of this by setting min_size to something greater than 8, but you also
> won't be able to recover with fewer than min_size, so if you set
> min_size to 9 and lose 1,2,3, you won't have lost data, but you won't
> be able to recover until you reduce min_size. It's mainly there so
> that you won't accept writes during a brief outage which brings you
> down to 8. Note, I think you could have marked osd 8 lost and then
> marked the unrecoverable objects lost.
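>
> Roughly, something along these lines (untested here; please check the exact
> syntax against your release first):
>
>   ceph osd lost 8 --yes-i-really-mean-it
>   ceph pg <pgid> mark_unfound_lost revert
>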
> -Sam
>
> On Thu, Nov 13, 2014 at 11:20 AM, GuangYang <yguan...@outlook.com> wrote:
>> Thanks Sam for the quick response. Just want to make sure I understand it 
>> correctly:
>>
>> If we have [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] and all of 1, 2 and 3 are 
>> down, the PG stays active since we are using 8 + 3, but once 4 also goes 
>> down, even if we bring 1, 2 and 3 back up, the PG cannot become active 
>> unless we also bring 4 up. Is my understanding correct here?
>>
>> Thanks,
>> Guang
>>
>> ----------------------------------------
>>> Date: Thu, 13 Nov 2014 09:06:27 -0800
>>> Subject: Re: PG down
>>> From: sam.j...@inktank.com
>>> To: yguan...@outlook.com
>>> CC: ceph-devel@vger.kernel.org
>>>
>>> It looks like the acting set went down to the minimum allowable size and
>>> went active with osd 8. At that point, you needed every member of that
>>> acting set to go active later on in order to avoid losing writes. You can
>>> prevent this by setting min_size above the number of data chunks.
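>>>
>>> For a k=8, m=3 pool that would be something like (pool name is a
>>> placeholder):
>>>
>>>   ceph osd pool set <ecpool> min_size 9
>>>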
>>> -Sam
>>>
>>> On Thu, Nov 13, 2014 at 4:15 AM, GuangYang <yguan...@outlook.com> wrote:
>>>> Hi Sam,
>>>> Yesterday one PG went down in our cluster and I am confused by the PG 
>>>> state. I am not sure whether it is a bug (or an issue that has already been 
>>>> fixed, as I see a couple of related fixes in giant); it would be nice if you 
>>>> could help take a look.
>>>>
>>>> Here is what happened:
>>>>
>>>> We are using an EC pool with 8 data chunks and 3 coding chunks. Say the PG 
>>>> has up/acting set [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. One OSD in the set 
>>>> went down and came back up, which triggered PG recovery. However, during 
>>>> recovery the primary OSD crashed due to a corrupted file chunk, then another 
>>>> OSD became primary, started recovering and crashed, and so on, until there 
>>>> were 4 OSDs down in the set and the PG was marked down.
>>>>
>>>> After that, we left the OSD with the corrupted data down and started all the 
>>>> other crashed OSDs. We expected the PG to become active; however, it is 
>>>> still down, with the following query information:
>>>>
>>>> { "state": "down+remapped+inconsistent+peering",
>>>> "epoch": 4469,
>>>> "up": [
>>>> 377,
>>>> 107,
>>>> 328,
>>>> 263,
>>>> 395,
>>>> 467,
>>>> 352,
>>>> 475,
>>>> 333,
>>>> 37,
>>>> 380],
>>>> "acting": [
>>>> 2147483647,
>>>> 107,
>>>> 328,
>>>> 263,
>>>> 395,
>>>> 2147483647,
>>>> 352,
>>>> 475,
>>>> 333,
>>>> 37,
>>>> 380],
>>>> ...
>>>> 377]}],
>>>> "probing_osds": [
>>>> "37(9)",
>>>> "107(1)",
>>>> "263(3)",
>>>> "328(2)",
>>>> "333(8)",
>>>> "352(6)",
>>>> "377(0)",
>>>> "380(10)",
>>>> "395(4)",
>>>> "467(5)",
>>>> "475(7)"],
>>>> "blocked": "peering is blocked due to down osds",
>>>> "down_osds_we_would_probe": [
>>>> 8],
>>>> "peering_blocked_by": [
>>>> { "osd": 8,
>>>> "current_lost_at": 0,
>>>> "comment": "starting or marking this osd lost may let us proceed"}]},
>>>> { "name": "Started",
>>>> "enter_time": "2014-11-12 10:12:23.067369"}],
>>>> }
>>>>
>>>> Here osd.8 is the one with the corrupted data.
>>>>
>>>> The way we worked around this issue was to set norecover and start osd.8, 
>>>> get the PG active, remove the corrupted object (via rados), and then unset 
>>>> norecover, after which things became clean again (roughly the sequence 
>>>> sketched below). But the most confusing part is that even with only osd.8 
>>>> left down, the PG could not become active.
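>>>>
>>>> In command form, the workaround was roughly the following (pool and object 
>>>> names are placeholders rather than the real ones from our cluster):
>>>>
>>>>   ceph osd set norecover
>>>>   # start osd.8 on its host and wait for the PG to go active
>>>>   rados -p <pool> rm <object>    # the object with the corrupted chunk
>>>>   ceph osd unset norecover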
>>>>
>>>> We are using firefly v0.80.4.
>>>>
>>>> Thanks,
>>>> Guang
                                          
