Hello,
What is your current setup, 1 server per data center with 12 OSDs each?
What is your current crush rule and LRC crush rule?
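
If it helps, something along these lines should show the topology, the
EC profile and the CRUSH rules in use (the profile and rule names are
just placeholders, adjust them to your setup):

  ceph osd tree
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get <profile-name>
  ceph osd crush rule dump <rule-name>
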
On Fri, Apr 28, 2023, 12:29 Michel Jouvin
<michel.jou...@ijclab.in2p3.fr> wrote:
Hi,
I think I found a possible cause of my PGs down, but I still don't
understand why.
As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
m=6) but I have only 12 OSD servers in the cluster. To work around the
problem I defined the failure domain as 'osd', with the reasoning that,
as I was using the LRC plugin, I had the guarantee that I could lose a
site without impact, and thus the possibility to lose 1 OSD server. Am I
wrong?
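
For reference, a sketch of the kind of commands used to create such a
pool with the simple LRC syntax (the profile and pool names below are
only examples, not the actual ones):

  ceph osd erasure-code-profile set lrc_k9m6l5 \
      plugin=lrc k=9 m=6 l=5 \
      crush-locality=datacenter crush-failure-domain=osd
  ceph osd pool create testpool 256 256 erasure lrc_k9m6l5
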
Best regards,
Michel
On 24/04/2023 at 13:24, Michel Jouvin wrote:
> Hi,
>
> I'm still interested in getting feedback from those using the LRC
> plugin about the right way to configure it... Last week I upgraded
> from Pacific to Quincy (17.2.6) with cephadm, which does the upgrade
> host by host, checking that an OSD is OK to stop before actually
> upgrading it. I was surprised to see 1 or 2 PGs down at some points
> during the upgrade (it did not happen for all OSDs, but it did for
> every site/datacenter). Looking at the details with "ceph health
> detail", I saw that for these PGs there were 3 OSDs down, but I was
> expecting the pool to be resilient to 6 OSDs down (5 for R/W access),
> so I'm wondering if there is something wrong in our pool configuration
> (k=9, m=6, l=5).
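>
> (As a side note, a quick way to dig into such a PG, assuming a PG id
> like 52.14, is to list the affected PGs and query one directly to see
> its up/acting sets; the grep is just to narrow the output:)
>
>   ceph pg ls down
>   ceph pg 52.14 query | grep -A 20 '"acting"'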
>
> Cheers,
>
> Michel
>
> On 06/04/2023 at 08:51, Michel Jouvin wrote:
>> Hi,
>>
>> Is somebody using the LRC plugin?
>>
>> I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
>> jerasure k=9, m=6 in terms of protection against failures, and that I
>> should use k=9, m=6, l=5 to get a level of resilience >= jerasure
>> k=9, m=6. The example in the documentation (k=4, m=2, l=3) suggests
>> that this LRC configuration gives something better than jerasure k=4,
>> m=2, as it is resilient to 3 drive failures (but not 4, if I
>> understood properly). So how many drives can fail in the k=9, m=6,
>> l=5 configuration, first without losing RW access and second without
>> losing data?
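>>
>> (A hedged back-of-the-envelope view of the k=9, m=6, l=5 layout, based
>> on my reading of the documentation, written as comments:)
>>
>>   # k=9 data + m=6 coding chunks, grouped in sets of l=5 chunks,
>>   # each set getting one extra local parity chunk:
>>   #   total chunks = k + m + (k+m)/l = 9 + 6 + 3 = 18
>>   #   spread over 3 datacenters, i.e. 6 chunks per datacenter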
>>
>> Another thing that I don't quite understand is that a pool created
>> with this configuration (and failure domain=osd, locality=datacenter)
>> has min_size=3 (max_size=18, as expected). This seems wrong to me; I'd
>> have expected something around 10 (depending on the answer to the
>> previous question)...
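>>
>> (For what it is worth, the values can be checked and, if really
>> needed, overridden with something like:)
>>
>>   ceph osd pool ls detail
>>   ceph osd pool get <pool-name> min_size
>>   ceph osd pool set <pool-name> min_size <value>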
>>
>> Thanks in advance if somebody could provide some sort of
>> authoritative answer on these 2 questions. Best regards,
>>
>> Michel
>>
>> On 04/04/2023 at 15:53, Michel Jouvin wrote:
>>> Answering my own question, I found the reason for 2147483647: it's
>>> documented as a failure to find enough OSDs (missing OSDs). And it
>>> is expected, as I asked for different hosts for the 15 OSDs but I
>>> have only 12 hosts!
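>>>
>>> (One way to check this offline, as a sketch: extract the crushmap
>>> and test the rule with crushtool, here assuming rule id 6 and 15
>>> chunks; slots that cannot be filled show up as 2147483647:)
>>>
>>>   ceph osd getcrushmap -o crushmap.bin
>>>   crushtool -i crushmap.bin --test --rule 6 --num-rep 15 \
>>>       --show-mappings --show-bad-mappings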
>>>
>>> I'm still interested in an "expert" confirming whether the LRC k=9,
>>> m=3, l=4 configuration is equivalent, in terms of redundancy, to a
>>> jerasure configuration with k=9, m=6.
>>>
>>> Michel
>>>
>>> On 04/04/2023 at 15:26, Michel Jouvin wrote:
>>>> Hi,
>>>>
>>>> As discussed in another thread (Crushmap rule for multi-datacenter
>>>> erasure coding), I'm trying to create an EC pool spanning 3
>>>> datacenters (the datacenters are present in the crushmap), with the
>>>> objective of being resilient to 1 DC down, at least keeping
>>>> read-only access to the pool and, if possible, read-write access,
>>>> and to have a storage efficiency better than 3-replica (let's say a
>>>> storage overhead <= 2).
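>>>>
>>>> (As a rough comparison of raw-space overhead for the options
>>>> discussed here, written as comments:)
>>>>
>>>>   # 3-replica:          overhead = 3.0
>>>>   # jerasure k=9, m=6:  overhead = (9+6)/9   ~= 1.67
>>>>   # LRC k=9, m=3, l=4:  overhead = (9+3+3)/9 ~= 1.67  (3 local parities)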
>>>>
>>>> In the discussion, somebody mentioned the LRC plugin as a possible
>>>> alternative to jerasure for implementing this without tweaking the
>>>> crushmap rule to implement the 2-step OSD allocation. I looked at
>>>> the documentation
>>>> (https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/)
>>>> but I have some questions, if someone has experience/expertise with
>>>> this LRC plugin.
>>>>
>>>> I tried to create a rule using 5 OSDs per datacenter (15 in
>>>> total), with 3 per datacenter (9 in total) being data chunks and
>>>> the others being coding chunks. For this, based on my understanding
>>>> of the examples, I used k=9, m=3, l=4. Is that right? Is this
>>>> configuration equivalent, in terms of redundancy, to a jerasure
>>>> configuration with k=9, m=6?
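>>>>
>>>> (For context, a sketch of the kind of command I would expect to
>>>> produce such a profile and its rule; the profile name is only an
>>>> example:)
>>>>
>>>>   ceph osd erasure-code-profile set lrc_k9m3l4 \
>>>>       plugin=lrc k=9 m=3 l=4 \
>>>>       crush-locality=datacenter crush-failure-domain=host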
>>>>
>>>> The resulting rule, which looks correct to me, is:
>>>>
>>>> --------
>>>>
>>>> {
>>>>     "rule_id": 6,
>>>>     "rule_name": "test_lrc_2",
>>>>     "ruleset": 6,
>>>>     "type": 3,
>>>>     "min_size": 3,
>>>>     "max_size": 15,
>>>>     "steps": [
>>>>         {
>>>>             "op": "set_chooseleaf_tries",
>>>>             "num": 5
>>>>         },
>>>>         {
>>>>             "op": "set_choose_tries",
>>>>             "num": 100
>>>>         },
>>>>         {
>>>>             "op": "take",
>>>>             "item": -4,
>>>>             "item_name": "default~hdd"
>>>>         },
>>>>         {
>>>>             "op": "choose_indep",
>>>>             "num": 3,
>>>>             "type": "datacenter"
>>>>         },
>>>>         {
>>>>             "op": "chooseleaf_indep",
>>>>             "num": 5,
>>>>             "type": "host"
>>>>         },
>>>>         {
>>>>             "op": "emit"
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>> ------------
>>>>
>>>> Unfortunately, it doesn't work as expected: a pool created with
>>>> this rule ends up with its PGs active+undersized, which is
>>>> unexpected to me. Looking at "ceph health detail" output, I see
>>>> for each PG something like:
>>>>
>>>> pg 52.14 is stuck undersized for 27m, current state
>>>> active+undersized, last acting
>>>> [90,113,2147483647,103,64,147,164,177,2147483647,133,58,28,8,32,2147483647]
>>>>
>>>> For each PG, there are 3 '2147483647' entries and I guess they are
>>>> the reason for the problem. What are these entries about? Clearly
>>>> they are not OSD IDs... It looks like a negative number, -1, which
>>>> in terms of crushmap IDs is the crushmap root (named "default" in
>>>> our configuration). Is there any trivial mistake I could have made?
>>>>
>>>> Thanks in advance for any help, or for sharing any successful
>>>> configuration.
>>>>
>>>> Best regards,
>>>>
>>>> Michel
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io