Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-24 Thread Oliver Freyermuth

On 2019-10-24 09:46, Janne Johansson wrote:

(Slightly abbreviated)

On Thu, 24 Oct 2019 at 09:24, Frank Schilder <fr...@dtu.dk> wrote:

  What I learned is the following:

1) Avoid this work-around for too few hosts for an EC rule at all costs.

2) Do not use EC 2+1. It does not offer anything interesting for 
production. Use 4+2 (or 8+2, 8+3 if you have the hosts).

3) If you have no prospect of getting at least 7 servers in the long run 
(4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go for EC.

4) Before you start thinking about replicating to a second site, you should 
have a primary site running solid first.

This is collected from my experience. I would do things differently now, and 
maybe it helps you decide how to proceed. It is basically about what 
resources you can expect in the foreseeable future and what compromises you are 
willing to make with regard to sleep and sanity.


Amen to all of those points. We made similar-but-not-the-same mistakes on an EC 
cluster here. You are going to produce more tears than I/O if you build in the 
design mistakes mentioned above.
We could add:

5) Never buy SMR drives; pretend they don't even exist. If a similar technology 
appears tomorrow for cheap SSD/NVMe, skip it.


Amen from my side, too. Luckily, we only made a small fraction of these 
mistakes (running 4+2 on 6 servers and wondering about funny effects when 
taking one server offline, while we were still testing the setup, before we 
finally decided to ask for a 7th server), but this can in part be extrapolated.

Concerning SMR, I learnt that SMR awareness is on Ceph's roadmap (for 
host-managed SMR drives). Once that is available, host-managed SMR drives 
should be a well-working and cheap solution, especially for backup / WORM 
workloads.
But as of now, even disk vendors will tell you to avoid SMR for datacenter 
setups (unless you have a storage system that is aware of it and host-managed 
drives).

Cheers,
Oliver



--
May the most significant bit of your life be positive.



Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-24 Thread Janne Johansson
(Slightly abbreviated)

On Thu, 24 Oct 2019 at 09:24, Frank Schilder wrote:

>  What I learned is the following:
>
> 1) Avoid this work-around for too few hosts for an EC rule at all costs.
>
> 2) Do not use EC 2+1. It does not offer anything interesting for
> production. Use 4+2 (or 8+2, 8+3 if you have the hosts).
>
> 3) If you have no prospect of getting at least 7 servers in the long
> run (4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go
> for EC.
>
> 4) Before you start thinking about replicating to a second site, you
> should have a primary site running solid first.
>
> This is collected from my experience. I would do things differently now, and
> maybe it helps you decide how to proceed. It is basically about what
> resources you can expect in the foreseeable future and what compromises you
> are willing to make with regard to sleep and sanity.
>

Amen to all of those points. We made similar-but-not-the-same mistakes on an EC
cluster here. You are going to produce more tears than I/O if you build in the
design mistakes mentioned above.
We could add:

5) Never buy SMR drives; pretend they don't even exist. If a similar
technology appears tomorrow for cheap SSD/NVMe, skip it.

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-24 Thread Frank Schilder
I have some experience with an EC set-up with 2 shards per host and 
failure-domain = host, and also with some multi-site wishful thinking from 
users. What I learned is the following:

1) Avoid this work-around for too few hosts for an EC rule at all costs. There 
are two types of resiliency in Ceph: one against hardware failures and the other 
against admin failures. Using a non-standard CRUSH set-up to accommodate a lack 
of hosts dramatically reduces resiliency against admin failures. You will have 
downtime due to simple mistakes. You will also need to adjust other defaults, 
like min_size, to be able to do anything on this cluster without downtime, 
sweating every time and praying that nothing goes wrong. Use this only if there 
is a short-term horizon after which it will be over.

2) Do not use EC 2+1. It does not offer anything interesting for production. 
Use 4+2 (or 8+2, 8+3 if you have the hosts). Here you can operate with non-zero 
redundancy while doing maintenance (min_size=5); see the command sketch after 
this list.

3) If you have no prospect of getting at least 7 servers in the long run 
(4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go for 
EC. If this helps in your negotiations, tell everyone that they either give you 
more servers now and get low-cost storage, or have to pay for expensive 
replicated storage forever.

4) Before you start thinking about replicating to a second site, you should 
have a primary site running solid first. I was in exactly the same situation, 
with people expecting wonders while giving me only half the stuff I need. Simply 
do not do it. I wasted a lot of time on impossible requests. With the hardware 
you have, I would ditch the second DC and instead start building up a solid 
first DC, to be mirrored later when people come over with bags of money. You 
have 6 servers. That's a good start for a 4+2 EC pool. You will not have 
fail-over capacity, but at least you don't have to work around too many 
exceptions. The one you should be aware of, though, is this one: 
https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/?highlight=erasure%20code%20pgs#crush-gives-up-too-soon
. If you had 7 servers, you would be out of trouble.
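
To make point 2 concrete, here is a minimal sketch of the commands involved, 
assuming the standard ceph CLI and placeholder names (profile "ec42", pool 
"ec42pool"; the PG counts are only an example, and depending on the release the 
pool default may already be min_size = k+1):

"
# create a 4+2 jerasure profile with host as the failure domain
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host

# create a pool from that profile
ceph osd pool create ec42pool 128 128 erasure ec42

# with min_size=5 and one shard per host, I/O continues with one host down
ceph osd pool set ec42pool min_size 5
"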

This is collected from my experience. I would do things differently now, and 
maybe it helps you decide how to proceed. It is basically about what resources 
you can expect in the foreseeable future and what compromises you are willing 
to make with regard to sleep and sanity.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: ceph-users  on behalf of Salsa 

Sent: 21 October 2019 17:31
To: Martin Verges
Cc: ceph-users
Subject: Re: [ceph-users] Can't create erasure coded pools with k+m greater 
than hosts?

Just to clarify my situation: we have 2 datacenters with 3 hosts each, and 12 
4TB disks per host (2 in a RAID with the OS installed and the remaining 10 used 
for Ceph). Right now I'm trying a single-DC installation and intend to migrate 
to a multi-site setup mirroring DC1 to DC2, so if we lose DC1 we can activate 
DC2 (NOTE: I have no idea how this is set up and have not planned it at all; I 
thought of getting DC1 to work first and setting up the mirroring later).

I don't think I'll be able to change the setup in any way, so my next question 
is: should I go with replica 3, or would an erasure 2+1 setup be OK?

There's a very small chance we get 2 extra hosts for each DC in the near 
future, but we'll probably use up all the available storage space even sooner.

We're trying to use as much space as possible.

Thanks;

--
Salsa

Sent with ProtonMail<https://protonmail.com> Secure Email.

‐‐‐ Original Message ‐‐‐
On Monday, October 21, 2019 2:53 AM, Martin Verges  
wrote:

Just don't do such setups for production. They will be a lot of pain and 
trouble, and cause you problems.

Just take a cheap system, put some of the disks in it, and do a far better 
deployment than something like 4+2 on 3 hosts. Whatever you do with that 
cluster (for example a kernel update, reboot, PSU failure, ...) causes you and 
all attached clients to stop any I/O or even crash completely, which is 
especially bad with VMs on that Ceph cluster.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, 19 Oct 2019 at 01:51, Chris Taylor <ctay...@eyonic.com> wrote:
Full disclosure - I have not created an erasure code pool yet!

I have been wanting to do the same thing that you are attempting and
have these links saved. I believe this is what you are looking for.

This link is for decompiling the CRUSH rules and recompiling:

https://docs.ceph.com/docs/luminous/rados/

Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-21 Thread Salsa
Just to clarify my situation: we have 2 datacenters with 3 hosts each, and 12 
4TB disks per host (2 in a RAID with the OS installed and the remaining 10 used 
for Ceph). Right now I'm trying a single-DC installation and intend to migrate 
to a multi-site setup mirroring DC1 to DC2, so if we lose DC1 we can activate 
DC2 (NOTE: I have no idea how this is set up and have not planned it at all; I 
thought of getting DC1 to work first and setting up the mirroring later).

I don't think I'll be able to change the setup in any way, so my next question 
is: should I go with replica 3, or would an erasure 2+1 setup be OK?

There's a very small chance we get 2 extra hosts for each DC in the near 
future, but we'll probably use up all the available storage space even sooner.

We're trying to use as much space as possible.

Thanks;

--
Salsa

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐ Original Message ‐‐‐
On Monday, October 21, 2019 2:53 AM, Martin Verges  
wrote:

> Just don't do such setups for production. They will be a lot of pain and 
> trouble, and cause you problems.
>
> Just take a cheap system, put some of the disks in it, and do a far better 
> deployment than something like 4+2 on 3 hosts. Whatever you do with that 
> cluster (for example a kernel update, reboot, PSU failure, ...) causes you and 
> all attached clients to stop any I/O or even crash completely, which is 
> especially bad with VMs on that Ceph cluster.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
On Sat, 19 Oct 2019 at 01:51, Chris Taylor wrote:
>
>> Full disclosure - I have not created an erasure code pool yet!
>>
>> I have been wanting to do the same thing that you are attempting and
>> have these links saved. I believe this is what you are looking for.
>>
>> This link is for decompiling the CRUSH rules and recompiling:
>>
>> https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/
>>
>> This link is for creating the EC rules for 4+2 with only 3 hosts:
>>
>> https://ceph.io/planet/erasure-code-on-small-clusters/
>>
>> I hope that helps!
>>
>> Chris
>>
>> On 2019-10-18 2:55 pm, Salsa wrote:
>>> Ok, I'm lost here.
>>>
>>> How am I supposed to write a crush rule?
>>>
>>> So far I managed to run:
>>>
>>> #ceph osd crush rule dump test -o test.txt
>>>
>>> So I can edit the rule. Now I have two problems:
>>>
>>> 1. What are the functions and operations to use here? Is there
>>> documentation anywhere about this?
>>> 2. How may I create a crush rule using this file? 'ceph osd crush rule
>>> create ... -i test.txt' does not work.
>>>
>>> Am I taking the wrong approach here?
>>>
>>>
>>> --
>>> Salsa
>>>
>>> Sent with ProtonMail Secure Email.
>>>
>>> ‐‐‐ Original Message ‐‐‐
>>> On Friday, October 18, 2019 3:56 PM, Paul Emmerich
>>>  wrote:
>>>
 Default failure domain in Ceph is "host" (see ec profile), i.e., you
 need at least k+m hosts (but at least k+m+1 is better for production
 setups).
 You can change that to OSD, but that's not a good idea for a
 production setup for obvious reasons. It's slightly better to write a
 crush rule that explicitly picks two disks on 3 different hosts

 Paul

 

 Paul Emmerich

 Looking for help with your Ceph cluster? Contact us at
 https://croit.io

 croit GmbH
 Freseniusstr. 31h
 81247 München
 www.croit.io
 Tel: +49 89 1896585 90

 On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:

 > I have probably misunderstood how to create erasure coded pools, so I may 
 > be in need of some theory, and I'd appreciate it if you can point me to 
 > documentation that may clarify my doubts.
 > I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
 > I tried to create an erasure code profile like so:
 > "
 >
 > ceph osd erasure-code-profile get ec4x2rs
 >
 > ==
 >
 > crush-device-class=
 > crush-failure-domain=host
 > crush-root=default
 > jerasure-per-chunk-alignment=false
 > k=4
 > m=2
 > plugin=jerasure
 > technique=reed_sol_van
 > w=8
 > "
 > If I create a pool using this profile or any profile where K+M > hosts , 
 > then the pool gets stuck.
 > "
 >
 > ceph -s
 >
 > 
 >
 > cluster:
 > id: eb4a

Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-20 Thread Martin Verges
Just don't do such setups for production. They will be a lot of pain and
trouble, and cause you problems.

Just take a cheap system, put some of the disks in it, and do a far better
deployment than something like 4+2 on 3 hosts. Whatever you do with that
cluster (for example a kernel update, reboot, PSU failure, ...) causes you
and all attached clients to stop any I/O or even crash completely, which is
especially bad with VMs on that Ceph cluster.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, 19 Oct 2019 at 01:51, Chris Taylor wrote:

> Full disclosure - I have not created an erasure code pool yet!
>
> I have been wanting to do the same thing that you are attempting and
> have these links saved. I believe this is what you are looking for.
>
> This link is for decompiling the CRUSH rules and recompiling:
>
> https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/
>
>
> This link is for creating the EC rules for 4+2 with only 3 hosts:
>
> https://ceph.io/planet/erasure-code-on-small-clusters/
>
>
> I hope that helps!
>
>
>
> Chris
>
>
> On 2019-10-18 2:55 pm, Salsa wrote:
> > Ok, I'm lost here.
> >
> > How am I supposed to write a crush rule?
> >
> > So far I managed to run:
> >
> > #ceph osd crush rule dump test -o test.txt
> >
> > So I can edit the rule. Now I have two problems:
> >
> > 1. What are the functions and operations to use here? Is there
> > documentation anywhere about this?
> > 2. How may I create a crush rule using this file? 'ceph osd crush rule
> > create ... -i test.txt' does not work.
> >
> > Am I taking the wrong approach here?
> >
> >
> > --
> > Salsa
> >
> > Sent with ProtonMail Secure Email.
> >
> > ‐‐‐ Original Message ‐‐‐
> > On Friday, October 18, 2019 3:56 PM, Paul Emmerich
> >  wrote:
> >
> >> Default failure domain in Ceph is "host" (see ec profile), i.e., you
> >> need at least k+m hosts (but at least k+m+1 is better for production
> >> setups).
> >> You can change that to OSD, but that's not a good idea for a
> >> production setup for obvious reasons. It's slightly better to write a
> >> crush rule that explicitly picks two disks on 3 different hosts
> >>
> >> Paul
> >>
> >>
> 
> >>
> >> Paul Emmerich
> >>
> >> Looking for help with your Ceph cluster? Contact us at
> >> https://croit.io
> >>
> >> croit GmbH
> >> Freseniusstr. 31h
> >> 81247 München
> >> www.croit.io
> >> Tel: +49 89 1896585 90
> >>
> >> On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:
> >>
> >> > I have probably misunderstood how to create erasure coded pools, so I
> may be in need of some theory, and I'd appreciate it if you can point me to
> documentation that may clarify my doubts.
> >> > I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
> >> > I tried to create an erasure code profile like so:
> >> > "
> >> >
> >> > ceph osd erasure-code-profile get ec4x2rs
> >> >
> >> > ==
> >> >
> >> > crush-device-class=
> >> > crush-failure-domain=host
> >> > crush-root=default
> >> > jerasure-per-chunk-alignment=false
> >> > k=4
> >> > m=2
> >> > plugin=jerasure
> >> > technique=reed_sol_van
> >> > w=8
> >> > "
> >> > If I create a pool using this profile or any profile where K+M >
> hosts , then the pool gets stuck.
> >> > "
> >> >
> >> > ceph -s
> >> >
> >> > 
> >> >
> >> > cluster:
> >> > id: eb4aea44-0c63-4202-b826-e16ea60ed54d
> >> > health: HEALTH_WARN
> >> > Reduced data availability: 16 pgs inactive, 16 pgs incomplete
> >> > 2 pools have too many placement groups
> >> > too few PGs per OSD (4 < min 30)
> >> > services:
> >> > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
> >> > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
> >> > osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
> >> > data:
> >> > pools: 11 pools, 32 pgs
> >> > objects: 0 objects, 0 B
> >> > usage: 32 GiB used, 109 TiB / 109 TiB avail
> >> > pgs: 50.000% pgs not active
> >> > 16 active+clean
> >> > 16 creating+incomplete
> >> >
> >> > ceph osd pool ls
> >> >
> >> > =
> >> >
> >> > test_ec
> >> > test_ec2
> >> > "
> >> > The pool will never leave this "creating+incomplete" state.
> >> > The pools were created like this:
> >> > "
> >> >
> >> > ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> >> >
> >> > 
> >> >
> >> > ceph osd pool create test_ec 16 16 erasure
> >> >

Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-18 Thread Chris Taylor

Full disclosure - I have not created an erasure code pool yet!

I have been wanting to do the same thing that you are attempting and 
have these links saved. I believe this is what you are looking for.


This link is for decompiling the CRUSH rules and recompiling:

https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/
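
As a rough sketch of the round trip that page describes (file names here are 
arbitrary):

"
# export the current CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# edit crushmap.txt (add or adjust rules), then recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
"

Rules end up in the cluster either by re-injecting the whole compiled map as 
above or via the "ceph osd crush rule create-*" commands, which is presumably 
why the earlier attempt with 'ceph osd crush rule create ... -i test.txt' did 
not work.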


This link is for creating the EC rules for 4+2 with only 3 hosts:

https://ceph.io/planet/erasure-code-on-small-clusters/


I hope that helps!



Chris


On 2019-10-18 2:55 pm, Salsa wrote:

Ok, I'm lost here.

How am I supposed to write a crush rule?

So far I managed to run:

#ceph osd crush rule dump test -o test.txt

So I can edit the rule. Now I have two problems:

1. What are the functions and operations to use here? Is there
documentation anywhere about this?
2. How may I create a crush rule using this file? 'ceph osd crush rule
create ... -i test.txt' does not work.

Am I taking the wrong approach here?


--
Salsa

Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, October 18, 2019 3:56 PM, Paul Emmerich
 wrote:


Default failure domain in Ceph is "host" (see ec profile), i.e., you
need at least k+m hosts (but at least k+m+1 is better for production
setups).
You can change that to OSD, but that's not a good idea for a
production setup for obvious reasons. It's slightly better to write a
crush rule that explicitly picks two disks on 3 different hosts

Paul



Paul Emmerich

Looking for help with your Ceph cluster? Contact us at 
https://croit.io


croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:

> I have probably misunderstood how to create erasure coded pools, so I may be 
in need of some theory, and I'd appreciate it if you can point me to 
documentation that may clarify my doubts.
> I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
> I tried to create an erasure code profile like so:
> "
>
> ceph osd erasure-code-profile get ec4x2rs
>
> ==
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
> "
> If I create a pool using this profile or any profile where K+M > hosts , then 
the pool gets stuck.
> "
>
> ceph -s
>
> 
>
> cluster:
> id: eb4aea44-0c63-4202-b826-e16ea60ed54d
> health: HEALTH_WARN
> Reduced data availability: 16 pgs inactive, 16 pgs incomplete
> 2 pools have too many placement groups
> too few PGs per OSD (4 < min 30)
> services:
> mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
> mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
> osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
> data:
> pools: 11 pools, 32 pgs
> objects: 0 objects, 0 B
> usage: 32 GiB used, 109 TiB / 109 TiB avail
> pgs: 50.000% pgs not active
> 16 active+clean
> 16 creating+incomplete
>
> ceph osd pool ls
>
> =
>
> test_ec
> test_ec2
> "
> The pool will never leave this "creating+incomplete" state.
> The pools were created like this:
> "
>
> ceph osd pool create test_ec2 16 16 erasure ec4x2rs
>
> 
>
> ceph osd pool create test_ec 16 16 erasure
>
> ===
>
> "
> The default profile pool is created correctly.
> My profiles are like this:
> "
>
> ceph osd erasure-code-profile get default
>
> ==
>
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
>
> ceph osd erasure-code-profile get ec4x2rs
>
> ==
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
> "
> From what I've read, it seems to be possible to create erasure coded pools with 
K+M higher than the number of hosts. Is this not so?
> What am I doing wrong? Do I have to create any special crush map rule?
> --
> Salsa
> Sent with ProtonMail Secure Email.
>





Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-18 Thread Salsa
Ok, I'm lost here.

How am I supposed to write a crush rule?

So far I managed to run:

#ceph osd crush rule dump test -o test.txt

So I can edit the rule. Now I have two problems:

1. What are the functions and operations to use here? Is there documentation 
anywhere about this?
2. How may I create a crush rule using this file? 'ceph osd crush rule create 
... -i test.txt' does not work.

Am I taking the wrong approach here?


--
Salsa

Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, October 18, 2019 3:56 PM, Paul Emmerich  
wrote:

> Default failure domain in Ceph is "host" (see ec profile), i.e., you
> need at least k+m hosts (but at least k+m+1 is better for production
> setups).
> You can change that to OSD, but that's not a good idea for a
> production setup for obvious reasons. It's slightly better to write a
> crush rule that explicitly picks two disks on 3 different hosts
>
> Paul
>
> 
>
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:
>
> > I have probably misunderstood how to create erasure coded pools, so I may be 
> > in need of some theory, and I'd appreciate it if you can point me to 
> > documentation that may clarify my doubts.
> > I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
> > I tried to create an erasure code profile like so:
> > "
> >
> > ceph osd erasure-code-profile get ec4x2rs
> >
> > ==
> >
> > crush-device-class=
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=4
> > m=2
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> > "
> > If I create a pool using this profile or any profile where K+M > hosts , 
> > then the pool gets stuck.
> > "
> >
> > ceph -s
> >
> > 
> >
> > cluster:
> > id: eb4aea44-0c63-4202-b826-e16ea60ed54d
> > health: HEALTH_WARN
> > Reduced data availability: 16 pgs inactive, 16 pgs incomplete
> > 2 pools have too many placement groups
> > too few PGs per OSD (4 < min 30)
> > services:
> > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
> > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
> > osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
> > data:
> > pools: 11 pools, 32 pgs
> > objects: 0 objects, 0 B
> > usage: 32 GiB used, 109 TiB / 109 TiB avail
> > pgs: 50.000% pgs not active
> > 16 active+clean
> > 16 creating+incomplete
> >
> > ceph osd pool ls
> >
> > =
> >
> > test_ec
> > test_ec2
> > "
> > The pool will never leave this "creating+incomplete" state.
> > The pools were created like this:
> > "
> >
> > ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> >
> > 
> >
> > ceph osd pool create test_ec 16 16 erasure
> >
> > ===
> >
> > "
> > The default profile pool is created correctly.
> > My profiles are like this:
> > "
> >
> > ceph osd erasure-code-profile get default
> >
> > ==
> >
> > k=2
> > m=1
> > plugin=jerasure
> > technique=reed_sol_van
> >
> > ceph osd erasure-code-profile get ec4x2rs
> >
> > ==
> >
> > crush-device-class=
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=4
> > m=2
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> > "
> > From what I've read, it seems to be possible to create erasure coded pools 
> > with K+M higher than the number of hosts. Is this not so?
> > What am I doing wrong? Do I have to create any special crush map rule?
> > --
> > Salsa
> > Sent with ProtonMail Secure Email.
> >




Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-18 Thread Paul Emmerich
The default failure domain in Ceph is "host" (see the EC profile), i.e., you
need at least k+m hosts (but at least k+m+1 is better for production
setups).
You can change that to OSD, but that's not a good idea for a
production setup for obvious reasons. It's slightly better to write a
crush rule that explicitly picks two disks on each of 3 different hosts.
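
A hedged sketch of what such a rule could look like in decompiled CRUSH map
syntax (rule name and id are placeholders), in the spirit of the small-cluster
article linked elsewhere in this thread: pick 3 hosts, then 2 OSDs on each, so
a 4+2 pool spreads its 6 shards as 2 per host:

"
rule ec42_3hosts {
        id 42                            # placeholder, must be an unused rule id
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100        # helps when CRUSH "gives up too soon"
        step take default
        step choose indep 3 type host    # pick 3 distinct hosts
        step chooseleaf indep 2 type osd # then 2 OSDs on each of those hosts
        step emit
}
"

After injecting the edited map, the pool can be pointed at the rule with
'ceph osd pool set <pool> crush_rule ec42_3hosts'. Keep in mind that with 2
shards per host, losing a single host takes out 2 shards of every PG at once,
which is exactly the reduced resiliency discussed elsewhere in this thread.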


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Oct 18, 2019 at 8:45 PM Salsa  wrote:
>
> I have probably misunderstood how to create erasure coded pools, so I may be 
> in need of some theory, and I'd appreciate it if you can point me to 
> documentation that may clarify my doubts.
>
> I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
>
> I tried to create an erasure code profile like so:
>
> "
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
> "
>
> If I create a pool using this profile or any profile where K+M > hosts , then 
> the pool gets stuck.
>
> "
> # ceph -s
>   cluster:
> id: eb4aea44-0c63-4202-b826-e16ea60ed54d
> health: HEALTH_WARN
> Reduced data availability: 16 pgs inactive, 16 pgs incomplete
> 2 pools have too many placement groups
> too few PGs per OSD (4 < min 30)
>
>   services:
> mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
> mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
> osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
>
>   data:
> pools:   11 pools, 32 pgs
> objects: 0 objects, 0 B
> usage:   32 GiB used, 109 TiB / 109 TiB avail
> pgs: 50.000% pgs not active
>  16 active+clean
>  16 creating+incomplete
>
> # ceph osd pool ls
> test_ec
> test_ec2
> "
> The pool will never leave this "creating+incomplete" state.
>
> The pools were created like this:
> "
> # ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> # ceph osd pool create test_ec 16 16 erasure
> "
> The default profile pool is created correctly.
>
> My profiles are like this:
> "
> # ceph osd erasure-code-profile get default
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
>
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
> "
>
> From what I've read, it seems to be possible to create erasure coded pools with 
> K+M higher than the number of hosts. Is this not so?
> What am I doing wrong? Do I have to create any special crush map rule?
>
> --
> Salsa
>
> Sent with ProtonMail Secure Email.
>


[ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-18 Thread Salsa
I have probably misunderstood how to create erasure coded pools, so I may be in 
need of some theory, and I'd appreciate it if you can point me to documentation 
that may clarify my doubts.

I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).

I tried to create an erasure code profile like so:

"
# ceph osd erasure-code-profile get ec4x2rs
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
"

If I create a pool using this profile, or any profile where K+M > hosts, then 
the pool gets stuck.

"
# ceph -s
  cluster:
id: eb4aea44-0c63-4202-b826-e16ea60ed54d
health: HEALTH_WARN
Reduced data availability: 16 pgs inactive, 16 pgs incomplete
2 pools have too many placement groups
too few PGs per OSD (4 < min 30)

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
osd: 30 osds: 30 up (since 2w), 30 in (since 2w)

  data:
pools:   11 pools, 32 pgs
objects: 0 objects, 0 B
usage:   32 GiB used, 109 TiB / 109 TiB avail
pgs: 50.000% pgs not active
 16 active+clean
 16 creating+incomplete

# ceph osd pool ls
test_ec
test_ec2
"
The pool will never leave this "creating+incomplete" state.

The pools were created like this:
"
# ceph osd pool create test_ec2 16 16 erasure ec4x2rs
# ceph osd pool create test_ec 16 16 erasure
"
The default profile pool is created correctly.

My profiles are like this:
"
# ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van

# ceph osd erasure-code-profile get ec4x2rs
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
"

From what I've read, it seems to be possible to create erasure coded pools with 
K+M higher than the number of hosts. Is this not so?
What am I doing wrong? Do I have to create any special crush map rule?

--
Salsa

Sent with [ProtonMail](https://protonmail.com) Secure Email.