Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
On 2019-10-24 09:46, Janne Johansson wrote: (Slightly abbreviated) On Thu, 24 Oct 2019 at 09:24, Frank Schilder <fr...@dtu.dk> wrote: What I learned is the following: 1) Avoid the work-around for having too few hosts for the EC rule at all costs. 2) Do not use EC 2+1. It does not offer anything interesting for production. Use 4+2 (or 8+2, 8+3 if you have the hosts). 3) If you have no perspective of getting at least 7 servers in the long run (4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go for EC. 4) Before you start thinking about replicating to a second site, you should have a primary site running solid first. This is collected from my experience. I would do things differently now, and maybe it helps you with deciding how to proceed. It's basically about what resources you can expect in the foreseeable future and what compromises you are willing to make with regards to sleep and sanity. Amen to all of those points. We made similar (but not identical) mistakes on an EC cluster here. You are going to produce more tears than I/O if you commit the design mistakes mentioned above. We could add: 5) Never buy SMR drives, pretend they don't even exist. If a similar technology appears tomorrow for cheap SSD/NVMe, skip it.

Amen from my side, too. Luckily, we only made a small fraction of these mistakes (running 4+2 on 6 servers and wondering about funny effects when taking one server offline, while we were still testing the setup, before we finally decided to ask for a 7th server), but this can in part be extrapolated. Concerning SMR, I learnt that SMR awareness is on Ceph's roadmap (for host-managed SMR drives). Once that is available, host-managed SMR drives should be a well-working and cheap solution, especially for backup / WORM workloads. But as of now, even disk vendors will tell you to avoid SMR for datacenter setups (unless you have a storage system aware of it and host-managed drives).

Cheers, Oliver -- May the most significant bit of your life be positive.
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
(Slightly abbreviated) On Thu, 24 Oct 2019 at 09:24, Frank Schilder wrote: > What I learned is the following: > > 1) Avoid the work-around for having too few hosts for the EC rule at all costs. > > 2) Do not use EC 2+1. It does not offer anything interesting for production. Use 4+2 (or 8+2, 8+3 if you have the hosts). > > 3) If you have no perspective of getting at least 7 servers in the long run (4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go for EC. > > 4) Before you start thinking about replicating to a second site, you should have a primary site running solid first. > > This is collected from my experience. I would do things differently now, and maybe it helps you with deciding how to proceed. It's basically about what resources you can expect in the foreseeable future and what compromises you are willing to make with regards to sleep and sanity. > Amen to all of those points. We made similar (but not identical) mistakes on an EC cluster here. You are going to produce more tears than I/O if you commit the design mistakes mentioned above. We could add: 5) Never buy SMR drives, pretend they don't even exist. If a similar technology appears tomorrow for cheap SSD/NVMe, skip it. -- May the most significant bit of your life be positive.
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
I have some experience with an EC set-up with 2 shards per host, failure domain host, and also some multi-site wishful thinking of users. What I learned is the following:

1) Avoid the work-around for having too few hosts for the EC rule at all costs. There are two types of resiliency in Ceph. One is against hardware failures and the other is against admin failures. Using a non-standard crush set-up to accommodate a lack of hosts dramatically reduces resiliency against admin failures. You will have downtime due to simple mistakes. You will also need to adjust other defaults, like min_size, to be able to do anything on this cluster without downtime, sweating every time and praying that nothing goes wrong. Use this only if there is a short-term horizon after which it will be over.

2) Do not use EC 2+1. It does not offer anything interesting for production. Use 4+2 (or 8+2, 8+3 if you have the hosts). Here you can operate with non-zero redundancy while doing maintenance (min_size=5).

3) If you have no perspective of getting at least 7 servers in the long run (4+2=6 for the EC profile, +1 for automatic fail-over rebuild), do not go for EC. If this helps in your negotiations, tell everyone that they either give you more servers now and get low-cost storage, or have to pay for expensive replicated storage forever.

4) Before you start thinking about replicating to a second site, you should have a primary site running solid first. I was in exactly the same situation, with people expecting wonders while giving me only half the stuff I need. Simply do not do it. I wasted a lot of time on impossible requests. With the hardware you have, I would ditch the second DC and rather start building up a solid first DC, to be mirrored later when people move over bags of money.

You have 6 servers. That's a good start for a 4+2 EC pool. You will not have fail-over capacity, but at least you don't have to work around too many exceptions. The one you should be aware of, though, is this one: https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/?highlight=erasure%20code%20pgs#crush-gives-up-too-soon . If you had 7 servers, you would be out of trouble.

This is collected from my experience. I would do things differently now, and maybe it helps you with deciding how to proceed. It's basically about what resources you can expect in the foreseeable future and what compromises you are willing to make with regards to sleep and sanity.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: ceph-users on behalf of Salsa
Sent: 21 October 2019 17:31
To: Martin Verges
Cc: ceph-users
Subject: Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

Just to clarify my situation, we have 2 datacenters with 3 hosts each, 12 4 TB disks in each host (2 are in a RAID with the OS installed and the remaining 10 are used for Ceph). Right now I'm trying a single-DC installation and intended to migrate to multi-site, mirroring DC1 to DC2, so if we lose DC1 we can activate DC2 (NOTE: I have no idea how this is set up and have not planned it at all; I thought of getting DC1 to work first and setting up the mirroring later). I don't think I'll be able to change the setup in any way, so my next question is: Should I go with a replica 3 or would an erasure 2,1 be ok? There's a very small chance we get 2 extra hosts for each DC in the near future, but we'll probably use all the available storage space in the nearer future. We're trying to use as much space as possible. Thanks; -- Salsa Sent with ProtonMail (https://protonmail.com) Secure Email.
‐‐‐ Original Message ‐‐‐ On Monday, October 21, 2019 2:53 AM, Martin Verges wrote: Just don't do such setups for production. It will be a lot of pain and trouble, and cause you problems. Just take a cheap system, put some of the disks in it and do a way, way better deployment than something like 4+2 on 3 hosts. Whatever you do with that cluster (for example a kernel update, reboot, PSU failure, ...) causes you and all attached clients, especially bad with VMs on that Ceph cluster, to stop any IO or even crash completely. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https://croit.io YouTube: https://goo.gl/PGE1Bx On Sat, 19 Oct 2019 at 01:51, Chris Taylor <ctay...@eyonic.com> wrote: Full disclosure - I have not created an erasure code pool yet! I have been wanting to do the same thing that you are attempting and have these links saved. I believe this is what you are looking for. This link is for decompiling the CRUSH rules and recompiling: https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/
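To put Frank's 4+2 advice in concrete terms, a minimal sketch of the commands involved could look like the following. The profile name, pool name and PG count are made up for illustration; min_size=5 (k+1) is the value Frank mentions for keeping some redundancy during maintenance. Verify against your own cluster and Ceph version before using it.

"
# EC profile with 4 data + 2 coding chunks and host as the failure domain
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host

# pool using that profile (the PG count is only an example)
ceph osd pool create ec42pool 128 128 erasure ec42

# keep serving I/O with one host down during maintenance (k+1 = 5)
ceph osd pool set ec42pool min_size 5
"

If CRUSH "gives up too soon" on a small cluster, the troubleshooting page Frank links describes raising the number of placement attempts (set_choose_tries) in the pool's crush rule.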
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Just to clarify my situation, we have 2 datacenters with 3 hosts each, 12 4 TB disks in each host (2 are in a RAID with the OS installed and the remaining 10 are used for Ceph). Right now I'm trying a single-DC installation and intended to migrate to multi-site, mirroring DC1 to DC2, so if we lose DC1 we can activate DC2 (NOTE: I have no idea how this is set up and have not planned it at all; I thought of getting DC1 to work first and setting up the mirroring later). I don't think I'll be able to change the setup in any way, so my next question is: Should I go with a replica 3 or would an erasure 2,1 be ok? There's a very small chance we get 2 extra hosts for each DC in the near future, but we'll probably use all the available storage space in the nearer future. We're trying to use as much space as possible. Thanks; -- Salsa Sent with ProtonMail (https://protonmail.com) Secure Email. ‐‐‐ Original Message ‐‐‐ On Monday, October 21, 2019 2:53 AM, Martin Verges wrote: > Just don't do such setups for production. It will be a lot of pain and trouble, > and cause you problems. > > Just take a cheap system, put some of the disks in it and do a way, way better > deployment than something like 4+2 on 3 hosts. Whatever you do with that > cluster (for example a kernel update, reboot, PSU failure, ...) causes you and all > attached clients, especially bad with VMs on that Ceph cluster, to stop any > IO or even crash completely. > > -- > Martin Verges > Managing director > > Mobile: +49 174 9335695 > E-Mail: martin.ver...@croit.io > Chat: https://t.me/MartinVerges > > croit GmbH, Freseniusstr. 31h, 81247 Munich > CEO: Martin Verges - VAT-ID: DE310638492 > Com. register: Amtsgericht Munich HRB 231263 > > Web: https://croit.io > YouTube: https://goo.gl/PGE1Bx > > On Sat, 19 Oct 2019 at 01:51, Chris Taylor wrote: > >> Full disclosure - I have not created an erasure code pool yet! >> >> I have been wanting to do the same thing that you are attempting and >> have these links saved. I believe this is what you are looking for. >> >> This link is for decompiling the CRUSH rules and recompiling: >> >> https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/ >> >> This link is for creating the EC rules for 4+2 with only 3 hosts: >> >> https://ceph.io/planet/erasure-code-on-small-clusters/ >> >> I hope that helps! >> >> Chris >> >> On 2019-10-18 2:55 pm, Salsa wrote: >>> Ok, I'm lost here. >>> >>> How am I supposed to write a crush rule? >>> >>> So far I managed to run: >>> >>> # ceph osd crush rule dump test -o test.txt >>> >>> So I can edit the rule. Now I have two problems: >>> >>> 1. What are the functions and operations to use here? Is there >>> documentation anywhere about this? >>> 2. How may I create a crush rule using this file? 'ceph osd crush rule >>> create ... -i test.txt' does not work. >>> >>> Am I taking the wrong approach here? >>> >>> >>> -- >>> Salsa >>> >>> Sent with ProtonMail Secure Email. >>> >>> ‐‐‐ Original Message ‐‐‐ >>> On Friday, October 18, 2019 3:56 PM, Paul Emmerich >>> wrote: >>> Default failure domain in Ceph is "host" (see EC profile), i.e., you need at least k+m hosts (but at least k+m+1 is better for production setups). You can change that to OSD, but that's not a good idea for a production setup for obvious reasons. It's slightly better to write a crush rule that explicitly picks two disks on each of 3 different hosts. Paul Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr.
31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote: > I have probably misunderstood how to create erasure coded pools, so I may > be in need of some theory and would appreciate it if you can point me to > documentation that may clarify my doubts. > I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). > I tried to create an erasure code profile like so: > " > > ceph osd erasure-code-profile get ec4x2rs > > == > > crush-device-class= > crush-failure-domain=host > crush-root=default > jerasure-per-chunk-alignment=false > k=4 > m=2 > plugin=jerasure > technique=reed_sol_van > w=8 > " > If I create a pool using this profile or any profile where K+M > hosts, > then the pool gets stuck. > " > > ceph -s > > > > cluster: > id: eb4a
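Since the question above is replica 3 versus EC 2,1 while using as much space as possible, a rough back-of-the-envelope comparison on the ~109 TiB of raw capacity shown in the ceph -s output elsewhere in this thread may help (ignoring full-ratio headroom and metadata overhead):

"
replica 3: usable = raw * 1/3 ≈ 109 TiB / 3   ≈ 36 TiB
EC 2+1:    usable = raw * 2/3 ≈ 109 TiB * 2/3 ≈ 73 TiB
EC 4+2:    usable = raw * 4/6 ≈ 109 TiB * 2/3 ≈ 73 TiB
"

Note that 2+1 and 4+2 give the same space efficiency; 4+2 additionally survives two simultaneous failures, which is one reason Frank and others in this thread recommend it over 2+1.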
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Just don't do such setups for production. It will be a lot of pain and trouble, and cause you problems. Just take a cheap system, put some of the disks in it and do a way, way better deployment than something like 4+2 on 3 hosts. Whatever you do with that cluster (for example a kernel update, reboot, PSU failure, ...) causes you and all attached clients, especially bad with VMs on that Ceph cluster, to stop any IO or even crash completely. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https://croit.io YouTube: https://goo.gl/PGE1Bx On Sat, 19 Oct 2019 at 01:51, Chris Taylor wrote: > Full disclosure - I have not created an erasure code pool yet! > > I have been wanting to do the same thing that you are attempting and > have these links saved. I believe this is what you are looking for. > > This link is for decompiling the CRUSH rules and recompiling: > > https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/ > > > This link is for creating the EC rules for 4+2 with only 3 hosts: > > https://ceph.io/planet/erasure-code-on-small-clusters/ > > > I hope that helps! > > > > Chris > > > On 2019-10-18 2:55 pm, Salsa wrote: > > Ok, I'm lost here. > > > > How am I supposed to write a crush rule? > > > > So far I managed to run: > > > > # ceph osd crush rule dump test -o test.txt > > > > So I can edit the rule. Now I have two problems: > > > > 1. What are the functions and operations to use here? Is there > > documentation anywhere about this? > > 2. How may I create a crush rule using this file? 'ceph osd crush rule > > create ... -i test.txt' does not work. > > > > Am I taking the wrong approach here? > > > > > > -- > > Salsa > > > > Sent with ProtonMail Secure Email. > > > > ‐‐‐ Original Message ‐‐‐ > > On Friday, October 18, 2019 3:56 PM, Paul Emmerich > > wrote: > > > >> Default failure domain in Ceph is "host" (see EC profile), i.e., you > >> need at least k+m hosts (but at least k+m+1 is better for production > >> setups). > >> You can change that to OSD, but that's not a good idea for a > >> production setup for obvious reasons. It's slightly better to write a > >> crush rule that explicitly picks two disks on each of 3 different hosts. > >> > >> Paul > >> > >> > > >> > >> Paul Emmerich > >> > >> Looking for help with your Ceph cluster? Contact us at > >> https://croit.io > >> > >> croit GmbH > >> Freseniusstr. 31h > >> 81247 München > >> www.croit.io > >> Tel: +49 89 1896585 90 > >> > >> On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote: > >> > >> > I have probably misunderstood how to create erasure coded pools, so I > may be in need of some theory and would appreciate it if you can point me to > documentation that may clarify my doubts. > >> > I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). > >> > I tried to create an erasure code profile like so: > >> > " > >> > > >> > ceph osd erasure-code-profile get ec4x2rs > >> > > >> > == > >> > > >> > crush-device-class= > >> > crush-failure-domain=host > >> > crush-root=default > >> > jerasure-per-chunk-alignment=false > >> > k=4 > >> > m=2 > >> > plugin=jerasure > >> > technique=reed_sol_van > >> > w=8 > >> > " > >> > If I create a pool using this profile or any profile where K+M > > hosts, then the pool gets stuck.
> >> > " > >> > > >> > ceph -s > >> > > >> > > >> > > >> > cluster: > >> > id: eb4aea44-0c63-4202-b826-e16ea60ed54d > >> > health: HEALTH_WARN > >> > Reduced data availability: 16 pgs inactive, 16 pgs incomplete > >> > 2 pools have too many placement groups > >> > too few PGs per OSD (4 < min 30) > >> > services: > >> > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d) > >> > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02 > >> > osd: 30 osds: 30 up (since 2w), 30 in (since 2w) > >> > data: > >> > pools: 11 pools, 32 pgs > >> > objects: 0 objects, 0 B > >> > usage: 32 GiB used, 109 TiB / 109 TiB avail > >> > pgs: 50.000% pgs not active > >> > 16 active+clean > >> > 16 creating+incomplete > >> > > >> > ceph osd pool ls > >> > > >> > = > >> > > >> > test_ec > >> > test_ec2 > >> > " > >> > The pool will never leave this "creating+incomplete" state. > >> > The pools were created like this: > >> > " > >> > > >> > ceph osd pool create test_ec2 16 16 erasure ec4x2rs > >> > > >> > > >> > > >> > ceph osd pool create test_ec 16 16 erasure > >> >
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Full disclosure - I have not created an erasure code pool yet! I have been wanting to do the same thing that you are attempting and have these links saved. I believe this is what you are looking for. This link is for decompiling the CRUSH rules and recompiling: https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/ This link is for creating the EC rules for 4+2 with only 3 hosts: https://ceph.io/planet/erasure-code-on-small-clusters/ I hope that helps! Chris On 2019-10-18 2:55 pm, Salsa wrote: Ok, I'm lost here. How am I supposed to write a crush rule? So far I managed to run: # ceph osd crush rule dump test -o test.txt So I can edit the rule. Now I have two problems: 1. What are the functions and operations to use here? Is there documentation anywhere about this? 2. How may I create a crush rule using this file? 'ceph osd crush rule create ... -i test.txt' does not work. Am I taking the wrong approach here? -- Salsa Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Friday, October 18, 2019 3:56 PM, Paul Emmerich wrote: Default failure domain in Ceph is "host" (see EC profile), i.e., you need at least k+m hosts (but at least k+m+1 is better for production setups). You can change that to OSD, but that's not a good idea for a production setup for obvious reasons. It's slightly better to write a crush rule that explicitly picks two disks on each of 3 different hosts. Paul Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote: > I have probably misunderstood how to create erasure coded pools, so I may be in need of some theory and would appreciate it if you can point me to documentation that may clarify my doubts. > I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). > I tried to create an erasure code profile like so: > " > > ceph osd erasure-code-profile get ec4x2rs > > == > > crush-device-class= > crush-failure-domain=host > crush-root=default > jerasure-per-chunk-alignment=false > k=4 > m=2 > plugin=jerasure > technique=reed_sol_van > w=8 > " > If I create a pool using this profile or any profile where K+M > hosts, then the pool gets stuck. > " > > ceph -s > > > > cluster: > id: eb4aea44-0c63-4202-b826-e16ea60ed54d > health: HEALTH_WARN > Reduced data availability: 16 pgs inactive, 16 pgs incomplete > 2 pools have too many placement groups > too few PGs per OSD (4 < min 30) > services: > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d) > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02 > osd: 30 osds: 30 up (since 2w), 30 in (since 2w) > data: > pools: 11 pools, 32 pgs > objects: 0 objects, 0 B > usage: 32 GiB used, 109 TiB / 109 TiB avail > pgs: 50.000% pgs not active > 16 active+clean > 16 creating+incomplete > > ceph osd pool ls > > = > > test_ec > test_ec2 > " > The pool will never leave this "creating+incomplete" state. > The pools were created like this: > " > > ceph osd pool create test_ec2 16 16 erasure ec4x2rs > > > > ceph osd pool create test_ec 16 16 erasure > > === > > " > The default profile pool is created correctly.
> My profiles are like this: > " > > ceph osd erasure-code-profile get default > > == > > k=2 > m=1 > plugin=jerasure > technique=reed_sol_van > > ceph osd erasure-code-profile get ec4x2rs > > == > > crush-device-class= > crush-failure-domain=host > crush-root=default > jerasure-per-chunk-alignment=false > k=4 > m=2 > plugin=jerasure > technique=reed_sol_van > w=8 > " > From what I've read it seems to be possible to create erasure code pools with K+M higher than the number of hosts. Is this not so? > What am I doing wrong? Do I have to create any special crush map rule? > -- > Salsa > Sent with ProtonMail Secure Email.
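For reference, the CRUSH edit cycle described in Chris's first link boils down to roughly the following sketch. The file names are arbitrary and the rule id in the test command is only an example; try this on a test cluster before injecting a new map into production.

"
# export the current CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# edit crushmap.txt (add or change rules), then recompile
crushtool -c crushmap.txt -o crushmap-new.bin

# sanity check: simulate placements for rule id 1 with 6 shards
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 6 --show-mappings

# inject the new map into the cluster
ceph osd setcrushmap -i crushmap-new.bin
"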
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Ok, I'm lost here. How am I supposed to write a crush rule? So far I managed to run: # ceph osd crush rule dump test -o test.txt So I can edit the rule. Now I have two problems: 1. What are the functions and operations to use here? Is there documentation anywhere about this? 2. How may I create a crush rule using this file? 'ceph osd crush rule create ... -i test.txt' does not work. Am I taking the wrong approach here? -- Salsa Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Friday, October 18, 2019 3:56 PM, Paul Emmerich wrote: > Default failure domain in Ceph is "host" (see EC profile), i.e., you > need at least k+m hosts (but at least k+m+1 is better for production > setups). > You can change that to OSD, but that's not a good idea for a > production setup for obvious reasons. It's slightly better to write a > crush rule that explicitly picks two disks on each of 3 different hosts. > > Paul > > > > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote: > > > I have probably misunderstood how to create erasure coded pools, so I may be > > in need of some theory and would appreciate it if you can point me to documentation > > that may clarify my doubts. > > I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). > > I tried to create an erasure code profile like so: > > " > > > > ceph osd erasure-code-profile get ec4x2rs > > > > == > > > > crush-device-class= > > crush-failure-domain=host > > crush-root=default > > jerasure-per-chunk-alignment=false > > k=4 > > m=2 > > plugin=jerasure > > technique=reed_sol_van > > w=8 > > " > > If I create a pool using this profile or any profile where K+M > hosts, > > then the pool gets stuck. > > " > > > > ceph -s > > > > > > > > cluster: > > id: eb4aea44-0c63-4202-b826-e16ea60ed54d > > health: HEALTH_WARN > > Reduced data availability: 16 pgs inactive, 16 pgs incomplete > > 2 pools have too many placement groups > > too few PGs per OSD (4 < min 30) > > services: > > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d) > > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02 > > osd: 30 osds: 30 up (since 2w), 30 in (since 2w) > > data: > > pools: 11 pools, 32 pgs > > objects: 0 objects, 0 B > > usage: 32 GiB used, 109 TiB / 109 TiB avail > > pgs: 50.000% pgs not active > > 16 active+clean > > 16 creating+incomplete > > > > ceph osd pool ls > > > > = > > > > test_ec > > test_ec2 > > " > > The pool will never leave this "creating+incomplete" state. > > The pools were created like this: > > " > > > > ceph osd pool create test_ec2 16 16 erasure ec4x2rs > > > > > > > > ceph osd pool create test_ec 16 16 erasure > > > > === > > > > " > > The default profile pool is created correctly. > > My profiles are like this: > > " > > > > ceph osd erasure-code-profile get default > > > > == > > > > k=2 > > m=1 > > plugin=jerasure > > technique=reed_sol_van > > > > ceph osd erasure-code-profile get ec4x2rs > > > > == > > > > crush-device-class= > > crush-failure-domain=host > > crush-root=default > > jerasure-per-chunk-alignment=false > > k=4 > > m=2 > > plugin=jerasure > > technique=reed_sol_van > > w=8 > > " > > From what I've read it seems to be possible to create erasure code pools > > with K+M higher than the number of hosts. Is this not so? > > What am I doing wrong? Do I have to create any special crush map rule? > > -- > > Salsa > > Sent with ProtonMail Secure Email.
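On the two questions above: the rule dump is JSON output, not an input format, and there is no 'ceph osd crush rule create ... -i' command; rules are either generated from an EC profile or added by editing the decompiled CRUSH map (as in the crushtool sketch above) and then assigned to the pool. A hedged sketch with made-up names, assuming an EC profile called ec42 already exists:

"
# generate an erasure rule directly from an EC profile
ceph osd crush rule create-erasure ec42_rule ec42

# list and inspect rules
ceph osd crush rule ls
ceph osd crush rule dump ec42_rule

# point an existing pool at a custom rule (e.g. one added via crushtool)
ceph osd pool set test_ec2 crush_rule ec42_rule
"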
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Default failure domain in Ceph is "host" (see EC profile), i.e., you need at least k+m hosts (but at least k+m+1 is better for production setups). You can change that to OSD, but that's not a good idea for a production setup for obvious reasons. It's slightly better to write a crush rule that explicitly picks two disks on each of 3 different hosts. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Oct 18, 2019 at 8:45 PM Salsa wrote: > > I have probably misunderstood how to create erasure coded pools, so I may be > in need of some theory and would appreciate it if you can point me to documentation > that may clarify my doubts. > > I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). > > I tried to create an erasure code profile like so: > > " > # ceph osd erasure-code-profile get ec4x2rs > crush-device-class= > crush-failure-domain=host > crush-root=default > jerasure-per-chunk-alignment=false > k=4 > m=2 > plugin=jerasure > technique=reed_sol_van > w=8 > " > > If I create a pool using this profile or any profile where K+M > hosts, then > the pool gets stuck. > > " > # ceph -s > cluster: > id: eb4aea44-0c63-4202-b826-e16ea60ed54d > health: HEALTH_WARN > Reduced data availability: 16 pgs inactive, 16 pgs incomplete > 2 pools have too many placement groups > too few PGs per OSD (4 < min 30) > > services: > mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d) > mgr: ceph01(active, since 74m), standbys: ceph03, ceph02 > osd: 30 osds: 30 up (since 2w), 30 in (since 2w) > > data: > pools: 11 pools, 32 pgs > objects: 0 objects, 0 B > usage: 32 GiB used, 109 TiB / 109 TiB avail > pgs: 50.000% pgs not active > 16 active+clean > 16 creating+incomplete > > # ceph osd pool ls > test_ec > test_ec2 > " > The pool will never leave this "creating+incomplete" state. > > The pools were created like this: > " > # ceph osd pool create test_ec2 16 16 erasure ec4x2rs > # ceph osd pool create test_ec 16 16 erasure > " > The default profile pool is created correctly. > > My profiles are like this: > " > # ceph osd erasure-code-profile get default > k=2 > m=1 > plugin=jerasure > technique=reed_sol_van > > # ceph osd erasure-code-profile get ec4x2rs > crush-device-class= > crush-failure-domain=host > crush-root=default > jerasure-per-chunk-alignment=false > k=4 > m=2 > plugin=jerasure > technique=reed_sol_van > w=8 > " > > From what I've read it seems to be possible to create erasure code pools with > K+M higher than the number of hosts. Is this not so? > What am I doing wrong? Do I have to create any special crush map rule? > > -- > Salsa > > Sent with ProtonMail Secure Email.
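A rule along the lines Paul describes (3 hosts, 2 OSDs each, for a 4+2 profile) is spelled out in the ceph.io "erasure code on small clusters" article linked elsewhere in this thread; roughly, it looks like the sketch below. The rule name and id are arbitrary, and the usual caveat applies: losing a single host takes out 2 of the 6 shards, leaving no redundancy margin, which is exactly the operational pain Frank describes further up.

"
rule ec42_3hosts {
    id 50
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
"

After compiling and injecting the edited map (see the crushtool sketch above), the pool can be pointed at this rule with ceph osd pool set <pool> crush_rule ec42_3hosts.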
[ceph-users] Can't create erasure coded pools with k+m greater than hosts?
I have probably misunderstood how to create erasure coded pools, so I may be in need of some theory and would appreciate it if you can point me to documentation that may clarify my doubts. I have so far 1 cluster with 3 hosts and 30 OSDs (10 in each host). I tried to create an erasure code profile like so: " # ceph osd erasure-code-profile get ec4x2rs crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=4 m=2 plugin=jerasure technique=reed_sol_van w=8 " If I create a pool using this profile or any profile where K+M > hosts, then the pool gets stuck. " # ceph -s cluster: id: eb4aea44-0c63-4202-b826-e16ea60ed54d health: HEALTH_WARN Reduced data availability: 16 pgs inactive, 16 pgs incomplete 2 pools have too many placement groups too few PGs per OSD (4 < min 30) services: mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d) mgr: ceph01(active, since 74m), standbys: ceph03, ceph02 osd: 30 osds: 30 up (since 2w), 30 in (since 2w) data: pools: 11 pools, 32 pgs objects: 0 objects, 0 B usage: 32 GiB used, 109 TiB / 109 TiB avail pgs: 50.000% pgs not active 16 active+clean 16 creating+incomplete # ceph osd pool ls test_ec test_ec2 " The pool will never leave this "creating+incomplete" state. The pools were created like this: " # ceph osd pool create test_ec2 16 16 erasure ec4x2rs # ceph osd pool create test_ec 16 16 erasure " The default profile pool is created correctly. My profiles are like this: " # ceph osd erasure-code-profile get default k=2 m=1 plugin=jerasure technique=reed_sol_van # ceph osd erasure-code-profile get ec4x2rs crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=4 m=2 plugin=jerasure technique=reed_sol_van w=8 " From what I've read it seems to be possible to create erasure code pools with K+M higher than the number of hosts. Is this not so? What am I doing wrong? Do I have to create any special crush map rule? -- Salsa Sent with ProtonMail (https://protonmail.com) Secure Email.
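With crush-failure-domain=host and only 3 hosts, CRUSH simply cannot find 6 distinct hosts for the 6 chunks of a 4+2 profile, which is why the PGs stay creating+incomplete. A few commands that could help confirm this on the cluster above; pool names are taken from the post, and the 2+1 profile at the end is just an example of a profile that does fit 3 hosts, with the trade-offs discussed elsewhere in the thread.

"
# show which PGs are stuck and why
ceph health detail
ceph pg dump_stuck inactive

# check which profile and rule the stuck pool uses
ceph osd pool get test_ec2 erasure_code_profile
ceph osd pool get test_ec2 crush_rule

# a profile that fits 3 hosts with host as the failure domain, e.g. 2+1
ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
"

The alternative of keeping 4+2 by placing two chunks per host is what the rest of this thread discusses, together with the operational caveats Frank, Martin and Paul point out.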