Well, you should use M > 1: the more parity chunks you have, the lower
the risk and the better the performance.
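
For example, a minimal sketch of a profile with two parity chunks (the
profile and pool names are placeholders, and with
crush-failure-domain=host a k=2/m=2 profile needs at least 4 hosts, so
on a 3-node cluster you would have to drop the failure domain to osd):

    # EC profile with 2 data chunks and 2 parity chunks
    ceph osd erasure-code-profile set ec-k2m2 k=2 m=2 crush-failure-domain=host
    # pool using that profile (64 PGs just as an example)
    ceph osd pool create ecpool 64 64 erasure ec-k2m2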

You don't read twice as much data, you read it from different sources.
Furthermore, you may even read less data and have to rebuild the missing
part, because on erasure-coded pools the data is not replicated.


On the other hand, the configuration is not as bad as you think, it's
just different.

3-node cluster

Replicated pool, size = 2

    - You can take 1 failure, re-balance, and then take another failure
(at most 2, as long as they are separate in time).

    - You use 2x the data in raw space.

    - You have to write 2x the data: the full data on one node and the
full data on a second one.

Erasure code pool (k=2, m=1)

    - You can only lose 1 node.

    - You use less raw space (see the quick arithmetic after this list).

    - As you don't write 2x the data, writes are also faster: you write
half of the data on one node, half on another, and the parity on a
third, so the write work is much more distributed.

    - Reads are slower because you need all the data chunks.
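
The space numbers behind those two lists (just the standard overhead
formula, using the k=2/m=1 profile from this test):

    replicated, size=2 : raw used = 2 * data             = 2.0x
    erasure, k=2 m=1   : raw used = (k+m)/k * data = 3/2  = 1.5x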


In both configurations, if you hit corrupted data during recovery you
lose that data, so that's not really a point of comparison.

A replicated pool can handle much more read-intensive workloads, while
erasure pools are designed for big writes with really few reads.
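
If you want to measure that on your own pools, a rough sketch with
rados bench (the pool name is a placeholder; --no-cleanup keeps the
written objects around so they can be read back):

    # write for 100 seconds and keep the objects for the read tests
    rados bench -p testpool 100 write --no-cleanup
    # sequential and random read tests
    rados bench -p testpool 60 seq
    rados bench -p testpool 60 rand
    # remove the benchmark objects afterwards
    rados -p testpool cleanup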


I have checked myself that both configurations can work with a 3-node
cluster, so it's not a matter of a better and a worse configuration, it
really depends on your workload. And the best thing :) you can have
both on the same OSDs!
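
Just to illustrate that last point, a minimal sketch (pool names,
profile name and PG counts are placeholders; both pools land on the
same 3 OSDs because they share the default CRUSH root):

    # replicated pool with 2 copies
    ceph osd pool create rep-pool 64 64 replicated
    ceph osd pool set rep-pool size 2
    # erasure-coded pool with the k=2/m=1 profile from this thread
    ceph osd erasure-code-profile set ec-k2m1 k=2 m=1 crush-failure-domain=host
    ceph osd pool create ec-pool 64 64 erasure ec-k2m1
    # as noted later in the thread, lowering min_size to 2 lets the EC
    # pool keep serving IO with one OSD down
    ceph osd pool set ec-pool min_size 2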


On 24/10/2017 at 12:37, Eino Tuominen wrote:
>
> Hello,
>
>
> Correct me if I'm wrong, but isn't your configuration just twice as
> bad as running with replication size=2? With replication size=2, when
> you lose a disk you lose data if there is even one defective block found
> while ceph is reconstructing the pgs that had a replica on the failed
> disk. Now, with your setup you have to be able to read twice as much
> data correctly in order to reconstruct the pgs. When using EC I think
> that you have to use m>1 in production.
>
>
> -- 
>
>   Eino Tuominen
>
>
> ------------------------------------------------------------------------
> *From:* ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> Jorge Pinilla López <jorp...@unizar.es>
> *Sent:* Tuesday, October 24, 2017 11:24
> *To:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Erasure Pool OSD fail
>  
>
> Okay, I think I can answer myself: the pool is created with a default
> min_size of 3, so when one of the OSDs goes down the pool doesn't
> perform any IO; manually changing the pool min_size to 2 worked great.
>
>
> On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
>> I am testing erasure code pools and doing a rados write test to try
>> fault tolerance.
>> I have 3 nodes with 1 OSD each, K=2 M=1.
>>
>> While performing the write (rados bench -p replicate 100 write), I
>> stop one of the OSD daemons (for example osd.0), simulating a node
>> failure, and then the whole write stops and I can't write any data anymore.
>>
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>>     1      16        28        12   46.8121        48     1.01548   0.616034
>>     2      16        40        24   47.3907        48     1.04219   0.923728
>>     3      16        52        36   47.5889        48    0.593145     1.0038
>>     4      16        68        52   51.6633        64     1.39638    1.08098
>>     5      16        74        58    46.158        24     1.02699    1.10172
>>     6      16        83        67   44.4711        36     3.01542    1.18012
>>     7      16        95        79   44.9722        48    0.776493    1.24003
>>     8      16        95        79   39.3681         0           -    1.24003
>>     9      16        95        79   35.0061         0           -    1.24003
>>    10      16        95        79   31.5144         0           -    1.24003
>>    11      16        95        79   28.6561         0           -    1.24003
>>    12      16        95        79   26.2732         0           -    1.24003
>>
>> It's pretty clear where the OSD failed.
>>
>> On the other hand, using a replicated pool, the client (rados test)
>> doesn't even notice the OSD failure, which is awesome.
>>
>> Is this normal behaviour on EC pools?
>

-- 
------------------------------------------------------------------------
*Jorge Pinilla López*
jorp...@unizar.es
Computer engineering student
Intern at the systems area (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A
<http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
------------------------------------------------------------------------
