Reading about min_size behavior, I understand the following: if I set
min_size=1, it applies to both reading and writing, so it is dangerous. Can I
(via the CRUSH map) or the developers (via code changes) separate the min_size
behavior for reading and writing? I even tried to look into the code, but IMHO
the developers can solve this much more easily.

Thinking about this, I would prefer to see the following behaviour (for
example): on write, if the number of available replicas of a PG is < size but
>= min_size, heal this PG to a consistent state before writing.
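
To make the rule concrete, here is a rough sketch in Python. This is not Ceph
code: the function decide(), the Action names and the "replicas" argument are
made up for illustration, and I assume "replicas" means the number of
currently available copies of the PG.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"                  # serve the request immediately
    HEAL_THEN_WRITE = "heal_then_write"  # restore replicas first, then write
    BLOCK = "block"                      # too few replicas, refuse I/O


def decide(op, replicas, size, min_size):
    """Hypothetical policy: min_size gates all I/O, but a write to a
    degraded PG (replicas < size) is delayed until the PG is healed."""
    if replicas < min_size:
        return Action.BLOCK
    if op == "read":
        return Action.PROCEED
    if replicas < size:
        return Action.HEAL_THEN_WRITE
    return Action.PROCEED
```

With size=3 and min_size=1, a read from a PG with one live replica would
proceed, while a write to the same PG would first trigger healing.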

It could consist of the following steps (I don't understand the code structure
yet, but I have some ideas about how to separate the tasks):

1) On repair (healing), process PGs with pending writes first (IMHO this is a
good idea on its own) to minimize client freezing;
2) Minimize the time these write-pending PGs wait before healing (same);
3) Make min_size [optionally] apply only to reads.
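
Steps 1 and 2 amount to a priority order for healing. A minimal sketch in
Python, again not Ceph code: recovery_order() and the tuple layout
(pg_id, pending_writes, missing_replicas) are hypothetical, and breaking ties
by the number of missing replicas is my own assumption.

```python
import heapq


def recovery_order(pgs):
    """Return PG ids in the order they would be healed.

    pgs: iterable of (pg_id, pending_writes, missing_replicas).
    PGs with pending writes come first; within each group, PGs missing
    more replicas come first (assumed tie-break), then by pg_id.
    """
    heap = []
    for pg_id, pending_writes, missing_replicas in pgs:
        # heapq is a min-heap: 0 sorts before 1, and negating the
        # missing count makes larger values come out earlier.
        key = (0 if pending_writes > 0 else 1, -missing_replicas, pg_id)
        heapq.heappush(heap, key)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order
```

For example, recovery_order([("1.a", 0, 2), ("1.b", 3, 1), ("1.c", 0, 1)])
puts "1.b" first because clients are waiting on it.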

Or simply always clone ("heal") inconsistent PGs up to "size" in the write task
(if the code allows it).

So write requests would always be protected from data loss (of course, in a
large cluster it is still possible for the written and the offline OSDs to swap
places in one pass, but this is the minimum protection to expect when thinking
about min_size).

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com