On 24/12/14 11:24, Rich Freeman wrote:
> On Tue, Dec 23, 2014 at 4:08 PM, Holger Hoffstätte
> <holger.hoffstae...@googlemail.com> wrote:
>> On Tue, 23 Dec 2014 21:54:00 +0100, Stefan G. Weichinger wrote:
>>
>>> In the other direction: what protects against these errors you mention?
>>
>> ceph scrub :)
>>
> 
> Are you sure about that?  I was under the impression that it just
> checked that everything was retrievable.  I'm not sure if it compares
> all the copies of everything to make sure that they match, and if they
> don't match I don't think that it has any way to know which one is
> right.  I believe an algorithm just picks one as the official version,
> and it may or may not be identical to the one that was originally
> stored.
> 
> If the data is on btrfs then it is protected from silent corruption
> since the filesystem will give an error when that node tries to read a
> file, and presumably the cluster will find another copy elsewhere.  On
> the other hand if the file were logically overwritten in some way
> above the btrfs layer then btrfs won't complain and the cluster won't
> realize the file has been corrupted.
> 
> If I'm wrong on this by all means point me to the truth.  From
> everything I read though I don't think that ceph maintains a list of
> checksums on all the data that is stored while it is at rest.
> 
> --
> Rich
> 

Scrub used to pick up and fix errors, well, mostly fix.  Sometimes the
whole thing would collapse in a heap.  The problem with small systems is
that they are already heavily I/O constrained, and adding either a scrub
or a deep scrub slows them down very noticeably.  On terabytes of data a
scrub would take many hours, after which checking the logs might turn up
another error message, so it had to be triggered again.  I suspect some
of the errors I got were btrfs related, but ceph certainly contributed
its share.  Not sure of the cause, but they "seemed" to occur whenever
the cluster was doing anything other than idling.  As I used the
"golden master/clone" approach for VMs, corruption in the wrong place
was very noticeable :(

Towards the point where I gave up it was getting better, but I came to
the conclusion that the expensive upgrades I needed to fix the I/O
problems of running lots of VMs at once weren't worth it.

BillK

