On 24/12/14 11:24, Rich Freeman wrote:
> On Tue, Dec 23, 2014 at 4:08 PM, Holger Hoffstätte
> <holger.hoffstae...@googlemail.com> wrote:
>> On Tue, 23 Dec 2014 21:54:00 +0100, Stefan G. Weichinger wrote:
>>
>>> In the other direction: what protects against these errors you mention?
>>
>> ceph scrub :)
>>
>
> Are you sure about that?  I was under the impression that it just
> checked that everything was retrievable.  I'm not sure if it compares
> all the copies of everything to make sure that they match, and if they
> don't match I don't think that it has any way to know which one is
> right.  I believe an algorithm just picks one as the official version,
> and it may or may not be identical to the one that was originally
> stored.
>
> If the data is on btrfs then it is protected from silent corruption
> since the filesystem will give an error when that node tries to read a
> file, and presumably the cluster will find another copy elsewhere.  On
> the other hand if the file were logically overwritten in some way
> above the btrfs layer then btrfs won't complain and the cluster won't
> realize the file has been corrupted.
>
> If I'm wrong on this by all means point me to the truth.  From
> everything I read though I don't think that ceph maintains a list of
> checksums on all the data that is stored while it is at rest.
>
> --
> Rich
>

Scrub used to pick up and fix errors - well, mostly fix. Sometimes the whole thing collapsed in a heap.

The problem with small systems is that they are already very I/O restricted, and adding either a scrub or a deep scrub slows them down very noticeably further. On terabytes of data a scrub would take many hours, after which checking the logs might turn up another error message, so it had to be triggered again.

I suspect some of the errors I got were btrfs related, but ceph certainly contributed its share. I'm not sure of the cause, but they "seemed" to occur whenever the cluster was doing anything other than idling. As I used the "golden master/clone" approach to VMs, corruption in the wrong place was very noticeable :(

Towards the point where I gave up it was getting better, but I came to the conclusion that the expensive upgrades I needed to fix the I/O problems of running lots of VMs at once weren't worth it.

BillK