On 8 June 2016 at 22:27:46, Krzysztof Nowicki <krzysztof.a.nowi...@gmail.com> wrote:

Hi,

On Wed, 8 Jun 2016 at 21:35, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki <
krzysztof.a.nowi...@gmail.com>:
> From my own experience with failing HDDs I've seen cases where the drive
> was failing silently at first. This manifested itself in repeated deep
> scrub failures. Correct me if I'm wrong here, but Ceph keeps checksums of
> the data being written, and if that data is read back corrupted on one of
> the OSDs this will be detected by scrub and reported as an inconsistency.
> In such cases automatic repair should be sufficient, since with the
> checksums it is possible to tell which copy is correct. The OSD will not
> be removed automatically; it is up to the cluster administrator to get
> suspicious if such inconsistencies occur repeatedly and remove the OSD in
> question.

OK, but could this lead to data corruption? What would happen to the client
if a write fails?

If a write fails due to an I/O error on the underlying HDD, the OSD daemon
will most likely abort.
If a write succeeds but is later corrupted by a silent HDD failure, you will
have corrupted data on that OSD. I'm not sure whether Ceph verifies the
checksums on read, but if it doesn't, the data read back by the client could
be corrupted if the corruption happened on the primary OSD for that PG.
The behaviour can also be affected by the filesystem the OSD runs on. For
example, Btrfs keeps data checksums, in which case reading corrupted data
will fail at the filesystem level and the OSD will just see an I/O error.
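
As a side note, when a deep scrub does flag an inconsistency you can inspect
and repair it by hand. Something along these lines should work on recent
releases (the PG id 2.5f is only a placeholder):

    ceph health detail                  # lists the PGs flagged inconsistent
    rados list-inconsistent-obj 2.5f --format=json-pretty   # which copy differs, and on which OSD
    ceph pg repair 2.5f                 # ask the primary to repair the PG

If the same OSD keeps showing up in such reports, that's usually the drive
to replace.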


> When the drive fails more severely and causes IO failures, the effect will
> most likely be an abort of the OSD daemon, which causes the relevant OSD
> to go down. The cause of the abort can be determined by examining the
> logs.

In this case, healing and rebalancing are done automatically, right?
If I want a replica count of 3 and one OSD fails, will the objects stored on
that OSD be automatically moved and replicated across the cluster to keep my
replica requirement?

Yes, this is correct.
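
To illustrate (the OSD id and pool name are only examples): once a failed
OSD is marked out, either automatically after a timeout or by hand, Ceph
re-replicates its PGs onto the remaining OSDs, and you can watch that
happen:

    ceph osd out 12              # mark the failed OSD out
    ceph -w                      # watch recovery/backfill progress
    ceph osd tree                # check which OSDs are up/in
    ceph osd pool get rbd size   # verify the pool's replica count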


> In any case SMART is your best friend and it is strongly advised to run
> smartd in order to get early warnings.

Yes, but SMART is not always reliable.

True, but it doesn't hurt to have it running anyway.
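
For example, a smartd.conf entry along these lines (the device, schedule and
mail address are only placeholders) runs a short self-test every night, a
long one on Saturdays, and mails you when something starts to fail:

    /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost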


All modern RAID controllers are able to read the whole disk (or disks)
looking for bad sectors or inconsistencies; the SMART extended test doesn't
do this.

Strange. From what I understand, the extended SMART test actually goes over
each sector and tests it for readability.
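
For what it's worth, you can kick one off yourself and check the outcome
afterwards (the device name is just an example):

    smartctl -t long /dev/sda       # start the extended self-test
    smartctl -l selftest /dev/sda   # check the result once it has finished
    smartctl -A /dev/sda            # look at reallocated/pending sector counts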

Regards
Chris



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
