Re: [DRBD-user] proto c - corrupt files - directories missing

Christian Hammers Tue, 07 Jan 2014 07:51:40 -0800

Hello

On Tue, 7 Jan 2014 16:47:04 +0100
Stefan Bauer <[email protected]> wrote:


> -----Original Message-----
> From:    Christian Hammers <[email protected]>
> Sent:    Tue 07.01.2014 15:48
> Subject: Re: [DRBD-user] proto c - corrupt files - directories missing
> To:      Stefan Bauer <[email protected]>
> CC:      [email protected]
> > Hello
> > 
> > Have you tried "drbdadm verify clusterdb_res" to check if the secondary is
> > really identical to the primary? 
> > 
> > I would assume that DRBD only detects corrupted data using checksums when
> > reading, and out-of-date data when comparing those checksums on write
> > requests, but it cannot detect that the data on your secondary has
> > accidentally become out-of-date.
> 
> Hi Christian,
> 
> Thank you for your time.
> 
> Now it gets strange! I just started a resync after the second node had
> been offline.
> 
> [438614.558716] block drbd0: updated sync UUID A712D7A357B968B7:5410F28F1CEC98E8:540FF28F1CEC98E8:736AAB121F6173C0
> [439240.761231] block drbd0: Resync done (total 626 sec; paused 0 sec; 111204 K/sec)
> [439240.761244] block drbd0: updated UUIDs A712D7A357B968B7:0000000000000000:5410F28F1CEC98E8:540FF28F1CEC98E8
> [439240.761255] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> [439240.854011] block drbd0: bitmap WRITE of 8933 pages took 23 jiffies
> [439240.854023] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> 
> After this I ran a verify, and a bunch of out-of-sync blocks were detected:

If your secondary was offline for only a short time, it only catches up
on the changes that were made during that time (the blocks recorded in
DRBD's bitmap). It can therefore resync quite fast, but it won't detect
out-of-sync blocks that already existed long before.
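To illustrate why a fast resync can leave old divergence in place, here is a hypothetical toy model (not DRBD's code): while the peer is disconnected, the primary records each written block in a dirty set standing in for the on-disk bitmap, and on reconnect only those blocks are copied. The class and method names are made up for illustration.

```python
# Hypothetical toy model of bitmap-based resync (not DRBD's implementation).
# While the peer is disconnected, the primary records each written block in a
# dirty set that stands in for DRBD's on-disk bitmap. On reconnect, only those
# blocks are copied, so divergence that existed before the disconnect survives.


class ToyReplica:
    def __init__(self, nblocks: int) -> None:
        self.primary = [b"\0"] * nblocks
        self.secondary = [b"\0"] * nblocks
        self.connected = True
        self.dirty: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.connected:
            self.secondary[block] = data  # replicated synchronously
        else:
            self.dirty.add(block)  # remembered for the next resync

    def disconnect(self) -> None:
        self.connected = False

    def reconnect_and_resync(self) -> int:
        """Copy only the blocks marked dirty; return how many were copied."""
        for block in self.dirty:
            self.secondary[block] = self.primary[block]
        copied = len(self.dirty)
        self.dirty.clear()
        self.connected = True
        return copied


if __name__ == "__main__":
    r = ToyReplica(8)
    r.secondary[3] = b"junk"  # silent divergence from long ago
    r.disconnect()
    r.write(5, b"new")
    print(r.reconnect_and_resync())        # 1
    print(r.primary[3] == r.secondary[3])  # False: block 3 still differs
```

The resync reports success after copying one block, yet block 3 is still wrong, which matches the "0 KB marked out-of-sync" message followed by verify finding differences.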

The following messages explain why the filesystem on your secondary node
looks strange :)

> [439694.710861] block drbd0: Out of sync: start=73992, size=8 (sectors)
> [439695.086765] block drbd0: Out of sync: start=270448, size=8 (sectors)
> [439695.087157] block drbd0: Out of sync: start=270768, size=8 (sectors)
> [439695.087293] block drbd0: Out of sync: start=270824, size=8 (sectors)
...

> and so on. Am I right that, once the whole verify process has finished,
> my data should be in "real" sync? :)
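To get a feel for how much data a verify run flagged, the "Out of sync" kernel messages can be summed up. A minimal sketch, assuming the message format shown above and 512-byte sectors (the Linux block-layer convention), so each "size=8 (sectors)" entry is 4 KiB; the function names are hypothetical.

```python
import re

# Matches DRBD kernel messages such as:
#   [439694.710861] block drbd0: Out of sync: start=73992, size=8 (sectors)
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")
SECTOR_BYTES = 512  # Linux block-layer sectors are 512 bytes


def out_of_sync_ranges(log_text: str) -> list[tuple[int, int]]:
    """Return (start_sector, size_in_sectors) for each 'Out of sync' line."""
    return [(int(start), int(size)) for start, size in OOS_RE.findall(log_text)]


def out_of_sync_bytes(log_text: str) -> int:
    """Total amount of data reported as out of sync, in bytes."""
    return sum(size for _, size in out_of_sync_ranges(log_text)) * SECTOR_BYTES


if __name__ == "__main__":
    # The four lines quoted in this thread.
    sample = """\
[439694.710861] block drbd0: Out of sync: start=73992, size=8 (sectors)
[439695.086765] block drbd0: Out of sync: start=270448, size=8 (sectors)
[439695.087157] block drbd0: Out of sync: start=270768, size=8 (sectors)
[439695.087293] block drbd0: Out of sync: start=270824, size=8 (sectors)
"""
    for start, size in out_of_sync_ranges(sample):
        print(f"byte offset {start * SECTOR_BYTES}, {size * SECTOR_BYTES} bytes")
    print("total:", out_of_sync_bytes(sample), "bytes")  # total: 16384 bytes
```

On a live node you would pass it the text of the `dmesg` output instead of the sample.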

No. According to the man page, "drbdadm verify" only marks blocks as
out-of-sync but does not repair them. I found that unexpected, too.

Try "drbdadm invalidate clusterdb_res" on your *secondary* node.
This starts a complete resync from the primary node, copying every
block whose checksum does not match. It can take some hours, though.
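The difference between verifying and repairing can be sketched as a hypothetical toy model (not DRBD's implementation): a "disk" is a list of blocks, and sha256 stands in for whatever checksum algorithm is configured via verify-alg / csums-alg. The verify step only reports mismatching blocks; the full resync visits every block but copies only those whose checksums differ.

```python
import hashlib

# Hypothetical toy model: a "disk" is a list of equally sized blocks.
# sha256 stands in for whatever verify-alg / csums-alg is configured.


def digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def verify(primary: list[bytes], secondary: list[bytes]) -> set[int]:
    """Compare per-block checksums and return the indices that differ.
    Like 'drbdadm verify', this only reports/marks blocks; nothing is copied."""
    return {i for i, (p, s) in enumerate(zip(primary, secondary))
            if digest(p) != digest(s)}


def full_checksum_resync(primary: list[bytes], secondary: list[bytes]) -> int:
    """Visit every block, as a full resync does, but transfer only the blocks
    whose checksums differ. Returns the number of blocks copied."""
    copied = 0
    for i, p in enumerate(primary):
        if digest(p) != digest(secondary[i]):
            secondary[i] = p
            copied += 1
    return copied


if __name__ == "__main__":
    primary = [bytes([i]) * 4096 for i in range(6)]
    secondary = list(primary)
    secondary[1] = b"x" * 4096  # silent divergence
    secondary[4] = b"y" * 4096

    print(verify(primary, secondary))                # {1, 4}
    print(verify(primary, secondary))                # {1, 4}: verify repaired nothing
    print(full_checksum_resync(primary, secondary))  # 2
    print(verify(primary, secondary))                # set()
```

Running verify twice shows the same result because it never touches the secondary; only the resync step brings the two sides back in line.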

bye,

-christian-

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
