On 07-08-2017 12:59 Lars Ellenberg wrote:
DRBD does not at all interact with the layers above it,
so it does not know, and does not care, which entities may or may not
have cached data they read earlier.
Any entities that need cache coherency across multiple instances
need to coordinate in some way.

But that is not DRBD specific at all,
and not even specific to clustering or multi-node setups.

This means that if you intend to use something that is NOT cluster aware
(or multi-instance aware) itself, you may need to add your own band-aid
locking and flushing "somewhere".

I remember that "in the old days", kernel buffer pages could linger for
quite some time, even if the corresponding device was no longer open,
which caused problems with migrating VMs even with something like a
shared SCSI device.  Integration scripts added explicit calls to sync
and blockdev --flushbufs and the like...
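
A minimal sketch of what those scripts did before handing the device
over to the other node, assuming the shared device is /dev/sdb
(hypothetical name):

sync                           # flush dirty pages out to the device
blockdev --flushbufs /dev/sdb  # drop cached pages so the next reader re-reads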

The kernel then learned to invalidate cache pages on last close,
so these hacks are no longer necessary (as long as no-one keeps
the device open when not actively used).

The other alternative is to always use "direct IO".

You can (destructively!) experiment with dual-primary DRBD:
make both nodes primary.
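
For example, assuming a resource named r0 whose net section already has
allow-two-primaries enabled (required for dual-primary), on each node:

drbdadm primary r0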

on node A,
watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"

on node B,
while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096
conv=sync oflag=direct; done

conv=sync pads the input with NULs to a full bs (direct writes need
complete, aligned blocks), oflag=direct makes sure it finds its way
to DRBD and not just into buffer cache pages.

You should see both "watch" thingies show the date changing as it is
written on the other node.

If you then do on "node A": sleep 10 < /dev/drbd0,
the watch without iflag=direct should show the same date for ten seconds,
because it gets its data from the buffer cache, and the device is kept
open by the sleep.

Once the "open count" of the device drops down to zero again, the kernel
will invalidate the pages, and the next read will need to re-read from
disk (just as the "direct" read always does).
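
If you want to see who keeps the device open while the sleep runs, a
quick check from another shell (assuming lsof is installed):

lsof /dev/drbd0    # should list the sleep process holding the device open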

You can then do
"sleep 10 </dev/drbd10 & sleep 5 ; blockdev --flushbufs /dev/drbd0; wait",
and see the non-direct watch update the date just once after 5 seconds,
and then again once the sleep 10 has finished...

Again, this does not really have anything to do with DRBD,
but with how the kernel treats block devices,
and if and how entities coordinate alternating and concurrent access
to "things".

You can easily have two entities on the same node corrupt a boring plain
text file on a classic file system, if they both assume "exclusive
access" and don't coordinate properly.
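
A minimal sketch of proper coordination with plain shell tools, assuming
util-linux flock and a hypothetical file /tmp/shared.txt: each writer
takes the same exclusive lock before touching the file, so their updates
cannot interleave:

# run this in each writer; all of them serialize on /tmp/shared.lock
( flock -x 9; echo "update from $$ at $(date)" >> /tmp/shared.txt ) 9> /tmp/shared.lock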

Great explanation Lars, thank you very much!

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8