from:"Jan Schermer"

Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Jan Schermer

Not sure you can mount a snapshot (I always create a clone).
However, I never saw anything about a “drbd” filesystem - what distribution is
this? Apparently it tries to be too clever…
Try creating a clone and mounting it instead; it’s safer anyway (I saw a bug in
the issue tracker saying ZFS panics if you try to write to the snapshot, or
something like that…)

Other than that - yes, this should work fine.

Jan
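A minimal Python sketch of the clone-then-mount approach, reusing the names from Gionatan's example (tank/vol1@snap1); the clone name tank/vol1-clone is hypothetical. It only builds the command lines (zfs clone, then mount with an explicit -t) and prints them, so they can be reviewed before being run as root, e.g. via subprocess.run().

```python
import shlex
from typing import List


def clone_and_mount_commands(snapshot: str, clone: str, mountpoint: str,
                             fstype: str = "xfs",
                             options: str = "ro,norecovery,nouuid") -> List[List[str]]:
    """Build (but do not run) the commands to clone a ZVOL snapshot and mount the clone."""
    if "@" not in snapshot:
        raise ValueError("snapshot must look like pool/volume@snapname")
    return [
        # A clone is a new ZVOL backed by the snapshot; the snapshot itself is left alone.
        ["zfs", "clone", snapshot, clone],
        # Give the filesystem type explicitly: autodetection reported 'drbd' in this thread.
        # The xfs options are the ones Gionatan used (read-only, no log replay, ignore duplicate UUID).
        ["mount", "-t", fstype, "-o", options, f"/dev/zvol/{clone}", mountpoint],
    ]


if __name__ == "__main__":
    for argv in clone_and_mount_commands("tank/vol1@snap1", "tank/vol1-clone", "/mnt"):
        print(shlex.join(argv))  # review first, then e.g. subprocess.run(argv, check=True)
```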


> On 6 Sep 2017, at 13:23, Gionatan Danti  wrote:
> 
> On 19/08/2017 10:24, Yannis Milios wrote:
>> Option (b) seems more suitable for a 2-node drbd8 cluster in a
>> primary/secondary setup. Haven't tried it, so I cannot tell if there are any
>> pitfalls. My only concern in such a setup would be if drbd silently corrupts
>> the data on the lower level and zfs is not aware of that. Also, if you are
>> *not* going to use live migration, and you can afford losing some seconds
>> of data on the secondary node in favor of better performance on the primary
>> node, then you could consider using protocol A instead of C for the
>> replication link.
> 
> Hi all,
> I "revive" this old thread to let you know I settled on using DRBD 8.4 on top
> of ZVOLs.
> 
> I have a question for anyone using DRBD on top of a snapshot-capable backend 
> (eg: ZFS, LVM, etc)...
> 
> When snapshotting a DRBD block device, trying to mount it (the snapshot, not 
> the original volume!) results in the following error message:
> 
> [root@master7 tank]# mount /dev/zvol/tank/vol1\@snap1 /mnt/
> mount: unknown filesystem type 'drbd'
> 
> To successfully mount the snapshot volume, I need to specify the volume 
> filesystem, for example (the other options are xfs-specific):
> 
> [root@master7 tank]# mount -t xfs /dev/zvol/tank/vol1\@snap1 /mnt/ -o ro,norecovery,nouuid
> 
> Is that the right approach? Or am I missing something?
> Thanks.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.da...@assyoma.it - i...@assyoma.it
> GPG public key ID: FF5F32A8
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] csums-alg,verify-alg algorithm

2017-08-31 Thread Jan Schermer

There are algorithms suitable for detecting or correcting flipped bits, like
CRC32 or even fletcher4, which ZFS uses for checksums.
Those are _NOT_ suitable for comparing data; think of them more like parity.
You WILL have false matches with them even without having lots of data, as
they are not collision resistant at all.
Then there are collision resistant algorithms as already outlined below.
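To illustrate why a 32-bit checksum is not collision resistant, here is a small Python sketch (zlib.crc32 stands in for any 32-bit checksum; nothing here is DRBD code). It hashes seeded pseudo-random blocks until two different blocks share a CRC32 value. By the birthday bound this is expected after roughly sqrt(pi/2 * 2^32), i.e. around 80,000 blocks, for random contents of any block size; with 4 KiB blocks that is only about 320 MiB of data. The same two blocks still get different SHA-512 digests.

```python
import hashlib
import random
import zlib


def find_crc32_collision(block_size: int = 16, seed: int = 0):
    """Hash pseudo-random blocks until two different blocks share a CRC32 value.

    Returns (first_block, second_block, crc, blocks_hashed).
    """
    rng = random.Random(seed)      # seeded, so every run finds the same pair
    seen = {}                      # crc32 value -> first block that produced it
    hashed = 0
    while True:
        block = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
        hashed += 1
        crc = zlib.crc32(block)
        previous = seen.get(crc)
        if previous is not None and previous != block:
            return previous, block, crc, hashed
        seen[crc] = block


if __name__ == "__main__":
    a, b, crc, n = find_crc32_collision()
    print(f"collision after {n} blocks: crc32={crc:#010x}")
    # A collision-resistant hash still tells the two blocks apart.
    print("sha512 equal:", hashlib.sha512(a).digest() == hashlib.sha512(b).digest())
```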

I chose SHA512 because it’s only 20% slower than SHA1 and only about 5% slower 
than MD5, and in any case unlikely to cause a bottleneck when resyncing or even 
reading data most of the time.
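Relative speeds like these can be checked on your own hardware with a rough userspace micro-benchmark such as the Python sketch below. It hashes 4 KiB blocks with hashlib (usually backed by OpenSSL), not with the kernel crypto API that DRBD actually uses, so treat the numbers only as a relative comparison between algorithms; the block size and data volume are arbitrary choices.

```python
import hashlib
import os
import time


def hash_throughput(algorithms=("md5", "sha1", "sha256", "sha512"),
                    block_size: int = 4096, total_mib: int = 64) -> dict:
    """Return approximate MiB/s per algorithm, hashing total_mib of data in block_size pieces."""
    block = os.urandom(block_size)
    blocks = (total_mib * 1024 * 1024) // block_size
    results = {}
    for name in algorithms:
        start = time.perf_counter()
        for _ in range(blocks):
            # One digest per block, similar in spirit to per-block checksums during resync.
            hashlib.new(name, block).digest()
        elapsed = time.perf_counter() - start
        results[name] = total_mib / elapsed
    return results


if __name__ == "__main__":
    # Note: md5 may be refused on FIPS-enabled systems.
    for name, mib_s in hash_throughput().items():
        print(f"{name:>7}: {mib_s:8.1f} MiB/s")
```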

Jan

> On 31 Aug 2017, at 10:12, 大川敬臣  wrote:
> 
> Thank you all.
> 
> I will discuss with my team and decide it.
> Your advice helped me!!
> 
> Thanks,
> 
> 2017-08-30 10:20 GMT+09:00 Robert Altnoeder  >:
> On 2017/08/29 6:28 AM, Digimer wrote:
> > On 2017-08-28 09:28 PM, 大川敬臣 wrote:
> >> I want to enable checksum-based synchronization by adding "csums-alg"
> >> to drbd.conf.
> >> [...]
> >> The algorithms (sha1, md5, crc32c) are kind of old ones. Can I use sha256?
> >> Is there some reason that sha256 is not used?
> >>
> 
> > So the real question is: how concerned are you that a) two
> > blocks don't match and b) those differences happen to be exactly the ones
> > that cause a hash collision/false match?
> 
> Exactly. B) is very unlikely to be caused coincidentally, it's not even
> easy to create a hash collision intentionally, since hashing algorithms
> are specifically designed to make such collisions unlikely.
> 
> > The stronger the algorithm, the more load it will place on the system. I
> > would stick with something fast, maybe md5 at the most.
> 
> In general, yes, but not necessarily. SHA1 is typically only slightly
> slower than MD5, but much safer.
> With SHA2, SHA512 is actually significantly faster than SHA256 on 64-bit
> architectures.
> SHA224 is basically SHA256 with truncated output, and SHA384 is
> basically SHA512 with truncated output, so those will not improve
> performance over the version with full output length.
> 
> Very recent CPUs come with Intel's SSE SHA Instructions, those support
> SHA1 and SHA256, and using a special processor instruction will
> typically be faster than running most algorithms in software - so
> CPU-supported SHA256 may be faster than software-supported MD5.
> However, I doubt that there is already support for these instructions in
> the software or the compilers, because the instructions have only
> recently made it into Intel and AMD processors (the specification itself
> however is from 2013).
> 
> Enough theory though, in my opinion, for just creating checksums, it
> does not really matter a lot which algorithm you use.
> Personally, I'd probably choose SHA1.
> 
> Cheers,
> Robert
> 
> 


Re: [DRBD-user] Out-of-sync woes

2017-08-04 Thread Jan Schermer

AFAIK this should not affect data integrity at rest (related to “verify-alg”)
but only in-flight (csums-alg), and even then at most a few blocks (that are
in-flight) should be affected? (btw shouldn’t stable_pages_required be enabled?)
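The in-flight case can be shown with a toy Python model (not DRBD code): the local disk and the network each take their own copy of the same page at slightly different moments, and if the writer changes the page in between, the two replicas end up with different contents for the same block. Holding the page stable while it is under writeback, which is roughly what stable pages are meant to ensure, is approximated here by a single snapshot that both consumers share.

```python
import hashlib
from typing import Tuple


def replicate_write(page: bytearray, modify_in_flight: bool) -> Tuple[bytes, bytes]:
    """Toy model: local disk and network copy the same page at different moments."""
    local_copy = bytes(page)          # the local disk reads the page first
    if modify_in_flight:
        page[0:4] = b"NEW!"           # the writer changes the page while it is in flight
    remote_copy = bytes(page)         # the network send reads it a moment later
    return local_copy, remote_copy


def replicate_write_stable(page: bytearray, modify_in_flight: bool) -> Tuple[bytes, bytes]:
    """Same write, but with the page held stable: both sides see one snapshot."""
    snapshot = bytes(page)
    if modify_in_flight:
        page[0:4] = b"NEW!"           # lands in the page, but only goes out with a later write
    return snapshot, snapshot


if __name__ == "__main__":
    for fn in (replicate_write, replicate_write_stable):
        local, remote = fn(bytearray(b"OLD!" + bytes(4092)), modify_in_flight=True)
        same = hashlib.sha1(local).digest() == hashlib.sha1(remote).digest()
        print(f"{fn.__name__}: replicas match = {same}")
```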

I think it’s more likely he’s hitting a number of bugs that are getting fixed
in DRBD, where it would simply not resync data while appearing
Consistent/UpToDate etc. I urge you to look at drbdsetup status --verbose
--statistics $resource and check for an out-of-sync counter > 0.
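A small Python helper could scan that output for non-zero out-of-sync counters. The SAMPLE text is hypothetical and only shaped like drbdsetup status --statistics output; the regex also accepts the oos: field from /proc/drbd on DRBD 8.4. The exact format varies between drbd-utils versions, so check it against your own system.

```python
import re
from typing import List, Tuple

# Matches "out-of-sync:N" (drbdsetup status --statistics) and "oos:N" (/proc/drbd, DRBD 8.4).
_OOS = re.compile(r"\b(?:out-of-sync|oos):(\d+)")

# Hypothetical sample, only shaped like `drbdsetup status --verbose --statistics` output.
SAMPLE = """\
vm01 role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate
      received:0 sent:1024 out-of-sync:8 pending:0 unacked:0
"""


def nonzero_out_of_sync(status_text: str) -> List[Tuple[str, int]]:
    """Return (line, count) for every out-of-sync counter greater than zero."""
    hits = []
    for line in status_text.splitlines():
        for match in _OOS.finditer(line):
            count = int(match.group(1))
            if count > 0:
                hits.append((line.strip(), count))
    return hits


if __name__ == "__main__":
    for line, count in nonzero_out_of_sync(SAMPLE):
        print(f"out-of-sync={count}: {line}")
```

On a live node the text could come from subprocess.run(["drbdsetup", "status", "--verbose", "--statistics", resource], capture_output=True, text=True).stdout, with resource being your resource name.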

We used cache=none with qemu and switched to cache=writeback with no corruption
- you just need to take care to have it Primary on only one node then (this
works with live migrations too, if you know what you’re doing).

Jan


> On 4 Aug 2017, at 09:55, Veit Wahlich  wrote:
> 
> Hi Luke,
> 
> I assume you are experiencing the results of data inconsistency by
> in-flight writes. This means that a process (here your VM's qemu) can
> change a block that already waits to be written to disk.
> Whether this happens (undetected) or not depends on how the data is
> accessed for writing and synced to disk.
> 
> For qemu, you have to consider two factors; the guest OS' file systems'
> configuration and qemu's disk caching configuration:
> On Linux guests, this usually only happens for guests with file systems
> that are mounted neither sync nor with barriers, and with block-backed
> swap.
> On Windows guests it always happens.
> For qemu, it depends on how the disk caching strategy is configured and
> thus whether it allows in-flight writes or not.
> 
> The common position is to configure qemu for writethrough caching for
> all disks and leave your guests' OS unchanged. You will also have to
> ignore/override libvirt's warning about unsafe migration with this cache
> setting, as it only applies to file-backed VM disks, not
> blockdev-backed.
> I use this for hundreds of both Linux and Windows VMs backed by DRBD
> block devices and have no inconsistency problems at all since this
> change.
> 
> Changing qemu's caching strategy might affect performance.
> For performance reasons you are advised to use a hardware RAID
> controller with battery-backed write-back cache.
> 
> For consistency reasons you are advised to use real hardware RAID, too,
> as the in-flight block changing problem described above might also
> affect mdraid, dmraid/FakeRAID, LVM mirroring, etc. (depending on
> configuration).
> 
> Best regards,
> // Veit
> 
> 
> On Friday, 04.08.2017 at 11:11 +1200, Luke Pascoe wrote:
>> Hello everyone.
>> 
>> I have a fairly simple 2-node CentOS 7 setup running KVM virtual
>> machines, with DRBD 8.4.9 between them.
>> 
>> There is one DRBD resource per VM, with at least 1 volume each,
>> totalling 47 volumes.
>> 
>> There's no clustering or heartbeat or other complexity. DRBD has its
>> own Gig-E interface to sync over.
>> 
>> I recently migrated a host between nodes and it crashed. During
>> diagnostics I did a verification on the drbd volume for the host and
>> found that it had _a lot_ of out of sync blocks.
>> 
>> This led me to run a verification on all volumes, and while I didn't
>> find any other volumes with large numbers of out of sync blocks, there
>> were several with a few. I have disconnected and reconnected all these
>> volumes, to force them to resync.
>> 
>> I have now set up a nightly cron which will verify as many volumes as
>> it can in a 2 hour window, this means I get through the whole lot in
>> about a week.
>> 
>> Almost every night, it reports at least 1 volume which is out-of-sync,
>> and I'm trying to understand why that would be.
>> 
>> I did some research and the only likely candidate I could find was
>> related to TCP checksum offloading on the NICs, which I have now
>> disabled, but it has made no difference.
>> 
>> Any suggestions what might be going on here?
>> 
>> Thanks.
>> 
>> Luke Pascoe
> 
> 


Re: [DRBD-user] Problem with drbd and data alignment

2017-03-27 Thread Jan Schermer

lsblk doesn't only show partitions, but other info as well, e.g.:
sdd   8:48   0   477G  0 disk
└─zbackup_crypt 253:40   477G  0 crypt

That is not a partition; it just shows what the device is used for.
It will be the same in your case.

Jan
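lsblk builds its tree largely from sysfs, where partitions and stacked devices are recorded differently. The Python sketch below assumes the usual Linux layout (partitions are subdirectories of /sys/block/<disk> containing a partition file; devices stacked on top, such as DRBD or dm-crypt, are listed under holders/) and tells the two apart. The demo builds a fake tree mirroring the lsblk output in this thread, so it runs without the real devices.

```python
import os
import tempfile
from typing import List, Tuple


def partitions(disk: str, sysfs_block: str = "/sys/block") -> List[str]:
    """Partitions show up as subdirectories of the disk that contain a 'partition' file."""
    base = os.path.join(sysfs_block, disk)
    return sorted(name for name in os.listdir(base)
                  if os.path.isfile(os.path.join(base, name, "partition")))


def holders(disk: str, sysfs_block: str = "/sys/block") -> List[str]:
    """Devices stacked on top of the disk (DRBD, dm-crypt, ...) are listed under holders/."""
    path = os.path.join(sysfs_block, disk, "holders")
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def demo() -> Tuple[List[str], List[str]]:
    """Build a fake sysfs tree resembling the lsblk output in this thread and inspect it."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "sde", "sde1"))
    open(os.path.join(root, "sde", "sde1", "partition"), "w").close()
    os.makedirs(os.path.join(root, "sde", "holders"))
    open(os.path.join(root, "sde", "holders", "drbd2"), "w").close()  # a symlink on a real system
    return partitions("sde", root), holders("sde", root)


if __name__ == "__main__":
    parts, stacked = demo()
    print("partitions:", parts)     # sde1: the partition table written through drbd2
    print("holders:   ", stacked)   # drbd2: uses sde as its backing device
```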

> On 27 Mar 2017, at 15:05, Marco Marino  wrote:
> 
> Hi Robert,
> I think the problem is related to the fact that there is an (LVM) partition on
> top of the drbd device. I tried different configurations, and if I use a raw
> device for the drbd resource and then use the drbd device without
> partitions as a PV, the problem disappears. Anyway, I don't know if this is
> really a problem, because there is no overlap at all. I (double) checked this
> using fdisk on /dev/sde and on /dev/drbd2. Both show me a partition that
> starts at sector 2048. The only problem is that when I use lsblk I see that
> inside /dev/sde there are 2 devices: /dev/sde1 and /dev/drbd2. I'd like to
> know if this can cause some kind of problem with performance or data
> corruption.
> Probably the question is: can I use a raw device as a PV? As suggested here
> 
>  it is possible, but there are some problems with management if I need to
> enlarge the volume, and I don't need to do that.
> 
> Marco
> 
> 
>  
> 
> 2017-03-27 9:54 GMT+02:00 Robert Altnoeder  >:
> The system looks for partition information at a certain offset on a
> storage device. If DRBD is used to write directly to a raw storage
> device, the effect of partitioning the DRBD device on the backing device
> is the same as if the backing device had been partitioned directly with
> no DRBD involved.
> Therefore, the system will see the partition information on the backing
> device.
> 
> DRBD internal meta data is written to the end of the backing device
> (without using any partitioning). The space occupied by the internal
> meta data will be missing from the DRBD device's size (the DRBD device's
> size will be the backing device's size minus the space reserved for
> internal meta data).
> 
> On 03/25/2017 02:57 PM, Yannis Milios wrote:
> >
> > Not 100% sure but the first 'partition' could be where drbd stores
> > metadata?
> >
> > Your setup should be fine as long as you can deal in an easy manner
> > with future resizing of the backing device (sde).
> >
> > Yannis
> >
> > [root@iscsi2 ~]# lsblk
> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > 
> > sde 8:64 0 3,7T 0 disk
> > ├─sde1 8:65 0 3,7T 0 part
> > └─drbd2 147:2 0 3,7T 0 disk
> > --
> > Sent from Gmail Mobile
> >
> br,
> --
> Robert Altnoeder
> +43 1 817 82 92 0 
> robert.altnoe...@linbit.com 
> 
> LINBIT | Keeping The Digital World Running
> DRBD - Corosync - Pacemaker
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

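Robert's point that internal meta data is taken from the end of the backing device can be quantified. The DRBD 8.4 user's guide gives the internal meta data size for a single peer as Ms = ceil(Cs / 2^18) * 8 + 72 sectors, where Cs is the backing device size in sectors; the Python sketch below applies that formula. The resulting device size is only an approximation, since DRBD may round sizes further.

```python
SECTOR_BYTES = 512


def drbd84_internal_md_sectors(data_sectors: int) -> int:
    """Internal meta data size in sectors (DRBD 8.4, one peer): ceil(Cs / 2**18) * 8 + 72."""
    chunks = (data_sectors + 2**18 - 1) // 2**18   # integer ceiling, exact for huge devices
    return chunks * 8 + 72


def approx_drbd_device_bytes(backing_bytes: int) -> int:
    """Backing size minus internal meta data; DRBD may round the result further."""
    cs = backing_bytes // SECTOR_BYTES
    return (cs - drbd84_internal_md_sectors(cs)) * SECTOR_BYTES


if __name__ == "__main__":
    one_gib = 2**30
    md = drbd84_internal_md_sectors(one_gib // SECTOR_BYTES)
    print(f"1 GiB backing device: {md} sectors ({md * SECTOR_BYTES // 1024} KiB) of meta data")
```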

Re: [DRBD-user] DRBD packages for Proxmox 4.x

2016-11-26 Thread Jan Schermer

What a great way to communicate with your users.
True, Linus sometimes does the same.
But you are not Linus.
So I'd advise you to think before you alienate (more) people.
This is so very unprofessional...

Btw, jasminj actually defended you, in case you can't see that in his first post
there [1],
and he's not wrong about [2] either.
I guess I'd better not elaborate.
Get your act together and respect your users, please.

Jan

[1] 
https://forum.proxmox.com/threads/drbdmanage-license-change.30404/#post-152763 

[2] 
https://forum.proxmox.com/threads/drbdmanage-license-change.30404/#post-152979 



> On 26 Nov 2016, at 09:04, Roland Kammerer  wrote:
> 
> On Sat, Nov 26, 2016 at 06:38:00AM +0100, Jasmin J. wrote:
>> Hello!
>> 
>> There is an ongoing discussion in the Proxmox support forum concerning the
>> recent license change of drbdmanage:
> 
> Yes, they are ongoing, no need to spread FUD here and now. Wait.
> 
> But hey, as I guess this is the same JasminJ from the Proxmox forum[1],
> you can replace it in the meantime with "some perl scripts", right? Oh,
> don't forget to manage a whole cluster of nodes, support ZFS and LVM,
> and their various thin combinations and variants, provide snapshots and
> resize of resources cluster wide, and don't forget about the inner
> workings of DRBD (you already proved to be an expert in this area on
> this ML), so make sure that if you combine thin and thick in the right
> way that thin allocations don't end up thick. It's easy, just a few
> lines of perl that replace the current 67647 lines of python, right?
> 
> I stop here, before I get too "friendly", but please, PLEASE, FFS don't
> come up with some BS you obviously have no clue about.
> 
> Regards, rck
> 
> [1]
> https://forum.proxmox.com/threads/drbdmanage-license-change.30404/#post-152979


Re: [DRBD-user] ZFS

2016-10-17 Thread Jan Schermer


> On 16 Oct 2016, at 21:18, Gandalf Corvotempesta 
>  wrote:
> 
> On 16 Oct 2016 at 19:19, "Jan Schermer"  <mailto:j...@schermer.cz>> wrote:
> >
> > That would be us :)
> 
> Really? Can you describe your infrastructure?
> 

3 storages, many more hypervisors, data triplicated... that's the usual scenario

> > There seems to be some confusion.
> > Do you want to assemble ZFS on top of DRBD devices or do you want to use 
> > ZFS instead of LVM?
> 
> I would like to use zfs on top of drbd or, better, use drbd on top of zfs 
> raid.
> 
> I would like to avoid a hardware RAID controller and use zfs for it, but I
> don't know how to put drbd and zfs together.
> 

We use ZFS on the storages, ZVOLs on top of that, each ZVOL makes up part of 
the DRBD resource that gets exported to the hypervisor.
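As a rough sketch of such a layout, assuming DRBD 9 with three storage nodes (matching "data triplicated" above), the Python below builds the zfs create command for one ZVOL and renders a matching resource section that uses the ZVOL as its backing disk. The pool, resource and host names, IPs and port are all hypothetical.

```python
from typing import Dict, List


def zvol_create_cmd(pool: str, name: str, size: str) -> List[str]:
    """Command creating the ZVOL that backs one DRBD resource on a storage node."""
    return ["zfs", "create", "-V", size, f"{pool}/{name}"]


def drbd9_resource(name: str, minor: int, pool: str,
                   hosts: Dict[str, str], port: int = 7789) -> str:
    """Render a DRBD 9 resource section using /dev/zvol/<pool>/<name> as backing disk.

    hosts maps host name -> replication IP; node-ids follow the dict order.
    """
    lines = [
        f"resource {name} {{",
        f"    device    /dev/drbd{minor};",
        f"    disk      /dev/zvol/{pool}/{name};",
        "    meta-disk internal;",
    ]
    for node_id, (host, ip) in enumerate(hosts.items()):
        lines += [
            f"    on {host} {{",
            f"        address {ip}:{port};",
            f"        node-id {node_id};",
            "    }",
        ]
    lines += [
        "    connection-mesh {",
        f"        hosts {' '.join(hosts)};",
        "    }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {"stor1": "10.0.0.1", "stor2": "10.0.0.2", "stor3": "10.0.0.3"}
    print(" ".join(zvol_create_cmd("tank", "vm01", "100G")))
    print(drbd9_resource("vm01", 1, "tank", nodes))
```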



Re: [DRBD-user] ZFS

2016-10-16 Thread Jan Schermer


> On 16 Oct 2016, at 19:10, Roland Kammerer  wrote:
> 
> On Sat, Oct 15, 2016 at 02:20:23PM +0200, Gandalf Corvotempesta wrote:
>> Anyone using ZFS with DRBD, in production?
> 
> I'm aware of a customer that has multiple hundred DRBD resources on ZFS.
> 

That would be us :)

>> As far as I know, ZFS likes to have direct access to disks, to manage them on its
>> own.
>> How do you handle this with DRBD?
> 
> ZFS has a FS part and an LV part, it is possible to "break out" block
> devices of a zpool. These can then be used as DRBD devices. DRBDManage
> knows how to handle that, no difference compared to LVM from a DRBDManage
> user's point of view.
> 

There seems to be some confusion.
Do you want to assemble ZFS on top of DRBD devices or do you want to use ZFS 
instead of LVM?

Jan



> Regards, rck


Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-12 Thread Jan Schermer

Shot in the dark - are the drives (or their controller, if you're using RAID)
using any form of caching? It is conceivable that when the resync is finished it
tries flushing the data to the device, and if this takes way too long it
could lead to a timeout of the drbd kernel thread.
Is IO happening on those drives when they are resyncing?
Try running something like "sync ; sleep 1 ; sync" on the Inconsistent node 
when it's resyncing (I hope that won't kill your IO)

But that's really just a guess.

Jan
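For timing the flush, a Python equivalent of "sync ; sleep 1 ; sync" that reports how long each sync takes might look like this (os.sync() calls sync(2), which on Linux waits for writeback to finish). It flushes dirty data system-wide, not only for DRBD, and consistently long durations on the Inconsistent node would at most support the caching guess, not prove it.

```python
import os
import time
from typing import List


def time_sync(rounds: int = 2, pause: float = 1.0) -> List[float]:
    """Roughly `sync ; sleep 1 ; sync`: returns how many seconds each sync took."""
    durations = []
    for i in range(rounds):
        if i:
            time.sleep(pause)               # the "sleep 1" between the syncs
        start = time.perf_counter()
        os.sync()                           # asks the kernel to write out all dirty data
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    for n, seconds in enumerate(time_sync(), 1):
        print(f"sync #{n}: {seconds:.3f} s")
```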

> On 12 Oct 2016, at 16:04, Eric Robinson  wrote:
> 
> This morning we are seeing an issue where drbd is repeatedly resyncing, 
> getting to 100%, and starting over, and never getting to an UpToDate/UpToDate 
> state.
>  
> On one node, it is logging this sequence over and over…
>  
> 
>  
> Oct 12 06:56:11 ha14a kernel: d-con ha02_mysql: Starting asender thread (from 
> drbd_r_ha02_mys [804])
> Oct 12 06:56:11 ha14a kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:56:11 ha14a kernel: block drbd1: self 
> 13FB9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer 
> 38E17129E5821B5F:13FB9B08BF812C5B:13FA9B08BF812C5B:13F99B08BF812C5B bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: uuid_compare()=-1 by rule 50
> Oct 12 06:56:11 ha14a kernel: block drbd1: Becoming sync target due to disk 
> states.
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer( Unknown -> Primary ) conn( 
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Oct 12 06:56:11 ha14a kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
> Oct 12 06:56:11 ha14a kernel: block drbd1: updated sync uuid 
> 13FC9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1 exit code 0 (0x0)
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFSyncUUID -> SyncTarget )
> Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will 
> sync 0 KB [0 bits set]).
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in 
> time.
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) 
> conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: asender terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Terminating drbd_a_ha02_mys
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Connection closed
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( NetworkFailure -> 
> Unconnected )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Restarting receiver thread
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver (re)started
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( Unconnected -> 
> WFConnection )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Handshake successful: Agreed 
> network protocol version 101
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Peer authenticated using 20 
> bytes HMAC
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( WFConnection -> 
> WFReportParams )
>  
> 
>  
> On the other node, it is saying this over and over…
>  
> 
>  
> Oct 12 06:58:51 ha14b kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:58:51 ha14b kernel: block drbd1: self 
> 38E17129E5821B5F:148D9B08BF812C5B:148C9B08BF812C5B:148B9B08BF812C5B bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer 
> 148D9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: uuid_compare()=1 by rule 70
> Oct 12 06:58:51 ha14b kernel: block drbd1: Becoming sync source due to disk 
> states.
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer( Unknown -> Secondary ) conn( 
> WFReportParams -> WFBitMapS )
> Oct 12 06:58:51 ha14b kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1 exit code 0 (0x0)
> Oct 12 06:58:51 ha14b kernel: block drbd1: conn( WFBitMapS -> SyncSource )
> Oct 12 06:58:51 ha14b kernel: block drbd1: Began resync as SyncSource (will 
> sync 0 KB [0 bits set]).
> Oct 12 06:58:5

Re: [DRBD-user] how about read a block that not return to upper application in protocol C?

2016-09-05 Thread Jan Schermer

> On 05 Sep 2016, at 11:50, Lars Ellenberg  wrote:
> 
> On Mon, Sep 05, 2016 at 01:16:21AM +0800, Mia Lueng wrote:
>> Hi All:
>> In protocol C, a bio returns to the upper application (bi_endio() is
>> executed) when the local bio has completed and the data ack packet has
>> been received from the peer. But suppose a write request to block N was
>> submitted and written to the local disk, but the data ack has not yet been
>> received from the peer, and a read request to the same block N comes in.
>> The read request will get the data of block N that was not yet returned
>> to the upper application.
>> 
>> Will this cause the application's(eg. oracle) logical error?
> 
> If you have dependencies between IO requests,
> you must not issue the second request,
> before the first has completed.
> 
> Think of local disk only.
> 
> You issue a WRITE to block X.
> Then, before that completed,
> you issue a READ to block X.
> (actual, direct, IO requests to the backend device,
> not caught by some intermediate caching layer)
> 
> The result of the READ is undefined.
> It may return old data, it may return new data,
> it may even return partially updated data.
> 
> Undefined.
> 

Actually I'm not sure this is true, depending of course on what you mean by
"before that completed" - not completed or just not flushed? On a local disk,
even a buffered write should cause subsequent reads to reflect the new contents;
the corner case here is O_DIRECT on write but not on read, which is undefined.
I'd expect that to hold with protocol C even in a multi-node setup, but I'm not
sure what, e.g., shared filesystems expect in this case.

Re: the original question - it depends on how Oracle writes the data. If it
writes the data synchronously, then it will block until the data is written
everywhere, subsequent reads return the new data, and that's how ACID-compliant
software should do it.
If it doesn't use synchronous IO but a "weaker" variant like O_DIRECT, then
that could present a race condition - O_DIRECT is not guaranteed to be
unbuffered, it just works like that most of the time. And while some care is
taken to accommodate applications that treat it like synchronous IO, I'd be wary
of depending on it when more layers that like to buffer stuff are involved, or
if you simply have more than one application touching the same data.

Having said that, I expect DRBD to be doing the right thing, people use it for 
this (and I used it for this), but since enterprisey-software is almost always 
dependent on how things worked in the 80s it's something you should always test 
for yourself on a modern system :-)

> -- 
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT
> __
> please don't Cc me, but send to list -- I'm subscribed

