On 2013-04-22T08:47:26, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:

> > They're not in "D" because they are not waiting on disk IO, but have
> > a lot of network IO and data structure maintenance to handle.
> Interesting: While flooding a Gb network, the achieved mirroring rate is only 
> about 60MB/s. But we are not mirroring through the network, but through 
> 4Gb/FC (fully redundant fabrics).

Sure. You're seeing the locking overhead, and the effort needed to sync
the bitmap.

> > It doesn't, because this simplifies the dirty logging. It always will
> > write to leg 1 first, hence all read requests can always be satisfied
> > from leg 1 without the need to cluster-wide sync if leg 1 and leg 2 are
> > already in sync in the IO paths.
> See the performance of MD-RAID for a motivation: MD-RAID is much faster.

Yes, but MD-RAID has the advantage of being node local, and not
cluster-wide. That is a significant performance shortcut.

If cLVM split the IO over both drives, the IO would be even worse, since
it would increase the effort needed to sync.
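
To illustrate the write ordering described above, here is a minimal toy
model in Python. It is not cLVM2 code: the class name, the in-memory
lists standing in for the two legs, and the leg2_ok flag are all
assumptions made up for the sketch. It only shows why reads can always
be served from leg 1 without consulting the dirty log.

```python
class ToyMirror:
    """Toy two-leg mirror: writes hit leg 1 first, reads use leg 1 only."""

    def __init__(self, regions):
        self.leg1 = [0] * regions
        self.leg2 = [0] * regions
        self.dirty = set()  # regions where leg 2 may lag behind leg 1

    def write(self, region, value, leg2_ok=True):
        self.dirty.add(region)          # log the region before touching data
        self.leg1[region] = value       # leg 1 is always written first
        if leg2_ok:                     # hypothetical flag: leg 2 reachable?
            self.leg2[region] = value
            self.dirty.discard(region)  # both legs agree again

    def read(self, region):
        # Leg 1 is always current, so no dirty-log lookup is needed here.
        return self.leg1[region]


mirror = ToyMirror(regions=4)
mirror.write(1, 7)
mirror.write(2, 9, leg2_ok=False)  # simulate leg 2 missing a write
print(mirror.read(2), sorted(mirror.dirty))
```

In a real cluster the dirty set is the part that has to be kept
consistent across nodes, which is where the overhead comes from.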

> > > 2) LVM should use a leg-internal bitmap to resynchronize the
> > > mirror in a non-stupid way
> > 
> > It does use a bitmap for syncing, if you created the lvmirror with a
> > persistent mirrorlog.
> That design is broken: if you have two separate storage systems in two
> locations, where do you put the bitmap? In HP-UX (similar to MD-RAID)
> each PV had its own bitmap; with Linux-LVM you need a _third_ device
> to store the bitmap. That's nonsense.

For SLE HA 11 SP3, we've updated the code to support mirrored bitmaps.
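
For readers unfamiliar with what the bitmap buys you during a resync, a
rough sketch in Python follows. It is a generic dirty-region resync, not
the LVM implementation; the function name, the bytearray "legs" and the
region size are assumptions chosen for illustration.

```python
def resync(source, target, dirty, region_size):
    """Copy only the regions flagged dirty; return the bytes copied."""
    copied = 0
    for region, is_dirty in enumerate(dirty):
        if not is_dirty:
            continue  # clean region: legs already agree, skip it
        start = region * region_size
        end = min(start + region_size, len(source))
        target[start:end] = source[start:end]
        dirty[region] = False  # region is back in sync
        copied += end - start
    return copied


leg1 = bytearray(b"abcdefgh")
leg2 = bytearray(b"abXXefgh")          # region 1 is stale on leg 2
bitmap = [False, True, False, False]   # one flag per 2-byte region
print(resync(leg1, leg2, bitmap, region_size=2), leg2 == leg1)
```

Without a persistent bitmap, every region has to be treated as dirty
after a failure, so the whole leg is copied.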

Ulrich, your judgmental and aggressive tone is neither helpful nor
constructive. We all know that cLVM2 has deficiencies and unimplemented
features. Calling it "stupid" and "nonsense" is helping how?


> Yes, DRBD dual-primary also failed in our scenario: Manual repair was needed.
> The primary idea of mirroring is that systems keep running if one mirror leg 
> fails.

Hm? While not related to cLVM2, this is intriguing. That's rather
exactly how drbd should behave - unless, of course, you configured it so
that both peers of an active/active scenario are allowed to continue
running. In that case, data diverges, and automatic recovery is very
difficult.

The fence-peer etc stuff should prevent that from happening. If it
doesn't, raise a discussion on the drbd-users mailing list.

> > > So my advice is: Don't use it (for SLES11 SP2).
> > You should not use it if performance is your primary goal for using it,
> > no.
> See above. I can only assume cLVM was tested in a "toy environment" with 
> either tiny or extremely slow disks so that the disk limited the mirroring 
> speed.

It was mostly tested for functionality (with limitations), not
performance.

> Yes, I had complained about the massive logging of cLVM (which showed that 
> it's communicating quite a lot (I'd say: way too much)), and the solution 
> being applied seems to be disabling logging. So the extensive communication 
> still happens.

It *needs* to communicate this much for a concurrent activation.

A first (and reasonably easy) step would be to introduce an LV-wide log
that would allow one node to fully own a cluster-mirrored LV
transparently, since most use cases do not actually perform concurrent
access. (Think Xen migration cases.)

And only upconvert to the fully-fledged cluster mode once it is actually
opened for writing on multiple nodes. And then figure out some way to
speed that up further. Alas, that is not trivial to do.
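
The idea can be sketched as a small state model in Python. This is
purely hypothetical: nothing like it exists in cLVM2 today, and the
class, the mode names and the rule that the LV stays in cluster mode
until every writer has closed it are assumptions made for the sketch.

```python
class ToyLV:
    """Toy model: one node owns the mirror log until a second writer appears."""

    def __init__(self):
        self.writers = set()
        self.mode = "inactive"

    def open_for_write(self, node):
        self.writers.add(node)
        if len(self.writers) > 1:
            self.mode = "clustered"   # upconvert: cluster-wide log needed
        elif self.mode != "clustered":
            self.mode = "exclusive"   # single owner, node-local log suffices
        return self.mode

    def close(self, node):
        self.writers.discard(node)
        if not self.writers:
            self.mode = "inactive"    # assumption: no downconvert before this


lv = ToyLV()
print(lv.open_for_write("node1"))  # single writer
print(lv.open_for_write("node2"))  # second writer forces the upconvert
```

The hard part, which the sketch ignores entirely, is doing the
upconvert safely while IO is in flight.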

> > The CPU overhead will have improved some, but the basic design of cLVM2
> > mirroring hasn't changed a lot.
> > 
> > This is the same upstream and in all distributions, it is not SLES
> > specific.
> 
> There were some rumours that Redhat's LVM is ahead of SUSE's by at least one 
> generation...

At least not for released code, no, this is not the case.
ftp://sources.redhat.com/pub/lvm2/ - last release is from October. And
it doesn't improve this much at all.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
