On 2013-04-22T08:47:26, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
> > They're not in "D" because they are not waiting on disk IO, but have
> > a lot of network IO and data structure maintenance to handle.
> Interesting: While flooding a Gb network, the achieved mirroring rate is only
> about 60MB/s. But we are not mirroring through the network, but through
> 4Gb/FC (fully redundant fabrics).

Sure. The data goes over FC, but what you're seeing on the network is the locking overhead and the effort needed to keep the bitmap in sync across the cluster.

> > It doesn't, because this simplifies the dirty logging. It always will
> > write to leg 1 first, hence all read requests can always be satisfied
> > from leg 1 without the need for a cluster-wide sync if leg 1 and leg 2 are
> > already in sync in the IO paths.
> See the performance of MD-RAID for a motivation: MD-RAID is much faster.

Yes, but MD-RAID has the advantage of being node-local rather than cluster-wide. That is a significant performance shortcut. If cLVM split the IO across both legs, performance would be even worse, since it would increase the effort needed to sync.

> > > 2) LVM should use a leg-internal bitmap to resynchronize the
> > > mirror in a non-stupid way
> >
> > It does use a bitmap for syncing, if you created the lvmirror with a
> > persistent mirrorlog.
> That design is broken: if you have two separate storage systems in two
> locations, where do you put the bitmap? In HP-UX (similar to MD-RAID)
> each PV had its own bitmap; with Linux-LVM you need a _third_ device
> to store the bitmap. That's nonsense.

For SLE HA 11 SP3, we've updated the code to support mirrored bitmaps, i.e., a mirror log that is itself mirrored.

Ulrich, your judgmental and aggressive tone is neither helpful nor constructive. We all know that cLVM2 has deficiencies and unimplemented features. How is calling it "stupid" and "nonsense" helping?

> Yes, DRBD dual-primary also failed in our scenario: Manual repair was needed.
> The primary idea of mirroring is that systems keep running if one mirror leg
> fails.

Hm? While not related to cLVM2, this is intriguing.
That's exactly how DRBD should behave, unless, of course, you configured it so that both peers of an active/active scenario are allowed to continue running. In that case, the data diverges, and automatic recovery is very difficult. The fence-peer handlers and related settings should prevent that from happening. If they don't, raise a discussion on the drbd-users mailing list.

> > > So my advice is: Don't use it (for SLES11 SP2).
> > You should not use it if performance is your primary goal for using it,
> > no.
> See above. I can only assume cLVM was tested in a "toy environment" with
> either tiny or extremely slow disks so that the disk limited the mirroring
> speed.

It was mostly tested for functionality (with limitations), not performance.

> Yes, I had complained about the massive logging of cLVM (which showed that
> it communicates quite a lot (I'd say: way too much)), and the solution
> being applied seems to be disabling logging. So the extensive communication
> still happens.

It *needs* to communicate this much for a concurrent activation.

A first (and reasonably easy) step would be to introduce an LV-wide log that would allow one node to fully own a cluster-mirrored LV transparently, since most use cases do not actually perform concurrent access (think Xen migration cases), and to upconvert to the fully-fledged cluster mode only once the LV is actually opened for writing on multiple nodes. The next step would be to figure out some way to speed that up further. Alas, that is not trivial to do.

> > The CPU overhead will have improved some, but the basic design of cLVM2
> > mirroring hasn't changed a lot.
> >
> > This is the same upstream and in all distributions; it is not SLES
> > specific.
>
> There were some rumours that Red Hat's LVM is ahead of SUSE's by at least one
> generation...

No, at least not for released code. The last release at ftp://sources.redhat.com/pub/lvm2/ is from October, and it does not improve on this much at all.
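To make the ownership idea a bit more concrete, here is a toy Python model. It is only a sketch under loudly stated assumptions: the class and method names are hypothetical, the dirty-region log is a plain set, and every log update in cluster mode is assumed to cost one message per other writer. It illustrates why a single owner avoids the cluster-wide log traffic; it is not how cLVM2 or cmirror is actually implemented.

```python
"""Toy model of the proposed LV-wide ownership for a cluster-mirrored LV.

Hypothetical sketch: MirroredLV, its methods, and the message-cost model
are illustrative assumptions, not cLVM2/cmirror code.
"""
from dataclasses import dataclass, field


@dataclass
class MirroredLV:
    regions: int
    writers: set = field(default_factory=set)  # nodes with the LV open for writing
    dirty: set = field(default_factory=set)    # regions currently in the dirty-region log
    cluster_messages: int = 0                  # log updates sent to other nodes

    @property
    def mode(self) -> str:
        # A single writer owns the LV outright; a second writer forces
        # the upconvert to the fully-fledged cluster mode.
        return "owned" if len(self.writers) <= 1 else "cluster"

    def open_for_write(self, node: str) -> None:
        self.writers.add(node)

    def _log_update(self) -> None:
        # Owned mode: the log is updated locally, no cluster traffic.
        # Cluster mode (assumed cost): one message to every other writer.
        if self.mode == "cluster":
            self.cluster_messages += len(self.writers) - 1

    def write(self, node: str, region: int) -> None:
        if node not in self.writers:
            raise PermissionError(f"{node} has not opened the LV for writing")
        if not 0 <= region < self.regions:
            raise IndexError(region)
        self.dirty.add(region)      # mark the region dirty before touching the legs
        self._log_update()
        # ... write leg 1, then leg 2 (data path not modeled) ...
        self.dirty.discard(region)  # both legs are assumed in sync again
        self._log_update()


if __name__ == "__main__":
    lv = MirroredLV(regions=1024)
    lv.open_for_write("node1")
    for r in range(100):
        lv.write("node1", r)
    print(lv.mode, lv.cluster_messages)  # owned 0

    lv.open_for_write("node2")           # a second node opens the LV for writing
    for r in range(100):
        lv.write("node1", r)
    print(lv.mode, lv.cluster_messages)  # cluster 200
```

With one writer the model sends no log traffic at all; as soon as a second writer appears, every write costs two log updates per peer, which is the overhead the upconvert is meant to defer until it is really needed. Downconverting back to owned mode is deliberately not modeled.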
Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems