On Sat, Nov 27, 2010 at 10:16:54AM -0700, Alan Robertson wrote:
> 
> I was talking about a half-dozen or so nodes.  Not to /implement/ a
> cloud, but to /run in/ a cloud someone else implemented.

I know. Of course.

Maybe we should have stuck closer to the original subject ;)
Yes, as my first (one-line) answer said: there, a feature like the
one you proposed (reconfiguring unicast peers at runtime) could come
in handy.

But again:
We will have to address heartbeat's message size limits if it is
going to be used with pacemaker 1.1.4/1.2.

Even on a 3-node cluster, pacemaker 1.1.4 with a "standard"
CTS-generated CIB breaks on current heartbeat with -EMSGSIZE.

So that's relevant to your half-dozen or so nodes case.

That is because pacemaker now, mainly for performance reasons, hands
down FT_STRING fields of up to 128k (I think) before it starts
bz2'ing the payload itself (and handing it down as FT_BINARY).
Unless you configure heartbeat to use "traditional compression"
(which compresses every packet above the threshold, control fields
included, so every node has to uncompress a message before it even
knows it was a node message addressed to someone else and can be
dropped), that will obviously end up with broken communication.
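
To make that concrete, here is a rough, self-contained sketch of the
pattern (the threshold value, the function names and main() are all
made up for illustration; only the libbz2 call and its documented
worst-case buffer size are real, so don't read this as pacemaker's
actual code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bzlib.h>

/* Hypothetical threshold; pacemaker's actual define may differ. */
#define COMPRESSION_THRESHOLD (128u * 1024u)

/* If 'len' exceeds the threshold, return a malloc'ed bz2 blob (to be
 * handed down as an opaque binary field, think FT_BINARY) and set
 * *out_len; otherwise return NULL, meaning: send the string field
 * as-is (think FT_STRING). */
static char *maybe_compress(const char *data, unsigned int len,
                            unsigned int *out_len)
{
    if (len <= COMPRESSION_THRESHOLD)
        return NULL;

    /* bzip2's documented worst case: input + 1% + 600 bytes. */
    *out_len = len + len / 100 + 600;
    char *buf = malloc(*out_len);
    if (buf && BZ2_bzBuffToBuffCompress(buf, out_len, (char *)data,
                                        len, 9, 0, 0) != BZ_OK) {
        free(buf);
        buf = NULL;
    }
    return buf;
}

int main(void)
{
    unsigned int big = 256u * 1024u, clen = 0;
    char *cib = malloc(big);        /* stand-in for a large CIB dump */
    memset(cib, 'x', big);

    char *blob = maybe_compress(cib, big, &clen);
    if (blob)
        printf("compressed %u -> %u bytes\n", big, clen);
    free(blob);
    free(cib);
    return 0;
}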

Sure, the threshold at which pacemaker starts bz2'ing is only a
define. And pacemaker could also pass down FT_UNCOMPRESS, or whatever
it was (it sometimes does).

But I think it should be addressed in the comm layer anyway.

The heartbeat ipc layer should know that, to pass data between
processes on the same node, it does not need to compress or
decompress anything. (There is also too much memcpy'ing going on in
there, but let's not go there yet.)
Once a message actually hits a media type (leaves the node), the
layer should, depending on the media type, be able to compress
messages transparently, without the ipc client knowing too much about
the internals. And it should not mix ipc-layer-internal (control)
fields and client payload fields in the same client-visible and
client-manipulable namespace.
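
Roughly, what I have in mind for the send path looks like this. It is
an illustrative sketch with invented types and stubbed-out hooks, not
the actual heartbeat API:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct media {
    size_t mtu;            /* wire limit of this media type */
    bool   can_compress;   /* media compresses transparently */
};

/* Placeholder transport hooks, just enough to make this compile. */
static int deliver_local(const void *p, size_t n)
{ (void)p; printf("local ipc, %zu bytes, no compression\n", n); return 0; }

static int wire_send(const void *p, size_t n)
{ (void)p; printf("on the wire: %zu bytes\n", n); return 0; }

static const void *compress_payload(const void *p, size_t n, size_t *out_n)
{ *out_n = n / 2; return p; /* fake 2:1 ratio for the demo */ }

/* The point: "same node" short-circuits all compression, and whether
 * a message is compressed on the wire is decided per media type,
 * without the ipc client knowing anything about it. */
static int comm_send(const struct media *md, const void *payload,
                     size_t len, bool same_node)
{
    if (same_node)
        return deliver_local(payload, len);

    if (md->can_compress && len > md->mtu) {
        size_t clen;
        const void *cbuf = compress_payload(payload, len, &clen);
        return wire_send(cbuf, clen);
    }
    return wire_send(payload, len);
}

int main(void)
{
    struct media ucast = { .mtu = 1400, .can_compress = true };
    char buf[4000] = { 0 };

    comm_send(&ucast, buf, sizeof buf, true);   /* local: stays plain */
    comm_send(&ucast, buf, sizeof buf, false);  /* wire: compressed */
    return 0;
}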

Thus my suggestion to, for starters, encapsulate all client payload
in a sub-message, which can then be compressed, either partially or
fully, even using the existing infrastructure.
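
As a rough sketch of that framing, with all field names invented for
illustration:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum payload_enc { ENC_PLAIN = 0, ENC_BZ2 = 1 };

/* Comm-layer control fields live in a fixed header the layer owns;
 * everything the client sent is one opaque sub-message after it. */
struct wire_msg {
    uint32_t seq;             /* control: sequence number */
    uint32_t dest;            /* control: destination node id */
    uint8_t  payload_enc;     /* control: sub-message encoding */
    uint32_t payload_len;
    unsigned char payload[];  /* nested client sub-message, maybe bz2'ed */
};

/* Forwarding/drop decisions read only the control header; nodes the
 * message merely passes through never decompress the payload. */
static int msg_is_for_node(const struct wire_msg *m, uint32_t my_id)
{
    return m->dest == my_id;
}

int main(void)
{
    const char client[] = "client payload, compressible as one unit";
    struct wire_msg *m = malloc(sizeof *m + sizeof client);

    m->seq = 1;
    m->dest = 42;
    m->payload_enc = ENC_PLAIN;
    m->payload_len = sizeof client;
    memcpy(m->payload, client, sizeof client);

    if (!msg_is_for_node(m, 7))   /* node 7 drops it without decoding */
        printf("not for me: dropped after reading %zu header bytes\n",
               sizeof *m);
    free(m);
    return 0;
}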

Sure, "if it ain't broke, don't fix it".
But just because we can "make it work" does not mean it could not do
with a little fixing here and there. After all, if we have to "make"
it work, that hints at things being at least slightly broken in some
way.

Anyways.

Once the things I mentioned above cause so much pain that heartbeat
either gets fixed or replaced, I can just say "I told you so".

But until then: you could probably have implemented your original
proposal in the cumulative man-hours spent writing and reading this
thread, and I'm sure it would get used. So please, just go ahead.

Cheers,


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.