On Thursday 20 March 2008 06:58:45 Carlo Marcelo Arenas Belon wrote:
> On Mon, Mar 17, 2008 at 10:31:10PM +0100, Paul Millar wrote:
> haven't used Xen in this setup, but had a similar setup using kvm 
[..]
Yes, this is similar to my setup.  The only major difference is that I don't
attach eth0 to the bridge; I use iptables to NAT outgoing connections.

> > I've noticed two problems with this setup.  First, multicast binding to
> > particular device when sending UDP seems to be broken.
>
> binding to br0 works fine for me (can send and listen to the metrics
> through multicast packets) and of course, sniffing the interface (br0 or
> eth0) shows packets going out and in (in both at the same time, as they are
> bridged).

I suspect that this is working "by accident" :-)

If the multicast binding isn't actually taking place, your kernel will send 
packets to the first non-loopback device (this is my experience; however, it's 
a kernel-specific algorithm), i.e. eth0.  As you say, since it is bridged, 
you will see traffic on both br0 and eth0.
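
For reference, here is a minimal Linux-only sketch of what "binding" a
multicast sender to an interface involves at the socket level.  It is not
gmond's code: the function name make_mcast_sender, the TTL default and the use
of a struct ip_mreqn carrying only the interface index are assumptions for
illustration.  If IP_MULTICAST_IF is never set, the kernel chooses the
outgoing interface itself, which would match the behaviour described above.

```python
import socket
import struct


def make_mcast_sender(ifname: str, ttl: int = 1) -> socket.socket:
    """Return a UDP socket whose multicast traffic leaves via `ifname`.

    Linux-specific sketch (hypothetical helper, not gmond's API):
    IP_MULTICAST_IF accepts a struct ip_mreqn and we fill in only the
    interface index.
    """
    # Raises OSError if no interface with that name exists.
    ifindex = socket.if_nametoindex(ifname)
    # struct ip_mreqn { struct in_addr imr_multiaddr;   (ignored here)
    #                   struct in_addr imr_address;     (0.0.0.0: kernel picks)
    #                   int            imr_ifindex; }
    mreqn = struct.pack("=4s4si",
                        socket.inet_aton("0.0.0.0"),
                        socket.inet_aton("0.0.0.0"),
                        ifindex)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Select the outgoing interface for multicast datagrams.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, mreqn)
        # Limit how many router hops the multicast datagrams may cross.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    except OSError:
        sock.close()
        raise
    return sock
```

Usage would be along the lines of
make_mcast_sender("br-xen").sendto(payload, ("239.2.11.71", 8649)), where the
group and port are assumed to be gmond's defaults.  Sniffing br-xen with eth0
detached from the bridge would then show whether the option takes effect.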

You could test this by removing eth0 from the bridge and seeing whether you 
continue to see the multicast traffic on br0.

When I tell gmond to bind to the bridge interface ("br-xen"), I do see 
traffic, but only on eth0.  The debug output confirmed that gmond is picking 
up the option, but it somehow isn't acting on it.

> I suspect the br-xen device doesn't support multicast IOCTLs making your
> gmond deaf (have seen something like that with an intel wireless adapter
> once).

Well, that could be, but it worked with a 3.0.x-series gmond and I've not 
changed the Xen installation.

My suspicion here is that, with the move to the 3.1 code base, we've
introduced a regression against bug #140:

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=140


> > The second problem is with a Xen guest host transmitting over the
> > internal bridge. [...]  no actual metrics are recorded. 
>
> a 3.1 gmond sending its metrics to a 3.0 gmond showed something similar in
> my tests, and that is expected (compatibility is only granted at the XML
> layer between gmetad and gmond)

This may be true, but the "other" gmond is also from trunk.  If I telnet to 
its TCP port, I see it identify itself as:

        <GANGLIA_XML VERSION="3.1.0." SOURCE="gmond">
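
The same check can be scripted.  A minimal sketch, assuming gmond's default
TCP port 8649 and that gmond writes its XML dump and then closes the
connection (which is what the telnet session shows).  The helper names are
hypothetical, and the regular expression assumes VERSION appears before
SOURCE, as in the header above.

```python
import re
import socket
from typing import Optional, Tuple

# Matches the opening <GANGLIA_XML ...> element, capturing VERSION and SOURCE.
_HEADER = re.compile(r'<GANGLIA_XML\s+VERSION="([^"]*)"\s+SOURCE="([^"]*)"')


def parse_ganglia_header(xml_text: str) -> Optional[Tuple[str, str]]:
    """Return (version, source) from a gmond XML dump, or None if absent."""
    m = _HEADER.search(xml_text)
    return (m.group(1), m.group(2)) if m else None


def fetch_gmond_xml(host: str = "localhost", port: int = 8649,
                    timeout: float = 5.0) -> str:
    """Read everything gmond sends to a TCP client until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # peer closed: dump complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, parse_ganglia_header(fetch_gmond_xml("guest-host")) would report
the version string of the gmond running on a (hypothetical) host named
guest-host.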


> did you see this problem with both 3.1 gmond?,

Yes, I'm only using code from the current trunk.  I wanted to avoid 
complications from running daemons from the 3.0 and 3.1 codebases concurrently.

> is the same observed if the metrics are collected using unicast instead?

Hmm, good question.

I've just checked and it works.  Switching back to multicast, it fails.  In 
both cases I can see network traffic (on the bridge) consistent with metric
updates being transmitted.

The failure mode is specific: only the metrics are lost whilst the host entry 
is maintained (with TN value resetting, as expected).

I suspect that the code that updates gmond's cache of metrics is somehow
confused, resulting in those metrics not being recorded; yet the fact that
something was received from the host *is* recorded, hence the host entry is
maintained.
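
To make the suspected failure mode concrete, here is a toy model in which the
host entry and the metric values are updated on separate paths.  Everything in
it (ToyCache, HostEntry, known_metrics) is hypothetical and not taken from
gmond's source; known_metrics merely stands in for whatever state the metric
update path might depend on.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class HostEntry:
    last_heard: float = 0.0  # drives TN (seconds since last packet)
    metrics: Dict[str, str] = field(default_factory=dict)


class ToyCache:
    """Hypothetical model of the suspected bug, not gmond's data structures."""

    def __init__(self) -> None:
        self.hosts: Dict[str, HostEntry] = {}
        # Stand-in for whatever state the metric-update path relies on.
        self.known_metrics: Set[str] = set()

    def on_packet(self, host: str, metric: str, value: str,
                  now: Optional[float] = None) -> None:
        entry = self.hosts.setdefault(host, HostEntry())
        # Path 1: any packet from the host refreshes its entry, so TN resets.
        entry.last_heard = time.time() if now is None else now
        # Path 2: the value is recorded only if this lookup succeeds; if the
        # lookup is confused, the value is silently dropped.
        if metric in self.known_metrics:
            entry.metrics[metric] = value
```

With known_metrics empty, the host keeps appearing with a fresh TN while its
metric table stays empty, which is the symptom seen with multicast.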

HTH,

Paul.

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
