I can see I'm going to have to drop the microphone mathematics.
matt massie wrote:
so i'm pretty certain g3 will be a pure xml beast. no more xdr messages
on the wire. here's my thinking on this... in no particular order..
I'm going to shock you by saying I don't like this. I know, you're asking
yourself why someone who's watching a dual-processor E420R take over 10
seconds to parse a 3.6MB gmetad output is against the idea of using more
XML elsewhere in the program design.
It's very portable, I'm not arguing that point. On the monitoring cores I
am worried about speed and CPU cycles - I want the monitoring core to be
very high in one respect, very low in the other.
[insert joke here.]
our old messages were not grouped together. and while they were very
small messages.. each message has a 52 byte header and the minimum
ethernet packet size is 64 octets. which means that we put a full
64-octet frame on the wire for each 8-12 byte message (and the header
is 6x the size of the data!).
Why not walk the metric tree and send a branch at a time as an XDR? Or
send the information about the metric tree layout in separate XDR packets
on an on-demand or periodic basis?
another problem with having each individual metric multicast its own data
is that it disconnects related data.. e.g. CPU (user,sys,nice,idle).
since these 4 related metrics are sent at different times they might not
always represent the same time slice (and therefore might not add up to
exactly 100%.. it's not always good to give 110%).
Sometimes they don't add up to 100% anyway. I think this happens mainly on
Solaris.
"That's Carl's fault. He's new."
"Sorry. My bad."
the solution is to group the metrics together somehow so they're sent at
the same time. we could do that using xdr or xml... but which is more
efficient... in terms of network and CPU?
Different users will answer this question differently. Ganglia's not being
used in just one situation. People managing a few large clusters will say
that CPU usage is more important than network usage (especially if the jobs
being run are CPU-intensive except at either end where there's a relatively
short burst of network traffic).
People linking smaller clusters over a wider area will answer the opposite
- it's worth chewing up a few more CPU cycles if it means using a smaller
percentage of a slow link.
I got an idea.
of course without the newlines and formatting, the length of this example
is 135 bytes... which contains 4 metrics expressed explicitly. in the
past each gmond had a metric lookup table compiled in, which reduced the
message size. the explicit message format will mean that all wire data
sources (gmond, gmetric, etc.) will use the same format. it also
means we have no more metric collisions since everything is explicit.
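The actual 135-byte XML example didn't survive in this message, so here is a purely hypothetical reconstruction of what a compact, explicit chunk carrying four related CPU metrics might look like — the HOST/MU tag and attribute names and the values are all invented for illustration:

```python
# Hypothetical sketch only: one explicit, hierarchical XML message
# grouping the four related CPU metrics. Tag names, attribute names,
# and values are invented; the real g3 format may differ entirely.

xml = (
    '<HOST N="mm56">'
    '<MU N="cpu_user" V="23"/>'
    '<MU N="cpu_sys" V="4"/>'
    '<MU N="cpu_nice" V="0"/>'
    '<MU N="cpu_idle" V="73"/>'
    '</HOST>'
)

print(len(xml), "bytes without newlines or formatting")
```

Even this naive sketch lands well under the 240 bytes the four separate flat messages would cost, and the grouping keeps the four values in the same time slice.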
Still think we could try sending metrics out in an XDR table with a
hashed-up value for "metric name" which corresponds to an entry in a
previously-transmitted metric attribute lookup table... keeps the
transmitted data simple, after all.
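A minimal sketch of that lookup-table idea, with invented names throughout — CRC32 stands in for whatever hash would actually be used, and the attribute table shape is made up:

```python
# Sketch of the hashed metric-name idea: the wire carries a small
# integer hash plus a value; the receiver resolves the hash against a
# metric attribute table that was transmitted once, earlier.
# All names here are hypothetical.

import zlib

# Attribute table, sent once (or on demand) per data source.
ATTRS = {
    "cpu_user": {"units": "%"},
    "cpu_idle": {"units": "%"},
}

# Receiver builds hash -> name from the attribute table.
LOOKUP = {zlib.crc32(name.encode()): name for name in ATTRS}

def decode(metric_hash: int, value: float):
    """Resolve a compact wire record against the lookup table."""
    name = LOOKUP[metric_hash]
    return name, value, ATTRS[name]["units"]

wire_record = (zlib.crc32(b"cpu_user"), 23.0)   # all that travels
print(decode(*wire_record))
```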
so.. with the current message method.. this message takes at
least 60 + 60 + 60 + 60 = 240 bytes... and it's flat.
Apples, meet Oranges. Oranges, meet Apples. :) I'm sure a
carefully-thought-out XDR scheme wouldn't provide numbers like that...
this new explicit xml format will take 52 + 135 = 187 bytes. more info
sent using less bandwidth... hierarchical too.
How much longer does it take to parse the 187 bytes of XML versus the 240
bytes of XDR? Is there even a difference?
i'm sure we could think of a way to build an explicit hierarchical xdr
format which could rival the efficiency of this xml format.. but it would
not be nearly as accessible to developers. imagine how easy it would be
to plug an app directly into the xml wire ... almost fun. woohoo!
Isn't the plug-in API going to handle that? If they want to put an app
that communicates on the wire, theoretically they would link in
libganglia... (libg3?).
in the past i thought an xdr format would be more efficient on the CPU
side of things.. because i could send the metric branch name "/g/cpu" or
whatever as an xdr_array so it doesn't need to be taken apart/parsed on
the receiving end... just read the data from each array element. there
are tools which use the xdr description file (which we would provide) but
they are MUCH less available and easy to use than xml parsers.
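The pre-split branch idea might look something like this sketch, which packs the path components as an XDR array by hand. The encoding rules (big-endian u32 length, data padded to a 4-byte boundary) follow the XDR standard, RFC 4506; the packet layout itself is invented:

```python
# Send "/g/cpu" pre-split into components so the receiver reads array
# elements instead of parsing a string. XDR string encoding per
# RFC 4506: big-endian u32 length + data padded to 4 bytes.

import struct

def xdr_string(s: bytes) -> bytes:
    """Encode one XDR string (length-prefixed, 4-byte padded)."""
    pad = (4 - len(s) % 4) % 4
    return struct.pack(">I", len(s)) + s + b"\x00" * pad

def xdr_path(components):
    """Encode a branch path as an XDR array: count, then each string."""
    out = struct.pack(">I", len(components))
    for c in components:
        out += xdr_string(c.encode())
    return out

packet = xdr_path(["g", "cpu"])   # the "/g/cpu" branch, pre-split
print(len(packet), "bytes")
```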
Has there been a tremendous outcry from tool developers that the Ganglia
information isn't as accessible as they'd like it to be? If they want XML
they can query a monitoring core, can't they?
also.. parsing the branch name can be made very efficient using regex
libraries... which use precompiled patterns for matching..
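A sketch of the precompiled-pattern approach — the branch layout (`/<root>/<subsystem>`) is hypothetical:

```python
# Compile the branch-name pattern once at startup, reuse it for every
# message. Layout "/<root>/<subsystem>" is an invented example.

import re

BRANCH = re.compile(r"^/(?P<root>[^/]+)/(?P<subsystem>[^/]+)$")

m = BRANCH.match("/g/cpu")
print(m.group("root"), m.group("subsystem"))
```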
Could it win a bake-off against a similarly tuned XDR method? In terms of
speed, CPU and scalability?
this leads into thoughts from the local wire format to the wide area
format.
i love trees. i have hugged many trees in my life and have been very
lucky that none of them have hugged me back. remember the childhood
trauma of watching dorothy get pummeled with apples from the living
forest? all because she picked fruit from a tree (hmmm).. now back to the
yellow brick road (btw... think out there now.. big.. the world...)
Really? Makes me think of something from some big fantasy movie that came
out last year that had that dude from the Matrix in it. Dungeons and
Dragons or something.
right now gmetad uses a very simple aggregation model. that will not
scale (as we have painfully experienced). imagine a single DNS server
with every host/ip pair in the world being served from it. ha!
what we need is
1. a URL like way of expressing the data we want
2. replace the aggregation model with a delegation model.
3. [you get to this below, but put it on the list, dammit!] A QUERY MODEL!
Not many database apps that talk to a SQL-using back-end are written
without usage of the "WHERE" or "LIMIT" clauses. :)
If I didn't have other coding commitments, I'd probably try and hack this
into gmetad *now* ...
first.. the URL business... here is an example of a g3 URL...
/World/USA/California/Berkeley/UCB/Millennium Cluster//mm56/cpu/number
i'm thinking grand here.. but i really believe that in the end we will
create a true internet overlay which will empower the internet in ways
that haven't been possible before.
You're perilously close to using a Wired buzzword like "digital divide"
right here. You may need to be deprogrammed.
so.. this URL only uses a single delimiter "/". feel free to debate what
you think this delimiter should be.. ':' might be a nice way to do it...
World:US:California:Berkeley:UCB:Millennium::mm56:cpu:number
.. i actually like the look of this a little more.. it's easier to read.
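Splitting that colon form shows the structure directly, including the empty component the '::' produces between the cluster name and the host:

```python
# The colon-delimited g3 path splits cleanly; the '::' yields an
# empty component marking the cluster/host boundary.

path = "World:US:California:Berkeley:UCB:Millennium::mm56:cpu:number"
parts = path.split(":")
print(parts)
```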
My eyes skimmed over the double-delimiter the first time I read it. :)
this is not complete XML at all.. don't want it to be too busy.. i know
steve wagner could handle seeing all the tags since he likes to read raw
xml streams but i'm not sure about the rest of you. :)
I pipe 'em to grep, actually.
'telnet gmetad-host 8651 | grep "HOST " | wc -l'
The reason I always view the raw XML (or pipe it through grep) is that I
don't want any parsing of the data to be done that I don't know about.
btw, mu means a "metric unit". we can change that name but i like how it
matches with organizational unit AND i love the concept of mu from
buddhism matched with the MIU puzzle introduced to me by Hofstadter,
Godel, Escher and Bach. i ramble. (if you want to learn more google "MU
Puzzle").
And it's also a Revenge of the Nerds reference.
so.. let's get back to the delegation model side of things.
For me, the purpose of the metadaemon is to handle requests from monitoring
apps. The metadaemon should be the only thing polling any of the
monitoring cores (which are, after all, on systems that should be working
on producing widgets). It's not entirely clear from this section whether
you're referring just to the "nearest" metadaemon (yay) or actually
referring to an individual monitoring core (boo). So the first thing I
thought of when I read this section was, "Great, can I turn it off?"
Also, does this address the possibility of multiple metadaemons for the
same data source? People might wanna cluster their metadaemons you know...
i wish XPath/XQuery was mature and there was nice multi-platform support.
i don't see that right now and i'm not sure how long it will be until it
happens. most of the good XQuery stuff out there is written in Java. i
don't know if we want to start developing Java code. maybe ...
Hmmmm... that might be fun on my Sun metadaemon box. :)
[on second thought, I'm not sure a :) is appropriate at this point...]
i'm thinking POSIX regular expressions might be the way to go...
I'm still not entirely convinced that working with strings is the key to
high speed, low CPU usage and high scalability...
i should have the g3 house ready to move into very soon... with a nice
tree in the front yard.
Just remember to add windows, floors, doors and wallpaper in every room.
Otherwise your Sims won't like it and they'll get very depressed and start
slapping each other.
It'll be just like this list!
Oh. Right. My idea.
A metric pipelining plug-in with multicast and unicast support. The
plug-in would have to be configured with a list of nodes that it's
responsible for (or an entire cluster - maybe we could just use URLs?) and
a reporting interval for each. Just like the metadaemon, in reverse.
Every interval seconds, it transmits the appropriate chunk of metrics in
XML to its configured destination. On receiving the metric chunk, it's
treated just as if it had originated locally, and gets re-transmitted over
the locally-configured multicast channel (obviously this only works if we
*don't* break the pipelined data into individual metric chunks).
This would actually increase Ganglia scalability (at the price of some
latency over pipelined links) because it allows a finer degree of control
over multicast traffic, and each individual node in a very large cluster
doesn't have to deal with 50,000 small packets per second being firehosed
at it (instead it's dealing with a few thousand larger packets closer to
the MTU value).
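The packing idea in that last paragraph can be sketched like this — the MTU and record sizes are illustrative, not anything Ganglia actually does:

```python
# Sketch of the pipelining plug-in's batching: accumulate small
# per-metric records and flush them as one chunk when adding another
# record would exceed the MTU, instead of one packet per metric.

MTU = 1500  # illustrative Ethernet MTU

def batch(records, mtu=MTU):
    """Greedily pack small metric records into MTU-sized chunks."""
    chunks, current, size = [], [], 0
    for rec in records:
        if current and size + len(rec) > mtu:
            chunks.append(b"".join(current))
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        chunks.append(b"".join(current))
    return chunks

records = [b"x" * 60] * 100        # 100 small 60-byte metric records
chunks = batch(records)
print(len(chunks), "packets instead of", len(records))
```

Receivers fan the chunk back out on the local multicast channel exactly as described above, trading a little latency on the pipelined link for far fewer packets per second.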
I can see that being a lot of fun for slow links... heck, after releasing
the source it should only be a matter of time before people turn that into
a notifier plug-in. :)
OK, that's all for now, I think...