I can see I'm going to have to drop the microphone mathematics.

matt massie wrote:
so i'm pretty certain g3 will be a pure xml beast. no more xdr messages on the wire. here's my thinking on this... in no particular order..

I'm going to shock you by saying I don't like this. I know, you're asking yourself why someone who's watching a dual-processor E420R take over 10 seconds to parse a 3.6MB gmetad output is against the idea of using more XML elsewhere in the program design.

It's very portable; I'm not arguing that point. On the monitoring cores I'm worried about speed and CPU cycles - I want the monitoring core to be very high in one respect, very low in the other.

[insert joke here.]

our old messages were not grouped together.  and while they were very
small messages.. each message has a 52 byte header and the minimum
ethernet packet size is 64 octets.  which means that we are sending 64
bytes on the wire for each 8-12 byte message (and the header is 6x the
size of the data!).
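To put rough numbers on that - the 52-byte header and 64-octet Ethernet minimum are from the paragraph above; the helper function is just mine for illustration:

```python
# Illustrative arithmetic for the per-metric overhead described above.
# The 52-byte header and 64-octet minimum Ethernet frame are the numbers
# quoted in the message; payload sizes are the 8-12 byte range mentioned.

ETH_MIN_FRAME = 64   # minimum Ethernet frame size, in octets
HEADER = 52          # per-message header described above

def wire_bytes(payload):
    """Bytes actually sent for one metric message, after Ethernet padding."""
    return max(HEADER + payload, ETH_MIN_FRAME)

for payload in (8, 12):
    sent = wire_bytes(payload)
    print(payload, sent, sent / payload)
```

Either way the frame pads out to 64 octets, so an 8-byte metric really does cost 8x its size on the wire.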

Why not walk the metric tree and send a branch at a time as an XDR message? Or send the information about the metric tree layout in separate XDR packets on an on-demand or periodic basis?

another problem with having each individual metric multicast its own data is that it disconnects related data.. e.g. CPU (user,sys,nice,idle). since these 4 related metrics are sent at different times they might not always represent the same time slice (and therefore might not add up to exactly 100%.. it's not always good to give 110%).

Sometimes they don't add up to 100% anyway. I think this happens mainly on Solaris.

"That's Carl's fault.  He's new."
"Sorry.  My bad."

the solution is to group the metrics together somehow to be sent at the same time. we could do that using xdr or xml... but which is more efficient... in terms of network and CPU?

Different users will answer this question differently. Ganglia's not being used in just one situation. People managing a few large clusters will say that CPU usage is more important than network usage (especially if the jobs being run are CPU-intensive except at either end where there's a relatively short burst of network traffic).

People linking smaller clusters over a wider area will answer the opposite - it's worth chewing up a few more CPU cycles if it means using a smaller percentage of a slow link.

I got an idea.

of course without the newlines and formatting. the length of this example is 135 bytes... which contains 4 metrics expressed explicitly. in the past each gmond had a metric lookup table compiled in which reduced the message size. the explicit message format will mean that all wire data sources (gmond, gmetric, etc).. will all use the same format. it also means we have no more metric collisions since everything is explicit.
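[The 135-byte example referenced here didn't survive the quoting. Purely as a hypothetical sketch - element and attribute names are guessed from the "mu" (metric unit) naming that comes up later in the message - a grouped CPU message might look something like:]

```xml
<!-- Hypothetical reconstruction; the original example is missing from
     the quote. Tag and attribute names are guesses, not the real format. -->
<MU NAME="cpu_user" VAL="12.1"/>
<MU NAME="cpu_sys"  VAL="3.4"/>
<MU NAME="cpu_nice" VAL="0.0"/>
<MU NAME="cpu_idle" VAL="84.5"/>
```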

Still think we could try sending metrics out in an XDR table with a hashed-up value for "metric name" which corresponds to an entry in a previously-transmitted metric attribute lookup table... keeps the transmitted data simple, after all.
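To make that concrete - here's a sketch of the hashed-name idea, with everything (field layout, hash choice, function names) invented for illustration, not taken from any actual Ganglia wire format:

```python
# Sketch of the "hashed metric name" XDR-style idea: send a fixed binary
# record whose metric id refers to an attribute lookup table transmitted
# earlier. CRC32 stands in for whatever hash we'd actually pick.
import struct
import zlib

def metric_id(name):
    # stable 32-bit id for a metric name (CRC32 just for illustration)
    return zlib.crc32(name.encode()) & 0xFFFFFFFF

def pack_metric(name, value):
    # 4-byte id + 8-byte double: 12 bytes per metric, no string parsing
    return struct.pack("!Id", metric_id(name), value)

record = b"".join(pack_metric(n, v) for n, v in
                  [("cpu_user", 12.1), ("cpu_sys", 3.4),
                   ("cpu_nice", 0.0), ("cpu_idle", 84.5)])
print(len(record))  # 4 grouped metrics in 48 bytes
```

Same four CPU metrics as the XML version, grouped in one packet, in about a third of the bytes - at the cost of shipping the attribute table separately.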

so.. with the current message method.. this message takes at least 60 + 60 + 60 + 60 = 240 bytes... and it's flat.

Apples, meet Oranges. Oranges, meet Apples. :) I'm sure a carefully-thought-out XDR scheme wouldn't provide numbers like that...

this new explicit xml format will take 52 + 135 = 187 bytes. more info sent using less bandwidth... hierarchical too.

How much longer does it take to parse the 187 bytes of XML versus the 240 bytes of XDR? Is there even a difference?

i'm sure we could think of a way to build an explicit hierarchical xdr
format which could rival the efficiency of this xml format.. but it would
not be nearly as accessible to developers.  imagine how easy it would be
to plug an app directly into the xml wire ... almost fun.  woohoo!

Isn't the plug-in API going to handle that? If they want to put an app that communicates on the wire, theoretically they would link in libganglia... (libg3?).

in the past i thought an xdr format would be more efficient on the CPU
side of things.. because i could send the metric branch name "/g/cpu"  or
whatever as an xdr_array so it doesn't need to be taken apart/parsed on
the receiving end... just read the data from each array element.  there
are tools which use the xdr description file (which we would provide) but
they are MUCH less available and easy to use than xml parsers.

Has there been a tremendous outcry from tool developers that the Ganglia information isn't as accessible as they'd like it to be? If they want XML they can query a monitoring core, can't they?

also.. parsing the branch name can be made very efficient using regex libraries... which use precompiled patterns for matching..
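For what it's worth, here's roughly what matching a branch name against a precompiled pattern looks like - I'm using Python's re module as a stand-in for the POSIX regcomp/regexec calls, and the pattern itself is just an example built from the "/g/cpu" branch name mentioned earlier:

```python
# Matching metric branch names with a precompiled pattern. The pattern is
# compiled once up front, then reused for every incoming message.
import re

branch = re.compile(r"^/g/cpu(/|$)")

print(bool(branch.match("/g/cpu/number")))    # True
print(bool(branch.match("/g/memory/free")))   # False
print(bool(branch.match("/g/cpufreq")))       # False - no partial matches
```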

Could it win a bake-off against a similarly tuned XDR method? In terms of speed, CPU and scalability?

this leads into thoughts from the local wire format to the wide area format.

i love trees.  i have hugged many trees in my life and have been very
lucky that none of them have hugged me back.  remember the childhood
trauma of watching dorothy get pummeled with apples from the living
forest?  all because she picked fruit from a tree (hmmm).. now back to the
yellow brick road (btw... think out there now.. big.. the world...)

Really? Makes me think of something from some big fantasy movie that came out last year that had that dude from the Matrix in it. Dungeons and Dragons or something.

right now gmetad uses a very simple aggregation model. that will not scale (as we have painfully experienced). imagine a single DNS server with every host/ip pair in the world being served from it. ha!

what we need is
1. a URL like way of expressing the data we want
2. replace the aggregation model with a delegation model.
3. [you get to this below, but put it on the list, dammit!]  A QUERY MODEL!

Not many database apps that talk to a SQL-using back-end are written without usage of the "WHERE" or "LIMIT" clauses. :)

If I didn't have other coding commitments, I'd probably try and hack this into gmetad *now* ...

first.. the URL business... here is an example of a g3 URL...

/World/USA/California/Berkeley/UCB/Millennium Cluster//mm56/cpu/number

i'm thinking grand here.. but i really believe that in the end we will create a true internet overlay which will empower the internet in ways that haven't been done before.

You're perilously close to using a Wired buzzword like "digital divide" right here. You may need to be deprogrammed.

so.. this URL only uses a single delimiter "/". feel free to debate what you think this delimiter should be.. ':' might be a nice way to do it...

World:US:California:Berkeley:UCB:Millennium::mm56:cpu:number

.. i actually like the look of this a little more.. it's easier to read.
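Splitting that path is cheap either way. One guess about the double delimiter - and it is only a guess, the message doesn't say - is that it marks the boundary between the grid hierarchy and the host-local metric path:

```python
# Splitting the colon-delimited g3 path from the example above. The empty
# component produced by "::" is treated here as the grid/host boundary --
# that interpretation is an assumption, not something stated in the thread.
path = "World:US:California:Berkeley:UCB:Millennium::mm56:cpu:number"
parts = path.split(":")
boundary = parts.index("")            # position of the "::" separator
grid, local = parts[:boundary], parts[boundary + 1:]
print(grid)   # ['World', 'US', 'California', 'Berkeley', 'UCB', 'Millennium']
print(local)  # ['mm56', 'cpu', 'number']
```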

My eyes skimmed over the double-delimiter the first time I read it.  :)

this is not complete XML at all.. don't want it to be too busy.. i know
steve wagner could handle seeing all the tags since he likes to read raw
xml streams but i'm not sure about the rest of you.  :)

I pipe 'em to grep, actually.

'telnet gmetad-host 8651 | grep "HOST " | wc -l'

The reason I always view the raw XML (or pipe it through grep) is that I don't want any parsing of the data to be done that I don't know about.

btw, mu means a "metric unit".  we can change that name but i like how it
matches with organizational unit AND i love the concept of mu from
buddhism matched with the MIU puzzle introduced to me by Hofstadter,
Godel, Escher and Bach.  i ramble. (if you want to learn more google "MU
Puzzle").

And it's also a Revenge of the Nerds reference.

so.. let's get back to the delegation model side of things.

For me, the purpose of the metadaemon is to handle requests from monitoring apps. The metadaemon should be the only thing polling any of the monitoring cores (which are, after all, on systems that should be working on producing widgets). It's not entirely clear from this section whether you're referring just to the "nearest" metadaemon (yay) or actually referring to an individual monitoring core (boo). So the first thing I thought of when I read this section was, "Great, can I turn it off?"

Also, does this address the possibility of multiple metadaemons for the same data source? People might wanna cluster their metadaemons you know...

i wish XPath/XQuery was mature and there was nice multi-platform support. i don't see that right now and i'm not sure how long it will be until it
happens.  most of the good XQuery stuff out there is written in Java. i
don't know if we want to start developing Java code.  maybe ...

Hmmmm... that might be fun on my Sun metadaemon box.  :)

[on second thought, I'm not sure a :) is appropriate at this point...]

i'm thinking POSIX regular expressions might be the way to go...

I'm still not entirely convinced that working with strings is the key to high speed, low CPU usage and high scalability...

i should have the g3 house ready to move into very soon... with a nice tree in the front yard.

Just remember to add windows, floors, doors and wallpaper in every room. Otherwise your Sims won't like it and they'll get very depressed and start slapping each other.

It'll be just like this list!

Oh.  Right.  My idea.

A metric pipelining plug-in with multicast and unicast support. The plug-in would have to be configured with a list of nodes that it's responsible for (or an entire cluster - maybe we could just use URLs?) and a reporting interval for each. Just like the metadaemon, in reverse. Every interval seconds, it transmits the appropriate chunk of metrics in XML to its configured destination. On receiving the metric chunk, it's treated just as if it had originated locally, and gets re-transmitted over the locally-configured multicast channel (obviously this only works if we *don't* break the pipelined data into individual metric chunks).

This would actually increase Ganglia scalability (at the price of some latency over pipelined links) because it allows a finer degree of control over multicast traffic, and each individual node in a very large cluster doesn't have to deal with 50,000 small packets per second being firehosed at it (instead it's dealing with a few thousand larger packets closer to the MTU value).
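A rough sketch of what I mean - all the names here are invented, since the plug-in API we keep talking about doesn't exist yet:

```python
# Sketch of the pipelining plug-in: rather than multicasting each metric
# by itself, bundle everything for the configured nodes into one chunk
# per reporting interval and forward it to the configured destination.
import time

def build_chunk(nodes, collect):
    """Bundle per-node XML fragments into one near-MTU-sized payload.

    collect(node) -> XML string for that node's metrics.
    """
    return "".join(collect(node) for node in nodes)

def pipeline(nodes, interval, collect, send, rounds=1):
    """Forward one aggregated chunk per reporting interval."""
    for _ in range(rounds):
        send(build_chunk(nodes, collect))
        time.sleep(interval)
```

The receiving end would then re-multicast each chunk locally as if it had originated there - which, as noted above, only works if the pipelined data isn't broken back into individual metric packets.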

I can see that being a lot of fun for slow links... heck, after releasing the source it should only be a matter of time before people turn that into a notifier plug-in. :)

OK, that's all for now, I think...

