prepare yourself.. i'm about to ramble a whole bunch here. i wanted to
let you know where i am in coding/thinking g3 right now and to get any
rambling in return. that's good juju.
so i'm pretty certain g3 will be a pure xml beast. no more xdr messages
on the wire. here's my thinking on this... in no particular order..
our old messages were not grouped together. and while they were very
small messages.. each message has a 52 byte header and the minimum
ethernet packet size is 64 octets. which means that we send 64
bytes on the wire for each 8-12 byte message (and the header alone is
roughly 6x the size of the data!).
another problem with having each individual metric multicast its own data
is that it disconnects related data.. e.g. CPU (user,sys,nice,idle).
since these 4 related metrics are sent at different times they might not
always represent the same time slice (and therefore might not add up to
exactly 100%.. it's not always good to give 110%).
the solution is to group the metrics together somehow so they are sent at
the same time. we could do that using xdr or xml... but which is more
efficient... in terms of network and CPU?
we could use the xdr_array to group metrics and prefix them with a branch
string .. or we could use this xml...
<g3>
<u n="/g/cpu" u="%" t="f" m="60">
<m n="user" v="10.0"/>
<m n="system" v="12.3"/>
<m n="nice" v="0.0"/>
<m n="idle" v="77.7"/>
</u>
</g3>
of course without the newlines and formatting. the length of this example
is 135 bytes... which contains 4 metrics expressed explicitly. in the
past each gmond had a metric lookup table compiled in, which reduced the
message size. the explicit message format will mean that all wire data
sources (gmond, gmetric, etc.) will use the same format. it also
means we have no more metric collisions since everything is explicit.
remember the old name space was flat.. we would still need to add a branch
attribute (which would be an xdr_string.. 4 bytes for the length plus the
string padded out to a multiple of 4 bytes.. for example "/g/cpu" is 6
chars, padded to 8, so 4+8 = 12 bytes).
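to make that xdr_string math concrete.. here's a tiny python sketch (python just for illustration) of the size calculation, assuming the standard xdr rule of a 4-byte length word plus data padded up to a 4-byte boundary:

```python
def xdr_string_size(s):
    # 4-byte length word plus the string bytes rounded up to the
    # next multiple of 4 (xdr pads variable-length data to 4 bytes)
    padded = (len(s) + 3) // 4 * 4
    return 4 + padded

xdr_string_size("/g/cpu")  # 6 chars -> padded to 8 -> 4 + 8 = 12
```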
so.. with the current per-metric method.. this data takes at
least 60 + 60 + 60 + 60 = 240 bytes... and the name space is flat.
this new explicit xml format will take 52 + 135 = 187 bytes. more info
sent using less bandwidth... and it's hierarchical too.
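here's a quick back-of-the-envelope check of those numbers in python.. assuming 60 bytes minimum per old-style message, the 52-byte header from above, and the sample xml with the whitespace stripped:

```python
# the 4-metric cpu group from above, without newlines/indentation
xml = ('<g3><u n="/g/cpu" u="%" t="f" m="60">'
       '<m n="user" v="10.0"/><m n="system" v="12.3"/>'
       '<m n="nice" v="0.0"/><m n="idle" v="77.7"/>'
       '</u></g3>')

old_style = 4 * 60          # four separate per-metric messages
new_style = 52 + len(xml)   # one header plus the grouped xml payload
```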
i'm sure we could think of a way to build an explicit hierarchical xdr
format which could rival the efficiency of this xml format.. but it would
not be nearly as accessible to developers. imagine how easy it would be
to plug an app directly into the xml wire ... almost fun. woohoo!
in the past i thought an xdr format would be more efficient on the CPU
side of things.. because i could send the metric branch name "/g/cpu" or
whatever as an xdr_array so it doesn't need to be taken apart/parsed on
the receiving end... just read the data from each array element. there
are tools which use the xdr description file (which we would provide) but
they are MUCH less available and easy to use than xml parsers.
also.. parsing the branch name can be made very efficient using regex
libraries... which use precompiled patterns for matching..
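for instance.. in python (just a sketch.. the real thing would use the POSIX regex library from C) a precompiled pattern for the cpu branch might look like this. the pattern itself is illustrative, not the real g3 pattern set:

```python
import re

# compile once at startup; matching then reuses the compiled pattern
# instead of re-parsing the regex for every incoming message
CPU_BRANCH = re.compile(r'^/g/cpu$')

def is_cpu_branch(branch):
    return CPU_BRANCH.match(branch) is not None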
this leads from the local wire format into thoughts on the wide area
format.
i love trees. i have hugged many trees in my life and have been very
lucky that none of them have hugged me back. remember the childhood
trauma of watching dorothy get pummeled with apples from the living
forest? all because she picked fruit from a tree (hmmm).. now back to the
yellow brick road (btw... think out there now.. big.. the world...)
right now gmetad uses a very simple aggregation model. that will not
scale (as we have painfully experienced). imagine a single DNS server
with every host/ip pair in the world being served from it. ha!
what we need is
1. a URL-like way of expressing the data we want
2. a delegation model to replace the aggregation model.
first.. the URL business... here is an example of a g3 URL...
/World/USA/California/Berkeley/UCB/Millennium Cluster//mm56/cpu/number
i'm thinking grand here.. but i really believe that in the end we will
create a true internet overlay which will empower the internet in ways
that haven't been done before.
so.. this URL only uses a single delimiter "/". feel free to debate what
you think this delimiter should be.. ':' might be a nice way to do it...
World:US:California:Berkeley:UCB:Millennium::mm56:cpu:number
.. i actually like the look of this a little more.. it's easier to read.
so here is how the URL is read.. i'll call the data between the delimiters
tokens.
----
1. any token before the double delimiter is considered an organizational
unit (yes.. i stole that name from LDAP). [btw, no more cluster or grid
tags!]
2. the token immediately following the double delimiter is a host
3. all tokens following host are considered metric groups except the last
one which is considered a metric.
----
so for the example above.. the URL points to the number of CPUs on mm56
in the Millennium Cluster at UC Berkeley in Berkeley, California, USA.
at minimum a URL must have a single organizational unit, host, and
metric. /foo//bar/baz
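here's a rough python sketch of those three rules, using the ':' delimiter variant.. no validation of the minimum shape, just the happy path (the function name is mine):

```python
def parse_g3_url(url, delim=':'):
    # rule 1: everything before the double delimiter is organizational units
    left, right = url.split(delim * 2)
    ous = left.split(delim)
    tokens = right.split(delim)
    # rule 2: the token right after the double delimiter is the host
    host = tokens[0]
    # rule 3: the rest are metric groups, except the last token: the metric
    groups, metric = tokens[1:-1], tokens[-1]
    return ous, host, groups, metric
```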
here is an example of the XML to show how this URL ties in to the XML.
<ganglia_xml version="3">
<ou name="World">
<ou name="US">
<ou name="California">
<ou name="Berkeley">
<ou name="UCB">
<ou name="Millennium Cluster">
<host name="mm56">
<mu name="cpu">
<metric name="number" value="2"/>
</mu>
</host>
</ou>
</ou>
</ou>
</ou>
</ou>
</ou>
</ganglia_xml>
this is not complete XML at all.. don't want it to be too busy.. i know
steve wagner could handle seeing all the tags since he likes to read raw
xml streams but i'm not sure about the rest of you. :)
btw, mu means a "metric unit". we can change that name but i like how it
matches with organizational unit AND i love the concept of mu from
buddhism matched with the MIU puzzle introduced to me by Hofstadter in
"Godel, Escher, Bach". i ramble. (if you want to learn more google "MU
Puzzle").
so.. let's get back to the delegation model side of things.
i'm going to start from the inside and work out.
say i have a "cluster" of machines (call it "Cluster A") that are all
multicasting xml messages to each other. these xml messages have a branch
name (/cpu) and a group of metrics (user,system,nice,idle) [see the xml
wire format above].
when a node in Cluster A gets a message it dynamically generates a URL
based on the info it gets/has: 1. its organizational unit's name
"Cluster A", 2. the name of the remote host (taken from the message
header), 3. the message unit name (/cpu) and 4. the list of metrics
(user,system,nice,idle). for example.. Cluster A::host1:cpu:user or
Cluster A::host2:cpu:nice.
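a little python sketch of that URL generation (the function name and argument names are just mine; the branch arrives on the wire as e.g. "/cpu", so the leading slash gets stripped):

```python
def make_key(ou, host, branch, metric, delim=':'):
    # join the four pieces of info the node has into a g3-style key:
    # ou, then the double delimiter, then host:group:metric
    group = branch.lstrip('/')
    return ou + delim * 2 + delim.join((host, group, metric))
```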
this URL is used to place the data received into an internal hierarchical
data structure. it serves as a "key" for placing the data into the
structure. btw, this is not vaporware. i have the tree library working
now and it is capable of about 65,000 inserts/sec. (god i hope we are
never sending that many messages a second but it's nice to know it's not a
bottleneck). these trees also allow concurrent threads to operate on the
data (insert/delete/update) and they grow/shrink as necessary. it's alive!!
mwahhhahhaha. really though... i'm happy with how it is coming together..
i've run the code through mpatrol and valgrind with happy results as well.
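the real tree library is C and isn't shown here.. but a nested-dict stand-in in python sketches how the URL works as a "key" driving the insert:

```python
def tree_insert(tree, key, value, delim=':'):
    # hedged stand-in for the real tree library: a plain nested dict.
    # note that splitting "Cluster A::host1:..." on ':' yields an empty
    # token where the '::' was, which here simply becomes its own level
    # marking the ou/host boundary.
    tokens = key.split(delim)
    node = tree
    for tok in tokens[:-1]:
        node = node.setdefault(tok, {})
    node[tokens[-1]] = value
```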
so this gmond shares all the data from its internal tree structure as
xml to upstream apps (namely gmetad).
so.. say we have a gmetad which is monitoring "Cluster A" .. along with
"Cluster B" and "Cluster C". let's say the admin wants these three
clusters to be organized as the "Computer Science Clusters".. whatever.
this gmetad would translate the xml it gets from each Cluster directly
into its internal tree structure.. prefixing the remote URLs with
"Computer Science Clusters".
e.g.
Computer Science Clusters:Cluster A::host1:cpu:nice
Computer Science Clusters:Cluster C::host4:cpu:idle
etc etc etc.
gmetad doesn't simply aggregate though. it summarizes and delegates.
gmetad pushes summary info upstream with links (like hyperlinks) which
point to where to get more detailed information.
so that getting info on
World:US:California:Berkeley:UCB:Millennium Cluster::mm56:cpu:number
would mean talking to the "World" gmetad which would give a quick summary
of the world and then point you to the "US" gmetad. the "US" gmetad would
give you its summary and then point you to the "California" gmetad.. etc
etc etc.
so having a delegation model is important if we're going to scale to the
world (since it lends itself to information "routing").. we also need to
support some type of xml filtering (meaning gmetad will have to be
interactive).
i wish XPath/XQuery was mature and there was nice multi-platform support.
i don't see that right now and i'm not sure how long it will be until it
happens. most of the good XQuery stuff out there is written in Java. i
don't know if we want to start developing Java code. maybe ...
i'm thinking POSIX regular expressions might be the way to go...
there would be two regex components.. one for the organizational units and
one for the host and metrics. i'll talk more about this later.. ya know..
i'm sick of talking right now.
i should have the g3 house ready to move into very soon... with a nice
tree in the front yard.
--
matt