On 12/06/2013 10:51 PM, Devon H. O'Dell wrote:
> 2013/12/6 Vladimir Vuksan <vli...@veus.hr>:
>> Hello everyone,
Hi!

>> For a few weeks now we have had performance issues due to the growth
>> of our monitoring setup. One of my colleagues, Devon O'Dell,
>> volunteered to help, and below is an e-mail with his findings.
> 
> Hi! I joined the ML, so I'm around to answer questions. Nice to
> 'meet' you guys!
Thank you for your work! I also have some questions and ideas, but I am
still struggling with the internal gmond structures, so it may take a
while until I can contribute myself (plus I am not a programmer by
profession).


So:
You said that you are using a gmond to collect data from every machine.
The problems with the current implementation of gmond are that:
1. it cannot be used for aggregation only (that is, without also
   reporting metrics from localhost)
2. the cluster tagging is done at the XML reporting level, not at the
   host level.

It would be nice to be able to run gmond aggregators that just pass
along a collection of metrics from multiple machines.
Also, if the cluster tagging were done at the gmond reporting level, a
single gmond could aggregate metrics from different clusters, and
gmetad would just write each metrics bundle into the corresponding
cluster space.
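
As a rough illustration of the routing step, here is a minimal Python
sketch. It is not gmond or gmetad code: the `Metric` record, its
`cluster` field and the `route_by_cluster` helper are hypothetical names
made up for this example, assuming only that each metric carries the
cluster tag of the host that produced it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    # Hypothetical record: the cluster tag is set by the originating
    # host, not added later at XML reporting time.
    cluster: str
    host: str
    name: str
    value: float


def route_by_cluster(metrics):
    """Group a mixed stream of metrics into one bundle per cluster."""
    bundles = defaultdict(list)
    for m in metrics:
        bundles[m.cluster].append(m)
    return dict(bundles)


# An aggregator that only forwards: it holds metrics from two clusters
# and contributes none of its own.
forwarded = [
    Metric("web", "web01", "load_one", 0.42),
    Metric("db", "db01", "load_one", 1.30),
    Metric("web", "web02", "load_one", 0.17),
]
bundles = route_by_cluster(forwarded)
```

With the tag travelling with each metric, the receiving side only needs
the grouping step to decide which cluster space a bundle belongs to.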

Moreover (this was discussed on the list without a clear conclusion), it
would be great if gmond introduced a UUID (regardless of how it is
generated: from hardware or randomly) that would be the actual key for
identifying a machine.
It would be enough to have in the host section of gmond.conf something
like:
uuid = "some_uuid"
and
move override_hostname from globals to host in the form of a list
override_hostname_list="list_of names"
that would be reported to gmetad as a list of aliases (alongside the
reverse DNS result).
This would have the effect that the host could also be searched for by
any of its former or present hostnames (whether or not they resolve in
DNS).
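
To show what such a lookup could look like on the receiving side, here
is a small Python sketch. Everything in it is an assumption for
illustration: the `HostRegistry` class and its `report` and `find`
methods do not exist in Ganglia, and the example names are invented.

```python
class HostRegistry:
    """Hypothetical index: the UUID is the key, hostnames are aliases."""

    def __init__(self):
        self._aliases_by_uuid = {}  # uuid -> set of known names
        self._uuid_by_alias = {}    # name -> uuid

    def report(self, uuid, names):
        """Record the names a host announced; earlier names are kept."""
        known = self._aliases_by_uuid.setdefault(uuid, set())
        for name in names:
            key = name.lower()  # hostnames compare case-insensitively
            known.add(key)
            self._uuid_by_alias[key] = uuid

    def aliases(self, uuid):
        """Return every name ever reported for this UUID, sorted."""
        return sorted(self._aliases_by_uuid.get(uuid, ()))

    def find(self, name):
        """Return the UUID for any former or present name, else None."""
        return self._uuid_by_alias.get(name.lower())


registry = HostRegistry()
registry.report("some_uuid", ["node17", "node17.example.org"])
# The machine is renamed later but keeps its UUID.
registry.report("some_uuid", ["worker03"])
```

Here `registry.find("node17")` and `registry.find("worker03")` both lead
to the same machine. The sketch lets the most recent report win if two
UUIDs claim the same alias; a real design would have to decide how to
handle that conflict.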

> Ganglia performance, but most of the low hanging fruit is now gone; at
> some point it will require:
>
>  * writing a version of librrd (this probably also means changing the
> rrd file format),
We (the ALICE experiment at CERN) use a tool named MonaLisa
(http://monalisa.caltech.edu), written in Java, that can take in many
hundreds of thousands of metrics and write them to a Postgres database.
One obvious advantage is that there is no need to summarize at the
recording stage, and you keep access to the precise metrics without
losing information to averaging.

Wouldn't it be possible to adapt gmetad to write the data to a Postgres
database? One side effect would be that gweb could easily run on another
server (for security and load separation purposes) and build its reports
from the database (with the averaging mechanism implemented at the
reporting level).
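
A minimal sketch of "store raw, average at reporting time" follows. To
keep it runnable anywhere it uses Python's built-in SQLite module as a
stand-in for Postgres, and the `samples` table, its columns and the
`averages` helper are assumptions invented for the example, not an
existing gmetad or gweb schema.

```python
import sqlite3

# SQLite in memory stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    " host TEXT NOT NULL, metric TEXT NOT NULL,"
    " ts INTEGER NOT NULL, value REAL NOT NULL)"
)

# Raw samples are stored as they arrive; nothing is summarized on write.
rows = [
    ("web01", "load_one", 0, 1.0),
    ("web01", "load_one", 15, 3.0),
    ("web01", "load_one", 60, 5.0),
    ("web01", "load_one", 75, 7.0),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)


def averages(conn, host, metric, bucket_seconds):
    """Average raw samples into fixed-width time buckets at query time."""
    cur = conn.execute(
        "SELECT (ts / ?) * ? AS bucket, AVG(value)"
        " FROM samples WHERE host = ? AND metric = ?"
        " GROUP BY bucket ORDER BY bucket",
        (bucket_seconds, bucket_seconds, host, metric),
    )
    return cur.fetchall()


per_minute = averages(conn, "web01", "load_one", 60)
```

The reporting side picks the bucket width per query, so the same raw
rows can serve a one-minute graph and a one-hour graph without any
information being lost at write time.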

>  * replacing the hash table in Ganglia with one that performs better,
>  * changing the data serialization format from XML to one that is
> easier and faster to parse,
I could just be speaking nonsense, as I don't understand exactly where
the hash table is used (at the metrics collection step, by gmond or by
gmetad?), but couldn't the same XDR format be used for all communication
(and maybe the communication could be improved by using ZeroMQ?),
together with some standalone CLI tool that would read and process the
output of a gmond? This would remove the need for XML output, and the
CLI tool would also make it possible for a human to inspect the metrics
as text (with the conversion to XML done by the CLI tool if needed).
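
As an illustration of the CLI idea, here is a short Python sketch that
packs one metric using the standard XDR encoding rules (big-endian
values, strings padded to a multiple of 4 bytes) and turns it back into
a text line. The record layout (host, name, one double) is an assumption
made up for the example; it is not the message layout gmond actually
sends.

```python
import struct


def pack_string(text):
    """XDR string: 4-byte big-endian length, bytes, zero pad to 4."""
    data = text.encode("utf-8")
    pad = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def unpack_string(buf, offset):
    """Return (text, next_offset) for an XDR string at offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("utf-8")
    padded = length + (-length) % 4
    return text, start + padded


def encode_metric(host, name, value):
    # Hypothetical record layout: host, name, then one XDR double.
    return pack_string(host) + pack_string(name) + struct.pack(">d", value)


def decode_metric(buf):
    """Inverse of encode_metric for the hypothetical layout above."""
    host, off = unpack_string(buf, 0)
    name, off = unpack_string(buf, off)
    (value,) = struct.unpack_from(">d", buf, off)
    return host, name, value


def to_text(buf):
    """What a small CLI tool could print for human inspection."""
    host, name, value = decode_metric(buf)
    return f"{host} {name} {value}"


wire = encode_metric("web01", "load_one", 0.5)
```

A tool built this way could read binary records from a gmond and print
one line per metric, or emit XML for consumers that still expect it.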

>  * using a different data structure than a hash table for metrics
> hierarchies (probably a tree with metrics stored at each level in
> contiguous memory and an index describing each metric at each level)
postgres tables?


>  * refactoring gmetad and gmond into a single process that shares memory

I don't think this is a good idea, as these are processes designed with
different functionality in mind. It would make for a very heavy process
even if you don't start the gmetad part (and basically what Ganglia
excels at is being a simple, lightweight and robust agent-based
monitoring tool).

I would like to help if possible, but I would also need some mentoring.

Thank you!
Adrian

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
