Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-25 Thread Carlo Marcelo Arenas Belon
On Thu, Dec 24, 2009 at 12:10:51PM +, Daniel Pocock wrote:
> Vladimir Vuksan wrote:
>>
>> The issue is value of this data. If these were financial transactions  
>> than no loss would be acceptable however these are not. They are  
>> performance, trending data which get "averaged" down as time goes by  
>> so loss of couple hours or even days of data is not tragic.
>
> I agree - it doesn't have to be perfect.

still, the current implementation has a way to go and should most likely be
extended for better data reliability, as long as it doesn't cost too much.

> To come back to my own requirement though, it is about horizontal  
> scalability.  Let's say you have a hypothetical big enterprise that has  
> just decided to adopt Ganglia as a universal solution on every node in  
> every data center globally, including subsidiary companies, etc.
>
> No one really wants to manually map individual servers to clusters and  
> gmetad servers.  They want plug-and-play.

the current federated model of gmetad helps slightly in that respect,
as you would expect each of the independent offices/units/datacenters
to have 1 gmetad locally (as long as it is big enough to handle the load)
to collect and aggregate data, and 1 central gmetad that connects to all
the leaves for the centralized view.

of course you can also have more than 1 gmetad (even 1 per cluster per
location) and make the gmetad hierarchy tree a little larger.
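
for example (hostnames and cluster names invented, the ports are just the
usual defaults), the data_source wiring for that layout could look roughly
like this:

  # gmetad.conf on the leaf gmetad in datacenter 1
  data_source "web cluster" 15 gmond-web1.dc1.example.com:8649
  data_source "db cluster"  15 gmond-db1.dc1.example.com:8649

  # gmetad.conf on the central gmetad, polling the leaf gmetads (xml_port)
  data_source "dc1" 15 gmetad.dc1.example.com:8651
  data_source "dc2" 15 gmetad.dc2.example.com:8651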
 
> They just want to allocate some storage and gmetad hardware in each main  
> data center, plug them in, and watch the graphs appear.  If the CPU or  
> IO load gets too high on some of the gmetad servers in a particular  
> location, they want to re-distribute the load over the others in that  
> location.  When the IO load gets too high on all of the gmetads, they  
> want to be able to scale horizontally - add an extra 1 or 2 gmetad  
> servers and see the load distributed between them.

horizontal scalability like this would be ideal, but again, the cost of the
added complexity might be difficult to absorb.

> Maybe this sounds a little bit like a Christmas wish-list, but does  
> anyone else feel that this is a valid requirement?  Imagine something  
> even bigger - if a state or national government decided to deploy the  
> gmond agent throughout all their departments in an effort to gather  
> utilization data - would it scale?  Would it be easy enough for a  
> diverse range of IT departments to just plug it in?

with enough planning, and assuming the cluster tree is somewhat balanced,
it should work fine IMHO, but for very large clusters, or ones that span
multiple locations and can't be split logically (clouds), you would soon
run into scalability issues, including memory pressure in the
gmond collectors.

> Carlo also made some comments about RDBMS instead of RRD.  This raises a  
> few discussion points:

I meant RDBMSs alongside RRDs, as RRDs were specifically designed to allow
efficient storage and summarization of metrics, which is what is
needed most of the time.

For special cases where you need to keep all data, without any distortion,
for a long time, an ETL process with an RDBMS and some data warehouse
is a better fit.

The ETL could be as simple as scanning the RRDs periodically and importing
the records into a database, but it would be nicer if this could be done
directly from gmetad by allowing for hooks at "write RRD" time.
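
as a minimal sketch of the "scan the RRDs periodically" variant (the paths,
table layout and one-hour window are assumptions, and it shells out to the
stock rrdtool CLI rather than to any gmetad hook):

  import glob, os, sqlite3, subprocess

  def export_rrd(rrd_path, conn, start="-3600"):
      # "rrdtool fetch" prints a DS-name header, a blank line, then
      # "timestamp: value [value ...]" rows for the requested window.
      out = subprocess.check_output(
          ["rrdtool", "fetch", rrd_path, "AVERAGE", "--start", start])
      lines = out.decode().splitlines()
      ds_names = lines[0].split()
      metric = os.path.basename(rrd_path)
      for line in lines[2:]:
          if ":" not in line:
              continue
          ts, values = line.split(":", 1)
          for name, value in zip(ds_names, values.split()):
              if value.lower() in ("nan", "-nan"):
                  continue                      # skip unknown samples
              conn.execute("INSERT OR IGNORE INTO samples VALUES (?,?,?,?)",
                           (metric, name, int(ts), float(value)))

  conn = sqlite3.connect("metrics.db")
  conn.execute("""CREATE TABLE IF NOT EXISTS samples
                  (metric TEXT, ds TEXT, ts INTEGER, value REAL,
                   PRIMARY KEY (metric, ds, ts))""")
  for rrd in glob.glob("/var/lib/ganglia/rrds/*/*/*.rrd"):
      export_rrd(rrd, conn)
  conn.commit()

run from cron, the primary key makes re-imports of overlapping windows
harmless.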

This was indeed one of the reasons why the python gmetad in trunk has
a modular design, so that a module for doing exactly that could be written
if someone had an interest in doing so.

Carlo



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-24 Thread Daniel Pocock
Vladimir Vuksan wrote:
> On Mon, 21 Dec 2009, Spike Spiegel wrote:
>
>>> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1
>> which unless you have small amount or data or fast network between the
>> two nodes won't complete before the next write is initiated, meaning
>> they won't be identical.
>
>
> Granted they will never be identical. Even on fastest networks there 
> will be a window of data lost. On fast networks/smaller # of nodes it 
> will be small. On bigger networks a larger window etc.
>
>
>> how do you tell which one has most up to date data?
>
>
> This is in no respect an automatic processes (even though if I really 
> wanted to I could Point proxy to your primary node. If it fails point 
> to secondary or tertiary.
>
>
>> if you really mean "most recent" then both would, because both would
>> have fetched the last reading assuming they are both functional, but
>> gmetad1 would have a hole in its graphs. To me that does not really
>> count as up to date. Up to date would be the one with the most
>> complete data set which you have no way to identify programmatically.
>>
>> Also, assume now gmetad2 fails and both have holes, which one is the
>> most up to date?
>
> That is up to you decide. This is in no way perfect.
>
>
>> I guess it does if I look at it from your perspective which if I
>> understood it correctly implies that:
>> * some data loss doesn't matter
>> * manual interaction to fix things is ok
>>
>> But that isn't my perspective. Scalable (distributed) applications
>> should be able to guarantee by design no data loss in as many cases as
>> possible and not force you to centralized designs or hackery in order
>> to do so.
>>
>> There are ways to make this possible without changes to the current
>> gmetad code by adding a helper webservice that proxies the access to
>> rrd. This way it's perfectly fine to have different locations with
>> different data and the webservice will take care of interrogating one
>> or more gmetads/backends to retrieve the full set and present it to
>> the user. Fully distributed, no data loss. This could be of course
>> built into gmetad by making something like port 8652 access the rrds,
>> but to me that's the wrong path, makes gmetad's code more complicated
>> and it's potentially a functionality that has nothing to do with
>> ganglia and is backend dependent.
>
>
> The issue is value of this data. If these were financial transactions 
> than no loss would be acceptable however these are not. They are 
> performance, trending data which get "averaged" down as time goes by 
> so loss of couple hours or even days of data is not tragic.
>

I agree - it doesn't have to be perfect.

To come back to my own requirement though, it is about horizontal 
scalability.  Let's say you have a hypothetical big enterprise that has 
just decided to adopt Ganglia as a universal solution on every node in 
every data center globally, including subsidiary companies, etc.

No one really wants to manually map individual servers to clusters and 
gmetad servers.  They want plug-and-play.

They just want to allocate some storage and gmetad hardware in each main 
data center, plug them in, and watch the graphs appear.  If the CPU or 
IO load gets too high on some of the gmetad servers in a particular 
location, they want to re-distribute the load over the others in that 
location.  When the IO load gets too high on all of the gmetads, they 
want to be able to scale horizontally - add an extra 1 or 2 gmetad 
servers and see the load distributed between them.

Maybe this sounds a little bit like a Christmas wish-list, but does 
anyone else feel that this is a valid requirement?  Imagine something 
even bigger - if a state or national government decided to deploy the 
gmond agent throughout all their departments in an effort to gather 
utilization data - would it scale?  Would it be easy enough for a 
diverse range of IT departments to just plug it in?

Carlo also made some comments about RDBMS instead of RRD.  This raises a 
few discussion points:

a) An RDBMS shared by multiple gmetads could provide a suitable locking 
mechanism for each gmetad to show which clusters it is polling, thereby 
co-ordinating access to RRD files on a SAN.  The list of clusters would be 
kept in a table, and if one gmetad could no longer poll a particular 
cluster (maybe to reduce IO load), it would lock the table, remove its 
name from that row, and unlock the table.  Another gmetad could then 
lock the table, update the row with its name, and unlock it again.
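
As a rough sketch of that claim/release logic (the "clusters" table and its 
columns are invented for illustration; any DB-API driver could stand in for 
sqlite3):

  import sqlite3

  def claim_cluster(conn, cluster, my_name):
      # Atomically take ownership only if nobody currently holds the row.
      cur = conn.execute(
          "UPDATE clusters SET owner = ? WHERE name = ? AND owner IS NULL",
          (my_name, cluster))
      conn.commit()
      return cur.rowcount == 1   # True: this gmetad now polls the cluster

  def release_cluster(conn, cluster, my_name):
      conn.execute("UPDATE clusters SET owner = NULL "
                   "WHERE name = ? AND owner = ?", (cluster, my_name))
      conn.commit()

A second gmetad calling claim_cluster() on the same row simply gets False 
back and moves on to another cluster.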

b) As for metric storage in RRD, I personally believe the RRD algorithm 
and API are quite appropriate for this type of data.  The question then is: 
should gmetad write metrics directly to an RDBMS, or should rrdtool be 
modified to use an RDBMS as a back end?  The whole RRD file structure could 
be represented in a series of database tables.  The individual RRA 
regions of the file could be mapped to BLOBs, or the data samples could 
be s

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-21 Thread Vladimir Vuksan
On Mon, 21 Dec 2009, Spike Spiegel wrote:

>> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1
> which unless you have small amount or data or fast network between the
> two nodes won't complete before the next write is initiated, meaning
> they won't be identical.


Granted, they will never be identical. Even on the fastest networks there will 
be a window of data loss. On fast networks / smaller numbers of nodes it will 
be small; on bigger networks, a larger window, etc.


> how do you tell which one has most up to date data?


This is in no respect an automatic process (even though, if I really 
wanted to, I could make it one). Point a proxy at your primary node; if it 
fails, point to the secondary or tertiary.


> if you really mean "most recent" then both would, because both would
> have fetched the last reading assuming they are both functional, but
> gmetad1 would have a hole in its graphs. To me that does not really
> count as up to date. Up to date would be the one with the most
> complete data set which you have no way to identify programmatically.
>
> Also, assume now gmetad2 fails and both have holes, which one is the
> most up to date?

That is up to you to decide. This is in no way perfect.


> I guess it does if I look at it from your perspective which if I
> understood it correctly implies that:
> * some data loss doesn't matter
> * manual interaction to fix things is ok
>
> But that isn't my perspective. Scalable (distributed) applications
> should be able to guarantee by design no data loss in as many cases as
> possible and not force you to centralized designs or hackery in order
> to do so.
>
> There are ways to make this possible without changes to the current
> gmetad code by adding a helper webservice that proxies the access to
> rrd. This way it's perfectly fine to have different locations with
> different data and the webservice will take care of interrogating one
> or more gmetads/backends to retrieve the full set and present it to
> the user. Fully distributed, no data loss. This could be of course
> built into gmetad by making something like port 8652 access the rrds,
> but to me that's the wrong path, makes gmetad's code more complicated
> and it's potentially a functionality that has nothing to do with
> ganglia and is backend dependent.


The issue is the value of this data. If these were financial transactions then 
no loss would be acceptable; however, these are not. They are performance and 
trending data which get "averaged" down as time goes by, so the loss of a 
couple of hours or even days of data is not tragic.

I have also seen many projects where we tried to avoid a particular "edge" 
case and in the process introduced a whole lot of new issues that were 
worse than the problem we started with. To this point, I have run 
removespikes.pl on RRDs numerous times to remove spikes in Ganglia data; 
in most cases it has worked, yet in a couple of cases it ended up corrupting 
RRD files so that they couldn't be used by gmetad. I can therefore 
reasonably foresee something like that happening in your implementation. 
I have also seen bugs in the past (I remember a multicast bug we reported 
years ago) going unaddressed due to what I can only interpret as a lack of 
resources.

So if you weigh all the possibilities of things going wrong (and a lot 
can) against the resources available, I'd say you are asking for trouble.

Vladimir



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-21 Thread Spike Spiegel
On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan  wrote:
> If you lose a day or
> two or even a week of trending data that is not gonna be disaster as long
> as that data is present somewhere else.

sure, but where? how would the ganglia frontend tell?

> Thus I proposed a simple solution
> where even if one of the gmetads (gmetad1) fails you can either
>
> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1

which, unless you have a small amount of data or a fast network between the
two nodes, won't complete before the next write is initiated, meaning
they won't be identical.

> b. Simply start up gmetad1 and don't worry about the lost data

sure

> As far as which data is going to be displayed you can do either
>
> 1. Proxy traffic to Ganglia with most up to date data

how do you tell which one has most up to date data?

> 2. Change DNS record to point to Ganglia with most up to date data

same question, which one has most up to date data?

if you really mean "most recent" then both would, because both would
have fetched the last reading assuming they are both functional, but
gmetad1 would have a hole in its graphs. To me that does not really
count as up to date. Up to date would be the one with the most
complete data set which you have no way to identify programmatically.

Also, assume now gmetad2 fails and both have holes, which one is the
most up to date?

> To your last point there are chances that both gmetads fail in quick
> succession however I would think that would be a highly unlikely event.

it doesn't have to be in quick succession for you to find yourself in a
condition where you have holes in your data and no way to go back,
which is my main point: as much as you can say that no-data-loss
requirements aren't really a major concern for most people, the fact
remains that with the current codebase you can't avoid that situation,
which imho isn't right.

> If you had requirements for such flawless performance you should be able to
> invest resources to resolve it.

I'm sorry, but I don't see it. Even with plenty of resources you'd have
to either put some heavy restrictions in place, like centralizing data
on a SAN, which is not really something you'd want in a distributed
setup, or add plenty of hacks to, say, replay the content of the
rrds to some other place, and even in that case it's pretty quirky.

> Makes sense ?

I guess it does if I look at it from your perspective which if I
understood it correctly implies that:
* some data loss doesn't matter
* manual interaction to fix things is ok

But that isn't my perspective. Scalable (distributed) applications
should be able to guarantee by design no data loss in as many cases as
possible and not force you to centralized designs or hackery in order
to do so.

There are ways to make this possible without changes to the current
gmetad code by adding a helper webservice that proxies the access to
rrd. This way it's perfectly fine to have different locations with
different data and the webservice will take care of interrogating one
or more gmetads/backends to retrieve the full set and present it to
the user. Fully distributed, no data loss. This could of course be
built into gmetad by making something like port 8652 access the rrds,
but to me that's the wrong path: it makes gmetad's code more complicated,
and it's potentially functionality that has nothing to do with
ganglia and is backend dependent.
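
the merge step such a proxy would do isn't much code either; a rough
sketch (how each backend is actually queried, rrdtool fetch, HTTP or
whatever, is left out, and all the names are invented):

  def merge_series(series_list):
      # Each backend contributes {timestamp: value}; None marks a gap.
      # The first backend that has a real value for a timestamp wins.
      merged = {}
      for series in series_list:
          for ts, value in series.items():
              if value is not None and merged.get(ts) is None:
                  merged[ts] = value
      return dict(sorted(merged.items()))

  # e.g. gmetad1 was down for two samples that gmetad2 still has:
  gmetad1 = {10: 1.0, 20: 1.1, 30: None, 40: None, 50: 1.4}
  gmetad2 = {10: 1.0, 20: None, 30: 1.2, 40: 1.3, 50: 1.4}
  print(merge_series([gmetad1, gmetad2]))   # no gaps in the combined view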

thoughts?

-- 
"Behind every great man there's a great backpack" - B.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Carlo Marcelo Arenas Belon
On Sun, Dec 20, 2009 at 04:02:36PM +, Spike Spiegel wrote:
> On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon
>  wrote:
> >
> >> b) you can afford to have duplicate storage - if your storage
> >> requirements are huge (retaining a lot of historic data or lot's of data
> >> at short polling intervals), you may not want to duplicate everything
> >
> > if you are planning to store a lot of historic data then you should be
> > using instead some sort of database, not RRDs and so I think this shouldn't
> > be an issue unless you explode the RRAs and try to abuse the RRDs as a RDBMs
> 
> I think there's a middle ground here that'd be interesting to explore,
> altho that's a different thread, but for kicks this is the gist: the
> common pattern for rrd storage is hour/day/month/year and I've always
> found it bogus.

I am sure the defaults provided were somewhat arbitrary (I think you missed
week), but they make sense given that they are the smallest time units of
their kind and that they fit the standard gmond polling rates, which
wouldn't accommodate 1-minute or 1-second resolution.

> In many cases I've needed higher resolution (down to
> the second) for the last 5-20 minutes, then intervals of an hr to a
> couple hrs, then a day to three days and then a week to 3 weeks etc
> etc, which increases your storage requirements, but  is imho not an
> abuse of rrd and still retains the many advantages of rrd over having
> to maintain a RDBMs.

agreed, and the fact that it is not easy enough to do, or requires somewhat
intrusive maintenance, is a bug, but it is still possible, for the reasons
you explain.

> > PS. I like the ideas on this thread, don't get me wrong, just that I agree
> >    with Vladimir that gmetad and RRDtool are probably not the sweet spot
> >    (cost wise) for scalability work even if I also agree that the vertical
> >    scalability of gmetad is suboptimal to say the least.
> 
> sort of. If you're looking at where your resources go to compute and
> deal with large amount of data, I agree. If you look at what it costs
> you or if it's even possible to create a fully scalable and resilient
> ganglia based monitoring infrastructure, I disagree.

not sure what part you are quoting here, but I have the feeling we probably
agree ;)

putting on my ganglia developer hat, I dislike the fact that gmetad can't scale
horizontally like all well-designed applications should, but the fact that
there is no solution for it to do so yet means that the complexity involved
in making that change is probably not worth it in most (if not all) cases,
considering that hardware (at the level needed most of the time) is cheap
anyway; I really hope there is no one out there running gmetad on some
big-iron solution when a decent "PC" box with enough memory would do
mostly fine.

there are problems as well with the way federation currently works, which
requires more network bandwidth and CPU than should really be needed and
which I would guess we should tackle first, especially considering the
increase in XML sizes with 3.1 (which has also been worked around), but for
that (putting on my ganglia user hat) I would assume most big installations
will stick with 3.0 for now anyway.

Carlo



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Jesse Becker
On Sun, Dec 20, 2009 at 11:02, Spike Spiegel  wrote:
> On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon
>  wrote:
>>> a) you are only concerned with redundancy and not looking for
>>> scalability - when I say scalability, I refer to the idea of maybe 3 or
>>> more gmetads running in parallel collecting data from huge numbers of agents
>>
>> what is the bottleneck here?, CPUs for polling or IO?, if IO using memory
>> would be most likely all you really need (specially considering RAM is really
>> cheap and RRDs are very small), if CPUs then there might be somethings we
>> can do to help with that, but vertical scalability is what gmetad has, and
>> for that usually means going to a bigger box if you hit the limit on the
>> current one.
>
> Ime cpu isnt' really a problem, the big load is I/O and indeed moving
> the rrds to a ramdisk is the most common solution with pretty decent
> results.

I concur, for the moment. ;-)

If gmetad takes on more duties, in terms of more sophisticated
interactive access, built-in trickery for improving disk IO, etc, then
CPU could become an issue.  However, that's a really big "if," and a
problem for the future.

> I think there's a middle ground here that'd be interesting to explore,
> altho that's a different thread, but for kicks this is the gist: the
> common pattern for rrd storage is hour/day/month/year and I've always
> found it bogus. In many cases I've needed higher resolution (down to
> the second) for the last 5-20 minutes, then intervals of an hr to a
> couple hrs, then a day to three days and then a week to 3 weeks etc
> etc, which increases your storage requirements, but  is imho not an
> abuse of rrd and still retains the many advantages of rrd over having
> to maintain a RDBMs.

The d/w/m/y split is a good *starting point*.  Ganglia needs to ship
with some sort of sensible default configuration that essentially
works for many/most people.  You (singular or plural) are free to
customize your RRD configuration as policy and storage capacity
require and permit; Ganglia officially supports this via the RRAs
configuration line in gmetad.conf.  In an ideal world, you would keep all
data, at the highest resolution, forever, but that usually isn't practical.


-- 
Jesse Becker



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Jesse Becker
On Sun, Dec 20, 2009 at 10:49, Spike Spiegel  wrote:
> On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan  wrote:
>> I think you guys are complicating much :-). Can't you simply have multiple
>> gmetads in different sites poll a single gmond. That way if one gmetad fails
>> data is still available and updated on the other gmetads. That is what we
>> used to do.
>
> Would you mind explaining me why having multiple gmetads in different
> colos pulling form the same gmond is simpler than the infrastructure I

Conceptually, it may be simpler since the two gmetad instances can
be considered 100% independent of each other; they just happen to have
the same polling targets.  There's no need, under normal
circumstances, for the two installs to deal with each other.  The
catch is when there is a failure and you need to bring back one of
the two instances.  You trade simplicity in up-front configuration for
complexity during the recovery.

(Not trying to speak for Vladimir, just tossing in a few comments of my own.)

I do not claim that this is the proper solution for all places, but it
is *a* solution that is good enough for some.


-- 
Jesse Becker



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon
 wrote:
>> a) you are only concerned with redundancy and not looking for
>> scalability - when I say scalability, I refer to the idea of maybe 3 or
>> more gmetads running in parallel collecting data from huge numbers of agents
>
> what is the bottleneck here?, CPUs for polling or IO?, if IO using memory
> would be most likely all you really need (specially considering RAM is really
> cheap and RRDs are very small), if CPUs then there might be somethings we
> can do to help with that, but vertical scalability is what gmetad has, and
> for that usually means going to a bigger box if you hit the limit on the
> current one.

IME CPU isn't really a problem; the big load is I/O, and indeed moving
the rrds to a ramdisk is the most common solution, with pretty decent
results.

>
>> b) you can afford to have duplicate storage - if your storage
>> requirements are huge (retaining a lot of historic data or lot's of data
>> at short polling intervals), you may not want to duplicate everything
>
> if you are planning to store a lot of historic data then you should be
> using instead some sort of database, not RRDs and so I think this shouldn't
> be an issue unless you explode the RRAs and try to abuse the RRDs as a RDBMs

I think there's a middle ground here that'd be interesting to explore,
although that's a different thread, but for kicks this is the gist: the
common pattern for rrd storage is hour/day/month/year and I've always
found it bogus. In many cases I've needed higher resolution (down to
the second) for the last 5-20 minutes, then intervals of an hour to a
couple of hours, then a day to three days, then a week to three weeks, etc.
This increases your storage requirements, but it is imho not an
abuse of rrd and still retains the many advantages of rrd over having
to maintain an RDBMS.

> Carlo
>
> PS. I like the ideas on this thread, don't get me wrong, just that I agree
>    with Vladimir that gmetad and RRDtool are probably not the sweet spot
>    (cost wise) for scalability work even if I also agree that the vertical
>    scalability of gmetad is suboptimal to say the least.

sort of. If you're looking at where your resources go to compute and
deal with large amounts of data, I agree. If you look at what it costs
you, or whether it's even possible, to create a fully scalable and resilient
ganglia-based monitoring infrastructure, I disagree.

-- 
"Behind every great man there's a great backpack" - B.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan  wrote:
> I think you guys are complicating much :-). Can't you simply have multiple
> gmetads in different sites poll a single gmond. That way if one gmetad fails
> data is still available and updated on the other gmetads. That is what we
> used to do.

Would you mind explaining to me why having multiple gmetads in different
colos pulling from the same gmond is simpler than the infrastructure I
presented in my post? Furthermore, could you please show me how your
simpler solution addresses the problem of bringing back up the gmetad
that failed such that both gmetads would have the same data? And if
that's not what you had in mind, what's your strategy? Which data is
going to be displayed to the user? And what if the first gmetad, the one
that didn't fail, now fails while the restored one continues working?

thanks for your clarifications.

-- 
"Behind every great man there's a great backpack" - B.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-14 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 14, 2009 at 09:26:01AM +, Daniel Pocock wrote:
> Vladimir Vuksan wrote:
> > I think you guys are complicating much :-). Can't you simply have 
> > multiple gmetads in different sites poll a single gmond. That way if 
> > one gmetad fails data is still available and updated on the other 
> > gmetads. That is what we used to do.
> 
> That is a good solution under two conditions:
> 
> a) you are only concerned with redundancy and not looking for 
> scalability - when I say scalability, I refer to the idea of maybe 3 or 
> more gmetads running in parallel collecting data from huge numbers of agents

what is the bottleneck here? CPU for polling, or IO? If IO, using memory
is most likely all you really need (especially considering RAM is really
cheap and RRDs are very small); if CPU, then there might be some things we
can do to help with that. But vertical scalability is what gmetad has, and
that usually means going to a bigger box if you hit the limit on the
current one.

> b) you can afford to have duplicate storage - if your storage 
> requirements are huge (retaining a lot of historic data or lot's of data 
> at short polling intervals), you may not want to duplicate everything

if you are planning to store a lot of historic data then you should be
using some sort of database instead, not RRDs, so I think this shouldn't
be an issue unless you explode the RRAs and try to abuse the RRDs as an RDBMS.

of course that means you have to add a process to gather your metric data
out of the RRDs to begin with and into your RDBMS, but there shouldn't be a
need to be concerned with RRD storage size, when you are most likely going
to be spending a lot more on that RDBMS storage (including snapshots and
mirrors and all those things that make DBAs feel warm inside, regardless of
budget).

Carlo

PS. I like the ideas in this thread, don't get me wrong; I just agree
    with Vladimir that gmetad and RRDtool are probably not the sweet spot
    (cost-wise) for scalability work, even if I also agree that the vertical
    scalability of gmetad is suboptimal, to say the least.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-14 Thread Daniel Pocock
Vladimir Vuksan wrote:
> I think you guys are complicating much :-). Can't you simply have 
> multiple gmetads in different sites poll a single gmond. That way if 
> one gmetad fails data is still available and updated on the other 
> gmetads. That is what we used to do.

That is a good solution under two conditions:

a) you are only concerned with redundancy and not looking for 
scalability - when I say scalability, I refer to the idea of maybe 3 or 
more gmetads running in parallel collecting data from huge numbers of agents

b) you can afford to have duplicate storage - if your storage 
requirements are huge (retaining a lot of historic data or lots of data 
at short polling intervals), you may not want to duplicate everything





Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-13 Thread Vladimir Vuksan
I think you guys are overcomplicating things :-). Can't you simply have 
multiple gmetads in different sites poll a single gmond? That way, if one 
gmetad fails, data is still available and updated on the other gmetads. 
That is what we used to do.

Vladimir

On Sun, 13 Dec 2009, Spike Spiegel wrote:

> indeed, os resources usage for caching should be tightly controlled.
> RRD does a pretty good job at that, and for example I know people that
> use collectd (which supports multiple output streams) and send data
> both remotely and keep a local copy with different retention policies
> to solve that problem.
>
>> This would be addressed by the use of SAN - there would only be one RRD
>> file, and the gmetad servers would need to be in some agreement so that they
>> both don't try to write the same file at the same time.
>
> sure, but even with a SAN you'd have to add some intelligence to
> gmetad, which from my pov is more than half of the work needed to
> achieve gmetad reliability and redundancy while keeping it's current
> distributed design.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-13 Thread Spike Spiegel
On Fri, Dec 11, 2009 at 1:34 PM, Daniel Pocock  wrote:
> Thanks for sharing this - could you comment on the total number of RRDs per
> gmetad, and do you use rrdcached?

the largest colo has 140175 rrds and we use the tmpfs + cron hack, no rrdcached.

> I was thinking about gmetads attached to the same SAN, not a remote FS over
> IP.  In a SAN, each gmetad has a physical path to the disk (over fibre
> channel) and there are some filesystems (e.g. GFS) and locking systems (DLM)
> that would allow concurrent access to the raw devices.  If two gmetads mount
> the filesystem concurrently, you could tell one gmetad `stop monitoring
> cluster A, sync the RRDs' and then tell the other gmetad to start monitoring
> cluster A.
>
> DLM is quite a heavyweight locking system (cluster manager and heartbeat
> system required), some enterprises have solutions like Apache Zookeeper
> (Google has one called Chubby) and they can potentially allow the gmetad
> servers to agree on who is polling each cluster.

I see, and while I'm sure this solution works for many people and
might be popular in HPC environments, I'm not really keen on it being
something we'd want to go with ourselves; we tend to stick to a "share
nothing" design, which I realize has cons too, but as always it's a
matter of tradeoffs, and even an implementation of Paxos like Chubby is
no silver bullet.

The other thing is of course cost. SANs aren't free, and if I'm a
small gig but for some reason actually have a clue and recognize
the importance of instrumenting everything, I wouldn't want to be
forced to add shared storage just for the purpose of not losing
data.

>> I see two possible solutions:
>> 1. client caching
>> 2. built-in sync feature
>>
>> In 1. gmond would cache data locally if it could not contact the
>> remote end. This imho is the best solution because it helps not only
>> with head failures and maintenance, but possibly addresses a whole
>> bunch of other failure modes too.
>>
>
> The problem with that is that the XML is just a snapshot.  Maybe the XML
> could contain multiple values for each metric, e.g. all values since the
> last poll?  There would need to be some way of limiting memory usage too, so
> that an agent doesn't kill the machine if nothing is polling it.

indeed, OS resource usage for caching should be tightly controlled.
RRD does a pretty good job at that; for example, I know people who
use collectd (which supports multiple output streams) to send data
remotely and also keep a local copy with different retention policies
to solve that problem.

> This would be addressed by the use of SAN - there would only be one RRD
> file, and the gmetad servers would need to be in some agreement so that they
> both don't try to write the same file at the same time.

sure, but even with a SAN you'd have to add some intelligence to
gmetad, which from my pov is more than half of the work needed to
achieve gmetad reliability and redundancy while keeping its current
distributed design.


-- 
"Behind every great man there's a great backpack" - B.



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-11 Thread Daniel Pocock
Spike Spiegel wrote:
> On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock  wrote:
>   
>> One problem I've been wondering about recently is the scalability of
>> gmetad/rrdtool.
>> 
>
> [cut]
>
>   
>> In a particularly large organisation, moving around the RRD files as
>> clusters grow could become quite a chore.  Is anyone putting their RRD
>> files on shared storage and/or making other arrangements to load balance
>> between multiple gmetad servers, either for efficiency or fault tolerance?
>> 
>
> We do. We run 8 gmetad servers, 2 in each colo x 3 colos + 2 centrals
> and rrds are stored in ram disk on each node. Nodes are setup with
> unicast and data is sent to both heads in the same colo for fault
> tolerance/redundancy. This is all good until you have a gmetad failure
> or need to perform maintenance on one of the nodes because at that
> point as data stops flowing in you will have to rsync back once you're
> done from the other head and it doesn't matter how you do it (live
> rsync or stop the other head during the sync process) you will lose
> data. That said it could be easily argued that you have no guarantee
> that both heads have the same data to start with because messages are
> udp and there's no guarantee either node will have not lost some data
> the other hasn't. Of course there is a noticeable difference between a
> random message loss and a say 15 window blackout during maintenance,
> but then if your partitions are small enough a live rsync could
> possibly incur in a small enough loss... it really depends.
>   
Thanks for sharing this - could you comment on the total number of RRDs 
per gmetad, and do you use rrdcached?
> As to share storage we haven't tried but my personal experience is
> that given how a local filesystem can't manage that many small writes
> and seeks using any kind of remote FS isn't going to work.
>   
I was thinking about gmetads attached to the same SAN, not a remote FS 
over IP.  In a SAN, each gmetad has a physical path to the disk (over 
fibre channel) and there are some filesystems (e.g. GFS) and locking 
systems (DLM) that would allow concurrent access to the raw devices.  If 
two gmetads mount the filesystem concurrently, you could tell one gmetad 
`stop monitoring cluster A, sync the RRDs' and then tell the other 
gmetad to start monitoring cluster A.

DLM is quite a heavyweight locking system (cluster manager and heartbeat 
system required); some enterprises have solutions like Apache ZooKeeper 
(Google has one called Chubby), and these can potentially allow the gmetad 
servers to agree on who is polling each cluster.
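
For example, with something like ZooKeeper the "who polls cluster A" 
agreement reduces to holding a distributed lock; a sketch using the kazoo 
client library (paths, hostnames and the polling function are invented for 
illustration):

  from kazoo.client import KazooClient

  def poll_cluster_a_and_write_rrds():
      pass   # placeholder for the real polling / RRD-writing work

  zk = KazooClient(hosts="zk1.example.com:2181")
  zk.start()
  # Only one gmetad at a time can hold this lock; the second argument is
  # just an identifier shown when listing contenders.
  lock = zk.Lock("/ganglia/polling/cluster-A", "gmetad-1.example.com")
  with lock:
      poll_cluster_a_and_write_rrds()
  zk.stop()
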
> I see two possible solutions:
> 1. client caching
> 2. built-in sync feature
>
> In 1. gmond would cache data locally if it could not contact the
> remote end. This imho is the best solution because it helps not only
> with head failures and maintenance, but possibly addresses a whole
> bunch of other failure modes too.
>   
The problem with that is that the XML is just a snapshot.  Maybe the XML 
could contain multiple values for each metric, e.g. all values since the 
last poll?  There would need to be some way of limiting memory usage 
too, so that an agent doesn't kill the machine if nothing is polling it.
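
A bounded per-metric buffer would be enough for that memory cap; a small 
sketch (the names and the 360-sample limit are invented, and the deque 
simply drops the oldest sample once full):

  import collections, time

  MAX_SAMPLES = 360   # roughly an hour at a 10-second collection interval

  cache = collections.defaultdict(
      lambda: collections.deque(maxlen=MAX_SAMPLES))

  def record(metric, value):
      cache[metric].append((int(time.time()), value))

  def drain(metric):
      # Everything buffered since the last poll, oldest first; the XML
      # report would then emit one value element per entry instead of a
      # single snapshot value.
      samples = list(cache[metric])
      cache[metric].clear()
      return samples
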
> 2. instead would make gmetad aware of when it got data last and be
> able to ask another gmetad for its missing data and keep fetching
> until the delta (data loss) is small enough (user configured) that it
> can again receive data from clients. This is probably harder to
> implement and still would not guarantee no data loss, but I don't
> think that's a goal. The interesting property of this approach is that
> it'd open the door for realtime merge of data from multiple gmetads so
> that as long that at least one node has received a message a client
> wouldn't ever see a gap effectively providing no data loss. I'm toying
> with this solution in a personal non-ganglia related project as it's
> applicable to anything with data stored in rrd over multiple
> locations.
>   
This would be addressed by the use of a SAN - there would only be one RRD 
file, and the gmetad servers would need to be in some agreement so that 
they both don't try to write the same file at the same time.





Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-05 Thread Spike Spiegel
On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock  wrote:
> One problem I've been wondering about recently is the scalability of
> gmetad/rrdtool.

[cut]

> In a particularly large organisation, moving around the RRD files as
> clusters grow could become quite a chore.  Is anyone putting their RRD
> files on shared storage and/or making other arrangements to load balance
> between multiple gmetad servers, either for efficiency or fault tolerance?

We do. We run 8 gmetad servers, 2 in each colo x 3 colos + 2 centrals,
and rrds are stored on a ram disk on each node. Nodes are set up with
unicast and data is sent to both heads in the same colo for fault
tolerance/redundancy. This is all good until you have a gmetad failure
or need to perform maintenance on one of the nodes, because at that
point, as data stops flowing in, you will have to rsync back from the
other head once you're done, and it doesn't matter how you do it (live
rsync or stop the other head during the sync process), you will lose
data. That said, it could easily be argued that you have no guarantee
that both heads have the same data to start with, because messages are
udp and there's no guarantee either node will not have lost some data
the other hasn't. Of course there is a noticeable difference between
random message loss and a, say, 15-minute blackout window during
maintenance, but then if your partitions are small enough a live rsync
could possibly incur a small enough loss... it really depends.

As to shared storage, we haven't tried it, but my personal experience is
that, given how hard it is for even a local filesystem to manage that many
small writes and seeks, using any kind of remote FS isn't going to work.

I see two possible solutions:
1. client caching
2. built-in sync feature

In 1. gmond would cache data locally if it could not contact the
remote end. This imho is the best solution because it helps not only
with head failures and maintenance, but possibly addresses a whole
bunch of other failure modes too.
2. instead would make gmetad aware of when it last got data and be
able to ask another gmetad for its missing data, and keep fetching
until the delta (data loss) is small enough (user configured) that it
can again receive data from clients. This is probably harder to
implement and still would not guarantee no data loss, but I don't
think that's a goal. The interesting property of this approach is that
it'd open the door to realtime merging of data from multiple gmetads, so
that as long as at least one node has received a message a client
wouldn't ever see a gap, effectively providing no data loss. I'm toying
with this solution in a personal non-ganglia related project as it's
applicable to anything with data stored in rrd over multiple
locations.
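
the backfill half of 2. could be prototyped against the plain rrdtool CLI;
in this sketch the peer HTTP endpoint and its /fetch query are entirely
invented (today's port 8652 speaks the gmetad interactive protocol, not
HTTP), only the rrdtool commands are real:

  import subprocess, urllib.request

  def last_update(rrd_path):
      # "rrdtool lastupdate" ends with a "timestamp: value [...]" line.
      out = subprocess.check_output(["rrdtool", "lastupdate", rrd_path])
      return int(out.decode().splitlines()[-1].split(":")[0])

  def backfill(rrd_path, metric, peer="http://gmetad2.example.com:8652"):
      since = last_update(rrd_path)
      url = "%s/fetch?metric=%s&start=%d" % (peer, metric, since)
      # Peer is assumed to answer with plain "timestamp:value" lines.
      for line in urllib.request.urlopen(url).read().decode().splitlines():
          ts, value = line.split(":")
          if int(ts) > since:
              subprocess.check_call(["rrdtool", "update", rrd_path,
                                     "%s:%s" % (ts, value)])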

thanks

-- 
"Behind every great man there's a great backpack" - B.
