Re: [Ganglia-developers] gmetad and rrdtool scalability
On Thu, Dec 24, 2009 at 12:10:51PM +, Daniel Pocock wrote:
> Vladimir Vuksan wrote:
>> The issue is the value of this data. If these were financial
>> transactions then no loss would be acceptable, but these are not. They
>> are performance and trending data which get "averaged" down as time
>> goes by, so the loss of a couple of hours or even days of data is not
>> tragic.
>
> I agree - it doesn't have to be perfect.

Still, the current implementation has a way to go, and should most likely be extended for better data reliability, as long as it doesn't cost too much.

> To come back to my own requirement though, it is about horizontal
> scalability. Let's say you have a hypothetical big enterprise that has
> just decided to adopt Ganglia as a universal solution on every node in
> every data center globally, including subsidiary companies, etc.
>
> No one really wants to manually map individual servers to clusters and
> gmetad servers. They want plug-and-play.

The current federated model of gmetad helps slightly in that respect: you would expect each of the independent offices/units/datacenters to run one gmetad locally (as long as it is big enough to handle the load) to collect and aggregate data, plus one central gmetad that connects to all the leaves for the centralized view. Of course you can also have more than one gmetad (even one per cluster per location) and make the gmetad hierarchy tree a little larger.

> They just want to allocate some storage and gmetad hardware in each main
> data center, plug them in, and watch the graphs appear. If the CPU or
> IO load gets too high on some of the gmetad servers in a particular
> location, they want to re-distribute the load over the others in that
> location. When the IO load gets too high on all of the gmetads, they
> want to be able to scale horizontally - add an extra 1 or 2 gmetad
> servers and see the load distributed between them.
Horizontal scalability like this would be ideal, but again, the added complexity might be difficult to assimilate.

> Maybe this sounds a little bit like a Christmas wish-list, but does
> anyone else feel that this is a valid requirement? Imagine something
> even bigger - if a state or national government decided to deploy the
> gmond agent throughout all their departments in an effort to gather
> utilization data - would it scale? Would it be easy enough for a
> diverse range of IT departments to just plug it in?

With enough planning, and assuming the cluster tree is somehow balanced, it should work fine IMHO. But for very large clusters, or ones that span multiple locations and can't be split logically (clouds), you would soon run into scalability issues, including memory pressure in the gmond collectors.

> Carlo also made some comments about RDBMS instead of RRD. This raises a
> few discussion points:

I meant an RDBMS alongside RRDs. RRDs were specifically designed to allow efficient storage and summarization of metrics, which is what is needed most of the time. For special cases where you need to keep all data without any distortion for a long time, an ETL process feeding an RDBMS and some data warehouse is a better fit. The ETL could be as simple as scanning the RRDs periodically and importing the records into a database, but it would be nice if this could be done directly from gmetad by allowing for hooks at "write RRD" time. This was indeed one of the reasons why the python gmetad in trunk had a modular design, so that a module for doing that could be written if someone were interested in doing so.
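As a rough illustration of what such a "write RRD" hook module could look like: the plugin interface and method names below are hypothetical (they are not the actual trunk API), and sqlite3 stands in for whatever RDBMS the ETL would really target.

```python
import sqlite3
import time

class MetricArchiverPlugin:
    """Hypothetical gmetad write-hook module: whenever gmetad is about
    to write a metric sample to an RRD, it would also hand the sample to
    this plugin, which batches rows into a relational store (sqlite3
    here as a stand-in for a real RDBMS)."""

    def __init__(self, db_path=":memory:", batch_size=100):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS samples ("
            " cluster TEXT, host TEXT, metric TEXT,"
            " value REAL, ts INTEGER)")
        self.conn.commit()
        self.batch_size = batch_size
        self.pending = []

    def on_rrd_write(self, cluster, host, metric, value, ts=None):
        # Hypothetically called by gmetad at "write RRD" time.
        self.pending.append(
            (cluster, host, metric, value, ts or int(time.time())))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Batch the inserts so the RDBMS is not hit once per sample.
        self.conn.executemany(
            "INSERT INTO samples VALUES (?, ?, ?, ?, ?)", self.pending)
        self.conn.commit()
        self.pending = []
```

Batching is the point of doing this inside gmetad rather than as an external RRD-scanning ETL: the samples are already in memory at write time.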
Carlo

--
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
Re: [Ganglia-developers] gmetad and rrdtool scalability
Vladimir Vuksan wrote:
> On Mon, 21 Dec 2009, Spike Spiegel wrote:
>>> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1
>>
>> which unless you have a small amount of data or a fast network between
>> the two nodes won't complete before the next write is initiated,
>> meaning they won't be identical.
>
> Granted, they will never be identical. Even on the fastest networks
> there will be a window of data lost. On fast networks / smaller numbers
> of nodes it will be small; on bigger networks, a larger window, etc.
>
>> how do you tell which one has the most up to date data?
>
> This is in no respect an automatic process (even though if I really
> wanted to I could). Point a proxy to your primary node. If it fails,
> point to secondary or tertiary.
>
>> if you really mean "most recent" then both would, because both would
>> have fetched the last reading assuming they are both functional, but
>> gmetad1 would have a hole in its graphs. To me that does not really
>> count as up to date. Up to date would be the one with the most
>> complete data set, which you have no way to identify programmatically.
>>
>> Also, assume now gmetad2 fails and both have holes; which one is the
>> most up to date?
>
> That is up to you to decide. This is in no way perfect.
>
>> I guess it does if I look at it from your perspective, which if I
>> understood it correctly implies that:
>> * some data loss doesn't matter
>> * manual interaction to fix things is ok
>>
>> But that isn't my perspective. Scalable (distributed) applications
>> should be able to guarantee by design no data loss in as many cases as
>> possible and not force you into centralized designs or hackery in
>> order to do so.
>>
>> There are ways to make this possible without changes to the current
>> gmetad code by adding a helper webservice that proxies the access to
>> rrd. This way it's perfectly fine to have different locations with
>> different data, and the webservice will take care of interrogating one
>> or more gmetads/backends to retrieve the full set and present it to
>> the user. Fully distributed, no data loss. This could of course be
>> built into gmetad by making something like port 8652 access the rrds,
>> but to me that's the wrong path: it makes gmetad's code more
>> complicated, and it's potentially functionality that has nothing to do
>> with ganglia and is backend dependent.
>
> The issue is the value of this data. If these were financial
> transactions then no loss would be acceptable, but these are not. They
> are performance and trending data which get "averaged" down as time
> goes by, so the loss of a couple of hours or even days of data is not
> tragic.

I agree - it doesn't have to be perfect.

To come back to my own requirement though, it is about horizontal scalability. Let's say you have a hypothetical big enterprise that has just decided to adopt Ganglia as a universal solution on every node in every data center globally, including subsidiary companies, etc.

No one really wants to manually map individual servers to clusters and gmetad servers. They want plug-and-play. They just want to allocate some storage and gmetad hardware in each main data center, plug them in, and watch the graphs appear. If the CPU or IO load gets too high on some of the gmetad servers in a particular location, they want to re-distribute the load over the others in that location. When the IO load gets too high on all of the gmetads, they want to be able to scale horizontally - add an extra 1 or 2 gmetad servers and see the load distributed between them.

Maybe this sounds a little bit like a Christmas wish-list, but does anyone else feel that this is a valid requirement? Imagine something even bigger - if a state or national government decided to deploy the gmond agent throughout all their departments in an effort to gather utilization data - would it scale? Would it be easy enough for a diverse range of IT departments to just plug it in?

Carlo also made some comments about RDBMS instead of RRD. This raises a few discussion points:

a) An RDBMS shared by multiple gmetads could provide a suitable locking mechanism for each gmetad to show which clusters it is polling, thereby co-ordinating access to RRD files on a SAN. The list of clusters would be kept in a table, and if one gmetad could no longer poll a particular cluster (maybe to reduce IO load), it would lock the table, remove its name from that row, and unlock the table. Another gmetad could then lock the table, update the row with its name, and unlock again.

b) As for metric storage in RRD, I personally believe the RRD algorithm and API are quite appropriate for this type of data. The question then is: should gmetad write metrics directly to an RDBMS, or should rrdtool be modified to use an RDBMS as a back end? The whole RRD file structure could be represented in a series of database tables. The individual RRA regions of the file could be mapped to BLOBs, or the data samples could be s
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Mon, 21 Dec 2009, Spike Spiegel wrote:
>> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1
>
> which unless you have a small amount of data or a fast network between
> the two nodes won't complete before the next write is initiated,
> meaning they won't be identical.

Granted, they will never be identical. Even on the fastest networks there will be a window of data lost. On fast networks / smaller numbers of nodes it will be small; on bigger networks, a larger window, etc.

> how do you tell which one has the most up to date data?

This is in no respect an automatic process (even though if I really wanted to I could). Point a proxy to your primary node. If it fails, point to secondary or tertiary.

> if you really mean "most recent" then both would, because both would
> have fetched the last reading assuming they are both functional, but
> gmetad1 would have a hole in its graphs. To me that does not really
> count as up to date. Up to date would be the one with the most
> complete data set, which you have no way to identify programmatically.
>
> Also, assume now gmetad2 fails and both have holes; which one is the
> most up to date?

That is up to you to decide. This is in no way perfect.

> I guess it does if I look at it from your perspective, which if I
> understood it correctly implies that:
> * some data loss doesn't matter
> * manual interaction to fix things is ok
>
> But that isn't my perspective. Scalable (distributed) applications
> should be able to guarantee by design no data loss in as many cases as
> possible and not force you into centralized designs or hackery in order
> to do so.
>
> There are ways to make this possible without changes to the current
> gmetad code by adding a helper webservice that proxies the access to
> rrd. This way it's perfectly fine to have different locations with
> different data, and the webservice will take care of interrogating one
> or more gmetads/backends to retrieve the full set and present it to
> the user. Fully distributed, no data loss. This could of course be
> built into gmetad by making something like port 8652 access the rrds,
> but to me that's the wrong path: it makes gmetad's code more
> complicated, and it's potentially functionality that has nothing to do
> with ganglia and is backend dependent.

The issue is the value of this data. If these were financial transactions then no loss would be acceptable, but these are not. They are performance and trending data which get "averaged" down as time goes by, so the loss of a couple of hours or even days of data is not tragic.

I have also seen many projects where we tried to avoid a particular "edge" case and in the process introduced a whole lot of new issues that were worse than the problem we started with. To this point, I have run removespikes.pl on RRDs numerous times to remove spikes in Ganglia data, and in most cases it has worked, yet in a couple of cases it ended up corrupting RRD files so that they couldn't be used by gmetad. Therefore I can reasonably foresee something like that happening in your implementation. Also, I have seen bugs in the past (I remember a multicast bug we reported years ago) going unaddressed due to what I can only interpret as lack of resources. So if you weigh all the possibilities of things going wrong (and a lot can) against the resources available, I'd say you are asking for trouble.

Vladimir
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan wrote:
> If you lose a day or two or even a week of trending data that is not
> gonna be a disaster as long as that data is present somewhere else.

sure, but where? how would the ganglia frontend tell?

> Thus I proposed a simple solution where even if one of the gmetads
> (gmetad1) fails you can either
>
> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1

which unless you have a small amount of data or a fast network between the two nodes won't complete before the next write is initiated, meaning they won't be identical.

> b. Simply start up gmetad1 and don't worry about the lost data

sure

> As far as which data is going to be displayed you can do either
>
> 1. Proxy traffic to the Ganglia with the most up to date data

how do you tell which one has the most up to date data?

> 2. Change the DNS record to point to the Ganglia with the most up to
> date data

same question: which one has the most up to date data? if you really mean "most recent" then both would, because both would have fetched the last reading assuming they are both functional, but gmetad1 would have a hole in its graphs. To me that does not really count as up to date. Up to date would be the one with the most complete data set, which you have no way to identify programmatically.

Also, assume now gmetad2 fails and both have holes; which one is the most up to date?

> To your last point, there are chances that both gmetads fail in quick
> succession, however I would think that would be a highly unlikely event.

it doesn't have to be in quick succession to find yourself in a condition where you have holes in your data and no way to go back, which is my main point: as much as you can say that no-data-loss requirements aren't really a major concern for most people, the fact remains that with the current codebase you can't avoid that situation, which imho isn't right.

> If you had requirements for such flawless performance you should be
> able to invest resources to resolve it.

I'm sorry, but I don't see it. Even with plenty of resources you'd have to either put some heavy restrictions in place, like centralizing data on a SAN, which is not really something you'd want in a distributed setup, or add plenty of hacks to, say, replay the content of rrds to some other place, and even in this case it's pretty quirky.

> Makes sense?

I guess it does if I look at it from your perspective, which if I understood it correctly implies that:
* some data loss doesn't matter
* manual interaction to fix things is ok

But that isn't my perspective. Scalable (distributed) applications should be able to guarantee by design no data loss in as many cases as possible and not force you into centralized designs or hackery in order to do so.

There are ways to make this possible without changes to the current gmetad code by adding a helper webservice that proxies the access to rrd. This way it's perfectly fine to have different locations with different data, and the webservice will take care of interrogating one or more gmetads/backends to retrieve the full set and present it to the user. Fully distributed, no data loss. This could of course be built into gmetad by making something like port 8652 access the rrds, but to me that's the wrong path: it makes gmetad's code more complicated, and it's potentially functionality that has nothing to do with ganglia and is backend dependent.

thoughts?

--
"Behind every great man there's a great backpack" - B.
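The core of the proxy-webservice idea is a merge: query each gmetad backend for the same series and take the union of samples, so a hole in one backend's data is filled from another. A minimal sketch of that merge step, under the assumption (invented here) that each backend's series has already been fetched and parsed into a timestamp-to-value dict:

```python
def merge_series(*backends):
    """Merge one metric's series as fetched from several gmetad
    backends. Each backend is a dict mapping timestamp -> value; a
    backend that was down for a while simply has missing timestamps.
    The union is the most complete data set, with each hole filled
    from whichever backend has that sample."""
    merged = {}
    for series in backends:
        for ts, value in series.items():
            merged.setdefault(ts, value)  # first backend wins on overlap
    return dict(sorted(merged.items()))
```

With this approach, no single gmetad has to be "the most up to date": completeness is a property of the merged view, not of any one node.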
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Sun, Dec 20, 2009 at 04:02:36PM +, Spike Spiegel wrote:
> On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon wrote:
>>
>>> b) you can afford to have duplicate storage - if your storage
>>> requirements are huge (retaining a lot of historic data or lots of
>>> data at short polling intervals), you may not want to duplicate
>>> everything
>>
>> if you are planning to store a lot of historic data then you should
>> instead be using some sort of database, not RRDs, and so I think this
>> shouldn't be an issue unless you explode the RRAs and try to abuse the
>> RRDs as an RDBMS
>
> I think there's a middle ground here that'd be interesting to explore,
> altho that's a different thread, but for kicks this is the gist: the
> common pattern for rrd storage is hour/day/month/year and I've always
> found it bogus.

I am sure the defaults provided were not completely arbitrary (I think you missed week), but make sense based on the fact that they were the smallest time units of their kind and also that they fit the standard gmond polling rates, which wouldn't accommodate 1 min or 1 sec.

> In many cases I've needed higher resolution (down to
> the second) for the last 5-20 minutes, then intervals of an hr to a
> couple hrs, then a day to three days and then a week to 3 weeks etc
> etc, which increases your storage requirements, but is imho not an
> abuse of rrd and still retains the many advantages of rrd over having
> to maintain an RDBMS.

agreed, and the fact that it is not easy enough to do, or requires somewhat intrusive maintenance, is a bug, but it is still possible for the reasons you explain.

>> PS. I like the ideas on this thread, don't get me wrong, just that I
>> agree with Vladimir that gmetad and RRDtool are probably not the sweet
>> spot (cost wise) for scalability work even if I also agree that the
>> vertical scalability of gmetad is suboptimal to say the least.
>
> sort of. If you're looking at where your resources go to compute and
> deal with large amounts of data, I agree. If you look at what it costs
> you, or whether it's even possible, to create a fully scalable and
> resilient ganglia based monitoring infrastructure, I disagree.

not sure what part you are quoting here, but I have the feeling we probably agree ;)

putting on my ganglia developer hat: I dislike the fact that gmetad can't scale horizontally like all well designed applications should, but the fact that there is no solution for it to do so yet means that the complexity involved in making that change is probably not worth it in most (if not all) cases, considering that hardware (to the levels needed most of the time) is cheap anyway. I really hope there is no one out there running gmetad on some big iron solution when a decent "PC" box with enough memory would do mostly fine.

there are problems as well with the way federation currently works, which requires more network bandwidth and CPU than should really be needed, and which I would guess we should tackle first, especially considering the increase in the XML sizes with 3.1 (which has also been worked around), but for that (putting on my ganglia user hat) I would assume most big installations will stick with 3.0 anyway for now.

Carlo
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Sun, Dec 20, 2009 at 11:02, Spike Spiegel wrote:
> On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon wrote:
>>> a) you are only concerned with redundancy and not looking for
>>> scalability - when I say scalability, I refer to the idea of maybe 3
>>> or more gmetads running in parallel collecting data from huge numbers
>>> of agents
>>
>> what is the bottleneck here? CPUs for polling, or IO? if IO, using
>> memory would most likely be all you really need (especially
>> considering RAM is really cheap and RRDs are very small); if CPUs,
>> then there might be some things we can do to help with that. but
>> vertical scalability is what gmetad has, and that usually means going
>> to a bigger box if you hit the limit on the current one.
>
> Ime cpu isn't really a problem, the big load is I/O, and indeed moving
> the rrds to a ramdisk is the most common solution, with pretty decent
> results.

I concur, for the moment. ;-) If gmetad takes on more duties, in terms of more sophisticated interactive access, built-in trickery for improving disk IO, etc., then CPU could become an issue. However, that's a really big "if," and a problem for the future.

> I think there's a middle ground here that'd be interesting to explore,
> altho that's a different thread, but for kicks this is the gist: the
> common pattern for rrd storage is hour/day/month/year and I've always
> found it bogus. In many cases I've needed higher resolution (down to
> the second) for the last 5-20 minutes, then intervals of an hr to a
> couple hrs, then a day to three days and then a week to 3 weeks etc
> etc, which increases your storage requirements, but is imho not an
> abuse of rrd and still retains the many advantages of rrd over having
> to maintain an RDBMS.

The d/w/m/y split is a good *starting point*. Ganglia needs to ship with some sort of sensible default configuration that essentially works for many/most people. You (singular or plural) are free to customize your RRD configuration as policy and storage capacity require and permit; Ganglia officially supports this via the RRAs config line in gmetad.conf. In an ideal world you would keep all data, at the highest resolution, forever, but that usually isn't practical.

--
Jesse Becker
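For reference, a retention policy along the lines Spike describes could be expressed through the RRAs setting in gmetad.conf. The values below are illustrative only (they assume a 15-second polling step; check that your gmetad version supports this directive):

```
# Illustrative custom RRAs for gmetad.conf, assuming a 15s step:
# ~20 min at full resolution, ~3 hrs at 1 min, ~3 days at 10 min,
# ~3 weeks at 1 hr, ~1 year at 6 hrs.
RRAs "RRA:AVERAGE:0.5:1:80" "RRA:AVERAGE:0.5:4:180" "RRA:AVERAGE:0.5:40:432" "RRA:AVERAGE:0.5:240:504" "RRA:AVERAGE:0.5:1440:1464"
```

Each RRA is `RRA:AVERAGE:xff:steps:rows`, so coverage is step × steps × rows; e.g. the second archive above keeps 180 one-minute averages, i.e. three hours.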
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Sun, Dec 20, 2009 at 10:49, Spike Spiegel wrote:
> On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan wrote:
>> I think you guys are complicating things much :-). Can't you simply
>> have multiple gmetads in different sites poll a single gmond? That way
>> if one gmetad fails, data is still available and updated on the other
>> gmetads. That is what we used to do.
>
> Would you mind explaining to me why having multiple gmetads in
> different colos pulling from the same gmond is simpler than the
> infrastructure I

Conceptually, it may be simpler since the two gmetad instances can be considered 100% independent of each other; they just happen to have the same polling targets. There's no need, under normal circumstances, for the two installs to deal with each other.

The catch is when there is a failure and you need to bring back one of the two instances. You trade simplicity in up-front configuration for complexity during the recovery.

(Not trying to speak for Vladimir, just tossing in a few comments of my own.) I do not claim that this is the proper solution for all places, but it is *a* solution that is good enough for some.

--
Jesse Becker
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon wrote:
>> a) you are only concerned with redundancy and not looking for
>> scalability - when I say scalability, I refer to the idea of maybe 3
>> or more gmetads running in parallel collecting data from huge numbers
>> of agents
>
> what is the bottleneck here? CPUs for polling, or IO? if IO, using
> memory would most likely be all you really need (especially considering
> RAM is really cheap and RRDs are very small); if CPUs, then there might
> be some things we can do to help with that. but vertical scalability is
> what gmetad has, and that usually means going to a bigger box if you
> hit the limit on the current one.

Ime cpu isn't really a problem, the big load is I/O, and indeed moving the rrds to a ramdisk is the most common solution, with pretty decent results.

>> b) you can afford to have duplicate storage - if your storage
>> requirements are huge (retaining a lot of historic data or lots of
>> data at short polling intervals), you may not want to duplicate
>> everything
>
> if you are planning to store a lot of historic data then you should
> instead be using some sort of database, not RRDs, and so I think this
> shouldn't be an issue unless you explode the RRAs and try to abuse the
> RRDs as an RDBMS

I think there's a middle ground here that'd be interesting to explore, altho that's a different thread, but for kicks this is the gist: the common pattern for rrd storage is hour/day/month/year and I've always found it bogus. In many cases I've needed higher resolution (down to the second) for the last 5-20 minutes, then intervals of an hr to a couple hrs, then a day to three days and then a week to 3 weeks etc etc, which increases your storage requirements, but is imho not an abuse of rrd and still retains the many advantages of rrd over having to maintain an RDBMS.

> Carlo
>
> PS. I like the ideas on this thread, don't get me wrong, just that I
> agree with Vladimir that gmetad and RRDtool are probably not the sweet
> spot (cost wise) for scalability work even if I also agree that the
> vertical scalability of gmetad is suboptimal to say the least.

sort of. If you're looking at where your resources go to compute and deal with large amounts of data, I agree. If you look at what it costs you, or whether it's even possible, to create a fully scalable and resilient ganglia based monitoring infrastructure, I disagree.

--
"Behind every great man there's a great backpack" - B.
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan wrote:
> I think you guys are complicating things much :-). Can't you simply
> have multiple gmetads in different sites poll a single gmond? That way
> if one gmetad fails, data is still available and updated on the other
> gmetads. That is what we used to do.

Would you mind explaining to me why having multiple gmetads in different colos pulling from the same gmond is simpler than the infrastructure I presented in my post?

Furthermore, could you please show me how your simpler solution addresses the problem of bringing back up the gmetad that failed, such that both gmetads would have the same data? And if that's not what you had in mind, what's your strategy? Which data is going to be displayed to the user? And what if the first gmetad, the one that didn't fail, now fails while the restored one continues working?

thanks for your clarifications.

--
"Behind every great man there's a great backpack" - B.
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Mon, Dec 14, 2009 at 09:26:01AM +, Daniel Pocock wrote:
> Vladimir Vuksan wrote:
>> I think you guys are complicating things much :-). Can't you simply
>> have multiple gmetads in different sites poll a single gmond? That way
>> if one gmetad fails, data is still available and updated on the other
>> gmetads. That is what we used to do.
>
> That is a good solution under two conditions:
>
> a) you are only concerned with redundancy and not looking for
> scalability - when I say scalability, I refer to the idea of maybe 3 or
> more gmetads running in parallel collecting data from huge numbers of
> agents

what is the bottleneck here? CPUs for polling, or IO? if IO, using memory would most likely be all you really need (especially considering RAM is really cheap and RRDs are very small); if CPUs, then there might be some things we can do to help with that. but vertical scalability is what gmetad has, and that usually means going to a bigger box if you hit the limit on the current one.

> b) you can afford to have duplicate storage - if your storage
> requirements are huge (retaining a lot of historic data or lots of data
> at short polling intervals), you may not want to duplicate everything

if you are planning to store a lot of historic data then you should instead be using some sort of database, not RRDs, and so I think this shouldn't be an issue unless you explode the RRAs and try to abuse the RRDs as an RDBMS.

of course that means you have to add a process to gather your metric data out of the RRDs and into your RDBMS to begin with, but there shouldn't be a need to be concerned with RRD storage size when you are most likely going to be spending a lot more on that RDBMS storage (including snapshots and mirrors and all those things that make DBAs feel warm inside, regardless of budget).

Carlo

PS. I like the ideas on this thread, don't get me wrong, just that I agree with Vladimir that gmetad and RRDtool are probably not the sweet spot (cost wise) for scalability work even if I also agree that the vertical scalability of gmetad is suboptimal to say the least.

--
Return on Information: Google Enterprise Search pays you back
Get the facts. http://p.sf.net/sfu/google-dev2dev
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
Re: [Ganglia-developers] gmetad and rrdtool scalability
Vladimir Vuksan wrote:
> I think you guys are complicating things much :-). Can't you simply
> have multiple gmetads in different sites poll a single gmond? That way
> if one gmetad fails, data is still available and updated on the other
> gmetads. That is what we used to do.

That is a good solution under two conditions:

a) you are only concerned with redundancy and not looking for scalability - when I say scalability, I refer to the idea of maybe 3 or more gmetads running in parallel collecting data from huge numbers of agents

b) you can afford to have duplicate storage - if your storage requirements are huge (retaining a lot of historic data or lots of data at short polling intervals), you may not want to duplicate everything
Re: [Ganglia-developers] gmetad and rrdtool scalability
I think you guys are complicating much :-). Can't you simply have multiple gmetads in different sites poll a single gmond? That way if one gmetad fails, data is still available and updated on the other gmetads. That is what we used to do.

Vladimir

On Sun, 13 Dec 2009, Spike Spiegel wrote:
> indeed, os resource usage for caching should be tightly controlled.
> RRD does a pretty good job at that, and for example I know people that
> use collectd (which supports multiple output streams) and send data
> both remotely and keep a local copy with different retention policies
> to solve that problem.
>
>> This would be addressed by the use of SAN - there would only be one RRD
>> file, and the gmetad servers would need to be in some agreement so that they
>> both don't try to write the same file at the same time.
>
> sure, but even with a SAN you'd have to add some intelligence to
> gmetad, which from my pov is more than half of the work needed to
> achieve gmetad reliability and redundancy while keeping its current
> distributed design.
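The collectd setup Spike alludes to (one daemon writing a local copy and streaming the same metrics remotely) looks roughly like this. This is only a sketch: hostnames are placeholders, and the local retention would be tuned via the rrdtool plugin's RRA options:

```
# /etc/collectd.conf -- sketch of the dual-output setup Spike describes.
LoadPlugin rrdtool
LoadPlugin network

# Output stream 1: local RRDs, with whatever retention the RRAs define.
<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd"
</Plugin>

# Output stream 2: forward the same metrics to remote collectors.
<Plugin network>
  Server "metrics-head1.example.com" "25826"
  Server "metrics-head2.example.com" "25826"
</Plugin>
```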
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Fri, Dec 11, 2009 at 1:34 PM, Daniel Pocock wrote:
> Thanks for sharing this - could you comment on the total number of RRDs per
> gmetad, and do you use rrdcached?

the largest colo has 140175 rrds and we use the tmpfs + cron hack, no rrdcached.

> I was thinking about gmetads attached to the same SAN, not a remote FS over
> IP. In a SAN, each gmetad has a physical path to the disk (over fibre
> channel) and there are some filesystems (e.g. GFS) and locking systems (DLM)
> that would allow concurrent access to the raw devices. If two gmetads mount
> the filesystem concurrently, you could tell one gmetad `stop monitoring
> cluster A, sync the RRDs' and then tell the other gmetad to start monitoring
> cluster A.
>
> DLM is quite a heavyweight locking system (cluster manager and heartbeat
> system required), some enterprises have solutions like Apache ZooKeeper
> (Google has one called Chubby) and they can potentially allow the gmetad
> servers to agree on who is polling each cluster.

I see, and while I'm sure this solution works for many people and might be popular in HPC environments, I'm not really keen on it being something we'd want to go with ourselves. We tend to stick to a "share nothing" design, which I realize has cons too, but as always it's a matter of tradeoffs, and even an implementation of Paxos like Chubby is no silver bullet.

The other thing is of course cost. SANs aren't free, and if I'm a small shop, but for some reason I actually have a clue and recognize the importance of instrumenting everything, I wouldn't want to be forced to add shared storage just for the purpose of not losing data.

>> I see two possible solutions:
>> 1. client caching
>> 2. built-in sync feature
>>
>> In 1. gmond would cache data locally if it could not contact the
>> remote end. This imho is the best solution because it helps not only
>> with head failures and maintenance, but possibly addresses a whole
>> bunch of other failure modes too.
>
> The problem with that is that the XML is just a snapshot. Maybe the XML
> could contain multiple values for each metric, e.g. all values since the
> last poll? There would need to be some way of limiting memory usage too, so
> that an agent doesn't kill the machine if nothing is polling it.

indeed, os resource usage for caching should be tightly controlled. RRD does a pretty good job at that, and for example I know people that use collectd (which supports multiple output streams) and send data both remotely and keep a local copy with different retention policies to solve that problem.

> This would be addressed by the use of SAN - there would only be one RRD
> file, and the gmetad servers would need to be in some agreement so that they
> both don't try to write the same file at the same time.

sure, but even with a SAN you'd have to add some intelligence to gmetad, which from my pov is more than half of the work needed to achieve gmetad reliability and redundancy while keeping its current distributed design.

--
"Behind every great man there's a great backpack" - B.
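The "tmpfs + cron hack" is not spelled out in the thread; a common shape for it (the paths, tmpfs size and schedule below are assumptions, not Spike's actual setup) is to keep the live RRDs on a RAM-backed filesystem and periodically flush them to disk:

```
# /etc/fstab -- keep the hot RRDs on tmpfs so the many small
# writes and seeks hit RAM instead of disk (size is an assumption):
#   tmpfs  /var/lib/ganglia/rrds  tmpfs  size=4g  0 0

# crontab -- flush the RAM copy to persistent storage every 10 minutes,
# and restore it at boot before gmetad starts:
*/10 * * * *  rsync -a /var/lib/ganglia/rrds/ /var/lib/ganglia/rrds-disk/
@reboot       rsync -a /var/lib/ganglia/rrds-disk/ /var/lib/ganglia/rrds/
```

The tradeoff is the one this thread keeps circling: a crash loses up to one flush interval of data.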
Re: [Ganglia-developers] gmetad and rrdtool scalability
Spike Spiegel wrote:
> On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock wrote:
>
>> One problem I've been wondering about recently is the scalability of
>> gmetad/rrdtool.
>
> [cut]
>
>> In a particularly large organisation, moving around the RRD files as
>> clusters grow could become quite a chore. Is anyone putting their RRD
>> files on shared storage and/or making other arrangements to load balance
>> between multiple gmetad servers, either for efficiency or fault tolerance?
>
> We do. We run 8 gmetad servers, 2 in each colo x 3 colos + 2 centrals,
> and rrds are stored in a ram disk on each node. Nodes are set up with
> unicast, and data is sent to both heads in the same colo for fault
> tolerance/redundancy. This is all good until you have a gmetad failure
> or need to perform maintenance on one of the nodes, because at that
> point, as data stops flowing in, you will have to rsync back from the
> other head once you're done, and it doesn't matter how you do it (live
> rsync, or stopping the other head during the sync process), you will
> lose data. That said, it could easily be argued that you have no
> guarantee that both heads have the same data to start with, because
> messages are udp and there's no guarantee either node won't have lost
> some data the other hasn't. Of course there is a noticeable difference
> between random message loss and a, say, 15-minute blackout window during
> maintenance, but then if your partitions are small enough, a live rsync
> could possibly incur a small enough loss... it really depends.

Thanks for sharing this - could you comment on the total number of RRDs per gmetad, and do you use rrdcached?

> As to shared storage, we haven't tried it, but my personal experience is
> that given that even a local filesystem can't manage that many small
> writes and seeks, using any kind of remote FS isn't going to work.

I was thinking about gmetads attached to the same SAN, not a remote FS over IP.
In a SAN, each gmetad has a physical path to the disk (over fibre channel) and there are some filesystems (e.g. GFS) and locking systems (DLM) that would allow concurrent access to the raw devices. If two gmetads mount the filesystem concurrently, you could tell one gmetad `stop monitoring cluster A, sync the RRDs' and then tell the other gmetad to start monitoring cluster A.

DLM is quite a heavyweight locking system (a cluster manager and heartbeat system are required); some enterprises have solutions like Apache ZooKeeper (Google has one called Chubby) and they could potentially allow the gmetad servers to agree on who is polling each cluster.

> I see two possible solutions:
> 1. client caching
> 2. built-in sync feature
>
> In 1. gmond would cache data locally if it could not contact the
> remote end. This imho is the best solution because it helps not only
> with head failures and maintenance, but possibly addresses a whole
> bunch of other failure modes too.

The problem with that is that the XML is just a snapshot. Maybe the XML could contain multiple values for each metric, e.g. all values since the last poll? There would need to be some way of limiting memory usage too, so that an agent doesn't kill the machine if nothing is polling it.

> 2. instead would make gmetad aware of when it got data last and be
> able to ask another gmetad for its missing data, and keep fetching
> until the delta (data loss) is small enough (user configured) that it
> can again receive data from clients. This is probably harder to
> implement and still would not guarantee no data loss, but I don't
> think that's a goal. The interesting property of this approach is that
> it'd open the door for realtime merge of data from multiple gmetads,
> so that as long as at least one node has received a message, a client
> wouldn't ever see a gap, effectively providing no data loss. I'm toying
> with this solution in a personal non-ganglia related project, as it's
> applicable to anything with data stored in rrd over multiple
> locations.

This would be addressed by the use of SAN - there would only be one RRD file, and the gmetad servers would need to be in some agreement so that they both don't try to write the same file at the same time.
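Daniel's worry about an agent caching unbounded history has a standard fix: a fixed-size ring buffer per metric, so memory stays capped no matter how long nothing polls. A sketch of what such a cache inside gmond could look like (the class name and the size limit are hypothetical, not Ganglia code):

```python
from collections import deque
import time

class MetricCache:
    """Per-metric ring buffer: keeps at most `maxlen` (timestamp, value)
    samples, silently discarding the oldest when nothing drains it."""

    def __init__(self, maxlen=512):  # the cap is an assumed tunable
        self.maxlen = maxlen
        self.buffers = {}

    def record(self, metric, value, ts=None):
        # deque(maxlen=N) drops the oldest entry once full, which is
        # exactly the "don't kill the machine" property wanted here.
        buf = self.buffers.setdefault(metric, deque(maxlen=self.maxlen))
        buf.append((ts if ts is not None else time.time(), value))

    def drain(self, metric):
        """Return and clear everything cached since the last poll."""
        buf = self.buffers.get(metric, deque())
        samples = list(buf)
        buf.clear()
        return samples

cache = MetricCache(maxlen=3)
for i in range(5):               # 5 samples recorded, only the last 3 kept
    cache.record("cpu_user", i, ts=i)
print([v for _, v in cache.drain("cpu_user")])   # -> [2, 3, 4]
```

A poller would then emit each drained sample with its own timestamp, which is the "multiple values per metric in the XML" idea in miniature.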
Re: [Ganglia-developers] gmetad and rrdtool scalability
On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock wrote:
> One problem I've been wondering about recently is the scalability of
> gmetad/rrdtool.

[cut]

> In a particularly large organisation, moving around the RRD files as
> clusters grow could become quite a chore. Is anyone putting their RRD
> files on shared storage and/or making other arrangements to load balance
> between multiple gmetad servers, either for efficiency or fault tolerance?

We do. We run 8 gmetad servers, 2 in each colo x 3 colos + 2 centrals, and rrds are stored in a ram disk on each node. Nodes are set up with unicast, and data is sent to both heads in the same colo for fault tolerance/redundancy. This is all good until you have a gmetad failure or need to perform maintenance on one of the nodes, because at that point, as data stops flowing in, you will have to rsync back from the other head once you're done, and it doesn't matter how you do it (live rsync, or stopping the other head during the sync process), you will lose data. That said, it could easily be argued that you have no guarantee that both heads have the same data to start with, because messages are udp and there's no guarantee either node won't have lost some data the other hasn't. Of course there is a noticeable difference between random message loss and a, say, 15-minute blackout window during maintenance, but then if your partitions are small enough, a live rsync could possibly incur a small enough loss... it really depends.

As to shared storage, we haven't tried it, but my personal experience is that given that even a local filesystem can't manage that many small writes and seeks, using any kind of remote FS isn't going to work.

I see two possible solutions:
1. client caching
2. built-in sync feature

In 1. gmond would cache data locally if it could not contact the remote end. This imho is the best solution because it helps not only with head failures and maintenance, but possibly addresses a whole bunch of other failure modes too.

2. instead would make gmetad aware of when it got data last and be able to ask another gmetad for its missing data, and keep fetching until the delta (data loss) is small enough (user configured) that it can again receive data from clients. This is probably harder to implement and still would not guarantee no data loss, but I don't think that's a goal. The interesting property of this approach is that it'd open the door for realtime merge of data from multiple gmetads, so that as long as at least one node has received a message, a client wouldn't ever see a gap, effectively providing no data loss. I'm toying with this solution in a personal non-ganglia related project, as it's applicable to anything with data stored in rrd over multiple locations.

thanks

--
"Behind every great man there's a great backpack" - B.
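The "realtime merge" half of option 2 reduces to: for each timestamp, serve a value if any head received one, so gaps remain only where every head missed the sample. A toy sketch of that merge, with each head's series as a timestamp-to-value dict (the function and data are illustrative; real rrd series, step alignment and consolidation are messier):

```python
def merge_heads(*series):
    """Merge per-head time series: for each timestamp, take the first
    head that actually received a sample (None = lost/udp-dropped).
    A gap survives only if *every* head missed that sample."""
    merged = {}
    for ts in sorted(set().union(*(s.keys() for s in series))):
        merged[ts] = next((s[ts] for s in series
                           if s.get(ts) is not None), None)
    return merged

# head_a missed ts=20 (maintenance), head_b dropped ts=30 (udp loss):
head_a = {10: 0.5, 20: None, 30: 0.7}
head_b = {10: 0.5, 20: 0.6, 30: None}
print(merge_heads(head_a, head_b))   # -> {10: 0.5, 20: 0.6, 30: 0.7}
```

The sync half of the idea would then just be running this merge between a recovering gmetad's local data and a peer's fetched data until the remaining delta is acceptably small.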