IMO, we need to make sure to decouple stats collection/updates from
display/visualization, so that people who want to write plugins for
existing monitoring systems like Nagios, Zenoss, Ganglia, or their
internal systems can do it without too much trouble. Providing plugins
for popular monitoring systems might be a better use of our time than
reinventing RRD graph support, IMHO.

On Wed, Apr 21, 2010 at 5:38 PM, Sanjit Jhala <sjha...@gmail.com> wrote:
> Here's a little more detail on the design, with emphasis on stats collection
> (as opposed to visualization).
>
> 1. Every minute (configurable) the monitoring stats thread wakes up and issues
> a "COMMAND_GET_STATISTICS" call (the protocol needs to be modified to include a
> reset flag).
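>
> As a rough sketch of that loop (the 60-second default and the
> issue_get_statistics() helper below are placeholders, not the actual
> Hypertable API):
>
>   #include <chrono>
>   #include <iostream>
>   #include <string>
>   #include <thread>
>   #include <vector>
>
>   // Hypothetical stand-in for the COMMAND_GET_STATISTICS RPC.
>   static void issue_get_statistics(const std::string &server, bool reset) {
>     std::cout << "GET_STATISTICS " << server << " reset=" << reset << "\n";
>   }
>
>   void monitoring_thread_loop(const std::vector<std::string> &servers,
>                               bool reset_each_poll) {
>     const auto interval = std::chrono::seconds(60);   // configurable in practice
>     for (;;) {
>       for (const auto &rs : servers)
>         issue_get_statistics(rs, reset_each_poll);    // reset flag per new protocol
>       std::this_thread::sleep_for(interval);
>     }
>   }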
>
> 2. The RangeServer (RS) gathers all the required stats and keeps the last
> observed values of the stats around.
>
> 3. The RS sends enum-tagged stats back to the Master. Enum tagging will allow
> us to easily add stats in the future. To avoid unnecessary network load we
> want the RS to send back as little information as possible, so for stats which
> have not changed or are zero the RS will not send back any value. Depending on
> the stat, a missing value will either be assumed to be zero (e.g. QPS,
> read/write rate, etc.) or will mean "no change from the last recorded value"
> (e.g. disk usage). The RangeServer sends back per-RS stats and per-range stats
> (including individual access group info for things like disk/mem usage, etc.).
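>
> To make the tagging and the omission rules concrete, a minimal sketch of the
> RS-side encoding (the stat ids and the rate/gauge split here are illustrative
> only, not the actual protocol):
>
>   #include <cstdint>
>   #include <map>
>   #include <vector>
>
>   // Illustrative stat ids; the real enum would live in a shared protocol header.
>   enum StatId : uint16_t { STAT_QPS = 0, STAT_WRITE_RATE = 1, STAT_DISK_USAGE = 2 };
>
>   struct TaggedStat { uint16_t id; uint64_t value; };
>
>   inline bool is_rate_stat(uint16_t id) {
>     return id == STAT_QPS || id == STAT_WRITE_RATE;
>   }
>
>   // Send only what the Master can't infer: rates are omitted when zero,
>   // gauges (e.g. disk usage) are omitted when unchanged since the last report.
>   std::vector<TaggedStat> encode_changed(const std::map<uint16_t, uint64_t> &current,
>                                          std::map<uint16_t, uint64_t> &last_sent) {
>     std::vector<TaggedStat> out;
>     for (const auto &kv : current) {
>       if (is_rate_stat(kv.first)) {
>         if (kv.second == 0) continue;              // Master assumes zero
>       } else {
>         auto it = last_sent.find(kv.first);
>         if (it != last_sent.end() && it->second == kv.second)
>           continue;                                // Master assumes last known value
>       }
>       out.push_back({kv.first, kv.second});
>       last_sent[kv.first] = kv.second;
>     }
>     return out;
>   }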
>
> 4. The Master constructs the entire set of stats, filling in missing values
> with zero or the last known value as appropriate (all zeros if an RS doesn't
> respond). The Master also aggregates per-table stats from the individual range
> stats reported by the RSs.
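>
> The Master-side fill-in is the mirror image of the encoding above; a sketch
> (reusing the illustrative TaggedStat/is_rate_stat types from the previous
> snippet, and leaving the per-table aggregation out):
>
>   // Rebuild the full stat set for one RS, applying the zero / last-known rules.
>   std::map<uint16_t, uint64_t> fill_missing(const std::vector<TaggedStat> &reported,
>                                             const std::map<uint16_t, uint64_t> &last_known,
>                                             const std::vector<uint16_t> &all_stat_ids,
>                                             bool server_responded) {
>     std::map<uint16_t, uint64_t> full, got;
>     for (const auto &s : reported)
>       got[s.id] = s.value;
>     for (uint16_t id : all_stat_ids) {
>       if (!server_responded) { full[id] = 0; continue; }      // dead RS => all zeros
>       auto it = got.find(id);
>       if (it != got.end())
>         full[id] = it->second;                                // value was reported
>       else if (is_rate_stat(id))
>         full[id] = 0;                                         // missing rate => zero
>       else {
>         auto lk = last_known.find(id);
>         full[id] = (lk != last_known.end()) ? lk->second : 0; // gauge => last value
>       }
>     }
>     return full;
>   }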
>
> 5. Per-RS stats: the Master stores RS stats in RRDTool with one RRD for each
> server and stat bundle. For example, say we start off tracking 5 stats; they
> would go under run/monitoring/$SERVER_ID/stats_0.rrd. If we then add a few
> more stats in a future release, they would go under
> run/monitoring/$SERVER_ID/stats_1.rrd. The front-end visualization tool would
> have to be modified to plot the new stats, but this way we allow for backward
> compatibility.
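>
> For illustration, one way the Master could map a stat bundle to its RRD file
> and push a sample into it; the helper names are made up, and this assumes
> librrd's rrd_update_r():
>
>   #include <ctime>
>   #include <string>
>   #include <rrd.h>
>
>   // Bundle 0 holds the original stats, bundle 1 the stats added in a later
>   // release, and so on, giving run/monitoring/$SERVER_ID/stats_N.rrd.
>   std::string bundle_rrd_path(const std::string &server_id, int bundle) {
>     return "run/monitoring/" + server_id + "/stats_" + std::to_string(bundle) + ".rrd";
>   }
>
>   bool update_bundle(const std::string &server_id, int bundle, time_t now,
>                      const std::string &values /* e.g. "123:45:6:0:789" */) {
>     std::string path = bundle_rrd_path(server_id, bundle);
>     std::string sample = std::to_string(now) + ":" + values;
>     const char *argv[] = { sample.c_str() };
>     return rrd_update_r(path.c_str(), nullptr /* default DS order */, 1, argv) == 0;
>   }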
>
> 6. Per-table stats: the Master stores the last 10 mins of per-table stats in
> memory and writes them out to disk under run/monitoring/table_stats. (Do this
> atomically by writing to a tmp file and then doing mv tmp_file table_stats.)
> We want to avoid storing per-table stats in separate RRDs since this would not
> scale well for a large number of tables. Also, keeping more complete sets of
> stats on the Master could be a problem for the same reason. We could develop
> some tool in the future to roll up this data efficiently.
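>
> A minimal sketch of that atomic publish (write to a tmp file, then rename over
> the final path; error handling kept short):
>
>   #include <cstdio>
>   #include <fstream>
>   #include <string>
>
>   bool write_table_stats_atomically(const std::string &contents) {
>     const std::string final_path = "run/monitoring/table_stats";
>     const std::string tmp_path = final_path + ".tmp";
>     {
>       std::ofstream out(tmp_path.c_str(), std::ios::trunc);
>       if (!out)
>         return false;
>       out << contents;
>       out.flush();
>       if (!out)
>         return false;
>     }
>     // rename() is atomic on POSIX filesystems, so readers of table_stats
>     // never observe a partially written file.
>     return std::rename(tmp_path.c_str(), final_path.c_str()) == 0;
>   }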
>
> 7. The Master spits out all raw stats (i.e. per-RS stats and per-table stats)
> into a log file which can be rolled using a tool like cronolog and pruned via
> a cron job. In the future we could have a tool which does something useful
> with this raw data.
>
> -Sanjit
>
> On Wed, Apr 21, 2010 at 5:24 PM, Sanjit Jhala <sjha...@gmail.com> wrote:
>>
>> Hi Joel,
>>
>> That's a good point. For the per-RangeServer stats we plan to use RRDTool
>> to store (and possibly graph) the stats. Looking into this a bit more, it
>> looks like RRDTool is perfect for multi-resolution sample storage. So we
>> could have:
>> res 0 (highest res): store for the last 24 hrs
>> res 1: last 7 days
>> res 2: last month
>> res 3: last 18 months?
>>
>> From the docs it looks like RRDTool is pretty convenient and flexible when
>> it comes to rolling up samples in this way. Also I think these values will
>> probably not be configurable since it seems like modifying an existing RRD
>> is non-trivial.
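>>
>> For example (purely illustrative: the DS names, row counts, and the use of
>> librrd's rrd_create_r() are assumptions, not a final schema), those four
>> resolutions could map onto RRAs roughly like this, with a 60-second step:
>>
>>   #include <ctime>
>>   #include <rrd.h>
>>
>>   int create_stats_rrd(const char *path) {
>>     const char *argv[] = {
>>       "DS:qps:GAUGE:120:0:U",          // one DS per stat in the bundle
>>       "DS:disk_usage:GAUGE:120:0:U",
>>       "RRA:AVERAGE:0.5:1:1440",        // res 0: 1-min samples, last 24 hours
>>       "RRA:AVERAGE:0.5:5:2016",        // res 1: 5-min averages, last 7 days
>>       "RRA:AVERAGE:0.5:30:1488",       // res 2: 30-min averages, last ~31 days
>>       "RRA:AVERAGE:0.5:360:2190"       // res 3: 6-hour averages, last ~18 months
>>     };
>>     return rrd_create_r(path, 60 /* step */, time(0) - 60, 6, argv);
>>   }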
>>
>> -Sanjit
>>
>>
>> On Wed, Apr 21, 2010 at 2:38 PM, Joel Pitt <joel.p...@gmail.com> wrote:
>>>
>>> On Thu, Apr 22, 2010 at 4:39 AM, Vincent <vjchar...@gmail.com> wrote:
>>> > How much history sounds reasonable? In an email with Sanjit he
>>> > mentioned the following.
>>> > "These can all be user specified properties with some reasonable
>>> > default values. My guess is high res data for last 24 hrs and lower
>>> > res data for the last 7 days are reasonable defaults."
>>>
>>> I think lower-res data should be available for at least a month by
>>> default, so that people have at least some chance of picking up on weekly
>>> trends. This of course depends on just how much data gets generated, but
>>> having enough so that auto-correlation methods can be used to pick up
>>> temporal patterns would be nice.
>>>
>>> -Joel