Re: GSOC: Monitor Improvements

Miguel Pereira Wed, 24 Jul 2013 07:03:48 -0700

Mike, this might be what you are referring to, maybe not, for time series
visualization.


http://square.github.io/cubism/

Also, I found jmxtrans to be useful when writing metrics to ganglia /
graphite.

Cheers,
Miguel


On Mon, Apr 22, 2013 at 1:50 PM, Keith Turner <[email protected]> wrote:

> On Mon, Apr 22, 2013 at 12:42 PM, Supun Kamburugamuva <[email protected]
> >wrote:
>
> > Great.. we could certainly introduce the graph Mike and Keith have
> > mentioned.
> >
>
> I mentioned that it would be useful to display info collected from clients.
>  Tracing already collects this info.  The graph Mike mentioned may be
> useful for displaying trace info, maybe a plot per a trace field.
>
>
> >
> > Supun..
> >
> >
> > On Mon, Apr 22, 2013 at 12:02 PM, Keith Turner <[email protected]> wrote:
> >
> > > On Mon, Apr 22, 2013 at 11:42 AM, Mike Drob <[email protected]> wrote:
> > >
> > > > Adding on to the comment about summaries, averages, and outliers. If,
> > for
> > > > some reason, you end up with a two-hump population, then simply
> showing
> > > > averages will mask the split and lose a lot of valuable information.
> It
> > > is
> > > > often valuable to know that a particular set of users or servers are
> > > > experiencing degraded performance while the rest of the ecosystem is
> > > > healthy.
> > > >
> > > > This isn't something that shows up in a regular time series because
> the
> > > > secondary population is usually very small compared to the total
> > > > population. There was a graph for request latency of a service that I
> > saw
> > > > once that I really wish I could find again, maybe somebody on the
> list
> > > will
> > > > be able to chime in - It had timestamps on the x-axis, latency on the
> > y,
> > > > and each (x,y) point was colored on a gradient representing how many
> > > > > requests were fulfilled at time x with latency y. This chart made it
> > > > immediately easy to see that most data points fit a normal
> distribution
> > > > with a low mean, but there was also a cluster at the top for some
> > reason.
> > > >
> > >
> > >
> > > That sounds really cool.  Maybe the y-axis/latency could be log scale.
> > > Inevitably a 3004 second operation will finish and obscure the
> > > smaller latencies.
> > >
> > > > Sometimes it's more useful to sample this type of info from the clients
> > > rather than tablet servers.   A tablet server may report low latencies,
> > but
> > > > all clients using it may experience high latencies because of a network
> > issue.
> > >   We could certainly consider making the client code report this info.
> > >
> > >
> > > >
> > > > I'd love to see that type of chart show up for tablet servers
> (probably
> > > not
> > > > as useful for tables).
> > > >
> > > > Mike
> > > >
> > > >
> > > > On Mon, Apr 22, 2013 at 9:05 AM, Eric Newton <[email protected]>
> > > > wrote:
> > > >
> > > > > Another thing to consider is scale.  On large clusters (many
> hundreds
> > > of
> > > > > nodes), more data is not helpful for visualization.  Instead,
> > > summaries,
> > > > > averages and outliers are important.
> > > > >
> > > > > For example, if one node is consistently slow, it is better to know
> > > that
> > > > > than to see one graph with low numbers in a sea of graphs.
> > > > >
> > > > > If the monitor collects information using JMX, collection time for
> > each
> > > > > node would be a good thing to know, too.
> > > > >
> > > > > -Eric
> > > > >
> > > > >
> > > > > On Sun, Apr 21, 2013 at 10:00 PM, Josh Elser <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > Supun,
> > > > > >
> > > > > > Yup, very much so. Having a way to consume any and all metrics
> via
> > > JMX
> > > > > > would simplify things for any consumers (internal or external).
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 04/21/2013 02:15 PM, Supun Kamburugamuva wrote:
> > > > > >
> > > > > >> Hi Josh,
> > > > > >>
> > > > > > >> Thanks for the suggestions. I'll incorporate these into the
> > > > > > >> proposal.
> > > > > >>
> > > > > >> Another area I would like to work is on JMX. There is a Jira
> that
> > > says
> > > > > to
> > > > > >> replace the Monitor calls from Thrift to JMX (Accumulo 694). Do
> > you
> > > > > think
> > > > > >> this is a good addition to the Monitor?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Supun..
> > > > > >>
> > > > > >>
> > > > > >> On Sun, Apr 21, 2013 at 1:45 PM, Josh Elser <
> [email protected]
> > >
> > > > > wrote:
> > > > > >>
> > > > > >>  Supun,
> > > > > >>>
> > > > > >>> Looks good! Can I make some suggestions/comments?
> > > > > >>>
> > > > > >>> For: "Per table plots: ACCUMULO-594", I'd also like to see
> minor
> > > > > >>> compactions, major compactions, index cache hit rate, and data
> > > cache
> > > > > hit
> > > > > >>> rate per table (same graphs that are displayed system-wide when
> > you
> > > > > visit
> > > > > > >>> http://${MONITOR_HOST}:50095/).
> > > > > >>>
> > > > > >>> For "Per tablet [server] plots", it would be neat if you could
> > also
> > > > > >>> extract some general statistics like top N least performing,
> top
> > N
> > > > > >>> highest
> > > > > >>> performing, etc. tablet servers. Ideally, this could correlate
> > with
> > > > > >>> servers
> > > > > >>> that may be having problems :).
> > > > > >>>
> > > > > >>> Do you see these proposed changes as being sufficient for 3-4
> > > months
> > > > of
> > > > > >>> 40hrs/week work? If you plan to really dig into these changes
> > > > (perhaps
> > > > > >>> reworking components of the monitor itself), I could perhaps
> see
> > > > this.
> > > > > Do
> > > > > >>> you have any ideas for more lofty goals that you could pursue
> as
> > > > well?
> > > > > I
> > > > > >>> don't want you/us to get one month into things and see you
> > complete
> > > > > >>> everything we initially planned to accomplish :)
> > > > > >>>
> > > > > >>> - Josh
> > > > > >>>
> > > > > >>>
> > > > > >>> On 04/21/2013 10:37 AM, Supun Kamburugamuva wrote:
> > > > > >>>
> > > > > >>>  Hi all,
> > > > > >>>>
> > > > > >>>> I would like to start writing the proposal for the GSoc. I've
> > put
> > > > > >>>> together
> > > > > >>>> some initial high level goals of the project. Please let me
> know
> > > > what
> > > > > I
> > > > > >>>> can
> > > > > >>>> improve.
> > > > > >>>>
> > > > > >>>> Per table plots: Accumulo 594
> > > > > >>>> ---------------------
> > > > > >>>>
> > > > > > >>>> The goal of this is to display plots that explain the various
> > > > > > >>>> activities that happen per table. When we go to the tables page
> > > > > > >>>> of the monitor and go to a specific table, it displays some
> > > > > > >>>> information in a table format. We can augment this information
> > > > > > >>>> by showing graphs for
> > > > > >>>>
> > > > > >>>> 1. Ingest entries
> > > > > >>>> 2. Ingest data size
> > > > > >>>> 3. Scan entries
> > > > > >>>> 4. Scan data size
> > > > > >>>>
> > > > > >>>> Per tablet plots
> > > > > >>>> ----------------------
> > > > > >>>>
> > > > > >>>> Same as in the table plots we can display information
> regarding
> > > > tablet
> > > > > >>>> servers in the tablet server page. The plots will display the
> > same
> > > > > >>>> information as table plots considering data per tablet server.
> > > > > >>>>
> > > > > >>>> Trace Visualization: Accumulo 1198
> > > > > >>>> ----------------------------
> > > > > >>>>
> > > > > >>>> Since we are displaying graphs about each tablet and each
> table
> > we
> > > > can
> > > > > >>>> add
> > > > > >>>> major and minor compaction graph to each table and each
> tablet.
> > > > > >>>>
> > > > > > >>>> The other option is to display this in a single graph on the
> > > > > > >>>> overview page, with different graph lines for different tables
> > > > > > >>>> and tablets.
> > > > > >>>>
> > > > > >>>> Server type information : Accumulo 807
> > > > > >>>> ------------------------------****---
> > > > > >>>>
> > > > > > >>>> To display this information we can add a new page and display
> > > > > > >>>> the information as a table. The table should specify the
> > > > > > >>>> network address of the server, the server type, whether it is
> > > > > > >>>> active or inactive, etc.
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>> Supun...
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Supun Kamburugamuva
> > Member, Apache Software Foundation; http://www.apache.org
> > E-mail: [email protected];  Mobile: +1 812 369 6762
> > Blog: http://supunk.blogspot.com
> >
>
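
The latency chart described in the thread (timestamps on the x-axis, latency on the y-axis, color by request count), combined with the suggested log-scale latency axis, could be sketched roughly as follows. This is only an illustration, not Accumulo monitor code: the function name, the one-minute time buckets, the four latency buckets per decade, and the sample data are all assumptions made up for the example.

```python
import math
from collections import Counter


def latency_heatmap(samples, time_bucket_s=60, buckets_per_decade=4):
    """Bin (timestamp_s, latency_s) samples into a sparse 2-D histogram.

    Returns a Counter keyed by (time_bucket, latency_bucket). The count in
    each cell is the value a chart would map onto a color gradient.
    Latency buckets are logarithmic, so one very slow operation does not
    squash the small latencies together.
    """
    counts = Counter()
    for timestamp_s, latency_s in samples:
        if latency_s <= 0:
            continue  # a log scale cannot place non-positive latencies
        t_bucket = int(timestamp_s // time_bucket_s)
        l_bucket = math.floor(math.log10(latency_s) * buckets_per_decade)
        counts[(t_bucket, l_bucket)] += 1
    return counts


# Hypothetical data: 98 fast requests and 2 very slow ones in the same minute.
samples = (
    [(i, 0.005) for i in range(49)]
    + [(i, 0.02) for i in range(49)]
    + [(10, 30.0), (20, 30.0)]
)

heatmap = latency_heatmap(samples)
mean_latency = sum(latency for _, latency in samples) / len(samples)

for (t_bucket, l_bucket), count in sorted(heatmap.items()):
    print(f"minute {t_bucket}, latency bucket {l_bucket}: {count} requests")
print(f"mean latency: {mean_latency:.3f} s")
```

With this made-up data the mean comes out at roughly 0.6 seconds, a latency that no request actually had, while the binned counts keep the small slow cluster visible as its own cell.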
