Re: Question on Metrics Server to Alibaba team

Bobby Evans Mon, 21 Mar 2016 10:05:51 -0700

I also want to make sure that storm can have at least a minimal feedback loop 
to nimbus for scheduling purposes.  Having it be pluggable to send metrics to 
another system is important, but so is having the ability for nimbus to be able 
to query the data (CPU, Memory, process latency, etc.) and adjust scheduling 
accordingly.  This is really required for automatic elasticity, intelligent 
resource over-committing, guaranteed SLAs, lots of important features that can 
differentiate storm from everything else that is out there.  
 - Bobby


    On Monday, March 21, 2016 11:36 AM, Harsha <[email protected]> wrote:
 

 One of the goals of this work, which can probably be addressed in a separate
JIRA, is how the topology metrics reporter works. Today it's a bolt that's
part of the topology graph, which means it's another node in the topology DAG
that needs to be tuned for better performance. Some of our users took
performance hits by deploying a topology metrics reporter that sends
metrics to Ganglia. Ideally this collection should be asynchronous and
not be a node in the topology DAG.

Shipping a default metrics server along with a pluggable option for
users who want to use Graphite or other timeline servers should be the
goal.

--Harsha
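
To make the "asynchronous, not a node in the DAG" idea above concrete, here is a minimal sketch in plain Java. It is not Storm or JStorm code: `MetricsSink` and `AsyncMetricsReporter` are hypothetical names invented for illustration, and the only assumption is that tasks hand data points to a bounded in-memory queue while a background thread ships batches to a pluggable backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical plug-in point: one implementation per backend (Ganglia, Graphite, ...). */
interface MetricsSink {
    void write(List<String> batch);
}

/** Hypothetical reporter: tasks enqueue data points, a daemon thread ships them. */
final class AsyncMetricsReporter implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final MetricsSink sink;
    private final Thread worker;
    private volatile boolean running = true;

    AsyncMetricsReporter(MetricsSink sink, int capacity) {
        this.sink = sink;
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.worker = new Thread(this::drainLoop, "metrics-reporter");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Never blocks the caller; returns false if the point was dropped (queue full). */
    boolean report(String name, long value) {
        return queue.offer(name + "=" + value);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        try {
            // Keep going until close() was called and the queue has been emptied.
            while (running || !queue.isEmpty()) {
                String first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing arrived within the timeout
                }
                batch.add(first);
                queue.drainTo(batch); // pick up everything else already queued
                try {
                    sink.write(new ArrayList<>(batch));
                } catch (RuntimeException e) {
                    // A failing backend only loses this batch; it never stalls the tasks.
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting the worker's idle loop and waits for queued points to be flushed. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch a slow sink costs dropped data points (`report` returns false once the queue is full) rather than back-pressure on the topology, which is one possible trade-off; a real design might prefer aggregation before dropping.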


On Mon, Mar 21, 2016, at 08:49 AM, Abhishek Agarwal wrote:
> @Cody - The design looks good. Does the design allow aggregating metrics
> at the task/executor level? Basically, the number of distinct metrics is
> proportional to the number of distinct tasks. Did you ever run into such
> a use case?
> 
> 
> On Mon, Mar 21, 2016 at 8:46 PM, Cody Innowhere <[email protected]>
> wrote:
> 
> > Also, you can read the code from our latest release JStorm 2.1.1.
> >
> > On Mon, Mar 21, 2016 at 11:10 PM, Cody Innowhere <[email protected]>
> > wrote:
> >
> > > @Jungtaek,
> > > We ran some tests on Codahale metrics: compared to meters and histograms,
> > > counters are quite fast. So we mainly focused on optimizing meters
> > > and histograms (they are indeed very slow), including double sampling,
> > > changing the clock from ns (System.nanoTime) to ms, etc.
> > > You can take a look at the
> > > "com.alipay.dw.jstorm.example.sequence.bolt.TotalCount" class of our
> > > sequence-split-merge example code as the client-side entry point to metrics.
> > > After that, you can dig into the TopologyMaster class, which is still part of a
> > > topology, then into TopologyMetricsRunnable, which is part of the nimbus
> > > server, and finally into the MetricUploader plugin, which is where the metrics
> > > interface with our "metrics server". There are still some nits in the code,
> > > but I don't think they are a big problem.
> > >
> > > I'd also like to point out that our "metrics server" is not strictly a
> > > real metrics server: since most of the work is done by the nimbus server and
> > > the topology master, it's more appropriate to call it metrics storage. The
> > > main reason is that we don't want to build a heavyweight metrics server
> > > out of JStorm, and this makes it very easy for us to maintain (we have teams
> > > that specifically maintain HBase/OTS in Alibaba since they're so commonly used
> > > in production).
> > >
> > > On Mon, Mar 21, 2016 at 10:54 PM, Jungtaek Lim <[email protected]>
> > wrote:
> > >
> > >> Thanks Cody and Bobby for the explanation.
> > >>
> > >> Cody,
> > >> I took a look at the design doc and it looks promising, especially since it
> > >> doesn't do sampling when the metric type is 'counter'. As far as I have heard
> > >> (I didn't try it), changing the sample rate to 1.0 in Apache Storm causes a
> > >> huge performance hit.
> > >> Could you point me to the entry point of the metrics feature in JStorm so I
> > >> can dig into it?
> > >>
> > >> And just out of curiosity, did you consider extracting the metrics feature
> > >> (which is done with TopologyMasters and Nimbuses) into a separate component?
> > >> I understood your mention of 'metrics server' to mean a separate component, but
> > >> after seeing the design doc, the feature seems to be implemented on Nimbus.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >> On Sat, Mar 19, 2016 at 1:25 AM, Cody Innowhere <[email protected]> wrote:
> > >>
> > >> > JStorm provides a MetricUploader interface, which is similar to
> > >> > IMetricsConsumer in Storm, and the underlying implementation is pluggable:
> > >> > you can use HBase, any other KV store that supports timeline queries, or
> > >> > even a database (perhaps if it's a small cluster). We provide model classes
> > >> > in jstorm-core; what kinds of metrics data need to be stored is
> > >> > totally up to the specific implementation. Our internal implementation uses
> > >> > OTS, which is an Aliyun product (https://www.aliyun.com/product/ots/),
> > >> > but it's easy to adapt to other implementations.
> > >> >
> > >> > On Fri, Mar 18, 2016 at 11:52 PM, Bobby Evans <[email protected]> wrote:
> > >> >
> > >> > > Yes, we originally wanted to try to use the Hadoop Timeline Server for
> > >> > > storm metrics feedback to nimbus + UI + history like server.  But it was
> > >> > > not stable at the time, so we stopped.  For the sake of playing nicely with
> > >> > > the rest of the big data ecosystem I would like to see us support it as an
> > >> > > option for metrics collection/query, but not until the timeline server v2 is
> > >> > > ready and released.  For me the important thing is that we have a decent
> > >> > > time series DB that comes with storm by default and is pluggable so we can
> > >> > > replace it with something else that has similar capabilities in the future.
> > >> > >  - Bobby
> > >> > >
> > >> > >    On Friday, March 18, 2016 10:39 AM, Cody Innowhere <
> > >> > > [email protected]> wrote:
> > >> > >
> > >> > >
> > >> > >  It's actually in Phase 2 of porting JStorm, but I'm absolutely ok to
> > >> > > discuss this in advance.
> > >> > >
> > >> > > On Fri, Mar 18, 2016 at 11:31 PM, Cody Innowhere <
> > [email protected]
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > Yes, it's already in production.
> > >> > > > The implementation basically follows the design document in
> > >> > > > https://issues.apache.org/jira/browse/STORM-1329; you can take a look
> > >> > > > first and feel free to ask questions.
> > >> > > >
> > >> > > > On Fri, Mar 18, 2016 at 10:19 PM, Jungtaek Lim <[email protected]
> > >
> > >> > > wrote:
> > >> > > >
> > >> > > >> Hi,
> > >> > > >>
> > >> > > >> I have some work to do on metrics, so I'm looking through the pull
> > >> > > >> requests that address metrics.
> > >> > > >> At #753 <https://github.com/apache/storm/pull/753> I found that Cody
> > >> > > >> said "we" (maybe meaning the Alibaba team) are currently working on a
> > >> > > >> Metrics Server.
> > >> > > >> (I also found a comment saying there was some talk a while ago about
> > >> > > >> integrating the Hadoop timeline server. It seems no one came up with a
> > >> > > >> result, and I prefer to avoid a big dependency, so I'm in favor of the
> > >> > > >> Metrics Server for now.)
> > >> > > >>
> > >> > > >> I think that would greatly improve the metrics feature of Storm, so I'd
> > >> > > >> like to see how the work is going. Of course, that's only if there's no
> > >> > > >> issue with you working in the open. I just would like to prevent
> > >> > > >> duplication of work, and would like to help if needed and possible.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >> Jungtaek Lim (HeartSaVioR)
> > >> > > >>
> > >> > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
> 
> 
> 
> -- 
> Regards,
> Abhishek Agarwal
