Re: Question on Metrics Server to Alibaba team

Jungtaek Lim Mon, 21 Mar 2016 17:30:07 -0700

Harsha,

That's why I think the new metrics feature of JStorm looks promising.


According to the design doc at https://issues.apache.org/jira/browse/STORM-1329,
there's no distinction between topology stats (which Apache Storm includes
in the worker heartbeat) and built-in metrics (which should be handled by a
separate consumer, as you stated).
All metrics are passed to Nimbus, and Nimbus caches them, which implies
we can treat all metrics the same way and can also expose built-in metrics
(including custom metrics) to users via the REST API.
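
To make the "treat all metrics the same" idea concrete, here is a minimal sketch of a Nimbus-side cache that keeps the latest value of every metric per topology, whether it is a built-in stat or a custom one. The class and method names (MetricCacheSketch, report, snapshot) and the topology and metric names are hypothetical and are not taken from Storm or JStorm; a REST handler could serve the result of snapshot() as-is.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Storm or JStorm.
public class MetricCacheSketch {
    // topologyId -> (metricName -> latest reported value)
    private final Map<String, Map<String, Double>> latest = new ConcurrentHashMap<>();

    // Called whenever a metric arrives; built-in and custom metrics share one path.
    public void report(String topologyId, String metricName, double value) {
        latest.computeIfAbsent(topologyId, id -> new ConcurrentHashMap<>())
              .put(metricName, value);
    }

    // Returns a sorted copy of the cached metrics for one topology
    // (empty if the topology has reported nothing yet).
    public Map<String, Double> snapshot(String topologyId) {
        Map<String, Double> metrics = latest.get(topologyId);
        if (metrics == null) {
            return new TreeMap<String, Double>();
        }
        return new TreeMap<String, Double>(metrics);
    }

    public static void main(String[] args) {
        MetricCacheSketch cache = new MetricCacheSketch();
        cache.report("wordcount-1", "emitted", 1200.0);             // built-in style stat
        cache.report("wordcount-1", "user.cache-hit-ratio", 0.93);  // custom metric
        System.out.println(cache.snapshot("wordcount-1"));
    }
}
```

This only keeps the most recent value in memory; history and persistence would be the job of whatever storage plugin sits behind it.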

I thought about a standalone metrics server process that handles all of the
metrics work (maybe TopologyMaster + Nimbus in the design doc), but if the
current implementation of the metrics feature in JStorm can take care of what
I have in mind, I guess that's good enough.

Since I don't know much about TopologyMaster, I just wonder whether there are
any SPOFs (including soft ones) and how metrics behave when a component that
is a SPOF goes down.
Since Cody gave us starting points to dig into, we can evaluate that
feature before phase 2.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Mar 22, 2016 at 1:36 AM, Harsha <st...@harsha.io> wrote:

> One of the goals of this work, which can probably be addressed in a separate
> JIRA, is how the topology metrics reporter works. Today it's a bolt that's
> part of the topology graph, which means it's another node in the topology DAG
> that needs to be tuned for better performance. Some of our users took
> performance hits by deploying a topology metrics reporter that sends
> metrics to Ganglia. Ideally this collection should be asynchronous and
> not be a node in the topology DAG.
>
> Shipping a default metrics server along with a pluggable option for
> users who want to use Graphite or other timeline servers should be the
> goal.
>
> --Harsha
>
>
> On Mon, Mar 21, 2016, at 08:49 AM, Abhishek Agarwal wrote:
> > @Cody - The design looks good. Does the design allow aggregating metrics
> > at the task/executor level? Basically, the number of distinct metrics is
> > proportional to the number of distinct tasks. Did you ever run into such
> > a use case?
> >
> >
> > On Mon, Mar 21, 2016 at 8:46 PM, Cody Innowhere <e.neve...@gmail.com>
> > wrote:
> >
> > > Also, you can read the code from our latest release JStorm 2.1.1.
> > >
> > > On Mon, Mar 21, 2016 at 11:10 PM, Cody Innowhere <e.neve...@gmail.com>
> > > wrote:
> > >
> > > > @Jungtaek,
> > > > > We did some tests on Codahale metrics. Compared to meters/histograms,
> > > > > counters are quite fast, so we mainly focused on optimizing meters
> > > > > and histograms (they are indeed very slow), including double sampling,
> > > > > changing the clock from ns (System.nanoTime) to ms, etc.
> > > > > You can take a look at the
> > > > > "com.alipay.dw.jstorm.example.sequence.bolt.TotalCount" class of our
> > > > > sequence-split-merge example code as the client-side entry point to
> > > > > metrics. After that, you may dig into the TopologyMaster class, which is
> > > > > still part of a topology, then TopologyMetricsRunnable, which is part of
> > > > > the nimbus server, and finally the MetricUploader plugin, which is where
> > > > > the metrics interface with our "metrics server". There are still some
> > > > > nits in the code, but I think that should be no big problem.
> > > >
> > > > > I'd also like to point out that our "metrics server" is not strictly a
> > > > > real metrics server. Since most of the work is done by the nimbus server
> > > > > and the topology master, it's more appropriate to call it metrics storage.
> > > > > The main reason is that we don't want to build a heavyweight metrics
> > > > > server out of JStorm, and this makes it very easy for us to maintain (we
> > > > > have teams that specifically maintain HBase/OTS in Alibaba since they're
> > > > > so commonly used in production).
> > > >
> > > > On Mon, Mar 21, 2016 at 10:54 PM, Jungtaek Lim <kabh...@gmail.com>
> > > wrote:
> > > >
> > > >> Thanks Cody and Bobby for the explanation.
> > > >>
> > > >> Cody,
> > > > >> I took a look at the design doc and it looks promising, especially that
> > > > >> it doesn't do sampling when the metric type is 'counter'. As far as I've
> > > > >> heard (I didn't try it), it becomes a huge performance hit in Apache
> > > > >> Storm when we change the sample rate to 1.0.
> > > > >> Could you point me to the entry point of the metrics feature in JStorm
> > > > >> so I can dig into it?
> > > >>
> > > > >> And just out of curiosity, did you consider extracting the metrics
> > > > >> feature (which is done with TopologyMasters and Nimbuses) into a
> > > > >> separate component?
> > > > >> I understood your mention of a 'metrics server' as a separate component,
> > > > >> but after seeing the design doc, the feature seems to be implemented in
> > > > >> Nimbus.
> > > >>
> > > >> Thanks,
> > > >> Jungtaek Lim (HeartSaVioR)
> > > >>
> > > > >> On Sat, Mar 19, 2016 at 1:25 AM, Cody Innowhere <e.neve...@gmail.com> wrote:
> > > >>
> > > > >> > JStorm provides a MetricUploader interface, which is similar to
> > > > >> > IMetricsConsumer in Storm, and the underlying implementation is
> > > > >> > pluggable: you can use HBase, any other KV store that supports
> > > > >> > timeline queries, or even a database (perhaps for a small cluster).
> > > > >> > We provide model classes in jstorm-core; what kinds of metrics data
> > > > >> > need to be stored is totally up to the detailed implementation. Our
> > > > >> > internal implementation uses OTS, which is a product of aliyun
> > > > >> > (https://www.aliyun.com/product/ots/), but it's easy to adapt to
> > > > >> > other implementations.
> > > >> >
> > > >> > On Fri, Mar 18, 2016 at 11:52 PM, Bobby Evans
> > > >> <ev...@yahoo-inc.com.invalid
> > > >> > >
> > > >> > wrote:
> > > >> >
> > > > >> > > Yes, we originally wanted to try to use the Hadoop Timeline Server
> > > > >> > > for storm metrics feedback to nimbus + UI + a history-like server.
> > > > >> > > But it was not stable at the time, so we stopped. For the sake of
> > > > >> > > playing nicely with the rest of the big data ecosystem I would like
> > > > >> > > to see us support it as an option for metrics collection/query, but
> > > > >> > > not until the timeline server v2 is ready and released. For me the
> > > > >> > > important thing is that we have a decent time series DB that comes
> > > > >> > > with storm by default and is pluggable, so we can replace it with
> > > > >> > > something else that has similar capabilities in the future.
> > > > >> > >  - Bobby
> > > >> > >
> > > >> > >     On Friday, March 18, 2016 10:39 AM, Cody Innowhere <
> > > >> > > e.neve...@gmail.com> wrote:
> > > >> > >
> > > >> > >
> > > >> > >  It's actually in Phase 2 of porting JStorm, but I'm absolutely
> ok
> > > to
> > > >> > > discuss this in advance.
> > > >> > >
> > > >> > > On Fri, Mar 18, 2016 at 11:31 PM, Cody Innowhere <
> > > e.neve...@gmail.com
> > > >> >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Yes it's already in production.
> > > > >> > > > The implementation basically follows the design document in
> > > > >> > > > https://issues.apache.org/jira/browse/STORM-1329. You can take a
> > > > >> > > > look first, and feel free to ask questions.
> > > >> > > >
> > > >> > > > On Fri, Mar 18, 2016 at 10:19 PM, Jungtaek Lim <
> kabh...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > >> Hi,
> > > >> > > >>
> > > > >> > > >> I have some work to do on metrics, so I'm looking for pull
> > > > >> > > >> requests that address metrics.
> > > > >> > > >> At #753 <https://github.com/apache/storm/pull/753> I found that
> > > > >> > > >> Cody said "we" (maybe meaning the Alibaba team) are currently
> > > > >> > > >> working on a Metrics Server.
> > > > >> > > >> (I also found a comment saying there was some talk a while ago
> > > > >> > > >> about integrating the Hadoop timeline server. It seems no one
> > > > >> > > >> came up with a result, and I prefer to avoid a big dependency,
> > > > >> > > >> so I'm in favor of the Metrics Server for now.)
> > > > >> > > >>
> > > > >> > > >> I think that would greatly improve the metrics feature of Storm,
> > > > >> > > >> so I'd like to see how the work is going. Of course, that's only
> > > > >> > > >> if there's no issue with you working on it transparently. I just
> > > > >> > > >> would like to prevent duplication of work, and would like to help
> > > > >> > > >> if needed and possible.
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >> Jungtaek Lim (HeartSaVioR)
> > > >> > > >>
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Abhishek Agarwal
>
