Re: Are storm metrics reported through JMX too?

2016-10-11 Thread S G
The response on this important issue has been pretty good. I am happily surprised :)

I want to mention our strategy for extracting metrics from other products.
We use jolokia_proxy (https://jolokia.org/features/proxy.html) to get JMX
beans from several software products and feed them to Telegraf. That way, we
avoid writing custom processors for all these different products.

Telegraf is quickly becoming a standard for metrics data. (Just see the
list of input plugins here:
https://github.com/influxdata/telegraf/tree/master/plugins/inputs). And it
integrates well with several outputs too (
https://github.com/influxdata/telegraf/tree/master/plugins/outputs).

Also, since the metrics are in JMX, they can be queried by a jolokia-agent
installed per node. This avoids the extra metrics-consumer bolt, which can
hurt topology throughput too.
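To illustrate the per-node querying idea: anything registered in JMX can be read back through the platform MBeanServer, which is the same data a per-node agent such as jolokia-agent serves over HTTP. A minimal, self-contained sketch; the MBean object name and the `TupleCount` attribute are made up for illustration and are not actual Storm metrics:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricDemo {
    // Hypothetical standard MBean exposing a single counter attribute.
    public interface TupleCountMBean {
        long getTupleCount();
    }

    public static class TupleCount implements TupleCountMBean {
        private volatile long count = 42;
        @Override
        public long getTupleCount() { return count; }
    }

    // Register the MBean (if not already registered) and read the attribute
    // back through the platform MBeanServer -- the same path a JMX agent uses.
    public static long readBack() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("storm.demo:type=TupleCount");
        if (!mbs.isRegistered(name)) {
            mbs.registerMBean(new TupleCount(), name);
        }
        return (Long) mbs.getAttribute(name, "TupleCount");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("TupleCount = " + readBack());
    }
}
```

Once an attribute is visible this way, jolokia (or any JMX client) can poll it without any extra bolt in the topology.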

So I cast my vote in favor of a JMX implementation of metrics.
Other approaches are of course welcome for discussion.


Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Alessandro Bellina
Yeap that's a requirement from our perspective (working through this list).
Sure, I think as usual we can start with master with an eye for what would need
to be backported.
Sent from Yahoo Mail for iPhone



Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Jungtaek Lim
No worries Taylor. I'm in favor of quality over quantity, and you and I were
each looking at this from valid but different angles. And like I said, I'm all
for redesigning metrics, so I have no interest in the current metrics if we
plan to move on.

Btw, I'm not sure how the metrics will change, but requiring the new metrics to
work with the current metrics is not lightweight. IMHO, it may be better to
forget about compatibility and focus on implementing it (that would be a part
of 2.0.0), and port it back to 1.x if it is really possible to keep
compatibility.

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread P. Taylor Goetz
I hope I didn't come across as overly critical. You did the best with what you 
had to work with. Which isn't pretty.

We could potentially do a parallel metrics API in 1.1, 1.2, or master and still 
stay close to semantic versioning...?

-Taylor


Re: Are storm metrics reported through JMX too?

2016-10-11 Thread P. Taylor Goetz
That sounds like an awesome idea, Alessandro!

I would encourage you to teach the students the Apache Way, not just how it 
applies to the Apache Storm project, but in other projects as well. And 
remember the ASF tenet "if it didn't happen on a mailing list, it didn't 
happen." ;)

If there's anything I can do to help, let me know. Looking forward to seeing 
new faces on the mailing lists.

-Taylor



> On Oct 11, 2016, at 8:58 PM, Alessandro Bellina 
>  wrote:
> 
> Sounds great S G. Thanks for pointers Jungtaek.
> We were approached by a group of students from the University of Illinois who 
> are working on an open source class this semester. We proposed revamping 
> stats to them as a project. Their focus will be more along the lines of 
> getting the out-of-the-box metrics to work using RocksDB initially, then 
> moving to remove stats from ZooKeeper if they have time. The idea being to make 
> meaningful steps towards a new metrics system or parts of it. Getting help, 
> direction, and working with the community is exactly the type of experience 
> they were hoping to get.
> I am spending time looking at the two metrics systems in storm today, 
> jstorm's approach, and anything else I can get my hands on. I'm going to help 
> mentor the group with the help of others at Yahoo. So I also volunteer for 
> this task.
> Thanks,
> Alessandro
> 
> Sent from Yahoo Mail for iPhone
> 
> 
> On Tuesday, October 11, 2016, 7:34 PM, S G  wrote:
> 
> I would like to volunteer for this (Have contributed to Solr, Avro and Hive
> previously).
> But I would need a little guidance initially to get started because I
> haven't dug too deep in storm's code-base.
> 
> -SG
> 
> 
> 
>> On Tue, Oct 11, 2016 at 3:52 PM, Jungtaek Lim  wrote:
>> 
>> Alessandro, that was before Verisign introduced storm-graphite. As a result,
>> he followed Storm's metrics system.
>> 
>> I initiated the discussion around metrics several times (mostly in the first
>> half of this year), and each time the conclusion was that all the metrics
>> interfaces are not good and we need to rework them. Finding a way to patch
>> the current interface doesn't help.
>> How we renew the metrics system is the question. I was waiting on the phase
>> of the JStorm merger where we can evaluate JStorm's new metrics system. I
>> guess adopting that would be enough of a change, but I saw that Alibaba hit
>> a memory issue. How they handled it doesn't seem to be public. For me the
>> thing is the JStorm merger is going really slow (nearly stopped), so going
>> to phase 2 might not happen this year.
>> Looking for other ways, I believe Kafka and Flink have renewed their metrics
>> (Kafka Streams for Kafka), so we may want to take a look.
>> 
>> Anyway, if someone volunteers to renew the metrics system via Dropwizard or
>> JStorm metrics, it would be awesome. I'm focused on improving Storm SQL and
>> there's no other active contributor on Storm SQL yet, so I couldn't look at
>> the other side.
>> 
>> - Jungtaek Lim (HeartSaVioR)
>> 
>> On Wednesday, October 12, 2016 at 3:15 AM, Alessandro Bellina wrote:
>> 
>>> sorry, hopefully the link goes through now:
>>> http://www.michael-noll.com/blog/2013/11/06/sending-
>> metrics-from-storm-to-graphite
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina <
>>> abell...@yahoo-inc.com.INVALID> wrote:
>>> 
>>> 
>>> 
>>> I think what Bobby is referring to is that the metrics consumer is another
>>> bolt, so stats are flowing through storm.
>>> What does changing the model to polling buy us? I could see cases where
>>> we'd need more error handling, for instance slow/busy workers.
>>> If we think that writing a new system is the way to go (say with codahale
>>> throughout), would working on an abstraction layer that is used by the
>>> daemons but also by end-users be a good place to start? With codahale as
>>> the implementation?
>>> Looks like Michael Noll has done a lot of work with codahale, for instance:
>>> Sending Metrics from Storm to Graphite - Michael G. Noll.
>>> 
>>> 
>>> 
>>> 
>>> Thanks,
>>> Alessandro
>>> On Tuesday, October 11, 2016 11:07 AM, S G <sg.online.em...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> "Dropwizard has solved all of these problems already and I don't see a
>>> reason to reinvent the wheel" - I 

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Jungtaek Lim
Yeah, I admit the configuration flag was bad for me too, but I had no
alternatives. The only way to avoid struggling with the design limitation is a
revamp / redesign.
Thanks S G for volunteering, and great news from Alessandro about that project.
Alessandro, could you forward upcoming news about the project to the dev@
list?

- Jungtaek Lim (HeartSaVioR)


Re: Are storm metrics reported through JMX too?

2016-10-11 Thread P. Taylor Goetz
I was thinking on a smaller scale in terms of effort, but the more I think 
about it, the more supportive I would be of a full revamp (new API) for metrics 
based on Coda Hale's metrics library. It's proven and stable. I've used it many 
times. I think either approach would be roughly the same amount of work.

Some of the metrics API improvements in the 1.1.x branch are nice, but IMHO are 
lipstick on a pig. 

With apologies to Jungtaek, who has done amazing work all across the codebase, 
I'm a little squeamish about the proposed change to metrics that changes the 
consumer API based on a configuration flag (don't know the PR number offhand).

I'm +1 for moving in this direction (revamped metrics). Let's end the metrics 
pain.

-Taylor

> On Oct 11, 2016, at 10:15 AM, Bobby Evans  wrote:
> 
> I agree that IMetricsConsumer is not good, but the reality is that all of the 
> metrics system needs to be redone.  The problem is that we ship an object as 
> a metric.  If I get an object I have no idea what it is, and hence no idea 
> how to report it or what to do with it.  What's more, the common types we use 
> in the metrics we provide are really not enough.  For example CountMetric 
> sends a Long.  Well when I get it in the metrics consumer I have no idea if I 
> should report it like a counter or if I should report it like a gauge 
> (something that every metrics system I have used wants to know).  But then we 
> support pre-aggregation of the metrics with IReducer, so the number I get 
> might be an average instead of either a gauge or a counter, which no good 
> metrics system will want to collect because I cannot aggregate it with 
> anything else; the math just does not work.
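Bobby's point about pre-aggregated averages can be seen with a few lines of arithmetic (a standalone sketch, not Storm code): averaging two per-worker averages without knowing their sample counts gives the wrong global average.

```java
// Why a pre-reduced average cannot be re-aggregated downstream:
// two workers report only their averages, so the combined "average" is wrong.
public class AverageAggregationDemo {
    static double avg(double... xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static void main(String[] args) {
        // Worker A reduced 4 latency samples to an average of 10;
        // worker B reduced 1 sample to an average of 70.
        double avgA = avg(10, 10, 10, 10);   // 10.0
        double avgB = avg(70);               // 70.0

        double naive = avg(avgA, avgB);          // 40.0 -- averaging the averages
        double truth = avg(10, 10, 10, 10, 70);  // 22.0 -- true average of all 5

        System.out.println("naive=" + naive + " truth=" + truth);
    }
}
```

With counters (or sum + count pairs) the aggregation is just addition, which is why reporting systems insist on knowing the metric kind.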
> The proposal I have made before, and still believe in, is that we need to put in 
> place a parallel metrics API/system.  We will deprecate all of 
> https://git.corp.yahoo.com/storm/storm/tree/master-security/storm-core/src/jvm/backtype/storm/metric/api
>  and create a new parallel one that provides an API similar to 
> http://metrics.dropwizard.io/3.1.0/.  I would even be fine with just using 
> their API and exposing that to end users.  Dropwizard has solved all of these 
> problems already and I don't see a reason to reinvent the wheel.  I don't 
> personally see a lot of value in trying to send all of the metrics through 
> storm itself.  I am fine if we are able to support that, but it is far from a 
> requirement. - Bobby 
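A minimal sketch of the kind of typed API Bobby is describing. The interfaces below are illustrative stand-ins in the Dropwizard style, not the actual Dropwizard or Storm API; the point is that the metric's type, rather than an opaque Long, tells the consumer how to aggregate it.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative typed-metric interfaces; names are assumptions for the sketch.
interface Counter { void inc(long n); long getCount(); }  // monotonic, summable across workers
interface Gauge { long getValue(); }                      // point-in-time, NOT summable

final class SimpleCounter implements Counter {
    private final LongAdder count = new LongAdder();
    public void inc(long n) { count.add(n); }
    public long getCount() { return count.sum(); }
}

public class TypedMetrics {
    public static void main(String[] args) {
        SimpleCounter emittedA = new SimpleCounter();
        SimpleCounter emittedB = new SimpleCounter();
        emittedA.inc(3);
        emittedB.inc(4);
        // Because both values are declared counters, a consumer knows the
        // cluster-wide aggregate is their sum -- unlike an opaque Long.
        long total = emittedA.getCount() + emittedB.getCount();
        Gauge queueDepth = () -> 42;  // a gauge would be sampled, never summed
        System.out.println(total + " " + queueDepth.getValue());
    }
}
```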
> 
>On Monday, October 10, 2016 10:47 PM, S G  
> wrote:
> 
> 
> +1
> We can probably start by opening a JIRA for this and adding a design
> approach for the same?
> I would like to help in the coding-effort for this.
> 
> -SG
> 
>> On Mon, Oct 10, 2016 at 1:51 PM, P. Taylor Goetz  wrote:
>> 
>> I’ve been thinking about metrics lately, specifically the fact that people
>> tend to struggle with implementing a metrics consumer. (Like this one [1]).
>> 
>> The IMetricsConsumer interface is pretty low level, and common
>> aggregations, calculations, etc. are left up to each individual
>> implementation. That seems like an area where further abstraction would
>> make it easier to support different back ends (Graphite, JMX, Splunk, etc.).
>> 
>> My thought is to create an abstract IMetricsConsumer implementation that
>> does common aggregations and calculations, and then delegates to a pluggable
>> “metrics sink” implementation (e.g. “IMetricsSink”, etc.). That would
>> greatly simplify the effort required to integrate with various external
>> metrics systems. I know of at least a few users that would be interested,
>> one is currently scraping the logs from LoggingMetricsConsumer and polling
>> the Storm REST API for their metrics.
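A hypothetical sketch of that split. The name IMetricsSink comes from this thread; every method and class below is an assumption for illustration, not an actual Storm API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical "metrics sink" SPI sketched from the description above.
interface IMetricsSink {
    void prepare(Map<String, Object> conf);
    void write(String metricPath, double value, long timestampMs);
    void close();
}

// A sink that just collects points in memory; a real implementation would
// push to Graphite, JMX, Splunk, etc.
class InMemorySink implements IMetricsSink {
    final List<String> lines = new ArrayList<>();
    public void prepare(Map<String, Object> conf) {}
    public void write(String metricPath, double value, long timestampMs) {
        lines.add(metricPath + " " + value + " " + timestampMs);
    }
    public void close() {}
}

public class SinkSketch {
    public static void main(String[] args) {
        // The abstract IMetricsConsumer would do the common work -- flatten
        // raw DataPoints, build the metric path from worker/task info -- and
        // delegate only this final call to the pluggable sink:
        IMetricsSink sink = new InMemorySink();
        sink.prepare(Map.of());
        sink.write("wordcount.split.emitted", 128.0, 1476200000000L);
        sink.close();
        System.out.println(((InMemorySink) sink).lines);
    }
}
```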
>> 
>> -Taylor
>> 
>> [1] http://twocentsonsoftware.blogspot.co.il/2014/12/sending-out-storm-metrics.html
>> 
>> 
>>> On Oct 10, 2016, at 12:14 PM, Bobby Evans 
>> wrote:
>>> 
>>> First of all the server exposes essentially the same interface that the
>> IMetricsConsumer exposes.  It mostly just adds a bunch of overhead in the
>> middle to serialize out the objects, send them over HTTP to another process,
>> which then has to deserialize them and process them.  If you really don't
>> need the metrics to show up on a special known box you can have that exact
>> same code running inside the metrics consumer without all of the overhead.
>>> The server/client are insecure, have to deal with thread issues that a
>> normal IMetricsConsumer does not, and are not written to be robust (If the
>> HTTP server is down the consumer crashes and continues to crash until the
>> server is brought back up).  It was written very quickly for a test
>> situation and it honestly never crossed my mind that anyone would want to
>> use it in production.
>>> 
>>> - Bobby
>>> 
>>> On Monday, October 10, 2016 10:59 AM, S G 
>> wrote:
>>> 
>>> 
>>> Thanks Bobby.
>>> 
>>> If we 

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Alessandro Bellina
Sounds great S G. Thanks for the pointers, Jungtaek. 
We were approached by a group of students from the University of Illinois who 
are working on an open source class this semester. We proposed revamping stats 
to them as a project. Their focus will be more along the lines of getting the 
out-of-the-box metrics to work using RocksDB initially, then moving to remove 
stats from ZooKeeper if they have time. The idea is to make meaningful steps 
towards a new metrics system or parts of it. Getting help, direction, and 
working with the community is exactly the type of experience they were hoping 
to get.
I am spending time looking at the two metrics systems in Storm today, JStorm's 
approach, and anything else I can get my hands on. I'm going to help mentor the 
group with the help of others at Yahoo. So I also volunteer for this task.
Thanks,
Alessandro

Sent from Yahoo Mail for iPhone


On Tuesday, October 11, 2016, 7:34 PM, S G  wrote:

I would like to volunteer for this (Have contributed to Solr, Avro and Hive
previously).
But I would need a little guidance initially to get started because I
haven't dug too deep in storm's code-base.

-SG



On Tue, Oct 11, 2016 at 3:52 PM, Jungtaek Lim  wrote:

> Alessandro, that was before Verisign introduced storm-graphite. As a result,
> he followed Storm's metric system.
>
> I initiated the discussion around metrics several times (mostly in the first
> half of this year), and each time the conclusion was that all the metrics
> interfaces are not good and need to be reworked. Finding a way that keeps the
> current interface doesn't help.
> The question is how we renew the metrics system. I was waiting on the next
> phase of the JStorm merger so that we could evaluate JStorm's new metrics
> system. I guess adopting that would be enough of a change, but I saw that
> Alibaba hit a memory issue, and how they handled it doesn't seem to be
> documented. For me the thing is that the JStorm merger is going really slowly
> (nearly stopped), so going to phase 2 might not happen this year.
> As for other options, I believe Kafka and Flink have renewed their metrics
> (Kafka Streams for Kafka), so we may want to take a look.
>
> Anyway, if someone volunteers to renew the metrics system via Dropwizard or
> JStorm metrics, it would be awesome. I'm focused on improving Storm SQL, and
> there's no other active contributor on Storm SQL yet, so I couldn't look at
> the other side.
>
> - Jungtaek Lim (HeartSaVioR)
>
> 2016년 10월 12일 (수) 오전 3:15, Alessandro Bellina
> 님이 작성:
>
> > sorry, hopefully the link goes through now:
> > http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite
> >
> >
> > Sending Metrics from Storm to Graphite - Michael G. Noll
> > By Michael G. Noll
> > Sending application-level metrics from Storm topologies to the Graphite
> > monitoring system
> >
> >
> >
> > On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina <
> > abell...@yahoo-inc.com.INVALID> wrote:
> >
> >
> >
> > I think what Bobby is referring to is that the metrics consumer is
> another
> > bolt, so stats are flowing through storm.
> > What does changing the model to polling buy us? I could see cases where
> > we'd need more error handling, for instance slow/busy workers.
> > If we think that writing a new system is the way to go (say with codahale
> > throughout), would working on an abstraction layer that is used by the
> > daemons but also by end-users be a good place to start? With codahale as
> > the implementation?
> > Looks like Michael Noll has done a lot of work with codahale, for instance:
> > Sending Metrics from Storm to Graphite - Michael G. Noll.
> >
> >
> >
> >
> > Thanks,
> > Alessandro
> >    On Tuesday, October 11, 2016 11:07 AM, S G <
> sg.online.em...@gmail.com>
> > wrote:
> >
> >
> >
> > "Dropwizard has solved all of these problems already and I don't see a
> > reason to reinvent the wheel" - I love Dropwizard too, and many other
> > tools have switched to it as well.
> >
> > "I don't personally see a lot of value in trying to send all of the
> > metrics through storm itself" - How about every node reporting its own
> > metrics via a URL? That way there is no need for a metrics-consumer that
> > can bottleneck the whole topology. We can then provide a separate server
> > that queries all nodes for those metrics and aggregates them. Only
> > cluster-wide metrics should be reported by the Storm UI's REST API
> > (assuming there are not too many of those).
> >
> > On Tue, Oct 11, 2016 at 7:15 

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread P. Taylor Goetz


> On Oct 11, 2016, at 8:34 PM, S G  wrote:
> 
> I would like to volunteer for this (Have contributed to Solr, Avro and Hive
> previously).
> But I would need a little guidance initially to get started because I
> haven't dug too deep in storm's code-base.
> 
> -SG
> 
> 
> 
>> On Tue, Oct 11, 2016 at 3:52 PM, Jungtaek Lim  wrote:
>> 
>> Alessandro, that was before Verisign introduced storm-graphite. As a result,
>> he followed Storm's metric system.
>> 
>> I initiated the discussion around metrics several times (mostly in the first
>> half of this year), and each time the conclusion was that all the metrics
>> interfaces are not good and need to be reworked. Finding a way that keeps the
>> current interface doesn't help.
>> The question is how we renew the metrics system. I was waiting on the next
>> phase of the JStorm merger so that we could evaluate JStorm's new metrics
>> system. I guess adopting that would be enough of a change, but I saw that
>> Alibaba hit a memory issue, and how they handled it doesn't seem to be
>> documented. For me the thing is that the JStorm merger is going really slowly
>> (nearly stopped), so going to phase 2 might not happen this year.
>> As for other options, I believe Kafka and Flink have renewed their metrics
>> (Kafka Streams for Kafka), so we may want to take a look.
>> 
>> Anyway, if someone volunteers to renew the metrics system via Dropwizard or
>> JStorm metrics, it would be awesome. I'm focused on improving Storm SQL, and
>> there's no other active contributor on Storm SQL yet, so I couldn't look at
>> the other side.
>> 
>> - Jungtaek Lim (HeartSaVioR)
>> 
>> 2016년 10월 12일 (수) 오전 3:15, Alessandro Bellina
>> 님이 작성:
>> 
>>> sorry, hopefully the link goes through now:
>>> http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite
>>> 
>>> 
>>> Sending Metrics from Storm to Graphite - Michael G. Noll
>>> By Michael G. Noll
>>> Sending application-level metrics from Storm topologies to the Graphite
>>> monitoring system
>>> 
>>> 
>>> 
>>> On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina <
>>> abell...@yahoo-inc.com.INVALID> wrote:
>>> 
>>> 
>>> 
>>> I think what Bobby is referring to is that the metrics consumer is
>> another
>>> bolt, so stats are flowing through storm.
>>> What does changing the model to polling buy us? I could see cases where
>>> we'd need more error handling, for instance slow/busy workers.
>>> If we think that writing a new system is the way to go (say with codahale
>>> throughout), would working on an abstraction layer that is used by the
>>> daemons but also by end-users be a good place to start? With codahale as
>>> the implementation?
>>> Looks like Michael Noll has done a lot of work with codahale, for instance:
>>> Sending Metrics from Storm to Graphite - Michael G. Noll.
>>> 
>>> 
>>> 
>>> 
>>> Thanks,
>>> Alessandro
>>>On Tuesday, October 11, 2016 11:07 AM, S G <
>> sg.online.em...@gmail.com>
>>> wrote:
>>> 
>>> 
>>> 
>>> "Dropwizard has solved all of these problems already and I don't see a
>>> reason to reinvent the wheel" - I love Dropwizard too, and many other
>>> tools have switched to it as well.
>>> 
>>> "I don't personally see a lot of value in trying to send all of the
>>> metrics through storm itself" - How about every node reporting its own
>>> metrics via a URL? That way there is no need for a metrics-consumer that
>>> can bottleneck the whole topology. We can then provide a separate server
>>> that queries all nodes for those metrics and aggregates them. Only
>>> cluster-wide metrics should be reported by the Storm UI's REST API
>>> (assuming there are not too many of those).
>>> 
>>> On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans >> 
>>> wrote:
>>> 
 I agree that IMetricsConsumer is not good, but the reality is that all
>> of
 the metrics system needs to be redone.  The problem is that we ship an
 object as a metric.  If I get an object I have no idea what it is and
 hence no idea how to report it or what to do with it.  What is more the
 common types we use in the metrics we provide are really not enough.
>> For
 example CountMetric sends a Long.  Well when I get it in the metrics
 consumer I have no idea if I should report it like a counter or if I
>>> should
 report it like a gauge (something that every metrics system I have used
 wants to know).  But then we support pre-aggregation of the metrics
>> with
 IReducer so the number I get might be an average instead of either a
>>> gauge
 or a counter, which no good metrics system will want to collect
>> because I
 cannot aggregate it with anything else, 

[GitHub] storm pull request #1443: Log.warn if found a message in kafka topic larger ...

2016-10-11 Thread erikdw
Github user erikdw commented on a diff in the pull request:

https://github.com/apache/storm/pull/1443#discussion_r82918924
  
--- Diff: external/storm-kafka/src/jvm/org/apache/storm/kafka/KafkaUtils.java ---
@@ -218,6 +219,11 @@ public static ByteBufferMessageSet fetchMessages(KafkaConfig config, SimpleConsu
             }
         } else {
             msgs = fetchResponse.messageSet(topic, partitionId);
+            if (msgs.sizeInBytes() > 0 && msgs.validBytes() == 0) {
+                LOG.warn(String.format("Found a message larger than the maximum fetch size (%d bytes) of this consumer on topic " +
+                        "%s partition %d at fetch offset %d. Increase the fetch size, or decrease the maximum message size the broker will allow."
+                        , config.fetchSizeBytes, partition.topic, partition.partition, offset));
--- End diff --

Nit: Comma at the front of the line is aesthetically displeasing to me 
(i.e., ugly). ;-)

More importantly, what is the value in `msgs.sizeInBytes()` if it's not the 
message size?  i.e., in the [review comments you 
said](https://github.com/apache/storm/pull/1443#issuecomment-221819857): 

> It seems we can not get the actual size of the message.

So I wonder what the value is in `msgs.sizeInBytes()` then?
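For context on that check, here is a stand-in sketch (an assumption for illustration, not Kafka's actual wire format or API): if `validBytes()` counts only the bytes of fully received messages, as the PR discussion suggests, then a fetch buffer with bytes present but no complete message is exactly how a message larger than the fetch size shows up, which is what `sizeInBytes() > 0 && validBytes() == 0` expresses.

```java
import java.nio.ByteBuffer;

public class OversizeCheck {
    // Count only the bytes of records that fit completely in the buffer,
    // where each record is a 4-byte length prefix plus that many payload bytes.
    // (Illustrative record layout, not Kafka's.)
    static int validBytes(ByteBuffer buf) {
        int valid = 0;
        ByteBuffer b = buf.duplicate();
        while (b.remaining() >= 4) {
            int len = b.getInt();
            if (b.remaining() < len) break;  // truncated record: stop counting
            b.position(b.position() + len);
            valid += 4 + len;
        }
        return valid;
    }

    public static void main(String[] args) {
        // Fetch size of 8 bytes, but the record needs 4 + 100 bytes:
        ByteBuffer fetch = ByteBuffer.allocate(8);
        fetch.putInt(100);     // length prefix of a 100-byte message...
        fetch.putInt(0xCAFE);  // ...of which only 4 payload bytes arrived
        fetch.flip();

        boolean oversize = fetch.remaining() > 0 && validBytes(fetch) == 0;
        System.out.println(oversize);  // true: bytes present, no complete message
    }
}
```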


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Are storm metrics reported through JMX too?

2016-10-11 Thread S G
I would like to volunteer for this (Have contributed to Solr, Avro and Hive
previously).
But I would need a little guidance initially to get started because I
haven't dug too deep in storm's code-base.

-SG



On Tue, Oct 11, 2016 at 3:52 PM, Jungtaek Lim  wrote:

> Alessandro, that was before Verisign introduced storm-graphite. As a result,
> he followed Storm's metric system.
>
> I initiated the discussion around metrics several times (mostly in the first
> half of this year), and each time the conclusion was that all the metrics
> interfaces are not good and need to be reworked. Finding a way that keeps the
> current interface doesn't help.
> The question is how we renew the metrics system. I was waiting on the next
> phase of the JStorm merger so that we could evaluate JStorm's new metrics
> system. I guess adopting that would be enough of a change, but I saw that
> Alibaba hit a memory issue, and how they handled it doesn't seem to be
> documented. For me the thing is that the JStorm merger is going really slowly
> (nearly stopped), so going to phase 2 might not happen this year.
> As for other options, I believe Kafka and Flink have renewed their metrics
> (Kafka Streams for Kafka), so we may want to take a look.
>
> Anyway, if someone volunteers to renew the metrics system via Dropwizard or
> JStorm metrics, it would be awesome. I'm focused on improving Storm SQL, and
> there's no other active contributor on Storm SQL yet, so I couldn't look at
> the other side.
>
> - Jungtaek Lim (HeartSaVioR)
>
> 2016년 10월 12일 (수) 오전 3:15, Alessandro Bellina
> 님이 작성:
>
> > sorry, hopefully the link goes through now:
> > http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite
> >
> >
> > Sending Metrics from Storm to Graphite - Michael G. Noll
> > By Michael G. Noll
> > Sending application-level metrics from Storm topologies to the Graphite
> > monitoring system
> >
> >
> >
> > On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina <
> > abell...@yahoo-inc.com.INVALID> wrote:
> >
> >
> >
> > I think what Bobby is referring to is that the metrics consumer is
> another
> > bolt, so stats are flowing through storm.
> > What does changing the model to polling buy us? I could see cases where
> > we'd need more error handling, for instance slow/busy workers.
> > If we think that writing a new system is the way to go (say with codahale
> > throughout), would working on an abstraction layer that is used by the
> > daemons but also by end-users be a good place to start? With codahale as
> > the implementation?
> > Looks like Michael Noll has done a lot of work with codahale, for instance:
> > Sending Metrics from Storm to Graphite - Michael G. Noll.
> >
> >
> >
> >
> > Thanks,
> > Alessandro
> > On Tuesday, October 11, 2016 11:07 AM, S G <
> sg.online.em...@gmail.com>
> > wrote:
> >
> >
> >
> > "Dropwizard has solved all of these problems already and I don't see a
> > reason to reinvent the wheel" - I love Dropwizard too, and many other
> > tools have switched to it as well.
> >
> > "I don't personally see a lot of value in trying to send all of the
> > metrics through storm itself" - How about every node reporting its own
> > metrics via a URL? That way there is no need for a metrics-consumer that
> > can bottleneck the whole topology. We can then provide a separate server
> > that queries all nodes for those metrics and aggregates them. Only
> > cluster-wide metrics should be reported by the Storm UI's REST API
> > (assuming there are not too many of those).
> >
> > On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans  >
> > wrote:
> >
> > > I agree that IMetricsConsumer is not good, but the reality is that all
> of
> > > the metrics system needs to be redone.  The problem is that we ship an
> > > object as a metric.  If I get an object I have no idea what it is and
> > > hence no idea how to report it or what to do with it.  What is more the
> > > common types we use in the metrics we provide are really not enough.
> For
> > > example CountMetric sends a Long.  Well when I get it in the metrics
> > > consumer I have no idea if I should report it like a counter or if I
> > should
> > > report it like a gauge (something that every metrics system I have used
> > > wants to know).  But then we support pre-aggregation of the metrics
> with
> > > IReducer so the number I get might be an average instead of either a
> > gauge
> > > or a counter, which no good metrics system will want to collect
> because I
> > > cannot aggregate it with anything else, the math just does not work.
> > > The proposal I have said before and I still believe is that we need to
> > put
> > > in place a parallel metrics 

[GitHub] storm pull request #1720: STORM-1546: Adding Read and Write Aggregations for...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1720




[GitHub] storm issue #1720: STORM-1546: Adding Read and Write Aggregations for Pacema...

2016-10-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1720
  
Thanks for the change. +1




Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Jungtaek Lim
Alessandro, that was before Verisign introduced storm-graphite. As a result, he
followed Storm's metric system.

I initiated the discussion around metrics several times (mostly in the first
half of this year), and each time the conclusion was that all the metrics
interfaces are not good and need to be reworked. Finding a way that keeps the
current interface doesn't help.
The question is how we renew the metrics system. I was waiting on the next
phase of the JStorm merger so that we could evaluate JStorm's new metrics
system. I guess adopting that would be enough of a change, but I saw that
Alibaba hit a memory issue, and how they handled it doesn't seem to be
documented. For me the thing is that the JStorm merger is going really slowly
(nearly stopped), so going to phase 2 might not happen this year.
As for other options, I believe Kafka and Flink have renewed their metrics
(Kafka Streams for Kafka), so we may want to take a look.

Anyway, if someone volunteers to renew the metrics system via Dropwizard or
JStorm metrics, it would be awesome. I'm focused on improving Storm SQL, and
there's no other active contributor on Storm SQL yet, so I couldn't look at
the other side.

- Jungtaek Lim (HeartSaVioR)

2016년 10월 12일 (수) 오전 3:15, Alessandro Bellina
님이 작성:

> sorry, hopefully the link goes through now:
> http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite
>
>
> Sending Metrics from Storm to Graphite - Michael G. Noll
> By Michael G. Noll
> Sending application-level metrics from Storm topologies to the Graphite
> monitoring system
>
>
>
> On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina <
> abell...@yahoo-inc.com.INVALID> wrote:
>
>
>
> I think what Bobby is referring to is that the metrics consumer is another
> bolt, so stats are flowing through storm.
> What does changing the model to polling buy us? I could see cases where
> we'd need more error handling, for instance slow/busy workers.
> If we think that writing a new system is the way to go (say with codahale
> throughout), would working on an abstraction layer that is used by the
> daemons but also by end-users be a good place to start? With codahale as
> the implementation?
> Looks like Michael Noll has done a lot of work with codahale, for instance:
> Sending Metrics from Storm to Graphite - Michael G. Noll.
>
>
>
>
> Thanks,
> Alessandro
> On Tuesday, October 11, 2016 11:07 AM, S G 
> wrote:
>
>
>
> "Dropwizard has solved all of these problems already and I don't see a
> reason to reinvent the wheel" - I love Dropwizard too, and many other
> tools have switched to it as well.
>
> "I don't personally see a lot of value in trying to send all of the
> metrics through storm itself" - How about every node reporting its own
> metrics via a URL? That way there is no need for a metrics-consumer that
> can bottleneck the whole topology. We can then provide a separate server
> that queries all nodes for those metrics and aggregates them. Only
> cluster-wide metrics should be reported by the Storm UI's REST API
> (assuming there are not too many of those).
>
> On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans 
> wrote:
>
> > I agree that IMetricsConsumer is not good, but the reality is that all of
> > the metrics system needs to be redone.  The problem is that we ship an
> > object as a metric.  If I get an object I have no idea what it is and
> > hence no idea how to report it or what to do with it.  What is more the
> > common types we use in the metrics we provide are really not enough.  For
> > example CountMetric sends a Long.  Well when I get it in the metrics
> > consumer I have no idea if I should report it like a counter or if I
> should
> > report it like a gauge (something that every metrics system I have used
> > wants to know).  But then we support pre-aggregation of the metrics with
> > IReducer so the number I get might be an average instead of either a
> gauge
> > or a counter, which no good metrics system will want to collect because I
> > cannot aggregate it with anything else, the math just does not work.
> > The proposal I have said before and I still believe is that we need to
> put
> > in place a parallel metrics API/system.  We will deprecate all of
> > https://git.corp.yahoo.com/storm/storm/tree/master-security/storm-core/src/jvm/backtype/storm/metric/api
> > and create a new parallel one that provides an API similar to
> > http://metrics.dropwizard.io/3.1.0/.  I would even be fine with just using their API and exposing
> > that to end users.  Dropwizard has solved all of these problems already
> and
> > I don't see a reason to reinvent the wheel.  I don't personally see a lot
> > of value in trying to send all of the metrics 

[GitHub] storm issue #1697: STORM-2018: Supervisor V2

2016-10-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1697
  
@revans2 
It would be better to address STORM-2131 here as well. Please pull #1724 
here.
And could you update the pull request according to the review comments? 
Supervisor V2 is the one I would want to include in 1.1.0.

Thanks in advance.




[GitHub] storm issue #1714: STORM-2125 Use Calcite's implementation of Rex Compiler

2016-10-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1714
  
Calcite 1.10.0 RC1 vote passed and artifact is available to Maven.
Applied HeartSaVioR@b2bcf9b, changing the Calcite version from 1.10.0-SNAPSHOT 
to 1.10.0.
Also rebased and squashed the commits.

@arunmahadevan I guess this would be the last review for this patch. Please 
take a look again. Thanks!




[GitHub] storm pull request #1681: STORM-1444 Support EXPLAIN statement in StormSQL

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1681




[GitHub] storm pull request #1689: STORM-2099 Introduce new sql external module: stor...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1689




[GitHub] storm issue #1720: STORM-1546: Adding Read and Write Aggregations for Pacema...

2016-10-11 Thread knusbaum
Github user knusbaum commented on the issue:

https://github.com/apache/storm/pull/1720
  
Also updated the docs.




[GitHub] storm issue #1720: STORM-1546: Adding Read and Write Aggregations for Pacema...

2016-10-11 Thread knusbaum
Github user knusbaum commented on the issue:

https://github.com/apache/storm/pull/1720
  
@HeartSaVioR I've synced the java files between the branches so there are 
almost no differences.




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-10-11 Thread jfenc91
Github user jfenc91 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r82872195
  
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -266,26 +266,31 @@ private void doSeekRetriableTopicPartitions() {
             if (offsetAndMeta != null) {
                 kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);  // seek to the next offset that is ready to commit in next commit cycle
             } else {
-                kafkaConsumer.seekToEnd(toArrayList(rtp));    // Seek to last committed offset
+                kafkaConsumer.seek(rtp, acked.get(rtp).committedOffset + 1);    // Seek to last committed offset
             }
         }
     }

     //  emit  =
     private void emit() {
-        emitTupleIfNotEmitted(waitingToEmit.next());
-        waitingToEmit.remove();
+        while (!emitTupleIfNotEmitted(waitingToEmit.next()) && waitingToEmit.hasNext()) {
+            waitingToEmit.remove();
+        }
     }

-    // emits one tuple per record
-    private void emitTupleIfNotEmitted(ConsumerRecord record) {
+    // Emits one tuple per record
+    // @return true if tuple was emitted
+    private boolean emitTupleIfNotEmitted(ConsumerRecord record) {
         final TopicPartition tp = new TopicPartition(record.topic(), record.partition());
         final KafkaSpoutMessageId msgId = new KafkaSpoutMessageId(record);

         if (acked.containsKey(tp) && acked.get(tp).contains(msgId)) {   // has been acked
             LOG.trace("Tuple for record [{}] has already been acked. Skipping", record);
+            return false;
--- End diff --

I removed the return statements @harshach 




[GitHub] storm issue #1733: [STORM-2134] - improving the current scheduling strategy ...

2016-10-11 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/1733
  
+1




Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Alessandro Bellina
sorry, hopefully the link goes through now: 
http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite

   
Sending Metrics from Storm to Graphite - Michael G. Noll
By Michael G. Noll
Sending application-level metrics from Storm topologies to the Graphite 
monitoring system   



On Tuesday, October 11, 2016 1:13 PM, Alessandro Bellina 
 wrote:



I think what Bobby is referring to is that the metrics consumer is another 
bolt, so stats are flowing through storm.
What does changing the model to polling buy us? I could see cases where we'd 
need more error handling, for instance slow/busy workers.
If we think that writing a new system is the way to go (say with codahale 
throughout), would working on an abstraction layer that is used by the daemons 
but also by end-users be a good place to start? With codahale as the 
implementation?
Looks like Michael Noll has done a lot work with codahale, for instance: 
Sending Metrics from Storm to Graphite - Michael G. Noll.
  

Thanks,
Alessandro
On Tuesday, October 11, 2016 11:07 AM, S G  
wrote:



"Dropwizard has solved all of these problems already and I don't see a
reason to reinvent the wheel" - I love Dropwizard too, and many of the other
tools have switched to using it as well.

"I don't personally see a lot of value in trying to send all of the metrics
through storm itself" - How about every node reporting its own metrics via a
URL? That way there is no need for a metrics-consumer that can bottleneck
the whole topology. We can then provide a separate server that can query
all nodes to get those metrics and aggregate them. Only cluster-wide
metrics should be reported by the storm-UI's REST API (assuming there are
not too many of those).

On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans 
wrote:

> I agree that IMetricsConsumer is not good, but the reality is that all of
> the metrics system needs to be redone.  The problem is that we ship an
> object as a metric.  If I get an object I have no idea what it is and
> hence no idea how to report it or what to do with it.  What is more the
> common types we use in the metrics we provide are really not enough.  For
> example CountMetric sends a Long.  Well when I get it in the metrics
> consumer I have no idea if I should report it like a counter or if I should
> report it like a gauge (something that every metrics system I have used
> wants to know).  But then we support pre-aggregation of the metrics with
> IReducer so the number I get might be an average instead of either a gauge
> or a counter, which no good metrics system will want to collect because I
> cannot aggregate it with anything else, the math just does not work.
> The proposal I have said before and I still believe is that we need to put
> in place a parallel metrics API/system.  We will deprecate all of
> https://git.corp.yahoo.com/storm/storm/tree/master-
> security/storm-core/src/jvm/backtype/storm/metric/api and create a new
> parallel one that provides an API similar to http://metrics.dropwizard.
> io/3.1.0/.  I would even be fine in just using their API and exposing
> that to end users.  Dropwizard has solved all of these problems already and
> I don't see a reason to reinvent the wheel.  I don't personally see a lot
> of value in trying to send all of the metrics through storm itself.  I am
> fine if we are able to support that, but it is far from a requirement. -
> Bobby
>
>On Monday, October 10, 2016 10:47 PM, S G 
> wrote:
>
>
>  +1
> We can probably start by opening a JIRA for this and adding a design
> approach for the same?
> I would like to help in the coding-effort for this.
>
> -SG
>
> On Mon, Oct 10, 2016 at 1:51 PM, P. Taylor Goetz 
> wrote:
>
> > I’ve been thinking about metrics lately, specifically the fact that
> people
> > tend to struggle with implementing a metrics consumer. (Like this one
> [1]).
> >
> > The IMetricsConsumer interface is pretty low level, and common
> > aggregations, calculations, etc. are left up to each individual
> > implementation. That seems like an area where further abstraction would
> > make it easier to support different back ends (Graphite, JMX, Splunk,
> etc.).
> >
> > My thought is to create an abstract IMetricsConsumer implementation that
> > does common aggregations and calculations, and then delegates to a
> pluggable
> > “metrics sink” implementation (e.g. “IMetricsSink”, etc.). That would
> > greatly simplify the effort required to integrate with various external
> > metrics systems. I know of at least a few users that would be interested,
> > one is currently scraping the logs from LoggingMetricsConsumer 

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Alessandro Bellina
I think what Bobby is referring to is that the metrics consumer is another 
bolt, so stats are flowing through storm.
What does changing the model to polling buy us? I could see cases where we'd 
need more error handling, for instance with slow/busy workers.
If we think that writing a new system is the way to go (say with codahale 
throughout), would working on an abstraction layer that is used by the daemons 
but also by end-users be a good place to start? With codahale as the 
implementation?
Looks like Michael Noll has done a lot of work with codahale, for instance: 
Sending Metrics from Storm to Graphite - Michael G. Noll.
  

Thanks,
Alessandro
On Tuesday, October 11, 2016 11:07 AM, S G  
wrote:
 

"Dropwizard has solved all of these problems already and I don't see a
reason to reinvent the wheel" - I love Dropwizard too, and many of the other
tools have switched to using it as well.

"I don't personally see a lot of value in trying to send all of the metrics
through storm itself" - How about every node reporting its own metrics via a
URL? That way there is no need for a metrics-consumer that can bottleneck
the whole topology. We can then provide a separate server that can query
all nodes to get those metrics and aggregate them. Only cluster-wide
metrics should be reported by the storm-UI's REST API (assuming there are
not too many of those).

On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans 
wrote:

> I agree that IMetricsConsumer is not good, but the reality is that all of
> the metrics system needs to be redone.  The problem is that we ship an
> object as a metric.  If I get an object I have no idea what it is and
> hence no idea how to report it or what to do with it.  What is more the
> common types we use in the metrics we provide are really not enough.  For
> example CountMetric sends a Long.  Well when I get it in the metrics
> consumer I have no idea if I should report it like a counter or if I should
> report it like a gauge (something that every metrics system I have used
> wants to know).  But then we support pre-aggregation of the metrics with
> IReducer so the number I get might be an average instead of either a gauge
> or a counter, which no good metrics system will want to collect because I
> cannot aggregate it with anything else, the math just does not work.
> The proposal I have said before and I still believe is that we need to put
> in place a parallel metrics API/system.  We will deprecate all of
> https://git.corp.yahoo.com/storm/storm/tree/master-
> security/storm-core/src/jvm/backtype/storm/metric/api and create a new
> parallel one that provides an API similar to http://metrics.dropwizard.
> io/3.1.0/.  I would even be fine in just using their API and exposing
> that to end users.  Dropwizard has solved all of these problems already and
> I don't see a reason to reinvent the wheel.  I don't personally see a lot
> of value in trying to send all of the metrics through storm itself.  I am
> fine if we are able to support that, but it is far from a requirement. -
> Bobby
>
>    On Monday, October 10, 2016 10:47 PM, S G 
> wrote:
>
>
>  +1
> We can probably start by opening a JIRA for this and adding a design
> approach for the same?
> I would like to help in the coding-effort for this.
>
> -SG
>
> On Mon, Oct 10, 2016 at 1:51 PM, P. Taylor Goetz 
> wrote:
>
> > I’ve been thinking about metrics lately, specifically the fact that
> people
> > tend to struggle with implementing a metrics consumer. (Like this one
> [1]).
> >
> > The IMetricsConsumer interface is pretty low level, and common
> > aggregations, calculations, etc. are left up to each individual
> > implementation. That seems like an area where further abstraction would
> > make it easier to support different back ends (Graphite, JMX, Splunk,
> etc.).
> >
> > My thought is to create an abstract IMetricsConsumer implementation that
> > does common aggregations and calculations, and then delegates to a
> pluggable
> > “metrics sink” implementation (e.g. “IMetricsSink”, etc.). That would
> > greatly simplify the effort required to integrate with various external
> > metrics systems. I know of at least a few users that would be interested,
> > one is currently scraping the logs from LoggingMetricsConsumer and
> polling
> > the Storm REST API for their metrics.
> >
> > -Taylor
> >
> > [1] http://twocentsonsoftware.blogspot.co.il/2014/12/
> > sending-out-storm-metrics.html
> >
> >
> > > On Oct 10, 2016, at 12:14 PM, Bobby Evans  >
> > wrote:
> > >
> > > First of all the server exposes essentially the same interface that the
> > IMetricsConsumer exposes.  It mostly just adds a 

Re: Are storm metrics reported through JMX too?

2016-10-11 Thread S G
"Dropwizard has solved all of these problems already and I don't see a
reason to reinvent the wheel" - I love Dropwizard too, and many of the other
tools have switched to using it as well.

"I don't personally see a lot of value in trying to send all of the metrics
through storm itself" - How about every node reporting its own metrics via a
URL? That way there is no need for a metrics-consumer that can bottleneck
the whole topology. We can then provide a separate server that can query
all nodes to get those metrics and aggregate them. Only cluster-wide
metrics should be reported by the storm-UI's REST API (assuming there are
not too many of those).
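
A per-worker metrics endpoint of the kind described above could be sketched 
with the JDK's built-in HTTP server. Everything here (the class name, the 
/metrics path, the counters map) is illustrative, not Storm API:

```java
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each worker exposes its own metrics over HTTP so a
// central aggregator can poll, instead of pushing through a consumer bolt.
public class WorkerMetricsEndpoint {
    // Metrics the worker would update locally (names are made up).
    static final Map<String, Long> counters = new ConcurrentHashMap<>();

    // Start an HTTP server serving "name value" lines at /metrics.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            StringBuilder body = new StringBuilder();
            counters.forEach((k, v) -> body.append(k).append(' ').append(v).append('\n'));
            byte[] bytes = body.toString().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

An aggregator would then fetch `http://worker-host:port/metrics` from each 
node on its own schedule, so a slow back end never back-pressures the topology.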

On Tue, Oct 11, 2016 at 7:15 AM, Bobby Evans 
wrote:

> I agree that IMetricsConsumer is not good, but the reality is that all of
> the metrics system needs to be redone.  The problem is that we ship an
> object as a metric.  If I get an object I have no idea what it is and
> hence no idea how to report it or what to do with it.  What is more the
> common types we use in the metrics we provide are really not enough.  For
> example CountMetric sends a Long.  Well when I get it in the metrics
> consumer I have no idea if I should report it like a counter or if I should
> report it like a gauge (something that every metrics system I have used
> wants to know).  But then we support pre-aggregation of the metrics with
> IReducer so the number I get might be an average instead of either a gauge
> or a counter, which no good metrics system will want to collect because I
> cannot aggregate it with anything else, the math just does not work.
> The proposal I have said before and I still believe is that we need to put
> in place a parallel metrics API/system.  We will deprecate all of
> https://git.corp.yahoo.com/storm/storm/tree/master-
> security/storm-core/src/jvm/backtype/storm/metric/api and create a new
> parallel one that provides an API similar to http://metrics.dropwizard.
> io/3.1.0/.  I would even be fine in just using their API and exposing
> that to end users.  Dropwizard has solved all of these problems already and
> I don't see a reason to reinvent the wheel.  I don't personally see a lot
> of value in trying to send all of the metrics through storm itself.  I am
> fine if we are able to support that, but it is far from a requirement. -
> Bobby
>
> On Monday, October 10, 2016 10:47 PM, S G 
> wrote:
>
>
>  +1
> We can probably start by opening a JIRA for this and adding a design
> approach for the same?
> I would like to help in the coding-effort for this.
>
> -SG
>
> On Mon, Oct 10, 2016 at 1:51 PM, P. Taylor Goetz 
> wrote:
>
> > I’ve been thinking about metrics lately, specifically the fact that
> people
> > tend to struggle with implementing a metrics consumer. (Like this one
> [1]).
> >
> > The IMetricsConsumer interface is pretty low level, and common
> > aggregations, calculations, etc. are left up to each individual
> > implementation. That seems like an area where further abstraction would
> > make it easier to support different back ends (Graphite, JMX, Splunk,
> etc.).
> >
> > My thought is to create an abstract IMetricsConsumer implementation that
> > does common aggregations and calculations, and then delegates to a
> pluggable
> > “metrics sink” implementation (e.g. “IMetricsSink”, etc.). That would
> > greatly simplify the effort required to integrate with various external
> > metrics systems. I know of at least a few users that would be interested,
> > one is currently scraping the logs from LoggingMetricsConsumer and
> polling
> > the Storm REST API for their metrics.
> >
> > -Taylor
> >
> > [1] http://twocentsonsoftware.blogspot.co.il/2014/12/
> > sending-out-storm-metrics.html
> >
> >
> > > On Oct 10, 2016, at 12:14 PM, Bobby Evans  >
> > wrote:
> > >
> > > First of all the server exposes essentially the same interface that the
> > IMetricsConsumer exposes.  It mostly just adds a bunch of overhead in the
> > middle to serialize out the objects, send them over HTTP to another
> process
> > which then has to deserialize them and process them.  If you really don't
> > need the metrics to show up on a special known box you can have that
> exact
> > same code running inside the metrics consumer without all of the
> overhead.
> > > The server/client are insecure, have to deal with thread issues that a
> > normal IMetricsConsumer does not, and are not written to be robust (If
> the
> > HTTP server is down the consumer crashes and continues to crash until the
> > server is brought back up).  It was written very quickly for a test
> > situation and it honestly never crossed my mind that anyone would want to
> > use it in production.
> > >
> > > - Bobby
> > >
> > >On Monday, October 10, 2016 10:59 AM, S G <
> sg.online.em...@gmail.com>
> > wrote:
> > >
> > >
> > > Thanks Bobby.
> > >
> > > If we write our own metrics consumer, how do 

[GitHub] storm pull request #1685: STORM-2097: Improve logging in trident core and ex...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1685




[GitHub] storm pull request #1724: STORM-2131: Add blob command to worker-launcher, m...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1724




[GitHub] storm pull request #1679: STORM-2087: storm-kafka-client - tuples not always...

2016-10-11 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1679#discussion_r82825994
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java
 ---
@@ -266,26 +266,31 @@ private void doSeekRetriableTopicPartitions() {
             if (offsetAndMeta != null) {
                 kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);  // seek to the next offset that is ready to commit in next commit cycle
             } else {
-                kafkaConsumer.seekToEnd(toArrayList(rtp));    // Seek to last committed offset
+                kafkaConsumer.seek(rtp, acked.get(rtp).committedOffset + 1);    // Seek to last committed offset
             }
         }
     }

     // ========== emit ==========
     private void emit() {
-        emitTupleIfNotEmitted(waitingToEmit.next());
-        waitingToEmit.remove();
+        while (!emitTupleIfNotEmitted(waitingToEmit.next()) && waitingToEmit.hasNext()) {
+            waitingToEmit.remove();
+        }
     }

-    // emits one tuple per record
-    private void emitTupleIfNotEmitted(ConsumerRecord record) {
+    // Emits one tuple per record
+    // @return true if tuple was emitted
+    private boolean emitTupleIfNotEmitted(ConsumerRecord record) {
         final TopicPartition tp = new TopicPartition(record.topic(), record.partition());
         final KafkaSpoutMessageId msgId = new KafkaSpoutMessageId(record);

         if (acked.containsKey(tp) && acked.get(tp).contains(msgId)) {   // has been acked
             LOG.trace("Tuple for record [{}] has already been acked. Skipping", record);
+            return false;
--- End diff --

Why are we adding returns here? Having one return makes the method look 
cleaner. This is an if-else condition, and we are returning false at the end.
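
As a rough illustration of the single-return style suggested in this review, 
the dedup check can be written with one exit point. This is a simplified, 
hypothetical sketch (plain string ids instead of the spout's ConsumerRecord 
and acked bookkeeping), not the actual KafkaSpout code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the "emit only if not already emitted" check with a
// single return statement, as the reviewer suggests.
public class EmitOnce {
    private final Set<String> emitted = new HashSet<>();

    // Returns true only the first time a given record id is seen.
    public boolean emitIfNotEmitted(String recordId) {
        boolean emit = !emitted.contains(recordId);
        if (emit) {
            emitted.add(recordId);   // stand-in for actually emitting the tuple
        }
        return emit;                 // single exit point
    }
}
```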




[GitHub] storm issue #1689: STORM-2099 Introduce new sql external module: storm-sql-r...

2016-10-11 Thread harshach
Github user harshach commented on the issue:

https://github.com/apache/storm/pull/1689
  
+1




[GitHub] storm issue #1681: STORM-1444 Support EXPLAIN statement in StormSQL

2016-10-11 Thread harshach
Github user harshach commented on the issue:

https://github.com/apache/storm/pull/1681
  
+1




[GitHub] storm issue #1733: [STORM-2134] - improving the current scheduling strategy ...

2016-10-11 Thread jerrypeng
Github user jerrypeng commented on the issue:

https://github.com/apache/storm/pull/1733
  
@revans2 thanks for the reviews.  I have addressed your comments




[GitHub] storm issue #1724: STORM-2131: Add blob command to worker-launcher, make sto...

2016-10-11 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/1724
  
+1 looks good




[GitHub] storm pull request #1724: STORM-2131: Add blob command to worker-launcher, m...

2016-10-11 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1724#discussion_r82815297
  
--- Diff: storm-core/src/native/worker-launcher/impl/worker-launcher.c ---
@@ -485,14 +482,14 @@ int setup_stormdist_dir(const char* local_dir) {
 
   case FTS_DP:// A directory being visited in post-order
   case FTS_DOT:   // A dot directory
+  case FTS_SL:// A symbolic link
+  case FTS_SLNONE:// A broken symbolic link
--- End diff --

I talked with @tgravescs and ran some tests.  It works just fine, and it 
should not be a big deal.  I was a bit worried about the default permissions on 
the symlinks but they are open, and because they are all under a directory that 
should be locked down, I am not concerned about it.




Re: Are storm metrics reported through JMX too?

2016-10-11 Thread Bobby Evans
I agree that IMetricsConsumer is not good, but the reality is that all of the 
metrics system needs to be redone.  The problem is that we ship an object as a 
metric.  If I get an object I have no idea what it is and hence no idea how to 
report it or what to do with it.  What is more the common types we use in the 
metrics we provide are really not enough.  For example CountMetric sends a 
Long.  Well when I get it in the metrics consumer I have no idea if I should 
report it like a counter or if I should report it like a gauge (something that 
every metrics system I have used wants to know).  But then we support 
pre-aggregation of the metrics with IReducer so the number I get might be an 
average instead of either a gauge or a counter, which no good metrics system 
will want to collect because I cannot aggregate it with anything else, the math 
just does not work.
The proposal I have said before and I still believe is that we need to put in 
place a parallel metrics API/system.  We will deprecate all of 
https://git.corp.yahoo.com/storm/storm/tree/master-security/storm-core/src/jvm/backtype/storm/metric/api
 and create a new parallel one that provides an API similar to 
http://metrics.dropwizard.io/3.1.0/.  I would even be fine in just using their 
API and exposing that to end users.  Dropwizard has solved all of these 
problems already and I don't see a reason to reinvent the wheel.  I don't 
personally see a lot of value in trying to send all of the metrics through 
storm itself.  I am fine if we are able to support that, but it is far from a 
requirement. - Bobby 
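
The counter-versus-gauge distinction described above is exactly what a typed 
metrics API makes explicit. A minimal sketch, shaped loosely like the 
Dropwizard API but entirely self-contained (this is not the real 
io.dropwizard.metrics library):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: distinct metric types tell the consumer how to
// report a value, instead of shipping an opaque Long.
public class MetricsSketch {
    // A counter is monotonically increasing; consumers can derive rates from it.
    public static final class Counter {
        private final AtomicLong count = new AtomicLong();
        public void inc(long n) { count.addAndGet(n); }
        public long getCount() { return count.get(); }
    }

    // A gauge is a point-in-time reading. Averaging gauges is meaningful;
    // averaging counters is not, which is why the type must be explicit.
    public interface Gauge<T> {
        T getValue();
    }
}
```

With typed metrics, a consumer knows without guessing whether a Long is a 
running total or an instantaneous reading, so its aggregation math stays valid.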

On Monday, October 10, 2016 10:47 PM, S G  wrote:
 

 +1
We can probably start by opening a JIRA for this and adding a design
approach for the same?
I would like to help in the coding-effort for this.

-SG

On Mon, Oct 10, 2016 at 1:51 PM, P. Taylor Goetz  wrote:

> I’ve been thinking about metrics lately, specifically the fact that people
> tend to struggle with implementing a metrics consumer. (Like this one [1]).
>
> The IMetricsConsumer interface is pretty low level, and common
> aggregations, calculations, etc. are left up to each individual
> implementation. That seems like an area where further abstraction would
> make it easier to support different back ends (Graphite, JMX, Splunk, etc.).
>
> My thought is to create an abstract IMetricsConsumer implementation that
> does common aggregations and calculations, and then delegates to a pluggable
> “metrics sink” implementation (e.g. “IMetricsSink”, etc.). That would
> greatly simplify the effort required to integrate with various external
> metrics systems. I know of at least a few users that would be interested,
> one is currently scraping the logs from LoggingMetricsConsumer and polling
> the Storm REST API for their metrics.
>
> -Taylor
>
> [1] http://twocentsonsoftware.blogspot.co.il/2014/12/
> sending-out-storm-metrics.html
>
>
> > On Oct 10, 2016, at 12:14 PM, Bobby Evans 
> wrote:
> >
> > First of all the server exposes essentially the same interface that the
> IMetricsConsumer exposes.  It mostly just adds a bunch of overhead in the
> middle to serialize out the objects, send them over HTTP to another process
> which then has to deserialize them and process them.  If you really don't
> need the metrics to show up on a special known box you can have that exact
> same code running inside the metrics consumer without all of the overhead.
> > The server/client are insecure, have to deal with thread issues that a
> normal IMetricsConsumer does not, and are not written to be robust (If the
> HTTP server is down the consumer crashes and continues to crash until the
> server is brought back up).  It was written very quickly for a test
> situation and it honestly never crossed my mind that anyone would want to
> use it in production.
> >
> > - Bobby
> >
> >    On Monday, October 10, 2016 10:59 AM, S G 
> wrote:
> >
> >
> > Thanks Bobby.
> >
> > If we write our own metrics consumer, how do we ensure that it is better
> > than HttpForwardingMetricsServer? In other words, what aspects of the
> > HttpForwardingMetricsServer
> > should we avoid to make our own metrics consumer better and ready for
> > production?
> >
> > Is verisign/storm-graphite production
> > ready?
> >
> > Also, we should add a line about production-readiness of
> > HttpForwardingMetricsServer
> > in the documentation at http://storm.apache.org/
> releases/1.0.2/Metrics.html
> > (We were just about to think seriously on using this for production as we
> > thought this to be the standard solution for metrics' consumption in 1.0+
> > version).
> >
> > -SG
> >
> > On Mon, Oct 10, 2016 at 6:37 AM, Bobby Evans 
> wrote:
> >
> >> First of all there really are two different sets of metrics.  One set is
> >> the topology metrics and the other set is the 
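
The pluggable-sink abstraction Taylor proposes in the quoted thread above 
could look roughly like the following. All names (IMetricsSink, 
AbstractMetricsConsumer) and the sum aggregation are hypothetical, for 
illustration only, not actual Storm API:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: a shared consumer layer does the common aggregation
// once, then delegates delivery to a pluggable back-end sink.
public class SinkSketch {
    // Graphite, JMX, Splunk, etc. back ends would each implement this.
    interface IMetricsSink {
        void report(String name, double value);
    }

    // The abstract layer users no longer have to rewrite per back end.
    static class AbstractMetricsConsumer {
        private final IMetricsSink sink;

        AbstractMetricsConsumer(IMetricsSink sink) {
            this.sink = sink;
        }

        // Common aggregation (here simply: sum per metric name), done centrally.
        void handleDataPoints(Map<String, List<Double>> dataPoints) {
            dataPoints.forEach((name, values) -> {
                double sum = values.stream().mapToDouble(Double::doubleValue).sum();
                sink.report(name, sum);
            });
        }
    }
}
```

Integrating a new metrics system then shrinks to implementing one small sink 
interface, while aggregation logic lives in a single shared place.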

[GitHub] storm issue #1702: Update Eventhub-Client jar dependency to 1.0.1

2016-10-11 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/1702
  
I am not really familiar with EventHub, so I don't feel all that confident 
reviewing it.  I mostly jumped on to answer a few generic questions.




[GitHub] storm issue #1729: [STORM-2144] Fix Storm-sql group-by behavior in standalon...

2016-10-11 Thread satishd
Github user satishd commented on the issue:

https://github.com/apache/storm/pull/1729
  
Pushed to master branch also.




[GitHub] storm issue #1729: [STORM-2144] Fix Storm-sql group-by behavior in standalon...

2016-10-11 Thread satishd
Github user satishd commented on the issue:

https://github.com/apache/storm/pull/1729
  
Thanks @arunmahadevan for the fix.




[GitHub] storm pull request #1729: [STORM-2144] Fix Storm-sql group-by behavior in st...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1729




[GitHub] storm issue #1729: [STORM-2144] Fix Storm-sql group-by behavior in standalon...

2016-10-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1729
  
Spent some time looking into it, but it looks like breaking changes are 
needed.
The Interpreter expects a static input table before initialization, which is 
valid for most cases, but Storm SQL standalone mode works like streaming: 
emitting input tuples and receiving the outputs.




[GitHub] storm issue #1702: Update Eventhub-Client jar dependency to 1.0.1

2016-10-11 Thread raviperi
Github user raviperi commented on the issue:

https://github.com/apache/storm/pull/1702
  
@revans2 Please let me know if you have further comments.

