Re: Looking for professional consulting on Apache Storm

2018-01-12 Thread Jason Kania
 Hi Michaël,
I cannot speak for your time zone, but I am from Ottawa, Canada and have 
contracted on Storm before. When we needed further deep expertise, we used 
Hortonworks who had French speakers at the time. I am not a French speaker 
myself.

Cheers,
Jason

On Friday, January 12, 2018, 11:57:53 AM EST, Michaël Melchiore 
 wrote:  
 
 Dear Storm users,
I am actively looking for professional consulting regarding my team's use of 
Apache Storm in our project. Could someone point me to companies providing such 
services?

The mission would take place in Toulouse, France. I would strongly prefer a 
French speaker :)

Kind regards,

Michaël Melchiore


  

Re: [SURVEY] What version of Storm are you using?

2016-08-17 Thread Jason Kania
Hi,
Our building automation company uses 0.9.6.
We are not using the latest version because we need stability over new 
functionality. We would always prefer to try upgrading after a few patch 
revisions have been delivered.
Thanks,
Jason
  From: P. Taylor Goetz 
 To: user@storm.apache.org 
 Sent: Wednesday, August 17, 2016 11:11 AM
 Subject: [SURVEY] What version of Storm are you using?
   
On the Storm developer list, there are a number of discussions regarding ending 
support for various older versions of Storm. In order to make an informed 
decision I’d like to get an idea of what versions the user community is 
actively using. I’d like to ask the user community to answer the following 
questions so we can best determine which version lines we should continue to 
support, and which ones can be EOL’ed.

1. What version of Storm are you currently using?

2. If you are not on the most recent version, what is preventing you from 
upgrading?


Thanks in advance.

-Taylor

  

Re: Debugging tuples being replayed despite acks

2016-08-09 Thread Jason Kania
Thanks for the response.
The code is essentially:

    logger.warn("tuple received: {}", tuple);
    outputCollector.ack(tuple);
    return;

It doesn't go any further either when I step through it or run it directly.
  From: Priyank Shah 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Tuesday, August 9, 2016 1:32 PM
 Subject: Re: Debugging tuples being replayed despite acks
   
You mentioned that you are making sure that it's acking everything in the 
first bolt. Is there any other bolt downstream that could be failing the 
tuples?
From: Jason Kania
Reply-To: "user@storm.apache.org", Jason Kania
Date: Tuesday, August 9, 2016 at 10:28 AM
To: User
Subject: Debugging tuples being replayed despite acks

Hello,
I would appreciate any suggestions as nobody seems to have answered my previous 
requests for assistance.
I am currently using 0.10.0 and am having serious difficulties. I am unable to 
upgrade at present and have looked for any bugs on this issue with no luck.
I am trying to figure out why acked tuples keep getting replayed. These tuples 
have been acked previously but just in case something in the topology was 
wrong, I am now automatically acking everything in the first bolt and yet I 
keep getting them replayed. I tried putting a breakpoint in the KafkaSpout and 
can see that nextTuple() is hit, but the ack() method of KafkaSpout is never 
reached.
I would try debugging into the Clojure code, but I have had no luck doing so 
and cannot seem to find a path to make that successful.
Any suggestions on any front at all would be appreciated.
Thanks,
Jason

   

Debugging tuples being replayed despite acks

2016-08-09 Thread Jason Kania
Hello,
I would appreciate any suggestions as nobody seems to have answered my previous 
requests for assistance.
I am currently using 0.10.0 and am having serious difficulties. I am unable to 
upgrade at present and have looked for any bugs on this issue with no luck.
I am trying to figure out why acked tuples keep getting replayed. These tuples 
have been acked previously but just in case something in the topology was 
wrong, I am now automatically acking everything in the first bolt and yet I 
keep getting them replayed. I tried putting a breakpoint in the KafkaSpout and 
can see that nextTuple() is hit, but the ack() method of KafkaSpout is never 
reached.
I would try debugging into the Clojure code, but I have had no luck doing so 
and cannot seem to find a path to make that successful.
Any suggestions on any front at all would be appreciated.
Thanks,
Jason
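At-least-once delivery in Storm means the spout replays a tuple unless every bolt in its tuple tree acks it before the message timeout. A toy sketch of that bookkeeping in plain Java (an illustration of the concept only, not Storm's actual acker implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class AckTrackingSketch {
    public static void main(String[] args) {
        // msgId -> number of outstanding acks in the tuple tree
        Map<Long, Integer> pending = new HashMap<>();
        long msgId = 1L;
        pending.put(msgId, 2); // the tuple fans out to a chain of two bolts

        pending.computeIfPresent(msgId, (k, v) -> v - 1); // first bolt acks

        // The second bolt never acks, so on timeout the spout replays the
        // tuple even though the first bolt acked its part.
        boolean fullyAcked = pending.get(msgId) == 0;
        System.out.println(fullyAcked ? "acked" : "replayed"); // prints "replayed"
    }
}
```

In this model, acking in the first bolt alone is not enough; any un-acked or failed tuple anywhere downstream puts the whole tree back on the spout.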

Storm Kafka retention of in progress messages

2016-08-08 Thread Jason Kania
Hello,
I am wondering how long the Kafka Spout retains a reference to messages that 
are in progress but have not been acked. I have a situation where I am getting 
no failed messages in the UI but getting many retries when the topology is 
restarted. I am wondering if the Kafka Spout might be forgetting about these 
messages by the time the ack is coming back. The processing of the message can 
take up to 5 hours in some cases so the timeout is set to this duration.
Thanks,
Jason
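For reference, the per-tuple timeout is the topology-level setting topology.message.timeout.secs, which a 5-hour processing window would need to cover. A sketch of setting it via a plain config map (the key names are Storm's real config keys; the surrounding code and values are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

public class TimeoutConfigSketch {
    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        // 5 hours = 18000 seconds; tuples unacked past this are replayed
        conf.put("topology.message.timeout.secs", 5 * 60 * 60);
        // Capping in-flight tuples reduces the replay blast radius on restart
        conf.put("topology.max.spout.pending", 10);
        System.out.println(conf.get("topology.message.timeout.secs")); // prints 18000
    }
}
```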

KafkaSpout not receiving new messages until restart

2016-08-04 Thread Jason Kania
Hello,
I am using Storm 0.10.0 and have an application pulling content off a Kafka 
topic via the Kafka spout, but after some time nothing new is being emitted. 
However, after a restart of the topology, more messages come in for a while. I 
have done my best to confirm that there are no tuple timeouts by tracing code 
and have looked at the tuple visualization but found no lost tuples. There is 
no indication of any tuples lost or any timeouts.
I am logging every time something comes into the first bolt and see nothing 
there either.
At this point, I am not sure what I can do to debug further, and we cannot 
upgrade.
Any suggestions to get to the bottom of this?
Thanks,
Jason

Re: Connection refused during topology deployment

2016-06-28 Thread Jason Kania
Thanks for the suggestion.
All the configuration is using IP addresses at this point. Are you suggesting 
more than ensuring that /etc/hosts and hostname map properly?
Thanks,
Jason

  From: Jacob Johansen 
 To: user@storm.apache.org; Jason Kania  
 Sent: Tuesday, June 28, 2016 9:13 PM
 Subject: Re: Connection refused during topology deployment
   
This probably relates to the hostname on the box; hostnames need to 
match those from DNS or in the hosts files of all of the nodes in the 
Storm cluster.
Jacob Johansen


On Tue, Jun 28, 2016 at 8:07 PM, Jason Kania  wrote:
> Hello,
>
> I am currently getting a "Connection refused" exception when the supervisor
> attempts to download the topology from Nimbus. I have checked that the
> Thrift port is open and accessible on Nimbus from the node where the
> Supervisor is running.
>
> Are there any suggestions on how to diagnose this?
> Are there other ports to be concerned with besides 6627?
>
> java.lang.RuntimeException:
> org.apache.thrift7.transport.TTransportException: java.net.ConnectException:
> Connection refused
>    at
> backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:21)
> ~[storm-core-10.0.0.jar:na]
>    at backtype.storm.utils.Utils.downloadFromMaster(Utils.java:226)
> ~[storm-core-10.0.0.jar:na]
>    ...
>
> Thanks,
>
> Jason


  

Connection refused during topology deployment

2016-06-28 Thread Jason Kania
Hello,
I am currently getting a "Connection refused" exception when the supervisor 
attempts to download the topology from Nimbus. I have checked that the Thrift 
port is open and accessible on Nimbus from the node where the Supervisor is 
running.
Are there any suggestions on how to diagnose this? Are there other ports to be 
concerned with besides 6627?
java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: 
java.net.ConnectException: Connection refused
at 
backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:21) 
~[storm-core-10.0.0.jar:na]
at backtype.storm.utils.Utils.downloadFromMaster(Utils.java:226) 
~[storm-core-10.0.0.jar:na]
...
Thanks,
Jason


Netty disconnects

2016-06-16 Thread Jason Kania
We have a topology where a job can take up to several minutes to complete, and 
we are encountering frequent Netty dropped messages. The result is that we 
have to wait for each dropped message to time out, which leaves the system 
idle much of the time.
I am wondering if there is a way to detect the dropping of messages by Netty 
so we can resend immediately, or if there is any mechanism other than 
timeouts. Suggestions on how to determine what is causing the dropped messages 
would also be helpful. Right now, we get one or two dropped messages per 
minute, which is enough to leave us idle the vast majority of the time.
Thanks,
Jason

Re: Access to storm fieldsGrouping hashing method in bolt/spout

2016-06-11 Thread Jason Kania
Hi Satish,
Thanks for the response. That does sound like what we need. We'll give it a 
shot to see if it covers our use case.
Thanks,
Jason

  From: Satish Duggana 
 To: user@storm.apache.org; Jason Kania  
 Sent: Saturday, June 11, 2016 1:10 PM
 Subject: Re: Access to storm fieldsGrouping hashing method in bolt/spout
   
Hi,
It seems you want to send tuples back to the bolt task from which they were 
received. The initial bolt can send its current task id 
(org.apache.storm.task.TopologyContext#getThisTaskId()) as part of the tuple 
fields, and a subsequent bolt can use it to emit tuples directly to that task 
using `OutputCollector#emitDirect(int taskId, String streamId, 
Collection<Tuple> anchors, List<Object> tuple)`.
Hope it helps,
Satish.


On Fri, Jun 10, 2016 at 7:47 PM, Jason Kania  wrote:

Thanks for the response and the code reference.
To explain the use case, we would like to be able to use shuffle grouping for 
initial load balancing but then use fields grouping and some generated token to 
be able to get back to the same bolt after passing messages to a different bolt 
doing other processing. The reason to get back to the same box is that a very 
large media file needs to be pulled and processed on that box, so we want to 
be sticky to that box.
We have tried using fieldsGrouping on its own but we end up with asymmetries in 
the load across the boxes simply because we only have 5 boxes doing this 
specific processing and hashing does not allow it to be evenly balanced.
Right now, we are generating random token values and then sending them out to 
be collected by the different bolts. The bolts then use the tokens that have 
arrived as the fieldsGrouping key for subsequent operations. The ideal would be 
if we could directly get a token that would allow return to the same bolt 
instance to complete the job.
I am wondering if something like this would be worthy of an enhancement request?
Jason
  From: Matthias J. Sax 
 To: user@storm.apache.org 
 Sent: Wednesday, June 8, 2016 1:46 PM
 Subject: Re: Access to storm fieldsGrouping hashing method in bolt/spout
  
I cannot completely follow your use case and what you want to accomplish...

However, even though there is no official API to call the hash function, it
is actually pretty simple. Storm internally creates a List of the
fieldsGrouping attributes and calls .hashCode() on it (and of course
applies a modulo afterwards).
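As a minimal, self-contained sketch of that calculation (assuming the target task index is the key list's hashCode() reduced modulo the number of tasks; sign handling in Storm's internals may differ):

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingHashSketch {
    public static void main(String[] args) {
        int numTasks = 5; // illustrative task count for the receiving bolt
        List<Object> groupingFields = Arrays.asList((Object) "sensor-42");

        // List.hashCode() over the grouping fields, then a non-negative modulo
        int task = Math.floorMod(groupingFields.hashCode(), numTasks);
        System.out.println("routed to task " + task); // always in [0, 5)
    }
}
```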

I actually wrote a wrapper to call the internal hash function once (it
works for 0.9.3 -- not sure if it is compatible with newer versions).
You can find the code here:
https://github.com/mjsax/aeolus/tree/master/aeolus-storm-connector

The project assembles a Java class with a static method that exposes a
simple-to-use Java API calling internal Storm core code to get the
receiver task ID.

> StormConnector.getFieldsGroupingReceiverTaskId(
    TopologyContext ctx,
    String producerComponentId,
    String outputStreamId,
    String receiverComponentId,
    List<Object> tuple);

-Matthias

On 06/08/2016 07:02 PM, Jason Kania wrote:
> Hello,
> 
> I am wondering if there is a means to access the hashing method/function
> that storm applies for the fieldsGrouping method. I would like to
> generate a token that will hash back to the current node so that
> subsequent processing can come back to the same node. I realize that
> generating an applicable token would be trial and error, but I would
> like to take advantage of shuffleGrouping to assign tasks but then use
> fieldsGrouping to ensure that the rest of the work for the task comes
> back to the same node.
> 
> Thanks,
> 
> Jason


Re: Access to storm fieldsGrouping hashing method in bolt/spout

2016-06-10 Thread Jason Kania
Thanks for the response and the code reference.
To explain the use case, we would like to be able to use shuffle grouping for 
initial load balancing but then use fields grouping and some generated token to 
be able to get back to the same bolt after passing messages to a different bolt 
doing other processing. The reason to get back to the same box is that a very 
large media file needs to be pulled and processed on that box, so we want to 
be sticky to that box.
We have tried using fieldsGrouping on its own but we end up with asymmetries in 
the load across the boxes simply because we only have 5 boxes doing this 
specific processing and hashing does not allow it to be evenly balanced.
Right now, we are generating random token values and then sending them out to 
be collected by the different bolts. The bolts then use the tokens that have 
arrived as the fieldsGrouping key for subsequent operations. The ideal would be 
if we could directly get a token that would allow return to the same bolt 
instance to complete the job.
I am wondering if something like this would be worthy of an enhancement request?
Jason
  From: Matthias J. Sax 
 To: user@storm.apache.org 
 Sent: Wednesday, June 8, 2016 1:46 PM
 Subject: Re: Access to storm fieldsGrouping hashing method in bolt/spout
   
I cannot completely follow your use case and what you want to accomplish...

However, even though there is no official API to call the hash function, it
is actually pretty simple. Storm internally creates a List of the
fieldsGrouping attributes and calls .hashCode() on it (and of course
applies a modulo afterwards).

I actually wrote a wrapper to call the internal hash function once (it
works for 0.9.3 -- not sure if it is compatible with newer versions).
You can find the code here:
https://github.com/mjsax/aeolus/tree/master/aeolus-storm-connector

The project assembles a Java class with a static method that exposes a
simple-to-use Java API calling internal Storm core code to get the
receiver task ID.

> StormConnector.getFieldsGroupingReceiverTaskId(
    TopologyContext ctx,
    String producerComponentId,
    String outputStreamId,
    String receiverComponentId,
    List<Object> tuple);

-Matthias

On 06/08/2016 07:02 PM, Jason Kania wrote:
> Hello,
> 
> I am wondering if there is a means to access the hashing method/function
> that storm applies for the fieldsGrouping method. I would like to
> generate a token that will hash back to the current node so that
> subsequent processing can come back to the same node . I realize that
> generating an applicable token would be trial and an error, but I would
> like to take advantage of shufflegrouping to assign tasks but then use
> fieldsGrouping to ensure that the rest of the work for the task comes
> back to the same node.
> 
> Thanks,
> 
> Jason


Access to storm fieldsGrouping hashing method in bolt/spout

2016-06-08 Thread Jason Kania
Hello,
I am wondering if there is a means to access the hashing method/function that 
storm applies for the fieldsGrouping method. I would like to generate a token 
that will hash back to the current node so that subsequent processing can come 
back to the same node. I realize that generating an applicable token would be 
trial and error, but I would like to take advantage of shuffleGrouping to 
assign tasks but then use fieldsGrouping to ensure that the rest of the work 
for the task comes back to the same node.
Thanks,
Jason
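Absent such an API, the token could in principle be found by trial and error as described above: draw random tokens and keep the first one whose hash lands on the desired task. A sketch under the assumption that fields grouping reduces the key list's hashCode() modulo the task count (the helper name and the hash assumption are illustrative, not Storm API):

```java
import java.util.Arrays;
import java.util.Random;

public class StickyTokenSketch {
    // Search for a token whose assumed fields-grouping hash maps to desiredTask.
    static String tokenForTask(int desiredTask, int numTasks, Random rnd) {
        while (true) {
            String token = "tok-" + rnd.nextInt(1_000_000);
            int task = Math.floorMod(Arrays.asList((Object) token).hashCode(), numTasks);
            if (task == desiredTask) {
                return token; // expected after roughly numTasks draws on average
            }
        }
    }

    public static void main(String[] args) {
        String token = tokenForTask(3, 5, new Random());
        System.out.println("use " + token + " as the fieldsGrouping key");
    }
}
```

Emitting that token as the fieldsGrouping key should then route follow-up tuples back to the chosen task, under the stated hash assumption.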

How to configure broker ids in storm kafka?

2016-02-29 Thread Jason Kania
Hello,
I am trying to run storm-kafka with multiple ZooKeeper and Kafka nodes, but I 
am getting "Node /brokers/ids/0 does not exist". In my case, /brokers/ids/8 
and /brokers/ids/9 exist, as per my Kafka configuration. I cannot find out how 
to configure storm-kafka to look for these broker ids instead of 
/brokers/ids/0, whose origin I also do not know.
Can someone point me to documentation, a code example or a configuration 
parameter for doing this?
Thanks,
Jason


Re: Question About Emitted, Transferred and Acked Bolts

2015-02-23 Thread Jason Kania
Nathan,
I only get spout failures. None of my bolts fail. Are all failures on a spout 
going to be associated with timeouts? If not, how would one know the 
difference? Hence my question about whether it would be possible to track 
timed-out acks explicitly.
Thanks,
Jason

  From: Nathan Leung 
 To: user ; Jason Kania  
 Sent: Monday, February 23, 2015 4:52 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
If your processing takes a long time they may be timing out.


On Mon, Feb 23, 2015 at 4:35 PM, Jason Kania  wrote:

Thanks for the suggestion. My traffic is only one or two tuples per second in 
the current environment so I would not expect that to be the problem. I do have 
failures. Hence, I was wondering if slow acking was the problem. In stepping 
through the processing, I know the failures aren't in the bolts themselves. 
That is why I thought it strange.
  From: Michael Rose 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Monday, February 23, 2015 4:24 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
Do you have enough ackers to keep up with your traffic? How about failures?
Michael Rose
Senior Software Engineer, FullContact | fullcontact.com
m: +1.720.837.1357 | t: @xorlev


On Mon, Feb 23, 2015 at 2:16 PM, Jason Kania  wrote:

Michael,
That's good to know. I was unaware. That said, if execution of a bolt has not 
occurred, I would still expect a 0 emit count and acks not to be falling behind 
the emits by much. My acks are half my emits.
  From: Michael Rose 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Monday, February 23, 2015 3:52 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
Keep in mind that those metrics are sampled at the rate of 
topology.stats.sample.rate, 0.05 by default. If you turn it up to 1.0 you'll 
see full-resolution, though at the price of more time spent collecting metrics.


On Mon, Feb 23, 2015 at 12:14 PM, Jason Kania  wrote:

I have two comments to add:
1) Is there any JIRA for invalid metrics values? I did not see one. I am 
running with bolts having breakpoints, and long before my bolts are ever 
entered, the metrics indicate that these bolts already have more than 100 
emits. I have thought to raise a JIRA on this but am not sure what details I 
would add. Would some specific debug output aid in resolving this?

2) For acks, is there any possibility of adding tracking for acks that happen 
after a timeout? I can step into my bolt each time it is called and confirm 
that it is acking each request, yet the acks do not match the emits (which 
should have a 1 to 1 ratio). I am guessing that this is because the ack 
happened too late, or the metrics totals might be incorrect.

I use the Storm UI to track processing.

Thanks,
Jason
  From: Nathan Leung 
 To: user  
 Sent: Monday, February 23, 2015 11:56 AM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
executed = # of times execute() was called
acked = # of executed tuples that you acked; ideally this will match executed
emitted = # of tuples that you emitted; if you call emit more than once per 
execute call, this can be higher than the execute count
transferred = # of tuples transferred downstream; if you have 2 bolts 
subscribing to your bolt, then this count can be higher than emitted


On Mon, Feb 23, 2015 at 11:35 AM, Rahul Reddy  wrote:

Hi,

Can you guys help me understand the difference between emitted, transferred, 
and acked tuples?

In my case every tuple emitted by ablog-filter-bolt will be processed by 
ablog-flatten-xml-bolt, which will then be written by ablog-hdfs-bolt to HDFS. 
Ideally all executed/acked metrics should match after tuples are emitted from 
ablog-filter-bolt. I'm not sure why there is so much discrepancy in the 
emitted/transferred/acked tuple counts between these bolts, although it 
doesn't show any failed tuples.

Any ideas what I can check and how to interpret metrics correctly?

Thanks
Rahul

Re: Question About Emitted, Transferred and Acked Bolts

2015-02-23 Thread Jason Kania
Thanks for the suggestion. My traffic is only one or two tuples per second in 
the current environment so I would not expect that to be the problem. I do have 
failures. Hence, I was wondering if slow acking was the problem. In stepping 
through the processing, I know the failures aren't in the bolts themselves. 
That is why I thought it strange.
  From: Michael Rose 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Monday, February 23, 2015 4:24 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
Do you have enough ackers to keep up with your traffic? How about failures?


On Mon, Feb 23, 2015 at 2:16 PM, Jason Kania  wrote:

Michael,
That's good to know. I was unaware. That said, if execution of a bolt has not 
occurred, I would still expect a 0 emit count and acks not to be falling behind 
the emits by much. My acks are half my emits.
  From: Michael Rose 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Monday, February 23, 2015 3:52 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
Keep in mind that those metrics are sampled at the rate of 
topology.stats.sample.rate, 0.05 by default. If you turn it up to 1.0 you'll 
see full-resolution, though at the price of more time spent collecting metrics.


On Mon, Feb 23, 2015 at 12:14 PM, Jason Kania  wrote:

I have two comments to add:
1) Is there any JIRA for invalid metrics values? I did not see one. I am 
running with bolts having breakpoints, and long before my bolts are ever 
entered, the metrics indicate that these bolts already have more than 100 
emits. I have thought to raise a JIRA on this but am not sure what details I 
would add. Would some specific debug output aid in resolving this?

2) For acks, is there any possibility of adding tracking for acks that happen 
after a timeout? I can step into my bolt each time it is called and confirm 
that it is acking each request, yet the acks do not match the emits (which 
should have a 1 to 1 ratio). I am guessing that this is because the ack 
happened too late, or the metrics totals might be incorrect.

I use the Storm UI to track processing.

Thanks,
Jason
  From: Nathan Leung 
 To: user  
 Sent: Monday, February 23, 2015 11:56 AM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
executed = # of times execute() was called
acked = # of executed tuples that you acked; ideally this will match executed
emitted = # of tuples that you emitted; if you call emit more than once per 
execute call, this can be higher than the execute count
transferred = # of tuples transferred downstream; if you have 2 bolts 
subscribing to your bolt, then this count can be higher than emitted


On Mon, Feb 23, 2015 at 11:35 AM, Rahul Reddy  wrote:

Hi,

Can you guys help me understand the difference between emitted, transferred, 
and acked tuples?

In my case every tuple emitted by ablog-filter-bolt will be processed by 
ablog-flatten-xml-bolt, which will then be written by ablog-hdfs-bolt to HDFS. 
Ideally all executed/acked metrics should match after tuples are emitted from 
ablog-filter-bolt. I'm not sure why there is so much discrepancy in the 
emitted/transferred/acked tuple counts between these bolts, although it 
doesn't show any failed tuples.

Any ideas what I can check and how to interpret metrics correctly?

Thanks
Rahul

Re: Question About Emitted, Transferred and Acked Bolts

2015-02-23 Thread Jason Kania
Michael,
That's good to know. I was unaware. That said, if execution of a bolt has not 
occurred, I would still expect a 0 emit count and acks not to be falling behind 
the emits by much. My acks are half my emits.
  From: Michael Rose 
 To: "user@storm.apache.org" ; Jason Kania 
 
 Sent: Monday, February 23, 2015 3:52 PM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
Keep in mind that those metrics are sampled at the rate of 
topology.stats.sample.rate, 0.05 by default. If you turn it up to 1.0 you'll 
see full-resolution, though at the price of more time spent collecting metrics.


On Mon, Feb 23, 2015 at 12:14 PM, Jason Kania  wrote:

I have two comments to add:
1) Is there any JIRA for invalid metrics values? I did not see one. I am 
running with bolts having breakpoints, and long before my bolts are ever 
entered, the metrics indicate that these bolts already have more than 100 
emits. I have thought to raise a JIRA on this but am not sure what details I 
would add. Would some specific debug output aid in resolving this?

2) For acks, is there any possibility of adding tracking for acks that happen 
after a timeout? I can step into my bolt each time it is called and confirm 
that it is acking each request, yet the acks do not match the emits (which 
should have a 1 to 1 ratio). I am guessing that this is because the ack 
happened too late, or the metrics totals might be incorrect.

I use the Storm UI to track processing.

Thanks,
Jason
  From: Nathan Leung 
 To: user  
 Sent: Monday, February 23, 2015 11:56 AM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
executed = # of times execute() was called
acked = # of executed tuples that you acked; ideally this will match executed
emitted = # of tuples that you emitted; if you call emit more than once per 
execute call, this can be higher than the execute count
transferred = # of tuples transferred downstream; if you have 2 bolts 
subscribing to your bolt, then this count can be higher than emitted


On Mon, Feb 23, 2015 at 11:35 AM, Rahul Reddy  wrote:

Hi,

Can you guys help me understand the difference between emitted, transferred, 
and acked tuples?

In my case every tuple emitted by ablog-filter-bolt will be processed by 
ablog-flatten-xml-bolt, which will then be written by ablog-hdfs-bolt to HDFS. 
Ideally all executed/acked metrics should match after tuples are emitted from 
ablog-filter-bolt. I'm not sure why there is so much discrepancy in the 
emitted/transferred/acked tuple counts between these bolts, although it 
doesn't show any failed tuples.

Any ideas what I can check and how to interpret metrics correctly?

Thanks
Rahul

Re: Question About Emitted, Transferred and Acked Bolts

2015-02-23 Thread Jason Kania
I have two comments to add:
1) Is there any JIRA for invalid metrics values? I did not see one. I am 
running with bolts having breakpoints and long before my bolts are every 
entered, the metrics indicate that these bolts already have more than 100 
emits. I have thought to raise a JIRA on this but I am not sure what I would 
add for details. Would some specific debug output aid in resolving this?

2) For acks, is there any possibility of adding tracking for acks that happen 
after a timeout? I can step into my bolt each time it is called and confirm 
that it is acking each request, yet the acks do not match the emits (which 
should have a 1 to 1 ratio). I am guessing that this is because the ack 
happened too late or it might be incorrect metrics total.

I use the STORM UI for processing tracking.

Thanks,
Jason
  From: Nathan Leung 
 To: user  
 Sent: Monday, February 23, 2015 11:56 AM
 Subject: Re: Question About Emitted, Transferred and Acked Bolts
   
executed = # of times execute() was called
acked = # of executed tuples that you acked; ideally this will match executed
emitted = # of tuples that you emitted; if you call emit more than once per 
execute call, this can be higher than the execute count
transferred = # of tuples transferred downstream; if you have 2 bolts 
subscribing to your bolt, then this count can be higher than emitted
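As a worked example of those definitions, with illustrative numbers (two emits per execute, two subscribing bolts):

```java
public class MetricsCountsSketch {
    public static void main(String[] args) {
        int executed = 100;          // execute() invocations
        int emitsPerExecute = 2;     // bolt emits two tuples per input
        int subscribers = 2;         // two downstream bolts subscribe to the stream

        int emitted = executed * emitsPerExecute;   // 200
        int transferred = emitted * subscribers;    // 400: each emit goes to both

        System.out.println("emitted=" + emitted + " transferred=" + transferred);
        // prints: emitted=200 transferred=400
    }
}
```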


On Mon, Feb 23, 2015 at 11:35 AM, Rahul Reddy  wrote:

Hi,

Can you guys help me understand the difference between emitted, transferred, 
and acked tuples?

In my case every tuple emitted by ablog-filter-bolt will be processed by 
ablog-flatten-xml-bolt, which will then be written by ablog-hdfs-bolt to HDFS. 
Ideally all executed/acked metrics should match after tuples are emitted from 
ablog-filter-bolt. I'm not sure why there is so much discrepancy in the 
emitted/transferred/acked tuple counts between these bolts, although it 
doesn't show any failed tuples.

Any ideas what I can check and how to interpret metrics correctly?

Thanks
Rahul

Re: Storm-Kafka and KeeperErrorCode = NoNode

2014-10-15 Thread Jason Kania
I guess the next question is where it doesn't exist. If I run the 
kafka-topics.sh script, I get a list of topics that includes the one I am 
using.


Thanks,

Jason




 From: Deepak Subhramanian 
To: user@storm.apache.org; Jason Kania  
Cc: "u...@storm.incubator.apache.org"  
Sent: Wednesday, October 15, 2014 2:20 PM
Subject: Re: Storm-Kafka and KeeperErrorCode = NoNode
 


I think it happens when the Kafka topic doesn't exist. 


On Wed, Oct 15, 2014 at 7:08 PM, Jason Kania  wrote:

Hello,
>
>
>
>I suspect that there is an open JIRA report on this but have not found it, 
>although I have found a number of related JIRAs, none of which directly 
>matches. I am getting the following exception on startup with a topology that 
>uses the storm-kafka spout: storm-kafka-0.9.2-incubating.jar
>
>
>
>java.lang.RuntimeException: java.lang.RuntimeException: 
>org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
>for /brokers/topics/mytopic/partitions
>
>
>Is there any available interpretation of this error to allow it to be 
>resolved, or is it a bug that has yet to be fixed?
>
>
>Thanks,
>
>
>Jason
>
> 


-- 
Deepak Subhramanian 

Storm-Kafka and KeeperErrorCode = NoNode

2014-10-15 Thread Jason Kania
Hello,


I suspect that there is an open JIRA report on this but have not found it, 
although I have found a number of related JIRAs, none of which directly 
matches. I am getting the following exception on startup with a topology that 
uses the storm-kafka spout: storm-kafka-0.9.2-incubating.jar


java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/mytopic/partitions

Is there any available interpretation of this error to allow it to be 
resolved, or is it a bug that has yet to be fixed?

Thanks,

Jason