Re: Regarding storm & Kafka Configuration.
1. Parallelism - You can set a maximum of 3, one for each partition in your topic. Typically this is the fastest way to get messages out of Kafka and into your topology, but running your own tests/benchmarks is the best way to know for sure.

2. How many workers - This depends on what kind of work your topology is doing. Is it IO bound? Memory bound? CPU bound?

3. Max pending - Are you using timeouts/tracking tuples through your topology? Typically you want this high enough that your bolts are not starved for things to work on, but not so high that tuples queue up waiting to be processed and time out before they can be worked on. The biggest trick here is that your total tuples in flight equals (number of spout instances * your configured max spout pending). For example, if you set max pending to 1000 and have 3 spout instances, you can have ~3000 tuples in flight.

On Tue, Nov 21, 2017 at 12:55 PM, Mahabaleshwar <mahabaleshwa...@trinitymobility.com> wrote:
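For concreteness, the advice above could be wired together roughly as follows. This is only a sketch, assuming Storm 1.x with the storm-kafka-client spout; the broker list and the processing bolt are placeholders, and the actual worker count and max-pending value should come from your own benchmarks.

```java
import org.apache.storm.Config;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();

// Topic name from the original question; broker addresses are placeholders.
KafkaSpout<String, String> spout = new KafkaSpout<>(
        KafkaSpoutConfig.builder("broker1:9092,broker2:9092,broker3:9092",
                                 "iot_gateway").build());

// One spout executor per partition: 3 partitions -> parallelism hint 3.
builder.setSpout("kafka-spout", spout, 3);

// MyProcessingBolt is a stand-in for your own bolt.
builder.setBolt("process-bolt", new MyProcessingBolt(), 6)
       .shuffleGrouping("kafka-spout");

Config conf = new Config();
conf.setNumWorkers(3);          // tune based on IO/CPU/memory profile
conf.setMaxSpoutPending(1000);  // per spout task: 3 tasks -> ~3000 in flight
```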
Regarding storm & Kafka Configuration.
Hi,

I am using a 3-node Kafka cluster and I have created one topic called iot_gateway with 3 partitions and a replication factor of 3. My questions are about the Storm Kafka spout configuration:

1. What parallelism hint should I give?
2. How many workers should I give?
3. What max pending messages value should I configure?
4. How should I maintain the task and partition relationship?

I need your help, friends.

Thanks,
Mahabaleshwar
Storm topology (Kafka spout) stopped reading after 1 minute.
Hi,

I am reading tuples from one topic in a Kafka cluster using the Kafka spout. It reads for about 1 minute, after which it stops reading messages, although the topology keeps running successfully in the local cluster. My cluster info:

1. 3-node cluster
2. One topic - iot_gateway (3 partitions, 3 replicas)
3. Parallelism hint of 5 on the spout and 10 on the bolt
4. 4 workers

Also, can you tell me the relationship between partitions and the parallelism hint (i.e., the ratio between the two)? Please help me overcome this problem.

Thanks,
Mahabaleshwar
Re: Acking, failing, and anchor tuples
Yes, BasicBolt2 shouldn't emit anything. Since there's nothing listening to BasicBolt2's output, emitting tuples from it won't have any effect.

2017-11-20 17:54 GMT+01:00 Hannum, Daniel:
RE: A Batching Bolt
Thanks Mauro. I think my situation is different. I still need to emit the information from each tuple; it's just that I have to restructure it and perform some grouping. What is the best way to emit these mappings and collections in batch? I tried emitting the whole map, but the performance of that seemed low.

Marco.

On 20 Nov 2017 17:46, "Mauro Giusti" wrote:
Re: Acking, failing, and anchor tuples
Thanks so much for this explanation.

Am I right that BasicBolt2 should not emit anything because it’s at the end of the line? Right now I am emitting tuples from the last bolt and it appears to work, but I guess I shouldn’t.

From: Stig Rohde Døssing
Date: Monday, November 20, 2017 at 11:37 AM
To: "user@storm.apache.org"
Subject: Re: Acking, failing, and anchor tuples
RE: A Batching Bolt
Marco,

Our first bolt emits a summarized record of the info we received from the spouts. It is time based: every 30 seconds we emit one record that summarizes all the records we received from the spout. We don't re-emit the source records that we received from the spouts; they are persisted on cold-path storage, though, and we can access them offline for detailed analysis.

Is this similar to what you are trying to do?

Thx,
Mauro.

From: Marco Costantini [mailto:mcsil...@gmail.com]
Sent: Monday, November 20, 2017 1:01 AM
To: user@storm.apache.org
Subject: A Batching Bolt
Re: Acking, failing, and anchor tuples
I think you are a little confused about the difference between failing tuples and skipping bolts. Here's a quick rundown:

Let's say your spout has emitted a tuple t. BaseBasicBolt has just received t0, which is a tuple anchored to t.

If you decide to emit nothing and return from execute(), t0 will be acked. If t0 was the last pending tuple anchored to t, t will be acked on the spout (marked as "done", so it won't be replayed).

If you instead throw FailedException, t is marked as failed, and the spout will likely replay it.

If you emit any tuples, they will automatically be anchored to t. This means that the new tuples must also succeed before t gets acked.

So here are the answers to your questions:

The correct way to ack a tuple from a BaseBasicBolt is to not throw FailedException. Unless you throw FailedException, the tuple will be acked. The correct way to fail a tuple from a BaseBasicBolt is to throw FailedException.

Also, because it seems like this is what you're actually asking: if you have a topology like Spout -> BasicBolt1 -> BasicBolt2, you have a tuple t1 in BasicBolt1, and you want to skip BasicBolt2, then BasicBolt1 simply shouldn't emit any tuples while running execute() for t1.

I hope this helps.

2017-11-20 15:39 GMT+01:00 Hannum, Daniel:
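The three outcomes described above can be sketched in a single BaseBasicBolt. This is a non-authoritative sketch: isRelevant, isValid, and transform are hypothetical placeholders for real per-tuple logic, not Storm APIs.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.FailedException;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ExampleBasicBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        if (!isRelevant(input)) {
            return;                      // emit nothing: input is acked
        }
        if (!isValid(input)) {
            throw new FailedException(); // input (and its anchor) is failed
        }
        // Anything emitted here is automatically anchored to the input,
        // so the downstream tuple must also succeed before the anchor acks.
        collector.emit(new Values(transform(input)));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("result"));
    }

    // Hypothetical helpers standing in for real business logic.
    private boolean isRelevant(Tuple t) { return true; }
    private boolean isValid(Tuple t) { return true; }
    private Object transform(Tuple t) { return t.getValue(0); }
}
```

In the last bolt of the topology, the emit call would simply be dropped: returning without emitting is what acks the tuple.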
Acking, failing, and anchor tuples
Hi,

I’m trying to get clear on how to handle various cases in my BaseBasicBolt.

So far, I just have each bolt emit more tuples, pretty standard. But I still do that for the last bolt in the topology. I’m not sure I should do that. Seems dirty.

Now, I have a case where I want a bolt to fail a tuple (skip all bolts after). I read that I should just return without emitting any tuples and that functions as a fail. That seems odd to me: that I should emit tuples at the end of my topology for success even when they go nowhere, but not emit anything to show failure.

And then there’s always FailedException. Maybe I should forget all of this and just throw that if I want to fail the tuple.

So what is the correct way to:

1. Ack a tuple properly in the last bolt
2. Fail a tuple in the middle

Thanks!
A Batching Bolt
Hello,

I need to group/batch tuples. I've seen an excellent tutorial which does this. It handles timeouts and batch size breaches. Great. However, there, all of the logic takes place in the final bolt. That means it does not have the problem of "emitting batched information".

Sadly for me, I want to create a distinct bolt in the middle of a topology for batching. This means I have to worry about emitting batches of information.

I tried it out, both with the batching done in the final bolt and with the batching done in a separate bolt. When it's done in the final bolt, all is well. When it's done in a separate bolt, performance suffers greatly. By this I mean the indexing rate of ElasticSearch (probably not a good measure of performance, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best practices for batching in a distinct batching bolt?

Please and thank you,
Marco.
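One way to reason about a distinct batching bolt is to separate the batching policy from Storm entirely. Below is a small, framework-agnostic sketch of the size-or-timeout flush logic; the names (Batcher, maxAgeMillis) are mine, not from any Storm API. In a real bolt, the returned batch would be emitted as one tuple anchored to all of its input tuples, with a tick tuple typically driving the timeout path.

```java
import java.util.ArrayList;
import java.util.List;

// Accumulates items and flushes a completed batch when either the size
// threshold or the age threshold is crossed. The caller supplies the
// current time so the logic stays deterministic and testable.
public class Batcher<T> {
    private final int maxSize;
    private final long maxAgeMillis;
    private List<T> current = new ArrayList<>();
    private long batchStart = 0;

    public Batcher(int maxSize, long maxAgeMillis) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the completed batch when a threshold is crossed, else null.
    public List<T> add(T item, long nowMillis) {
        if (current.isEmpty()) {
            batchStart = nowMillis; // batch age starts at its first item
        }
        current.add(item);
        if (current.size() >= maxSize
                || nowMillis - batchStart >= maxAgeMillis) {
            List<T> done = current;
            current = new ArrayList<>();
            return done;
        }
        return null;
    }
}
```

Emitting one tuple per completed batch, rather than re-emitting the whole accumulated map on every input, keeps the emitted values small and serialization cheap, which is often where the performance difference between the two placements shows up.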