Re: Regarding storm & Kafka Configuration.

2017-11-20 Thread Stephen Powis
1. Parallelism - At most 3 spout instances can do useful work, one for each
partition in your topic; any extra instances sit idle.  Typically, matching
the partition count is the fastest way to get messages out of Kafka and into
your topology, but doing your own testing/benchmarks would be best to know
for sure.
2. How many workers - This probably depends on what kind of work your
topology is doing.  Is it IO bound? Memory bound? CPU bound?
3. Max pending - Are you using timeouts/tracking tuples through your
topology?  Typically you want this high enough that your bolts are not
starved for things to work on, but not so high that tuples are queued up
waiting to be processed and time out before they can be worked on.  The
biggest trick here is that your "total tuples in flight" is equal to (Number
Of Spout Instances * Your Configured Max Spout Pending).  For example, if you
set max pending to 1000 and have 3 spout instances, you can have ~3000
tuples in flight.
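
The arithmetic in point 3 can be sketched as below. This is only an
illustration: the class and method names are made up and are not part of
Storm. It assumes max pending (the topology.max.spout.pending setting) applies
per spout instance, and that spout instances beyond the partition count
receive no partition and therefore emit nothing.

```java
// Illustrative arithmetic only; this class is hypothetical and not part of Storm.
final class InFlightEstimate {

    // Assumption: instances beyond the partition count get no partition,
    // so only min(parallelismHint, partitions) of them emit tuples.
    static int activeSpouts(int parallelismHint, int partitions) {
        return Math.min(parallelismHint, partitions);
    }

    // Upper bound on un-acked tuples: max pending applies per spout instance.
    static long maxTuplesInFlight(int parallelismHint, int partitions, int maxSpoutPending) {
        return (long) activeSpouts(parallelismHint, partitions) * maxSpoutPending;
    }

    public static void main(String[] args) {
        // 3 partitions, 3 spout instances, max pending 1000 (the example above).
        System.out.println(maxTuplesInFlight(3, 3, 1000)); // 3000
        // A hint of 5 against 3 partitions: two instances stay idle.
        System.out.println(maxTuplesInFlight(5, 3, 1000)); // 3000
    }
}
```

If tuples time out under load, lowering max pending reduces this bound;
raising it only helps while the bolts can keep up.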

On Tue, Nov 21, 2017 at 12:55 PM, Mahabaleshwar <
mahabaleshwa...@trinitymobility.com> wrote:

> Hi,
>
> I am using a 3-node Kafka cluster and I have created one topic called
> iot_gateway with 3 partitions and a replication factor of 3. My question is
> about the Storm Kafka spout configuration:
>
> 1.   What parallelism hint should I give?
> 2.   How many workers should I give?
> 3.   How many max pending messages should I configure?
> 4.   How should I maintain the task-to-partition relation?
>
> I need your help, friends.
>
> Thanks,
> Mahabaleshwar
>


Regarding storm & Kafka Configuration.

2017-11-20 Thread Mahabaleshwar
Hi,

I am using a 3-node Kafka cluster and I have created one topic called
iot_gateway with 3 partitions and a replication factor of 3. My question is
about the Storm Kafka spout configuration:

1.   What parallelism hint should I give?
2.   How many workers should I give?
3.   How many max pending messages should I configure?
4.   How should I maintain the task-to-partition relation?

I need your help, friends.

Thanks,
Mahabaleshwar


Storm topology (Kafka spout) stopped reading after 1 minute.

2017-11-20 Thread Mahabaleshwar
Hi,

I am reading tuples from one topic in a Kafka cluster using the Kafka spout.
It reads for about 1 minute and then stops reading messages, although the
topology is still running successfully in local cluster.

My cluster info:
  1. 3-node cluster
  2. One topic, iot_gateway (3 partitions, 3 replicas)
  3. Parallelism hint of 5 on the spout and 10 on the bolt
  4. 4 workers

Also, can you tell me the relationship between partitions and the parallelism
hint (that is, the ratio between the two)?

Please help me overcome this problem.

Thanks,
Mahabaleshwar


Re: Acking, failing, and anchor tuples

2017-11-20 Thread Stig Rohde Døssing
Yes, BasicBolt2 shouldn't emit anything. Since there's nothing listening to
BasicBolt2's output, it won't have any effect if you emit tuples from it.

2017-11-20 17:54 GMT+01:00 Hannum, Daniel :

> Thanks so much for this explanation.
>
>
>
> Am I right that BasicBolt2 should not emit anything because it’s at the
> end of the line? Right now I am emitting tuples from the last bolt and it
> appears to work, but I guess I shouldn’t.
>
>
>
> *From: *Stig Rohde Døssing 
> *Reply-To: *"user@storm.apache.org" 
> *Date: *Monday, November 20, 2017 at 11:37 AM
> *To: *"user@storm.apache.org" 
> *Subject: *Re: Acking, failing, and anchor tuples
>
>
>
> I think you are a little confused about the difference between failing
> tuples and skipping bolts. Here's a quick rundown:
>
>
>
> Let's say your spout has emitted a tuple t. BaseBasicBolt has just
> received t0, which is a tuple anchored to t.
>
>
>
> If you decide to emit nothing and return from execute(), t0 will be acked.
> If t0 was the last pending tuple anchored to t, t will be acked on the
> spout (marked as "done", so it won't be replayed).
>
>
>
> If you instead throw FailedException t is marked as failed, and the spout
> will likely replay it.
>
>
>
> If you emit any tuples they will automatically be anchored to t. This
> means that the new tuples must also succeed before t gets acked.
>
>
>
> So here are the answers to your questions:
>
>
>
> The correct way to ack a tuple from a BaseBasicBolt is to not throw
> FailedException. Unless you throw FailedException, the tuple will be acked.
>
> The correct way to fail a tuple from a BaseBasicBolt is to throw
> FailedException.
>
>
>
> Also because it seems like this is what you're actually asking: If you
> have a topology like Spout -> BasicBolt1 -> BasicBolt2 and you have a tuple
> t1 in BasicBolt1 and you want to skip BasicBolt2, BasicBolt1 simply
> shouldn't emit any tuples while running execute() for t1.
>
>
>
> I hope this helps.
>
>
>
> 2017-11-20 15:39 GMT+01:00 Hannum, Daniel :
>
> Hi,
>
>
>
> I’m trying to get clear on how to handle various cases in my BaseBasicBolt.
>
>
>
> So far, I just have each bolt emit more tuples, pretty standard. But I
> still do that for the last bolt in the topology. I’m not sure I should do
> that. Seems dirty.
>
>
>
> Now, I have a case where I want a bolt to fail a tuple (skip all bolts
> after). I read that I should just return without emitting any tuples and
> that functions as a fail. That seems odd to me, that I should emit tuples
> at the end of my topology for success even when they go nowhere, but not
> emit anything to show failure.
>
>
>
> And then there’s always FailedException. Maybe I should forget all of this
> and just throw that if I want to fail the tuple.
>
>
>
> So what is the correct way to
>
> 1. Ack a tuple properly in the last bolt
> 2. Fail a tuple in the middle
>
>
>
> Thanks!
>
>
>


RE: A Batching Bolt

2017-11-20 Thread Marco Costantini
Thanks Mauro. I think my situation is different. I do need to emit the
information from each tuple; it's just that I have to restructure it and
perform some grouping. What is the best way to emit these mappings and
collections in batch? I tried emitting the whole map but the performance of
that seemed low.

Marco.

On 20 Nov 2017 17:46, "Mauro Giusti"  wrote:

> Marco –
>
> Our first bolt emits a summarized record of the info we received from the
> spouts –
>
> It is time based – every 30 seconds we emit one record that summarizes all
> the records we received from the spout –
>
> We don’t re-emit the source records that we received from the spouts, they
> are persisted on cold path storage though and we can access them offline
> for detailed analysis -
>
>
>
> Is this similar to what you are trying to do?
>
>
>
> Thx,
>
> Mauro.
>
>
>
> *From:* Marco Costantini [mailto:mcsil...@gmail.com]
> *Sent:* Monday, November 20, 2017 1:01 AM
> *To:* user@storm.apache.org
> *Subject:* A Batching Bolt
>
>
>
> Hello,
>
> I need to group/batch tuples. I've seen an excellent tutorial which does
> this. It handles timeouts and batch size breaches. Great. However, there,
> all of the logic takes place in the final bolt. That means it does not have
> the problem of "emitting batched information".
>
> Sadly for me, I want to create a distinct bolt in the middle of a topology
> for batching. This means I have to worry about emitting batches of
> information.
>
> I tried it out. Both with the batching done in the final bolt, and with
> the batching done in a separate bolt. When it's done in the final bolt, all
> is well. When it's done in a separate bolt, performance suffers greatly. By
> this I mean the indexing rate of ElasticSearch (probably not a good measure
> of performance, I know). The batching method is the same in both cases.
>
> Question: Is it bad to emit a Map or a List of objects? What are the best
> practices for batching in a distinct batching bolt?
>
>
>
> Please and thank you,
>
> Marco.
>


Re: Acking, failing, and anchor tuples

2017-11-20 Thread Hannum, Daniel
Thanks so much for this explanation.

Am I right that BasicBolt2 should not emit anything because it’s at the end of 
the line? Right now I am emitting tuples from the last bolt and it appears to 
work, but I guess I shouldn’t.

From: Stig Rohde Døssing 
Reply-To: "user@storm.apache.org" 
Date: Monday, November 20, 2017 at 11:37 AM
To: "user@storm.apache.org" 
Subject: Re: Acking, failing, and anchor tuples

I think you are a little confused about the difference between failing tuples 
and skipping bolts. Here's a quick rundown:

Let's say your spout has emitted a tuple t. BaseBasicBolt has just received t0, 
which is a tuple anchored to t.

If you decide to emit nothing and return from execute(), t0 will be acked. If 
t0 was the last pending tuple anchored to t, t will be acked on the spout 
(marked as "done", so it won't be replayed).

If you instead throw FailedException t is marked as failed, and the spout will 
likely replay it.

If you emit any tuples they will automatically be anchored to t. This means 
that the new tuples must also succeed before t gets acked.

So here are the answers to your questions:

The correct way to ack a tuple from a BaseBasicBolt is to not throw 
FailedException. Unless you throw FailedException, the tuple will be acked.
The correct way to fail a tuple from a BaseBasicBolt is to throw 
FailedException.

Also because it seems like this is what you're actually asking: If you have a 
topology like Spout -> BasicBolt1 -> BasicBolt2 and you have a tuple t1 in 
BasicBolt1 and you want to skip BasicBolt2, BasicBolt1 simply shouldn't emit 
any tuples while running execute() for t1.

I hope this helps.

2017-11-20 15:39 GMT+01:00 Hannum, Daniel <daniel_han...@premierinc.com>:
Hi,

I’m trying to get clear on how to handle various cases in my BaseBasicBolt.

So far, I just have each bolt emit more tuples, pretty standard. But I still do 
that for the last bolt in the topology. I’m not sure I should do that. Seems 
dirty.

Now, I have a case where I want a bolt to fail a tuple (skip all bolts after). 
I read that I should just return without emitting any tuples and that functions 
as a fail. That seems odd to me, that I should emit tuples at the end of my 
topology for success even when they go nowhere, but not emit anything to show 
failure.

And then there’s always FailedException. Maybe I should forget all of this and 
just throw that if I want to fail the tuple.

So what is the correct way to

  1.  Ack a tuple properly in the last bolt
  2.  Fail a tuple in the middle

Thanks!



RE: A Batching Bolt

2017-11-20 Thread Mauro Giusti
Marco –
Our first bolt emits a summarized record of the info we received from the 
spouts –
It is time based – every 30 seconds we emit one record that summarizes all the 
records we received from the spout –
We don’t re-emit the source records that we received from the spouts, they are 
persisted on cold path storage though and we can access them offline for 
detailed analysis -

Is this similar to what you are trying to do?

Thx,
Mauro.

From: Marco Costantini [mailto:mcsil...@gmail.com]
Sent: Monday, November 20, 2017 1:01 AM
To: user@storm.apache.org
Subject: A Batching Bolt

Hello,
I need to group/batch tuples. I've seen an excellent tutorial which does this. 
It handles timeouts and batch size breaches. Great. However, there, all of the 
logic takes place in the final bolt. That means it does not have the problem of 
"emitting batched information".

Sadly for me, I want to create a distinct bolt in the middle of a topology for 
batching. This means I have to worry about emitting batches of information.

I tried it out. Both with the batching done in the final bolt, and with the 
batching done in a separate bolt. When it's done in the final bolt, all is 
well. When it's done in a separate bolt, performance suffers greatly. By this I 
mean the indexing rate of ElasticSearch (probably not a good measure of 
performance, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best 
practices for batching in a distinct batching bolt?

Please and thank you,
Marco.


Re: Acking, failing, and anchor tuples

2017-11-20 Thread Stig Rohde Døssing
I think you are a little confused about the difference between failing
tuples and skipping bolts. Here's a quick rundown:

Let's say your spout has emitted a tuple t. BaseBasicBolt has just received
t0, which is a tuple anchored to t.

If you decide to emit nothing and return from execute(), t0 will be acked.
If t0 was the last pending tuple anchored to t, t will be acked on the
spout (marked as "done", so it won't be replayed).

If you instead throw FailedException t is marked as failed, and the spout
will likely replay it.

If you emit any tuples they will automatically be anchored to t. This means
that the new tuples must also succeed before t gets acked.

So here are the answers to your questions:

The correct way to ack a tuple from a BaseBasicBolt is to not throw
FailedException. Unless you throw FailedException, the tuple will be acked.

The correct way to fail a tuple from a BaseBasicBolt is to throw
FailedException.

Also because it seems like this is what you're actually asking: If you have
a topology like Spout -> BasicBolt1 -> BasicBolt2 and you have a tuple t1
in BasicBolt1 and you want to skip BasicBolt2, BasicBolt1 simply shouldn't
emit any tuples while running execute() for t1.

I hope this helps.
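
The rules above can be illustrated with a toy model. Note that this is plain
Java, not Storm code: BasicBoltModel, its nested FailedException, Outcome and
Result are all hypothetical stand-ins written for this sketch (in Storm the
real pieces are BaseBasicBolt, BasicOutputCollector and FailedException). It
only mimics the behaviour described in this message: returning normally acks
the input whether or not anything was emitted, and throwing FailedException
fails it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Toy model only; none of these types are Storm classes.
final class BasicBoltModel {

    // Stand-in for Storm's FailedException.
    static final class FailedException extends RuntimeException {
        FailedException(String message) { super(message); }
    }

    enum Outcome { ACKED, FAILED }

    // What happened to one input tuple, plus whatever the bolt emitted.
    static final class Result {
        final Outcome outcome;
        final List<String> emitted;
        Result(Outcome outcome, List<String> emitted) {
            this.outcome = outcome;
            this.emitted = emitted;
        }
    }

    // Mimics the wrapper around a basic bolt: a normal return acks the
    // input, a FailedException fails it. Emitting is optional either way.
    static Result run(BiConsumer<String, List<String>> execute, String input) {
        List<String> emitted = new ArrayList<>();
        try {
            execute.accept(input, emitted);
            return new Result(Outcome.ACKED, emitted);
        } catch (FailedException e) {
            return new Result(Outcome.FAILED, emitted);
        }
    }

    // Plays the role of BasicBolt1 in Spout -> BasicBolt1 -> BasicBolt2.
    static void basicBolt1(String input, List<String> out) {
        if (input.isEmpty()) {
            throw new FailedException("empty tuple"); // fail: the spout may replay it
        }
        if (input.startsWith("skip")) {
            return; // acked, and BasicBolt2 never sees anything
        }
        out.add("processed:" + input); // passed on to BasicBolt2
    }

    public static void main(String[] args) {
        System.out.println(run(BasicBoltModel::basicBolt1, "skip-me").outcome); // ACKED
        System.out.println(run(BasicBoltModel::basicBolt1, "hello").emitted);   // [processed:hello]
        System.out.println(run(BasicBoltModel::basicBolt1, "").outcome);        // FAILED
    }
}
```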

2017-11-20 15:39 GMT+01:00 Hannum, Daniel :

> Hi,
>
>
>
> I’m trying to get clear on how to handle various cases in my BaseBasicBolt.
>
>
>
> So far, I just have each bolt emit more tuples, pretty standard. But I
> still do that for the last bolt in the topology. I’m not sure I should do
> that. Seems dirty.
>
>
>
> Now, I have a case where I want a bolt to fail a tuple (skip all bolts
> after). I read that I should just return without emitting any tuples and
> that functions as a fail. That seems odd to me, that I should emit tuples
> at the end of my topology for success even when they go nowhere, but not
> emit anything to show failure.
>
>
>
> And then there’s always FailedException. Maybe I should forget all of this
> and just throw that if I want to fail the tuple.
>
>
>
> So what is the correct way to
>
> 1. Ack a tuple properly in the last bolt
> 2. Fail a tuple in the middle
>
>
>
> Thanks!
>


Acking, failing, and anchor tuples

2017-11-20 Thread Hannum, Daniel
Hi,

I’m trying to get clear on how to handle various cases in my BaseBasicBolt.

So far, I just have each bolt emit more tuples, pretty standard. But I still do 
that for the last bolt in the topology. I’m not sure I should do that. Seems 
dirty.

Now, I have a case where I want a bolt to fail a tuple (skip all bolts after). 
I read that I should just return without emitting any tuples and that functions 
as a fail. That seems odd to me, that I should emit tuples at the end of my 
topology for success even when they go nowhere, but not emit anything to show 
failure.

And then there’s always FailedException. Maybe I should forget all of this and 
just throw that if I want to fail the tuple.

So what is the correct way to

  1.  Ack a tuple properly in the last bolt
  2.  Fail a tuple in the middle

Thanks!


A Batching Bolt

2017-11-20 Thread Marco Costantini
Hello,
I need to group/batch tuples. I've seen an excellent tutorial which does
this. It handles timeouts and batch size breaches. Great. However, there,
all of the logic takes place in the final bolt. That means it does not have
the problem of "emitting batched information".

Sadly for me, I want to create a distinct bolt in the middle of a topology
for batching. This means I have to worry about emitting batches of
information.

I tried it out. Both with the batching done in the final bolt, and with the
batching done in a separate bolt. When it's done in the final bolt, all is
well. When it's done in a separate bolt, performance suffers greatly. By
this I mean the indexing rate of ElasticSearch (probably not a good measure
of performance, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best
practices for batching in a distinct batching bolt?

Please and thank you,
Marco.
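
For readers following along, the "timeouts and batch size breaches" logic
mentioned above can be sketched in plain Java as below. This is not the
tutorial's code and not a Storm API: Batcher and its methods are hypothetical
names, and the caller is assumed to supply the clock. In a topology, a bolt
would typically call add() for each incoming tuple and tick() on some periodic
trigger (Storm's tick tuples are one common option), then emit each returned
List as a single tuple. Whether emitting a List or Map performs well depends
on serialization and topology layout, which this sketch does not address.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Storm class: buffers items and hands back a
// batch when either the size limit or the age limit is reached.
final class Batcher<T> {
    private final int maxSize;
    private final long maxAgeMillis;
    private List<T> buffer = new ArrayList<>();
    private long firstItemAt = -1; // arrival time of the oldest buffered item

    Batcher(int maxSize, long maxAgeMillis) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Adds one item; returns a batch to emit if the size limit was hit, else null.
    List<T> add(T item, long nowMillis) {
        if (buffer.isEmpty()) {
            firstItemAt = nowMillis;
        }
        buffer.add(item);
        return buffer.size() >= maxSize ? drain() : null;
    }

    // Call periodically; returns a batch if the oldest buffered item is too old, else null.
    List<T> tick(long nowMillis) {
        boolean expired = !buffer.isEmpty() && nowMillis - firstItemAt >= maxAgeMillis;
        return expired ? drain() : null;
    }

    // Hands over the current buffer and starts a fresh one.
    private List<T> drain() {
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        firstItemAt = -1;
        return batch;
    }

    public static void main(String[] args) {
        Batcher<String> batcher = new Batcher<>(3, 1000);
        System.out.println(batcher.add("a", 0));  // null
        System.out.println(batcher.add("b", 10)); // null
        System.out.println(batcher.add("c", 20)); // [a, b, c]
        System.out.println(batcher.add("d", 30)); // null
        System.out.println(batcher.tick(500));    // null
        System.out.println(batcher.tick(1030));   // [d]
    }
}
```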