I want to enhance the performance of the running topology

2014-08-14 Thread M.Tarkeshwar Rao
Hi all,

I want to improve the performance of a running topology.

After applying all the standard configuration, my plan for improving performance is to rebalance and reschedule the topology based on the traffic flowing through the bolts. As I am new to Storm: am I going about this the right way?

I also want to collect metrics on the traffic through the running bolts. Can you please suggest how I should proceed?


Firing a rebalance by itself will not help us much. We have to decide on the criteria for rebalancing, and for that we should first measure the traffic across the DAG, or other metrics of the running topology.
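One common rebalance criterion is the per-bolt "capacity" number that the Storm UI reports: roughly, the fraction of the measurement window that a bolt's executors spent executing tuples. The sketch below shows the arithmetic only; the function name and parameters are illustrative, not the Storm metrics API.

```python
def capacity(executed, execute_latency_ms, window_secs):
    """Approximate the Storm UI 'capacity' statistic: the fraction of the
    window spent executing tuples. Values near 1.0 suggest the bolt is
    saturated and could use more executors (a rebalance candidate)."""
    busy_ms = executed * execute_latency_ms
    return busy_ms / (window_secs * 1000.0)

# Example: 120,000 tuples at 5 ms each over a 10-minute window -> saturated.
print(round(capacity(executed=120_000, execute_latency_ms=5.0,
                     window_secs=600), 2))  # 1.0
```

A simple policy is then "rebalance (add executors) when capacity stays above ~0.8 for several windows."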




A few useful links:

http://www.orgs.ttu.edu/debs2013/presentations/DEBS13-Paper88-Querzoni.pdf

http://www.dis.uniroma1.it/~midlab/articoli/ABQ13storm.pdf


regards

tarkeshwar


java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor

2014-08-14 Thread Kushan Maskey
I am getting this error message in the Storm UI. The topology works fine on a LocalCluster.


java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 3
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
	at backtype.storm.daemon.executor$fn__5641$fn__5653$fn__5700.invoke(executor.clj:746)
	at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
	at clojure.lang.AFn.run(AFn.java:24)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
	at mypackage.method(MyClass.java:135)
	at MyClass.method(MyClass.java:83)
	at MyBolt.execute(MyBolt.java:56)
	at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
	at backtype.storm.daemon.executor$fn__5641$tuple_action_fn__5643.invoke(executor.clj:631)
	at backtype.storm.daemon.executor$mk_task_receiver$fn__5564.invoke(executor.clj:399)
	at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
	... 6 more


I am wondering if it has to do with the Curator version, because the Storm distribution ships with Curator 2.4.0 and I think we have to use Curator 2.5.0.

I am using Storm 0.9.2 with kafka_2.10-0.8.1.1 and ZooKeeper 3.4.5.
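Note that the "Caused by" section of the trace points into the bolt's own code (MyClass.java:135), not into Curator, so the usual culprit is indexing past the end of a split string or a tuple's fields. A minimal sketch of that failure pattern and a defensive fix (names and delimiter are hypothetical):

```python
def parse_field(line, index, sep=","):
    """Return the index-th field of a delimited record, or None instead of
    raising IndexError when the record has fewer fields than expected.
    The unguarded version, parts[index], is what throws
    ArrayIndexOutOfBoundsException in a Java bolt and kills the executor."""
    parts = line.split(sep)
    if index >= len(parts):
        return None  # short/malformed record: log and skip rather than crash
    return parts[index]

print(parse_field("a,b,c,d", 3))  # 'd'
print(parse_field("a,b,c", 3))    # None -- this input would crash the bolt
```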

--
Kushan Maskey
817.403.7500


Kafka + Storm

2014-08-14 Thread Adaryl Bob Wakefield, MBA
Can someone tell me why people put Kafka in front of Storm? Can’t Storm ingest 
messages without having Kafka in the middle?

B.

Re: Kafka + Storm

2014-08-14 Thread Adaryl Bob Wakefield, MBA
I get your reasoning at a high level. I should have specified that I wasn’t 
sure what Kafka does. I don’t have a hard software engineering background. I 
know that Kafka is “a message queuing” system, but I don’t really know what 
that means.

(I can’t believe you wrote all that from your iPhone)
B.


From: Justin Workman 
Sent: Thursday, August 14, 2014 7:22 PM
To: user@storm.incubator.apache.org 
Subject: Re: Kafka + Storm

Personally, we looked at several options, including writing our own Storm source. There are limited Storm sources with community support out there. For us, it boiled down to the following:

1) Community support and what appeared to be a standard method. Storm now includes the Kafka source as a bundled component, which made the implementation much faster because the code was already done.
2) The durability (replication and clustering) of Kafka. We have a three-hour retention period on our queues, so if we need to do maintenance on Storm or deploy an updated topology, we don't need to stop or replay any sources.
3) The ability to have other tools attach to the Kafka queues and consume the same events for other purposes.
4) To complement point #1, it's easy to write to Kafka, so it took little effort to start sending our desired data to Kafka.

These are our main reasons ( I'm sure there were more ). Each use case is going 
to be different and Kafka might not be the best choice for everyone. For us it 
made sense. 

Justin 

Sent from my iPhone


Re: Kafka + Storm

2014-08-14 Thread Justin Workman
If you are familiar with Weblogic or ActiveMQ, it is similar. Let's see if
I can explain, I am definitely not a subject matter expert on this.

Within Kafka you can create queues, e.g. a webclicks queue. Your web servers can then send click events to this queue in Kafka; the web servers, or whatever agent writes the events to the queue, are referred to as the producer. Each event, or message, in Kafka is assigned an id.

On the other side there are consumers; in Storm's case this is the Storm Kafka spout, which subscribes to the webclicks queue to consume the messages in it. A consumer can consume a single message from the queue or, as Storm does, a batch of messages. The consumer keeps track of the latest offset (Kafka message id) it has consumed, so the next time it checks for more messages it asks for messages with an id greater than its last offset.

This helps with the reliability of the event stream and helps guarantee that your events/messages make it start to finish through your stream, assuming the events get to Kafka ;)
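The offset mechanics described above can be sketched as a toy model (this is not the Kafka client API, just the idea of "remember the last id you read"):

```python
class ToyConsumer:
    """Toy model of Kafka-style consumption: the consumer remembers the
    highest offset it has read and only asks for messages beyond it, so
    nothing is lost or re-read across polls."""

    def __init__(self):
        self.offset = -1  # nothing consumed yet

    def poll(self, log, max_batch=2):
        # Ask only for messages with id > last consumed offset.
        batch = log[self.offset + 1 : self.offset + 1 + max_batch]
        if batch:
            self.offset += len(batch)
        return batch

log = ["click1", "click2", "click3"]  # the queue; message id = list index
c = ToyConsumer()
print(c.poll(log))  # ['click1', 'click2']
print(c.poll(log))  # ['click3']
print(c.poll(log))  # [] -- caught up
```

Because the consumer, not the broker, tracks the offset, a restarted consumer (e.g. a redeployed topology) can resume from its saved offset and replay nothing.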

Hope this helps and makes some sort of sense. Again, sent from my iPhone ;)

Justin

Sent from my iPhone



RE: Kafka + Storm

2014-08-14 Thread anand nalya
Also, since Kafka acts as a buffer, storm is not directly affected by the speed 
of your data sources/producers.


Re: Kafka + Storm

2014-08-14 Thread Justin Workman
I suppose not directly. It depends on the lifetime of your Kafka queues and on your latency requirements. You need to make sure you have enough doctors, or in Storm language workers, in your Storm cluster to process your messages within your SLA.

In our case, we have a 3-hour lifetime, or TTL, configured for our queues, meaning records in the queue older than 3 hours are purged. We also have an internal SLA (a team goal, not published to the business ;)) of 10 seconds from event to end of stream and available for end-user consumption.

So we need to make sure we have enough Storm workers to meet 1) the normal SLA, and 2) be able to catch up on the queues when we have to take Storm down for maintenance and the queues build up.

There are many knobs you can tune for both storm and Kafka. We have spent
many hours tuning things to meet our SLAs.
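The "catch up after maintenance" requirement in point 2 is a simple rate calculation: the cluster can only drain a backlog if its processing rate exceeds the arrival rate, and the TTL bounds how long the drain may take. A sketch with illustrative numbers:

```python
def drain_time_secs(backlog, arrival_rate, processing_rate):
    """Seconds to clear a backlog while new events keep arriving.
    Returns None if the cluster can never catch up (no surplus capacity)."""
    surplus = processing_rate - arrival_rate
    if surplus <= 0:
        return None
    return backlog / surplus

# A 30-minute outage at 1,000 events/s leaves a 1.8M-event backlog; with
# capacity for 1,500 events/s the cluster drains it in one hour -- which
# had better fit inside the queue's retention window.
print(drain_time_secs(backlog=1_800_000,
                      arrival_rate=1000,
                      processing_rate=1500))  # 3600.0
```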

Justin

Sent from my iPhone



RE: Kafka + Storm

2014-08-14 Thread anand nalya
I agree: not for the long run, but for short bursts in the data production rate, say peak hours, Kafka can help in providing a somewhat consistent load on the Storm cluster.
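That smoothing effect is easy to see in a toy simulation: a bursty producer writes into a queue while the consumer drains at a fixed rate, so the consumer never sees the burst, only a temporarily deeper queue.

```python
def simulate(bursts, consume_rate):
    """Queue depth after each tick when a fixed-rate consumer reads behind a
    bursty producer. The consumer's load stays capped at consume_rate; the
    burst shows up only as queue depth, which later drains back to zero."""
    depth, depths = 0, []
    for produced in bursts:
        depth += produced                 # producer burst lands in the queue
        depth -= min(depth, consume_rate)  # consumer drains at a steady rate
        depths.append(depth)
    return depths

# Producer bursts to 50/tick during "peak hours"; consumer does a steady 20.
print(simulate([10, 50, 50, 10, 10, 0, 0], consume_rate=20))
# queue absorbs the burst, then drains: [0, 30, 60, 50, 40, 20, 0]
```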


Re: Kafka + Storm

2014-08-14 Thread Justin Workman
Absolutely!

Sent from my iPhone


Re: Kafka + Storm

2014-08-14 Thread Corey Nolet
Kafka is also distributed in nature, which is not something easily achieved by queuing brokers like ActiveMQ, or JMS (1.0) brokers in general. Kafka allows data to be partitioned across many machines, and the cluster can grow as necessary as your data grows.
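The partitioning Corey describes is what lets one topic scale past a single machine: each message key is deterministically mapped to one of N partitions, and the partitions can live on different brokers. The common mapping is a hash of the key, along these lines (a sketch of the idea, not Kafka's exact partitioner):

```python
import hashlib

def partition_for(key, num_partitions):
    """Deterministically map a message key to a partition. A stable hash
    keeps all messages for a given key in order on a single partition,
    while different keys spread across the cluster."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

keys = ["user-1", "user-2", "user-1"]
print([partition_for(k, 4) for k in keys])  # same key -> same partition
```

A consumer group (such as a set of Storm Kafka spout tasks) can then read the partitions in parallel, one or more partitions per task.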





Re: Need help to use storm with mysql.

2014-08-14 Thread amjad khan
I have a situation where I have seven bolts and one spout, and I want to
distribute the tuples according to the field ID.
For example, if ID=21 I want the tuple to be processed by the first bolt,
if ID=31 I want that tuple to be processed by the second bolt, and so on.

Is there a way to implement this? I was thinking about using fields
grouping, but with that I can only specify the field name, not the value of
the field, so if I use fields grouping I don't think there is any
guarantee that, say, tuples with ID=21 will be processed by the first bolt.
Kindly correct me if I'm wrong about fields grouping, and please suggest a
way to implement this kind of topology.
Thanks in advance.
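One way to get value-based routing like this (a sketch, not from the thread): Storm lets you implement the CustomStreamGrouping interface, whose chooseTasks method picks the downstream task for each tuple. The heart of such a grouping is just a fixed ID-to-task table; the class name, the table entries, and the modulo fallback below are all hypothetical illustrations:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the routing rule you would put inside a Storm
// CustomStreamGrouping: pin each known ID value to one downstream
// task index. The ID->task entries here are made up for illustration.
public class IdRouter {
    private final Map<Integer, Integer> idToTask = new HashMap<>();

    public IdRouter() {
        idToTask.put(21, 0); // ID 21 -> first bolt task
        idToTask.put(31, 1); // ID 31 -> second bolt task
        // ...one entry per ID you want pinned
    }

    // Returns the task index a tuple with this ID should be sent to;
    // unknown IDs fall back to a stable modulo over the task count.
    public int chooseTask(int id, int numTasks) {
        Integer task = idToTask.get(id);
        return task != null ? task : Math.floorMod(id, numTasks);
    }

    public static void main(String[] args) {
        IdRouter r = new IdRouter();
        System.out.println(r.chooseTask(21, 7)); // prints 0
        System.out.println(r.chooseTask(31, 7)); // prints 1
        System.out.println(r.chooseTask(99, 7)); // no entry: 99 % 7 = 1
    }
}
```

An alternative with the same effect is directGrouping on the stream plus emitDirect from the spout, which lets the emitter pick the target task explicitly.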


On Fri, Aug 1, 2014 at 10:20 PM, amjad khan amjadkhan987...@gmail.com
wrote:

 My bolt tries to write data to hdfs, but not all of the data is written;
 it throws this exception:

 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
 /storm.txt File does not exist. Holder DFSClient_attempt_storm.txt does not 
 have any open files.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1557)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1548

 Kindly help me if anyone has any idea about this.



 On Sat, Jul 26, 2014 at 12:47 PM, amjad khan amjadkhan987...@gmail.com
 wrote:

 Output when using bolt that tries to write its data to hdfs.

 INFO org.apache.hadoop.ipc.Client - Retrying Connect to Server: localhost/
 131.0.0.1:43785 Already tried 6 time(s).
 WARN Caught URI Exception
 java.net.ConnectException Call to localhost/131.0.0.1:43785 Failed on
 Connect Exception: java.net.ConnectException: Connection Refused

 In my code:
 Configuration config = new Configuration();
 config.set("fs.defaultFS", "hdfs://localhost:9000");
 FileSystem fs = FileSystem.get(config);


 /etc/hosts contains:
 181.45.83.79 localhost

 core-site.xml contains:

 <property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
 </property>

 Kindly tell me why it is trying to connect to 131.0.0.1, and why at port
 43785.


 The same code works fine in plain Java without Storm, and
 I'm using Hadoop 1.0.2.


 On Fri, Jul 18, 2014 at 11:33 AM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.com wrote:

 Hi Amjad,

 Is there any reason you can not upgrade to hadoop 2.0? Hadoop 2.0 has
 made many improvements over 1.X versions and they are source compatible so
 any of your MR jobs will be unaffected as long as you recompile with 2.x.

 The code we pointed at assumes that all the classes for Hadoop 2.x are
 present in your classpath. If you are not using Maven or some other build
 system and would like to add jars manually, you will probably have a tough
 time resolving conflicts, so I would advise against it.
 If you still want to add jars manually, my best guess would be to look
 under
 YOUR_HADOOP_INSTALLATION_DIR/libexec/share/hadoop/

 Thanks
 Parth
 On Jul 18, 2014, at 10:56 AM, amjad khan amjadkhan987...@gmail.com
 wrote:

 Thanks for your reply, Taylor. I'm using Hadoop 1.0.2. Can you suggest
 an alternative way to connect to Hadoop?



 On Fri, Jul 18, 2014 at 8:45 AM, P. Taylor Goetz ptgo...@gmail.com
 wrote:

 What version of Hadoop are you using? Storm-hdfs requires Hadoop 2.x.

 - Taylor

 On Jul 18, 2014, at 6:07 AM, amjad khan amjadkhan987...@gmail.com
 wrote:

 Thanks for your help parth

 When I try to run the topology that writes data to HDFS, it throws the
 exception Class Not Found:
 org.apache.hadoop.client.hdfs.HDFSDataOutputStream$SyncFlags
 Can anyone tell me which jars are needed to run the code that writes
 data to HDFS? Please list all the required jars.


 On Wed, Jul 16, 2014 at 10:46 AM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.com wrote:

 You can use

 https://github.com/ptgoetz/storm-hdfs

 It supports writing to HDFS with both Storm bolts and trident states.
 Thanks
 Parth
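For reference, wiring an HdfsBolt from that project typically looks like the configuration sketch below, adapted from the storm-hdfs README. It requires the storm-hdfs dependency on the classpath, and the filesystem URL, output path, delimiter, and rotation size are placeholder values — check the project README for the current API before relying on it:

```java
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

public class HdfsBoltConfig {
    public static HdfsBolt makeHdfsBolt() {
        // Write tuple fields separated by "|"
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
        // Sync to the filesystem after every 1000 tuples
        SyncPolicy syncPolicy = new CountSyncPolicy(1000);
        // Rotate output files once they reach 5 MB
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
        // Placeholder output directory
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm-out/");
        return new HdfsBolt()
                .withFsUrl("hdfs://localhost:9000")
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(syncPolicy);
    }
}
```

Because every task opens its own file under the configured path, this also sidesteps the lease conflicts you can hit when multiple tasks write one shared file by hand.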

 On Jul 16, 2014, at 10:41 AM, amjad khan amjadkhan987...@gmail.com
 wrote:

 Can anyone provide the code for bolt to write its data to hdfs. Kindly
 tell me the jar's required to run that bolt.


 On Mon, Jul 14, 2014 at 2:33 PM, Max Evers mcev...@gmail.com wrote:

 Can you expand on your use case? What is the query selecting on? Is
 the column you are querying on indexed?  Do you really need to look at 
 the
 entire 20 gb every 20ms?
  On Jul 14, 2014 6:39 AM, amjad khan amjadkhan987...@gmail.com
 wrote:

 I made a storm topology where the spout was fetching data from mysql
 using a select query. The select query was fired every 30 msec, but
 because the size of the table is more than 20 GB the select query takes
 more than 10 sec to execute, so this is not working. I need to know
 what the possible alternatives are for this situation. Kindly reply as
 soon as possible.

 Thanks,
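One common alternative (a sketch, not something proposed in the thread): keep a watermark on an indexed, monotonically increasing column and fetch only rows past it, instead of re-scanning the whole table on every poll. The "events" table and "id" column below are hypothetical names:

```java
// Sketch of incremental polling for a JDBC-backed spout: remember the
// highest id seen so far and only select rows past it. Assumes a
// hypothetical "events" table with an indexed auto-increment "id" column.
public class IncrementalPoll {
    private long lastSeenId = 0;

    // Builds the next batch query; with an index on id, this touches only
    // the new rows rather than the whole 20 GB table.
    String nextBatchQuery(int batchSize) {
        return "SELECT * FROM events WHERE id > " + lastSeenId
             + " ORDER BY id LIMIT " + batchSize;
    }

    // Call after emitting a batch so the next poll starts past it.
    void advanceTo(long maxIdInBatch) {
        lastSeenId = Math.max(lastSeenId, maxIdInBatch);
    }

    public static void main(String[] args) {
        IncrementalPoll p = new IncrementalPoll();
        System.out.println(p.nextBatchQuery(500)); // ... WHERE id > 0 ...
        p.advanceTo(1234);
        System.out.println(p.nextBatchQuery(500)); // ... WHERE id > 1234 ...
    }
}
```

Each poll then completes in milliseconds regardless of total table size, which fits the 30 msec cadence far better than a full-table select.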



