Re: Need help to use storm with mysql.
I have a situation with seven bolts and one spout, and I want to distribute tuples according to the field ID. For example, if ID=21 I want the tuple to be processed by the first bolt; if ID=31, by the second bolt; and so on. Is there a way to implement this? I was thinking about fields grouping, but that only lets me name the field, not the value of that field, so I don't think fields grouping guarantees that, say, a tuple with ID=21 would be processed by the first bolt. Kindly correct me if I'm wrong about fields grouping, and suggest how to implement this kind of topology. Thanks in advance.
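Fields grouping does guarantee that all tuples with the same ID always reach the same task, but the value-to-task mapping is hash-based, so you cannot choose which task handles ID=21. A custom grouping gives that control. Below is a minimal sketch against the Storm 0.9.x API; it assumes the seven bolts are seven tasks of a single bolt component and that ID is the first field the spout declares (class and field positions here are illustrative, not from the thread):

    import java.util.Arrays;
    import java.util.List;

    import backtype.storm.generated.GlobalStreamId;
    import backtype.storm.grouping.CustomStreamGrouping;
    import backtype.storm.task.WorkerTopologyContext;

    // Hypothetical grouping: pins each ID value to a specific consumer task.
    public class IdGrouping implements CustomStreamGrouping {
        private List<Integer> tasks;

        @Override
        public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                            List<Integer> targetTasks) {
            this.tasks = targetTasks; // ordered list of the consuming bolt's task ids
        }

        @Override
        public List<Integer> chooseTasks(int taskId, List<Object> values) {
            int id = ((Number) values.get(0)).intValue(); // assumes ID is the first output field
            int index;
            switch (id) {
                case 21: index = 0; break;                    // first bolt task
                case 31: index = 1; break;                    // second bolt task
                default: index = Math.abs(id) % tasks.size(); // fallback: spread the rest
            }
            return Arrays.asList(tasks.get(index));
        }
    }

It would be attached with something like builder.setBolt("worker", new WorkerBolt(), 7).customGrouping("spout", new IdGrouping()), where the component and class names are placeholders. With seven distinct bolt classes instead of seven tasks of one bolt, an alternative is to declare one output stream per ID in the spout and subscribe each bolt to its own stream.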
On Fri, Aug 1, 2014 at 10:20 PM, amjad khan wrote:

> My bolt tries to write data to HDFS, but the whole data is not written; it throws this exception:
>
>   org.apache.hadoop.ipc.RemoteException:
>   org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /storm.txt
>   File does not exist. Holder DFSClient_attempt_storm.txt does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1557)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1548)
>
> Kindly help me if anyone has any idea about this.

On Sat, Jul 26, 2014 at 12:47 PM, amjad khan wrote:

> Output when using a bolt that tries to write its data to HDFS:
>
>   INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/131.0.0.1:43785. Already tried 6 time(s).
>   WARN Caught URI Exception
>   java.net.ConnectException: Call to localhost/131.0.0.1:43785 failed on connect exception: java.net.ConnectException: Connection refused
>
> In my code:
>
>   Configuration config = new Configuration();
>   config.set("fs.defaultFS", "hdfs://localhost:9000");
>   FileSystem fs = FileSystem.get(config);
>
> /etc/hosts contains:
>
>   181.45.83.79 localhost
>
> core-site.xml contains:
>
>   fs.default.name
>   hdfs://localhost:9000
>
> Kindly tell me why it is trying to connect to 131.0.0.1, and why at port 43785. The same code works fine in plain Java, outside Storm, and I'm using Hadoop 1.0.2.

On Fri, Jul 18, 2014 at 11:33 AM, Parth Brahmbhatt <pbrahmbh...@hortonworks.com> wrote:

> Hi Amjad,
>
> Is there any reason you cannot upgrade to Hadoop 2.x? Hadoop 2.x has made many improvements over the 1.x versions, and they are source compatible, so your MR jobs will be unaffected as long as you recompile against 2.x.
>
> The code we pointed at assumes that all the Hadoop 2.x classes are present on your classpath. If you are not using Maven or some other build system and would like to add jars manually, you will probably have a tough time resolving conflicts, so I would advise against it. If you still want to add jars manually, my best guess would be to look under /libexec/share/hadoop/.
>
> Thanks,
> Parth

On Jul 18, 2014, at 10:56 AM, amjad khan wrote:

> Thanks for your reply, Taylor. I'm using Hadoop 1.0.2. Can you suggest any alternative way to connect to Hadoop?

On Fri, Jul 18, 2014 at 8:45 AM, P. Taylor Goetz wrote:

> What version of Hadoop are you using? storm-hdfs requires Hadoop 2.x.
>
> - Taylor

On Jul 18, 2014, at 6:07 AM, amjad khan wrote:

> Thanks for your help, Parth. When I try to run the topology that writes data to HDFS, it throws: Class Not Found: org.apache.hadoop.client.hdfs.HDFSDataOutputStream$SyncFlags. Can anyone tell me which jars are needed to execute the code that writes data to HDFS? Please list all the required jars.

On Wed, Jul 16, 2014 at 10:46 AM, Parth Brahmbhatt <pbrahmbh...@hortonworks.com> wrote:

> You can use https://github.com/ptgoetz/storm-hdfs
>
> It supports writing to HDFS with both Storm bolts and Trident states.
>
> Thanks,
> Parth

On Jul 16, 2014, at 10:41 AM, amjad khan wrote:

> Can anyone provide the code for a bolt that writes its data to HDFS? Kindly tell me the jars required to run that bolt.

On Mon, Jul 14, 2014 at 2:33 PM, Max Evers wrote:

> Can you expand on your use case? What is the query selecting on? Is the column you are querying on indexed? Do you really need to look at the entire 20 GB every 20 ms?

On Jul 14, 2014 6:39 AM, "amjad khan" wrote:

> I made a Storm topology where the spout was fetching data from MySQL using a select query. The select query was fired every 30 msec, but because the table is more than 20 GB the select query takes more than 10 sec to execute, so this is not working. I need to know the possible alternatives for this situation. Kindly reply as soon as possible.
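For reference, the storm-hdfs bolt Parth links above is configured roughly as in that project's README. The path, delimiter, and rotation settings below are illustrative; the fs URL matches the core-site.xml quoted earlier, and, per Taylor's caveat, the bolt requires Hadoop 2.x on the classpath:

    import org.apache.storm.hdfs.bolt.HdfsBolt;
    import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
    import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
    import org.apache.storm.hdfs.bolt.format.FileNameFormat;
    import org.apache.storm.hdfs.bolt.format.RecordFormat;
    import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
    import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
    import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

    // Write "|"-delimited tuple fields, sync HDFS every 1000 tuples,
    // and rotate output files once they reach 5 MB.
    RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
    SyncPolicy syncPolicy = new CountSyncPolicy(1000);
    FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
    FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm/");

    HdfsBolt bolt = new HdfsBolt()
            .withFsUrl("hdfs://localhost:9000")
            .withFileNameFormat(fileNameFormat)
            .withRecordFormat(format)
            .withRotationPolicy(rotationPolicy)
            .withSyncPolicy(syncPolicy);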
Re: Kafka + Storm
Kafka is also distributed in nature, which is not something easily achieved by queuing brokers like ActiveMQ, or JMS (1.0) brokers in general. Kafka allows data to be partitioned across many machines, and the partitions can grow as necessary as your data grows.
Re: Kafka + Storm
Absolutely!

Sent from my iPhone

On Aug 14, 2014, at 9:02 PM, anand nalya wrote:

> I agree, not for the long run, but for small bursts in the data production rate, say peak hours, Kafka can help in providing a somewhat consistent load on the Storm cluster.
RE: Kafka + Storm
I agree, not for the long run, but for small bursts in the data production rate, say peak hours, Kafka can help in providing a somewhat consistent load on the Storm cluster.
Re: Kafka + Storm
I suppose not directly. It depends on the lifetime of your Kafka queues and on your latency requirements. You need to make sure you have enough "doctors", or in Storm language workers, in your Storm cluster to process your messages within your SLA.

For our case, we have a 3 hour lifetime, or TTL, configured for our queues, meaning records in the queue older than 3 hours are purged. We also have an internal SLA (team goal, not published to the business ;)) of 10 seconds from event to end of stream and available for end user consumption.

So we need to make sure we have enough Storm workers to meet 1) the normal SLA, and 2) be able to "catch up" on the queues when we have to take Storm down for maintenance and such and the queues build.

There are many knobs you can tune for both Storm and Kafka. We have spent many hours tuning things to meet our SLAs.

Justin

Sent from my iPhone

On Aug 14, 2014, at 8:05 PM, anand nalya wrote:

> Also, since Kafka acts as a buffer, Storm is not directly affected by the speed of your data sources/producers.
RE: Kafka + Storm
Also, since Kafka acts as a buffer, Storm is not directly affected by the speed of your data sources/producers.
Re: Kafka + Storm
Good analogy!

Sent from my iPhone

On Aug 14, 2014, at 7:36 PM, "Adaryl \"Bob\" Wakefield, MBA" <adaryl.wakefi...@hotmail.com> wrote:

> Ah, so Storm is the hospital and Kafka is the waiting room where everybody queues up to be seen in turn, yes?
Re: Kafka + Storm
Ah, so Storm is the hospital and Kafka is the waiting room where everybody queues up to be seen in turn, yes?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
Re: Kafka + Storm
If you are familiar with WebLogic or ActiveMQ, it is similar. Let's see if I can explain; I am definitely not a subject matter expert on this.

Within Kafka you can create "queues", for example a webclicks queue. Your web servers can then send click events to this queue in Kafka. The web servers, or whatever agent writes the events to this queue, are referred to as the "producer". Each event, or message, in Kafka is assigned an id.

On the other side there are "consumers", which in Storm's case would be the Storm Kafka spout, that can subscribe to this webclicks queue to consume the messages in the queue. The consumer can consume a single message from the queue or a batch of messages, as Storm does. The consumer keeps track of the latest offset, the Kafka message id, that it has consumed. This way, the next time the consumer checks whether there are more messages to consume, it asks for messages with a message id greater than its last offset.

This helps with the reliability of the event stream and helps guarantee that your events/messages make it start to finish through your stream, assuming the events get to Kafka ;)

Hope this helps and makes some sort of sense. Again, sent from my iPhone ;)

Justin

Sent from my iPhone
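For concreteness, this is roughly how the bundled Kafka spout is wired up in storm-kafka 0.9.x. The spout records the last consumed offset in ZooKeeper under the given root path, which is the offset tracking described above; the ZooKeeper address, topic name, and consumer id here are placeholders:

    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.TopologyBuilder;
    import storm.kafka.BrokerHosts;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    // Broker metadata is discovered through ZooKeeper.
    BrokerHosts hosts = new ZkHosts("zkhost:2181");

    // Args: hosts, topic, ZK root for offset storage, unique consumer id.
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "webclicks", "/kafkastorm", "click-consumer");
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // deserialize messages as strings

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("clicks", new KafkaSpout(spoutConfig), 4); // 4 spout tasks; offsets kept in ZK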
Re: Kafka + Storm
I get your reasoning at a high level. I should have specified that I wasn’t sure what Kafka does. I don’t have a hard software engineering background. I know that Kafka is “a message queuing” system, but I don’t really know what that means.

(I can’t believe you wrote all that from your iPhone)

B.
Re: Kafka + Storm
Personally, we looked at several options, including writing our own Storm source. There are limited Storm sources with community support out there. For us, it boiled down to the following:

1) Community support, and what appeared to be a standard method. Storm has now included the Kafka source as a bundled component, which made the implementation much faster because the code was done.
2) The durability (replication and clustering) of Kafka. We have a three hour retention period on our queues, so if we need to do maintenance on Storm or deploy an updated topology, we don't need to stop or replay any sources.
3) The ability to have other tools attach to the Kafka queues and consume the same events for other purposes.
4) To complement point #1, it's easy to write to Kafka, so it was little effort to start sending our desired data to Kafka.

These are our main reasons (I'm sure there were more). Each use case is going to be different, and Kafka might not be the best choice for everyone. For us it made sense.

Justin

Sent from my iPhone

On Aug 14, 2014, at 6:08 PM, "Adaryl \"Bob\" Wakefield, MBA" <adaryl.wakefi...@hotmail.com> wrote:

> Can someone tell me why people put Kafka in front of Storm? Can’t Storm ingest messages without having Kafka in the middle?
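As a sketch of point 4, writing to Kafka with the 0.8-era Java producer API takes only a few lines; the broker list, topic, and payload below are placeholders:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    Properties props = new Properties();
    props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    props.put("request.required.acks", "1"); // wait for the leader to ack each message

    Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
    producer.send(new KeyedMessage<String, String>("webclicks", "{\"page\":\"/home\"}"));
    producer.close();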
Kafka + Storm
Can someone tell me why people put Kafka in front of Storm? Can’t Storm ingest messages without having Kafka in the middle?

B.
java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor
I am getting this error message in the Storm UI. The topology works fine on a LocalCluster.

java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 3
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
    at backtype.storm.daemon.executor$fn__5641$fn__5653$fn__5700.invoke(executor.clj:746)
    at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at .method(.java:135)
    at .method(.java:83)
    at .execute(.java:56)
    at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
    at backtype.storm.daemon.executor$fn__5641$tuple_action_fn__5643.invoke(executor.clj:631)
    at backtype.storm.daemon.executor$mk_task_receiver$fn__5564.invoke(executor.clj:399)
    at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
    ... 6 more

I am wondering if it has to do with the Curator version, because the Storm distribution comes with Curator 2.4.0 and I think we have to use Curator 2.5.0. I am using Storm 0.9.2 with kafka_2.10-0.8.1.1 and ZooKeeper 3.4.5.

--
Kushan Maskey
817.403.7500
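The "Caused by" frames sit above BasicBoltExecutor.execute, i.e. inside the bolt's own execute() path (the redacted application classes), which usually points at an indexing bug in the tuple-handling code rather than a Curator mismatch. A hypothetical bolt showing the typical failure shape and its guard (class, field, and delimiter choices here are illustrative, not from the report):

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt: splits a delimited line and reads the fourth column.
    public class SplitBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String[] parts = tuple.getString(0).split(",");
            // parts[3] throws ArrayIndexOutOfBoundsException: 3 on any record with
            // fewer than four columns -- guard before indexing.
            if (parts.length > 3) {
                collector.emit(new Values(parts[3]));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("col4"));
        }
    }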
UI not showing full trace of exception
I am having an issue where exceptions are shown in the Storm UI for a particular bolt. When I go into that bolt in the UI, most of the time I don't see any detailed trace; sometimes it shows, but not all the time. Where can I see the detailed trace? I don't see it in ui.log, supervisor.log, nimbus.log, or any of the worker logs. What do I need to do to see the entire stack trace? Thanks.

--
Kushan Maskey
817.403.7500