Application Logic: In Kafka, Storm or Redis?

2013-08-28 Thread Yavar Husain
I have an application where I will be getting some Time Series data which I
am feeding to Kafka and Kafka in turn is giving data to Storm for doing
some real time processing.

Now, one of my use cases is that there might be some lag in my data. For
example, I might not get all the data for 2:00:00 PM at once. It is possible
that the data for 2:00:00 PM does not all arrive at the same time, and the
application has to wait for all of it to arrive before performing certain
analytics.

For example, say that at 2:00:00 PM I get 990 points, and another 10 points
arrive at 2:00:40 PM (assume I know beforehand that there will be 1000 points
of data per millisecond). I have to wait for all the data to arrive before
performing the analytics.

Where should I place my application logic: (1) in Kafka, (2) in Storm, or
(3) should I use something like Redis to collect all the timestamped data
and pass it to Kafka/Storm only once I have all the points for a particular
time?
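Wherever it ends up running (inside a Storm bolt, or in a buffering step backed by Redis before the data reaches Kafka/Storm), the "wait until all the points for a timestamp have arrived" logic looks roughly the same. Below is a minimal in-memory sketch. It assumes integer timestamps (seconds since midnight here), an expected count per timestamp known up front as described above, and an optional timeout so an incomplete batch is not held forever. `TimestampBuffer`, `expected`, and `max_wait` are hypothetical names, not part of any Kafka, Storm, or Redis API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class TimestampBuffer:
    """Hypothetical helper: groups incoming points by timestamp and hands a
    timestamp's batch to `on_complete` once all expected points are present."""

    def __init__(
        self,
        expected: int,
        on_complete: Callable[[int, List[float]], None],
        max_wait: Optional[float] = None,
    ) -> None:
        self.expected = expected        # points expected per timestamp (assumed known)
        self.on_complete = on_complete  # where the analytics would run
        self.max_wait = max_wait        # seconds before a partial batch is flushed anyway
        self.points: Dict[int, List[float]] = defaultdict(list)
        self.first_seen: Dict[int, float] = {}

    def add(self, ts: int, value: float, now: float) -> None:
        """Record one point; flush the batch if it is now complete."""
        self.first_seen.setdefault(ts, now)
        batch = self.points[ts]
        batch.append(value)
        if len(batch) >= self.expected:
            self._flush(ts)

    def expire(self, now: float) -> None:
        """Flush incomplete batches that have waited longer than max_wait."""
        if self.max_wait is None:
            return
        stale = [ts for ts, start in self.first_seen.items() if now - start > self.max_wait]
        for ts in stale:
            self._flush(ts)

    def _flush(self, ts: int) -> None:
        batch = self.points.pop(ts)
        del self.first_seen[ts]
        self.on_complete(ts, batch)


if __name__ == "__main__":
    results = []
    buf = TimestampBuffer(expected=1000, on_complete=lambda ts, pts: results.append((ts, len(pts))))
    two_pm = 14 * 3600  # 2:00:00 PM as seconds since midnight
    for i in range(990):
        buf.add(two_pm, float(i), now=0.0)   # 990 points arrive on time
    print(results)                           # prints [] (still waiting)
    for i in range(10):
        buf.add(two_pm, float(i), now=40.0)  # the last 10 arrive 40 s later
    print(results)                           # prints [(50400, 1000)]
```

If this logic lived in Storm, one option (a suggestion, not something from this thread) is a bolt fed by a fields grouping on the timestamp, so that every point for a given timestamp reaches the same task. Note that points arriving after a timeout flush would start a new, separate batch in this sketch.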

I am confused :) Any help would be appreciated. Sorry for any grammatical
errors; I was just thinking aloud and jotting down my question.

Regards,
Yavar


Sending Data from more than one producer

2013-08-07 Thread Yavar Husain
How can I make multiple producers write data? I have written a producer
that produces some data for 15 seconds on a single-machine setup. When I
run another instance of the same producer, it says the port is in use (which
is natural, as I think the first producer is sending data using TCP). So this
is blocking me. How can I start multiple producers and send data from them at
the same time? Note that this is a vanilla setup with 1 broker on a single
machine. I don't need any synchronization, and the data from both producers
can arrive in any order.
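The "port is in use" message is worth a closer look. A TCP client does not normally hold a fixed local port: the operating system gives each outgoing connection its own ephemeral port, so many clients can connect to the same broker port at the same time. The stdlib-only sketch below demonstrates this with a throwaway local server standing in for the broker. It uses no Kafka API, and `connect_clients` is a hypothetical name. If a second producer instance still reports a port conflict, a plausible but unconfirmed cause is that something in the producer process listens on a fixed port. One example would be a JMX port set through the `JMX_PORT` environment variable when launching via the Kafka scripts. That is an assumption to check, not a diagnosis.

```python
import socket
from typing import List, Tuple


def connect_clients(n: int = 2) -> Tuple[int, List[int]]:
    """Open n TCP client connections to one listening port.

    Returns the server port and the local (ephemeral) port of each client.
    The local server is only a stand-in for a broker; no Kafka code is involved.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    server.listen(n)
    host, port = server.getsockname()

    # Each client connects to the SAME server port; the OS assigns each
    # client its own local port, so the clients do not conflict.
    clients = [socket.create_connection((host, port)) for _ in range(n)]
    accepted = [server.accept()[0] for _ in range(n)]
    local_ports = [c.getsockname()[1] for c in clients]

    for s in clients + accepted + [server]:
        s.close()
    return port, local_ports


if __name__ == "__main__":
    server_port, ports = connect_clients(2)
    print(f"server port {server_port}, client local ports {ports}")
```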


Re: Apache Kafka Question

2013-07-22 Thread Yavar Husain
Millions of messages per day (with each message being a few bytes) is not
really 'Big Data'. Kafka has been tested at a million messages per second.
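To put the two figures in the same unit: a day has 86,400 seconds, so even tens of millions of messages per day average out to a few hundred messages per second. A quick conversion follows. The daily volumes are illustrative, and real traffic is usually burstier than the daily average suggests.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(messages_per_day: float) -> float:
    """Average message rate implied by a daily volume."""
    return messages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for daily in (1_000_000, 10_000_000, 100_000_000):
        print(f"{daily:>11,} msgs/day ~ {per_second(daily):8.1f} msgs/s")
```

That works out to roughly 12, 116 and 1,157 messages per second on average.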

The answer to all your questions, IMO, is "It depends".

You can start with a single instance (a single-machine installation). Let
your producer send messages. Keep one broker, then increase to N brokers.
When you hit the upper limit, add a server and repeat the whole process.

Benchmarking and scalability are aspects you should explore on your own
by playing with Kafka. Every use case is different, so the performance
metrics of one are not a general answer.
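Once you have numbers from your own benchmark, one rule of thumb sometimes used for the partition-count question is this: divide the target throughput by what a single partition achieved on the producer side and on the consumer side, then take the larger result. This is a heuristic, not something Kafka prescribes. `partitions_needed` is a hypothetical helper, and the MB/s figures are placeholders for your own measurements.

```python
import math


def partitions_needed(target_mb_s: float,
                      producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: float) -> int:
    """Rule-of-thumb partition count from measured per-partition throughput."""
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("throughput figures must be positive")
    by_producer = target_mb_s / producer_mb_s_per_partition
    by_consumer = target_mb_s / consumer_mb_s_per_partition
    # The slower side (more partitions needed) decides; never go below one.
    return max(1, math.ceil(max(by_producer, by_consumer)))


if __name__ == "__main__":
    # e.g. target 50 MB/s, measured 10 MB/s per partition producing, 20 MB/s consuming
    print(partitions_needed(50, 10, 20))
```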

For your question on topic vs. queue, please read up on distributed
computing patterns such as pub/sub and message queues, which are generic
concepts and have nothing to do with Kafka specifically. Again, it depends on
your use case.

Please read about what topics are in Kafka. If you just go through the
definition of topics, you will answer your question yourself within a
minute.
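For the topic-or-queue question, the relevant Kafka concept is the consumer group. Within one group, each partition of a topic is consumed by only one member, so the members share the work like a queue. Each separate group receives all partitions, which gives publish/subscribe behaviour. The toy model below only illustrates that idea. The round-robin assignment is a simplification (Kafka's own assignment strategy may differ), and the group and consumer names are made up.

```python
from typing import Dict, List


def assign(partitions: List[int],
           groups: Dict[str, List[str]]) -> Dict[str, Dict[str, List[int]]]:
    """Toy model: every group sees every partition; within a group each
    partition goes to exactly one consumer (round-robin here for simplicity)."""
    result: Dict[str, Dict[str, List[int]]] = {}
    for group, members in groups.items():
        if not members:
            raise ValueError(f"group {group!r} has no consumers")
        owned: Dict[str, List[int]] = {m: [] for m in members}
        for i, p in enumerate(partitions):
            owned[members[i % len(members)]].append(p)
        result[group] = owned
    return result


if __name__ == "__main__":
    plan = assign([0, 1, 2, 3], {"analytics": ["c1", "c2"], "audit": ["c3"]})
    # "analytics" members split the partitions (queue-like);
    # "audit" gets all of them as well (pub/sub-like).
    print(plan)
```

In this model, a group with more consumers than partitions leaves some consumers idle, which is one reason the partition count caps a group's parallelism.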

Replication and the rest would be the next steps once you have a single
running instance of Kafka. So go ahead and get your hands dirty. You will
love Kafka :)

And yes, the most important thing: please read the documentation first (a
bit of theory) and then dive in. There is no silver bullet.

Cheers,
Yavar
http://lnkd.in/GRrrDJ

On Mon, Jul 22, 2013 at 4:27 PM,  wrote:

> Hi,
>
> I am planning to use Apache Kafka 0.8 to handle millions of messages per
> day. Now I need to set up the environment, for example:
>
> (i) How many topics should be created?
> (ii) How many partitions/replicas should be created?
> (iii) How many brokers should be created?
> (iv) How many consumer instances should be in a consumer group?
> (v) Topic or queue? If topic, do we need to create multiple group IDs
> as opposed to a single one?
>
> How can we go about it? Please clarify.
>
> Thanks & Regards,
> Anantha
>


Re: Kafka 0.7 Quickstart Errors

2013-07-08 Thread Yavar Husain
Perfect Jun! It works. Thanks a ton.

On Mon, Jul 8, 2013 at 9:00 AM, Jun Rao  wrote:

> The following is the weird part. 0:0 is not a valid host and port. Could
> you take a look at the EC2 FAQ in
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ? It's for the
> consumers, but may apply to the producers too.
>
> [2013-06-28 14:07:19,653] ERROR Connection attempt to 0:0 failed, next
> attempt in 1 ms (kafka.producer.SyncProducer)
> java.net.ConnectException: Connection refused
>
> Thanks,
>
> Jun
>

Re: Kafka 0.7 Quickstart Errors

2013-07-06 Thread Yavar Husain
Hi Jun

I am still not able to run Kafka 0.7 and am getting the same error as
described in my thread. Since Kafka Spout needs Kafka 0.7, it would be great
if you could help me out with this. I did not understand what you meant in
your last message by "wipe out both Zookeeper and Kafka 0.8 data". I just
changed the log data directories in both the Kafka and ZooKeeper configs, and
I still get the same error. Isn't that sufficient? What else do I need to do
to wipe out the data? Which directories do I need to visit?

Could the above be the reason I am getting the following error?

[2013-06-28 14:06:05,606] INFO Creating async producer for broker id = 0 at 0:0 (kafka.producer.ProducerPool)
5)  Time to send some messages, and oops, I get this error:
[2013-06-28 14:07:19,650] INFO Disconnecting from 0:0 (kafka.producer.SyncProducer)
[2013-06-28 14:07:19,653] ERROR Connection attempt to 0:0 failed, next attempt in 1 ms (kafka.producer.SyncProducer)
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:364)
at sun.nio.ch.Net.connect(Net.java:356)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
at kafka.producer.SyncProducer.connect(SyncProducer.scala:173)
at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:196)
at kafka.producer.SyncProducer.send(SyncProducer.scala:92)
at kafka.producer.SyncProducer.multiSend(SyncProducer.scala:135)
at kafka.producer.async.DefaultEventHandler.send(DefaultEventHandler.scala:58)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:44)
at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:116)
at scala.collection.immutable.Stream.foreach(Stream.scala:254)
at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:70)
at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:41)

Regards,
Yavar


Re: Kafka 0.7 Quickstart Errors

2013-07-04 Thread Yavar Husain
Hey Jun

Thanks for your prompt response. I don't really understand "wipe out both
Zookeeper and Kafka 0.8 data". I just changed the log data directories in
both the Kafka and ZooKeeper configs, and I am still getting the same error.
Isn't that sufficient? What else do I need to do to wipe out the data? Which
directories do I need to visit?
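"Wiping out" the 0.8 data means removing the on-disk state that 0.8 left behind. That is ZooKeeper's data directory (the `dataDir` setting, plus `dataLogDir` if set, in `zoo.cfg`, or in `config/zookeeper.properties` when using the ZooKeeper bundled with Kafka) and the broker's log directory (`log.dirs` in a 0.8 `server.properties`; 0.7 uses `log.dir`). Below is a rough, stdlib-only sketch that lists those directories from config text. It assumes the plain `key=value` properties format without escapes or line continuations. It deliberately only prints the paths rather than deleting anything; you would stop both ZooKeeper and the broker before removing them. `parse_properties` and `data_dirs` are hypothetical helpers, and the paths in the example are illustrative values.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser: skips blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def data_dirs(zk_config: str, broker_config: str) -> List[str]:
    """Directories holding ZooKeeper and Kafka broker state, per the given configs."""
    zk = parse_properties(zk_config)
    broker = parse_properties(broker_config)
    dirs = [zk[k] for k in ("dataDir", "dataLogDir") if k in zk]
    for key in ("log.dirs", "log.dir"):  # 0.8 uses log.dirs (comma-separated), 0.7 log.dir
        if key in broker:
            dirs.extend(d.strip() for d in broker[key].split(",") if d.strip())
    return list(dict.fromkeys(dirs))  # drop duplicates, keep order


if __name__ == "__main__":
    zk_cfg = "dataDir=/tmp/zookeeper\nclientPort=2181\n"
    broker_cfg = "# 0.8 broker\nbroker.id=0\nlog.dirs=/tmp/kafka-logs\n"
    print(data_dirs(zk_cfg, broker_cfg))  # prints ['/tmp/zookeeper', '/tmp/kafka-logs']
```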

Thanks,
Yavar

On Mon, Jul 1, 2013 at 9:13 PM, Jun Rao  wrote:

> You need to wipe out both the ZK data and the Kafka data from 0.8, in order
> to try 0.7.
>
> Thanks,
>
> Jun
>


Kafka 0.7 Quickstart Errors

2013-06-30 Thread Yavar Husain
Kafka 0.8 works great. I am able to use the CLI as well as write my own
producers/consumers!

Checking ZooKeeper, I see all the topics and partitions were created
successfully for 0.8.

Kafka 0.7 does not work!

Why Kafka 0.7? I am using Kafka Spout from Storm, which is made for Kafka
0.7.

First I just want to run the CLI-based producer/consumer for Kafka 0.7, which
I am unable to do. I carry out the following steps:

1)  I delete all the topics/partitions etc. in ZooKeeper that were
created by my Kafka 0.8 installation.
2)  I change the dataDir in zoo.cfg to point to a different location.
3)  Now I start the Kafka 0.7 server. It starts successfully. However,
I don't know why it registers the broker topics I deleted again.
4)  Now I start the Kafka producer:
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic topicime
and it starts successfully:
[2013-06-28 14:06:05,521] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2013-06-28 14:06:05,606] INFO Creating async producer for broker id = 0 at 0:0 (kafka.producer.ProducerPool)
5)  Time to send some messages, and oops, I get this error:
[2013-06-28 14:07:19,650] INFO Disconnecting from 0:0 (kafka.producer.SyncProducer)
[2013-06-28 14:07:19,653] ERROR Connection attempt to 0:0 failed, next attempt in 1 ms (kafka.producer.SyncProducer)
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:364)
at sun.nio.ch.Net.connect(Net.java:356)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
at kafka.producer.SyncProducer.connect(SyncProducer.scala:173)
at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:196)
at kafka.producer.SyncProducer.send(SyncProducer.scala:92)
at kafka.producer.SyncProducer.multiSend(SyncProducer.scala:135)
at kafka.producer.async.DefaultEventHandler.send(DefaultEventHandler.scala:58)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:44)
at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:116)
at scala.collection.immutable.Stream.foreach(Stream.scala:254)
at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:70)
at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:41)

Note that Zookeeper is already running.

Any help would really be appreciated.

*EDIT:*

I don't even see the topic being created in ZooKeeper. I am running the
following command:

bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic topicime

After the command, everything is fine and I get the following message:

[2013-06-28 14:30:17,614] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13f805c6673004b, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2013-06-28 14:30:17,615] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2013-06-28 14:30:17,700] INFO Creating async producer for broker id = 0 at 0:0 (kafka.producer.ProducerPool)

However, now when I type a string to send, I get the above error (Connection
refused!).
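The telling detail in these logs is `broker id = 0 at 0:0`. The producer looked up broker 0 and ended up with host `0` and port `0`, which it cannot connect to. Below is a small pre-flight check that flags such an address before any connection is attempted. `parse_endpoint` is a hypothetical helper, not part of Kafka, and treating `0` and `0.0.0.0` as unusable hosts is an assumption of the sketch.

```python
from typing import Optional, Tuple


def parse_endpoint(endpoint: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) if `endpoint` looks connectable, else None.

    Rejects a missing host, a non-numeric or out-of-range port (including 0),
    and wildcard hosts such as "0" or "0.0.0.0", which a client cannot
    meaningfully connect to.
    """
    host, sep, port_text = endpoint.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        return None
    port = int(port_text)
    if not 0 < port < 65536 or host in {"0", "0.0.0.0"}:
        return None
    return host, port


if __name__ == "__main__":
    for ep in ("0:0", "localhost:9092"):
        print(ep, "->", parse_endpoint(ep))
```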


Want to be a subscriber of Kafka User Mailing List. Please add me. Thanks!

2013-06-28 Thread Yavar Husain