Re: Batch size in producer

2012-12-20 Thread Neha Narkhede
You can call the close() API on the producer
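For example (a minimal sketch against the 0.7-era Java producer API; the
ZooKeeper address, serializer, and topic are illustrative):

    // close() flushes anything still sitting in the internal queue, so
    // calling it from a shutdown hook gives a clean shutdown.
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class CleanProducerShutdown {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            final Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));

            // Flush and disconnect before the JVM exits.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() { producer.close(); }
            });

            producer.send(new ProducerData<String, String>("test", "hello"));
        }
    }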

Thanks,
Neha


On Thu, Dec 20, 2012 at 3:06 PM, Subhash Agrawal wrote:

> Thanks Neha.
>
> How can I shut down the producer? Is it Ctrl-C?
>
> -Original Message-
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Thursday, December 20, 2012 2:54 PM
> To: users@kafka.apache.org
> Subject: Re: Batch size in producer
>
> > How big should the batch size be?
> >
>
> In production at LinkedIn, we use a batch size of 200, which has worked
> pretty well.
>
>
> > What happens if the producer client crashes before the batch is full and
> > messages are still sitting in the producer queue? Does it recover those
> > messages when we restart the producer?
> >
>
> If the producer shuts down cleanly, it flushes out the messages in the
> internal queue. If it crashes, it does not get a chance to do that, and
> there is no good way to recover those messages.
>
>
> > Are these messages stored in memory or on disk?
> >
>
> In memory.
>
> Thanks,
> Neha
>


Re: Http based producer

2012-12-20 Thread Pratyush Chandra
Hi David,

I was looking into the listed Node.js libraries. Prozess doesn't seem to use
ZooKeeper for its connection.

Instead, I found one (linked below) that uses a ZooKeeper-based
connection in Node.js:
https://npmjs.org/package/franz-kafka
https://github.com/dannycoates/franz-kafka

Are you aware of this library?

Thanks
Pratyush

On Thu, Dec 20, 2012 at 7:26 PM, David Arthur  wrote:

> There are several clients available listed on the project wiki. Node.js is
> among them:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+non-java+clients
>
> Since Kafka doesn't support websockets or HTTP directly, you would
> need a middle man to redirect events from the browser to a Kafka broker.
>
> -David
>
>
> On 12/20/12 4:16 AM, Pratyush Chandra wrote:
>
>> Hi,
>>
>> I am new to Kafka. I am exploring ways to pump events from an HTTP
>> browser (using JavaScript) or over TCP (say, using Node.js) to the broker.
>> Currently I see only a Scala-based producer in the source code.
>> What is the best way to do this? Is there any standard client library that
>> supports it?
>>
>> Thanks
>> Pratyush Chandra
>>
>>
>


-- 
Pratyush Chandra


RE: Batch size in producer

2012-12-20 Thread Subhash Agrawal
Thanks Neha.

How can I shut down the producer? Is it Ctrl-C?
 
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com] 
Sent: Thursday, December 20, 2012 2:54 PM
To: users@kafka.apache.org
Subject: Re: Batch size in producer

> How big should the batch size be?
>

In production at LinkedIn, we use a batch size of 200, which has worked
pretty well.


> What happens if the producer client crashes before the batch is full and
> messages are still sitting in the producer queue? Does it recover those
> messages when we restart the producer?
>

If the producer shuts down cleanly, it flushes out the messages in the
internal queue. If it crashes, it does not get a chance to do that, and
there is no good way to recover those messages.


> Are these messages stored in memory or on disk?
>

In memory.

Thanks,
Neha


Re: Batch size in producer

2012-12-20 Thread Neha Narkhede
> How big should the batch size be?
>

In production at LinkedIn, we use a batch size of 200, which has worked
pretty well.
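(For reference, the batching knobs live in the async producer config; a
sketch with 0.7-era property names - names and defaults may differ in other
versions:)

    # Async producer batching (0.7-era property names; values illustrative).
    producer.type=async
    # Maximum number of messages sent in one batch.
    batch.size=200
    # Maximum time (ms) a message may wait in the queue before being sent.
    queue.time=5000
    # Maximum number of messages buffered in memory.
    queue.size=10000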


> What happens if the producer client crashes before the batch is full and
> messages are still sitting in the producer queue? Does it recover those
> messages when we restart the producer?
>

If the producer shuts down cleanly, it flushes out the messages in the
internal queue. If it crashes, it does not get a chance to do that, and
there is no good way to recover those messages.


> Are these messages stored in memory or on disk?
>

In memory.

Thanks,
Neha


Batch size in producer

2012-12-20 Thread Subhash Agrawal
Hi,

I have some questions about the batch size for the producer.

How big should the batch size be?
What happens if the producer client crashes before the batch is full and
messages are still sitting in the producer queue? Does it recover those
messages when we restart the producer?
Are these messages stored in memory or on disk?

Thanks
Subhash A.


Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> Is that the correct interpretation?


Correct.


Re: Proper use of ConsumerConnector

2012-12-20 Thread Tom Brown
In order to support rollbacks and checkpoints, there would have to be
a way both to supply partition offsets to the consumer before reading
and to retrieve partition offsets from the consumer once reading
is complete.

From what I've read here, it appears that neither the
ConsumerConnector nor the ZookeeperConsumerConnector has either of
those capabilities. In order to finely manage offsets, only the
SimpleConsumer will work. Is that the correct interpretation?

--Tom

On Thu, Dec 20, 2012 at 11:13 AM, Neha Narkhede  wrote:
>> An alternative to using SimpleConsumer in this use case is to use the
>> zookeeper consumer connector and turn off auto commit.
>>
>
> Keep in mind that this works only if you don't care about controlling
> per-partition rewind capability.
> The high level consumer will not give you control over which partitions
> your consumer consumes and
> which partitions it commits the offsets for. If you need to rewind
> consumption for a subset of those partitions,
> then ZookeeperConsumerConnector will not work for you.
>
> Thanks,
> Neha


Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> An alternative to using SimpleConsumer in this use case is to use the
> zookeeper consumer connector and turn off auto commit.
>

Keep in mind that this works only if you don't care about controlling
per-partition rewind capability.
The high level consumer will not give you control over which partitions
your consumer consumes and
which partitions it commits the offsets for. If you need to rewind
consumption for a subset of those partitions,
then ZookeeperConsumerConnector will not work for you.

Thanks,
Neha


Re: Proper use of ConsumerConnector

2012-12-20 Thread Joel Koshy
“unless you have a good reason to load balance and manage offsets manually”
>
> In general, one consumer connector consumes more than one partition.
> On the client side, we want to record the partition offsets for every
> message, so that if a crash happens (some messages have been fetched from
> Kafka but the result is not yet flushed to disk), we can use the offset
> info to rewind the Kafka consumer.
>
> Do you think this is a good reason to use SimpleConsumer rather than
> ConsumerConnector?


An alternative to using SimpleConsumer in this use case is to use the
zookeeper consumer connector and turn off auto commit. After your consumer
process is done processing a batch of messages, you can call commitOffsets -
the main caveat to be aware of is that if your consumer processes batches
very fast, you would write to zookeeper that often - so in fact setting an
autocommit interval and being willing to deal with duplicates is almost
equivalent. KAFKA-657 would help, I think - since once that API is available
you can store your offsets anywhere you like.
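(A sketch of this approach against the 0.7-era high-level Java consumer API -
property and class names vary across versions, and the group, topic, and
batch size here are illustrative:)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaMessageStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.Message;

    public class ManualCommitConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("groupid", "my-group");
            props.put("autocommit.enable", "false");  // turn off auto commit

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            Map<String, Integer> topicCount = new HashMap<String, Integer>();
            topicCount.put("test", 1);
            KafkaMessageStream<Message> stream =
                    connector.createMessageStreams(topicCount).get("test").get(0);

            int processed = 0;
            for (Message message : stream) {
                handle(message);                 // hypothetical app logic
                if (++processed % 1000 == 0) {
                    connector.commitOffsets();   // checkpoint after a batch
                }
            }
        }

        static void handle(Message m) { /* process and persist the message */ }
    }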

Joel


>
> On 12-12-20, 3:16 AM, "Joel Koshy"  wrote:
>
> >In general, you should use the consumer connector - unless you have a good
> >reason to load balance and manage offsets manually (which is taken care of
> >in the consumer connector).
> >
> >
> >- Does the ConsumerConnector manage connections to multiple brokers,
> >> or just a single broker?
> >>
> >
> >Multiple brokers.
> >
> >
> >> - Does the ConsumerConnector require a thread for each partition on
> >> each broker? (If not, how many threads does it require?)
> >>
> >
> >You can specify how many streams you want - if there are more partitions
> >than threads, then a given thread can consume from multiple partitions. If
> >there are more threads than available partitions, there will be idle
> >threads.
> >
> >
> >> - Does the ConsumerConnector use actual asynchronous IO, or does it
> >> mimic it by using a dedicated behind-the-scenes thread (and the
> >> traditional java socket API)?
> >>
> >
> >The consumer connector uses SimpleConsumers for each broker that it
> >connects to. These consumers fetch from each broker and insert chunks into
> >blocking queues which the consumer iterators then dequeue.
> >
> >Joel
>
>
>


Re: Kafka Node.js Integration Questions/Advice

2012-12-20 Thread Jun Rao
Chris,

Not sure how stable those node.js clients are. In 0.8, we plan to provide a
native C version of the producer. A thin node.js layer can potentially be
built on top of that.

Thanks,

Jun

On Thu, Dec 20, 2012 at 8:46 AM, Christopher Alexander <
calexan...@gravycard.com> wrote:

> During my due diligence to assess use of Kafka for both our activity and
> log message streams, I would like to ask the project committers and
> community users about using Kafka with Node.js. Yes, I am aware that a
> Kafka client exists for Node.js (
> https://github.com/marcuswestin/node-kafka), which has spurred further
> interest by our front-end team. Here are my questions, excuse me if they
> seem "noobish".
>
> 1. How reliable is the Node.js client (
> https://github.com/marcuswestin/node-kafka) in production applications?
> If there are issues, what are they (the GitHub repo currently lists none)?
> 2. To support real-time activity streams within Node.js, what is the
> recommended consumer polling interval?
> 3. General advice/observations on integrating a front-end-based Node.js
> application with Kafka-mediated messaging.
>
> Thank you!
>
> Chris
>


Re: Kafka Node.js Integration Questions/Advice

2012-12-20 Thread Christopher Alexander
Thanks David. Yes, I am aware of the Prozess Node lib also. I forgot to include 
it in my posting. Good catch!

- Original Message -
From: "David Arthur" 
To: users@kafka.apache.org
Sent: Thursday, December 20, 2012 11:58:45 AM
Subject: Re: Kafka Node.js Integration Questions/Advice


On 12/20/12 11:46 AM, Christopher Alexander wrote:
> During my due diligence to assess use of Kafka for both our activity and log 
> message streams, I would like to ask the project committers and community 
> users about using Kafka with Node.js. Yes, I am aware that a Kafka client 
> exists for Node.js (https://github.com/marcuswestin/node-kafka), which has 
> spurred further interest by our front-end team. Here are my questions, excuse 
> me if they seem "noobish".
>
> 1. How reliable is the Node.js client 
> (https://github.com/marcuswestin/node-kafka) in production applications? If 
> there are issues, what are they (the GitHub repo currently lists none)?
Just FYI, there is another node.js library 
https://github.com/cainus/Prozess. I have no experience with either, so 
I cannot say how reliable they are.
> 2. To support real-time activity streams within Node.js, what is the 
> recommended consumer polling interval?
What kind of data velocity do you expect? You should only have to poll 
if your consumer catches up to the broker and there's no more data. 
Blocking/polling behavior of the consumer depends entirely on the client 
implementation.
> 3. General advice/observations on integrating a front-end-based Node.js
> application with Kafka-mediated messaging.
>
> Thank you!
>
> Chris



Re: Kafka Node.js Integration Questions/Advice

2012-12-20 Thread David Arthur


On 12/20/12 11:46 AM, Christopher Alexander wrote:

During my due diligence to assess use of Kafka for both our activity and log message 
streams, I would like to ask the project committers and community users about using Kafka 
with Node.js. Yes, I am aware that a Kafka client exists for Node.js 
(https://github.com/marcuswestin/node-kafka), which has spurred further interest by our 
front-end team. Here are my questions, excuse me if they seem "noobish".

1. How reliable is the Node.js client 
(https://github.com/marcuswestin/node-kafka) in production applications? If 
there are issues, what are they (the GitHub repo currently lists none)?
Just FYI, there is another node.js library 
https://github.com/cainus/Prozess. I have no experience with either, so 
I cannot say how reliable they are.

2. To support real-time activity streams within Node.js, what is the 
recommended consumer polling interval?
What kind of data velocity do you expect? You should only have to poll 
if your consumer catches up to the broker and there's no more data. 
Blocking/polling behavior of the consumer depends entirely on the client 
implementation.
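(To make the blocking behavior concrete for the Java/Scala client: the
high-level consumer's stream iterator blocks while you are caught up, so
there is no polling interval to tune; a consumer.timeout.ms setting can
bound the wait if poll-like behavior is wanted. A sketch against the
0.7-era Java API - topic, group, and timeout are illustrative:)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaMessageStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.Message;

    public class ActivityStreamConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("groupid", "activity-consumers");
            props.put("consumer.timeout.ms", "5000");  // optional: bound the block

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, Integer> topics = new HashMap<String, Integer>();
            topics.put("activity", 1);
            KafkaMessageStream<Message> stream =
                    connector.createMessageStreams(topics).get("activity").get(0);

            try {
                for (Message m : stream) {
                    // handle m; the iterator blocks while caught up
                }
            } catch (ConsumerTimeoutException e) {
                // no data for 5s - back off or retry as the application prefers
            }
        }
    }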

3. General advice/observations on integrating a front-end-based Node.js
application with Kafka-mediated messaging.

Thank you!

Chris




Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> Do you think this is a good reason to use SimpleConsumer rather than
> ConsumerConnector?
>

Yes, if you want to be able to rewind to some offset, SimpleConsumer is the
right API for this purpose.
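(A sketch of what that looks like with the 0.7-era javaapi SimpleConsumer -
the constructor arguments, topic, and partition are illustrative; the point
is that the caller owns the offset:)

    import kafka.api.FetchRequest;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class RewindableConsumer {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                    new SimpleConsumer("localhost", 9092, 30000, 1024 * 1024);

            // The caller supplies the offset on every fetch, so rewinding is
            // just re-fetching from an earlier value you persisted alongside
            // your results.
            long offset = 0L;
            ByteBufferMessageSet messages = consumer.fetch(
                    new FetchRequest("test", 0, offset, 1024 * 1024));

            for (MessageAndOffset mo : messages) {
                // process mo.message(), then durably record mo.offset()
                offset = mo.offset();
            }
            consumer.close();
        }
    }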


Re: Unable To Run QuickStart From CLI

2012-12-20 Thread Jun Rao
1. Yes.
2. The producer and consumer are libraries used by client applications. So,
the client app needs to call the appropriate shutdown method on the producer
and consumer.
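(For example - a minimal sketch with the 0.7-era Java API, names
illustrative - the consumer-side method is shutdown() on the connector, and
the producer-side method is close(), as discussed in the batch-size thread:)

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class ConsumerAppShutdown {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("groupid", "my-group");
            final ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // The "appropriate shutdown method": releases the consumer's
            // streams and its ZooKeeper session cleanly on JVM exit.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() { connector.shutdown(); }
            });
            // ... create streams and consume here ...
        }
    }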

Thanks,

Jun

On Thu, Dec 20, 2012 at 7:28 AM, Christopher Alexander <
calexan...@gravycard.com> wrote:

> Thanks Jun and Joel. I've got my Kafka development instance up and
> running. I do have a few questions, though:
>
> 1. In /bin I note that there is kafka-server-stop.sh and
> zookeeper-server-stop.sh. I assume the scripts should be executed in this
> order for a clean shutdown?
> 2. Clearly absent are shutdown scripts for the producer and consumer. Does
> this reflect the overall design decisions about Kafka's push/pull model
> wherein any type of producer or consumer shutdown will not impact the
> broker? If so, no need for an explicit shutdown script. Is that correct?
>
> - Original Message -
> From: "Joel Koshy" 
> To: users@kafka.apache.org
> Sent: Wednesday, December 19, 2012 2:11:22 PM
> Subject: Re: Unable To Run QuickStart From CLI
>
> You will need to use the ConsoleConsumer (see the bin directory) or create
> a Java/Scala consumer connector.
>
>
> On Wed, Dec 19, 2012 at 9:41 AM, Christopher Alexander <
> calexan...@gravycard.com> wrote:
>
> > Hi Jun,
> >
> > Although this may not be the ideal method, I did get it working after I
> > issued the following commands:
> >
> > ./sbt clean
> > ./sbt clean-cache
> > ./sbt update
> > ./sbt package
> >
> > And then reran the FAQ QuickStart. Maybe the newly created directories
> > finally recursively inherited my permissions.
> >
> > I see /tmp/kafka-logs/test0/.kafka but am unable to open
> > the file in a standard editor to view what has been logged - the file is
> > binary. What is the recommended application to view the topic log?
> >
> > - Original Message -
> > From: "Jun Rao" 
> > To: users@kafka.apache.org
> > Sent: Wednesday, December 19, 2012 11:53:56 AM
> > Subject: Re: Unable To Run QuickStart From CLI
> >
> > These exceptions are at the info level and are normal. Did you see data
> > in the kafka broker log?
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Dec 19, 2012 at 7:59 AM, Christopher Alexander <
> > calexan...@gravycard.com> wrote:
> >
> > > Hello All,
> > >
> > > I am in the early stages of exploring the use of Kafka for a
> > > large-scale application I am authoring. Generally, the documentation
> > > has been pretty good, and I am quite pleased with getting things set
> > > up in a couple of hours. However, I am attempting to run the
> > > QuickStart using the CLI to locally confirm that producers/consumers
> > > work through ZooKeeper and Kafka. I get the following exception when a
> > > producer connects to ZooKeeper. Subsequently, I am unable to
> > > send/receive messages. The exception is:
> > >
> > > [2012-12-19 10:38:15,993] INFO Got user-level KeeperException when
> > > processing sessionid:0x13bb3cfbcde type:create cxid:0x1
> > > zxid:0xfffe txntype:unknown reqpath:n/a Error
> > Path:/brokers/ids
> > > Error:KeeperErrorCode = NoNode for /brokers/ids
> > > (org.apache.zookeeper.server.PrepRequestProcessor)
> > > [2012-12-19 10:38:16,032] INFO Got user-level KeeperException when
> > > processing sessionid:0x13bb3cfbcde type:create cxid:0x2
> > > zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/brokers
> > > Error:KeeperErrorCode = NoNode for /brokers
> > > (org.apache.zookeeper.server.PrepRequestProcessor)
> > >
> > > I would appreciate it if someone could point me in the right direction.
> > >
> > >
> > > Kind regards,
> > >
> > > Chris Alexander
> > > Technical Architect and Engineer
> > > Gravy, Inc.
> > >
> > > W: http://www.gravycard.com
> > >
> > >
> >
>


Kafka Node.js Integration Questions/Advice

2012-12-20 Thread Christopher Alexander
During my due diligence to assess use of Kafka for both our activity and log 
message streams, I would like to ask the project committers and community users 
about using Kafka with Node.js. Yes, I am aware that a Kafka client exists for 
Node.js (https://github.com/marcuswestin/node-kafka), which has spurred further 
interest by our front-end team. Here are my questions, excuse me if they seem 
"noobish".

1. How reliable is the Node.js client 
(https://github.com/marcuswestin/node-kafka) in production applications? If 
there are issues, what are they (the GitHub repo currently lists none)?
2. To support real-time activity streams within Node.js, what is the 
recommended consumer polling interval?
3. General advice/observations on integrating a front-end-based Node.js
application with Kafka-mediated messaging.

Thank you!

Chris


Re: Unable To Run QuickStart From CLI

2012-12-20 Thread Christopher Alexander
Thanks Jun and Joel. I've got my Kafka development instance up and running. I
do have a few questions, though:

1. In /bin I note that there is kafka-server-stop.sh and 
zookeeper-server-stop.sh. I assume the scripts should be executed in this order 
for a clean shutdown?
2. Clearly absent are shutdown scripts for the producer and consumer. Does this 
reflect the overall design decisions about Kafka's push/pull model wherein any 
type of producer or consumer shutdown will not impact the broker? If so, no 
need for an explicit shutdown script. Is that correct?

- Original Message -
From: "Joel Koshy" 
To: users@kafka.apache.org
Sent: Wednesday, December 19, 2012 2:11:22 PM
Subject: Re: Unable To Run QuickStart From CLI

You will need to use the ConsoleConsumer (see the bin directory) or create
a Java/Scala consumer connector.


On Wed, Dec 19, 2012 at 9:41 AM, Christopher Alexander <
calexan...@gravycard.com> wrote:

> Hi Jun,
>
> Although this may not be the ideal method, I did get it working after I
> issued the following commands:
>
> ./sbt clean
> ./sbt clean-cache
> ./sbt update
> ./sbt package
>
> And then reran the FAQ QuickStart. Maybe the newly created directories
> finally recursively inherited my permissions.
>
> I see /tmp/kafka-logs/test0/.kafka but am unable to open
> the file in a standard editor to view what has been logged - the file is
> binary. What is the recommended application to view the topic log?
>
> - Original Message -
> From: "Jun Rao" 
> To: users@kafka.apache.org
> Sent: Wednesday, December 19, 2012 11:53:56 AM
> Subject: Re: Unable To Run QuickStart From CLI
>
> These exceptions are at the info level and are normal. Did you see data in
> the kafka broker log?
>
> Thanks,
>
> Jun
>
> On Wed, Dec 19, 2012 at 7:59 AM, Christopher Alexander <
> calexan...@gravycard.com> wrote:
>
> > Hello All,
> >
> > I am in the early stages of exploring the use of Kafka for a large-scale
> > application I am authoring. Generally, the documentation has been pretty
> > good, and I am quite pleased with getting things set up in a couple of
> > hours. However, I am attempting to run the QuickStart using the CLI to
> > locally confirm that producers/consumers work through ZooKeeper and
> > Kafka. I get the following exception when a producer connects to
> > ZooKeeper. Subsequently, I am unable to send/receive messages. The
> > exception is:
> >
> > [2012-12-19 10:38:15,993] INFO Got user-level KeeperException when
> > processing sessionid:0x13bb3cfbcde type:create cxid:0x1
> > zxid:0xfffe txntype:unknown reqpath:n/a Error
> Path:/brokers/ids
> > Error:KeeperErrorCode = NoNode for /brokers/ids
> > (org.apache.zookeeper.server.PrepRequestProcessor)
> > [2012-12-19 10:38:16,032] INFO Got user-level KeeperException when
> > processing sessionid:0x13bb3cfbcde type:create cxid:0x2
> > zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/brokers
> > Error:KeeperErrorCode = NoNode for /brokers
> > (org.apache.zookeeper.server.PrepRequestProcessor)
> >
> > I would appreciate it if someone could point me in the right direction.
> >
> >
> > Kind regards,
> >
> > Chris Alexander
> > Technical Architect and Engineer
> > Gravy, Inc.
> >
> > W: http://www.gravycard.com
> >
> >
>


Re: Http based producer

2012-12-20 Thread David Arthur
There are several clients available listed on the project wiki. Node.js
is among them:


https://cwiki.apache.org/confluence/display/KAFKA/Kafka+non-java+clients

Since Kafka doesn't support websockets or HTTP directly, you would
need a middle man to redirect events from the browser to a Kafka broker.
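For illustration, such a middle man could be a small JVM service that accepts
HTTP POSTs and republishes the body to a broker. A hypothetical sketch using
the JDK's built-in HttpServer and the 0.7-era Java producer (topic name,
port, and addresses are all illustrative):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.util.Properties;
    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    import com.sun.net.httpserver.HttpServer;
    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class HttpKafkaBridge {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            final Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));

            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/events", new HttpHandler() {
                public void handle(HttpExchange ex) throws IOException {
                    // Read the POSTed event body.
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    InputStream in = ex.getRequestBody();
                    byte[] chunk = new byte[4096];
                    for (int n; (n = in.read(chunk)) != -1; ) {
                        buf.write(chunk, 0, n);
                    }
                    // Republish the event to the broker.
                    producer.send(new ProducerData<String, String>(
                            "browser-events", buf.toString("UTF-8")));
                    ex.sendResponseHeaders(204, -1);  // no response body
                    ex.close();
                }
            });
            server.start();
        }
    }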


-David

On 12/20/12 4:16 AM, Pratyush Chandra wrote:

Hi,

I am new to Kafka. I am exploring ways to pump events from an HTTP
browser (using JavaScript) or over TCP (say, using Node.js) to the broker.
Currently I see only a Scala-based producer in the source code.
What is the best way to do this? Is there any standard client library that
supports it?

Thanks
Pratyush Chandra





Re: Proper use of ConsumerConnector

2012-12-20 Thread 永辉 赵
Hi Joel,

“unless you have a good reason to load balance and manage offsets manually”

In general, one consumer connector consumes more than one partition.
On the client side, we want to record the partition offsets for every
message, so that if a crash happens (some messages have been fetched from
Kafka but the result is not yet flushed to disk), we can use the offset
info to rewind the Kafka consumer.

Do you think this is a good reason to use SimpleConsumer rather than
ConsumerConnector?

I think this is a common requirement, so is there any existing solution?

Thanks,
Yonghui 





On 12-12-20, 3:16 AM, "Joel Koshy"  wrote:

>In general, you should use the consumer connector - unless you have a good
>reason to load balance and manage offsets manually (which is taken care of
>in the consumer connector).
>
>
>- Does the ConsumerConnector manage connections to multiple brokers,
>> or just a single broker?
>>
>
>Multiple brokers.
>
>
>> - Does the ConsumerConnector require a thread for each partition on
>> each broker? (If not, how many threads does it require?)
>>
>
>You can specify how many streams you want - if there are more partitions
>than threads, then a given thread can consume from multiple partitions. If
>there are more threads than available partitions, there will be idle
>threads.
>
>
>> - Does the ConsumerConnector use actual asynchronous IO, or does it
>> mimic it by using a dedicated behind-the-scenes thread (and the
>> traditional java socket API)?
>>
>
>The consumer connector uses SimpleConsumers for each broker that it
>connects to. These consumers fetch from each broker and insert chunks into
>blocking queues which the consumer iterators then dequeue.
>
>Joel
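(To make the streams-vs-threads point concrete, a sketch against the 0.7-era
high-level Java consumer API - the group, topic, and stream count are
illustrative:)

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaMessageStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.Message;

    public class StreamsAndThreads {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");
            props.put("groupid", "my-group");
            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for 4 streams; the connector spreads the topic's partitions
            // across them. Asking for more streams than there are partitions
            // would leave some streams idle.
            Map<String, Integer> topics = new HashMap<String, Integer>();
            topics.put("test", 4);
            List<KafkaMessageStream<Message>> streams =
                    connector.createMessageStreams(topics).get("test");

            // One thread per stream.
            for (final KafkaMessageStream<Message> stream : streams) {
                new Thread(new Runnable() {
                    public void run() {
                        for (Message m : stream) { /* handle m */ }
                    }
                }).start();
            }
        }
    }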