Broker ID disappears in Zookeeper

2015-02-19 Thread Sybrandy, Casey
Hello,

We're having the following issue with Kafka and/or Zookeeper:
If a broker (id=1) is running and you start another broker with id=1, the new 
broker will exit with the message "A broker is already registered on the path 
/brokers/ids/1".  However, I noticed that when I then query Zookeeper, 
/brokers/ids/1 disappears.
This behaviour doesn't make sense to us.  The concern is that if we 
accidentally start up multiple brokers with the same ID (e.g. via automatic 
restarts), we may end up with multiple brokers with the same ID running at the 
same time.
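
A minimal way to watch this from Java (a sketch, assuming ZooKeeper is 
reachable at localhost:2181 and the stock org.apache.zookeeper client is on 
the classpath):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class BrokerIdCheck {
        public static void main(String[] args) throws Exception {
            // One-off check; the no-op watcher just satisfies the constructor.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
            Stat stat = zk.exists("/brokers/ids/1", false);
            if (stat == null) {
                System.out.println("/brokers/ids/1 is gone");
            } else {
                // Broker registrations are ephemeral znodes: a non-zero
                // ephemeralOwner ties the node to one live ZK session, and the
                // node should only vanish when that session ends.
                System.out.println("ephemeralOwner = 0x"
                        + Long.toHexString(stat.getEphemeralOwner()));
            }
            zk.close();
        }
    }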

Thoughts?

Kafka: 0.8.2
Zookeeper: 3.4.5


Two Kafka Questions

2014-11-24 Thread Sybrandy, Casey
Hello,

First, is there a limit to how many Kafka brokers you can have?

Second, if a Kafka broker node fails and I start a new broker on a new node, is 
it correct to assume that the cluster will copy data to that node to satisfy 
the replication factor specified for a given topic?  In other words, let's 
assume that I have a 3-node cluster and a topic with a replication factor of 3.  
If one node fails and I start up a new node, will the new node have existing 
messages replicated to it?
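
(For what it's worth, my current understanding: if the replacement broker comes 
up reusing the failed broker's id, it re-fetches the missing data from the 
partition leaders on its own; if it comes up with a new id, the replicas have 
to be moved to it explicitly, e.g. with bin/kafka-reassign-partitions.sh and a 
JSON file along these lines, where the topic name and broker ids are made up:)

    {"version": 1,
     "partitions": [
       {"topic": "my-topic", "partition": 0, "replicas": [1, 2, 4]},
       {"topic": "my-topic", "partition": 1, "replicas": [2, 4, 1]}
     ]}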

Thanks.

Casey


Elastic Scaling

2014-11-20 Thread Sybrandy, Casey
Hello,

We're looking into using Kafka for an improved version of a system, and the 
question of how to scale Kafka came up.  Specifically, we want to make 
the system scale as transparently as possible.  The concern is that if we go 
from N to N*2 consumers, some of the old consumers would still be backed up 
while the new ones worked on only some of the new records.  Also, if the load 
drops, can we scale down effectively?

I'm sure there's a way to do it.  I'm just hoping that someone has some 
knowledge in this area.
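
One constraint worth keeping in mind: with the high-level consumer, each 
partition is owned by at most one thread in a consumer group, so going from N 
to N*2 consumers only adds parallelism if the topic has at least N*2 
partitions; any extras sit idle, and backed-up partitions stay with their 
current owners until a rebalance. A minimal sketch of the 0.8 high-level 
consumer group setup (group and topic names are made up):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    Properties props = new Properties();
    props.put("zookeeper.connect", "zk1:2181");
    props.put("group.id", "my-group");  // every instance sharing this id joins the group
    ConsumerConnector consumer =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

    // Ask for 4 streams; ZK-driven rebalancing splits the topic's partitions
    // across all streams of all live consumers in the group.
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put("my-topic", 4);
    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(topicCountMap);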

Thanks.


Kafka in a docker container stops with no errors logged

2014-11-12 Thread Sybrandy, Casey
Hello,

We're using Kafka 0.8.1.1 and we're trying to run it in a Docker container.  
For the most part, this has been fine; however, one of the containers has 
stopped a couple of times, and when I look at the log output from Docker (i.e. 
Kafka's stdout), I don't see any errors.  At one point it states that the broker 
has started, and several minutes later I see log messages stating that it's 
shutting down.

Has anyone seen anything like this before?  I don't know if Docker is the 
culprit as two other containers on different nodes don't seem to have any 
issues.

Thanks.

RE: Kafka/Zookeeper deployment Questions

2014-10-17 Thread Sybrandy, Casey
Neha,

Thanks.  I'd still love to know if anyone has used Consul and/or Confd to 
manage a cluster.

Casey


From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Thursday, October 16, 2014 9:54 AM
To: users@kafka.apache.org
Subject: Re: Kafka/Zookeeper deployment Questions

In other words, if I change the number of partitions, can I restart the
brokers one at a time so that I can continue processing data?

Changing the # of partitions is an online operation and doesn't require
restarting the brokers. However, any other configuration (with the
exception of a few operations) that requires a broker restart can be done
in a rolling manner.
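
For reference, on 0.8.x that's done with the topic tool, roughly as follows 
(topic name and count made up; note that partitions can only be increased, and 
adding partitions changes which partition a given key maps to):

    bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic my-topic --partitions 8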

On Wed, Oct 15, 2014 at 7:16 PM, Sybrandy, Casey 
casey.sybra...@six3systems.com wrote:

 Hello,

 We're looking into deploying Kafka and Zookeeper into an environment where
 we want things to be as easy as possible to stand up and administer.  To do
 this, we're looking into using Consul, or similar, and Confd to try to make
 this as automatic as possible.  I was wondering if anyone had any experience
 in this area.  My major concern with reconfiguring Kafka is, in my
 experience, making sure we don't end up losing messages.

 Also, can kafka and zookeeper be reconfigured in a rolling manner?  In
 other words, if I change the number of partitions, can I restart the
 brokers one at a time so that I can continue processing data?

 Thanks.


RE: Kafka/Zookeeper deployment Questions

2014-10-17 Thread Sybrandy, Casey
Roger,

My understanding of both, beyond what Zookeeper already does, are:

1. Consul can be used to monitor a service and report its status.  This can be 
very useful for knowing if a service, such as Zookeeper or Kafka, goes down.  
This can be done through a built-in web interface.
2. Confd leverages Consul, or etcd, to propagate changes to a service and 
restart it if necessary.  So, if we change a broker-specific setting, we can 
put the change in Consul and have Confd automatically modify the config files 
on the broker nodes and restart the service as needed.
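
As a sketch of what I mean (untested; the paths, keys, and restart command are 
made up), a Confd template resource would look roughly like:

    [template]
    src        = "server.properties.tmpl"
    dest       = "/opt/kafka/config/server.properties"
    keys       = ["/kafka/broker"]
    reload_cmd = "service kafka restart"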

My knowledge in this area is a bit limited as I haven't used either.  I'm 
working with someone who is and wanted to ask people about this so that we can 
learn what works and what doesn't.

___
From: Roger Hoover [roger.hoo...@gmail.com]
Sent: Friday, October 17, 2014 12:26 PM
To: users@kafka.apache.org
Subject: Re: Kafka/Zookeeper deployment Questions

Casey,

Could you describe a little more about how these would help manage a
cluster?

My understanding is that Consul provides service discovery and leader
election.  Kafka already uses ZooKeeper for brokers to discover each other
and elect partition leaders.  Kafka high-level consumers use ZK to divide
up topic partitions amongst themselves.

I'm not able to see how Consul and/or confd will help.

Cheers,

Roger

Kafka/Zookeeper deployment Questions

2014-10-15 Thread Sybrandy, Casey
Hello,

We're looking into deploying Kafka and Zookeeper into an environment where we 
want things to be as easy as possible to stand up and administer.  To do this, 
we're looking into using Consul, or similar, and Confd to try to make this as 
automatic as possible.  I was wondering if anyone had any experience in this 
area.  My major concern with reconfiguring Kafka is, in my experience, making 
sure we don't end up losing messages.

Also, can kafka and zookeeper be reconfigured in a rolling manner?  In other 
words, if I change the number of partitions, can I restart the brokers one at a 
time so that I can continue processing data?

Thanks.

RE:

2014-10-06 Thread Sybrandy, Casey
Never mind... I just found it in the docs, and it looks like it has been looked 
into.

Casey Sybrandy MSWE
Sr. Software Engineer
CACI/Six3Systems
301-206-6000 (Office)
301-206-6020 (Fax)
11820 West Market Place
Suites N-P
Fulton, MD. 20759

From: Sybrandy, Casey
Sent: Monday, October 06, 2014 1:14 PM
To: users@kafka.apache.org
Subject:

Hello,

I had a thought today that I wanted to run past everyone: has there been any 
thought to using a more common protocol for communicating with Kafka vs. the 
custom protocol currently being used?

Specifically, what I'm thinking is Thrift.  It's already supported by many 
popular languages, so that would reduce the need to maintain Kafka-specific 
clients.  Coordination with Zookeeper is still an issue, but IIRC, this was 
something being worked on already, so if all of the interaction with Zookeeper 
is done by the broker, then making producers/consumers becomes much easier.  
All one needs is Thrift to generate the appropriate classes and the user can 
interact with Kafka.

I know there's probably something I'm missing, but I felt I should bring this 
up as this would make things much easier for people who want to work with Kafka.
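
To make the idea concrete, a hypothetical Thrift IDL fragment; the names and 
fields here are invented purely for illustration, not an actual proposed wire 
format:

    struct TMessage {
      1: optional binary key,
      2: required binary value
    }

    service KafkaBroker {
      // The broker would own all Zookeeper interaction; clients just call this.
      void produce(1: string topic, 2: i32 partition, 3: list<TMessage> messages)
    }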

Casey


RE: Partial Message Read by Consumer

2013-12-11 Thread Sybrandy, Casey
Hello,

No, the entire log file isn't bigger than that buffer size, and this is 
occurring while trying to retrieve the first message on the topic, not the last.

I attached a log.  Line 408 ("Iterating.") is where we get an iterator 
and start iterating over the data.  There should be subsequent log entries 
displaying a filename, but they never appear after that point.

Some other thoughts:

* Network latency is a non-issue as everything is installed on a local VM.
* I tried with both 10 and 100 messages in case I didn't have enough to make it 
start producing.  No change.  Yes, I do realize this is silly, but when nothing 
else is working, why not give it a try?  It's like adding magical print 
statements.

Hope this helps.  I need it.

Casey


From: Tom Brown [tombrow...@gmail.com]
Sent: Tuesday, December 10, 2013 7:10 PM
To: users@kafka.apache.org
Subject: Re: Partial Message Read by Consumer

Having a partial message transfer over the network is the design of Kafka
0.7.x (I can't speak to 0.8.x, though it may still be).

When the request is made, you tell the server the partition number, the
byte offset into that partition, and the size of response that you want.
The server finds that offset in the partition, and sends N bytes back
(where N is the maximum response size specified). The server does not
inspect the contents of the reply to ensure that message boundaries line up
with the response size. This is by design, and the simplicity allows for
high throughput, at the cost of higher client complexity. In practice this
means that the response often includes a partial message at the end,
which the client drops. This also means that if the response contains a single
message that is larger than your maximum response size, you will not be able to
process that message or continue to the next message. Each time you request
it, the server will only send the partial message, and the Kafka client will
send the request again.
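
A rough sketch of that client-side boundary scan, assuming a wire layout of a
4-byte size prefix followed by the message bytes (the real client is more
involved):

    import java.nio.ByteBuffer;

    // Returns how many bytes of the fetch response cover complete messages;
    // anything past that is a trailing partial message the client drops.
    static int validBytes(ByteBuffer response) {
        int pos = response.position();
        while (response.limit() - pos >= 4) {
            int size = response.getInt(pos);               // size prefix of the next message
            if (response.limit() - pos < 4 + size) break;  // incomplete tail
            pos += 4 + size;
        }
        return pos - response.position();
    }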

If I understand the high-level consumer configuration, the fetch.size
parameter should be what you need to adjust. Its default is 300K, but I
see you have it set to roughly 50MB. Is there any chance your message is
larger than that?

--Tom


On Tue, Dec 10, 2013 at 1:52 PM, Guozhang Wang wangg...@gmail.com wrote:

 Hello Casey,

 What do you mean by "part of a message is being read"? Could you upload the
 output and also the log of the consumer here?

 Guozhang


 On Tue, Dec 10, 2013 at 12:26 PM, Sybrandy, Casey 
 casey.sybra...@six3systems.com wrote:

  Hello,
 
  First, I'm using version 0.7.2.
 
  I'm trying to read some messages from a broker, but looking at wireshark,
  it appears that only part of a message is being read by the consumer.
   After that, no other data is read and I can verify that there are 10
  messages on the broker.  I have the consumer configured as follows:
 
  kafka.zk.connectinfo=127.0.0.1
  kafka.zk.groupid=foo3
  kafka.topic=...
  fetch.size=52428800
  socket.buffersize=524288
 
  I only set socket.buffersize today to see if it helps.  Any help would be
  great because this is baffling, especially since this only started
  happening yesterday.
 
  Casey Sybrandy MSWE
  Six3Systems
  Cyber and Enterprise Systems Group
  www.six3systems.com
  301-206-6000 (Office)
  301-206-6020 (Fax)
  11820 West Market Place
  Suites N-P
  Fulton, MD. 20759
 



 --
 -- Guozhang



RE: Partial Message Read by Consumer

2013-12-11 Thread Sybrandy, Casey
First, I saw the partial message by looking at raw network traffic via Wireshark, 
not in the output of the iterator, as the iterator never seems to provide me any 
data.  That's where the code is hanging.

Second, here's the output from the ConsumerOffsetChecker:

grp1,tdf_topic,0-0 (Group,Topic,BrokerId-PartitionId)
Owner = null
  Consumer offset = 47947
  = 47,947 (0.00G)
 Log size = 1743252
  = 1,743,252 (0.00G)
 Consumer lag = 1695305
  = 1,695,305 (0.00G)

BROKER INFO
0 - 127.0.1.1:9092

To answer the questions related to this in the FAQ:

* Yes, there are more messages.
* No, the messages are all smaller than my configured fetch size.
* As far as I know, the consumer thread did not stop.  There are no errors or 
exceptions to indicate anything of the sort.

One thing I did notice is that it looks like it's reading from the topic before 
the consumer thread actually starts.  I'm using the pattern where I start a new 
thread per stream and submit them to an ExecutorService.  Not sure if this 
makes a difference, but this is our standard consumer pattern and has worked 
well until I started seeing this issue.  For this consumer, I'm only working 
with one stream.  I tried 2, but no change.
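
For reference, the pattern is roughly the following; this is a sketch against 
the 0.8-style Java consumer API for illustration, where streams is the list 
returned by createMessageStreams for the topic:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;

    // One thread per stream, all submitted to a shared pool.
    ExecutorService pool = Executors.newFixedThreadPool(streams.size());
    for (final KafkaStream<byte[], byte[]> stream : streams) {
        pool.submit(new Runnable() {
            public void run() {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                while (it.hasNext()) {               // blocks until data arrives
                    byte[] payload = it.next().message();
                    // process payload ...
                }
            }
        });
    }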

Casey

From: Guozhang Wang [wangg...@gmail.com]
Sent: Wednesday, December 11, 2013 11:31 AM
To: users@kafka.apache.org
Subject: Re: Partial Message Read by Consumer

Casey,

Just to confirm, you saw a partial message output from the iterator.next()
call, not from the consumer's fetch response, correct?

Guozhang


On Wed, Dec 11, 2013 at 8:14 AM, Jun Rao jun...@gmail.com wrote:

 Have you looked at

 https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F
 ?
 If that doesn't help, could you file a jira and attach your log? Apache
 mailing list doesn't support attachments.

 Thanks,

 Jun


 On Wed, Dec 11, 2013 at 6:15 AM, Sybrandy, Casey 
 casey.sybra...@six3systems.com wrote:

  Hello,
 
  No, the entire log file isn't bigger than that buffer size and this is
  occurring while trying to retrieve the first message on the topic, not
 the
  last.
 
  I attached a log.  Line 408 ("Iterating.") is where we get an
  iterator and start iterating over the data.  There should be subsequent
 log
  entries displaying a filename, but they never appear after that point.
 
  Some other thoughts:
 
  * Network latency is a non-issue as everything is installed on a local
 VM.
  * I tried with both 10 and 100 messages in case I didn't have enough to
  make it start producing.  No change.  Yes, I do realize this is silly,
 but
  when nothing else is working, why not give it a try?  It's like adding
  magical print statements.
 
  Hope this helps.  I need it.
 
  Casey
 
  
  From: Tom Brown [tombrow...@gmail.com]
  Sent: Tuesday, December 10, 2013 7:10 PM
  To: users@kafka.apache.org
  Subject: Re: Partial Message Read by Consumer
 
  Having a partial message transfer over the network is the design of Kafka
  0.7.x (I can't speak to 0.8.x, though it may still be).
 
  When the request is made, you tell the server the partition number, the
  byte offset into that partition, and the size of response that you want.
  The server finds that offset in the partition, and sends N bytes back
  (where N is the maximum response size specified). The server does not
  inspect the contents of the reply to ensure that message boundaries line
 up
  with the response size. This is by design, and the simplicity allows for
  high throughput, at the cost of higher client complexity. In practice this
  means that the response often includes a partial message at the end,
  which the client drops. This also means that if the response contains a single
  message that is larger than your maximum response size, you will not be able to
  process that message or continue to the next message. Each time you request
  it, the server will only send the partial message, and the Kafka client will
  send the request again.
 
  If I understand the high-level consumer configuration, the fetch.size
  parameter should be what you need to adjust. Its default is 300K, but I
  see you have it set to roughly 50MB. Is there any chance your message is
  larger than that?
 
  --Tom
 
 
  On Tue, Dec 10, 2013 at 1:52 PM, Guozhang Wang wangg...@gmail.com
 wrote:
 
   Hello Casey,
  
   What do you mean by "part of a message is being read"? Could you upload the
   output and also the log of the consumer here?
  
   Guozhang
  
  
   On Tue, Dec 10, 2013 at 12:26 PM, Sybrandy, Casey 
   casey.sybra...@six3systems.com wrote:
  
Hello,
   
First, I'm using version 0.7.2.
   
I'm trying to read some messages from a broker, but looking at
  wireshark,
it appears that only part of a message is being read by the consumer.
 After that, no other data is read and I can verify

RE: Partial Message Read by Consumer

2013-12-11 Thread Sybrandy, Casey
Actually, I think I isolated where the error may be.  We have a library that 
was recently updated to fix an issue.  Other code using the same part of the 
library is working properly, but for some reason in this case it isn't.  
Apologies for wasting people's time, but I just never even thought to look 
there since it is working in other places.

Casey

From: Guozhang Wang [wangg...@gmail.com]
Sent: Wednesday, December 11, 2013 12:09 PM
To: users@kafka.apache.org
Subject: Re: Partial Message Read by Consumer

Do you have compression turned on in the broker?

Guozhang


On Wed, Dec 11, 2013 at 8:43 AM, Sybrandy, Casey 
casey.sybra...@six3systems.com wrote:

 First, I saw the partial message by looking at raw network traffic via
 Wireshark, not in the output of the iterator, as the iterator never seems to
 provide me any data.  That's where the code is hanging.

 Second, here's the output from the ConsumerOffsetChecker:

 grp1,tdf_topic,0-0 (Group,Topic,BrokerId-PartitionId)
 Owner = null
   Consumer offset = 47947
   = 47,947 (0.00G)
  Log size = 1743252
   = 1,743,252 (0.00G)
  Consumer lag = 1695305
   = 1,695,305 (0.00G)

 BROKER INFO
 0 - 127.0.1.1:9092

 To answer the questions related to this in the FAQ:

 * Yes, there are more messages.
 * No, the messages are all smaller than my configured fetch size.
 * As far as I know, the consumer thread did not stop.  There are no errors
 or exceptions to indicate anything of the sort.

 One thing I did notice is that it looks like it's reading from the topic
 before the consumer thread actually starts.  I'm using the pattern where I
 start a new thread per stream and submit them to an ExecutorService.  Not
 sure if this makes a difference, but this is our standard consumer pattern
 and has worked well until I started seeing this issue.  For this consumer,
 I'm only working with one stream.  I tried 2, but no change.

 Casey
 
 From: Guozhang Wang [wangg...@gmail.com]
 Sent: Wednesday, December 11, 2013 11:31 AM
 To: users@kafka.apache.org
 Subject: Re: Partial Message Read by Consumer

 Casey,

 Just to confirm, you saw a partial message output from the iterator.next()
 call, not from the consumer's fetch response, correct?

 Guozhang


 On Wed, Dec 11, 2013 at 8:14 AM, Jun Rao jun...@gmail.com wrote:

  Have you looked at
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F
  ?
  If that doesn't help, could you file a jira and attach your log? Apache
  mailing list doesn't support attachments.
 
  Thanks,
 
  Jun
 
 
  On Wed, Dec 11, 2013 at 6:15 AM, Sybrandy, Casey 
  casey.sybra...@six3systems.com wrote:
 
   Hello,
  
   No, the entire log file isn't bigger than that buffer size and this is
   occurring while trying to retrieve the first message on the topic, not
  the
   last.
  
   I attached a log.  Line 408 ("Iterating.") is where we get an
   iterator and start iterating over the data.  There should be subsequent
  log
   entries displaying a filename, but they never appear after that point.
  
   Some other thoughts:
  
   * Network latency is a non-issue as everything is installed on a local
  VM.
   * I tried with both 10 and 100 messages in case I didn't have enough to
   make it start producing.  No change.  Yes, I do realize this is silly,
  but
   when nothing else is working, why not give it a try?  It's like adding
   magical print statements.
  
   Hope this helps.  I need it.
  
   Casey
  
   
   From: Tom Brown [tombrow...@gmail.com]
   Sent: Tuesday, December 10, 2013 7:10 PM
   To: users@kafka.apache.org
   Subject: Re: Partial Message Read by Consumer
  
   Having a partial message transfer over the network is the design of
 Kafka
   0.7.x (I can't speak to 0.8.x, though it may still be).
  
   When the request is made, you tell the server the partition number, the
   byte offset into that partition, and the size of response that you
 want.
   The server finds that offset in the partition, and sends N bytes back
   (where N is the maximum response size specified). The server does not
   inspect the contents of the reply to ensure that message boundaries
 line
  up
   with the response size. This is by design, and the simplicity allows
 for
   high throughput, at the cost of higher client complexity. In practice this
   means that the response often includes a partial message at the end,
   which the client drops. This also means that if the response contains a single
   message that is larger than your maximum response size, you will not be able to
   process that message or continue to the next message. Each time you request
   it, the server will only send the partial message, and the Kafka client will
   send the request again.
  
   If I understand the high-level consumer configuration

Logging of errors

2013-11-06 Thread Sybrandy, Casey
Hello,

How can I have the brokers log errors to a file?  Do I just have to configure 
something like log4j or is something else used?

Thanks.

Casey


RE: Logging of errors

2013-11-06 Thread Sybrandy, Casey
One file is good for now.  I just didn't find any documentation on this, so I 
figured I'd ask.  Guess I should have just looked at the config directory.
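
For anyone searching later, the relevant bits of kafka/config/log4j.properties 
look roughly like this (appender name and file path are illustrative):

    log4j.rootLogger=INFO, kafkaAppender
    log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.kafkaAppender.File=/var/log/kafka/server.log
    log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n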

-Original Message-
From: Guozhang Wang [mailto:wangg...@gmail.com] 
Sent: Wednesday, November 06, 2013 12:16 PM
To: users@kafka.apache.org
Subject: Re: Logging of errors

Hi Casey,

Did you want to route all the error log entries to one file and the others to 
another file?

Guozhang


On Wed, Nov 6, 2013 at 9:07 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

 Yes, configure the kafka/config/log4j.properties that ships with Kafka.

 Thanks,
 Neha


 On Wed, Nov 6, 2013 at 8:48 AM, Sybrandy, Casey  
 casey.sybra...@six3systems.com wrote:

  Hello,
 
  How can I have the brokers log errors to a file?  Do I just have to 
  configure something like log4j or is something else used?
 
  Thanks.
 
  Casey
 




--
-- Guozhang


RE: Consumer pauses when running many threads

2013-08-02 Thread Sybrandy, Casey
Yes, we have.  Our SA where this is occurring has been monitoring this.  When 
the consumers went down, we could see that things were lagging.  Yesterday, 
they lowered the number of threads for the consumers to six each, and they 
haven't shut down yet.  There appears to still be some lag, but since the 
consumers are running, it's decreasing.

A test was run with each broker configured to have 32 partitions, and when 
the number of threads across the consumers exceeds 32, we have issues.  My 
understanding from the documentation is that when you set the number of 
partitions on a broker, it's just for that broker, correct?  Therefore, if we 
set each broker to have 32 partitions, across 4 brokers we should have 128 
partitions per topic, correct?  In which case, we should be able to run 128 
consumer threads with ease.

Casey

-Original Message-
From: Jun Rao [mailto:jun...@gmail.com] 
Sent: Thursday, August 01, 2013 11:13 AM
To: users@kafka.apache.org
Subject: Re: Consumer pauses when running many threads

Have you looked at
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F?

Thanks,

Jun


On Thu, Aug 1, 2013 at 7:30 AM, Sybrandy, Casey  
casey.sybra...@six3systems.com wrote:

 Hello,

 We're seeing an issue running 0.7.0 where one or more of our consumers 
 are pausing after about an hour when we have a lot of threads 
 configured.  Our setup is as follows:


 * 4 brokers configured for 32 threads and 32 partitions on each
 broker.

 * 2 consumers each processing 40 streams (24 and 16).

 * Zookeeper server is a CDH version that's at least 3.3.4.

 We were also seeing this with 3 consumers running 18 threads each.  As 
 you can tell, the hardware is quite beefy and the brokers are
 described as being "bored".

 Outside of upgrading to 0.7.2, which we are planning on doing but 
 can't yet, what else can we look into to try to resolve this or at 
 least determine what's happening?

 Thanks.

 Casey



Consumer pauses when running many threads

2013-08-01 Thread Sybrandy, Casey
Hello,

We're seeing an issue running 0.7.0 where one or more of our consumers are 
pausing after about an hour when we have a lot of threads configured.  Our 
setup is as follows:


* 4 brokers configured for 32 threads and 32 partitions on each broker.

* 2 consumers each processing 40 streams (24 and 16).

* Zookeeper server is a CDH version that's at least 3.3.4.

We were also seeing this with 3 consumers running 18 threads each.  As you can 
tell, the hardware is quite beefy and the brokers are described as being 
"bored".

Outside of upgrading to 0.7.2, which we are planning on doing but can't yet, 
what else can we look into to try to resolve this or at least determine what's 
happening?

Thanks.

Casey


RE: Client improvement discussion

2013-07-29 Thread Sybrandy, Casey
In the past there was some discussion about having a C client for non-JVM 
languages.  Is this still planned as well?  Being able to work with Kafka from 
other languages would be a great thing.  Where I work, we interact with Kafka 
via Java and Ruby (producer), so having an official C library that can be used 
from within Ruby would make it easier to have the same version of the client in 
Java and Ruby.

-Original Message-
From: Jay Kreps [mailto:jay.kr...@gmail.com] 
Sent: Friday, July 26, 2013 3:00 PM
To: d...@kafka.apache.org; users@kafka.apache.org
Subject: Client improvement discussion

I sent around a wiki a few weeks back proposing a set of client improvements 
that essentially amount to a rewrite of the producer and consumer java clients.

https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite

The below discussion assumes you have read this wiki.

I started to do a little prototyping for the producer and wanted to share some 
of the ideas that came up to get early feedback.

First, a few simple but perhaps controversial things to discuss.

Rollout
Phase 1: We add the new clients. No change on the server. Old clients still 
exist. The new clients will be entirely in a new package so there will be no 
possibility of name collision.
Phase 2: We swap out all shared code on the server to use the new client stuff. 
At this point the old clients still exist but are essentially deprecated.
Phase 3: We remove the old client code.

Java
I think we should do the clients in java. Making our users deal with scala's 
non-compatibility issues and crazy stack traces causes people a lot of pain. 
Furthermore we end up having to wrap everything now to get a usable java api 
anyway for non-scala people. This does mean maintaining a substantial chunk of 
java code, which is maybe less fun than scala. But basically i think we should 
optimize for the end user and produce a standalone pure-java jar with no 
dependencies.

Jars
We definitely want to separate out the client jar. There is also a fair amount 
of code shared between both (exceptions, protocol definition, utils, and the 
message set implementation). Two approaches.
Two jar approach: split kafka.jar into kafka-clients.jar and kafka-server.jar 
with the server depending on the clients. The advantage of this is that it is 
simple. The disadvantage is that things like utils and protocol definition will 
be in the client jar though technically they belong equally to the server.
Many jar approach: split kafka.jar into kafka-common.jar, kafka-producer.jar, 
kafka-consumer.jar, kafka-admin.jar, and kafka-server.jar. The disadvantage of 
this is that the user needs two jars (common + something) which is for sure 
going to confuse people. I also think this will tend to spawn more jars over 
time.

Background threads
I am thinking of moving both serialization and compression out of the 
background send thread. I will explain a little about this idea below.

Serialization
I am not sure if we should handle serialization in the client at all.
Basically I wonder if our own API wouldn't just be a lot simpler if we took a 
byte[] key and byte[] value and let people serialize stuff themselves.
Injecting a class name for us to create the serializer is more roundabout and 
has a lot of problems if the serializer itself requires a lot of configuration 
or other objects to be instantiated.

Partitioning
The real question with serialization is whether the partitioning should happen 
on the java object or on the byte array key. The argument for doing it on the 
java object is that it is easier to do something like a range partition on the 
object. The problem with doing it on the object is that the consumer may not be 
in java and so may not be able to reproduce the partitioning. For example we 
currently use Object.hashCode which is a little sketchy. We would be better off 
doing a standardized hash function on the key bytes. If we want to give the 
partitioner access to the original java object then obviously we need to handle 
serialization behind our api.
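
A sketch of the bytes-based variant; the actual hash function is the open 
question, and Arrays.hashCode below is just a stand-in for whatever 
standardized hash gets picked:

    import java.util.Arrays;

    // Partitioning on the serialized key: any client, in any language, that
    // implements the same byte-level hash will pick the same partition.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        int hash = Arrays.hashCode(keyBytes);        // stand-in hash over the raw bytes
        return (hash & 0x7fffffff) % numPartitions;  // clear the sign bit, then mod
    }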

Names
I think good names are important. I would like to rename the following classes 
in the new client:
  Message => Record: Now that the message has both a message and a key it is more 
of a KeyedMessage. Another name for a KeyedMessage is a Record.
  MessageSet => Records: This isn't too important, but nit pickers complain that 
it is not technically a Set but rather a List or Sequence, but MessageList 
sounds funny to me.

The actual clients will not interact with these classes. They will interact 
with a ProducerRecord and ConsumerRecord. The reason for having different 
fields is because the different clients need different fields.

Proposed producer API:
SendResponse r = producer.send(new ProducerRecord(topic, key, value))

Protocol Definition

Here is what I am thinking about protocol definition. I see a couple of 
problems with what we are doing currently. First the protocol definition is 
spread throughout a bunch of custom 

RE: Duplicate Messages on the Consumer

2013-07-19 Thread Sybrandy, Casey
Hello,

No, we couldn't check the broker logs because the data is obfuscated, so we 
can't just look at the files and tell.  It looks like our dev system may be 
experiencing the same issue, so I did turn off the obfuscation and we'll monitor 
it.  However, our production system, where we were seeing the errors more often, 
appears to have had Zookeeper misconfigured, so we're thinking that may 
be the issue.

Casey

-Original Message-
From: Philip O'Toole [mailto:phi...@loggly.com] 
Sent: Thursday, July 18, 2013 3:29 PM
To: users@kafka.apache.org
Cc: kafka-us...@incubator.apache.org
Subject: Re: Duplicate Messages on the Consumer

Have you actually examined the Kafka files on disk, to make sure those dupes 
are really there? Or is this a case of reading the same message more than once?

Philip

On Thu, Jul 18, 2013 at 8:55 AM, Sybrandy, Casey 
casey.sybra...@six3systems.com wrote:
 Hello,

 We recently started seeing duplicate messages appearing at our consumers.  
 Thankfully, the database is set up so that we don't store the dupes, but it 
 is annoying.  It's not every message, only about 1% of them.  We are running 
 0.7.0 for the broker with Zookeeper 3.3.4 from Cloudera and 0.7.0 for the 
 producer and consumer.  We tried upgrading the consumer to 0.7.2 to see if 
 that worked, but we're still seeing the dupes.  Do we have to upgrade the 
 broker as well to resolve this?  Is there something we can check to see 
 what's going on because we're not seeing anything unusual in the logs?  I
 suspected that there may be significant rebalancing, but that does not appear 
 to be the case at all.

 Casey Sybrandy



Duplicate Messages on the Consumer

2013-07-18 Thread Sybrandy, Casey
Hello,

We recently started seeing duplicate messages appearing at our consumers.  
Thankfully, the database is set up so that we don't store the dupes, but it is 
annoying.  It's not every message, only about 1% of them.  We are running 0.7.0 
for the broker with Zookeeper 3.3.4 from Cloudera and 0.7.0 for the producer 
and consumer.  We tried upgrading the consumer to 0.7.2 to see if that worked, 
but we're still seeing the dupes.  Do we have to upgrade the broker as well to 
resolve this?  Is there something we can check to see what's going on because 
we're not seeing anything unusual in the logs?  I suspected that there may be 
significant rebalancing, but that does not appear to be the case at all.

Casey Sybrandy



RE: NoBrokersForPartitionException

2013-07-15 Thread Sybrandy, Casey
Jun,

Unfortunately, upgrades are slow to happen on our projects, so I don't 
know when this will occur.  However, it looks like we will be upgrading to 
0.7.2 over the next couple of months.

Regardless, what is causing this?  Is this a bug in Kafka, or is it something 
that was triggered by something we did?  I only ask because it looked like the 
directories for the specific topic I was looking for disappeared, so it seemed 
like someone deleted them.

Casey

-Original Message-
From: Jun Rao [mailto:jun...@gmail.com] 
Sent: Thursday, July 11, 2013 1:17 AM
To: users@kafka.apache.org
Subject: Re: NoBrokersForPartitionException

Could you try 0.7.2?

Thanks,

Jun


On Wed, Jul 10, 2013 at 11:38 AM, Sybrandy, Casey  
casey.sybra...@six3systems.com wrote:

 Hello,

 Apologies for bringing this back from the dead, but I'm getting the 
 same exception using Kafka 0.7.0.  What could be causing this?

 Thanks.

 Casey

 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Tuesday, March 12, 2013 12:14 AM
 To: users@kafka.apache.org
 Subject: Re: NoBrokersForPartitionException

 Which version of Kafka are you using?

 Thanks,

 Jun

 On Mon, Mar 11, 2013 at 12:30 PM, Ott, Charles H. 
 charles.h@saic.com
 wrote:

 
 
  I am trying to do something like this:
 
  1.) Java client producer (Server A) -> Zookeeper (Server B) to get the Kafka
  service.
 
  2.) Zookeeper gives the IP for Kafka (Server C) to the producer (Server A).
 
  3.) The producer (Server A) attempts to publish a message to Kafka (Server
  C) using the IP resolved from Zookeeper.
 
 
 
  I am getting an error when attempting to write a message to a Kafka 
  topic.
 
 
 
  kafka.common.NoBrokersForPartitionException: Partition = null
      at kafka.producer.Producer.kafka$producer$Producer$$getPartitionListForTopic(Producer.scala:167)
      at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:116)
      at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:105)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
      at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:32)
      at kafka.producer.Producer.zkSend(Producer.scala:105)
      at kafka.producer.Producer.send(Producer.scala:99)
      at kafka.javaapi.producer.Producer.send(Producer.scala:103)
      at com.saic.project.kafka.KafkaProducerConnection.push(KafkaProducerConnection.java:76)
 
 
 
  I believe this implies that the Java client cannot publish to the Kafka
  server.  How would I go about troubleshooting this?  What does
  NoBrokersForPartition mean?  Currently I have a different client (Server
  D) that is able to publish messages with a custom topic to Server C
  without error.
 
 
 
  Thanks,
 
  Charles
 
 



RE: Arguments for Kafka over RabbitMQ ?

2013-06-07 Thread Sybrandy, Casey
IIRC, I tried to use stunnel with Kafka once and it worked fine, and the 
configuration wasn't too bad, at least for a simple configuration.
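
From memory, the client-side service section of stunnel.conf is only a few 
lines (hosts and ports made up; the broker side runs a matching server-mode 
tunnel):

    [kafka]
    client  = yes
    accept  = 127.0.0.1:9092              ; local plaintext port the producer connects to
    connect = broker1.example.com:9093    ; remote stunnel in front of the broker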

-Original Message-
From: Dragos Manolescu [mailto:dragos.manole...@servicenow.com] 
Sent: Friday, June 07, 2013 4:51 PM
To: users@kafka.apache.org
Subject: Re: Arguments for Kafka over RabbitMQ ?

Thank you Marc (and others) for jumping in and sharing your perspectives!

A feature that Kafka doesn't currently support but RabbitMQ does (since about 
a year ago; I don't remember exactly when) is SSL. I realize that one can 
set up a tunnel between data centers, etc.; that would require more 
(configuration) work than SSL though. I am surprised that this difference 
hasn't come up :o


Thanks,

-Dragos   

On 6/6/13 6:09 PM, Marc Labbe mrla...@gmail.com wrote:

There are two things where RabbitMQ would have given us less work out 
of the box as opposed to Kafka. RabbitMQ also provides a bunch of tools 
that makes it rather attractive too.



RE: Binary Data and Kafka

2013-05-08 Thread Sybrandy, Casey
That's what I would have assumed.  And no, we're not using compression.

Thanks.

From: Jun Rao [mailto:jun...@gmail.com]
Sent: Wednesday, May 08, 2013 11:26 AM
To: users@kafka.apache.org
Cc: Sybrandy, Casey
Subject: Re: Binary Data and Kafka

No. Kafka broker stores the binary data as it is. The binary data may be 
compressed, if compression is enabled at the producer.

Thanks,

Jun

On Wed, May 8, 2013 at 5:57 AM, Sybrandy, Casey 
casey.sybra...@six3systems.com wrote:
All,

Does the Kafka broker Base64-encode the messages?  We are sending binary data 
to the brokers, and I looked at the logs to confirm that the data was being 
stored; however, all of the data, with a few exceptions, looks to be Base64 
encoded.  I didn't expect this, so I wanted to ask and confirm what I'm seeing.

If this is true, does this affect the size of the message when fetching?  In 
other words, if I send a 100K message, do I have to make sure I can fetch a 
300K message since the message can now be 300K in size because of the encoding?

Casey Sybrandy MSWE
Six3Systems
Cyber and Enterprise Systems Group
www.six3systems.com
301-206-6000 (Office)
301-206-6020 (Fax)
11820 West Market Place
Suites N-P
Fulton, MD. 20759



RE: Encryption at rest?

2013-04-01 Thread Sybrandy, Casey
Hello,

IIRC, no, it does not.  Where I work, one team had the same issue and built 
some custom code to handle the encryption and decryption of messages at the 
producer and consumer.  However, you have to take key management into account: 
once a message is written to the broker, you can't decrypt/re-encrypt it to 
change the key.  This can be an issue if you have to replay messages.
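
A minimal sketch of the producer-side half, using plain JCE AES-CBC; key 
management, which is the hard part, is omitted:

    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;

    // Encrypt before handing bytes to the producer; prepend the IV so the
    // consumer can initialize its cipher for decryption.
    static byte[] encrypt(byte[] plaintext, byte[] keyBytes) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
        byte[] iv = cipher.getIV();                  // random IV chosen by init()
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }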

-Original Message-
From: Chris Curtin [mailto:curtin.ch...@gmail.com] 
Sent: Monday, April 01, 2013 4:07 PM
To: users
Subject: Encryption at rest?

Hi,

Does Kafka support encrypting data at rest? During my AJUG presentation someone 
asked whether the files could be encrypted to address PII needs.

Thanks,

Chris


RE: FW: Zookeeper Configuration Question

2012-12-10 Thread Sybrandy, Casey
Apologies for not responding sooner.  My mail client must have been 
malfunctioning at the time as I never saw your responses until today.

As for the error, it looks like it's a bug on my part that just didn't click 
until I read Jim's responses.  I have a config file in which I specify the 
options, and I copied them from a Configuration object to a Properties object, 
as the producer/consumer requires.  I didn't realize until this morning that it 
was not working as expected.

On a different note: does anyone know how to create a namespace in Zookeeper?  
We're having some issues I'm trying to debug so I want to isolate some of our 
brokers, but finding documentation on this has been fruitless.
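
One option, if that's what you're after: ZooKeeper connect strings accept a 
chroot suffix and Kafka's zk.connect honors it, so each cluster can live under 
its own path (hostnames made up; the path may need to be created up front 
depending on the version):

    zk.connect=zk1:2181,zk2:2181,zk3:2181/kafka/cluster-a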

Thanks!

From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Thursday, November 29, 2012 4:39 PM
To: users@kafka.apache.org
Subject: Re: FW: Zookeeper Configuration Question

Can you please send around the log that shows the zookeeper connection
error?  I would like to see if it fails at connection establishment or
session establishment.

Thanks,
Neha

On Thu, Nov 29, 2012 at 1:19 PM, James A. Robinson
jim.robin...@stanford.edu wrote:
 On Thu, Nov 29, 2012 at 1:15 PM, James A. Robinson
 jim.robin...@stanford.edu wrote:
 For my kafka startup I point to the zookeeper cluster like so:

   --kafka-zk-connect 
 logproc-dev-03:2181,logproc-dev-03:2182,logproc-dev-03:2183

 Sorry, wrong copy and paste!  For the kafka startup I point to the zookeeper
 cluster like so (in the properties file):

 zk.connect=logproc-dev-03.highwire.org:2181,logproc-dev-03.highwire.org:2182,logproc-dev-03.highwire.org:2183


RE: Logging which broker a message was sent to

2012-12-10 Thread Sybrandy, Casey
I'll try that out.  Thanks!


From: Jun Rao [jun...@gmail.com]
Sent: Monday, December 10, 2012 1:04 PM
To: users@kafka.apache.org
Subject: Re: Logging which broker a message was sent to

So, you are using Producer, not SyncProducer. Assuming that you are using
DefaultEventHandler, there is trace-level logging that tells you which
broker a request is sent to.
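
Something like the following in the producer's log4j config should surface it 
(logger name taken from the event handler class; verify against your version):

    log4j.logger.kafka.producer.async.DefaultEventHandler=TRACE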

Thanks,

Jun

On Mon, Dec 10, 2012 at 8:10 AM, Sybrandy, Casey 
casey.sybra...@six3systems.com wrote:

 Is it at least possible to see which broker a message is sent to?  I'm
 using a Zookeeper-based producer and we have multiple brokers in our
 environment.  If I can tell which broker a message is sent to, that would
 be a big help.

 
 From: Jun Rao [jun...@gmail.com]
 Sent: Monday, December 10, 2012 11:07 AM
 To: users@kafka.apache.org
 Subject: Re: Logging which broker a message was sent to

 If you use -1 (i.e., a random partition) as the partition #, there is no easy
 way to know which partition the broker picks. However, you can
 explicitly specify the partition # in the request itself.

 Thanks,

 Jun

 On Mon, Dec 10, 2012 at 7:26 AM, Sybrandy, Casey 
 casey.sybra...@six3systems.com wrote:

  Is it possible to log/see which broker, and perhaps partition, a producer
  sent a message to?  I'm using the SyncProducer if that matters.