Broker ID disappears in Zookeeper
Hello, We're having the following issue with Kafka and/or Zookeeper: if a broker (id=1) is running and you start another broker with id=1, the new broker will exit saying "A broker is already registered on the path /brokers/ids/1". However, I noticed that when I then query Zookeeper, /brokers/ids/1 has disappeared. This behaviour doesn't make sense to us. The concern is that if we accidentally start up multiple brokers with the same ID (automatic restarts), we may end up with multiple brokers with the same ID running at the same time. Thoughts? Kafka: 0.8.2 Zookeeper: 3.4.5
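For context: /brokers/ids/1 is an ephemeral znode owned by the running broker's Zookeeper session, so a failed duplicate's exit should not remove it; the disappearance reported above looks like a bug in how the registration failure is handled. A toy model of the expected ephemeral-node semantics (hypothetical names, not the ZK API):

```python
# Toy model of Zookeeper-style ephemeral broker registration.
# Illustrative only -- not Kafka or Zookeeper code.

class NodeExists(Exception):
    pass

class Registry:
    def __init__(self):
        self._nodes = {}  # path -> owning session id

    def register(self, session_id, broker_id):
        path = "/brokers/ids/%d" % broker_id
        if path in self._nodes:
            raise NodeExists(path)  # duplicate id rejected
        self._nodes[path] = session_id

    def close_session(self, session_id):
        # Ephemeral nodes vanish with their owning session -- only
        # nodes created by *this* session should be removed.
        self._nodes = {p: s for p, s in self._nodes.items() if s != session_id}

    def exists(self, broker_id):
        return "/brokers/ids/%d" % broker_id in self._nodes

reg = Registry()
reg.register(session_id=100, broker_id=1)      # first broker registers
try:
    reg.register(session_id=200, broker_id=1)  # duplicate id fails
except NodeExists:
    pass
reg.close_session(200)  # failed broker exits; node must survive
assert reg.exists(1)
```

If the real node vanishes at this point, the surviving broker's registration has been clobbered, which matches the concern about duplicate IDs coexisting.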
Two Kafka Question
Hello, First, is there a limit to how many Kafka brokers you can have? Second, if a Kafka broker node fails and I start a new broker on a new node, is it correct to assume that the cluster will copy data to that node to satisfy the replication factor specified for a given topic? In other words, let's assume that I have a 3 node cluster and a topic with a replication factor of 3. If one node fails and I start up a new node, will the new node have existing messages replicated to it? Thanks. Casey
Elastic Scaling
Hello, We're looking into using Kafka for an improved version of a system, and the question of how to scale Kafka came up. Specifically, we want the system to scale as transparently as possible. The concern was that if we go from N to N*2 consumers, some of the original consumers would still be backed up while the new ones worked on only some of the new records. Also, if the load drops, can we scale down effectively? I'm sure there's a way to do it. I'm just hoping that someone has some knowledge in this area. Thanks.
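On the scale-up concern: Kafka's high-level consumers divide a topic's partitions among themselves, so adding consumers triggers a rebalance and the existing (backed-up) partitions are redistributed across the new members too; new consumers don't see only new records. A simplified range-style assignment sketch (not Kafka's actual code):

```python
# Sketch of range-style partition assignment across a consumer group.
# Simplified illustration; consumer names are hypothetical.

def assign(partitions, consumers):
    consumers = sorted(consumers)
    n, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, c in enumerate(consumers):
        take = n + (1 if i < extra else 0)  # spread the remainder
        out[c] = partitions[start:start + take]
        start += take
    return out

parts = list(range(8))
before = assign(parts, ["c1", "c2"])
after = assign(parts, ["c1", "c2", "c3", "c4"])  # scale up 2 -> 4
assert all(len(v) == 4 for v in before.values())
assert all(len(v) == 2 for v in after.values())  # backlog is shared
```

The flip side: consumers beyond the partition count receive nothing, so the partition count sets the ceiling for scaling up, and scaling down simply hands partitions back to the survivors.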
Kafka in a docker container stops with no errors logged
Hello, We're using Kafka 0.8.1.1 and we're trying to run it in a Docker container. For the most part, this has been fine; however, one of the containers has stopped a couple of times, and when I look at the log output from Docker (e.g., Kafka STDOUT), I don't see any errors. At one point it states that the broker has started, and several minutes later I see log messages stating that it's shutting down. Has anyone seen anything like this before? I don't know if Docker is the culprit, as two other containers on different nodes don't seem to have any issues. Thanks.
RE: Kafka/Zookeeper deployment Questions
Neha, Thanks. I'd still love to know if anyone has used Consul and/or Confd to manage a cluster. Casey From: Neha Narkhede [neha.narkh...@gmail.com] Sent: Thursday, October 16, 2014 9:54 AM To: users@kafka.apache.org Subject: Re: Kafka/Zookeeper deployment Questions In other words, if I change the number of partitions, can I restart the brokers one at a time so that I can continue processing data? Changing the # of partitions is an online operation and doesn't require restarting the brokers. However, any other configuration change (with the exception of a few operations) that requires a broker restart can be done in a rolling manner. On Wed, Oct 15, 2014 at 7:16 PM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, We're looking into deploying Kafka and Zookeeper into an environment where we want things to be as easy as possible to stand up and administer. To do this, we're looking into using Consul, or similar, and Confd to try to make this as automatic as possible. I was wondering if anyone had any experience in this area. My major concern with reconfiguring Kafka, in my experience, is making sure we don't end up losing messages. Also, can Kafka and Zookeeper be reconfigured in a rolling manner? In other words, if I change the number of partitions, can I restart the brokers one at a time so that I can continue processing data? Thanks.
RE: Kafka/Zookeeper deployment Questions
Roger, My understanding of both, beyond what Zookeeper already does, is: 1. Consul can be used to monitor a service and report its status. This can be very useful for knowing if a service, such as Zookeeper or Kafka, goes down. This can be done through a built-in web interface. 2. Confd leverages Consul, or etcd, to propagate changes to a service and restart it if necessary. So, if we change a broker-specific setting, we can put the change in Consul and have Confd automatically modify the config files on the broker nodes and restart the service as needed. My knowledge in this area is a bit limited as I haven't used either. I'm working with someone who is and wanted to ask people about this so that we can learn what works and what doesn't. ___ From: Roger Hoover [roger.hoo...@gmail.com] Sent: Friday, October 17, 2014 12:26 PM To: users@kafka.apache.org Subject: Re: Kafka/Zookeeper deployment Questions Casey, Could you describe a little more about how these would help manage a cluster? My understanding is that Consul provides service discovery and leader election. Kafka already uses ZooKeeper for brokers to discover each other and elect partition leaders. Kafka high-level consumers use ZK to divide up topic partitions amongst themselves. I'm not able to see how Consul and/or confd would help. Cheers, Roger
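As a concrete (hypothetical) example of the Confd piece described above: a template resource watches keys in Consul and rewrites the broker config, restarting Kafka when a key changes. All paths, key names, and commands below are made up for illustration; this is a sketch of the wiring, not a tested deployment:

```toml
# /etc/confd/conf.d/kafka.toml -- hypothetical confd template resource.
# When any key under /kafka changes in Consul, confd re-renders the
# template and runs reload_cmd.
[template]
src = "server.properties.tmpl"
dest = "/opt/kafka/config/server.properties"
keys = ["/kafka"]
reload_cmd = "systemctl restart kafka"
```

The companion template (`/etc/confd/templates/server.properties.tmpl`) would then pull values with confd's template functions, e.g. `broker.id={{getv "/kafka/broker/id"}}`. Whether a full restart on every change is acceptable is exactly the message-loss question raised earlier in this thread: restarts would need to be rolled one broker at a time.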
Kafka/Zookeeper deployment Questions
Hello, We're looking into deploying Kafka and Zookeeper into an environment where we want things to be as easy as possible to stand up and administer. To do this, we're looking into using Consul, or similar, and Confd to try to make this as automatic as possible. I was wondering if anyone had any experience in this area. My major concern with reconfiguring Kafka, in my experience, is making sure we don't end up losing messages. Also, can Kafka and Zookeeper be reconfigured in a rolling manner? In other words, if I change the number of partitions, can I restart the brokers one at a time so that I can continue processing data? Thanks.
RE:
Nevermind...I just found it in the docs and it looks like it has been looked into. Casey Sybrandy MSWE Sr. Software Engineer CACI/Six3Systems 301-206-6000 (Office) 301-206-6020 (Fax) 11820 West Market Place Suites N-P Fulton, MD. 20759 From: Sybrandy, Casey Sent: Monday, October 06, 2014 1:14 PM To: users@kafka.apache.org Subject: Hello, I had a thought today that I wanted to run past everyone: has there been any thought to using a more common protocol for communicating with Kafka vs. the custom protocol currently being used? Specifically, what I'm thinking is Thrift. It's already supported by many popular languages, so that would reduce the need to maintain Kafka-specific clients. Coordination with Zookeeper is still an issue, but IIRC, this was something being worked on already, so if all of the interaction with Zookeeper is done by the broker, then making producers/consumers becomes much easier. All one needs is Thrift to generate the appropriate classes and the user can interact with Kafka. I know there's probably something I'm missing, but I felt I should bring this up as this would make things much easier for people who want to work with Kafka. Casey
RE: Partial Message Read by Consumer
Hello, No, the entire log file isn't bigger than that buffer size, and this is occurring while trying to retrieve the first message on the topic, not the last. I attached a log. Line 408 ("Iterating.") is where we get an iterator and start iterating over the data. There should be subsequent log entries displaying a filename, but they never appear after that point. Some other thoughts: * Network latency is a non-issue as everything is installed on a local VM. * I tried with both 10 and 100 messages in case I didn't have enough to make it start producing. No change. Yes, I do realize this is silly, but when nothing else is working, why not give it a try. It's like adding magical print statements. Hope this helps. I need it. Casey From: Tom Brown [tombrow...@gmail.com] Sent: Tuesday, December 10, 2013 7:10 PM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Having a partial message transfer over the network is the design of Kafka 0.7.x (I can't speak to 0.8.x, though it may still be). When the request is made, you tell the server the partition number, the byte offset into that partition, and the size of response that you want. The server finds that offset in the partition, and sends N bytes back (where N is the maximum response size specified). The server does not inspect the contents of the reply to ensure that message boundaries line up with the response size. This is by design, and the simplicity allows for high throughput, at the cost of higher client complexity. In practice this means that the response often includes a partial message at the end, which the client drops. This means that if the response contains a single message that is larger than your maximum response size, you will not be able to process that message or continue to the next message. Each time you request it, it will only send the partial message, and the Kafka client will send the request again.
If I understand the high-level consumer configuration, the fetch.size parameter should be what you need to adjust. Its default is 300K, but I see you have it set to roughly 50MB. Is there any chance your message is larger than that? --Tom On Tue, Dec 10, 2013 at 1:52 PM, Guozhang Wang wangg...@gmail.com wrote: Hello Casey, What do you mean by part of a message is being read? Could you upload the output and also the log of the consumer here? Guozhang On Tue, Dec 10, 2013 at 12:26 PM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, First, I'm using version 0.7.2. I'm trying to read some messages from a broker, but looking at Wireshark, it appears that only part of a message is being read by the consumer. After that, no other data is read, and I can verify that there are 10 messages on the broker. I have the consumer configured as follows: kafka.zk.connectinfo=127.0.0.1 kafka.zk.groupid=foo3 kafka.topic=... fetch.size=52428800 socket.buffersize=524288 I only set socket.buffersize today to see if it helps. Any help would be great because this is baffling, especially since this only started happening yesterday. Casey Sybrandy MSWE Six3Systems Cyber and Enterprise Systems Group www.six3systems.com 301-206-6000 (Office) 301-206-6020 (Fax) 11820 West Market Place Suites N-P Fulton, MD. 20759 -- -- Guozhang
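Tom's description of the fetch protocol can be sketched with a toy length-prefixed log. This is a deliberate simplification of the 0.7 wire format (real messages also carry checksums and attributes), not the real client code:

```python
# Sketch: why a fetch response can end with a partial message.
# Messages are length-prefixed; the server returns a fixed-size byte
# window without aligning it to message boundaries. Simplified.
import struct

def encode(payload):
    # 4-byte big-endian length prefix, then the payload bytes
    return struct.pack(">I", len(payload)) + payload

def decode_complete(buf):
    """Return (messages, bytes_consumed); a truncated tail is dropped."""
    msgs, pos = [], 0
    while pos + 4 <= len(buf):
        (size,) = struct.unpack_from(">I", buf, pos)
        if pos + 4 + size > len(buf):
            break  # partial message at the end -- client must re-fetch
        msgs.append(buf[pos + 4:pos + 4 + size])
        pos += 4 + size
    return msgs, pos

log = encode(b"aaaa") + encode(b"bbbb") + encode(b"cccc")
fetch = log[:14]                 # server returns at most 14 bytes
msgs, consumed = decode_complete(fetch)
assert msgs == [b"aaaa"]         # second message arrived truncated
assert consumed == 8             # next fetch resumes at this offset
```

This also shows the failure mode Tom mentions: if a single message's size exceeds the fetch window, `decode_complete` yields nothing and `consumed` never advances, so the client re-requests the same partial message forever.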
RE: Partial Message Read by Consumer
First, I saw the partial message looking at raw network traffic via Wireshark, not the output of the iterator as the iterator never seems to provide me any data. That's where the code is hanging. Second, here's the output from the ConsumerOffsetChecker: grp1,tdf_topic,0-0 (Group,Topic,BrokerId-PartitionId) Owner = null Consumer offset = 47947 = 47,947 (0.00G) Log size = 1743252 = 1,743,252 (0.00G) Consumer lag = 1695305 = 1,695,305 (0.00G) BROKER INFO 0 - 127.0.1.1:9092 To answer the questions related to this in the FAQ: * Yes, there are more messages. * No, the messages are all smaller than my configured fetch size. * As far as I know, the consumer thread did not stop. There are no errors or exceptions to indicate anything of the sort. One thing I did notice is that it looks like it's reading from the topic before the consumer thread actually starts. I'm using the pattern where I start a new thread per stream and submit them to an ExecutorService. Not sure if this makes a difference, but this is our standard consumer pattern and has worked well until I started seeing this issue. For this consumer, I'm only working with one stream. I tried 2, but no change. Casey From: Guozhang Wang [wangg...@gmail.com] Sent: Wednesday, December 11, 2013 11:31 AM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Casey, Just to confirm, you saw a partial message output from the iterator.next() call, not from the consumer's fetch response, correct? Guozhang On Wed, Dec 11, 2013 at 8:14 AM, Jun Rao jun...@gmail.com wrote: Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F ? If that doesn't help, could you file a jira and attach your log? Apache mailing list doesn't support attachments. 
Thanks, Jun On Wed, Dec 11, 2013 at 6:15 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, No, the entire log file isn't bigger than that buffer size, and this is occurring while trying to retrieve the first message on the topic, not the last. I attached a log. Line 408 ("Iterating.") is where we get an iterator and start iterating over the data. There should be subsequent log entries displaying a filename, but they never appear after that point. Some other thoughts: * Network latency is a non-issue as everything is installed on a local VM. * I tried with both 10 and 100 messages in case I didn't have enough to make it start producing. No change. Yes, I do realize this is silly, but when nothing else is working, why not give it a try. It's like adding magical print statements. Hope this helps. I need it. Casey From: Tom Brown [tombrow...@gmail.com] Sent: Tuesday, December 10, 2013 7:10 PM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Having a partial message transfer over the network is the design of Kafka 0.7.x (I can't speak to 0.8.x, though it may still be). When the request is made, you tell the server the partition number, the byte offset into that partition, and the size of response that you want. The server finds that offset in the partition, and sends N bytes back (where N is the maximum response size specified). The server does not inspect the contents of the reply to ensure that message boundaries line up with the response size. This is by design, and the simplicity allows for high throughput, at the cost of higher client complexity. In practice this means that the response often includes a partial message at the end, which the client drops. This means that if the response contains a single message that is larger than your maximum response size, you will not be able to process that message or continue to the next message.
Each time you request it, it will only send the partial message, and the Kafka client will send the request again. If I understand the high-level consumer configuration, the fetch.size parameter should be what you need to adjust. Its default is 300K, but I see you have it set to roughly 50MB. Is there any chance your message is larger than that? --Tom On Tue, Dec 10, 2013 at 1:52 PM, Guozhang Wang wangg...@gmail.com wrote: Hello Casey, What do you mean by part of a message is being read? Could you upload the output and also the log of the consumer here? Guozhang On Tue, Dec 10, 2013 at 12:26 PM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, First, I'm using version 0.7.2. I'm trying to read some messages from a broker, but looking at Wireshark, it appears that only part of a message is being read by the consumer. After that, no other data is read and I can verify
RE: Partial Message Read by Consumer
Actually, I think I isolated where the error may be. We have a library that was recently updated to fix an issue. Other code using the same part of the library is working properly, but for some reason in this case it isn't. Apologies for wasting people's time, but I just never even thought to look there since it is working in other places. Casey From: Guozhang Wang [wangg...@gmail.com] Sent: Wednesday, December 11, 2013 12:09 PM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Do you have compression turned on in the broker? Guozhang On Wed, Dec 11, 2013 at 8:43 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: First, I saw the partial message looking at raw network traffic via Wireshark, not the output of the iterator as the iterator never seems to provide me any data. That's where the code is hanging. Second, here's the output from the ConsumerOffsetChecker: grp1,tdf_topic,0-0 (Group,Topic,BrokerId-PartitionId) Owner = null Consumer offset = 47947 = 47,947 (0.00G) Log size = 1743252 = 1,743,252 (0.00G) Consumer lag = 1695305 = 1,695,305 (0.00G) BROKER INFO 0 - 127.0.1.1:9092 To answer the questions related to this in the FAQ: * Yes, there are more messages. * No, the messages are all smaller than my configured fetch size. * As far as I know, the consumer thread did not stop. There are no errors or exceptions to indicate anything of the sort. One thing I did notice is that it looks like it's reading from the topic before the consumer thread actually starts. I'm using the pattern where I start a new thread per stream and submit them to an ExecutorService. Not sure if this makes a difference, but this is our standard consumer pattern and has worked well until I started seeing this issue. For this consumer, I'm only working with one stream. I tried 2, but no change. 
Casey From: Guozhang Wang [wangg...@gmail.com] Sent: Wednesday, December 11, 2013 11:31 AM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Casey, Just to confirm, you saw a partial message output from the iterator.next() call, not from the consumer's fetch response, correct? Guozhang On Wed, Dec 11, 2013 at 8:14 AM, Jun Rao jun...@gmail.com wrote: Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F ? If that doesn't help, could you file a jira and attach your log? Apache mailing list doesn't support attachments. Thanks, Jun On Wed, Dec 11, 2013 at 6:15 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, No, the entire log file isn't bigger than that buffer size and this is occurring while trying to retrieve the first message on the topic, not the last. I attached a log. Line 408 ( Iterating.) is where we get an iterator and start iterating over the data. There should be subsequent log entries displaying a filename, but they never appear after that point. Some other thoughts: * Network latency is a non-issue as everything is installed on a local VM. * I tried with both 10 and 100 messages in case I didn't have enough to make it start producing. No change. Yes, I do realize this is silly, but when nothing else is working, why not give it a try. It's like adding magical print statements. Hope this helps. I need it. Casey From: Tom Brown [tombrow...@gmail.com] Sent: Tuesday, December 10, 2013 7:10 PM To: users@kafka.apache.org Subject: Re: Partial Message Read by Consumer Having a partial message transfer over the network is the design of Kafka 0.7.x (I can't speak to 0.8.x, though it may still be). When the request is made, you tell the server the partition number, the byte offset into that partition, and the size of response that you want. The server finds that offset in the partition, and sends N bytes back (where N is the maximum response size specified). 
The server does not inspect the contents of the reply to ensure that message boundaries line up with the response size. This is by design, and the simplicity allows for high throughput, at the cost of higher client complexity. In practice this means that the response often includes a partial message at the end, which the client drops. This means that if the response contains a single message that is larger than your maximum response size, you will not be able to process that message or continue to the next message. Each time you request it, it will only send the partial message, and the Kafka client will send the request again. If I understand the high-level consumer configuration
Logging of errors
Hello, How can I have the brokers log errors to a file? Do I just have to configure something like log4j or is something else used? Thanks. Casey
RE: Logging of errors
One file is good for now. I just didn't find any documentation on this, so I figured I'd ask. Guess I should have just looked at the config directory. -Original Message- From: Guozhang Wang [mailto:wangg...@gmail.com] Sent: Wednesday, November 06, 2013 12:16 PM To: users@kafka.apache.org Subject: Re: Logging of errors Hi Casey, Did you want to route all the error log entries to one file and the others to another file? Guozhang On Wed, Nov 6, 2013 at 9:07 AM, Neha Narkhede neha.narkh...@gmail.comwrote: Yes, configure the kafka/config/log4j.properties that ships with Kafka. Thanks, Neha On Wed, Nov 6, 2013 at 8:48 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, How can I have the brokers log errors to a file? Do I just have to configure something like log4j or is something else used? Thanks. Casey -- -- Guozhang
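For anyone else landing here, the answer above can be made concrete with a sketch of an addition to kafka/config/log4j.properties that routes ERROR-level entries to their own file. The appender name, file path, and pattern below are illustrative, and the `stdout` appender referenced is the one the shipped file already defines:

```properties
# Hypothetical sketch: send ERROR and above to a dedicated file via a
# Threshold filter, in addition to the existing stdout appender.
log4j.rootLogger=INFO, stdout, errorAppender

log4j.appender.errorAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.errorAppender.Threshold=ERROR
log4j.appender.errorAppender.File=/var/log/kafka/server-errors.log
log4j.appender.errorAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.errorAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.errorAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

The Threshold setting is what splits errors out: INFO/WARN entries still flow to stdout, while only ERROR and FATAL reach the dedicated file.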
RE: Consumer pauses when running many threads
Yes, we have. Our SA for the environment where this is occurring has been monitoring it. When the consumers went down, we could see that things were lagging. Yesterday, they lowered the number of threads for the consumers to six each, and they haven't shut down yet. There appears to still be some lag, but since the consumers are running, it's decreasing. A test was run with each broker configured to have 32 partitions, and when the number of threads across the consumers exceeds 32, we have issues. My understanding from the documentation is that when you set the number of partitions on a broker, it's just for that broker, correct? Therefore, if we set each broker to have 32 partitions, across 4 brokers we should have 128 partitions per topic, correct? In which case, we should be able to run 128 consumer threads with ease. Casey -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Thursday, August 01, 2013 11:13 AM To: users@kafka.apache.org Subject: Re: Consumer pauses when running many threads Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F? Thanks, Jun On Thu, Aug 1, 2013 at 7:30 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, We're seeing an issue running 0.7.0 where one or more of our consumers are pausing after about an hour when we have a lot of threads configured. Our setup is as follows: * 4 brokers configured for 32 threads and 32 partitions on each broker. * 2 consumers processing 40 streams in total (24 and 16). * Zookeeper server is a CDH version that's at least 3.3.4. We were also seeing this with 3 consumers running 18 threads each. As you can tell, the hardware is quite beefy and the brokers are described as being bored. Outside of upgrading to 0.7.2, which we are planning on doing but can't yet, what else can we look into to try to resolve this or at least determine what's happening? Thanks. Casey
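The arithmetic in the question above checks out for 0.7, where num.partitions is a per-broker setting: 4 brokers x 32 partitions gives 128 partitions for the topic, and that total caps the number of consumer threads that can actually own a partition. A toy check (illustrative only; real assignment happens during the consumer rebalance):

```python
# Sketch: total partitions = brokers x partitions-per-broker (Kafka 0.7
# semantics, where num.partitions is per broker), and threads beyond
# that total receive no stream. Illustrative only.

def idle_threads(brokers, partitions_per_broker, threads):
    total = brokers * partitions_per_broker
    return max(0, threads - total)

assert 4 * 32 == 128                  # 4 brokers x 32 partitions
assert idle_threads(4, 32, 128) == 0  # every thread can own a partition
assert idle_threads(4, 32, 129) == 1  # one thread would sit idle
assert idle_threads(1, 32, 40) == 8   # single broker: only 32 useful
```

So if trouble starts precisely when thread count crosses 32, it is worth verifying the topic really has 128 partitions and not 32, i.e. that all four brokers host partitions for it.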
Consumer pauses when running many threads
Hello, We're seeing an issue running 0.7.0 where one or more of our consumers are pausing after about an hour when we have a lot of threads configured. Our setup is as follows: * 4 brokers configured for 32 threads and 32 partitions on each broker. * 2 consumers each processing 40 streams (24 and 16). * Zookeeper server is a CDH version that's at least 3.3.4. We were also seeing this with 3 consumers running 18 threads each. As you can tell, the hardware is quite beefy and the brokers are described as being bored. Outside of upgrading to 0.7.2, which we are planning on doing but can't yet, what else can we look into to try to resolve this or at least determine what's happening? Thanks. Casey
RE: Client improvement discussion
In the past there was some discussion about having a C client for non-JVM languages. Is this still planned as well? Being able to work with Kafka from other languages would be a great thing. Where I work, we interact with Kafka via Java and Ruby (producer), so having an official C library that can be used from within Ruby would make it easier to have the same version of the client in Java and Ruby. -Original Message- From: Jay Kreps [mailto:jay.kr...@gmail.com] Sent: Friday, July 26, 2013 3:00 PM To: d...@kafka.apache.org; users@kafka.apache.org Subject: Client improvement discussion I sent around a wiki a few weeks back proposing a set of client improvements that essentially amount to a rewrite of the producer and consumer java clients. https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite The below discussion assumes you have read this wiki. I started to do a little prototyping for the producer and wanted to share some of the ideas that came up to get early feedback. First, a few simple but perhaps controversial things to discuss. Rollout Phase 1: We add the new clients. No change on the server. Old clients still exist. The new clients will be entirely in a new package so there will be no possibility of name collision. Phase 2: We swap out all shared code on the server to use the new client stuff. At this point the old clients still exist but are essentially deprecated. Phase 3: We remove the old client code. Java I think we should do the clients in java. Making our users deal with scala's non-compatibility issues and crazy stack traces causes people a lot of pain. Furthermore we end up having to wrap everything now to get a usable java api anyway for non-scala people. This does mean maintaining a substantial chunk of java code, which is maybe less fun than scala. But basically i think we should optimize for the end user and produce a standalone pure-java jar with no dependencies. Jars We definitely want to separate out the client jar.
There is also a fair amount of code shared between both (exceptions, protocol definition, utils, and the message set implementation). Two approaches. Two jar approach: split kafka.jar into kafka-clients.jar and kafka-server.jar with the server depending on the clients. The advantage of this is that it is simple. The disadvantage is that things like utils and protocol definition will be in the client jar though technically they belong equally to the server. Many jar approach: split kafka.jar into kafka-common.jar, kafka-producer.jar, kafka-consumer.jar, kafka-admin.jar, and kafka-server.jar. The disadvantage of this is that the user needs two jars (common + something), which is for sure going to confuse people. I also think this will tend to spawn more jars over time. Background threads I am thinking of moving both serialization and compression out of the background send thread. I will explain a little about this idea below. Serialization I am not sure if we should handle serialization in the client at all. Basically I wonder if our own API wouldn't just be a lot simpler if we took a byte[] key and byte[] value and let people serialize stuff themselves. Injecting a class name for us to create the serializer is more roundabout and has a lot of problems if the serializer itself requires a lot of configuration or other objects to be instantiated. Partitioning The real question with serialization is whether the partitioning should happen on the java object or on the byte array key. The argument for doing it on the java object is that it is easier to do something like a range partition on the object. The problem with doing it on the object is that the consumer may not be in java and so may not be able to reproduce the partitioning. For example we currently use Object.hashCode which is a little sketchy. We would be better off doing a standardized hash function on the key bytes.
If we want to give the partitioner access to the original java object then obviously we need to handle serialization behind our api. Names I think good names are important. I would like to rename the following classes in the new client: Message => Record: Now that the message has both a message and a key it is more of a KeyedMessage. Another name for a KeyedMessage is a Record. MessageSet => Records: This isn't too important but nit pickers complain that it is not technically a Set but rather a List or Sequence but MessageList sounds funny to me. The actual clients will not interact with these classes. They will interact with a ProducerRecord and ConsumerRecord. The reason for having different fields is because the different clients Proposed producer API: SendResponse r = producer.send(new ProducerRecord(topic, key, value)) Protocol Definition Here is what I am thinking about protocol definition. I see a couple of problems with what we are doing currently. First the protocol definition is spread throughout a bunch of custom
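The point above about replacing Object.hashCode with a standardized hash on the serialized key bytes can be sketched as follows. FNV-1a is used here purely as a stand-in for whatever hash the protocol would standardize on (it is not the hash Kafka chose); the value is that any language can reproduce the partition choice from the same bytes:

```python
# Sketch: deterministic partitioning on serialized key bytes using a
# standardized hash (FNV-1a as an example stand-in), so non-Java
# consumers can reproduce the partition choice. Illustrative only.

def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF  # FNV prime, mod 2^32
    return h

def partition_for(key: bytes, num_partitions: int) -> int:
    return fnv1a_32(key) % num_partitions

p = partition_for(b"user-42", 8)
assert 0 <= p < 8
assert p == partition_for(b"user-42", 8)  # same key, same partition
```

Contrast with Object.hashCode: its result depends on the JVM object, so a Python or C consumer handed the same key bytes could not recompute which partition a record went to.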
RE: Duplicate Messages on the Consumer
Hello, No, we couldn't check the broker logs because the data is obfuscated, so we can't just look at the files and tell. It looks like our dev system may be experiencing the same issue, so I did turn off the obfuscation and we'll monitor it. However, our production system, where we were seeing the errors more often, appears to have had Zookeeper misconfigured, so we're thinking that may be the issue. Casey -Original Message- From: Philip O'Toole [mailto:phi...@loggly.com] Sent: Thursday, July 18, 2013 3:29 PM To: users@kafka.apache.org Cc: kafka-us...@incubator.apache.org Subject: Re: Duplicate Messages on the Consumer Have you actually examined the Kafka files on disk, to make sure those dupes are really there? Or is this a case of reading the same message more than once? Philip On Thu, Jul 18, 2013 at 8:55 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, We recently started seeing duplicate messages appearing at our consumers. Thankfully, the database is set up so that we don't store the dupes, but it is annoying. It's not every message, only about 1% of them. We are running 0.7.0 for the broker with Zookeeper 3.3.4 from Cloudera and 0.7.0 for the producer and consumer. We tried upgrading the consumer to 0.7.2 to see if that worked, but we're still seeing the dupes. Do we have to upgrade the broker as well to resolve this? Is there something we can check to see what's going on? We're not seeing anything unusual in the logs. I suspected that there may be significant rebalancing, but that does not appear to be the case at all. Casey Sybrandy
Duplicate Messages on the Consumer
Hello, We recently started seeing duplicate messages appearing at our consumers. Thankfully, the database is set up so that we don't store the dupes, but it is annoying. It's not every message, only about 1% of them. We are running 0.7.0 for the broker with Zookeeper 3.3.4 from Cloudera and 0.7.0 for the producer and consumer. We tried upgrading the consumer to 0.7.2 to see if that worked, but we're still seeing the dupes. Do we have to upgrade the broker as well to resolve this? Is there something we can check to see what's going on? We're not seeing anything unusual in the logs. I suspected that there may be significant rebalancing, but that does not appear to be the case at all. Casey Sybrandy
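Worth noting for this thread: Kafka's delivery guarantee is at-least-once, so a small duplicate rate after producer retries or consumer restarts is expected behavior, and deduplicating on a unique message id (as the database's unique constraint does here) is the standard guard. A toy sketch of that consumer-side filter (illustrative; not Kafka code):

```python
# Sketch: at-least-once delivery can redeliver messages, so consumers
# dedupe on a unique id, mirroring a database unique constraint.
# Toy illustration only.

def dedupe(messages, seen=None):
    """messages: iterable of (msg_id, payload); returns unique payloads."""
    seen = set() if seen is None else seen
    out = []
    for msg_id, payload in messages:
        if msg_id in seen:
            continue  # duplicate delivery -- drop it
        seen.add(msg_id)
        out.append(payload)
    return out

stream = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]  # id 2 redelivered
assert dedupe(stream) == ["a", "b", "c"]
```

In a real deployment the `seen` set has to be bounded (e.g. by retention window) or delegated to the database constraint, which is effectively what this poster's setup already does.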
RE: NoBrokersForPartitionException
Jun, Unfortunately, upgrades happen slowly on our projects, so I don't know when this will occur. However, it looks like we will be upgrading to 0.7.2 over the next couple of months. Regardless, what is causing this? Is this a bug in Kafka, or is it something triggered by something we did? I only ask because it looked like the directories for the specific topic I was looking for had disappeared, so it seemed like someone deleted them. Casey -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Thursday, July 11, 2013 1:17 AM To: users@kafka.apache.org Subject: Re: NoBrokersForPartitionException Could you try 0.7.2? Thanks, Jun On Wed, Jul 10, 2013 at 11:38 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, Apologies for bringing this back from the dead, but I'm getting the same exception using Kafka 0.7.0. What could be causing this? Thanks. Casey -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Tuesday, March 12, 2013 12:14 AM To: users@kafka.apache.org Subject: Re: NoBrokersForPartitionException Which version of Kafka are you using? Thanks, Jun On Mon, Mar 11, 2013 at 12:30 PM, Ott, Charles H. charles.h@saic.com wrote: I am trying to do something like this: 1.) Java Client Producer (Server A) -> Zookeeper (Server B) to get the Kafka service. 2.) Zookeeper gives the IP for Kafka (Server C) to the Producer (Server A). 3.) Producer (Server A) attempts to publish a message to Kafka (Server C) using the IP resolved from Zookeeper. I am getting an error when attempting to write a message to a Kafka topic:

kafka.common.NoBrokersForPartitionException: Partition = null
    at kafka.producer.Producer.kafka$producer$Producer$$getPartitionListForTopic(Producer.scala:167)
    at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:116)
    at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:105)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
    at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:32)
    at kafka.producer.Producer.zkSend(Producer.scala:105)
    at kafka.producer.Producer.send(Producer.scala:99)
    at kafka.javaapi.producer.Producer.send(Producer.scala:103)
    at com.saic.project.kafka.KafkaProducerConnection.push(KafkaProducerConnection.java:76)

I believe this implies that the Java client cannot publish to the Kafka server. How would I go about troubleshooting this? What does NoBrokersForPartition mean? Currently I have a different client (Server D) that is able to publish messages with a custom topic to Server C without error. Thanks, Charles
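For context on what the exception in this thread means: in the 0.7.x ZooKeeper-based producer, the client reads the list of broker-partitions registered for a topic from ZooKeeper and picks one; if that list comes back empty (for example, because the topic's entries under /brokers/topics are gone), there is nothing to send to. A rough, hypothetical sketch of that lookup follows; this is not Kafka's actual code, and the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Invented sketch of a ZK-style topic -> broker-partition lookup (not Kafka source). */
public class PartitionLookup {
    // topic -> entries like "brokerId-partitionId", as a client might read from ZooKeeper
    private final Map<String, List<String>> topicPartitions = new HashMap<>();

    public void register(String topic, String brokerPartition) {
        topicPartitions.computeIfAbsent(topic, t -> new ArrayList<>()).add(brokerPartition);
    }

    /** Mirrors the failure mode: an empty list means there is no broker to send to. */
    public String pick(String topic, int keyHash) {
        List<String> parts = topicPartitions.getOrDefault(topic, Collections.emptyList());
        if (parts.isEmpty()) {
            throw new IllegalStateException("No brokers for partition (topic=" + topic + ")");
        }
        return parts.get(Math.abs(keyHash) % parts.size());
    }
}
```

Seen this way, the exception usually points at missing or deleted ZooKeeper registrations for the topic rather than at a network problem between producer and broker, which is consistent with the disappeared topic directories described above.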
RE: Arguments for Kafka over RabbitMQ ?
IIRC, I tried to use stunnel with Kafka once and it worked fine, and the configuration wasn't too bad, at least for a simple setup. -Original Message- From: Dragos Manolescu [mailto:dragos.manole...@servicenow.com] Sent: Friday, June 07, 2013 4:51 PM To: users@kafka.apache.org Subject: Re: Arguments for Kafka over RabbitMQ ? Thank you Marc (and others) for jumping in and sharing your perspectives! A feature that Kafka doesn't currently support while RabbitMQ does (since about a year ago, I don't remember exactly) is SSL support. I realize that one can set up a tunnel between data centers, etc.; that would require more (configuration) work than SSL though. I am surprised that this difference hasn't come up :o Thanks, -Dragos On 6/6/13 6:09 PM, Marc Labbe mrla...@gmail.com wrote: There are two things where RabbitMQ would have given us less work out of the box as opposed to Kafka. RabbitMQ also provides a bunch of tools that makes it rather attractive too.
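For anyone wanting to try the stunnel approach mentioned above, a minimal client-side configuration might look like the following sketch. The hostnames and ports are placeholders, and the stunnel server side needs a matching section plus a certificate; check the stunnel manual for your version:

```
; stunnel client-side config (sketch): local plaintext in, TLS out
client = yes

[kafka]
; producers/consumers connect to this local plaintext port
accept = 127.0.0.1:9092
; stunnel instance in front of the remote broker (placeholder host/port)
connect = kafka.remote.example.com:9093
```

This keeps the Kafka clients and brokers unaware of TLS entirely, which is the trade-off versus native SSL support: less intrusive, but one more moving part per endpoint.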
RE: Binary Data and Kafka
That's what I would have assumed. And no, we're not using compression. Thanks. From: Jun Rao [mailto:jun...@gmail.com] Sent: Wednesday, May 08, 2013 11:26 AM To: users@kafka.apache.org Cc: Sybrandy, Casey Subject: Re: Binary Data and Kafka No. The Kafka broker stores the binary data as-is. The binary data may be compressed if compression is enabled at the producer. Thanks, Jun On Wed, May 8, 2013 at 5:57 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: All, Does the Kafka broker Base64 encode the messages? We are sending binary data to the brokers and I looked at the logs to confirm that the data was being stored; however, all of the data, with a few exceptions, looks to be Base64 encoded. I didn't expect this, so I wanted to ask and confirm what I'm seeing. If this is true, does this affect the size of the message when fetching? In other words, if I send a 100K message, do I have to make sure I can fetch a 300K message, since the message can now be 300K in size because of the encoding? Casey Sybrandy MSWE Six3Systems Cyber and Enterprise Systems Group www.six3systems.com 301-206-6000 (Office) 301-206-6020 (Fax) 11820 West Market Place Suites N-P Fulton, MD 20759
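As Jun says, the broker stores bytes verbatim, so no fetch-size headroom is needed for encoding. As an aside on the sizing question raised above: even where Base64 is involved, the overhead is about 33% (4 output bytes per 3 input bytes), not 3x. A quick check of that arithmetic (the `Base64Overhead` class is just for illustration):

```java
import java.util.Base64;

/** Illustrative helper: size of a payload after standard Base64 encoding. */
public class Base64Overhead {
    /** 4 output bytes per 3 input bytes, with the final group padded up. */
    public static int encodedSize(int rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }
}
```

So a 100K binary message would Base64-encode to roughly 134K, not 300K; but again, no such encoding happens on the broker.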
RE: Encryption at rest?
Hello, IIRC, no, it does not. Where I work, one team had the same issue and built custom code to handle encryption and decryption of messages at the producer and consumer. However, you have to take key management into account: once a message is written to the broker, you can't decrypt and re-encrypt it to change the key. This can be an issue if you have to replay messages. -Original Message- From: Chris Curtin [mailto:curtin.ch...@gmail.com] Sent: Monday, April 01, 2013 4:07 PM To: users Subject: Encryption at rest? Hi, Does Kafka support encrypting data at rest? During my AJUG presentation, someone asked if the files could be encrypted to address PII needs. Thanks, Chris
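To make the producer/consumer-side approach described above concrete, here is a minimal sketch of message-level encryption using AES-GCM from the standard javax.crypto API. The `MessageCrypto` class name is invented, and a real deployment would pull the key from a key-management system rather than generating it in place, which is exactly the key-rotation and replay caveat noted above:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

/** Illustrative message-level encryption helper (names invented for this sketch). */
public class MessageCrypto {
    private static final int IV_LEN = 12;    // 96-bit IV, the usual choice for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    /** In practice the key comes from your key-management system, not from here. */
    public static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    /** Encrypts a payload, prepending the random IV so the consumer can decrypt. */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    public static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(ivAndCiphertext, 0, IV_LEN)));
        return c.doFinal(ivAndCiphertext, IV_LEN, ivAndCiphertext.length - IV_LEN);
    }
}
```

The producer would call encrypt() on the payload before handing it to Kafka, and the consumer decrypt() after fetching; the broker only ever sees ciphertext, which is what makes later key changes for already-stored messages impossible.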
RE: FW: Zookeeper Configuration Question
Apologies for not responding sooner. My mail client must have been malfunctioning at the time, as I never saw your responses until today. As for the error, it looks like it's a bug on my part that just didn't click until I read Jim's responses. I have a config file in which I specify the options, and I copied it from a Configuration object to a Properties object, as the producer/consumer requires. I didn't realize until this morning that it was not working as expected. On a different note: does anyone know how to create a namespace in Zookeeper? We're having some issues I'm trying to debug, so I want to isolate some of our brokers, but my search for documentation on this has been fruitless. Thanks! From: Neha Narkhede [neha.narkh...@gmail.com] Sent: Thursday, November 29, 2012 4:39 PM To: users@kafka.apache.org Subject: Re: FW: Zookeeper Configuration Question Can you please send around the log that shows the zookeeper connection error? I would like to see if it fails at connection establishment or session establishment. Thanks, Neha On Thu, Nov 29, 2012 at 1:19 PM, James A. Robinson jim.robin...@stanford.edu wrote: On Thu, Nov 29, 2012 at 1:15 PM, James A. Robinson jim.robin...@stanford.edu wrote: For my kafka startup I point to the zookeeper cluster like so: --kafka-zk-connect logproc-dev-03:2181,logproc-dev-03:2182,logproc-dev-03:2183 Sorry, wrong copy and paste! For the kafka startup I point to the zookeeper cluster like so (in the properties file): zk.connect=logproc-dev-03.highwire.org:2181,logproc-dev-03.highwire.org:2182,logproc-dev-03.highwire.org:2183
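On the namespace question raised above: one common approach (assuming a Kafka version whose zk.connect supports it) is a ZooKeeper chroot, appended as a path suffix on the connection string. All of that cluster's znodes then live under the given path, isolating it from other brokers sharing the same ensemble. A hypothetical example, reusing the hosts from the quoted properties file:

```
zk.connect=logproc-dev-03:2181,logproc-dev-03:2182,logproc-dev-03:2183/kafka-test
```

Note that on some versions the chroot path is not created automatically, so you may need to create it first from the ZooKeeper CLI (e.g. `create /kafka-test ""` in zkCli) before pointing brokers at it.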
RE: Logging which broker a message was sent to
I'll try that out. Thanks! From: Jun Rao [jun...@gmail.com] Sent: Monday, December 10, 2012 1:04 PM To: users@kafka.apache.org Subject: Re: Logging which broker a message was sent to So, you are using Producer, not SyncProducer. Assuming that you are using DefaultEventHandler, there is a trace level logging that tells you which broker a request is sent to. Thanks, Jun On Mon, Dec 10, 2012 at 8:10 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Is it at least possible to see which broker a message is sent to? I'm using a Zookeeper based producer and we have multiple brokers in our environment. If I can tell which broker a message is sent to, that would be a big help. From: Jun Rao [jun...@gmail.com] Sent: Monday, December 10, 2012 11:07 AM To: users@kafka.apache.org Subject: Re: Logging which broker a message was sent to If you use -1 (ie, a random partition) as the partition #, there is no easy way to know which partition that the broker picks. However, you can explicitly specify the partition # in the request itself. Thanks, Jun On Mon, Dec 10, 2012 at 7:26 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Is it possible to log/see which broker, and perhaps partition, a producer sent a message to? I'm using the SyncProducer if that matters.
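To act on Jun's suggestion above, the trace logging can be switched on in the producer's log4j configuration. A sketch follows; the logger name matches the 0.7-era class layout (`kafka.producer.async.DefaultEventHandler`) and should be verified against the class actually in use in your version:

```
# Enable trace output from the default event handler to see which broker each request goes to
log4j.logger.kafka.producer.async.DefaultEventHandler=TRACE
```

Expect this to be verbose, so it is best scoped to that one logger rather than raising the root level.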