RE: Logs and distributed systems
Thanks for sharing. Best regards, Jerry

-----Original Message-----
From: Jay Kreps <jay.kr...@gmail.com>
To: users@kafka.apache.org; d...@samza.incubator.apache.org
Sent: 2013-12-17 (Tuesday) 06:00:17
Subject: Logs and distributed systems

For anyone that's interested, I did a blog post on logs, stream processing, and distributed systems:
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Kafka producer behavior
Hi All,

We are running a Kafka cluster of 2 nodes (using the 0.8.0 final release), with replication factor 2 and 2 partitions. I have created a topic test-topic1 in Kafka. When I list the status of that topic using bin/kafka-list-topic.sh, the output is:

topic: test-topic1  partition: 0  leader: 0  replicas: 0,1  isr: 0,1
topic: test-topic1  partition: 1  leader: 1  replicas: 1,0  isr: 1,0

Since the two partitions have their leaders on separate nodes, produced data should go to both nodes. But when I insert data, it goes to only one node: if I insert 1000 messages, all 1000 end up on either node1 or node2, so data is not evenly distributed across the two nodes. Expected: 500 messages go to node1 and 500 to node2. Any suggestion why I am seeing this behavior?

Thanks & Regards,
Hanish Bansal
Consumer Group Rebalance Issues
Hey all, I've recently been having problems with consumer groups rebalancing. I'm using several high-level consumers which all belong to the same group. Occasionally one or two of them will get stuck in a rebalance loop: they attempt to rebalance, but the partitions they try to claim are already owned. Has anyone run into this? Ideas?

I see errors in my ZooKeeper logs like:

2013-12-17 17:12:31,171 [myid:001] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a zxid:0x501390d4b txntype:-1 reqpath:n/a Error Path:/kafka/consumers/trackingGroup/owners/Events2/25 Error:KeeperErrorCode = NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25

And errors in my Kafka logs like:

2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin rebalancing consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8
2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping leader finder thread
2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping all fetchers
2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] All connections stopped
2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all relevant queues for this fetcher
2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the data chunks in all the consumer message iterators
2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing all offsets after clearing the fetcher queues
2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing partition ownership
2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing the following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127) for topic Events2 with consumers: List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0, trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1, trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0, trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1, trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0, trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1,
trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0, trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1, trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0, trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-1, trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-0, trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-1, trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-0, trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-1, trackingGroup_prod-storm-sup-trk015-1387249529709-d655ccd4-0, trackingGroup_prod-storm-sup-trk015-1387249529709-d655ccd4-1, trackingGroup_prod-storm-sup-trk016-1387249531064-bc8f8f3e-0, trackingGroup_prod-storm-sup-trk016-1387249531064-bc8f8f3e-1, trackingGroup_prod-storm-sup-trk017-1387249530635-35f505b7-0,
Re: Kafka producer behavior
Hello,

This is a known issue, tracked in this JIRA: https://issues.apache.org/jira/browse/KAFKA-1067

Guozhang

On Tue, Dec 17, 2013 at 8:48 AM, Gerrit Jansen van Vuuren gerrit...@gmail.com wrote: Hi, I've had the same issue with the Kafka producer. You need to use a different partitioner than the default one provided with Kafka. I've created a round-robin partitioner that works well for distributing data equally across partitions: https://github.com/gerritjvv/pseidon/blob/master/pseidon-kafka/java/pseidon/kafka/util/RoundRobinPartitioner.java

On Tue, Dec 17, 2013 at 5:32 PM, Hanish Bansal hanish.bansal.agar...@gmail.com wrote: Hi All, We are running a Kafka cluster of 2 nodes (using the 0.8.0 final release), with replication factor 2 and 2 partitions. I have created a topic test-topic1 in Kafka. When I list the status of that topic using bin/kafka-list-topic.sh, the output is: topic: test-topic1 partition: 0 leader: 0 replicas: 0,1 isr: 0,1; topic: test-topic1 partition: 1 leader: 1 replicas: 1,0 isr: 1,0. Since the two partitions have their leaders on separate nodes, produced data should go to both nodes. But when I insert data, it goes to only one node: if I insert 1000 messages, all 1000 end up on either node1 or node2, so data is not evenly distributed. Expected: 500 messages go to node1 and 500 to node2. Any suggestion why I am seeing this behavior? -- Thanks & Regards, Hanish Bansal

--
Guozhang
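For context on the JIRA above: in the 0.8 producer, messages sent without a key are routed to a single randomly chosen partition that is only re-picked on the next metadata refresh (topic.metadata.refresh.interval.ms, roughly ten minutes by default), which is why short runs land entirely on one broker. A custom partitioner along the lines of the one Gerrit links looks roughly like the sketch below. It is illustrative rather than a copy of that class, it assumes the non-generic Partitioner shape (the trait's signature shifted slightly between 0.8.0 and 0.8.1), and it only takes effect for keyed messages, since the 0.8 producer bypasses the partitioner when the key is null.

import java.util.concurrent.atomic.AtomicInteger;

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Illustrative round-robin partitioner for the 0.8 producer API.
public class RoundRobinPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(0);

    // The producer instantiates the partitioner reflectively and hands it the producer config.
    public RoundRobinPartitioner(VerifiableProperties props) {
    }

    public int partition(Object key, int numPartitions) {
        // Ignore the key and cycle through partitions so load is spread evenly.
        int next = counter.getAndIncrement();
        return (next & Integer.MAX_VALUE) % numPartitions;
    }
}

Registering it is a single producer property, e.g. props.put("partitioner.class", "com.example.RoundRobinPartitioner") alongside the usual metadata.broker.list and serializer.class settings; the package and class name here are placeholders.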
Re: redis versus zookeeper to track consumer offsets
ZK was designed from the start as a clustered, consistent, highly available store for this sort of data, and it works extremely well. Redis wasn't, and I don't know anyone using Redis in production, including me, who doesn't have stories of Redis losing data. I'm sticking with ZK. On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote: I am leaning towards using redis to track consumer offsets etc., but I see how using zookeeper makes sense since it is already part of the kafka infra. One thing which bothers me is, how are you guys keeping track of the load on zookeeper? How do you get an idea when your zookeeper cluster is underprovisioned? Redis is a richer store and could help in other areas where you want to store more than just status information or offsets, and setup- and administration-wise it seems a bit easier to manage. Thoughts?
Re: Connection Timeouts
I'm on Kafka 0.8 final. Both brokers are up. The behavior is: my producer produces messages just fine, then it pauses for a few seconds, then it continues. The brokers are not stopping and starting. The broker logs show that another producer/consumer has a connection error at the same time my producer pauses. The exception at my producer (the one that pauses) indicates that the connection was aborted, which to my understanding usually means that the TCP connection was closed underneath it because the broker end stopped responding. It's as if the socket error indicated in the logs below for 10.236.67.30 (the other producer/consumer) causes the broker to hang long enough for my producer's TCP connection to time out. Thanks

---
What version of kafka are you on? It seems like your producers are not seeing your broker(s). Can you confirm brokers are up?

On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote: Hi All, I have a situation where one producer/consumer is causing timeout errors on the Kafka broker. The exception in the logs looks like this:

[2013-12-16 17:32:25,992] ERROR Closing socket for /10.236.67.30 because of error (kafka.network.Processor) java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202) at sun.nio.ch.IOUtil.read(IOUtil.java:175) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243) at kafka.utils.Utils$.read(Utils.scala:395) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Processor.read(SocketServer.scala:347) at kafka.network.Processor.run(SocketServer.scala:245) at java.lang.Thread.run(Thread.java:662)

When this happens, _another separate_ producer hangs for about 2-5 seconds. In the producer log I get this exception:

[2013-12-16 16:32:14,961] INFO Disconnecting from qa-hermes004.phx.qa.com:9092 (kafka.producer.SyncProducer)
[2013-12-16 16:32:14,982] WARN Failed to send producer request with correlation id 3290 to broker 1 with data for partitions [ktr3,4] (kafka.producer.async.DefaultEventHandler) java.io.IOException: An established connection was aborted by the software in your host machine.
at sun.nio.ch.SocketDispatcher.writev0(Native Method) at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:49) at sun.nio.ch.IOUtil.write(IOUtil.java:171) at sun.nio.ch.SocketChannelImpl.write0(SocketChannelImpl.java:377) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:400) at java.nio.channels.SocketChannel.write(SocketChannel.java:371) at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) at kafka.network.Send$class.writeCompletely(Transmission.scala:75) at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) at kafka.network.BlockingChannel.send(BlockingChannel.scala:92) at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72) at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71) at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102) at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102) at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101) at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101) at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.producer.SyncProducer.send(SyncProducer.scala:100) at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:245) at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:107) at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:101) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) at scala.collection.Iterator$class.foreach(Iterator.scala:631) at
RE: a consumer question
Sorry, a typo; let me correct my question. When consumer.timeout.ms is set to 0, if there is no message available, hasNext() throws a timeout exception; otherwise it returns true. Is that the right behavior?

Regards, Libo

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Tuesday, December 17, 2013 12:40 AM
To: users@kafka.apache.org
Subject: Re: a consumer question

If there is a message, hasNext() returns true, not throwing an exception. Thanks, Jun

On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote: Hi folks, for this parameter, if consumer.timeout.ms is set to 0, whenever I call ConsumerIterator's hasNext(), if there is a message available, a timeout exception will be thrown. Is my understanding correct? Thanks. The documented behavior is: consumer.timeout.ms (default -1): throw a timeout exception to the consumer if no message is available for consumption after the specified interval. Regards, Libo
Re: Migrating a cluster from 0.8.0 to 0.8.1
There are no compatibility issues. You can roll upgrades through the cluster one node at a time. Thanks, Neha

On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya d...@gradientx.com wrote: So I'm going to be going through the process of upgrading a cluster from 0.8.0 to trunk (0.8.1). I'm going to be expanding this cluster several times, and the problems with reassigning partitions in 0.8.0 mean I have to move to trunk (0.8.1) ASAP. Will it be safe to roll upgrades through the cluster one by one? Also, are there any client compatibility issues I need to worry about? Am I going to need to pause/upgrade all my consumers/producers at once, or can I roll upgrades through the cluster and then upgrade my clients one by one? Thanks in advance!
Re: redis versus zookeeper to track consumer offsets
Interesting, wasn't aware of that. Can you comment on how you go about monitoring your ZK cluster in terms of throughput and if it is reaching its limits? Or is it even possible to do this? On Tue, Dec 17, 2013 at 2:01 PM, Benjamin Black b...@b3k.us wrote: ZK was designed from the start as a clustered, consistent, highly available store for this sort of data and it works extremely well. Redis wasn't and I don't know anyone using Redis in production, including me, who doesn't have stories of Redis losing data. I'm sticking with ZK. On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote: I am leaning towards using redis to track consumer offsets etc., but I see how using zookeeper makes sense since it already part of the kafka infra. One thing which bothers me is, how are you guys keeping track of the load on zookeeper? How do you get an idea when your zookeeper cluster is underprovisioned? Redis is a richer store and could help in other areas where you want to store more than just status information or offsets, and setup and administration wise it seems a bit easier to manage. Thoughts?
Kafka 0.8.0 server-stop.sh does not stop broker
Hi, We have been trying out Kafka 0.8.0 beta1 for a while and recently attempted to upgrade to 0.8.0, but noticed that the stop-server script doesn't seem to stop the broker anymore. I noticed here [1] that a commit was made before the release to change the signal sent to stop the broker from SIGTERM to SIGINT. Changing this script back to using SIGTERM seems to fix the issue for me. Has anyone else noticed this? Is there already a bug filed, or should I log one?

[1] - https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0

-Bryan
Re: Connection Timeouts
It appears that consumers that do not get messages regularly are timing out every 30 seconds. This interval coincides with the default setting for socket.timeout.ms at the consumer. When the timeout happens it looks like the broker socket hangs for a few seconds, causing all other connected consumers and producers to hang along with it. Producers recover by reconnecting, consumers do the same. The exception in the consumer log (every 30 seconds) is below. Am I misreading something? What is supposed to happen with consumers that don't regularly consume messages? Can a consumer timeout affect the broker socket such that it causes it to hang long enough for other connected producers to abort their connections? [2013-12-17 00:00:25,107] INFO Reconnect due to socket error: (kafka.consumer.SimpleConsumer) java.net.SocketTimeoutException at java.io.InterruptedIOException.init(InterruptedIOException.java:43) at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:184) at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:98) at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:232) at kafka.utils.Utils$.read(Utils.scala:374) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100) at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73) at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$sendRequest(SimpleConsumer.scala:71) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) --- When you say it pauses, do you mean producing and consuming? Can you get metrics form before that is happening, during and after? Could be gc pauses ... are you using this http://kafka.apache.org/documentation.html#java or defaults? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Tue, Dec 17, 2013 at 2:01 PM, Tom Amon ta46...@gmail.com wrote: I'm on Kafka 0.8 final. Both brokers are up. The behavior is my producer produces messages just fine, then it pauses for a few seconds. Then it continues. The brokers are not stopping and starting. The broker logs show that another producer/consumer has a connection error at the same time my producer pauses. 
The exception at my producer (that pauses) indicates that the connection was aborted, which to my understanding usually indicates that the TCP connection was closed underneath it due to the broker end not responding. It's as if the socket error indicated in the logs below for 10.236.67.30 (the other producer/consumer) causes the broker to hang long enough for the TCP connection for my producer to timeout. Thanks -- --- What version of kafka are you on? It seems like your producers are not seeing your broker(s). can you confirm brokers are up? On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote: Hi All, I have a situation where one producer/consumer is causing timeout errors on the Kafka broker. The exception in the logs looks like this: [2013-12-16 17:32:25,992] ERROR Closing socket for /10.236.67.30 because of error (kafka.network.Processor) java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202) at sun.nio.ch.IOUtil.read(IOUtil.java:175) at
Re: ClientUtils.fetchTopicMetadata reports smaller ISR than ZkUtils.getLeaderIsrAndEpochForPartition
Sorry it's taken so long to reply; the issue went away after I reassigned partitions. Now it's back. I haven't checked JMX, because the brokers and ZooKeeper have been reporting the same ISR for several hours. Some more details: the cluster/topic has 5 brokers (1, 4, 5, 7, 8), 15 partitions (0...14), and 2 replicas. A single broker, 4, is the one missing from the ISR in every case. For partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For partitions where 4 is not the leader (4, 8, 12), it is not present in the ISR. Here's the output of my tool, showing assignment and ISR: https://gist.github.com/also/8012383#file-from-brokers-txt I haven't seen anything interesting in the logs, but I'm not entirely sure what to look for. The cluster is currently in this state, and if it goes like last time, this will persist until I reassign partitions. What can I do in the meantime to track down the issue? Thanks, Ryan

On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao jun...@gmail.com wrote: Do you see any ISR churns on the brokers? You can check the ISR expand/shrink rate JMX metrics. Thanks, Jun

On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen rberd...@hubspot.com wrote: I'm working on some monitoring tools for Kafka, and I've seen a couple of clusters get into a state where ClientUtils.fetchTopicMetadata will show that not all replicas are in the ISR. At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that all replicas of all partitions are in the ISR, and the kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager MBean will report 0. What's going on? Is there something wrong with my controller, or should I not be paying attention to ClientUtils.fetchTopicMetadata? Thanks, Ryan
Re: redis versus zookeeper to track consumer offsets
You can do this indirectly by monitoring the avg/max latency of operations on zookeeper. There is no direct way of measuring the requests/sec to zookeeper since they don't expose the relevant jmx metrics. Thanks, Neha On Tue, Dec 17, 2013 at 11:13 AM, S Ahmed sahmed1...@gmail.com wrote: Interesting, wasn't aware of that. Can you comment on how you go about monitoring your ZK cluster in terms of throughput and if it is reaching its limits? Or is it even possible to do this? On Tue, Dec 17, 2013 at 2:01 PM, Benjamin Black b...@b3k.us wrote: ZK was designed from the start as a clustered, consistent, highly available store for this sort of data and it works extremely well. Redis wasn't and I don't know anyone using Redis in production, including me, who doesn't have stories of Redis losing data. I'm sticking with ZK. On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote: I am leaning towards using redis to track consumer offsets etc., but I see how using zookeeper makes sense since it already part of the kafka infra. One thing which bothers me is, how are you guys keeping track of the load on zookeeper? How do you get an idea when your zookeeper cluster is underprovisioned? Redis is a richer store and could help in other areas where you want to store more than just status information or offsets, and setup and administration wise it seems a bit easier to manage. Thoughts?
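A quick way to see those latencies without wiring up JMX: each ZooKeeper server answers the four-letter-word commands (stat, and mntr on 3.4+) on its client port, and the reply includes min/avg/max request latency and the outstanding request count. A minimal probe, with the host and port as placeholders, could look roughly like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

// Sends ZooKeeper's "stat" four-letter command and prints the reply, which
// includes a "Latency min/avg/max" line and the outstanding request count.
public class ZkStatProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // placeholder ZK host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        Socket socket = new Socket(host, port);
        try {
            OutputStream out = socket.getOutputStream();
            out.write("stat".getBytes("US-ASCII"));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "US-ASCII"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }
}

Polling this (or mntr, whose key/value output is easier to parse) and graphing the latency and outstanding-request numbers over time is a simple way to spot an underprovisioned ensemble before it starts expiring sessions.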
Re: FW: Unable to start consumers in Tomcat
That error comes from calling createMessageStreams twice, or from the container you are running in causing it to be called twice (https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L133). Don't do that and you won't get the error.

On Mon, Dec 16, 2013 at 2:25 PM, Seshadri, Balaji balaji.sesha...@dish.com wrote: Any update on this, guys?

-----Original Message-----
From: Seshadri, Balaji
Sent: Saturday, December 14, 2013 4:22 PM
To: users@kafka.apache.org
Subject: RE: Unable to start consumers in Tomcat

We are running one Scala consumer and one Java consumer that listen on the same topic with different group ids. The issue is that if I start both, the second one fails with this error; if we run just one, it works fine. Attaching the Scala code and Java code where we call createMessageStreams. It seems like an issue with the API not recognizing that these are different groups of consumers being started, because createMessageStreams does not accept a group.id but should refer to the already-passed group.id to make sure the second one is a new stream and not related, even though we listen on the same topic. Please let me know if it is an API issue or the way we handle the API.

-----Original Message-----
From: Joe Stein [mailto:joe.st...@stealth.ly]
Sent: Friday, December 13, 2013 4:36 PM
To: users@kafka.apache.org
Subject: Re: Unable to start consumers in Tomcat

You do this at the consumer connector level, not at the message stream level. So for one: propsA.put("group.id", "groupA"), then val configA = new ConsumerConfig(propsA) and val one = Consumer.create(configA). And for another: propsB.put("group.id", "groupB"), then val configB = new ConsumerConfig(propsB) and val two = Consumer.create(configB). You then manage each consumer connector created by each of those .create calls in separate threads (or actors or processes, or however you are doing concurrency), and you only call createMessageStreams once on each of them. Makes sense?

Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>

On Fri, Dec 13, 2013 at 6:29 PM, Seshadri, Balaji balaji.sesha...@dish.com wrote: Can't we create message streams for the same topic but different consumer groups?

-----Original Message-----
From: Joe Stein [mailto:joe.st...@stealth.ly]
Sent: Friday, December 13, 2013 4:23 PM
To: users@kafka.apache.org
Subject: Re: Unable to start consumers in Tomcat

You should check your code to verify that it is only called once per instantiated consumer connector. Here is where the exception is thrown: https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L133

Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>

On Fri, Dec 13, 2013 at 6:17 PM, Seshadri, Balaji balaji.sesha...@dish.com wrote: We needed an HTTP interface to start our consumers using a REST interface for management; that's why we chose Tomcat to run our consumers. We create streams only once, when we initially start the consumer.

-----Original Message-----
From: Joe Stein [mailto:joe.st...@stealth.ly]
Sent: Friday, December 13, 2013 4:12 PM
To: users@kafka.apache.org
Subject: Re: Unable to start consumers in Tomcat

That error comes if you are trying to create a message stream on an instantiated consumer object more than once. Why are you using Tomcat for the consumers? Is it to see the results of messages?
if so you need to isolate the Consumer in some way so there is a singleton (assuming one partition or if more than one partition some type of consumer pool) and the messages being consumed write to a thread safe memory structure and then the UI can read from that thread safe memory structure. /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Fri, Dec 13, 2013 at 5:42 PM, Seshadri, Balaji balaji.sesha...@dish.comwrote: 0.8 -Original Message- From: Neha Narkhede [mailto:neha.narkh...@gmail.com] Sent: Friday, December 13, 2013 3:33 PM To: users@kafka.apache.org Subject: Re: Unable to start consumers in Tomcat Which version of kafka are you using? On Fri, Dec 13, 2013 at 2:29 PM, Seshadri, Balaji balaji.sesha...@dish.comwrote: Any idea on this error guys ?. -Original
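Putting Joe's suggestion into Java terms, a sketch of two independent connectors, one per consumer group, each calling createMessageStreams exactly once, might look like the following; the ZooKeeper address, group names, and topic are placeholders.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// Two independent connectors, one per group, each calling createMessageStreams once.
public class TwoGroupConsumers {
    private static ConsumerConnector connect(String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder ZK ensemble
        props.put("group.id", groupId);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        ConsumerConnector groupA = connect("groupA");
        ConsumerConnector groupB = connect("groupB");

        // One createMessageStreams call per connector; calling it a second time on
        // the same connector raises the error discussed above.
        Map<String, List<KafkaStream<byte[], byte[]>>> streamsA =
                groupA.createMessageStreams(Collections.singletonMap("myTopic", 1));
        Map<String, List<KafkaStream<byte[], byte[]>>> streamsB =
                groupB.createMessageStreams(Collections.singletonMap("myTopic", 1));

        // Hand streamsA and streamsB to separate worker threads; each group
        // independently receives the full topic.
    }
}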
Re: a consumer question
If there are no more messages, hasNext() will return false instead of throwing an exception. Guozhang

On Tue, Dec 17, 2013 at 11:53 AM, Yu, Libo libo...@citi.com wrote: Sorry, a typo. Correct my question. When consumer.timeout.ms is set to 0, if there is no message available, hasNext() throws a timeout exception, otherwise it returns true. Is that the right behavior? Regards, Libo -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Tuesday, December 17, 2013 12:40 AM To: users@kafka.apache.org Subject: Re: a consumer question If there is a message, hasNext() returns true, not throwing an exception. Thanks, Jun On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote: Hi folks, For this parameter, if consumer.timeout.ms is set to 0, whenever I call ConsumerIterator's hasNext(), if there is a message available, a timeout exception will be thrown. Is my understanding correct? Thanks. consumer.timeout.ms -1 Throw a timeout exception to the consumer if no message is available for consumption after the specified interval Regards, Libo

--
Guozhang
Re: Killing broker leader
It is worth mentioning that you can reduce the likelihood of losing messages by running a controlled shutdown before killing the broker: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Whatiscontrolledshutdown? The connection-refused error is a bit surprising, though. The consumer should normally react by switching to the new leader. Are you getting this error from the killed broker or from the new leader?

On Tue, Dec 17, 2013 at 7:54 PM, Guozhang Wang wangg...@gmail.com wrote: Hello Francois, What is the producer ack value in your console producer? If it is equal to 1, then when a leader is down it is possible to lose data, and hence it is never consumed by the consumer. Guozhang

On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier francois.langel...@mate1inc.com wrote: Hi, I installed ZooKeeper and Kafka 0.8.0 following the quick start (https://kafka.apache.org/documentation.html#quickstart), and when I try to kill my leader, I get a lot of exceptions in my producer and consumer consoles. Then, after the exceptions stop printing, some of the messages I produce in my console don't print in my consumer console... The exception I get is java.net.ConnectException: Connection refused. Has anyone had this problem before? Thank you! PS: I have 3 brokers running on my system.

--
Guozhang
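On Guozhang's point about acks: the 0.8 producer's request.required.acks setting controls how many acknowledgements a send waits for (0 = none, 1 = leader only, -1 = all in-sync replicas), and raising it narrows the window for losing messages when a leader is killed. A sketch of a producer configured this way follows; the broker list and topic are placeholders, and the values are illustrative rather than a recommendation.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// A 0.8 producer that waits for all in-sync replicas to acknowledge each send.
public class DurableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092"); // placeholders
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // -1 waits for all in-sync replicas; 1 waits only for the leader; 0 waits for nothing.
        props.put("request.required.acks", "-1");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
        producer.close();
    }
}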
regarding run-simulator.sh
I am not able to find run-simulator.sh in 0.8, even after building perf. If this tool has been deprecated, what other alternatives are available now for perf testing?

Regards,
Pushkar
Re: Consumer Group Rebalance Issues
Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog ? Thanks, Jun On Tue, Dec 17, 2013 at 9:24 AM, Drew Goya d...@gradientx.com wrote: Hey all, I've recently been having problems with consumer groups rebalancing. I'm using several high level consumers which all belong to the same group. Occasionally one or two of them will get stuck in a rebalance loop. They attempt to rebalance, but the partitions they try to claim are owned. Anyone run into this? Ideas? I see errors in my zookeeper logs like: 2013-12-17 17:12:31,171 [myid:001] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a zxid:0x501390d4b txntype:-1 reqpath:n/a Error Path:/kafka/consumers/trackingGroup/owners/Events2/25 Error:KeeperErrorCode = NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25 And errors in my kafka logs like: 2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin rebalancing consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping leader finder thread 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping all fetchers 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] All connections stopped 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all relevant queues for this fetcher 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the data chunks in all the consumer message iterators 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing all offsets after clearing the fetcher queues 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing partition ownership 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing the following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127) for topic Events2 with consumers: List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0, trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1, trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0, trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1, trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0, 
trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1, trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0, trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1, trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0, trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-1, trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-0, trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-1, trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-0, trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-1,
Re: Kafka 0.8.0 server-stop.sh does not stop broker
Which OS are you on? Thanks, Jun On Tue, Dec 17, 2013 at 11:15 AM, Bryan Baugher bjb...@gmail.com wrote: Hi, We have been trying out the kafka 0.8.0 beta1 for awhile and recently attempted to upgrade to 0.8.0 but noticed that the stop server script doesn't seem to stop the broker anymore. I noticed here[1] that a commit was made before the release to change the signal sent to stop the broker from SIGTERM to SIGINT. Changing this script back to using SIGTERM seems to fix the issue for me. Has anyone else noticed this, is there a bug or should I log one? [1] - https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0 -Bryan
Re: Connection Timeouts
Did you change fetch.wait.max.ms in the consumer config? If so, did you make sure that it is smaller than socket.timeout.ms? Also, if you look at the request log, how long does it take to complete the timed out fetch request? Thanks, Jun On Tue, Dec 17, 2013 at 2:30 PM, Tom Amon ta46...@gmail.com wrote: It appears that consumers that do not get messages regularly are timing out every 30 seconds. This interval coincides with the default setting for socket.timeout.ms at the consumer. When the timeout happens it looks like the broker socket hangs for a few seconds, causing all other connected consumers and producers to hang along with it. Producers recover by reconnecting, consumers do the same. The exception in the consumer log (every 30 seconds) is below. Am I misreading something? What is supposed to happen with consumers that don't regularly consume messages? Can a consumer timeout affect the broker socket such that it causes it to hang long enough for other connected producers to abort their connections? [2013-12-17 00:00:25,107] INFO Reconnect due to socket error: (kafka.consumer.SimpleConsumer) java.net.SocketTimeoutException at java.io.InterruptedIOException.init(InterruptedIOException.java:43) at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:184) at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:98) at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:232) at kafka.utils.Utils$.read(Utils.scala:374) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100) at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73) at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$sendRequest(SimpleConsumer.scala:71) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) --- When you say it pauses, do you mean producing and consuming? Can you get metrics form before that is happening, during and after? Could be gc pauses ... are you using this http://kafka.apache.org/documentation.html#java or defaults? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Tue, Dec 17, 2013 at 2:01 PM, Tom Amon ta46...@gmail.com wrote: I'm on Kafka 0.8 final. Both brokers are up. The behavior is my producer produces messages just fine, then it pauses for a few seconds. 
Then it continues. The brokers are not stopping and starting. The broker logs show that another producer/consumer has a connection error at the same time my producer pauses. The exception at my producer (that pauses) indicates that the connection was aborted, which to my understanding usually indicates that the TCP connection was closed underneath it due to the broker end not responding. It's as if the socket error indicated in the logs below for 10.236.67.30 (the other producer/consumer) causes the broker to hang long enough for the TCP connection for my producer to timeout. Thanks -- --- What version of kafka are you on? It seems like your producers are not seeing your broker(s). can you confirm brokers are up? On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote: Hi All, I have a situation where one producer/consumer is causing timeout errors on the Kafka broker. The exception in the logs looks like this: [2013-12-16 17:32:25,992] ERROR Closing socket for
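For reference, the constraint Jun is describing is that fetch.wait.max.ms, the maximum time the broker may hold an idle fetch open waiting for data, has to stay below the consumer's socket.timeout.ms; otherwise a consumer with nothing to read times out its own socket, much like the log above. A sketch of a high-level consumer config respecting that, with placeholder names and illustrative values:

import java.util.Properties;

import kafka.consumer.ConsumerConfig;

// Keep fetch.wait.max.ms below socket.timeout.ms so an idle consumer's
// long-poll fetch returns before its socket times out.
public class IdleConsumerConfig {
    public static ConsumerConfig build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder ensemble
        props.put("group.id", "myGroup");              // placeholder group
        props.put("socket.timeout.ms", "30000");       // the 0.8 default is 30 seconds
        props.put("fetch.wait.max.ms", "10000");       // well below socket.timeout.ms
        return new ConsumerConfig(props);
    }
}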
Re: a consumer question
Actually, hasNext() only returns false when the consumer connector is shutdown. Typically, you either set consumer.timeout.ms to -1 or a value larger than 0. If it's set to 0, my guess is that it throws a timeout exception immediately if there is no more message. Thanks, Jun On Tue, Dec 17, 2013 at 4:57 PM, Guozhang Wang wangg...@gmail.com wrote: If there is no more messages, hasNext will return false instead of throwing an exception. Guozhang On Tue, Dec 17, 2013 at 11:53 AM, Yu, Libo libo...@citi.com wrote: Sorry, a typo. Correct my question. When consumer.timeout.ms is set to 0, if there is no message available, hasNext() throws a timeout exception, otherwise it returns true. Is that the right behavior? Regards, Libo -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Tuesday, December 17, 2013 12:40 AM To: users@kafka.apache.org Subject: Re: a consumer question If there is a message, hasNext() returns true, not throwing an exception. Thanks, Jun On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote: Hi folks, For this parameters, if consumer.timeout.ms is set to 0, whenever I call ConsumerIterator's hasNext(), if there is a message available, a timeout exception will be thrown. Is my understanding correct? Thanks. consumer.timeout.ms -1 Throw a timeout exception to the consumer if no message is available for consumption after the specified interval Regards, Libo -- -- Guozhang
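In code, the timeout surfaces as kafka.consumer.ConsumerTimeoutException thrown from hasNext(), so a consumer that sets consumer.timeout.ms needs to catch it rather than rely on hasNext() returning false. A sketch, with connection settings and topic as placeholders:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// With consumer.timeout.ms >= 0, hasNext() throws ConsumerTimeoutException
// when no message arrives within the timeout.
public class TimeoutAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "myGroup");              // placeholder
        props.put("consumer.timeout.ms", "0");         // fail fast when nothing is buffered

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("myTopic", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("myTopic").get(0).iterator();

        try {
            while (it.hasNext()) {
                byte[] payload = it.next().message();
                System.out.println("got " + payload.length + " bytes");
            }
        } catch (ConsumerTimeoutException e) {
            // No message arrived within consumer.timeout.ms; do other work or retry.
        } finally {
            connector.shutdown();
        }
    }
}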
Re: ClientUtils.fetchTopicMetadata reports smaller ISR than ZkUtils.getLeaderIsrAndEpochForPartition
If a broker never joins an ISR, it could be that the fetcher died unexpectedly. Did you see any Error due to in the log of broker 4? Another thing to check is the max lag and the per partition lag in jmx. Thanks, Jun On Tue, Dec 17, 2013 at 4:09 PM, Ryan Berdeen rberd...@hubspot.com wrote: Sorry it's taken so long to reply, the issue went away after I reassigned partitions. Now it's back. I haven't checked JMX, because the brokers and zookeeper have been reporting the same ISR for several hours. Some more details: The cluster/topic has 5 brokers (1, 4, 5, 7, 8) 15 partitions (0...14) 2 replicas A single broker, 4, is the one missing from the ISR in every case. For partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For partitions where 4 is not the leader (4, 8, 12), it is not present in the ISR. Here's the output of my tool, showing assignment and ISR: https://gist.github.com/also/8012383#file-from-brokers-txt I haven't seen anything interesting in the logs, but I'm not entirely sure what to look for. The cluster is currently in this state, and if it goes like last time, this will persist until I reassign partitions. What can I do in the meantime to track down the issue? Thanks, Ryan On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao jun...@gmail.com wrote: Do you see any ISR churns on the brokers? You can check the ISR expand/shrink rate jmx. Thanks, Jun On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen rberd...@hubspot.com wrote: I'm working on some monitoring tools for Kafka, and I've seen a couple of clusters get into a state where ClientUtils.fetchTopicMetadata will show that not all replicas are in the ISR. At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that all all partitions are in the ISR, and the kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager MBean will report 0. What's going on? Is there something wrong with my controller, or should I not be paying attention to ClientUtils.fetchTopicMetadata? Thanks, Ryan
Re: Killing broker leader
What's the replication factor of the topic? Is it larger than 1? You can find out using the list topic command. Thanks, Jun On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier francois.langel...@mate1inc.com wrote: Hi, I installed zookeeper and kafka 8.0 following the quick start ( https://kafka.apache.org/documentation.html#quickstart) and when i try to kill my leader, i got a lot of exception in my producer and consumer consoles. Then, after the exceptions stop printing, some of the messages I produce in my console don't print in my consumer console... The exception I get is java.net.ConnectException : Connection refused. Did someone already had this problem? Thank you! PS: I have 3 brokers running on my system.
Re: regarding run-simulator.sh
You can run kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh. Thanks, Jun On Tue, Dec 17, 2013 at 8:44 PM, pushkar priyadarshi priyadarshi.push...@gmail.com wrote: i am not able to find run-simulator.sh in 0.8 even after building perf.if this tool has been deprecated what are other alternatives available now for perf testing? Regards, Pushkar
Re: Kafka 0.8.0 server-stop.sh does not stop broker
RHEL 6.4 64bit Java 6u35 On Tue, Dec 17, 2013 at 10:57 PM, Jun Rao jun...@gmail.com wrote: Which OS are you on? Thanks, Jun On Tue, Dec 17, 2013 at 11:15 AM, Bryan Baugher bjb...@gmail.com wrote: Hi, We have been trying out the kafka 0.8.0 beta1 for awhile and recently attempted to upgrade to 0.8.0 but noticed that the stop server script doesn't seem to stop the broker anymore. I noticed here[1] that a commit was made before the release to change the signal sent to stop the broker from SIGTERM to SIGINT. Changing this script back to using SIGTERM seems to fix the issue for me. Has anyone else noticed this, is there a bug or should I log one? [1] - https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0 -Bryan -- -Bryan
Re: Consumer Group Rebalance Issues
I explored that possibility but I'm not seeing any ZK session expirations in the logs and it doesn't look like these rebalances complete. They fail due to conflicts in the zookeeper data On Tue, Dec 17, 2013 at 8:53 PM, Jun Rao jun...@gmail.com wrote: Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog ? Thanks, Jun On Tue, Dec 17, 2013 at 9:24 AM, Drew Goya d...@gradientx.com wrote: Hey all, I've recently been having problems with consumer groups rebalancing. I'm using several high level consumers which all belong to the same group. Occasionally one or two of them will get stuck in a rebalance loop. They attempt to rebalance, but the partitions they try to claim are owned. Anyone run into this? Ideas? I see errors in my zookeeper logs like: 2013-12-17 17:12:31,171 [myid:001] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a zxid:0x501390d4b txntype:-1 reqpath:n/a Error Path:/kafka/consumers/trackingGroup/owners/Events2/25 Error:KeeperErrorCode = NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25 And errors in my kafka logs like: 2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin rebalancing consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping leader finder thread 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] Stopping all fetchers 2013-12-17 17:20:33 ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1387249530381] All connections stopped 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all relevant queues for this fetcher 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the data chunks in all the consumer message iterators 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing all offsets after clearing the fetcher queues 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing partition ownership 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO] [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing the following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127) for topic Events2 with consumers: List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0, trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0, trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1, 
trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0, trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0, trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1, trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0, trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0, trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1, trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0, trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0, trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0, trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0, trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0, trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0, trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1, trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0,
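When rebalances fail repeatedly because owner nodes from the previous consumer generation are still present, the knobs that usually matter on the 0.8 high-level consumer are rebalance.max.retries and rebalance.backoff.ms; a commonly suggested guideline is to make retries times backoff comfortably larger than zookeeper.session.timeout.ms, so that a departed consumer's ephemeral owner znodes can expire before the retries run out. A sketch, with placeholder names and illustrative values only:

import java.util.Properties;

import kafka.consumer.ConsumerConfig;

// Give rebalances more time to succeed when owner znodes from a previous
// generation are slow to disappear.
public class RebalanceTuning {
    public static ConsumerConfig build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181");     // placeholder ensemble
        props.put("group.id", "trackingGroup");
        props.put("zookeeper.session.timeout.ms", "6000"); // the 0.8 default
        props.put("rebalance.max.retries", "10");          // default is 4
        props.put("rebalance.backoff.ms", "2000");         // retries * backoff > session timeout
        return new ConsumerConfig(props);
    }
}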
Re: regarding run-simulator.sh
thanks Jun. On Wed, Dec 18, 2013 at 10:47 AM, Jun Rao jun...@gmail.com wrote: You can run kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh. Thanks, Jun On Tue, Dec 17, 2013 at 8:44 PM, pushkar priyadarshi priyadarshi.push...@gmail.com wrote: i am not able to find run-simulator.sh in 0.8 even after building perf.if this tool has been deprecated what are other alternatives available now for perf testing? Regards, Pushkar