RE: Logs and distributed systems

2013-12-17 Thread 陈小军
Thanks for sharing.

Best Regards
Jerry
-Original Message-
From: Jay Kreps <jay.kr...@gmail.com>
To: users@kafka.apache.org <users@kafka.apache.org>;
<d...@samza.incubator.apache.org>;
Cc: 
Sent: 2013-12-17 (Tuesday) 06:00:17
Subject: Logs and distributed systems

For anyone that's interested, I did a blog post on logs, stream processing,
and distributed systems:
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
 


Kafka producer behavior

2013-12-17 Thread Hanish Bansal
Hi All,

We have a kafka cluster of 2 nodes (using the 0.8.0 final release).
Replication Factor: 2
Number of partitions: 2

I have created a topic test-topic1 in kafka.

When I list the status of that topic using bin/kafka-list-topic.sh, the
status is:

topic: test-topic1  partition: 0  leader: 0  replicas: 0,1  isr: 0,1
topic: test-topic1  partition: 1  leader: 1  replicas: 1,0  isr: 1,0

As the two partitions are on two separate nodes, the data we produce should
go to both nodes.

But when I insert data, it goes to only one node.

For example, if I insert 1000 messages, all 1000 messages go to either
node1 or node2. The data is not evenly distributed across both nodes.

Expected: 500 messages should go to node1 and 500 messages should go to
node2.

Any suggestion on why I am seeing this behavior?

-- 
*Thanks & Regards*
*Hanish Bansal*


Consumer Group Rebalance Issues

2013-12-17 Thread Drew Goya
Hey all,

I've recently been having problems with consumer groups rebalancing.  I'm
using several high level consumers which all belong to the same group.
 Occasionally one or two of them will get stuck in a rebalance loop.  They
attempt to rebalance, but the partitions they try to claim are owned.
 Anyone run into this?  Ideas?

I see errors in my zookeeper logs like:

2013-12-17 17:12:31,171 [myid:001] - INFO  [ProcessThread(sid:1
cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when
processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a
zxid:0x501390d4b txntype:-1 reqpath:n/a Error
Path:/kafka/consumers/trackingGroup/owners/Events2/25 Error:KeeperErrorCode
= NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25

And errors in my kafka logs like:

2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin
rebalancing consumer
trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8

2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
[ConsumerFetcherManager-1387249530381] Stopping leader finder thread

2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
[ConsumerFetcherManager-1387249530381] Stopping all fetchers

2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
[ConsumerFetcherManager-1387249530381] All connections stopped

2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all
relevant queues for this fetcher

2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the
data chunks in all the consumer message iterators

2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing
all offsets after clearing the fetcher queues

2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing
partition ownership

2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
[trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer
trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing the
following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
122, 123, 124, 125, 126, 127) for topic Events2 with consumers:
List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0,
trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1,
trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0,
trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1,
trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0,
trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1,
trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0,
trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1,
trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0,
trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1,
trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0,
trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1,
trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0,
trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1,
trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0,
trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1,
trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0,
trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1,
trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0,
trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1,
trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0,
trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1,
trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0,
trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1,
trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0,
trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-1,
trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-0,
trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-1,
trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-0,
trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-1,
trackingGroup_prod-storm-sup-trk015-1387249529709-d655ccd4-0,
trackingGroup_prod-storm-sup-trk015-1387249529709-d655ccd4-1,
trackingGroup_prod-storm-sup-trk016-1387249531064-bc8f8f3e-0,
trackingGroup_prod-storm-sup-trk016-1387249531064-bc8f8f3e-1,
trackingGroup_prod-storm-sup-trk017-1387249530635-35f505b7-0,

Re: Kafka producer behavior

2013-12-17 Thread Guozhang Wang
Hello,

This is a known issue, tracked in this JIRA:

https://issues.apache.org/jira/browse/KAFKA-1067

Guozhang
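
For reference, the workaround Gerrit describes below plugs a custom
partitioner in through the producer's partitioner.class property. A minimal
sketch, assuming the 0.8.0-style generic Partitioner trait and the
VerifiableProperties constructor the producer expects when it loads the class
(the interface changed in later releases, so treat this as illustrative):

  import java.util.concurrent.atomic.AtomicInteger
  import kafka.producer.Partitioner
  import kafka.utils.VerifiableProperties

  class RoundRobinPartitioner[T](props: VerifiableProperties = null)
      extends Partitioner[T] {
    private val counter = new AtomicInteger(0)

    // Ignore the key and cycle through the partitions so that keyless
    // produce requests spread evenly across the brokers.
    override def partition(key: T, numPartitions: Int): Int =
      math.abs(counter.getAndIncrement() % numPartitions)
  }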


On Tue, Dec 17, 2013 at 8:48 AM, Gerrit Jansen van Vuuren 
gerrit...@gmail.com wrote:

 hi,

 I've had the same issue with the kafka producer.

 you need to use a different partitioner than the default one provided for
 kafka.
 I've created a round robin partitioner that works well for equally
 distributing data across partitions.


 https://github.com/gerritjvv/pseidon/blob/master/pseidon-kafka/java/pseidon/kafka/util/RoundRobinPartitioner.java





 On Tue, Dec 17, 2013 at 5:32 PM, Hanish Bansal 
 hanish.bansal.agar...@gmail.com wrote:

  Hi All,
 
  We are having kafka cluster of 2 nodes. (using 0.8.0 final release)
  Replication Factor: 2
  Number of partitions: 2
 
  I have created a topic test-topic1 in kafka.
 
  When i am listing status of that topic using bin/kafka-list-topic.sh, the
  status is:
 
  topic: test-topic1  partition: 0  leader: 0  replicas: 0,1  isr: 0,1
  topic: test-topic1  partition: 1  leader: 1  replicas: 1,0  isr: 1,0
 
  As both partition are on two separate nodes so when we produce the data
 it
  should be go to both nodes.
 
  But when i insert the data, it is going to only one node.
 
  For example if i insert 1000 messages then all 1000 messages will go
 either
  node1 or node2. Data is not evenly distributed on both nodes.
 
  Expected: 500 messages should go to node1 and 500 messages should go to
  node2.
 
  Any suggestion why i am facing this behavior?
 
  --
  *Thanks & Regards*
  *Hanish Bansal*
 




-- 
-- Guozhang


Re: redis versus zookeeper to track consumer offsets

2013-12-17 Thread Benjamin Black
ZK was designed from the start as a clustered, consistent, highly available
store for this sort of data and it works extremely well. Redis wasn't and I
don't know anyone using Redis in production, including me, who doesn't have
stories of Redis losing data. I'm sticking with ZK.


On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote:

 I am leaning towards using redis to track consumer offsets etc., but I see
 how using zookeeper makes sense since it already part of the kafka infra.

 One thing which bothers me is, how are you guys keeping track of the load
 on zookeeper?  How do you get an idea when your zookeeper cluster is
 underprovisioned?

 Redis is a richer store and could help in other areas where you want to
 store more than just status information or offsets, and setup and
 administration wise it seems a bit easier to manage.

 Thoughts?



Re: Connection Timeouts

2013-12-17 Thread Tom Amon
I'm on Kafka 0.8 final. Both brokers are up. The behavior is that my producer
produces messages just fine, then it pauses for a few seconds, then it
continues. The brokers are not stopping and starting. The broker logs show
that another producer/consumer has a connection error at the same time my
producer pauses. The exception at my producer (the one that pauses) indicates
that the connection was aborted, which in my understanding usually means the
TCP connection was closed underneath it because the broker end stopped
responding. It's as if the socket error shown in the logs below for
10.236.67.30 (the other producer/consumer) causes the broker to hang long
enough for my producer's TCP connection to time out.


Thanks


-

What version of kafka are you on?



It seems like your producers are not seeing your broker(s).  can you
confirm brokers are up?



On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote:



 Hi All,



 I have a situation where one producer/consumer is causing timeout

 errors on the Kafka broker. The exception in the logs looks like this:



 [2013-12-16 17:32:25,992] ERROR Closing socket for /10.236.67.30

 because of error (kafka.network.Processor)

 java.io.IOException: Connection timed out

 at sun.nio.ch.FileDispatcher.read0(Native Method)

 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)

 at sun.nio.ch.IOUtil.read(IOUtil.java:175)

 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

 at kafka.utils.Utils$.read(Utils.scala:395)

 at




kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)

 at kafka.network.Processor.read(SocketServer.scala:347)

 at kafka.network.Processor.run(SocketServer.scala:245)

 at java.lang.Thread.run(Thread.java:662)



 When this happens, _another separate_ producer hangs for about 2-5
seconds.



 In the producer log I get this exception:



 [2013-12-16 16:32:14,961] INFO Disconnecting from

 qa-hermes004.phx.qa.com:9092 (kafka.producer.SyncProducer)

 [2013-12-16 16:32:14,982] WARN Failed to send producer request with

 correlation id 3290 to broker 1 with data for partitions [ktr3,4]

 (kafka.producer.async.DefaultEventHandler)

 java.io.IOException: An established connection was aborted by the

 software in your host machine.

 at sun.nio.ch.SocketDispatcher.writev0(Native Method)

 at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:49)

 at sun.nio.ch.IOUtil.write(IOUtil.java:171)

 at sun.nio.ch.SocketChannelImpl.write0(SocketChannelImpl.java:377)

 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:400)

 at java.nio.channels.SocketChannel.write(SocketChannel.java:371)

 at


kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)

 at kafka.network.Send$class.writeCompletely(Transmission.scala:75)

 at




kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)

 at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)

 at
kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)

 at




kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)

 at




kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)

 at




kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)

 at




kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)

 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)

 at




kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)

 at

 kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)

 at

 kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)

 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)

 at kafka.producer.SyncProducer.send(SyncProducer.scala:100)

 at




kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:245)

 at




kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:107)

 at




kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:101)

 at


scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)

 at


scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)

 at scala.collection.Iterator$class.foreach(Iterator.scala:631)

 at

 

RE: a consumer question

2013-12-17 Thread Yu, Libo
Sorry, a typo; let me correct my question. When consumer.timeout.ms is set
to 0, if there is no message available, hasNext() throws a timeout exception;
otherwise it returns true.
Is that the right behavior?

Regards,

Libo


-Original Message-
From: Jun Rao [mailto:jun...@gmail.com] 
Sent: Tuesday, December 17, 2013 12:40 AM
To: users@kafka.apache.org
Subject: Re: a consumer question

If there is a message, hasNext() returns true, not throwing an exception.

Thanks,

Jun


On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote:

 Hi folks,

 For this parameters, if consumer.timeout.ms is set to 0, whenever I 
 call ConsumerIterator's hasNext(), if there is a message available, a 
 timeout exception will be thrown. Is my understanding correct? Thanks.

 consumer.timeout.ms

 -1

 Throw a timeout exception to the consumer if no message is available 
 for consumption after the specified interval



 Regards,

 Libo




Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-17 Thread Neha Narkhede
There are no compatibility issues. You can roll upgrades through the
cluster one node at a time.

Thanks
Neha


On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya d...@gradientx.com wrote:

 So I'm going to be going through the process of upgrading a cluster from
 0.8.0 to the trunk (0.8.1).

 I'm going to be expanding this cluster several times and the problems with
 reassigning partitions in 0.8.0 mean I have to move to trunk(0.8.1) asap.

 Will it be safe to roll upgrades through the cluster one by one?

 Also are there any client compatibility issues I need to worry about?  Am I
 going to need to pause/upgrade all my consumers/producers at once or can I
 roll upgrades through the cluster and then upgrade my clients one by one?

 Thanks in advance!



Re: redis versus zookeeper to track consumer offsets

2013-12-17 Thread S Ahmed
Interesting, wasn't aware of that.

Can you comment on how you go about monitoring your ZK cluster in terms of
throughput and if it is reaching its limits? Or is it even possible to do
this?


On Tue, Dec 17, 2013 at 2:01 PM, Benjamin Black b...@b3k.us wrote:

 ZK was designed from the start as a clustered, consistent, highly available
 store for this sort of data and it works extremely well. Redis wasn't and I
 don't know anyone using Redis in production, including me, who doesn't have
 stories of Redis losing data. I'm sticking with ZK.


 On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote:

  I am leaning towards using redis to track consumer offsets etc., but I
 see
  how using zookeeper makes sense since it already part of the kafka infra.
 
  One thing which bothers me is, how are you guys keeping track of the load
  on zookeeper?  How do you get an idea when your zookeeper cluster is
  underprovisioned?
 
  Redis is a richer store and could help in other areas where you want to
  store more than just status information or offsets, and setup and
  administration wise it seems a bit easier to manage.
 
  Thoughts?
 



Kafka 0.8.0 server-stop.sh does not stop broker

2013-12-17 Thread Bryan Baugher
Hi,

We have been trying out the kafka 0.8.0 beta1 for a while and recently
attempted to upgrade to 0.8.0, but noticed that the stop server script
doesn't seem to stop the broker anymore. I noticed here[1] that a commit
was made before the release to change the signal sent to stop the broker
from SIGTERM to SIGINT. Changing the script back to SIGTERM fixes the
issue for me. Has anyone else noticed this? Is there a bug filed already,
or should I log one?

[1] -
https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0

-Bryan


Re: Connection Timeouts

2013-12-17 Thread Tom Amon
It appears that consumers that do not get messages regularly are timing out
every 30 seconds. This interval coincides with the default setting for 
socket.timeout.ms at the consumer. When the timeout happens it looks like
the broker socket hangs for a few seconds, causing all other connected
consumers and producers to hang along with it. Producers recover by
reconnecting, consumers do the same. The exception in the consumer log
(every 30 seconds) is below. Am I misreading something? What is supposed to
happen with consumers that don't regularly consume messages? Can a consumer
timeout affect the broker socket such that it causes it to hang long enough
for other connected producers to abort their connections?

[2013-12-17 00:00:25,107] INFO Reconnect due to socket error:
 (kafka.consumer.SimpleConsumer)
java.net.SocketTimeoutException
at java.io.InterruptedIOException.<init>(InterruptedIOException.java:43)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:184)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:98)
at
java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:232)
at kafka.utils.Utils$.read(Utils.scala:374)
at
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
at
kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$sendRequest(SimpleConsumer.scala:71)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
at
kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

---

When you say it pauses, do you mean producing and consuming?  Can you get
metrics from before that is happening, during, and after?



Could be gc pauses ... are you using this
http://kafka.apache.org/documentation.html#java or defaults?



/***

 Joe Stein

 Founder, Principal Consultant

 Big Data Open Source Security LLC

 http://www.stealth.ly

 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop

/





On Tue, Dec 17, 2013 at 2:01 PM, Tom Amon ta46...@gmail.com wrote:



 I'm on Kafka 0.8 final. Both brokers are up. The behavior is my

 producer produces messages just fine, then it pauses for a few

 seconds. Then it continues. The brokers are not stopping and starting.

 The broker logs show that another producer/consumer has a connection

 error at the same time my producer pauses. The exception at my

 producer (that pauses) indicates that the connection was aborted,

 which to my understanding usually indicates that the TCP connection

 was closed underneath it due to the broker end not responding. It's as

 if the socket error indicated in the logs below for 10.236.67.30

 (the other producer/consumer) causes the broker to hang long enough for
the TCP connection for my producer to timeout.





 Thanks





 --

 ---



 What version of kafka are you on?







 It seems like your producers are not seeing your broker(s).  can you

 confirm brokers are up?







 On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote:







  Hi All,



 



  I have a situation where one producer/consumer is causing timeout



  errors on the Kafka broker. The exception in the logs looks like this:



 



  [2013-12-16 17:32:25,992] ERROR Closing socket for /10.236.67.30



  because of error (kafka.network.Processor)



  java.io.IOException: Connection timed out



  at sun.nio.ch.FileDispatcher.read0(Native Method)



  at

  sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)



  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)



  at sun.nio.ch.IOUtil.read(IOUtil.java:175)



  at

  

Re: ClientUtils.fetchTopicMetadata reports smaller ISR than ZkUtils.getLeaderIsrAndEpochForPartition

2013-12-17 Thread Ryan Berdeen
Sorry it's taken so long to reply, the issue went away after I reassigned
partitions. Now it's back.

I haven't checked JMX, because the brokers and zookeeper have been
reporting the same ISR for several hours.

Some more details:

The cluster/topic has
  5 brokers (1, 4, 5, 7, 8)
  15 partitions (0...14)
  2 replicas

A single broker, 4, is the one missing from the ISR in every case. For
partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
partitions where 4 is not the leader (4, 8, 12), it is not present in the
ISR. Here's the output of my tool, showing assignment and ISR:
https://gist.github.com/also/8012383#file-from-brokers-txt

I haven't seen anything interesting in the logs, but I'm not entirely sure
what to look for. The cluster is currently in this state, and if it goes
like last time, this will persist until I reassign partitions.

What can I do in the meantime to track down the issue?

Thanks,

Ryan

On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao jun...@gmail.com wrote:

 Do you see any ISR churns on the brokers? You can check the ISR
 expand/shrink rate jmx.

 Thanks,

 Jun


 On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen rberd...@hubspot.com wrote:

  I'm working on some monitoring tools for Kafka, and I've seen a couple of
  clusters get into a state where ClientUtils.fetchTopicMetadata will show
  that not all replicas are in the ISR.
 
  At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that
  all all partitions are in the ISR, and
  the kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager
  MBean will report 0.
 
  What's going on? Is there something wrong with my controller, or should I
  not be paying attention to ClientUtils.fetchTopicMetadata?
 
  Thanks,
 
  Ryan
 



Re: redis versus zookeeper to track consumer offsets

2013-12-17 Thread Neha Narkhede
You can do this indirectly by monitoring the avg/max latency of operations
on zookeeper. There is no direct way of measuring the requests/sec to
zookeeper since they don't expose the relevant jmx metrics.

Thanks,
Neha
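
A minimal sketch of one way to sample that, assuming ZooKeeper's standard
four-letter admin commands ("stat" everywhere, "mntr" on 3.4+) answered on
the client port; the host and port here are placeholders:

  import java.io.{BufferedReader, InputStreamReader, PrintWriter}
  import java.net.Socket

  // Ask a ZooKeeper server for its "stat" output and pull out the
  // "Latency min/avg/max" line mentioned above.
  def zkLatencyLine(host: String = "localhost", port: Int = 2181): Option[String] = {
    val socket = new Socket(host, port)
    try {
      val out = new PrintWriter(socket.getOutputStream, true)
      out.print("stat")   // four-letter admin command
      out.flush()
      val in = new BufferedReader(new InputStreamReader(socket.getInputStream))
      Iterator.continually(in.readLine())
        .takeWhile(_ != null)
        .find(_.startsWith("Latency min/avg/max"))
    } finally {
      socket.close()
    }
  }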


On Tue, Dec 17, 2013 at 11:13 AM, S Ahmed sahmed1...@gmail.com wrote:

 Interesting, wasn't aware of that.

 Can you comment on how you go about monitoring your ZK cluster in terms of
 throughput and if it is reaching its limits? Or is it even possible to do
 this?


 On Tue, Dec 17, 2013 at 2:01 PM, Benjamin Black b...@b3k.us wrote:

  ZK was designed from the start as a clustered, consistent, highly
 available
  store for this sort of data and it works extremely well. Redis wasn't
 and I
  don't know anyone using Redis in production, including me, who doesn't
 have
  stories of Redis losing data. I'm sticking with ZK.
 
 
  On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed sahmed1...@gmail.com wrote:
 
   I am leaning towards using redis to track consumer offsets etc., but I
  see
   how using zookeeper makes sense since it already part of the kafka
 infra.
  
   One thing which bothers me is, how are you guys keeping track of the
 load
   on zookeeper?  How do you get an idea when your zookeeper cluster is
   underprovisioned?
  
   Redis is a richer store and could help in other areas where you want to
   store more than just status information or offsets, and setup and
   administration wise it seems a bit easier to manage.
  
   Thoughts?
  
 



Re: FW: Unable to start consumers in Tomcat

2013-12-17 Thread Joe Stein
That error comes from calling createMessageStreams twice, either directly or
because the container you are running in causes it to be called twice; the
exception is thrown here:
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L133
Don't do that and you won't get the error.
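
For the two-group setup discussed in the thread below, a sketch of the safe
pattern is one ConsumerConnector per group.id, with createMessageStreams
called exactly once on each connector (topic name and ZooKeeper address are
placeholders):

  import java.util.Properties
  import kafka.consumer.{Consumer, ConsumerConfig}

  def connectorFor(groupId: String) = {
    val props = new Properties()
    props.put("zookeeper.connect", "localhost:2181")
    props.put("group.id", groupId)
    Consumer.create(new ConsumerConfig(props))
  }

  val one = connectorFor("groupA")
  val two = connectorFor("groupB")

  // One createMessageStreams call per connector; calling it a second time
  // on the same connector is what triggers the exception above.
  val streamsA = one.createMessageStreams(Map("shared-topic" -> 1))
  val streamsB = two.createMessageStreams(Map("shared-topic" -> 1))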



On Mon, Dec 16, 2013 at 2:25 PM, Seshadri, Balaji
balaji.sesha...@dish.com wrote:

 Any update on this guys ?.

 -Original Message-
 From: Seshadri, Balaji
 Sent: Saturday, December 14, 2013 4:22 PM
 To: users@kafka.apache.org
 Subject: RE: Unable to start consumers in Tomcat

 We have one Scala consumer and one Java consumer listening on the same
 topic with different group ids. The issue is that if I start both, the
 second one fails with this error. If we run just one, it works fine.

 Attaching the Scala code and the Java code where we call
 createMessageStreams. It looks like an API issue: the API does not
 recognize that these are different groups of consumers, because
 createMessageStreams does not accept a group.id; it should refer to the
 group.id that was already passed, so that the second stream is treated as
 new and unrelated even though we listen on the same topic.

 Please let me know if it is an API issue or the way we use the API.


 -Original Message-
 From: Joe Stein [mailto:joe.st...@stealth.ly]
 Sent: Friday, December 13, 2013 4:36 PM
 To: users@kafka.apache.org
 Subject: Re: Unable to start consumers in Tomcat

 you do this at the consumer connector level, not at the message stream level

 so one:

   propsA.put("group.id", "groupA")
   val configA = new ConsumerConfig(propsA)
   val one = Consumer.create(configA)

 and another:

   propsB.put("group.id", "groupB")
   val configB = new ConsumerConfig(propsB)
   val two = Consumer.create(configB)

 you then manage each ConsumerConnector created by those .create calls in
 separate threads (or actors, or processes, or however you are doing
 concurrency), and you call createMessageStreams exactly once on each of
 them.

 makes sense?

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /


 On Fri, Dec 13, 2013 at 6:29 PM, Seshadri, Balaji
 balaji.sesha...@dish.com wrote:

  Cant we create message stream for same topic but different consumer
  group ?.
 
  -Original Message-
  From: Joe Stein [mailto:joe.st...@stealth.ly]
  Sent: Friday, December 13, 2013 4:23 PM
  To: users@kafka.apache.org
  Subject: Re: Unable to start consumers in Tomcat
 
  You should check your code to verify that it is only called once per
  instantiated consumer connector.  Here is where the exception is thrown:
  https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L133
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
  /
 
 
  On Fri, Dec 13, 2013 at 6:17 PM, Seshadri, Balaji
  balaji.sesha...@dish.com wrote:
 
   We needed HTTP interface to start our consumers using REST interface
   for management that why we chose tomcat to run our consumers.
  
   We create streams only once.when we initially start consumer.
  
   -Original Message-
   From: Joe Stein [mailto:joe.st...@stealth.ly]
   Sent: Friday, December 13, 2013 4:12 PM
   To: users@kafka.apache.org
   Subject: Re: Unable to start consumers in Tomcat
  
   That error comes if you are trying to create a message/stream on a
   consumer instantiated object more than once.
  
   Why are you using tomcat for the consumers? Is it to see the results
   of messages? if so you need to isolate the Consumer in some way so
   there is a singleton (assuming one partition or if more than one
   partition some type of consumer pool) and the messages being
   consumed write to a thread safe memory structure and then the UI can
   read from that thread safe memory structure.
  
   /***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
   /
  
  
   On Fri, Dec 13, 2013 at 5:42 PM, Seshadri, Balaji
   balaji.sesha...@dish.com wrote:
  
0.8
   
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, December 13, 2013 3:33 PM
To: users@kafka.apache.org
Subject: Re: Unable to start consumers in Tomcat
   
Which version of kafka are you using?
   
   
On Fri, Dec 13, 2013 at 2:29 PM, Seshadri, Balaji
balaji.sesha...@dish.com wrote:
   
 Any idea on this error guys ?.

 -Original 

Re: a consumer question

2013-12-17 Thread Guozhang Wang
If there are no more messages, hasNext will return false instead of throwing
an exception.

Guozhang


On Tue, Dec 17, 2013 at 11:53 AM, Yu, Libo libo...@citi.com wrote:

 Sorry, a typo. Correct my question. When consumer.timeout.ms is set to 0,
  if there is no
 message available, hasNext() throws a timeout exception, otherwise it
 returns true.
 Is that the right behavior?

 Regards,

 Libo


 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Tuesday, December 17, 2013 12:40 AM
 To: users@kafka.apache.org
 Subject: Re: a consumer question

 If there is a message, hasNext() returns true, not throwing an exception.

 Thanks,

 Jun


 On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote:

  Hi folks,
 
  For this parameters, if consumer.timeout.ms is set to 0, whenever I
  call ConsumerIterator's hasNext(), if there is a message available, a
  timeout exception will be thrown. Is my understanding correct? Thanks.
 
  consumer.timeout.ms
 
  -1
 
  Throw a timeout exception to the consumer if no message is available
  for consumption after the specified interval
 
 
 
  Regards,
 
  Libo
 
 




-- 
-- Guozhang


Re: Killing broker leader

2013-12-17 Thread Marc Labbe
It is worth mentioning that you can reduce the likelihood of losing messages
by running a controlled shutdown before killing the broker.

https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Whatiscontrolledshutdown?

The connection refused error is a bit surprising, though. The consumer should
normally react by switching to the new leader. Are you getting this error
from the killed broker or from the new leader?
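
As an aside, the durability setting Guozhang asks about below is the
producer's request.required.acks property; a sketch of the relevant 0.8
producer properties (broker list is a placeholder, values illustrative):

  import java.util.Properties
  import kafka.producer.ProducerConfig

  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092,broker2:9092")
  // With 0 the producer does not wait for an ack, and with 1 only the leader
  // has the message when it is acked, so a leader failure can lose it; -1
  // waits for the full in-sync replica set.
  props.put("request.required.acks", "-1")
  val config = new ProducerConfig(props)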



On Tue, Dec 17, 2013 at 7:54 PM, Guozhang Wang wangg...@gmail.com wrote:

 Hello Francois,

 What is the producer ack value in your console producer? If it is equal to
 1 then when a leader is down it is possible to lose data, and hence not
 consumed by the consumer.

 Guozhang


 On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier 
 francois.langel...@mate1inc.com wrote:

  Hi,
 
  I installed zookeeper and kafka 0.8.0 following the quick start (
  https://kafka.apache.org/documentation.html#quickstart) and when i try
 to
  kill my leader, i got a lot of exception in my producer and consumer
  consoles.
 
  Then, after the exceptions stop printing, some of the messages I produce
 in
  my console don't print in my consumer console...
 
  The exception I get is java.net.ConnectException : Connection refused.
 
  Did someone already had this problem?
 
  Thank you!
 
  PS: I have 3 brokers running on my system.
 



 --
 -- Guozhang



regarding run-simulator.sh

2013-12-17 Thread pushkar priyadarshi
I am not able to find run-simulator.sh in 0.8 even after building perf. If
this tool has been deprecated, what other alternatives are available now for
perf testing?

Regards,
Pushkar


Re: Consumer Group Rebalance Issues

2013-12-17 Thread Jun Rao
Have you looked at
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
?

Thanks,

Jun
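
A common mitigation (whether or not it is the one that FAQ entry settles on)
is to give the group more attempts and a longer pause between them; a sketch
of the 0.8 consumer properties involved, with illustrative values:

  import java.util.Properties
  import kafka.consumer.ConsumerConfig

  val props = new Properties()
  props.put("zookeeper.connect", "zk1:2181")
  props.put("group.id", "trackingGroup")
  props.put("rebalance.max.retries", "10")    // more attempts than the default of 4
  props.put("rebalance.backoff.ms", "10000")  // longer pause between attempts
  val config = new ConsumerConfig(props)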


On Tue, Dec 17, 2013 at 9:24 AM, Drew Goya d...@gradientx.com wrote:

 Hey all,

 I've recently been having problems with consumer groups rebalancing.  I'm
 using several high level consumers which all belong to the same group.
  Occasionally one or two of them will get stuck in a rebalance loop.  They
 attempt to rebalance, but the partitions they try to claim are owned.
  Anyone run into this?  Ideas?

 I see errors in my zookeeper logs like:

 2013-12-17 17:12:31,171 [myid:001] - INFO  [ProcessThread(sid:1
 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when
 processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a
 zxid:0x501390d4b txntype:-1 reqpath:n/a Error
 Path:/kafka/consumers/trackingGroup/owners/Events2/25 Error:KeeperErrorCode
 = NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25

 And errors in my kafka logs like:

 2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin
 rebalancing consumer
 trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8

 2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
 [ConsumerFetcherManager-1387249530381] Stopping leader finder thread

 2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
 [ConsumerFetcherManager-1387249530381] Stopping all fetchers

 2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
 [ConsumerFetcherManager-1387249530381] All connections stopped

 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all
 relevant queues for this fetcher

 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the
 data chunks in all the consumer message iterators

 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing
 all offsets after clearing the fetcher queues

 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing
 partition ownership

 2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
 [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer
 trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing the
 following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
 122, 123, 124, 125, 126, 127) for topic Events2 with consumers:
 List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0,
 trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1,
 trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0,
 trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1,
 trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0,
 trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1,
 trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0,
 trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1,
 trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0,
 trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1,
 trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0,
 trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1,
 trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0,
 trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1,
 trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0,
 trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1,
 trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0,
 trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1,
 trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0,
 trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1,
 trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0,
 trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1,
 trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0,
 trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1,
 trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0,
 trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-1,
 trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-0,
 trackingGroup_prod-storm-sup-trk013-1387249530700-b323ee53-1,
 trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-0,
 trackingGroup_prod-storm-sup-trk014-1387249529916-e32e6363-1,
 

Re: Kafka 0.8.0 server-stop.sh does not stop broker

2013-12-17 Thread Jun Rao
Which OS are you on?

Thanks,

Jun


On Tue, Dec 17, 2013 at 11:15 AM, Bryan Baugher bjb...@gmail.com wrote:

 Hi,

 We have been trying out the kafka 0.8.0 beta1 for awhile and recently
 attempted to upgrade to 0.8.0 but noticed that the stop server script
 doesn't seem to stop the broker anymore. I noticed here[1] that a commit
 was made before the release to change the signal sent to stop the broker
 from SIGTERM to SIGINT. Changing this script back to using SIGTERM seems to
 fix the issue for me. Has anyone else noticed this, is there a bug or
 should I log one?

 [1] -

 https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0

 -Bryan



Re: Connection Timeouts

2013-12-17 Thread Jun Rao
Did you change fetch.wait.max.ms in the consumer config? If so, did you
make sure that it is smaller than socket.timeout.ms? Also, if you look at
the request log, how long does it take to complete the timed out fetch
request?

Thanks,

Jun
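
The constraint above, expressed as 0.8 consumer properties; a sketch with
illustrative values and a placeholder ZooKeeper address:

  import java.util.Properties
  import kafka.consumer.ConsumerConfig

  val props = new Properties()
  props.put("zookeeper.connect", "zk1:2181")
  props.put("group.id", "example-group")
  props.put("socket.timeout.ms", "30000")   // the 0.8 default
  props.put("fetch.wait.max.ms", "100")     // keep this below socket.timeout.ms
  val config = new ConsumerConfig(props)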


On Tue, Dec 17, 2013 at 2:30 PM, Tom Amon ta46...@gmail.com wrote:

 It appears that consumers that do not get messages regularly are timing out
 every 30 seconds. This interval coincides with the default setting for 
 socket.timeout.ms at the consumer. When the timeout happens it looks like
 the broker socket hangs for a few seconds, causing all other connected
 consumers and producers to hang along with it. Producers recover by
 reconnecting, consumers do the same. The exception in the consumer log
 (every 30 seconds) is below. Am I misreading something? What is supposed to
 happen with consumers that don't regularly consume messages? Can a consumer
 timeout affect the broker socket such that it causes it to hang long enough
 for other connected producers to abort their connections?

 [2013-12-17 00:00:25,107] INFO Reconnect due to socket error:
  (kafka.consumer.SimpleConsumer)
 java.net.SocketTimeoutException
 at java.io.InterruptedIOException.<init>(InterruptedIOException.java:43)
 at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:184)
 at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:98)
 at
 java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:232)
 at kafka.utils.Utils$.read(Utils.scala:374)
 at

 kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
 at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
 at

 kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
 at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
 at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
 at

 kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$sendRequest(SimpleConsumer.scala:71)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
 at

 kafka.consumer.SimpleConsumer$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
 at

 kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
 at
 kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

 ---

 When you say it pauses, do you mean producing and consuming?  Can you get
 metrics from before that is happening, during, and after?



 Could be gc pauses ... are you using this
 http://kafka.apache.org/documentation.html#java or defaults?



 /***

  Joe Stein

  Founder, Principal Consultant

  Big Data Open Source Security LLC

  http://www.stealth.ly

  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop

 /





 On Tue, Dec 17, 2013 at 2:01 PM, Tom Amon ta46...@gmail.com wrote:



  I'm on Kafka 0.8 final. Both brokers are up. The behavior is my

  producer produces messages just fine, then it pauses for a few

  seconds. Then it continues. The brokers are not stopping and starting.

  The broker logs show that another producer/consumer has a connection

  error at the same time my producer pauses. The exception at my

  producer (that pauses) indicates that the connection was aborted,

  which to my understanding usually indicates that the TCP connection

  was closed underneath it due to the broker end not responding. It's as

  if the socket error indicated in the logs below for 10.236.67.30

  (the other producer/consumer) causes the broker to hang long enough for
 the TCP connection for my producer to timeout.

 

 

  Thanks

 

 

  --

  ---

 

  What version of kafka are you on?

 

 

 

  It seems like your producers are not seeing your broker(s).  can you

  confirm brokers are up?

 

 

 

  On Mon, Dec 16, 2013 at 7:52 PM, Tom Amon ta46...@gmail.com wrote:

 

 

 

   Hi All,

 

  

 

   I have a situation where one producer/consumer is causing timeout

 

   errors on the Kafka broker. The exception in the logs looks like this:

 

  

 

   [2013-12-16 17:32:25,992] ERROR Closing socket for 

Re: a consumer question

2013-12-17 Thread Jun Rao
Actually, hasNext() only returns false when the consumer connector is shut
down. Typically, you either set consumer.timeout.ms to -1 or to a value
larger than 0. If it's set to 0, my guess is that it throws a timeout
exception immediately if there is no message available.

Thanks,

Jun
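
A small sketch of the resulting consumption loop, assuming a stream obtained
from createMessageStreams (the handling shown is illustrative):

  import kafka.consumer.{ConsumerTimeoutException, KafkaStream}

  def drain(stream: KafkaStream[Array[Byte], Array[Byte]]): Unit = {
    val it = stream.iterator()
    try {
      while (it.hasNext()) {
        it.next()   // process the MessageAndMetadata here
      }
      // hasNext() returned false: the consumer connector was shut down.
    } catch {
      case _: ConsumerTimeoutException =>
        // No message arrived within consumer.timeout.ms; retry or stop here.
    }
  }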


On Tue, Dec 17, 2013 at 4:57 PM, Guozhang Wang wangg...@gmail.com wrote:

 If there is no more messages, hasNext will return false instead of throwing
 an exception.

 Guozhang


 On Tue, Dec 17, 2013 at 11:53 AM, Yu, Libo libo...@citi.com wrote:

  Sorry, a typo. Correct my question. When consumer.timeout.ms is set to
 0,
   if there is no
  message available, hasNext() throws a timeout exception, otherwise it
  returns true.
  Is that the right behavior?
 
  Regards,
 
  Libo
 
 
  -Original Message-
  From: Jun Rao [mailto:jun...@gmail.com]
  Sent: Tuesday, December 17, 2013 12:40 AM
  To: users@kafka.apache.org
  Subject: Re: a consumer question
 
  If there is a message, hasNext() returns true, not throwing an exception.
 
  Thanks,
 
  Jun
 
 
  On Mon, Dec 16, 2013 at 11:29 AM, Yu, Libo libo...@citi.com wrote:
 
   Hi folks,
  
   For this parameters, if consumer.timeout.ms is set to 0, whenever I
   call ConsumerIterator's hasNext(), if there is a message available, a
   timeout exception will be thrown. Is my understanding correct? Thanks.
  
   consumer.timeout.ms
  
   -1
  
   Throw a timeout exception to the consumer if no message is available
   for consumption after the specified interval
  
  
  
   Regards,
  
   Libo
  
  
 



 --
 -- Guozhang



Re: ClientUtils.fetchTopicMetadata reports smaller ISR than ZkUtils.getLeaderIsrAndEpochForPartition

2013-12-17 Thread Jun Rao
If a broker never joins an ISR, it could be that the fetcher died
unexpectedly. Did you see any "Error due to " entries in the log of broker 4?

Another thing to check is the max lag and the per partition lag in jmx.

Thanks,

Jun


On Tue, Dec 17, 2013 at 4:09 PM, Ryan Berdeen rberd...@hubspot.com wrote:

 Sorry it's taken so long to reply, the issue went away after I reassigned
 partitions. Now it's back.

 I haven't checked JMX, because the brokers and zookeeper have been
 reporting the same ISR for several hours.

 Some more details:

 The cluster/topic has
   5 brokers (1, 4, 5, 7, 8)
   15 partitions (0...14)
   2 replicas

 A single broker, 4, is the one missing from the ISR in every case. For
 partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
 partitions where 4 is not the leader (4, 8, 12), it is not present in the
 ISR. Here's the output of my tool, showing assignment and ISR:
 https://gist.github.com/also/8012383#file-from-brokers-txt

 I haven't seen anything interesting in the logs, but I'm not entirely sure
 what to look for. The cluster is currently in this state, and if it goes
 like last time, this will persist until I reassign partitions.

 What can I do in the meantime to track down the issue?

 Thanks,

 Ryan

 On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao jun...@gmail.com wrote:

  Do you see any ISR churns on the brokers? You can check the ISR
  expand/shrink rate jmx.
 
  Thanks,
 
  Jun
 
 
  On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen rberd...@hubspot.com
 wrote:
 
   I'm working on some monitoring tools for Kafka, and I've seen a couple
 of
   clusters get into a state where ClientUtils.fetchTopicMetadata will
 show
   that not all replicas are in the ISR.
  
   At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show
 that
   all all partitions are in the ISR, and
   the
 kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager
   MBean will report 0.
  
   What's going on? Is there something wrong with my controller, or
 should I
   not be paying attention to ClientUtils.fetchTopicMetadata?
  
   Thanks,
  
   Ryan
  
 



Re: Killing broker leader

2013-12-17 Thread Jun Rao
What's the replication factor of the topic? Is it larger than 1? You can
find out using the list topic command.

Thanks,

Jun


On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier 
francois.langel...@mate1inc.com wrote:

 Hi,

 I installed zookeeper and kafka 0.8.0 following the quick start (
 https://kafka.apache.org/documentation.html#quickstart) and when i try to
 kill my leader, i got a lot of exception in my producer and consumer
 consoles.

 Then, after the exceptions stop printing, some of the messages I produce in
 my console don't print in my consumer console...

 The exception I get is java.net.ConnectException : Connection refused.

 Did someone already had this problem?

 Thank you!

 PS: I have 3 brokers running on my system.



Re: regarding run-simulator.sh

2013-12-17 Thread Jun Rao
You can run kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh.

Thanks,

Jun


On Tue, Dec 17, 2013 at 8:44 PM, pushkar priyadarshi 
priyadarshi.push...@gmail.com wrote:

 i am not able to find run-simulator.sh in 0.8 even after building perf.if
 this tool has been deprecated what are other alternatives available now for
 perf testing?

 Regards,
 Pushkar



Re: Kafka 0.8.0 server-stop.sh does not stop broker

2013-12-17 Thread Bryan Baugher
RHEL 6.4 64bit
Java 6u35


On Tue, Dec 17, 2013 at 10:57 PM, Jun Rao jun...@gmail.com wrote:

 Which OS are you on?

 Thanks,

 Jun


 On Tue, Dec 17, 2013 at 11:15 AM, Bryan Baugher bjb...@gmail.com wrote:

  Hi,
 
  We have been trying out the kafka 0.8.0 beta1 for awhile and recently
  attempted to upgrade to 0.8.0 but noticed that the stop server script
  doesn't seem to stop the broker anymore. I noticed here[1] that a commit
  was made before the release to change the signal sent to stop the broker
  from SIGTERM to SIGINT. Changing this script back to using SIGTERM seems
 to
  fix the issue for me. Has anyone else noticed this, is there a bug or
  should I log one?
 
  [1] -
 
 
 https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0
 
  -Bryan
 




-- 
-Bryan


Re: Consumer Group Rebalance Issues

2013-12-17 Thread Drew Goya
I explored that possibility, but I'm not seeing any ZK session expirations
in the logs, and it doesn't look like these rebalances complete.

They fail due to conflicts in the zookeeper data.


On Tue, Dec 17, 2013 at 8:53 PM, Jun Rao jun...@gmail.com wrote:

 Have you looked at

 https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
 ?

 Thanks,

 Jun


 On Tue, Dec 17, 2013 at 9:24 AM, Drew Goya d...@gradientx.com wrote:

  Hey all,
 
  I've recently been having problems with consumer groups rebalancing.  I'm
  using several high level consumers which all belong to the same group.
   Occasionally one or two of them will get stuck in a rebalance loop.
  They
  attempt to rebalance, but the partitions they try to claim are owned.
   Anyone run into this?  Ideas?
 
  I see errors in my zookeeper logs like:
 
  2013-12-17 17:12:31,171 [myid:001] - INFO  [ProcessThread(sid:1
  cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
 when
  processing sessionid:0x342e4febc180852 type:create cxid:0x1a9a
  zxid:0x501390d4b txntype:-1 reqpath:n/a Error
  Path:/kafka/consumers/trackingGroup/owners/Events2/25
 Error:KeeperErrorCode
  = NodeExists for /kafka/consumers/trackingGroup/owners/Events2/25
 
  And errors in my kafka logs like:
 
  2013-12-17 17:20:32 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], begin
  rebalancing consumer
  trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 try #8
 
  2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
  [ConsumerFetcherManager-1387249530381] Stopping leader finder thread
 
  2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
  [ConsumerFetcherManager-1387249530381] Stopping all fetchers
 
  2013-12-17 17:20:33 ConsumerFetcherManager [INFO]
  [ConsumerFetcherManager-1387249530381] All connections stopped
 
  2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared all
  relevant queues for this fetcher
 
  2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Cleared the
  data chunks in all the consumer message iterators
 
  2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Committing
  all offsets after clearing the fetcher queues
 
  2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Releasing
  partition ownership
 
  2013-12-17 17:20:33 ZookeeperConsumerConnector [INFO]
  [trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306], Consumer
  trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306 rebalancing
 the
  following partitions: List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
  14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32,
  33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
 51,
  52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
 70,
  71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
 89,
  90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
 106,
  107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
 121,
  122, 123, 124, 125, 126, 127) for topic Events2 with consumers:
  List(trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-0,
  trackingGroup_prod-storm-sup-trk001-1387249529775-2a8484f1-1,
  trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-0,
  trackingGroup_prod-storm-sup-trk002-1387249530831-97c586ab-1,
  trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-0,
  trackingGroup_prod-storm-sup-trk003-1387249529739-f2de3dd9-1,
  trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-0,
  trackingGroup_prod-storm-sup-trk004-1387249530445-8f57ec5c-1,
  trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-0,
  trackingGroup_prod-storm-sup-trk005-1387249530451-d59c669a-1,
  trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-0,
  trackingGroup_prod-storm-sup-trk005-1387249530452-2b244683-1,
  trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-0,
  trackingGroup_prod-storm-sup-trk006-1387249530327-9d15c306-1,
  trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-0,
  trackingGroup_prod-storm-sup-trk007-1387249529436-fb79e4c8-1,
  trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-0,
  trackingGroup_prod-storm-sup-trk008-1387249526700-11ba655b-1,
  trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-0,
  trackingGroup_prod-storm-sup-trk009-1387249530020-cb36831c-1,
  trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-0,
  trackingGroup_prod-storm-sup-trk010-1387249529975-d43aff06-1,
  trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-0,
  trackingGroup_prod-storm-sup-trk011-1387249527684-479a04f9-1,
  trackingGroup_prod-storm-sup-trk012-1387249530208-155ecd68-0,

Re: regarding run-simulator.sh

2013-12-17 Thread pushkar priyadarshi
thanks Jun.


On Wed, Dec 18, 2013 at 10:47 AM, Jun Rao jun...@gmail.com wrote:

 You can run kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh.

 Thanks,

 Jun


 On Tue, Dec 17, 2013 at 8:44 PM, pushkar priyadarshi 
 priyadarshi.push...@gmail.com wrote:

  i am not able to find run-simulator.sh in 0.8 even after building perf.if
  this tool has been deprecated what are other alternatives available now
 for
  perf testing?
 
  Regards,
  Pushkar