RE: Duplicate records in Kafka 0.7

2014-01-10 Thread Xuyen On
Actually, most of the duplicates I was seeing were due to a bug in the old Hive 
version I'm using (0.9). 
But I am still seeing some duplicates, although fewer: instead of 3-13% I'm now 
seeing less than 1%. This appears to be the case for each of my consumer's 
batches, which are currently set to 1,000,000 messages. Does that seem more 
reasonable?

-Original Message-
From: Joel Koshy [mailto:jjkosh...@gmail.com] 
Sent: Thursday, January 09, 2014 7:07 AM
To: users@kafka.apache.org
Subject: Re: Duplicate records in Kafka 0.7

You mean duplicate records on the consumer side? Duplicates are possible if 
there are consumer failures and another consumer instance resumes from an 
earlier offset. They are also possible if there are producer retries due to 
exceptions while producing. Do you see any of these errors in your logs? 
Besides these scenarios, though, you shouldn't be seeing duplicates.
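For what it's worth, a minimal consumer-side deduplication sketch (illustrative
only; the class and method names are made up): track the last-processed offset
per topic/partition and skip anything at or below it.

    import java.util.HashMap;
    import java.util.Map;

    public class OffsetDeduper {
        // Last successfully processed offset, keyed by "topic-partition".
        private final Map<String, Long> lastOffset = new HashMap<String, Long>();

        // Returns true if this (topic, partition, offset) has not been seen yet.
        public boolean isNew(String topic, int partition, long offset) {
            String key = topic + "-" + partition;
            Long last = lastOffset.get(key);
            if (last != null && offset <= last) {
                return false; // duplicate, e.g. a replay after a consumer failure
            }
            lastOffset.put(key, offset);
            return true;
        }
    }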

Thanks,

Joel


On Wed, Jan 8, 2014 at 5:21 PM, Xuyen On  wrote:
> Hi,
>
> I would like to check whether other people are seeing duplicate records 
> with Kafka 0.7. I read the JIRAs and I believe that duplicates are still 
> possible when using message compression on Kafka 0.7. I'm seeing duplicate 
> records in the range of 6-13%. Is this normal?
>
> If you're using Kafka 0.7 with message compression enabled, can you please 
> let me know whether you see duplicate records and, if so, what percentage?
>
> Also, please let me know what sort of deduplication strategy you're using.
>
> Thanks!
>
>




Re: Kafka with Docker - producer disconnecting

2014-01-10 Thread Joe Stein
You might have to be more explicit in setting host.name in the 
server.properties of your brokers.
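
For example, a hedged server.properties sketch (hypothetical values; the key
point is that the published name must resolve both inside and outside the
container):

    broker.id=1
    port=9092
    # Published to ZooKeeper; clients connect to this name, so it must
    # resolve from wherever the producers and consumers actually run.
    host.name=kafka1.docker.local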


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Jan 10, 2014, at 1:30 PM, Alex Artigues  wrote:

> Hi everyone, I am attempting to run Zookeeper and Kafka in dockers.
> 
> Both startup normally and Kafka connects ok.  I think my containers are
> linked fine because I am able to list topics, and create topics.
> 
> The producer however never delivers messages.  It connects using the
> provided shell scripts, and it seems I am able to type messages, but I
> cannot Ctrl+C to stop it.  In the logs there is an INFO message that the
> client disconnects, but no good reason why.
> 
> This is in the ZK logs:
> 2014-01-10 17:55:39,893 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x1437d4304fe, likely client has closed socket
> 
> 
> If anyone has any clues I would really appreciate it, thanks.


Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Rob Withers
That was an interesting section too.  Which GC settings would you suggest?

Thank you,
- charlie

> On Jan 10, 2014, at 10:11 PM, Jun Rao  wrote:
> 
> Have you looked at our FAQ, especially
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
> ?
> 
> Thanks,
> 
> Jun
> 
> 
> On Fri, Jan 10, 2014 at 2:25 PM, Seshadri, Balaji
> wrote:
> 
>> Any clue would be helpful.
>> 
>> -Original Message-
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 12:46 PM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not getting
>> any data
>> 
>> Yes, the rebalance begins and exceptions occur.
>> 
>> 
>> {2014-01-10 00:58:11,293} INFO
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
>> Cleared the data chunks in all the consumer message iterators
>> {2014-01-10 00:58:11,293} INFO
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
>> Committing all offsets after clearing the fetcher queues
>> {2014-01-10 00:58:11,298} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:759) - Got ping
>> response for sessionid: 0x1437b2879870005 after 0ms
>> {2014-01-10 00:58:11,313} INFO
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
>> (?:?) -
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
>> begin rebalancing consumer
>> account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
>> {2014-01-10 00:58:11,314} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 627,8  replyHeader:: 627,51539619966,0  request::
>> '/brokers/ids,F  response:: v{'1}
>> {2014-01-10 00:58:11,315} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 628,4  replyHeader:: 628,51539619966,0  request::
>> '/brokers/ids/1,F  response::
>> #7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
>> {2014-01-10 00:58:11,316} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 629,4  replyHeader:: 629,51539619966,-101
>> request::
>> '/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
>> response::
>> {2014-01-10 00:58:11,316} INFO
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
>> (?:?) -
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
>> exception during rebalance
>> org.I0Itec.zkclient.exception.ZkNoNodeException:
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>> NoNode for
>> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
>>at
>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>at kafka.utils.ZkUtils$.readData(Unknown Source)
>>at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
>>at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
>> Source)
>>at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
>> Source)
>>at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>>at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
>> Source)
>>at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
>> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for
>> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
>>at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102

Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Jun Rao
Have you looked at our FAQ, especially
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
?

Thanks,

Jun


On Fri, Jan 10, 2014 at 2:25 PM, Seshadri, Balaji
wrote:

> Any clue would be helpful.
>
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 12:46 PM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
>
> Yes, the rebalance begins and exceptions occur.
>
>
> {2014-01-10 00:58:11,293} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> Cleared the data chunks in all the consumer message iterators
> {2014-01-10 00:58:11,293} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> Committing all offsets after clearing the fetcher queues
> {2014-01-10 00:58:11,298} DEBUG [catalina-exec-12-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:759) - Got ping
> response for sessionid: 0x1437b2879870005 after 0ms
> {2014-01-10 00:58:11,313} INFO
>  
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
> (?:?) -
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
> begin rebalancing consumer
> account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
> {2014-01-10 00:58:11,314} DEBUG [catalina-exec-12-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
> finished:false header:: 627,8  replyHeader:: 627,51539619966,0  request::
> '/brokers/ids,F  response:: v{'1}
> {2014-01-10 00:58:11,315} DEBUG [catalina-exec-12-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
> finished:false header:: 628,4  replyHeader:: 628,51539619966,0  request::
> '/brokers/ids/1,F  response::
> #7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
> {2014-01-10 00:58:11,316} DEBUG [catalina-exec-12-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
> finished:false header:: 629,4  replyHeader:: 629,51539619966,-101
>  request::
> '/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
>  response::
> {2014-01-10 00:58:11,316} INFO
>  
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
> (?:?) -
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
> at
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> at kafka.utils.ZkUtils$.readData(Unknown Source)
> at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
> Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
> Source)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
> Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)

Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jun Rao
Do you think you can reproduce this easily?

Thanks,

Jun


On Fri, Jan 10, 2014 at 11:33 AM, Jason Rosenberg  wrote:

> well, not currently, as we don't have multiple partitions for the
> topics... but yes, I understand that would help too.
>
> but, we are using this multiple-consumers-within-a-process approach in
> general with much success so far... just was curious about this ERROR I
> was seeing :)
>
>
> On Fri, Jan 10, 2014 at 11:06 AM, Jun Rao  wrote:
>
> > Could you increase parallelism on the consumers?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg 
> wrote:
> >
> > > The consumption rate is a little better after the refactoring.  The
> main
> > > issue though, was that we had a mismatch between large and small
> topics.
> >  A
> > > large topic can lag, and adversely affect consumption of other topics,
> so
> > > this is an attempt to isolate topic filtering, and better balance the
> > > consumers for the different topics.
> > >
> > > So, it's definitely working on that score.
> > >
> > > The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> > > doing that before and after the refactor (and after we started also
> > seeing
> > > the ERROR logging).  But consumption of all other topics is working
> > better
> > > now (almost no lag at all).
> > >
> > > I'm also setting the client.id for each consumer in the process, so
> > that I
> > > can see the individual metrics per consumer.
> > >
> > > Jason
> > >
> > >
> > > On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
> > >
> > > > Does the consumption rate in the client (msg/sec) change
> significantly
> > > > after the refactoring?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > Yes, it's happening continuously, at the moment (although I'm
> > expecting
> > > > the
> > > > > consumer to catch up soon)
> > > > >
> > > > > It seemed to start happening after I refactored the consumer app to
> > use
> > > > > multiple consumer connectors in the same process (each one has a
> > > separate
> > > > > topic filter, so should be no overlap between them).  All using the
> > > same
> > > > > consumer group.
> > > > >
> > > > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > > > (seems
> > > > > unlikely).
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > > > >
> > > > > > Normally, if the consumer can't keep up, you should just see the
> > > > > > OffsetOutOfRangeException warning. The offset mismatch error
> should
> > > > never
> > > > > > happen. It could be that OffsetOutOfRangeException exposed a bug.
> > Do
> > > > you
> > > > > > think you can reproduce this easily?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg <
> j...@squareup.com>
> > > > > wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > I'm not sure I understand your question, wrt produced data?
> > > > > > >
> > > > > > > But yes, in general, I believe the consumer is not keeping up
> > with
> > > > the
> > > > > > > broker's deleting the data.  So it's trying to fetch the next
> > batch
> > > > of
> > > > > > > data, but it's last offset is no longer there, etc.  So that's
> > the
> > > > > reason
> > > > > > > for the WARN message, in the fetcher thread.
> > > > > > >
> > > > > > > I'm just not sure I understand then why we don't always see the
> > > > > > > ConsumerIterator error also, because won't there always be
> > missing
> > > > data
> > > > > > > detected there?  Why sometimes and not always?  What's the
> > > > difference?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao 
> > wrote:
> > > > > > >
> > > > > > > > The WARN and ERROR may not be completely correlated. Could it
> > be
> > > > that
> > > > > > the
> > > > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> > > j...@squareup.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > So, sometimes I just get the WARN from the
> > > ConsumerFetcherThread
> > > > > (as
> > > > > > > > > previously noted, above), e.g.:
> > > > > > > > >
> > > > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > > > 16163904970
> > > > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > > > 16175326044
> > > > > > > > >
> > > > > > > > > More recently, I see these in the following log line (not
> > sure
> > > > why
> > > > > I

Re: Topic not created if number of live brokers less than # replicas

2014-01-10 Thread Jun Rao
Hanish,

Currently, we don't have plans to fix this issue in 0.8.1 since other
features such as topic deletion are probably more important. Could you
work around this by pre-creating the topic when all brokers are up?

Thanks,

Jun


On Fri, Jan 10, 2014 at 8:56 AM, Hanish Bansal <
hanish.bansal.agar...@gmail.com> wrote:

> Hi All,
>
> As is well known, Kafka's behavior is that the number of live brokers cannot
> be less than the number of replicas when creating a new topic.
>
> I raised a JIRA (https://issues.apache.org/jira/browse/KAFKA-1182)
> proposing an improvement: the topic should still be created, so that with a
> replication factor of n we can tolerate n-1 broker failures.
>
> As mentioned on the Future Release Plan of
> Kafka<
> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan>,
> kafka-0.8.1 is going to be released this month. Is KAFKA-1182 being
> considered for this next release?
>
> I just want to know: is there any plan to resolve this in upcoming releases?
>
> Thanks in advance !!
>
>
> --
> *Regards*
> *Hanish Bansal*
>


Spring Integration Kafka support

2014-01-10 Thread Premchandra, Preetham Kukillaya
Hi,
I was doing a PoC using 
https://github.com/SpringSource/spring-integration-extensions/tree/master/spring-integration-kafka.
I found that the code expects brokerid=0, which generally will not be the 
case when multiple brokers connect to the same ZooKeeper.

Regards
Preetham



Kafka with Docker - producer disconnecting

2014-01-10 Thread Alex Artigues
Hi everyone, I am attempting to run Zookeeper and Kafka in dockers.

Both startup normally and Kafka connects ok.  I think my containers are
linked fine because I am able to list topics, and create topics.

The producer however never delivers messages.  It connects using the
provided shell scripts, and it seems I am able to type messages, but I
cannot Ctrl+C to stop it.  In the logs there is an INFO message that the
client disconnects, but no good reason why.

This is in the ZK logs:
2014-01-10 17:55:39,893 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x1437d4304fe, likely client has closed socket


If anyone has any clues I would really appreciate it, thanks.


Re: Consumers can't connect while broker is under load

2014-01-10 Thread Guozhang Wang
Can you post the consumer logs for the long-time-to-connect scenarios?

Guozhang


On Fri, Jan 10, 2014 at 1:20 PM, Tim Kellogg  wrote:

> Hi,
>
> I have a cluster of 12 brokers receiving 10,000 msg/s from producers where
> each message is roughly 2.5KB. We also have 12 ZooKeepers and everything is
> on AWS. Under these conditions, top (the Linux utility) reports around
> 10-15 out of 32 for system load, so we’re at less than half capacity.
>
> When under this load consumers take a very long time, often more than 30
> minutes, to connect to the brokers. When under no load they connect
> immediately. Why is this happening?
>
> Thanks,
>
> Tim Kellogg
> Sr. Software Engineer, Protocols
> 2lemetry
> @kellogh




-- 
-- Guozhang


Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Guozhang Wang
From the logs it seems the consumer 562b6738's registry node in ZooKeeper
has been lost:

NoNode for
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738

As Joel suggested, for now you may just stop all your consumers and restart.
To debug, you may need to look into ZooKeeper's logs and check whether any
session expiration or socket-close events happened that caused ZK to delete
the registry node.
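
If it helps, you can check the znode directly with the same zkclient library
that appears in the stack traces; a sketch, with the ZK host and consumer id
taken from the logs above:

    import org.I0Itec.zkclient.ZkClient;

    public class CheckConsumerRegistry {
        public static void main(String[] args) {
            ZkClient zk = new ZkClient("tvip-m1-mw-zookeeper.dish.com:2181");
            String path = "/consumers/account-activated-hadoop-consumer/ids/"
                    + "account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738";
            System.out.println(zk.exists(path) ? "registry node present"
                                               : "registry node missing");
            zk.close();
        }
    }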

Which Kafka version are you using?

Guozhang


On Fri, Jan 10, 2014 at 3:36 PM, Joel Koshy  wrote:

> If a consumer rebalances for any reason (e.g., if a consumer
> in the group has a soft failure such as a long GC) then the fetchers
> are stopped as part of the rebalance process. The sequence is as
> follows:
>
> - Stop fetchers
> - Commit offsets
> - Release partition ownership
> - Rebalance (i.e., figure out what partitions this consumer should now
>   consume with the updated set of consumers)
> - Acquire partition ownership
> - Add fetchers to those partitions and resume consumption
>
> i.e., rebalances should complete successfully and fetching should
> resume. If you have any rebalance failures (search for "can't
> rebalance after") then the consumer will effectively stop.
>
> From later in this thread it seems your consumer somehow got into a
> weird state in zookeeper, so your only recourse at this point may be
> to stop all your consumers and restart.
>
> Thanks,
>
> Joel
>
> > If the fetchers get shut down due to a ClosedByInterruptException in the
> > "leader_finder" thread, which tells the "executor_watcher" thread to
> > shut down the fetchers, that would be another reason the consumers stop
> > processing data.  Is this possible?
> >
>
> >
> > -Original Message-
> > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> > Sent: Friday, January 10, 2014 11:40 AM
> > To: users@kafka.apache.org
> > Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
> >
> > It would be helpful if you guys can shed some light why all fetchers are
> getting stopped.
> >
> > -Original Message-
> > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> > Sent: Friday, January 10, 2014 11:28 AM
> > To: users@kafka.apache.org
> > Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
> >
> > We also got the below error when this happens.
> >
> > {2014-01-10 00:58:11,292} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) -
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> exception during rebalance
> > org.I0Itec.zkclient.exception.ZkNoNodeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> > at
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> > at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> > at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> > at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> > at kafka.utils.ZkUtils$.readData(Unknown Source)
> > at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
> > at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
> Source)
> > at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
> Source)
> > at
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> > at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
> Source)
> > at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> > at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> > at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
> > at
> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
> > at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
> > at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
> > at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> > ... 9 more
> >
> > -Original Message-
> > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> > Sent: Friday, January 10, 2014 10:52 

Consumers can't connect while broker is under load

2014-01-10 Thread Tim Kellogg
Hi,

I have a cluster of 12 brokers receiving 10,000 msg/s from producers where each 
message is roughly 2.5KB. We also have 12 ZooKeepers and everything is on AWS. 
Under these conditions, top (the Linux utility) reports around 10-15 out of 32 
for system load, so we’re at less than half capacity. 

When under this load consumers take a very long time, often more than 30 
minutes, to connect to the brokers. When under no load they connect 
immediately. Why is this happening?

Thanks,

Tim Kellogg
Sr. Software Engineer, Protocols
2lemetry
@kellogh

Re: Mirroring datacenters without vpn

2014-01-10 Thread Joel Koshy
> 
> Ops proposed to set up mirror to work over open internet channel without
> secured vpn. Security of this particular data is not a concern and, as I
> understood, it will give us more bandwidth (unless we buy some extra
> hardware, lot's of internal details there).
> 
> Is this configuration possible at all? Have anyone tried/using such
> configuration? I'd appreciate any feedback.
> 
> Major source of confusion is how MirrorMaker/other producers would handle
> external names for the brokers. As I understand, producer connects to the
> broker in the configuration only to bootstrap (get list of all available
> brokers), and after that talks to the brokers received during
> bootstrapping. So local clients won't work (or will route to external
> interface) if I configure brokers to use external names. Remote clients
> won't work if internal names configured.
> Is there some reasonable way to configure kafka to support such scenario?

Would this feature help in your case:
https://issues.apache.org/jira/browse/KAFKA-1092
i.e., you can configure the broker to publish a separate hostname to
ZooKeeper, which is what the producers should use when actually sending
data. So you would need to override the advertised.host.name and port
properties.
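
For example, a hedged server.properties sketch (hypothetical names; the
advertised.* properties come from KAFKA-1092, so they require a build that
includes that patch):

    # Interface the broker binds to; local clients resolve this name.
    host.name=kafka1.dc1.internal
    # Name and port published to ZooKeeper for remote producers/consumers.
    advertised.host.name=kafka1.example.com
    advertised.port=9092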

> 
> Also, should I run MirrorMaker in the same DC as central kafka cluster or
> multiple MirrorMakers in remote DCs?
> 
> Any description of how it is setup in your case is helpful. Do you use vpn
> between DCs? Where do you run MirrorMaker - in central dc or in remote and
> why?

We generally run the mirror-maker in the target data center, i.e., we
do a remote consume but a local produce. If you have a flaky connection
between the two clusters, the consumers may hit session
expirations and rebalances, reducing the overall throughput. You can
also do local consumption and remote produce, although we have not
tried that. In either case you will need to set a high socket buffer
to help amortize the high network latencies.

Thanks,

Joel



Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Joel Koshy
If a consumer rebalances for any reason (e.g., if a consumer
in the group has a soft failure such as a long GC) then the fetchers
are stopped as part of the rebalance process. The sequence is as
follows:

- Stop fetchers
- Commit offsets
- Release partition ownership
- Rebalance (i.e., figure out what partitions this consumer should now
  consume with the updated set of consumers)
- Acquire partition ownership
- Add fetchers to those partitions and resume consumption

i.e., rebalances should complete successfully and fetching should
resume. If you have any rebalance failures (search for "can't
rebalance after") then the consumer will effectively stop.

From later in this thread it seems your consumer somehow got into a
weird state in zookeeper, so your only recourse at this point may be
to stop all your consumers and restart.

Thanks,

Joel

> If the fetchers get shut down due to a ClosedByInterruptException in the 
> "leader_finder" thread, which tells the "executor_watcher" thread to shut 
> down the fetchers, that would be another reason the consumers stop processing 
> data.  Is this possible?
> 

> 
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
> Sent: Friday, January 10, 2014 11:40 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
> data
> 
> It would be helpful if you guys can shed some light why all fetchers are 
> getting stopped.
> 
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
> Sent: Friday, January 10, 2014 11:28 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
> data
> 
> We also got the below error when this happens.
> 
> {2014-01-10 00:58:11,292} INFO  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>  (?:?) - 
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], 
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> at 
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> at kafka.utils.ZkUtils$.readData(Unknown Source)
> at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
> at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
>  Source)
> at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
>  Source)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
>  Source)
> at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
>  Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for 
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
> at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> ... 9 more
> 
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 10:52 AM
> To: users@kafka.apache.org
> Subject: Looks like consumer fetchers get stopped we are not getting any data
> 
> Please let us know why we are not getting any data from Kafka after this
> log entry.
> 
> What could be causing all the associated fetchers to be stopped, and why is
> it not retrying?
> 
> {2014-01-10 00:58:09,284} WARN  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
>  (?:?) - Fetching topic metadata with correlation id 3 for topics 
> [Set(account-info-updated)] from broker 
> [id:1,host:tm1-kafkabroker101,port:9092] failed 
> java.nio.channels.ClosedByInterruptException
> 

RE: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Seshadri, Balaji
Any clue would be helpful.

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
Sent: Friday, January 10, 2014 12:46 PM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

Yes, the rebalance begins and exceptions occur.


{2014-01-10 00:58:11,293} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
Cleared the data chunks in all the consumer message iterators
{2014-01-10 00:58:11,293} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
Committing all offsets after clearing the fetcher queues
{2014-01-10 00:58:11,298} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:759) - Got ping response for sessionid: 0x1437b2879870005 after 0ms
{2014-01-10 00:58:11,313} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
begin rebalancing consumer 
account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
{2014-01-10 00:58:11,314} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 627,8  replyHeader:: 
627,51539619966,0  request:: '/brokers/ids,F  response:: v{'1}
{2014-01-10 00:58:11,315} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 628,4  replyHeader:: 
628,51539619966,0  request:: '/brokers/ids/1,F  response:: 
#7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
{2014-01-10 00:58:11,316} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 629,4  replyHeader:: 
629,51539619966,-101  request:: 
'/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
  response::
{2014-01-10 00:58:11,316} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(Unknown Source)
at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
:
-Original Message-
From: Guozhang Wang [mailto:wangg...@gmail.com]
Sent: Friday, January 10, 2014 12:30 PM
To: users@kafka.apache.org
Subject: Re: Looks like consumer fetchers get stopped we are not getting any 
data

Actually, the broken channel is broken by shuttin

Mirroring datacenters without vpn

2014-01-10 Thread Andrey Yegorov
Hi,

I am trying to figure out best deployment plan and configuration with ops
to ship new version of our system that will use kafka. Multiple
geo-distributed datacenters are a given, and we are planning to build
central DC to aggregate the data.

Ops proposed to set up the mirror to work over an open internet channel without
a secured VPN. Security of this particular data is not a concern and, as I
understood, it will give us more bandwidth (unless we buy some extra
hardware; lots of internal details there).

Is this configuration possible at all? Has anyone tried, or is anyone using,
such a configuration? I'd appreciate any feedback.

Major source of confusion is how MirrorMaker/other producers would handle
external names for the brokers. As I understand, producer connects to the
broker in the configuration only to bootstrap (get list of all available
brokers), and after that talks to the brokers received during
bootstrapping. So local clients won't work (or will route to external
interface) if I configure brokers to use external names. Remote clients
won't work if internal names configured.
Is there some reasonable way to configure Kafka to support such a scenario?
So far I only tried opening ssh tunnel from devbox to remote machine and
configuring local producer to talk to localhost, it failed as described
above.


Also, should I run MirrorMaker in the same DC as central kafka cluster or
multiple MirrorMakers in remote DCs?

Any description of how it is setup in your case is helpful. Do you use vpn
between DCs? Where do you run MirrorMaker - in central dc or in remote and
why?

A lot of questions; thank you in advance for your answers.

--
Andrey Yegorov


Re: Velocity on local machine

2014-01-10 Thread Neha Narkhede
What version of Kafka are you benchmarking?
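
For reference, on 0.8 switching to the async producer is mostly a config
change; a hedged sketch with hypothetical broker and topic names:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");    // batch sends in the background
            props.put("batch.num.messages", "200"); // messages per async batch
            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
            producer.close();
        }
    }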


On Fri, Jan 10, 2014 at 8:36 AM, Klaus Schaefers <
klaus.schaef...@ligatus.com> wrote:

> Hi,
>
> during my test the CPU load is quite low, roughly 50 percent, sometimes
> peaking to 70%.
>
> >Are you using the sync producer per chance?
> I now enforced async and got a huge improvement in one of my test
> cases. Let me explore the rest a little bit more.
>
> Cheers,
>
> Klaus
>
>
>
>
> On Fri, Jan 10, 2014 at 2:36 PM, Magnus Edenhill 
> wrote:
>
> > 2k msgs/s is silly, unless your messages are 10MB each, so something is
> > clearly wrong.
> >
> > * What is the CPU usage and IO load when running your performance tests?
> > * Are you using the sync producer per chance? Maybe combined with an
> > aggressive log.flush.interval?
> > * For reference, and to find out where the bottleneck is, try running
> > rdkafka_performance [1] according to the previous link I posted:
> >   if performance does not increase then the broker is the problem,
> > otherwise it is the producer/consumers.
> >
> > Regards,
> > Magnus
> >
> > [1]: https://github.com/edenhill/librdkafka/tree/master/examples
> >
> >
> >
> > 2014/1/10 Klaus Schaefers 
> >
> > > Hi,
> > >
> > > I have close to 2k messages per second. My machine is just a 4-core
> > > i5, but I would expect more messages. I ran Kafka with the default
> > > settings.
> > >
> > >
> > > On Fri, Jan 10, 2014 at 12:31 PM, Magnus Edenhill  > > >wrote:
> > >
> > > > What performance numbers did you see?
> > > >
> > > > For reference you can check the following tests that were also run on
> > the
> > > > same machine as the broker:
> > > >
> > > >
> > >
> >
> https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
> > > >
> > > > Do they correspond to your numbers?
> > > >
> > > > Consumer thruput is not included in that document but typically peaks
> > at
> > > > around 3 million msgs/s when run on the same host and with hot disk
> > > caches.
> > > >
> > > > Regards,
> > > > Magnus
> > > >
> > > >
> > > > 2014/1/10 Klaus Schaefers 
> > > >
> > > > > Hi,
> > > > >
> > > > > I am currently benchmarking Kafka against ActiveMQ and I got some
> > > > > results that surprised me quite a lot. ActiveMQ managed to deliver
> > > > > 4x more messages when running locally. But from all that I was
> > > > > reading, I am a little bit surprised; honestly, I expected Kafka to
> > > > > outperform ActiveMQ. So I am left with the feeling that I configured
> > > > > something wrong or made some other mistake.
> > > > >
> > > > > My setup looks like this:
> > > > >
> > > > > - One local broker
> > > > > - 10 Topic / Queues
> > > > > - 1 producer that dispatches messages randomly to the topics
> > > > > - 1 consumer per topic
> > > > >
> > > > > I basically used the example on the Kafka web page.
> > > > >
> > > > > Also I encountered some issues when increasing the number of
> > > > > topics, let's say to 100. In this case the consumer cannot connect
> > > > > to ZooKeeper...
> > > > >
> > > > > Does anybody has an idea how to improve the performance?
> > > > >
> > > > >
> > > > > Thx,
> > > > >
> > > > > Klaus
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > --
> > > > >
> > > > > Klaus Schaefers
> > > > > Senior Optimization Manager
> > > > >
> > > > > Ligatus GmbH
> > > > > Hohenstaufenring 30-32
> > > > > D-50674 Köln
> > > > >
> > > > > Tel.:  +49 (0) 221 / 56939 -784
> > > > > Fax:  +49 (0) 221 / 56 939 - 599
> > > > > E-Mail: klaus.schaef...@ligatus.com
> > > > > Web: www.ligatus.de
> > > > >
> > > > > HRB Köln 56003
> > > > > Geschäftsführung:
> > > > > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > > > > Dipl.-Wirtschaftsingenieur Arne Wolter
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > --
> > >
> > > Klaus Schaefers
> > > Senior Optimization Manager
> > >
> > > Ligatus GmbH
> > > Hohenstaufenring 30-32
> > > D-50674 Köln
> > >
> > > Tel.:  +49 (0) 221 / 56939 -784
> > > Fax:  +49 (0) 221 / 56 939 - 599
> > > E-Mail: klaus.schaef...@ligatus.com
> > > Web: www.ligatus.de
> > >
> > > HRB Köln 56003
> > > Geschäftsführung:
> > > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > > Dipl.-Wirtschaftsingenieur Arne Wolter
> > >
> >
>
>
>
> --
>
> --
>
> Klaus Schaefers
> Senior Optimization Manager
>
> Ligatus GmbH
> Hohenstaufenring 30-32
> D-50674 Köln
>
> Tel.:  +49 (0) 221 / 56939 -784
> Fax:  +49 (0) 221 / 56 939 - 599
> E-Mail: klaus.schaef...@ligatus.com
> Web: www.ligatus.de
>
> HRB Köln 56003
> Geschäftsführung:
> Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> Dipl.-Wirtschaftsingenieur Arne Wolter
>


RE: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Seshadri, Balaji
Yes, the rebalance begins and exceptions occur.


{2014-01-10 00:58:11,293} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
Cleared the data chunks in all the consumer message iterators
{2014-01-10 00:58:11,293} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
Committing all offsets after clearing the fetcher queues
{2014-01-10 00:58:11,298} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:759) - Got ping response for sessionid: 0x1437b2879870005 after 0ms
{2014-01-10 00:58:11,313} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
begin rebalancing consumer 
account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
{2014-01-10 00:58:11,314} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 627,8  replyHeader:: 
627,51539619966,0  request:: '/brokers/ids,F  response:: v{'1}
{2014-01-10 00:58:11,315} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 628,4  replyHeader:: 
628,51539619966,0  request:: '/brokers/ids/1,F  response:: 
#7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
{2014-01-10 00:58:11,316} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 629,4  replyHeader:: 
629,51539619966,-101  request:: 
'/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
  response::
{2014-01-10 00:58:11,316} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(Unknown Source)
at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
:
-Original Message-
From: Guozhang Wang [mailto:wangg...@gmail.com]
Sent: Friday, January 10, 2014 12:30 PM
To: users@kafka.apache.org
Subject: Re: Looks like consumer fetchers get stopped we are not getting any 
data

Actually, the broken channel is caused by shutting down the 
leader-finder-thread, which is shut down either by a rebalance retry or by 
shutting down the consumer.

Do you see "begin rebalance ..." before this log entry? And if yes, search to 
see if the rebalance keeps failing.

Guozhang


On Fri, Jan 10, 2014 at 11:

Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jason Rosenberg
well, not currently, as we don't have multiple partitions for the
topics... but yes, I understand that would help too.

but, we are using this multiple-consumers-within-a-process approach in
general with much success so far... just was curious about this ERROR I
was seeing :)
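
For context, the multi-connector setup looks roughly like this; a sketch
against the 0.8 high-level consumer API, with hypothetical hosts, group, and
topic patterns:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.Whitelist;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class MultiConnectorSetup {
        static ConsumerConnector connector(String clientId, String topicRegex) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181"); // hypothetical host
            props.put("group.id", "my-group");          // all connectors share the group
            props.put("client.id", clientId);           // per-connector metrics
            props.put("auto.offset.reset", "largest");  // used when the offset is out of range
            ConsumerConnector c =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // One filtered stream per connector; filters should not overlap.
            c.createMessageStreamsByFilter(new Whitelist(topicRegex), 1);
            return c;
        }
    }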


On Fri, Jan 10, 2014 at 11:06 AM, Jun Rao  wrote:

> Could you increase parallelism on the consumers?
>
> Thanks,
>
> Jun
>
>
> On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg  wrote:
>
> > The consumption rate is a little better after the refactoring.  The main
> > issue though, was that we had a mismatch between large and small topics.
>  A
> > large topic can lag, and adversely affect consumption of other topics, so
> > this is an attempt to isolate topic filtering, and better balance the
> > consumers for the different topics.
> >
> > So, it's definitely working on that score.
> >
> > The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> > doing that before and after the refactor (and after we started also
> seeing
> > the ERROR logging).  But consumption of all other topics is working
> better
> > now (almost no lag at all).
> >
> > I'm also setting the client.id for each consumer in the process, so
> that I
> > can see the individual metrics per consumer.
> >
> > Jason
> >
> >
> > On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
> >
> > > Does the consumption rate in the client (msg/sec) change significantly
> > > after the refactoring?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> > wrote:
> > >
> > > > Yes, it's happening continuously, at the moment (although I'm
> expecting
> > > the
> > > > consumer to catch up soon)
> > > >
> > > > It seemed to start happening after I refactored the consumer app to
> use
> > > > multiple consumer connectors in the same process (each one has a
> > separate
> > > > topic filter, so should be no overlap between them).  All using the
> > same
> > > > consumer group.
> > > >
> > > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > > (seems
> > > > unlikely).
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > > >
> > > > > Normally, if the consumer can't keep up, you should just see the
> > > > > OffsetOutOfRangeException warning. The offset mismatch error should
> > > never
> > > > > happen. It could be that OffsetOutOfRangeException exposed a bug.
> Do
> > > you
> > > > > think you can reproduce this easily?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > I'm not sure I understand your question, wrt produced data?
> > > > > >
> > > > > > But yes, in general, I believe the consumer is not keeping up
> with
> > > the
> > > > > > broker's deleting the data.  So it's trying to fetch the next
> batch
> > > of
> > > > > > data, but it's last offset is no longer there, etc.  So that's
> the
> > > > reason
> > > > > > for the WARN message, in the fetcher thread.
> > > > > >
> > > > > > I'm just not sure I understand then why we don't always see the
> > > > > > ConsumerIterator error also, because won't there always be
> missing
> > > data
> > > > > > detected there?  Why sometimes and not always?  What's the
> > > difference?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > The WARN and ERROR may not be completely correlated. Could it
> be
> > > that
> > > > > the
> > > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> > j...@squareup.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > So, sometimes I just get the WARN from the
> > ConsumerFetcherThread
> > > > (as
> > > > > > > > previously noted, above), e.g.:
> > > > > > > >
> > > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > > 16163904970
> > > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > > 16175326044
> > > > > > > >
> > > > > > > > More recently, I see these in the following log line (not
> sure
> > > why
> > > > I
> > > > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > > > >
> > > > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > > > consumer.ConsumerIterator - consumed offset: 16163904970
> > doesn't
> > > > > match
> > > > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > > > 16175330598:
> > > > > > > > consumed offset = 16163904970;
> > > > > > > >  Consumer may lose data
> > > > > 

Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Guozhang Wang
Actually, the broken channel is caused by shutting down the
leader-finder-thread, which is shut down either by a rebalance retry or
by shutting down the consumer.

Do you see "begin rebalance ..." before this log entry? And if yes, search
to see if the rebalance keeps failing.

Guozhang


On Fri, Jan 10, 2014 at 11:23 AM, Guozhang Wang  wrote:

> From your logs the channel with the brokers is broken; are the brokers
> alive at that time?
>
> Guozhang
>
>
> On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert  > wrote:
>
>> The core problem is our consumers stop consuming and lag increases.  We
>> found this blog:
>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?.
>>  This lists 3 possibilities.
>>
>> The blog also talks earlier about spurious rebalances, due to improper GC
>> settings, but we couldn't find what GC settings to use.  We are considering
>> changing the zookeeper timeouts.  We are a little confused about the
>> various issues, the sequence of issues, and what could cause the consumers
>> to stop reading.  If the fetchers get shut down due to a
>> ClosedByInterruptException in the "leader_finder" thread, which tells the
>> "executor_watcher" thread to shut down the fetchers, that would be another
>> reason the consumers stop processing data.  Is this possible?
>>
>> Thank you,
>> rob
>>
>> -Original Message-
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 11:40 AM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not getting
>> any data
>>
>> It would be helpful if you guys could shed some light on why all the
>> fetchers are getting stopped.
>>
>> -Original Message-
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 11:28 AM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not getting
>> any data
>>
>> We also got the below error when this happens.
>>
>> {2014-01-10 00:58:11,292} INFO
>>  
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) -
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
>> exception during rebalance
>> org.I0Itec.zkclient.exception.ZkNoNodeException:
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>> NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>> at
>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>> at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>> at kafka.utils.ZkUtils$.readData(Unknown Source)
>> at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
>> at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
>> Source)
>> at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
>> Source)
>> at
>> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>> at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
>> Source)
>> at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
>> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>> at
>> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
>> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
>> at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>> ... 9 more
>>
>> -Original Message-
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 10:52 AM
>> To: users@kafka.apache.org
>> Subject: Looks like consumer fetchers get stopped we are not getting any
>> data
>>
>> Please let us know why we are not getting any data from Kafka after the
>> following log output. Can you guys let us know?
>>
>> What could be causing all the associated fetchers to be stopped, and why
>> is there no retry?
>>
>> {2014-01-10 00:58:09,284} WARN
>>  
>> [account-

Re: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Guozhang Wang
From your logs the channel with the brokers is broken; are the brokers
alive at that time?

Guozhang


On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert
wrote:

> The core problem is our consumers stop consuming and lag increases.  We
> found this blog:
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?.
>  This lists 3 possibilities.
>
> The blog also talks earlier about spurious rebalances, due to improper GC
> settings, but we couldn't find what GC settings to use.  We are considering
> changing the zookeeper timeouts.  We are a little confused about the
> various issues, the sequence of issues and what could cause the consumers
> to stop reading.  If the fetchers get shut down, due to a
> ClosedByInterruptException in the "leader_finder" thread, which tells the
> "executor_watcher" thread to shut down the fetchers, that would be another
> reason the consumers stop processing data.  Is this possible?
>
> Thank you,
> rob
>
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 11:40 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
>
> It would be helpful if you guys could shed some light on why all the
> fetchers are getting stopped.
>
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 11:28 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
>
> We also got the below error when this happens.
>
> {2014-01-10 00:58:11,292} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) -
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> at
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> at kafka.utils.ZkUtils$.readData(Unknown Source)
> at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
> Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
> Source)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
> Source)
> at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
> at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
> at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
> at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> ... 9 more
>
> -Original Message-
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 10:52 AM
> To: users@kafka.apache.org
> Subject: Looks like consumer fetchers get stopped we are not getting any
> data
>
> Please let us know why we are not getting any data from Kafka after the
> following log output. Can you guys let us know?
>
> What could be causing all the associated fetchers to be stopped, and why
> is there no retry?
>
> {2014-01-10 00:58:09,284} WARN
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
> (?:?) - Fetching topic metadata with correlation id 3 for topics
> [Set(account-info-updated)] from broker
> [id:1,host:tm1-kafkabroker101,port:9092] failed
> java.nio.channels.ClosedByInterruptException
> at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.SocketChannelImpl.write(SocketChanne

RE: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Withers, Robert
The core problem is our consumers stop consuming and lag increases.  We found 
this blog: 
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?.
  This lists 3 possibilities.  

The blog also talks earlier about spurious rebalances, due to improper GC 
settings, but we couldn't find what GC settings to use.  We are considering 
changing the zookeeper timeouts.  We are a little confused about the various 
issues, the sequence of issues and what could cause the consumers to stop 
reading.  If the fetchers get shut down, due to a ClosedByInterruptException in 
the "leader_finder" thread, which tells the "executor_watcher" thread to 
shut down the fetchers, that would be another reason the consumers stop 
processing data.  Is this possible?
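
For what it's worth, here is a minimal sketch of the consumer-side knobs we
are considering (0.8 high-level consumer property names; the values are
guesses for illustration, not recommendations). On the JVM side, flags along
the lines of -XX:+UseParNewGC -XX:+UseConcMarkSweepGC are commonly suggested
to shorten GC pauses so the ZooKeeper session does not expire; treat those
as an assumption to verify, not a verified recipe.

import java.util.Properties;
import kafka.consumer.ConsumerConfig;

// Sketch only: longer ZK session timeout and more rebalance retries, so a
// GC pause is less likely to expire the session and trigger rebalances.
public class ConsumerTimeoutConfig {
    public static ConsumerConfig build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // hypothetical
        props.put("group.id", "mygroup");                   // hypothetical
        props.put("zookeeper.session.timeout.ms", "12000"); // illustrative value
        props.put("zookeeper.sync.time.ms", "2000");
        props.put("rebalance.max.retries", "8");            // illustrative value
        props.put("rebalance.backoff.ms", "2000");
        return new ConsumerConfig(props);
    }
}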

Thank you,
rob

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
Sent: Friday, January 10, 2014 11:40 AM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

It would be helpful if you guys could shed some light on why all the 
fetchers are getting stopped.

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
Sent: Friday, January 10, 2014 11:28 AM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

We also got the below error when this happens.

{2014-01-10 00:58:11,292} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - 
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(Unknown Source)
at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 9 more

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
Sent: Friday, January 10, 2014 10:52 AM
To: users@kafka.apache.org
Subject: Looks like consumer fetchers get stopped we are not getting any data

Please let us know why we are not getting any data from Kafka after the 
following log output. Can you guys let us know?

What could be causing all the associated fetchers to be stopped, and why is 
there no retry?

{2014-01-10 00:58:09,284} WARN  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
 (?:?) - Fetching topic metadata with correlation id 3 for topics 
[Set(account-info-updated)] from broker 
[id:1,host:tm1-kafkabroker101,port:9092] failed 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
at kafka.network.Send$class.writeCompletely(Unknown Source)
at kafka.network.BoundedByteBufferSend.writeCompletely(Unknown Source)
at kafk

RE: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Seshadri, Balaji
It would be helpful if you guys could shed some light on why all the 
fetchers are getting stopped.

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
Sent: Friday, January 10, 2014 11:28 AM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

We also got the below error when this happens.

{2014-01-10 00:58:11,292} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - 
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(Unknown Source)
at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 9 more

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
Sent: Friday, January 10, 2014 10:52 AM
To: users@kafka.apache.org
Subject: Looks like consumer fetchers get stopped we are not getting any data

Please let us know why we are not getting any data from Kafka after the 
following log output. Can you guys let us know?

What could be causing all the associated fetchers to be stopped, and why is 
there no retry?

{2014-01-10 00:58:09,284} WARN  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
 (?:?) - Fetching topic metadata with correlation id 3 for topics 
[Set(account-info-updated)] from broker 
[id:1,host:tm1-kafkabroker101,port:9092] failed 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
at kafka.network.Send$class.writeCompletely(Unknown Source)
at kafka.network.BoundedByteBufferSend.writeCompletely(Unknown Source)
at kafka.network.BlockingChannel.send(Unknown Source)
at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unknown Source)
at kafka.producer.SyncProducer.send(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at 
kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)
{2014-01-10 00:58:09,284} DEBUG 
[account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
 (?:?) - initial fetch offset of account-access:27: fetched offset = 9655: 
consumed offset = 9655 is 9655
{2014-01-10 00:58:09,284} DEBUG 
[bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
 (?:?) - initial consumer offset of bill-generated:11: fetched offset = 152: 
consumed offset = 152 is 152
{2014-01-10 00:58:09,28

RE: Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Seshadri, Balaji
We also got the below error when this happens.

{2014-01-10 00:58:11,292} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - 
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(Unknown Source)
at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 9 more

-Original Message-
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] 
Sent: Friday, January 10, 2014 10:52 AM
To: users@kafka.apache.org
Subject: Looks like consumer fetchers get stopped we are not getting any data

Please let us know why we are not getting any data from Kafka after the 
following log output. Can you guys let us know?

What could be causing all the associated fetchers to be stopped, and why is 
there no retry?

{2014-01-10 00:58:09,284} WARN  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
 (?:?) - Fetching topic metadata with correlation id 3 for topics 
[Set(account-info-updated)] from broker 
[id:1,host:tm1-kafkabroker101,port:9092] failed 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
at kafka.network.Send$class.writeCompletely(Unknown Source)
at kafka.network.BoundedByteBufferSend.writeCompletely(Unknown Source)
at kafka.network.BlockingChannel.send(Unknown Source)
at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unknown Source)
at kafka.producer.SyncProducer.send(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at 
kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)
{2014-01-10 00:58:09,284} DEBUG 
[account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
 (?:?) - initial fetch offset of account-access:27: fetched offset = 9655: 
consumed offset = 9655 is 9655
{2014-01-10 00:58:09,284} DEBUG 
[bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
 (?:?) - initial consumer offset of bill-generated:11: fetched offset = 152: 
consumed offset = 152 is 152
{2014-01-10 00:58:09,284} DEBUG 
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77_watcher_executor]
 (?:?) - 
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77], 
outbound-communications:108: fetched offset = 1689: consumed offset = 1689 
selected new offset 1689
{2014-01-10 00:58:09,284} DEBUG 
[c

Looks like consumer fetchers get stopped we are not getting any data

2014-01-10 Thread Seshadri, Balaji
Please let us know why we are not getting any data from Kafka after the 
following log output. Can you guys let us know?

What could be causing all the associated fetchers to be stopped, and why is 
there no retry?

{2014-01-10 00:58:09,284} WARN  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
 (?:?) - Fetching topic metadata with correlation
id 3 for topics [Set(account-info-updated)] from broker 
[id:1,host:tm1-kafkabroker101,port:9092] failed
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
at kafka.network.Send$class.writeCompletely(Unknown Source)
at kafka.network.BoundedByteBufferSend.writeCompletely(Unknown Source)
at kafka.network.BlockingChannel.send(Unknown Source)
at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
at 
kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unknown Source)
at kafka.producer.SyncProducer.send(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
at 
kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)
{2014-01-10 00:58:09,284} DEBUG 
[account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
 (?:?) - initial fetch offset of account-access:27: fetched offset = 9655: 
consumed offset = 9655 is 9655
{2014-01-10 00:58:09,284} DEBUG 
[bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
 (?:?) - initial consumer offset of bill-generated:11: fetched offset = 152: 
consumed offset = 152 is 152
{2014-01-10 00:58:09,284} DEBUG 
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77_watcher_executor]
 (?:?) - 
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77], 
outbound-communications:108: fetched offset = 1689: consumed offset = 1689 
selected new offset 1689
{2014-01-10 00:58:09,284} DEBUG 
[catalina-exec-3-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1434b49cf56383b, packet:: 
clientPath:null serverPath:null finished:false header:: 279,4  replyHeader:: 
279,51539617506,0  request:: 
'/consumers/outbound-call-attempted-hadoop-consumer/offsets/outbound-call-attempted/14,F
  response:: 
#30,s{39860186414,39860186414,1387517714994,1387517714994,0,0,0,0,1,0,39860186414}
{2014-01-10 00:58:09,285} INFO  
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77_watcher_executor]
 (?:?) - 
[outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77], 
outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77-3 
attempting to claim partition 109
{2014-01-10 00:58:09,284} DEBUG 
[bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
 (?:?) - initial fetch offset of bill-generated:11: fetched offset = 152: 
consumed offset = 152 is 152
{2014-01-10 00:58:09,284} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 619,1  replyHeader:: 
619,51539617508,0  request:: 
'/consumers/account-activated-hadoop-consumer/owners/account-activated/68,#6163636f756e742d6163746976617465642d6861646f6f702d636f6e73756d65725f746d316d7764706c30342d313338393232323535373930362d35363262363733382d30,v{s{31,s{'world,'anyone}}},1
  response:: 
'/consumers/account-activated-hadoop-consumer/owners/account-activated/68
{2014-01-10 00:58:09,284} DEBUG 
[account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
 (?:?) - [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc], 
account-access:27: fetched offset = 9655: consumed offset = 9655 selected new 
offset 9655
{2014-01-10 00:58:09,284} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - 
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
 Shutdown completed
{2014-01-10 00:58:09,284} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
 (?:?) - 
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
 Stopped
{2014-01-10 00:58:09,285} INFO  
[account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
 (?:?) - [ConsumerFetcherManager-1389222553163] Stopping all fetchers



Topic not created if number of live brokers less than # replicas

2014-01-10 Thread Hanish Bansal
Hi All,

As is known, Kafka's behavior is that the number of live brokers cannot be
less than the number of replicas when creating a new topic.

I raised a JIRA (https://issues.apache.org/jira/browse/KAFKA-1182)
proposing an improvement so that the topic is still created; with a
replication factor of n we could then tolerate n-1 broker failures.

As mentioned on the Future Release Plan of Kafka, kafka-0.8.1 is going to be
released this month. Is KAFKA-1182 being considered for this next release?

I just want to know whether there is any plan to resolve this in upcoming
releases.

Thanks in advance !!


-- 
*Regards*
*Hanish Bansal*


Re: Property Name Changes in Different Versions of Kafka

2014-01-10 Thread Monika Garg
Thanks a lot Joe...all my doubts/questions are cleared.

Thanks again...:)


Re: Velocity on local machine

2014-01-10 Thread Klaus Schaefers
Hi,

During my tests the CPU load is quite low, roughly 50 percent, sometimes
peaking at 70%.

>Are you using the sync producer per chance?
I now enforced the async producer and got a huge improvement in one of my
test cases. Let me explore the rest a little bit more.
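
For reference, a minimal sketch of the async settings in question with the
0.8-era producer API (the topic name and the batching values are made up for
illustration):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sketch: async mode batches sends in a background thread instead of one
// synchronous round trip per message.
public class AsyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");        // the key switch
        props.put("batch.num.messages", "200");     // messages per batch
        props.put("queue.buffering.max.ms", "100"); // max buffering delay
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 100000; i++) {
            producer.send(new KeyedMessage<String, String>("benchmark-topic",
                    "msg-" + i));
        }
        producer.close();
    }
}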

Cheers,

Klaus




On Fri, Jan 10, 2014 at 2:36 PM, Magnus Edenhill  wrote:

> 2k msgs/s is silly, unless your messages are 10MB each, so something is
> clearly wrong.
>
> * What is the CPU usage and IO load when running your performance tests?
> * Are you using the sync producer per chance? Maybe combined with an
> aggressive log.flush.interval?
> * For reference, and to find out where the bottleneck is, try running
> rdkafka_performance [1] according to the previous link I posted:
>   if performance does not increase then the broker is the problem,
> otherwise it is the producer/consumers.
>
> Regards,
> Magnus
>
> [1]: https://github.com/edenhill/librdkafka/tree/master/examples
>
>
>
> 2014/1/10 Klaus Schaefers 
>
> > Hi,
> >
> > I have close to 2k messages per second. My machine is just a 4-core i5,
> > but I would expect more messages. I ran Kafka with the default settings.
> >
> >
> > On Fri, Jan 10, 2014 at 12:31 PM, Magnus Edenhill  > >wrote:
> >
> > > What performance numbers did you see?
> > >
> > > For reference you can check the following tests that were also run on
> the
> > > same machine as the broker:
> > >
> > >
> >
> https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
> > >
> > > Do they correspond to your numbers?
> > >
> > > Consumer thruput is not included in that document but typically peaks
> at
> > > around 3 million msgs/s when run on the same host and with hot disk
> > caches.
> > >
> > > Regards,
> > > Magnus
> > >
> > >
> > > 2014/1/10 Klaus Schaefers 
> > >
> > > > Hi,
> > > >
> > > > I am currently benchmarking Kafka against ActiveMQ and I got some
> > > > results that surprised me quite a lot. ActiveMQ managed to deliver 4x
> > > > more messages when running locally. But from everything I have read I
> > > > am a little bit surprised. Honestly, I expected Kafka to outperform
> > > > ActiveMQ, so I am left with the feeling that I configured something
> > > > wrong or made some other mistake.
> > > >
> > > > My setup looks like this:
> > > >
> > > > - One local broker
> > > > - 10 Topic / Queues
> > > > - 1 producer that dispatches messages randomly to the topics
> > > > - 1 consumer per topic
> > > >
> > > > I basically used the example on the Kafka web page.
> > > >
> > > > Also I encountered some issues when increasing the number of topics
> > > > to, say, 100. In this case the consumer cannot connect to ZooKeeper...
> > > >
> > > > Does anybody have an idea how to improve the performance?
> > > >
> > > >
> > > > Thx,
> > > >
> > > > Klaus
> > > >
> > > >
> > > > --
> > > >
> > > > --
> > > >
> > > > Klaus Schaefers
> > > > Senior Optimization Manager
> > > >
> > > > Ligatus GmbH
> > > > Hohenstaufenring 30-32
> > > > D-50674 Köln
> > > >
> > > > Tel.:  +49 (0) 221 / 56939 -784
> > > > Fax:  +49 (0) 221 / 56 939 - 599
> > > > E-Mail: klaus.schaef...@ligatus.com
> > > > Web: www.ligatus.de
> > > >
> > > > HRB Köln 56003
> > > > Geschäftsführung:
> > > > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > > > Dipl.-Wirtschaftsingenieur Arne Wolter
> > > >
> > >
> >
> >
> >
> > --
> >
> > --
> >
> > Klaus Schaefers
> > Senior Optimization Manager
> >
> > Ligatus GmbH
> > Hohenstaufenring 30-32
> > D-50674 Köln
> >
> > Tel.:  +49 (0) 221 / 56939 -784
> > Fax:  +49 (0) 221 / 56 939 - 599
> > E-Mail: klaus.schaef...@ligatus.com
> > Web: www.ligatus.de
> >
> > HRB Köln 56003
> > Geschäftsführung:
> > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > Dipl.-Wirtschaftsingenieur Arne Wolter
> >
>



-- 

-- 

Klaus Schaefers
Senior Optimization Manager

Ligatus GmbH
Hohenstaufenring 30-32
D-50674 Köln

Tel.:  +49 (0) 221 / 56939 -784
Fax:  +49 (0) 221 / 56 939 - 599
E-Mail: klaus.schaef...@ligatus.com
Web: www.ligatus.de

HRB Köln 56003
Geschäftsführung:
Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
Dipl.-Wirtschaftsingenieur Arne Wolter


Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jun Rao
Could you increase parallelism on the consumers?
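
For illustration, a minimal sketch (topic and group names are hypothetical)
of one way to do that with the 0.8 high-level consumer: ask for several
streams on the topic and give each stream its own thread. Parallelism is
still capped by the partition count.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

// Sketch: several KafkaStreams for one topic, each drained on its own thread.
public class ParallelConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // hypothetical
        props.put("group.id", "mygroup");                 // hypothetical
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        int numStreams = 4; // arbitrary
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("mytopic", numStreams);
        List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreams(topicCountMap).get("mytopic");

        ExecutorService pool = Executors.newFixedThreadPool(numStreams);
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        MessageAndMetadata<byte[], byte[]> msg = it.next();
                        // process msg.message() here
                    }
                }
            });
        }
    }
}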

Thanks,

Jun


On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg  wrote:

> The consumption rate is a little better after the refactoring.  The main
> issue though, was that we had a mismatch between large and small topics.  A
> large topic can lag, and adversely affect consumption of other topics, so
> this is an attempt to isolate topic filtering, and better balance the
> consumers for the different topics.
>
> So, it's definitely working on that score.
>
> The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> doing that before and after the refactor (and after we started also seeing
> the ERROR logging).  But consumption of all other topics is working better
> now (almost no lag at all).
>
> I'm also setting the client.id for each consumer in the process, so that I
> can see the individual metrics per consumer.
>
> Jason
>
>
> On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
>
> > Does the consumption rate in the client (msg/sec) change significantly
> > after the refactoring?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> wrote:
> >
> > > Yes, it's happening continuously, at the moment (although I'm expecting
> > the
> > > consumer to catch up soon)
> > >
> > > It seemed to start happening after I refactored the consumer app to use
> > > multiple consumer connectors in the same process (each one has a
> separate
> > > topic filter, so should be no overlap between them).  All using the
> same
> > > consumer group.
> > >
> > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > (seems
> > > unlikely).
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > >
> > > > Normally, if the consumer can't keep up, you should just see the
> > > > OffsetOutOfRangeException warning. The offset mismatch error should
> > never
> > > > happen. It could be that OffsetOutOfRangeException exposed a bug. Do
> > you
> > > > think you can reproduce this easily?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > I'm not sure I understand your question, wrt produced data?
> > > > >
> > > > > But yes, in general, I believe the consumer is not keeping up with
> > the
> > > > > broker's deleting the data.  So it's trying to fetch the next batch
> > of
> > > > > data, but its last offset is no longer there, etc.  So that's the
> > > reason
> > > > > for the WARN message, in the fetcher thread.
> > > > >
> > > > > I'm just not sure I understand then why we don't always see the
> > > > > ConsumerIterator error also, because won't there always be missing
> > data
> > > > > detected there?  Why sometimes and not always?  What's the
> > difference?
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > > > >
> > > > > > The WARN and ERROR may not be completely correlated. Could it be
> > that
> > > > the
> > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> j...@squareup.com>
> > > > > wrote:
> > > > > >
> > > > > > > So, sometimes I just get the WARN from the
> ConsumerFetcherThread
> > > (as
> > > > > > > previously noted, above), e.g.:
> > > > > > >
> > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > 16163904970
> > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > 16175326044
> > > > > > >
> > > > > > > More recently, I see these in the following log line (not sure
> > why
> > > I
> > > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > > >
> > > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > > consumer.ConsumerIterator - consumed offset: 16163904970
> doesn't
> > > > match
> > > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > > 16175330598:
> > > > > > > consumed offset = 16163904970;
> > > > > > >  Consumer may lose data
> > > > > > >
> > > > > > > Why would I not see this second ERROR everytime there's a
> > > > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > > > >
> > > > > > > Should I only be concerned about possible lost data if I see
> the
> > > > > > > second ERROR log line?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <
> > j...@squareup.com
> > > >
> > > > > > wrote:
> > > > > > > > But I assume this would not normally be something you'd want
> > > > > > > > to log (every incoming producer request?).  Maybe just for
> > > > > > > > debugging?  Or is it only
> > 

Re: custom kafka consumer - strangeness

2014-01-10 Thread Jun Rao
Are the offsets used in the 2 fetch requests the same? If so, you will get
the same messages twice. Your consumer is responsible for advancing the
offsets after consumption.
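
For illustration, a minimal sketch with the 0.8 Java SimpleConsumer of what
advancing the offset looks like (host, topic, and client id are
hypothetical; leader lookup and error handling omitted):

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

// Sketch: each fetch must use an offset advanced past the last consumed
// message, otherwise the broker returns the same messages again.
public class FetchLoopExample {
    public static void main(String[] args) {
        SimpleConsumer consumer = new SimpleConsumer(
                "localhost", 9092, 100000, 64 * 1024, "test-client");
        String topic = "mytopic"; // hypothetical
        int partition = 0;
        long offset = 0L; // normally obtained via the offset API
        while (true) {
            FetchRequest req = new FetchRequestBuilder()
                    .clientId("test-client")
                    .addFetch(topic, partition, offset, 100000)
                    .build();
            FetchResponse resp = consumer.fetch(req);
            for (MessageAndOffset mao : resp.messageSet(topic, partition)) {
                // process mao.message() here ...
                offset = mao.nextOffset(); // advance past this message
            }
        }
    }
}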

Thanks,

Jun


On Thu, Jan 9, 2014 at 1:00 PM, Gerrit Jansen van Vuuren <
gerrit...@gmail.com> wrote:

> Hi,
>
> I'm writing a custom consumer for kafka 0.8.
> Everything works except for the following:
>
> a. connect, send fetch, read all results
> b. send fetch
> c. send fetch
> d. send fetch
> e. via the console publisher, publish 2 messages
> f. send fetch :corr-id 1
> g. read 2 messages published :offsets [10 11] :corr-id 1
> h. send fetch :corr-id 2
> i. read 2 messages published :offsets [10 11] :corr-id 2
> j.  send fetch ...
>
> The problem is I get the messages sent twice as a response to two separate
> fetch requests. The correlation id is distinct so it cannot be that I read
> the response twice. The correlation id is distinct so it cannot be that I read
> the response twice. The offsets of the 2 messages are the same so they
> are duplicates, and it's not the producer sending the messages twice.
>
> Note: the same connection is kept open the whole time, and I send
> block,receive then send again, after the first 2 messages are read, the
> offsets are incremented and the next fetch will ask kafka to give it
> messages from the new offsets.
>
> any ideas of why kafka would be sending the messages again on the second
> fetch request?
>
> Regards,
>  Gerrit
>


Re: Velocity on local machine

2014-01-10 Thread Magnus Edenhill
2k msgs/s is silly, unless your messages are 10MB each, so something is
clearly wrong.

* What is the CPU usage and IO load when running your performance tests?
* Are you using the sync producer per chance? Maybe combined with an
aggressive log.flush.interval?
* For reference, and to find out where the bottleneck is, try running
rdkafka_performance [1] according to the previous link I posted:
  if performance does not increase then the broker is the problem,
otherwise it is the producer/consumers.

Regards,
Magnus

[1]: https://github.com/edenhill/librdkafka/tree/master/examples



2014/1/10 Klaus Schaefers 

> Hi,
>
> I have close to 2k messages per second. My machine is just a 4-core i5,
> but I would expect more messages. I ran Kafka with the default settings.
>
>
> On Fri, Jan 10, 2014 at 12:31 PM, Magnus Edenhill  >wrote:
>
> > What performance numbers did you see?
> >
> > For reference you can check the following tests that were also run on the
> > same machine as the broker:
> >
> >
> https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
> >
> > Do they correspond to your numbers?
> >
> > Consumer thruput is not included in that document but typically peaks at
> > around 3 million msgs/s when run on the same host and with hot disk
> caches.
> >
> > Regards,
> > Magnus
> >
> >
> > 2014/1/10 Klaus Schaefers 
> >
> > > Hi,
> > >
> > > I am currently benchmarking Kafka against ActiveMQ and I got some
> > > results that surprised me quite a lot. ActiveMQ managed to deliver 4x
> > > more messages when running locally. But from everything I have read I am
> > > a little bit surprised. Honestly, I expected Kafka to outperform
> > > ActiveMQ, so I am left with the feeling that I configured something
> > > wrong or made some other mistake.
> > >
> > > My setup looks like this:
> > >
> > > - One local broker
> > > - 10 Topic / Queues
> > > - 1 producer that dispatches messages randomly to the topics
> > > - 1 consumer per topic
> > >
> > > I basically used the example on the Kafka web page.
> > >
> > > Also I encountered some issues when increasing the number of topics
> > > to, say, 100. In this case the consumer cannot connect to ZooKeeper...
> > >
> > > Does anybody have an idea how to improve the performance?
> > >
> > >
> > > Thx,
> > >
> > > Klaus
> > >
> > >
> > > --
> > >
> > > --
> > >
> > > Klaus Schaefers
> > > Senior Optimization Manager
> > >
> > > Ligatus GmbH
> > > Hohenstaufenring 30-32
> > > D-50674 Köln
> > >
> > > Tel.:  +49 (0) 221 / 56939 -784
> > > Fax:  +49 (0) 221 / 56 939 - 599
> > > E-Mail: klaus.schaef...@ligatus.com
> > > Web: www.ligatus.de
> > >
> > > HRB Köln 56003
> > > Geschäftsführung:
> > > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > > Dipl.-Wirtschaftsingenieur Arne Wolter
> > >
> >
>
>
>
> --
>
> --
>
> Klaus Schaefers
> Senior Optimization Manager
>
> Ligatus GmbH
> Hohenstaufenring 30-32
> D-50674 Köln
>
> Tel.:  +49 (0) 221 / 56939 -784
> Fax:  +49 (0) 221 / 56 939 - 599
> E-Mail: klaus.schaef...@ligatus.com
> Web: www.ligatus.de
>
> HRB Köln 56003
> Geschäftsführung:
> Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> Dipl.-Wirtschaftsingenieur Arne Wolter
>


Re: Velocity on local machine

2014-01-10 Thread Gerrit Jansen van Vuuren
Have you tried using more producers?
The Kafka broker is performant, but the client producer's performance is
not what it should be.

You can also have a look at tuning the number of the Kafka broker's network
and I/O threads.
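
For illustration, a minimal sketch of running several producer instances in
parallel with the 0.8 Java API (broker address, topic name, and counts are
made up). On the broker side the relevant server.properties knobs are
num.network.threads and num.io.threads; suitable values depend on the
hardware.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sketch: one Producer instance per thread, all writing to the same
// (hypothetical) topic, to get more client-side parallelism.
public class MultiProducerExample {
    public static void main(String[] args) {
        final int numProducers = 4; // arbitrary
        for (int p = 0; p < numProducers; p++) {
            new Thread(new Runnable() {
                public void run() {
                    Properties props = new Properties();
                    props.put("metadata.broker.list", "localhost:9092");
                    props.put("serializer.class", "kafka.serializer.StringEncoder");
                    props.put("producer.type", "async");
                    Producer<String, String> producer =
                            new Producer<String, String>(new ProducerConfig(props));
                    for (int i = 0; i < 100000; i++) {
                        producer.send(new KeyedMessage<String, String>(
                                "benchmark-topic", "msg-" + i));
                    }
                    producer.close();
                }
            }).start();
        }
    }
}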


Regards,
 Gerrit


On Fri, Jan 10, 2014 at 1:06 PM, Klaus Schaefers <
klaus.schaef...@ligatus.com> wrote:

> Hi,
>
> I have close to 2k messages per second. My machine is just a 4-core i5,
> but I would expect more messages. I ran Kafka with the default settings.
>
>
> On Fri, Jan 10, 2014 at 12:31 PM, Magnus Edenhill  >wrote:
>
> > What performance numbers did you see?
> >
> > For reference you can check the following tests that were also run on the
> > same machine as the broker:
> >
> >
> https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
> >
> > Do they correspond to your numbers?
> >
> > Consumer thruput is not included in that document but typically peaks at
> > around 3 million msgs/s when run on the same host and with hot disk
> caches.
> >
> > Regards,
> > Magnus
> >
> >
> > 2014/1/10 Klaus Schaefers 
> >
> > > Hi,
> > >
> > > I am currently benchmarking Kafka against ActiveMQ and I got some
> > > results that surprised me quite a lot. ActiveMQ managed to deliver 4x
> > > more messages when running locally. But from everything I have read I am
> > > a little bit surprised. Honestly, I expected Kafka to outperform
> > > ActiveMQ, so I am left with the feeling that I configured something
> > > wrong or made some other mistake.
> > >
> > > My setup looks like this:
> > >
> > > - One local broker
> > > - 10 Topic / Queues
> > > - 1 producer that dispatches messages randomly to the topics
> > > - 1 consumer per topic
> > >
> > > I basically used the example on the Kafka web page.
> > >
> > > Also I encountered some issues when increasing the number of topics
> > > to, say, 100. In this case the consumer cannot connect to ZooKeeper...
> > >
> > > Does anybody have an idea how to improve the performance?
> > >
> > >
> > > Thx,
> > >
> > > Klaus
> > >
> > >
> > > --
> > >
> > > --
> > >
> > > Klaus Schaefers
> > > Senior Optimization Manager
> > >
> > > Ligatus GmbH
> > > Hohenstaufenring 30-32
> > > D-50674 Köln
> > >
> > > Tel.:  +49 (0) 221 / 56939 -784
> > > Fax:  +49 (0) 221 / 56 939 - 599
> > > E-Mail: klaus.schaef...@ligatus.com
> > > Web: www.ligatus.de
> > >
> > > HRB Köln 56003
> > > Geschäftsführung:
> > > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > > Dipl.-Wirtschaftsingenieur Arne Wolter
> > >
> >
>
>
>
> --
>
> --
>
> Klaus Schaefers
> Senior Optimization Manager
>
> Ligatus GmbH
> Hohenstaufenring 30-32
> D-50674 Köln
>
> Tel.:  +49 (0) 221 / 56939 -784
> Fax:  +49 (0) 221 / 56 939 - 599
> E-Mail: klaus.schaef...@ligatus.com
> Web: www.ligatus.de
>
> HRB Köln 56003
> Geschäftsführung:
> Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> Dipl.-Wirtschaftsingenieur Arne Wolter
>


Re: Velocity on local machine

2014-01-10 Thread Klaus Schaefers
Hi,

I have close to 2k messages per second. My machine is just a 4-core i5,
but I would expect more messages. I ran Kafka with the default settings.


On Fri, Jan 10, 2014 at 12:31 PM, Magnus Edenhill wrote:

> What performance numbers did you see?
>
> For reference you can check the following tests that were also run on the
> same machine as the broker:
>
> https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
>
> Do they correspond to your numbers?
>
> Consumer thruput is not included in that document but typically peaks at
> around 3 million msgs/s when run on the same host and with hot disk caches.
>
> Regards,
> Magnus
>
>
> 2014/1/10 Klaus Schaefers 
>
> > Hi,
> >
> > I am currently benchmarking Kafka against ActiveMQ and I got some results
> > that surprised me quite a lot. ActiveMQ managed to deliver 4x more
> > messages when running locally. But from everything I have read I am a
> > little bit surprised. Honestly, I expected Kafka to outperform ActiveMQ,
> > so I am left with the feeling that I configured something wrong or made
> > some other mistake.
> >
> > My setup looks like this:
> >
> > - One local broker
> > - 10 Topic / Queues
> > - 1 producer that dispatches messages randomly to the topics
> > - 1 consumer per topic
> >
> > I basically used the example on the Kafka web page.
> >
> > Also I encountered some issues when increasing the number of topics to,
> > say, 100. In this case the consumer cannot connect to ZooKeeper...
> >
> > Does anybody have an idea how to improve the performance?
> >
> >
> > Thx,
> >
> > Klaus
> >
> >
> > --
> >
> > --
> >
> > Klaus Schaefers
> > Senior Optimization Manager
> >
> > Ligatus GmbH
> > Hohenstaufenring 30-32
> > D-50674 Köln
> >
> > Tel.:  +49 (0) 221 / 56939 -784
> > Fax:  +49 (0) 221 / 56 939 - 599
> > E-Mail: klaus.schaef...@ligatus.com
> > Web: www.ligatus.de
> >
> > HRB Köln 56003
> > Geschäftsführung:
> > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> > Dipl.-Wirtschaftsingenieur Arne Wolter
> >
>



-- 

-- 

Klaus Schaefers
Senior Optimization Manager

Ligatus GmbH
Hohenstaufenring 30-32
D-50674 Köln

Tel.:  +49 (0) 221 / 56939 -784
Fax:  +49 (0) 221 / 56 939 - 599
E-Mail: klaus.schaef...@ligatus.com
Web: www.ligatus.de

HRB Köln 56003
Geschäftsführung:
Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
Dipl.-Wirtschaftsingenieur Arne Wolter


Re: Velocity on local machine

2014-01-10 Thread Magnus Edenhill
What performance numbers did you see?

For reference you can check the following tests that were also run on the
same machine as the broker:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers

Do they correspond to your numbers?

Consumer thruput is not included in that document but typically peaks at
around 3 million msgs/s when run on the same host and with hot disk caches.

Regards,
Magnus


2014/1/10 Klaus Schaefers 

> Hi,
>
> I am currently benchmarking Kafka against ActiveMQ and I got some results
> that surprised me quite a lot. ActiveMQ managed to deliver 4x more messages
> when running locally. But from everything I have read I am a little bit
> surprised. Honestly, I expected Kafka to outperform ActiveMQ, so I am left
> with the feeling that I configured something wrong or made some other
> mistake.
>
> My setup looks like this:
>
> - One local broker
> - 10 Topic / Queues
> - 1 producer that dispatches messages randomly to the topics
> - 1 consumer per topic
>
> I basically used the example on the Kafka web page.
>
> Also I encountered some issues when increasing the number of topics to,
> say, 100. In this case the consumer cannot connect to ZooKeeper...
>
> Does anybody have an idea how to improve the performance?
>
>
> Thx,
>
> Klaus
>
>
> --
>
> --
>
> Klaus Schaefers
> Senior Optimization Manager
>
> Ligatus GmbH
> Hohenstaufenring 30-32
> D-50674 Köln
>
> Tel.:  +49 (0) 221 / 56939 -784
> Fax:  +49 (0) 221 / 56 939 - 599
> E-Mail: klaus.schaef...@ligatus.com
> Web: www.ligatus.de
>
> HRB Köln 56003
> Geschäftsführung:
> Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> Dipl.-Wirtschaftsingenieur Arne Wolter
>


Velocity on local machine

2014-01-10 Thread Klaus Schaefers
Hi,

I am currently benchmarking Kafka against ActiveMQ and I got some results
that surprised me quite a lot. ActiveMQ managed to deliver 4x more messages
when running locally. But from everything I have read I am a little bit
surprised. Honestly, I expected Kafka to outperform ActiveMQ, so I am left
with the feeling that I configured something wrong or made some other
mistake.

My setup looks like this:

- One local broker
- 10 Topic / Queues
- 1 producer that dispatches messages randomly to the topics
- 1 consumer per topic

I basically used the example on the Kafka web page.

Also I encountered some issues when increasing the number of topics to,
say, 100. In this case the consumer cannot connect to ZooKeeper...

Does anybody have an idea how to improve the performance?


Thx,

Klaus


-- 

-- 

Klaus Schaefers
Senior Optimization Manager

Ligatus GmbH
Hohenstaufenring 30-32
D-50674 Köln

Tel.:  +49 (0) 221 / 56939 -784
Fax:  +49 (0) 221 / 56 939 - 599
E-Mail: klaus.schaef...@ligatus.com
Web: www.ligatus.de

HRB Köln 56003
Geschäftsführung:
Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
Dipl.-Wirtschaftsingenieur Arne Wolter