Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Virendra Pratap Singh
That's correct. The server on which I was running 0.8.1.1 was not honoring
this parameter, despite the fact that it was set in its server.properties.
Not sure if this fact plays any role, but the server running 0.8.0 was the
leader for all the topics and partitions in my setup, and the second server
running 0.8.1.1 held all the replicas (followers).

Virendra

On 7/8/14, 12:54 PM, "Guozhang Wang"  wrote:

>Server properties should affect on only the local instance separately. Are
>you saying the property is not honored even on the 0.8.1 machines?
>
>Guozhang
>
>On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
>vpsi...@yahoo-inc.com.invalid> wrote:
>
>> By setting this property
>> log.retention.mins=10
>> in the server.properties file, which is passed as argument when starting
>> the broker.
>>
>> Virendra
>>
>> On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
>>
>> >How do you set the retention.minutes property? Is it through zk-based
>> >topics tool?
>> >
>> >Guozhang
>> >
>> >
>> >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
>> >vpsi...@yahoo-inc.com.invalid> wrote:
>> >
>> >> I am running a mixed cluster as I mentioned earlier. 1 broker 0.8.0
>>and
>> >> the other 0.8.1.1. Should the retention of topics for partitions
>> >> owned/replicated by the broker running 0.8.1.1 not enforce the server
>> >> properties settings as defined for that server.
>> >>
>> >> So this brings an interesting question, in case of heterogeneous
>> >> environment (as is in my case, which system parameters will take
>> >> preference/precedence).
>> >>
>> >> Virendra
>> >>
>> >> On 6/30/14, 9:19 AM, "Guozhang Wang"  wrote:
>> >>
>> >> >The retention.minute property is only introduced in 0.8.1:
>> >> >
>> >> >https://issues.apache.org/jira/browse/KAFKA-918
>> >> >
>> >> >if you are running 0.8.0 then it will not be recognized.
>> >> >
>> >> >Guozhang
>> >> >
>> >> >
>> >> >
>> >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
>> >> >vpsi...@yahoo-inc.com.invalid> wrote:
>> >> >
>> >> >> Running a mixed 2 broker cluster. Mixed as in one of the broker1
>>is
>> >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache release
>>link.
>> >> >> Directly using the tar ball, no local build used).
>> >> >>
>> >> >> I have set the log.retention.minutes=10. However the broker is not
>> >> >> honoring the setting. I see its not cleaning the log.dir at all.
>> >> >>
>> >> >> However when I set the log.retention.hours=1, then it starts
>>cleaning
>> >> >>the
>> >> >> log.
>> >> >>
>> >> >> When I have the log.retention.minutes set in the server.properties
>> >>then
>> >> >>I
>> >> >> see this logged in server.log:
>> >> >>
> >> >> …..
>> >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
>>not
>> >> >>valid
>> >> >> (kafka.utils.VerifiableProperties)
>> >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
>>not
>> >> >>valid
>> >> >> (kafka.utils.VerifiableProperties)
> >> >> ……
>> >> >>
>> >> >>
>> >> >> I have set these properties too:
>> >> >>
>> >> >> log.cleaner.enable=true
>> >> >> log.cleanup.policy=delete
>> >> >>
>> >> >>
>> >> >> But I see similar warning logged for these properties too.
>> >> >>
>> >> >> Regards,
>> >> >> Virendra
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >--
>> >> >-- Guozhang
>> >>
>> >>
>> >
>> >
>> >--
>> >-- Guozhang
>>
>>
>
>
>-- 
>-- Guozhang
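
For reference, the two knobs being discussed in this thread, as a minimal
sketch (the topic name and ZooKeeper address are placeholders; per Guozhang
elsewhere in this thread, 0.8.1 accepts either log.retention.hours or
log.retention.minutes, so double-check the property name against the broker
version in use):

    # broker-wide retention, in server.properties
    log.retention.minutes=10

    # per-topic override, in milliseconds, via the topics tool (0.8.1+)
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
      --topic mytopic --config retention.ms=600000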



Re: status of 0.8.2

2014-07-08 Thread Jun Rao
Yes, 0.8.2 is compatible with 0.8.0 and 0.8.1 in terms of wire protocols
and the upgrade can be done in place.

Thanks,

Jun


On Tue, Jul 8, 2014 at 11:49 AM, Michal Michalski <
michal.michal...@boxever.com> wrote:

> One more question regarding 0.8.2 - is it planned to be an in-place,
> no-downtime release (I'm using 0.8.0 now)? By looking at the version number
> changes only I'd guess it is, but... ;-)
>
> Michal
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
>
> On 8 July 2014 18:22, Joel Koshy  wrote:
>
> > Hi Joe,
> >
> > I had a question for you in that RB since I don't fully understand it.
> > Maybe you can help clarify how the fix works.
> >
> > Thanks,
> >
> > Joel
> >
> > On Tue, Jul 08, 2014 at 10:55:56AM -0400, Joe Stein wrote:
> > > I wrote it, so I can't commit it without other committers agreeing.
> > >
> > > Last I recall I updated the patch from the feedback in the reviewboard
> > but
> > > haven't looked at it in months.
> > >
> > > I am glad though it resolved the issue you were having and we can
> figure
> > > how to get the patch to work with 0.8.1.1 if you run into problems.
> > >
> > > /***
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop 
> > > /
> > >
> > >
> > > On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg 
> > wrote:
> > >
> > > > Is there a blocker to getting the patch for kafka-1180 applied?  Is
> the
> > > > patch for 0.8.0 no longer compatible for trunk?  I'm actually going
> to
> > see
> > > > if I can get it to work for 0.8.1.1 today.
> > > >
> > > > Thanks,
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> > > >
> > > > > Two biggest features in 0.8.2 are Kafka-based offset management and
> > the
> > > > new
> > > > > producer. We are in the final stage of testing them. We also
> haven't
> > > > fully
> > > > > tested the delete topic feature. So, we are probably 4-6 weeks away
> > from
> > > > > releasing 0.8.2.
> > > > >
> > > > > For kafka-1180, the patch hasn't been applied yet and we will need
> a
> > > > patch
> > > > > for trunk.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > > > wrote:
> > > > >
> > > > > > What's the status for an 0.8.2 release?  We are currently using
> > 0.8.0,
> > > > > and
> > > > > > would like to upgrade to take advantage of some of the per-topic
> > > > > retention
> > > > > > options available now in 0.8.1.
> > > > > >
> > > > > > However, we'd also like to take advantage of some fixes coming in
> > 0.8.2
> > > > > > (e.g. deleting topics).
> > > > > >
> > > > > > Also, we have been using a patch for (
> > > > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to
> > 0.8.0.
> > > > >  This
> > > > > > is marked as scheduled for 0.8.2, with a patch available, but I'm
> > not
> > > > > sure
> > > > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jason
> > > > > >
> > > > >
> > > >
> >
> >
>
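
Since the upgrade is wire-compatible and can be done in place (per Jun
above), the usual approach is a rolling restart, one broker at a time; a
rough sketch, with paths purely illustrative:

    # on each broker, in turn:
    bin/kafka-server-stop.sh
    # swap in the new Kafka binaries, keeping the existing server.properties
    # and log.dirs untouched
    bin/kafka-server-start.sh config/server.properties &
    # wait for the broker to rejoin the ISR of its partitions before moving
    # to the next one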


Re: Too Many Open Files Broker Error

2014-07-08 Thread Jun Rao
Does your test program run as the same user as the Kafka broker?

Thanks,

Jun


On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul  wrote:

> Hi Guys,
>
> I’m seeing the following errors from the 0.8.1.1 broker. This occurs most
> often on the Controller machine. Then the controller process crashes, and
> the controller bounces to other machines, which causes those machines to
> crash. Looking at the file descriptors being held by the process, it’s only
> around 4000 or so(looking at . There aren’t a whole lot of connections in
> TIME_WAIT states, and I’ve increased the ephemeral port range to “16000 –
> 64000” via "/proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java
> test program to see how many sockets and files I can open. The socket is
> definitely limited by the ephemeral port range, which was around 22K at the
> time. But I
> can open tons of files, since the open file limit of the user is set to
> 100K.
>
> So given that I can theoretically open 48K sockets and probably 90K files,
> and I only see around 4K total for the Kafka broker, I’m really confused as
> to why I’m seeing this error. Is there some internal Kafka limit that I
> don’t know about?
>
> Paul Lung
>
>
>
> java.io.IOException: Too many open files
>
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>
> at kafka.network.Acceptor.accept(SocketServer.scala:200)
>
> at kafka.network.Acceptor.run(SocketServer.scala:154)
>
> at java.lang.Thread.run(Thread.java:679)
>
> [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)
>
> java.io.IOException: Too many open files
>
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>
> at kafka.network.Acceptor.accept(SocketServer.scala:200)
>
> at kafka.network.Acceptor.run(SocketServer.scala:154)
>
> at java.lang.Thread.run(Thread.java:679)
>
> [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
> for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to
> broker 2124488:class kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
>
> [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error
> writing to highwatermark file:  (kafka.server.ReplicaManager)
>
> java.io.FileNotFoundException:
> /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp
> (Too many open files)
>
> at java.io.FileOutputStream.open(Native Method)
>
> at java.io.FileOutputStream.(FileOutputStream.java:209)
>
> at java.io.FileOutputStream.(FileOutputStream.java:160)
>
> at java.io.FileWriter.(FileWriter.java:90)
>
> at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
>
> at
> kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)
>
> at
> kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)
>
> at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>
> at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>
> at
> kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)
>
> at
> kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)
>
> at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>
> at java.lang.Thread.run(Thread.java:679)
>
>
>
>
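
For anyone debugging the same symptom, it helps to compare the limit the
running broker actually inherited against its live descriptor count;
<broker-pid> is a placeholder:

    ulimit -n                                       # limit for the current shell/user
    grep "open files" /proc/<broker-pid>/limits     # limit the broker process actually has
    ls /proc/<broker-pid>/fd | wc -l                # descriptors currently held by the broker
    lsof -p <broker-pid> | wc -l                    # similar count via lsof

Note that the limit shown in /proc/<pid>/limits can be lower than the user's
configured 100K if the broker was started from an init script or an older
session that never picked up the raised limit.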


Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-08 Thread Jun Rao
Since you have replication factor 3 and only 3 brokers, you can't move data
around within the existing cluster, since a partition can have at most one
replica on each broker. You will need to add the new brokers first before
running the reassignment tool.

Thanks,

Jun


On Tue, Jul 8, 2014 at 10:26 AM, Florian Dambrine 
wrote:

> I let the tool run for an entire weekend on the test cluster and on
> Monday it was still saying "failed"...
>
> I have 500 GB per Kafka node and it is an 8-node cluster.
>
> I am also wondering if I am using the tool correctly. Currently I am
> running the tool to rebalance everything across the entire cluster. As I
> have 3 replicas the tool requires at least 3 brokers.
>
> Should I add 3 new Kafka nodes and rebalance some topics to these new nodes
> only? I am afraid of unbalancing the cluster with this option.
>
> Any suggestions?
>
> Thanks for your help.
>
>
> On Mon, Jul 7, 2014 at 9:29 PM, Jun Rao  wrote:
>
> > The failure could mean that the reassignment is still in progress. If you
> > have lots of data, it may take some time to move the data to new brokers.
> > You could observe the max lag in each broker to see how far behind new
> > replicas are (see http://kafka.apache.org/documentation.html#monitoring
> ).
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jul 7, 2014 at 4:42 PM, Florian Dambrine 
> > wrote:
> >
> > > When I run the tool with the --verify option it says failed for some
> > > of the partitions.
> > >
> > > The problem is I do not know if it is a zookeeper issue or if the tool
> > > really failed.
> > >
> > > I faced one time the zookeeper issue (
> > > https://issues.apache.org/jira/browse/KAFKA-1382) and by killing the
> > > responsible Kafka the partition switched from failed to completed
> > > successfully.
> > >
> > > What should I do when the Kafka tool says that it failed to move the
> > > partition?
> > >
> > >
> > >
> > >
> > > On Mon, Jul 7, 2014 at 4:33 PM, Clark Haskins
> > >  > > > wrote:
> > >
> > > > How does it get stuck?
> > > >
> > > > -Clark
> > > >
> > > > Clark Elliott Haskins III
> > > > LinkedIn DDS Site Reliability Engineer
> > > > Kafka, Zookeeper, Samza SRE
> > > > Mobile: 505.385.1484
> > > > BlueJeans: https://www.bluejeans.com/chaskins
> > > >
> > > >
> > > > chask...@linkedin.com
> > > > https://www.linkedin.com/in/clarkhaskins
> > > > There is no place like 127.0.0.1
> > > >
> > > >
> > > >
> > > >
> > > > On 7/7/14, 3:49 PM, "Florian Dambrine"  wrote:
> > > >
> > > > >Hi,
> > > > >
> > > > >I am trying to add new brokers to an existing 8 nodes Kafka cluster.
> > We
> > > > >have around 10 topics and the number of partition is set to 50. In
> > order
> > > > >to
> > > > >test the reassign-partitions scripts, I tried on a sandbox cluster
> the
> > > > >following steps.
> > > > >
> > > > >I developed a script which is able to parse the reassignment
> partition
> > > > >plan
> > > > >given by the Kafka tool in smaller pieces (reassigning maximum 10
> > > > >partitions at a time).
> > > > >
> > > > >Unfortunately I faced some issues with the tool that sometimes get
> > stuck
> > > > >on
> > > > >one partition. In this case I have to kill and restart the three
> > Kafkas
> > > on
> > > > >which the partition has been relocated to unlock the process (One
> > kafka
> > > at
> > > > >a time).
> > > > >
> > > > >Moreover, I have also faced these two issues that are already on
> Jira:
> > > > >
> > > > >https://issues.apache.org/jira/browse/KAFKA-1382
> > > > >https://issues.apache.org/jira/browse/KAFKA-1479
> > > > >
> > > > >We really need to add new nodes to our Kafka cluster, does anybody
> > have
> > > > >already rebalance a Kafka 0.8.1.1? What could you advise me?
> > > > >
> > > > >Thanks, and feel free to ask me if you need more details.
> > > > >
> > > > >
> > > > >
> > > > >--
> > > > >*Florian Dambrine*  |  Intern, Big Data
> > > > >*GumGum*   |  *Ads that stick*
> > > > >209-797-3994  |  flor...@gumgum.com
> > > >
> > > >
> > >
> > >
> > > --
> > > *Florian Dambrine*  |  Intern, Big Data
> > > *GumGum*   |  *Ads that stick*
> > > 209-797-3994  |  flor...@gumgum.com
> > >
> >
>
>
>
> --
> *Florian Dambrine*  |  Intern, Big Data
> *GumGum*   |  *Ads that stick*
> 209-797-3994  |  flor...@gumgum.com
>
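
For reference, the usual three-step flow with the reassignment tool when
expanding a cluster (broker ids, file names and the ZooKeeper address below
are illustrative; options as in 0.8.1):

    # 1. generate a candidate plan moving the listed topics across the new broker list
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics.json --broker-list "1,2,3,4,5" --generate

    # 2. execute the plan (or a hand-trimmed subset of it)
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file plan.json --execute

    # 3. poll until every partition reports completed
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file plan.json --verify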


Re: can the replication factor for a topic be changed after it's created?

2014-07-08 Thread gmail
Hi,
It seems we can do it now. See the official documentation:
http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
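
The procedure in that link boils down to a partition reassignment whose
replica list is longer than the current one; a minimal sketch for a single
partition, with the topic name, broker ids and ZooKeeper address as
placeholders:

    # increase-replication.json
    {"version":1,
     "partitions":[{"topic":"mytopic","partition":0,"replicas":[1,2,3]}]}

    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file increase-replication.json --execute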

RE: How recover leader when broker restart

2014-07-08 Thread chenlax
Thank you Guozhang; I don't know why the ISR == null and I can't reproduce it. All
brokers are working, and adding a new topic succeeds.
Thanks,
Lax


> Date: Tue, 8 Jul 2014 11:29:05 -0700
> Subject: Re: How recover leader when broker restart
> From: wangg...@gmail.com
> To: users@kafka.apache.org
> 
> Lax,
> 
> Under that scenario you would better first fix the issue of isr==null by
> checking if anything went wrong on the brokers.
> 
> Guozhang
> 
> 
> On Mon, Jul 7, 2014 at 8:43 PM, chenlax  wrote:
> 
> > Using the preferred replica tool can rebalance leadership, but if the ISR is
> > null then the leader is just -1; how can I recover the leader?
> >
> >
> > Thanks,
> > Lax
> >
> >
> > > Date: Mon, 7 Jul 2014 08:06:16 -0700
> > > Subject: Re: How recover leader when broker restart
> > > From: wangg...@gmail.com
> > > To: users@kafka.apache.org
> > >
> > > You can use the preferred leader election tool to move the leadership.
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaLeaderElectionTool
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Jul 7, 2014 at 7:56 AM, 鞠大升  wrote:
> > >
> > > > you can use the preferred leader election tool to reset leaders to
> > > > preferred replicas.
> > > > 2014年7月7日 PM10:37于 "François Langelier" 写道:
> > > >
> > > > > AFAIK, the simplest way will be to shutdown your 2 others brokers
> > after
> > > > you
> > > > > restarted your broker 1, which will force your topics to have your
> > > > broker 1
> > > > > as leader since it's the only one available, and then restart your
> > > > brokers
> > > > > 2 and 3
> > > > >
> > > > > But I can't really see why you want your leaders on broker 1...
> > > > >
> > > > >
> > > > >
> > > > > François Langelier
> > > > > Étudiant en génie Logiciel - École de Technologie Supérieure
> > > > > 
> > > > > Capitaine Club Capra 
> > > > > VP-Communication - CS Games  2014
> > > > > Jeux de Génie  2011 à 2014
> > > > > Argentier Fraternité du Piranha 
> > > > > 2012-2014
> > > > > Comité Organisateur Olympiades ÉTS 2012
> > > > > Compétition Québécoise d'Ingénierie 2012 - Compétition Senior
> > > > >
> > > > >
> > > > > On 7 July 2014 05:59, 陈翔  wrote:
> > > > >
> > > > > > I have 3 brokers. When I restart broker 1, it can no longer act as
> > > > > > leader. I want to know how I can recover broker 1 as a leader.
> > > > > >
> > > > > > thanks,
> > > > > > lax
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
> >
> 
> 
> 
> -- 
> -- Guozhang
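
For completeness, the preferred-replica election tool mentioned above is run
roughly like this (the ZooKeeper address is a placeholder); with no JSON file
it acts on all partitions, and it can only hand leadership to a replica that
is in the ISR, which is why the isr==null case has to be fixed first:

    bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181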
  

Re: status of 0.8.2

2014-07-08 Thread Joe Stein
Yup, that was my bad.  I updated the comment to explain how the error is
happening.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


On Tue, Jul 8, 2014 at 1:22 PM, Joel Koshy  wrote:

> Hi Joe,
>
> I had a question for you in that RB since I don't fully understand it.
> Maybe you can help clarify how the fix works.
>
> Thanks,
>
> Joel
>
> On Tue, Jul 08, 2014 at 10:55:56AM -0400, Joe Stein wrote:
> > I wrote it, so I can't commit it without other committers agreeing.
> >
> > Last I recall I updated the patch from the feedback in the reviewboard
> but
> > haven't looked at it in months.
> >
> > I am glad though it resolved the issue you were having and we can figure
> > how to get the patch to work with 0.8.1.1 if you run into problems.
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop 
> > /
> >
> >
> > On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg 
> wrote:
> >
> > > Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> > > patch for 0.8.0 no longer compatible for trunk?  I'm actually going to
> see
> > > if I can get it to work for 0.8.1.1 today.
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> > >
> > > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> > >
> > > > Two biggest features in 0.8.2 are Kafka-based offset management and
> the
> > > new
> > > > producer. We are in the final stage of testing them. We also haven't
> > > fully
> > > > tested the delete topic feature. So, we are probably 4-6 weeks away
> from
> > > > releasing 0.8.2.
> > > >
> > > > For kafka-1180, the patch hasn't been applied yet and we will need a
> > > patch
> > > > for trunk.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > What's the status for an 0.8.2 release?  We are currently using
> 0.8.0,
> > > > and
> > > > > would like to upgrade to take advantage of some of the per-topic
> > > > retention
> > > > > options available now in 0.8.1.
> > > > >
> > > > > However, we'd also like to take advantage of some fixes coming in
> 0.8.2
> > > > > (e.g. deleting topics).
> > > > >
> > > > > Also, we have been using a patch for (
> > > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to
> 0.8.0.
> > > >  This
> > > > > is marked as scheduled for 0.8.2, with a patch available, but I'm
> not
> > > > sure
> > > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jason
> > > > >
> > > >
> > >
>
>


New Consumer Design

2014-07-08 Thread Guozhang Wang
Hi All,

A few weeks back we wrote a wiki proposing a single-threaded, ZK-free
consumer client design for 0.9:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

We want to share some of the ideas that came up for this design to get some
early feedback. The below discussion assumes you have read the above wiki.

*Offset Management*

To make consumer clients ZK-free we need to move the offset management
utility from ZooKeeper into the Kafka servers, which then store offsets as
a special log. Key-based log compaction will be used to keep this
ever-growing log's size bounded. The broker responsible for offset
management for a given consumer group will be the leader of the
corresponding partition of this special offset log, where the partition key
is the consumer group name. On the consumer side, instead of talking to ZK
to commit and read offsets, consumers talk to the servers using new offset
commit and offset fetch request types. This work has been done and will be
included in the 0.8.2 release. Details of the implementation can be found
in KAFKA-1000 and this wiki:

https://cwiki.apache.org/confluence/display/KAFKA/Inbuilt+Consumer+Offset+Management
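
As a concrete illustration of what this means for 0.8.2 consumers, offset
storage is switched with a couple of consumer properties; a hedged sketch
(these property names belong to the 0.8.2 high-level consumer and are worth
double-checking against the release):

    # consumer config (0.8.2 high-level consumer)
    offsets.storage=kafka      # commit offsets to the broker-side offsets log instead of ZK
    dual.commit.enabled=true   # during migration, commit to both Kafka and ZooKeeper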

*Group Membership Management*

The next step is to move the membership management and rebalancing utility
out of ZooKeeper as well. To do this we introduced a consumer coordinator
hosted on the Kafka servers which is responsible for keeping track of group
membership and consumption partition changes, and the corresponding
rebalancing process.

*1. Failure Detection*

More specifically, we will use a heartbeating protocol for consumer and
coordinator failure detection. The protocol would be similar to the one ZK
uses: the consumer registers a session timeout value with the coordinator
on startup, and afterwards sends a heartbeat to the coordinator every
session-timeout / heartbeat-frequency. The coordinator will treat a
consumer as failed when it has not heard from the consumer within the
session timeout, and the consumer will treat its coordinator as failed when
it has not received heartbeat responses within the session timeout.

One difference from ZK's heartbeat protocol is that instead of fixing the
heartbeat frequency at three, we make this value configurable in consumer
clients. This is because the latency of the rebalancing process (discussed
later) is lower-bounded by the heartbeat frequency, so we want this value
to be large while not DDoSing the servers.
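
To make the trade-off concrete with made-up numbers: with a session timeout
of 30 seconds and a heartbeat frequency of 3, heartbeats go out every 30 / 3
= 10 seconds, so a consumer may not learn that a rebalance is pending for up
to about 10 seconds; raising the frequency to 10 cuts that to about 3
seconds, at the cost of more heartbeat traffic on the servers.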

*2. Consumer Rebalance*

A number of events can trigger a rebalance: 1) a consumer failure, 2) a
new consumer joining the group, 3) an existing consumer changing the
topics/partitions it consumes, 4) a partition change for the consumed
topics. Once the coordinator decides to trigger a rebalance, it will notify
the consumers in the group to resend their topic/partition subscription
information, and then assign the existing partitions to these consumers. On
the consumer end, they no longer run any rebalance logic in a distributed
manner but simply follow the partition assignment received from the
coordinator. Since consumers can only be notified of a rebalance via
heartbeat responses, the rebalance latency is lower-bounded by the
heartbeat frequency; that is why we allow consumers to negotiate this value
with the coordinator upon registration, and we would like to hear people's
thoughts about this protocol.

In addition, the rebalance logic in the new consumer will be customizable
instead of hard-coded in a per-topic round-robin manner. Users will be able
to write their own assignment class following a common interface, and
consumers, upon registering themselves, will specify in their registration
request the assigner class name they want to use. If consumers within the
same group specify different algorithms, the registration will be rejected.
We are not sure if this is the best way of customizing rebalance logic yet,
so any feedback is more than welcome.

For wildcard consumption, the consumers will now discover new topics that
are available for fetching through the topic metadata request. That is, the
consumers will periodically refresh their topic metadata, and if new topics
matching the wildcard regex are returned in the metadata response, a
consumer will notify the coordinator so that it triggers a new rebalance to
assign partitions for the new topic.

There are some failure handling cases during the rebalancing process
discussed in the consumer rewrite design wiki. We would encourage people to
read it and let us know if there are any other corner cases not covered.

Moving forward, we are thinking about making the coordinator handle both
group management and offset management for its groups, i.e. the leader of
the offset log partition will naturally become the coordinator of the
corresponding consumer group. This allows the coordinator to reject offset
commit requests whose sender is not part of the group. To do so a consumer
id and generation i

Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Guozhang Wang
Yeah, we probably should do that. I am not a committer though, so perhaps
someone else can help fix this issue?


On Tue, Jul 8, 2014 at 2:10 PM, Jason Rosenberg  wrote:

> Ah, ok. It's just no longer documented as such in the config docs?
>
>
> On Tue, Jul 8, 2014 at 4:46 PM, Guozhang Wang  wrote:
>
> > Jason,
> >
> > getLogRetentionTimeMillis() take either "log.retention.minutes" or
> > "log.retention.hours" and transform the value into milis. So you can
> > specify using either granularity.
> >
> > Guozhang
> >
> >
> > On Tue, Jul 8, 2014 at 1:11 PM, Jason Rosenberg 
> wrote:
> >
> > > On a related note, in doing the upgrade from 0.8.0, I noticed that the
> > > config property changed from 'log.retention.hours' to
> > > 'log.retention.minutes'.  Would it have made more sense to deprecate
> > rather
> > > than replace there?
> > >
> > > Also, I notice that internally, in the KafkaConfig class, it's
> > represented
> > > as logRetentionTimeMillis() (e.g. not hours or minutes).  And the
> > per-topic
> > > version is in ms and not minutes.  So, it all seems a bit confusing
> there
> > > (is there a reason for this)?
> > >
> > > Jason
> > >
> > >
> > > On Tue, Jul 8, 2014 at 3:54 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Server properties should affect on only the local instance
> separately.
> > > Are
> > > > you saying the property is not honored even on the 0.8.1 machines?
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
> > > > vpsi...@yahoo-inc.com.invalid> wrote:
> > > >
> > > > > By setting this property
> > > > > log.retention.mins=10
> > > > > in the server.properties file, which is passed as argument when
> > > starting
> > > > > the broker.
> > > > >
> > > > > Virendra
> > > > >
> > > > > On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
> > > > >
> > > > > >How do you set the retention.minutes property? Is it through
> > zk-based
> > > > > >topics tool?
> > > > > >
> > > > > >Guozhang
> > > > > >
> > > > > >
> > > > > >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
> > > > > >vpsi...@yahoo-inc.com.invalid> wrote:
> > > > > >
> > > > > >> I am running a mixed cluster as I mentioned earlier. 1 broker
> > 0.8.0
> > > > and
> > > > > >> the other 0.8.1.1. Should the retention of topics for partitions
> > > > > >> owned/replicated by the broker running 0.8.1.1 not enforce the
> > > server
> > > > > >> properties settings as defined for that server.
> > > > > >>
> > > > > >> So this brings an interesting question, in case of heterogeneous
> > > > > >> environment (as is in my case, which system parameters will take
> > > > > >> preference/precedence).
> > > > > >>
> > > > > >> Virendra
> > > > > >>
> > > > > >> On 6/30/14, 9:19 AM, "Guozhang Wang" 
> wrote:
> > > > > >>
> > > > > >> >The retention.minute property is only introduced in 0.8.1:
> > > > > >> >
> > > > > >> >https://issues.apache.org/jira/browse/KAFKA-918
> > > > > >> >
> > > > > >> >if you are running 0.8.0 then it will not be recognized.
> > > > > >> >
> > > > > >> >Guozhang
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
> > > > > >> >vpsi...@yahoo-inc.com.invalid> wrote:
> > > > > >> >
> > > > > >> >> Running a mixed 2 broker cluster. Mixed as in one of the
> > broker1
> > > is
> > > > > >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache
> release
> > > > link.
> > > > > >> >> Directly using the tar ball, no local build used).
> > > > > >> >>
> > > > > >> >> I have set the log.retention.minutes=10. However the broker
> is
> > > not
> > > > > >> >> honoring the setting. I see its not cleaning the log.dir at
> > all.
> > > > > >> >>
> > > > > >> >> However when I set the log.retention.hours=1, then it starts
> > > > cleaning
> > > > > >> >>the
> > > > > >> >> log.
> > > > > >> >>
> > > > > >> >> When I have the log.retention.minutes set in the
> > > server.properties
> > > > > >>then
> > > > > >> >>I
> > > > > >> >> see this logged in server.log:
> > > > > >> >>
> > > > > >> >> …..
> > > > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes
> > is
> > > > not
> > > > > >> >>valid
> > > > > >> >> (kafka.utils.VerifiableProperties)
> > > > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes
> > is
> > > > not
> > > > > >> >>valid
> > > > > >> >> (kafka.utils.VerifiableProperties)
> > > > > >> >> ……
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> I have set these properties too:
> > > > > >> >>
> > > > > >> >> log.cleaner.enable=true
> > > > > >> >> log.cleanup.policy=delete
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> But I see similar warning logged for these properties too.
> > > > > >> >>
> > > > > >> >> Regards,
> > > > > >> >> Virendra
> > > > > >> >>
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >> >--
> > > > > >> >-- Guozhang
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > >--
> > > > > >-- Guozhang
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > 

Re: producer performance

2014-07-08 Thread Guozhang Wang
The second script uses the new producer, which allows multiple in-flight
requests, whereas the first script uses the current (old) producer, which
sends only one request at a time to a given broker.

The new producer will be officially released in 0.8.2.

Guozhang


On Tue, Jul 8, 2014 at 2:22 PM, Chen Song  wrote:

> While testing kafka producer performance, I found 2 testing scripts.
>
> 1) performance testing script in kafka distribution
>
> bin/kafka-producer-perf-test.sh --broker-list localhost:9092 --messages
> 1000 --topic test --threads 10 --message-size 100 --batch-size 1
> --compression-codec 1
>
> 2) performance testing script mentioned in
>
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
> bin/kafka-run-class.sh
> org.apache.kafka.clients.tools.ProducerPerformance test6 5000 100
> -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> buffer.memory=67108864 batch.size=8196
>
> based on org.apache.kafka.clients.producer.Producer.
>
>
> On my local testing environment,
>
> For 1), the loading is ~15M/s and 150,000 msgs/s for message size = 100, no
> matter how I adjust batch size and number of loading threads.
> For 2), it can load up to 90M/s and 1,000,000 records/s.
>
> I am wondering what makes these 2 implementations perform so differently?
>
> --
> Chen Song
>



-- 
-- Guozhang


producer performance

2014-07-08 Thread Chen Song
While testing kafka producer performance, I found 2 testing scripts.

1) performance testing script in kafka distribution

bin/kafka-producer-perf-test.sh --broker-list localhost:9092 --messages
1000 --topic test --threads 10 --message-size 100 --batch-size 1
--compression-codec 1

2) performance testing script mentioned in

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance test6 5000 100
-1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196

based on org.apache.kafka.clients.producer.Producer.


On my local testing environment,

For 1), the loading is ~15M/s and 150,000 msgs/s for message size = 100, no
matter how I adjust batch size and number of loading threads.
For 2), it can load up to 90M/s and 1,000,000 records/s.

I am wondering what makes these 2 implementations perform so differently?

-- 
Chen Song


Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Jason Rosenberg
Ah, ok. It's just no longer documented as such in the config docs?


On Tue, Jul 8, 2014 at 4:46 PM, Guozhang Wang  wrote:

> Jason,
>
> getLogRetentionTimeMillis() take either "log.retention.minutes" or
> "log.retention.hours" and transform the value into milis. So you can
> specify using either granularity.
>
> Guozhang
>
>
> On Tue, Jul 8, 2014 at 1:11 PM, Jason Rosenberg  wrote:
>
> > On a related note, in doing the upgrade from 0.8.0, I noticed that the
> > config property changed from 'log.retention.hours' to
> > 'log.retention.minutes'.  Would it have made more sense to deprecate
> rather
> > than replace there?
> >
> > Also, I notice that internally, in the KafkaConfig class, it's
> represented
> > as logRetentionTimeMillis() (e.g. not hours or minutes).  And the
> per-topic
> > version is in ms and not minutes.  So, it all seems a bit confusing there
> > (is there a reason for this)?
> >
> > Jason
> >
> >
> > On Tue, Jul 8, 2014 at 3:54 PM, Guozhang Wang 
> wrote:
> >
> > > Server properties should affect on only the local instance separately.
> > Are
> > > you saying the property is not honored even on the 0.8.1 machines?
> > >
> > > Guozhang
> > >
> > > On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
> > > vpsi...@yahoo-inc.com.invalid> wrote:
> > >
> > > > By setting this property
> > > > log.retention.mins=10
> > > > in the server.properties file, which is passed as argument when
> > starting
> > > > the broker.
> > > >
> > > > Virendra
> > > >
> > > > On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
> > > >
> > > > >How do you set the retention.minutes property? Is it through
> zk-based
> > > > >topics tool?
> > > > >
> > > > >Guozhang
> > > > >
> > > > >
> > > > >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
> > > > >vpsi...@yahoo-inc.com.invalid> wrote:
> > > > >
> > > > >> I am running a mixed cluster as I mentioned earlier. 1 broker
> 0.8.0
> > > and
> > > > >> the other 0.8.1.1. Should the retention of topics for partitions
> > > > >> owned/replicated by the broker running 0.8.1.1 not enforce the
> > server
> > > > >> properties settings as defined for that server.
> > > > >>
> > > > >> So this brings an interesting question, in case of heterogeneous
> > > > >> environment (as is in my case, which system parameters will take
> > > > >> preference/precedence).
> > > > >>
> > > > >> Virendra
> > > > >>
> > > > >> On 6/30/14, 9:19 AM, "Guozhang Wang"  wrote:
> > > > >>
> > > > >> >The retention.minute property is only introduced in 0.8.1:
> > > > >> >
> > > > >> >https://issues.apache.org/jira/browse/KAFKA-918
> > > > >> >
> > > > >> >if you are running 0.8.0 then it will not be recognized.
> > > > >> >
> > > > >> >Guozhang
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
> > > > >> >vpsi...@yahoo-inc.com.invalid> wrote:
> > > > >> >
> > > > >> >> Running a mixed 2 broker cluster. Mixed as in one of the
> broker1
> > is
> > > > >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache release
> > > link.
> > > > >> >> Directly using the tar ball, no local build used).
> > > > >> >>
> > > > >> >> I have set the log.retention.minutes=10. However the broker is
> > not
> > > > >> >> honoring the setting. I see its not cleaning the log.dir at
> all.
> > > > >> >>
> > > > >> >> However when I set the log.retention.hours=1, then it starts
> > > cleaning
> > > > >> >>the
> > > > >> >> log.
> > > > >> >>
> > > > >> >> When I have the log.retention.minutes set in the
> > server.properties
> > > > >>then
> > > > >> >>I
> > > > >> >> see this logged in server.log:
> > > > >> >>
> > > > >> >> …..
> > > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes
> is
> > > not
> > > > >> >>valid
> > > > >> >> (kafka.utils.VerifiableProperties)
> > > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes
> is
> > > not
> > > > >> >>valid
> > > > >> >> (kafka.utils.VerifiableProperties)
> > > > >> >> ……
> > > > >> >>
> > > > >> >>
> > > > >> >> I have set these properties too:
> > > > >> >>
> > > > >> >> log.cleaner.enable=true
> > > > >> >> log.cleanup.policy=delete
> > > > >> >>
> > > > >> >>
> > > > >> >> But I see similar warning logged for these properties too.
> > > > >> >>
> > > > >> >> Regards,
> > > > >> >> Virendra
> > > > >> >>
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> >--
> > > > >> >-- Guozhang
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > >--
> > > > >-- Guozhang
> > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Guozhang Wang
Jason,

getLogRetentionTimeMillis() takes either "log.retention.minutes" or
"log.retention.hours" and transforms the value into millis. So you can
specify the retention using either granularity.

Guozhang


On Tue, Jul 8, 2014 at 1:11 PM, Jason Rosenberg  wrote:

> On a related note, in doing the upgrade from 0.8.0, I noticed that the
> config property changed from 'log.retention.hours' to
> 'log.retention.minutes'.  Would it have made more sense to deprecate rather
> than replace there?
>
> Also, I notice that internally, in the KafkaConfig class, it's represented
> as logRetentionTimeMillis() (e.g. not hours or minutes).  And the per-topic
> version is in ms and not minutes.  So, it all seems a bit confusing there
> (is there a reason for this)?
>
> Jason
>
>
> On Tue, Jul 8, 2014 at 3:54 PM, Guozhang Wang  wrote:
>
> > Server properties should affect on only the local instance separately.
> Are
> > you saying the property is not honored even on the 0.8.1 machines?
> >
> > Guozhang
> >
> > On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
> > vpsi...@yahoo-inc.com.invalid> wrote:
> >
> > > By setting this property
> > > log.retention.mins=10
> > > in the server.properties file, which is passed as argument when
> starting
> > > the broker.
> > >
> > > Virendra
> > >
> > > On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
> > >
> > > >How do you set the retention.minutes property? Is it through zk-based
> > > >topics tool?
> > > >
> > > >Guozhang
> > > >
> > > >
> > > >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
> > > >vpsi...@yahoo-inc.com.invalid> wrote:
> > > >
> > > >> I am running a mixed cluster as I mentioned earlier. 1 broker 0.8.0
> > and
> > > >> the other 0.8.1.1. Should the retention of topics for partitions
> > > >> owned/replicated by the broker running 0.8.1.1 not enforce the
> server
> > > >> properties settings as defined for that server.
> > > >>
> > > >> So this brings an interesting question, in case of heterogeneous
> > > >> environment (as is in my case, which system parameters will take
> > > >> preference/precedence).
> > > >>
> > > >> Virendra
> > > >>
> > > >> On 6/30/14, 9:19 AM, "Guozhang Wang"  wrote:
> > > >>
> > > >> >The retention.minute property is only introduced in 0.8.1:
> > > >> >
> > > >> >https://issues.apache.org/jira/browse/KAFKA-918
> > > >> >
> > > >> >if you are running 0.8.0 then it will not be recognized.
> > > >> >
> > > >> >Guozhang
> > > >> >
> > > >> >
> > > >> >
> > > >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
> > > >> >vpsi...@yahoo-inc.com.invalid> wrote:
> > > >> >
> > > >> >> Running a mixed 2 broker cluster. Mixed as in one of the broker1
> is
> > > >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache release
> > link.
> > > >> >> Directly using the tar ball, no local build used).
> > > >> >>
> > > >> >> I have set the log.retention.minutes=10. However the broker is
> not
> > > >> >> honoring the setting. I see its not cleaning the log.dir at all.
> > > >> >>
> > > >> >> However when I set the log.retention.hours=1, then it starts
> > cleaning
> > > >> >>the
> > > >> >> log.
> > > >> >>
> > > >> >> When I have the log.retention.minutes set in the
> server.properties
> > > >>then
> > > >> >>I
> > > >> >> see this logged in server.log:
> > > >> >>
> > > >> >> …..
> > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
> > not
> > > >> >>valid
> > > >> >> (kafka.utils.VerifiableProperties)
> > > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
> > not
> > > >> >>valid
> > > >> >> (kafka.utils.VerifiableProperties)
> > > >> >> ……
> > > >> >>
> > > >> >>
> > > >> >> I have set these properties too:
> > > >> >>
> > > >> >> log.cleaner.enable=true
> > > >> >> log.cleanup.policy=delete
> > > >> >>
> > > >> >>
> > > >> >> But I see similar warning logged for these properties too.
> > > >> >>
> > > >> >> Regards,
> > > >> >> Virendra
> > > >> >>
> > > >> >>
> > > >> >
> > > >> >
> > > >> >--
> > > >> >-- Guozhang
> > > >>
> > > >>
> > > >
> > > >
> > > >--
> > > >-- Guozhang
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: Too Many Open Files Broker Error

2014-07-08 Thread Lung, Paul
Hit the send button too fast. I verified the number of open file
descriptors from the broker by using “sudo lsof -p”, and by using “sudo ls
-l /proc//fd | wc -l”.

Paul

On 7/8/14, 1:42 PM, "Lung, Paul"  wrote:

>Hi Guys,
>
>I’m seeing the following errors from the 0.8.1.1 broker. This occurs most
>often on the Controller machine. Then the controller process crashes, and
>the controller bounces to other machines, which causes those machines to
>crash. Looking at the file descriptors being held by the process, it’s
>only around 4000 or so(looking at . There aren’t a whole lot of
>connections in TIME_WAIT states, and I’ve increased the ephemeral port
>range to “16000 – 64000” via "/proc/sys/net/ipv4/ip_local_port_range”.
>I’ve written a Java test program to see how many sockets and files I can
>open. The socket is definitely limited by the ephemeral port range, which
>was around 22K at the time. But I
>can open tons of files, since the open file limit of the user is set to
>100K.
>
>So given that I can theoretically open 48K sockets and probably 90K
>files, and I only see around 4K total for the Kafka broker, I’m really
>confused as to why I’m seeing this error. Is there some internal Kafka
>limit that I don’t know about?
>
>Paul Lung
>
>
>
>java.io.IOException: Too many open files
>
>at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>
>at 
>sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163
>)
>
>at kafka.network.Acceptor.accept(SocketServer.scala:200)
>
>at kafka.network.Acceptor.run(SocketServer.scala:154)
>
>at java.lang.Thread.run(Thread.java:679)
>
>[2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)
>
>java.io.IOException: Too many open files
>
>at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>
>at 
>sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163
>)
>
>at kafka.network.Acceptor.accept(SocketServer.scala:200)
>
>at kafka.network.Acceptor.run(SocketServer.scala:154)
>
>at java.lang.Thread.run(Thread.java:679)
>
>[2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
>for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to
>broker 2124488:class kafka.common.NotLeaderForPartitionException
>(kafka.server.ReplicaFetcherThread)
>
>[2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]:
>Error writing to highwatermark file:  (kafka.server.ReplicaManager)
>
>java.io.FileNotFoundException:
>/ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-c
>heckpoint.tmp (Too many open files)
>
>at java.io.FileOutputStream.open(Native Method)
>
>at java.io.FileOutputStream.(FileOutputStream.java:209)
>
>at java.io.FileOutputStream.(FileOutputStream.java:160)
>
>at java.io.FileWriter.(FileWriter.java:90)
>
>at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
>
>at 
>kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Repl
>icaManager.scala:447)
>
>at 
>kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Repl
>icaManager.scala:444)
>
>at 
>scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Trave
>rsableLike.scala:772)
>
>at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>
>at 
>scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:
>771)
>
>at 
>kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:
>444)
>
>at 
>kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:9
>4)
>
>at 
>kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
>
>at 
>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>at 
>java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>
>at 
>java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>
>at 
>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces
>s$201(ScheduledThreadPoolExecutor.java:165)
>
>at 
>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S
>cheduledThreadPoolExecutor.java:267)
>
>at 
>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
>1110)
>
>at 
>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
>:603)
>
>at java.lang.Thread.run(Thread.java:679)
>
>
>



Too Many Open Files Broker Error

2014-07-08 Thread Lung, Paul
Hi Guys,

I’m seeing the following errors from the 0.8.1.1 broker. This occurs most often 
on the Controller machine. Then the controller process crashes, and the 
controller bounces to other machines, which causes those machines to crash. 
Looking at the file descriptors being held by the process, it’s only around 
4000 or so(looking at . There aren’t a whole lot of connections in TIME_WAIT 
states, and I’ve increased the ephemeral port range to “16000 – 64000” via 
"/proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java test program to 
see how many sockets and files I can open. The socket is definitely limited by 
the ephemeral port range, which was around 22K at the time. But I
can open tons of files, since the open file limit of the user is set to 100K.

So given that I can theoretically open 48K sockets and probably 90K files, and 
I only see around 4K total for the Kafka broker, I’m really confused as to why 
I’m seeing this error. Is there some internal Kafka limit that I don’t know 
about?

Paul Lung



java.io.IOException: Too many open files

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

at kafka.network.Acceptor.accept(SocketServer.scala:200)

at kafka.network.Acceptor.run(SocketServer.scala:154)

at java.lang.Thread.run(Thread.java:679)

[2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

at kafka.network.Acceptor.accept(SocketServer.scala:200)

at kafka.network.Acceptor.run(SocketServer.scala:154)

at java.lang.Thread.run(Thread.java:679)

[2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for 
partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 
2124488:class kafka.common.NotLeaderForPartitionException 
(kafka.server.ReplicaFetcherThread)

[2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error 
writing to highwatermark file:  (kafka.server.ReplicaManager)

java.io.FileNotFoundException: 
/ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp
 (Too many open files)

at java.io.FileOutputStream.open(Native Method)

at java.io.FileOutputStream.(FileOutputStream.java:209)

at java.io.FileOutputStream.(FileOutputStream.java:160)

at java.io.FileWriter.(FileWriter.java:90)

at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)

at 
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)

at 
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)

at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)

at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)

at 
kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)

at 
kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)

at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)

at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

at java.lang.Thread.run(Thread.java:679)





Re: Zookeeper Version Mismatch Problem In 0.8.1

2014-07-08 Thread Lung, Paul
Thank you.

Paul

On 7/8/14, 11:41 AM, "Guozhang Wang"  wrote:

>I think you are hitting
>
>https://issues.apache.org/jira/browse/KAFKA-1382
>
>The fix is committed to trunk already, so in the next release you should
>not see this issue any more.
>
>Guozhang
>
>
>On Tue, Jul 8, 2014 at 10:29 AM, Lung, Paul  wrote:
>
>> Hi All,
>>
>>
>> I’m seeing the following issues after running a Kafka broker cluster
>>for 2
>> months:
>>
>>
>> [2014-07-07 17:54:19,798] ERROR Conditional update of path
>> 
>>/brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> with data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr"
>>:[2129087]}
>> and expected version 18 failed d
>>
>> ue to org.apache.zookeeper.KeeperException$BadVersionException:
>> KeeperErrorCode = BadVersion for
>> 
>>/brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> (kafka.utils.ZkUtils$)
>>
>> [2014-07-07 17:54:19,809] ERROR Conditional update of path
>> 
>>/brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> with data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr"
>>:[2129087]}
>> and expected version 37 failed d
>>
>> ue to org.apache.zookeeper.KeeperException$BadVersionException:
>> KeeperErrorCode = BadVersion for
>> 
>>/brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> (kafka.utils.ZkUtils$)
>>
>> [2014-07-07 17:54:19,817] ERROR Conditional update of path
>> 
>>/brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/par
>>titions/0/state
>> with data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr"
>>:[2129087]}
>> and expected version 24 failed due
>>
>> to org.apache.zookeeper.KeeperException$BadVersionException:
>> KeeperErrorCode = BadVersion for
>> 
>>/brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/par
>>titions/0/state
>> (kafka.utils.ZkUtils$)
>>
>> [2014-07-07 17:54:19,826] ERROR Conditional update of path
>> 
>>/brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> with data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr"
>>:[2129087]}
>> and expected version 143 failed
>>
>> due to org.apache.zookeeper.KeeperException$BadVersionException:
>> KeeperErrorCode = BadVersion for
>> 
>>/brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/
>>partitions/0/state
>> (kafka.utils.ZkUtils$)
>>
>> [2014-07-07 17:54:19,833] ERROR Conditional update of path
>> 
>>/brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/p
>>artitions/0/state
>> with data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr"
>>:[2129087]}
>> and expected version 8 failed due
>>
>>  to org.apache.zookeeper.KeeperException$BadVersionException:
>> KeeperErrorCode = BadVersion for
>> 
>>/brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/p
>>artitions/0/state
>> (kafka.utils.ZkUtils$)
>>
>> [2014-07-07 17:54:19,841] ERROR Conditional update of path
>> /brokers/topics/mini__009active_80__32__mini/partitions/0/state with
>> data
>> 
>>{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr"
>>:[2129087]}
>> and expected version 152 failed due to org.apache.zookee
>>
>> per.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
>>for
>> /brokers/topics/mini__009active_80__32__mini/partitions/0/state
>> (kafka.utils.ZkUtils$)
>>
>>
>>
>> This problem is fixed after we restart the broker. Does anyone know why
>> this problem would occur?
>>
>>
>> Paul Lung
>>
>
>
>
>-- 
>-- Guozhang
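
If it helps anyone hitting the same thing, the znode named in those errors
can be inspected directly to see the stored leader/ISR and its version
(the ZooKeeper address and topic name are placeholders):

    bin/zookeeper-shell.sh localhost:2181
    # then, at the shell prompt:
    get /brokers/topics/mytopic/partitions/0/state

The dataVersion in the stat output is the znode version that the
"expected version ... failed" messages in the broker log are being compared
against.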



Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Jason Rosenberg
On a related note, in doing the upgrade from 0.8.0, I noticed that the
config property changed from 'log.retention.hours' to
'log.retention.minutes'.  Would it have made more sense to deprecate rather
than replace there?

Also, I notice that internally, in the KafkaConfig class, it's represented
as logRetentionTimeMillis() (e.g. not hours or minutes).  And the per-topic
version is in ms and not minutes.  So, it all seems a bit confusing there
(is there a reason for this)?

Jason


On Tue, Jul 8, 2014 at 3:54 PM, Guozhang Wang  wrote:

> Server properties should affect on only the local instance separately. Are
> you saying the property is not honored even on the 0.8.1 machines?
>
> Guozhang
>
> On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
> vpsi...@yahoo-inc.com.invalid> wrote:
>
> > By setting this property
> > log.retention.mins=10
> > in the server.properties file, which is passed as argument when starting
> > the broker.
> >
> > Virendra
> >
> > On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
> >
> > >How do you set the retention.minutes property? Is it through zk-based
> > >topics tool?
> > >
> > >Guozhang
> > >
> > >
> > >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
> > >vpsi...@yahoo-inc.com.invalid> wrote:
> > >
> > >> I am running a mixed cluster as I mentioned earlier. 1 broker 0.8.0
> and
> > >> the other 0.8.1.1. Should the retention of topics for partitions
> > >> owned/replicated by the broker running 0.8.1.1 not enforce the server
> > >> properties settings as defined for that server.
> > >>
> > >> So this brings an interesting question, in case of heterogeneous
> > >> environment (as is in my case, which system parameters will take
> > >> preference/precedence).
> > >>
> > >> Virendra
> > >>
> > >> On 6/30/14, 9:19 AM, "Guozhang Wang"  wrote:
> > >>
> > >> >The retention.minute property is only introduced in 0.8.1:
> > >> >
> > >> >https://issues.apache.org/jira/browse/KAFKA-918
> > >> >
> > >> >if you are running 0.8.0 then it will not be recognized.
> > >> >
> > >> >Guozhang
> > >> >
> > >> >
> > >> >
> > >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
> > >> >vpsi...@yahoo-inc.com.invalid> wrote:
> > >> >
> > >> >> Running a mixed 2 broker cluster. Mixed as in one of the broker1 is
> > >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache release
> link.
> > >> >> Directly using the tar ball, no local build used).
> > >> >>
> > >> >> I have set the log.retention.minutes=10. However the broker is not
> > >> >> honoring the setting. I see its not cleaning the log.dir at all.
> > >> >>
> > >> >> However when I set the log.retention.hours=1, then it starts
> cleaning
> > >> >>the
> > >> >> log.
> > >> >>
> > >> >> When I have the log.retention.minutes set in the server.properties
> > >>then
> > >> >>I
> > >> >> see this logged in server.log:
> > >> >>
> > >> >> …
> > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
> not
> > >> >>valid
> > >> >> (kafka.utils.VerifiableProperties)
> > >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is
> not
> > >> >>valid
> > >> >> (kafka.utils.VerifiableProperties)
> > >> >> …
> > >> >>
> > >> >>
> > >> >> I have set these properties too:
> > >> >>
> > >> >> log.cleaner.enable=true
> > >> >> log.cleanup.policy=delete
> > >> >>
> > >> >>
> > >> >> But I see similar warning logged for these properties too.
> > >> >>
> > >> >> Regards,
> > >> >> Virendra
> > >> >>
> > >> >>
> > >> >
> > >> >
> > >> >--
> > >> >-- Guozhang
> > >>
> > >>
> > >
> > >
> > >--
> > >-- Guozhang
> >
> >
>
>
> --
> -- Guozhang
>


Re: kafka 0.8.1.1 log.retention.minutes NOT being honored

2014-07-08 Thread Guozhang Wang
Server properties should only affect the local instance. Are you saying the
property is not honored even on the 0.8.1 machines?

Guozhang

On Mon, Jul 7, 2014 at 3:55 PM, Virendra Pratap Singh <
vpsi...@yahoo-inc.com.invalid> wrote:

> By setting this property
> log.retention.mins=10
> in the server.properties file, which is passed as argument when starting
> the broker.
>
> Virendra
>
> On 7/7/14, 3:31 PM, "Guozhang Wang"  wrote:
>
> >How do you set the retention.minutes property? Is it through zk-based
> >topics tool?
> >
> >Guozhang
> >
> >
> >On Mon, Jul 7, 2014 at 3:07 PM, Virendra Pratap Singh <
> >vpsi...@yahoo-inc.com.invalid> wrote:
> >
> >> I am running a mixed cluster as I mentioned earlier. 1 broker 0.8.0 and
> >> the other 0.8.1.1. Should the retention of topics for partitions
> >> owned/replicated by the broker running 0.8.1.1 not enforce the server
> >> properties settings as defined for that server.
> >>
> >> So this brings an interesting question, in case of heterogeneous
> >> environment (as is in my case, which system parameters will take
> >> preference/precedence).
> >>
> >> Virendra
> >>
> >> On 6/30/14, 9:19 AM, "Guozhang Wang"  wrote:
> >>
> >> >The retention.minute property is only introduced in 0.8.1:
> >> >
> >> >https://issues.apache.org/jira/browse/KAFKA-918
> >> >
> >> >if you are running 0.8.0 then it will not be recognized.
> >> >
> >> >Guozhang
> >> >
> >> >
> >> >
> >> >On Fri, Jun 27, 2014 at 2:13 PM, Virendra Pratap Singh <
> >> >vpsi...@yahoo-inc.com.invalid> wrote:
> >> >
> >> >> Running a mixed 2 broker cluster. Mixed as in one of the broker1 is
> >> >> running 0.8.0 and broker2 one 0.8.1.1 (from the apache release link.
> >> >> Directly using the tar ball, no local build used).
> >> >>
> >> >> I have set the log.retention.minutes=10. However the broker is not
> >> >> honoring the setting. I see its not cleaning the log.dir at all.
> >> >>
> >> >> However when I set the log.retention.hours=1, then it starts cleaning
> >> >>the
> >> >> log.
> >> >>
> >> >> When I have the log.retention.minutes set in the server.properties
> >>then
> >> >>I
> >> >> see this logged in server.log:
> >> >>
> >> >> …
> >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is not
> >> >>valid
> >> >> (kafka.utils.VerifiableProperties)
> >> >> [2014-06-27 19:21:06,633] WARN Property log.retention.minutes is not
> >> >>valid
> >> >> (kafka.utils.VerifiableProperties)
> >> >> …
> >> >>
> >> >>
> >> >> I have set these properties too:
> >> >>
> >> >> log.cleaner.enable=true
> >> >> log.cleanup.policy=delete
> >> >>
> >> >>
> >> >> But I see similar warning logged for these properties too.
> >> >>
> >> >> Regards,
> >> >> Virendra
> >> >>
> >> >>
> >> >
> >> >
> >> >--
> >> >-- Guozhang
> >>
> >>
> >
> >
> >--
> >-- Guozhang
>
>


-- 
-- Guozhang


Re: Monitoring Producers at Large Scale

2014-07-08 Thread Bhavesh Mistry
Hi Otis,

You are right. If Kafka itself has a problem (queue full, auto rebalance,
dropped events, etc.), how can it transmit the logs? So we have tried to avoid
agent-based solutions such as an Apache Flume agent or a syslog configuration.

You are right that we have to build redundant transportation for monitoring
the transport layer.

Thank you very much for the suggestion. I will look into Logsene.

The problem is that we have 4 data centers and 24000 or more producers, so
when an application team comes to us because their data is lost or they do not
see their log lines, we have to pinpoint what exactly happened. So it is
important to monitor, transmit, and set alarms for the Kafka producers
themselves.

We replaced Apache Flume with Apache Kafka as the log transportation
layer. An agent is not required.


Thanks,
Bhavesh



On Mon, Jul 7, 2014 at 1:41 PM, Otis Gospodnetic  wrote:

> Hi,
>
> I'm late to the thread... but that "...we intercept log4j..." caught my
> attention.  Why intercept, especially if it's causing trouble?
>
> Could you use log4j syslog appender and get logs routed to wherever you
> want them via syslog, for example?
> Or you can have syslog tail log4j log files (e.g. rsyslog has "imfile" you
> can use for tailing).
>
> We use our own Logsene  for Kafka and all
> other logs and SPM  for Kafka and all other
> metrics we monitor.
>
> Oh, actually, this may help you:
>
> https://sematext.atlassian.net/wiki/display/PUBLOGSENE/Sending+Events+to+Logsene
> (ignore the Logsene-specific parts --- there is plenty of general info,
> configs, etc. for log handling)
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Thu, Jun 26, 2014 at 3:09 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Thanks for all your responses.
> >
> >
> >
> > JMX metrics are there and we do pull the metrics, but I would like to
> > capture the logs from Kafka lib as well especially WARN, FATAL and ERROR
> > etc to debug the issue.
> >
> >
> >
> > To do this, we intercept Log4j logging and send it to a Kafka log topic, but
> > I realize that under heavy error/warn logging from the Kafka lib itself it
> > can create a deadlock between the ProducerSendThread and the Producer
> > (which is logging to the Kafka log topic queue...)
> >
> >
> >
> > // (sketch; requires org.apache.log4j imports plus kafka.javaapi.producer.Producer
> > // and kafka.producer.KeyedMessage; the generic types and topic name are placeholders)
> > public class KafkaLog4jAppender extends AppenderSkeleton {
> >
> >     // Producer is created elsewhere (broker list, serializer, etc.)
> >     private Producer<String, String> producer;
> >
> >     protected void append(LoggingEvent event) {
> >         // Forward only WARN/ERROR/FATAL events emitted by the Kafka client itself
> >         if (event.getLoggerName().startsWith("kafka")
> >                 && event.getLevel().isGreaterOrEqual(Level.WARN)) {
> >             producer.send(new KeyedMessage<String, String>("kafka-log-topic",
> >                     event.getRenderedMessage()));
> >         }
> >     }
> >
> >     public void close() { producer.close(); }
> >
> >     public boolean requiresLayout() { return false; }
> > }
> >
> >
> > Other option is to log Kafka Logs into disk and transport logs via
> > separate process
> > to Kafka Topic and transport via https://github.com/harelba/tail2kafka
> to
> > topic...
> >
> >
> > We use Kafka for Log transportation and we want to debug/trouble shoot
> > issue via logs or create alerts/etc
> >
> >
> > Thanks,
> >
> >
> > Bhavesh
> >
> >
> >
> >
> > On Wed, Jun 25, 2014 at 10:49 AM, Neha Narkhede  >
> > wrote:
> >
> > > We monitor producers or for that matter any process/service using JMX
> > > metrics. Every server and service in LinkedIn sends metrics in a Kafka
> > > message to a metrics Kafka cluster. We have subscribers that connect to
> > the
> > > metrics cluster to index that data in RRDs.
> > >
> > > Our aim is to expose all important metrics through JMX. We are doing
> that
> > > for the new producer under org.apache.kafka.clients.producer. Feel free
> > to
> > > take a look at that and give feedback.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Tue, Jun 24, 2014 at 7:59 PM, Darion Yaphet <
> darion.yap...@gmail.com>
> > > wrote:
> > >
> > > > Sorry I want to  know  you want to monitor kafka producers or kafka
> > > brokers
> > > > and zookeepers ?
> > > > It's seems you will want to monitor monitor Exceptions eg Leader Not
> > > Found,
> > > > Queue is full, resend fail  etc  are kafka cluster
> > > >
> > > >
> > > > 2014-06-25 8:20 GMT+08:00 Bhavesh Mistry  >:
> > > >
> > > > > We use Kafka as Transport Layer to transport application logs.  How
> > do
> > > we
> > > > > monitor Producers at large scales about 6000 boxes x 4 topic per
> box
> > so
> > > > > roughly 24000 producers (spread across multiple data center.. we
> have
> > > > > brokers per DC).  We do the monitoring based on logs.  I have tried
> > > > > intercepting logs via Log4J custom implementation which only
> > intercept
> > > > WARN
> > > > > and ERROR and FATAL events  org.apache.log4j.AppenderSkeleton
> append
> > > > method
> > > > > which send its logs to brokers (This is working but after load
> > testing
> > > it
> > > > > is causing deadlock some times between ProducerSendThread and
> > > Producer).
> > > > >
> > > > > I know ther

Re: status of 0.8.2

2014-07-08 Thread Michal Michalski
One more question regarding 0.8.2 - is it planned to be an in-place,
no-downtime release (I'm using 0.8.0 now)? Judging by the version number
changes alone I'd guess it is, but... ;-)

Michal

Kind regards,
Michał Michalski,
michal.michal...@boxever.com


On 8 July 2014 18:22, Joel Koshy  wrote:

> Hi Joe,
>
> I had a question for you in that RB since I don't fully understand it.
> Maybe you can help clarify how the fix works.
>
> Thanks,
>
> Joel
>
> On Tue, Jul 08, 2014 at 10:55:56AM -0400, Joe Stein wrote:
> > I wrote it, so I can't commit it without other committers agreeing.
> >
> > Last I recall I updated the patch from the feedback in the reviewboard
> but
> > haven't looked at it in months.
> >
> > I am glad though it resolved the issue you were having and we can figure
> > how to get the patch to work with 0.8.1.1 if you run into problems.
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop 
> > /
> >
> >
> > On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg 
> wrote:
> >
> > > Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> > > patch for 0.8.0 no longer compatible for trunk?  I'm actually going to
> see
> > > if I can get it to work for 0.8.1.1 today.
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> > >
> > > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> > >
> > > > Two biggest features in 0.8.2 are Kafka-based offset management and
> the
> > > new
> > > > producer. We are in the final stage of testing them. We also haven't
> > > fully
> > > > tested the delete topic feature. So, we are probably 4-6 weeks away
> from
> > > > releasing 0.8.2.
> > > >
> > > > For kafka-1180, the patch hasn't been applied yet and we will need a
> > > patch
> > > > for trunk.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > What's the status for an 0.8.2 release?  We are currently using
> 0.8.0,
> > > > and
> > > > > would like to upgrade to take advantage of some of the per-topic
> > > > retention
> > > > > options available now in 0.8.1.
> > > > >
> > > > > However, we'd also like to take advantage of some fixes coming in
> 0.8.2
> > > > > (e.g. deleting topics).
> > > > >
> > > > > Also, we have been using a patch for (
> > > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to
> 0.8.0.
> > > >  This
> > > > > is marked as scheduled for 0.8.2, with a patch available, but I'm
> not
> > > > sure
> > > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jason
> > > > >
> > > >
> > >
>
>


Re: Zookeeper Version Mismatch Problem In 0.8.1

2014-07-08 Thread Guozhang Wang
I think you are hitting

https://issues.apache.org/jira/browse/KAFKA-1382

The fix is committed to trunk already, so in the next release you should
not see this issue any more.

Guozhang


On Tue, Jul 8, 2014 at 10:29 AM, Lung, Paul  wrote:

> Hi All,
>
>
> I’m seeing the following issues after running a Kafka broker cluster for 2
> months:
>
>
> [2014-07-07 17:54:19,798] ERROR Conditional update of path
> /brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> with data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr":[2129087]}
> and expected version 18 failed d
>
> ue to org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> (kafka.utils.ZkUtils$)
>
> [2014-07-07 17:54:19,809] ERROR Conditional update of path
> /brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> with data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr":[2129087]}
> and expected version 37 failed d
>
> ue to org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> (kafka.utils.ZkUtils$)
>
> [2014-07-07 17:54:19,817] ERROR Conditional update of path
> /brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/partitions/0/state
> with data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr":[2129087]}
> and expected version 24 failed due
>
> to org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/partitions/0/state
> (kafka.utils.ZkUtils$)
>
> [2014-07-07 17:54:19,826] ERROR Conditional update of path
> /brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> with data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr":[2129087]}
> and expected version 143 failed
>
> due to org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> (kafka.utils.ZkUtils$)
>
> [2014-07-07 17:54:19,833] ERROR Conditional update of path
> /brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> with data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr":[2129087]}
> and expected version 8 failed due
>
>  to org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
> (kafka.utils.ZkUtils$)
>
> [2014-07-07 17:54:19,841] ERROR Conditional update of path
> /brokers/topics/mini__009active_80__32__mini/partitions/0/state with
> data
> {"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr":[2129087]}
> and expected version 152 failed due to org.apache.zookee
>
> per.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for
> /brokers/topics/mini__009active_80__32__mini/partitions/0/state
> (kafka.utils.ZkUtils$)
>
>
>
> This problem is fixed after we restart the broker. Does anyone know why
> this problem would occur?
>
>
> Paul Lung
>



-- 
-- Guozhang


Re: How recover leader when broker restart

2014-07-08 Thread Guozhang Wang
Lax,

Under that scenario you had better first fix the issue of isr==null by
checking whether anything went wrong on the brokers.

Guozhang


On Mon, Jul 7, 2014 at 8:43 PM, chenlax  wrote:

> using the preferred replica election tool can rebalance leadership, but if the
> ISR is null then the leader is just -1; how can I recover the leader?
>
>
> Thanks,
> Lax
>
>
> > Date: Mon, 7 Jul 2014 08:06:16 -0700
> > Subject: Re: How recover leader when broker restart
> > From: wangg...@gmail.com
> > To: users@kafka.apache.org
> >
> > You can use the preferred leader election tool to move the leadership.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaLeaderElectionTool
> >
> > Guozhang
> >
> >
> > On Mon, Jul 7, 2014 at 7:56 AM, 鞠大升  wrote:
> >
> > > you can use the preferred leader election tool to reset leaders to
> > > preferred replicas.
> > > 2014年7月7日 PM10:37于 "François Langelier" 写道:
> > >
> > > > AFAIK, the simplest way will be to shutdown your 2 others brokers
> after
> > > you
> > > > restarted your broker 1, which will force your topics to have your
> > > broker 1
> > > > as leader since it's the only one available, and then restart your
> > > brokers
> > > > 2 and 3
> > > >
> > > > But I can't really see why you want your leaders on broker 1...
> > > >
> > > >
> > > >
> > > > François Langelier
> > > > Étudiant en génie Logiciel - École de Technologie Supérieure
> > > > 
> > > > Capitaine Club Capra 
> > > > VP-Communication - CS Games  2014
> > > > Jeux de Génie  2011 à 2014
> > > > Argentier Fraternité du Piranha 
> > > > 2012-2014
> > > > Comité Organisateur Olympiades ÉTS 2012
> > > > Compétition Québécoise d'Ingénierie 2012 - Compétition Senior
> > > >
> > > >
> > > > On 7 July 2014 05:59, 陈翔  wrote:
> > > >
> > > > > i have 3 broker,when i restart a broker 1,then 1 can not as
> leader.i
> > > want
> > > > > to know how i can recover broker 1 as a leader.
> > > > >
> > > > > thanks,
> > > > > lax
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
>
>



-- 
-- Guozhang
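
For completeness, the preferred replica election tool referenced above ships
with the Kafka distribution; a minimal sketch, with the ZooKeeper connect
string as a placeholder (run without a JSON file, it tries to move leadership
for all partitions back to their preferred replicas):

bin/kafka-preferred-replica-election.sh --zookeeper ZK_IP:2181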


Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-08 Thread Florian Dambrine
I realized that I did not respond to you, Clark.

Here is the entire JSON that I sliced into multiple pieces:

{"version":1,"partitions":[{"topic":"SLOTS","partition":20,"replicas":[101421743,105114483,101461702]},{"topic":"RTB","partition":12,"replicas":[101671664,101812541,101862816]},{"topic":"B_IMPRESSION","partition":25,"replicas":[101461782,101812541,101862816]},{"topic":"RTB","partition":19,"replicas":[101461702,101811991,101812541]},{"topic":"B_IMPRESSION","partition":18,"replicas":[101811991,102311671,105114483]},{"topic":"AD_EVENTS","partition":16,"replicas":[102311671,101862816,105114483]},{"topic":"ASSETS","partition":28,"replicas":[101671664,101862816,102311671]},{"topic":"AD_EVENTS","partition":34,"replicas":[102311671,101421743,101461702]},{"topic":"AD_EVENTS","partition":43,"replicas":[102311671,101461702,101461782]},{"topic":"D_STATISTICS","partition":4,"replicas":[101811991,101461782,101671664]},{"topic":"ASSETS","partition":41,"replicas":[102311671,101461782,101671664]},{"topic":"SLOTS","partition":16,"replicas":[101812541,101671664,101811991]},{"topic":"ASSETS","partition":44,"replicas":[101461702,101812541,101862816]},{"topic":"ASSETS","partition":6,"replicas":[105114483,102311671,101421743]},{"topic":"RTB","partition":31,"replicas":[101811991,105114483,101421743]},{"topic":"B_IMPRESSION","partition":29,"replicas":[101862816,101461702,101461782]},{"topic":"SLOTS","partition":21,"replicas":[101461702,101421743,101461782]},{"topic":"ASSETS","partition":39,"replicas":[101812541,101421743,101461702]},{"topic":"ASSETS","partition":49,"replicas":[101862816,101461782,101671664]},{"topic":"AD_EVENTS","partition":32,"replicas":[101812541,102311671,105114483]},{"topic":"SLOTS","partition":31,"replicas":[101461782,101671664,101811991]},{"topic":"RTB","partition":23,"replicas":[101812541,105114483,101421743]},{"topic":"ASSETS","partition":23,"replicas":[102311671,101421743,101461702]},{"topic":"B_IMPRESSION","partition":26,"replicas":[101671664,101862816,102311671]},{"topic":"RTB","partition":47,"replicas":[101461782,105114483,101421743]},{"topic":"AD_EVENTS","partition":26,"replicas":[105114483,101421743,101461702]},{"topic":"B_IMPRESSION","partition":40,"replicas":[105114483,101811991,101812541]},{"topic":"D_STATISTICS","partition":36,"replicas":[101421743,101671664,101811991]},{"topic":"SLOTS","partition":28,"replicas":[105114483,101421743,101461702]},{"topic":"AD_EVENTS","partition":36,"replicas":[101421743,101671664,101811991]},{"topic":"D_STATISTICS","partition":49,"replicas":[101811991,105114483,101421743]},{"topic":"ASSETS","partition":11,"replicas":[101811991,101812541,101862816]},{"topic":"AD_EVENTS","partition":24,"replicas":[101862816,102311671,105114483]},{"topic":"RTB","partition":3,"replicas":[101671664,101811991,101812541]},{"topic":"RTB","partition":41,"replicas":[101812541,101461702,101461782]},{"topic":"ASSETS","partition":14,"replicas":[102311671,105114483,101421743]},{"topic":"RTB","partition":30,"replicas":[101671664,102311671,105114483]},{"topic":"SLOTS","partition":26,"replicas":[101862816,101812541,102311671]},{"topic":"D_STATISTICS","partition":0,"replicas":[101421743,102311671,105114483]},{"topic":"D_STATISTICS","partition":8,"replicas":[105114483,101862816,102311671]},{"topic":"B_IMPRESSION","partition":36,"replicas":[101811991,101421743,101461702]},{"topic":"SLOTS","partition":49,"replicas":[101461782,101812541,101862816]},{"topic":"D_STATISTICS","partition":22,"replicas":[101811991,101812541,101862816]},{"topic":"B_IMPRESSION","partition":38,"replicas":[101862816,101461782,101671664]},{"topic":"AD_EVENTS","partition":11,"replicas":[101461782,101461702,101671664
]},{"topic":"AD_EVENTS","partition":39,"replicas":[101671664,101862816,102311671]},{"topic":"SLOTS","partition":25,"replicas":[101812541,101811991,101862816]},{"topic":"D_STATISTICS","partition":44,"replicas":[105114483,101461782,101671664]},{"topic":"ASSETS","partition":47,"replicas":[101811991,101421743,101461702]},{"topic":"AD_EVENTS","partition":12,"replicas":[101671664,101461782,101811991]},{"topic":"SLOTS","partition":4,"replicas":[101461782,105114483,101421743]},{"topic":"SLOTS","partition":47,"replicas":[101421743,101671664,101811991]},{"topic":"ASSETS","partition":29,"replicas":[101811991,102311671,105114483]},{"topic":"B_IMPRESSION","partition":22,"replicas":[105114483,101461782,101671664]},{"topic":"RTB","partition":14,"replicas":[101812541,102311671,105114483]},{"topic":"D_STATISTICS","partition":39,"replicas":[101671664,101862816,102311671]},{"topic":"ASSETS","partition":12,"replicas":[101812541,101862816,102311671]},{"topic":"D_STATISTICS","partition":31,"replicas":[101811991,101862816,102311671]},{"topic":"SLOTS","partition":0,"replicas":[102311671,101811991,101812541]},{"topic":"RTB","partition":22,"replicas":[101811991,102311671,105114483]},{"topic":"D_STATISTICS","partition":2,"replicas":[101461782,101421743,101461702]},{"topic":"SLOTS","partition":43,"replicas":[101812541,102311671,105114483]},{"

Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-08 Thread Florian Dambrine
Here is the entire logic to rebalance the cluster, which is handled by this
Groovy script (
https://github.com/Lowess/Kafka/blob/master/KafkaPartitionRebalancer.groovy)

#1: Request the zookeeper and get the broker id list
#2: Request zookeeper and get the list of topic
#3: Generate the topics-to-move.json file, which looks like:

{
  "version": 1,
  "topics": [
{
  "topic": "SLOTS"
},
{
  "topic": "ASSETS"
},
{
  "topic": "AD_EVENTS"
},
{
  "topic": "B_IMPRESSION"
},
{
  "topic": "B_STATISTICS"
},
{
  "topic": "PAGES"
},
{
  "topic": "RTB"
},
{
  "topic": "D_STATISTICS"
},
{
  "topic": "D_REPORTING"
}
  ]
}

#4: Upload this file to the Kafka node (/tmp/topics-to-move.json) and
run the following command:
bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181
--topics-to-move-json-file /tmp/topics-to-move.json --generate
--broker-list "ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#5: Parse the JSON returned by the previous step and slice it into
smaller JSON files (the number of partitions per file is limited
by the Groovy script; 10 partitions in this example) that look like:

{
  "version": 1,
  "partitions": [
{
  "topic": "D_REPORTING",
  "partition": 2,
  "replicas": [
102311671,
10517222,
102311679
  ]
},
{
  "topic": "AD_EVENTS",
  "partition": 48,
  "replicas": [
102311671,
109715277,
101531906
  ]
},
{
  "topic": "D_STATISTICS",
  "partition": 47,
  "replicas": [
109715277,
10517222,
102311679
  ]
},
{
  "topic": "SLOTS",
  "partition": 46,
  "replicas": [
101131445,
102336284,
10517222
  ]
},
{
  "topic": "RTB",
  "partition": 48,
  "replicas": [
101021441,
102311671,
102336284
  ]
},
{
  "topic": "PAGES",
  "partition": 14,
  "replicas": [
102311679,
102311671,
102336284
  ]
},
{
  "topic": "ASSETS",
  "partition": 35,
  "replicas": [
10517222,
101131445,
102311679
  ]
},
{
  "topic": "B_IMPRESSION",
  "partition": 34,
  "replicas": [
101131445,
102311672,
102311671
  ]
},
{
  "topic": "B_STATISTICS",
  "partition": 19,
  "replicas": [
109715277,
101531906,
102311672
  ]
},
{
  "topic": "AD_EVENTS",
  "partition": 18,
  "replicas": [
109715277,
102311671,
102336284
  ]
}
  ]
}

#6: Upload the previous JSON to the Kafka node
(/tmp/expand-cluster-reassignment.json) and run the following command:
bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181
--reassignment-json-file
/tmp/expand-cluster-reassignment.json --execute --broker-list
"ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#7: Loop on the verification step while the json returned by the following
command contains failed partitions:
bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181
--reassignment-json-file
/tmp/expand-cluster-reassignment.json --verify --broker-list
"ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#8: Execute a new JSON part file as in step #5, until all of them have
run.


Hope that will help you guys.
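
One quick way to tell whether a reassignment is still pending rather than
permanently failed is to look at the admin znode directly; a sketch, assuming
the stock ZooKeeper CLI and a placeholder connect string (the znode is removed
once the reassignment completes):

bin/zkCli.sh -server ZK_IP:2181 get /admin/reassign_partitions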


On Tue, Jul 8, 2014 at 10:31 AM, Clark Haskins <
chask...@linkedin.com.invalid> wrote:

> Can you copy/paste the json you are passing to the reassignment tool? Plus
> the commands. Also do a describe on your topics.
>
> -Clark
>
> Clark Elliott Haskins III
> LinkedIn DDS Site Reliability Engineer
> Kafka, Zookeeper, Samza SRE
> Mobile: 505.385.1484
> BlueJeans: https://www.bluejeans.com/chaskins
>
>
> chask...@linkedin.com
> https://www.linkedin.com/in/clarkhaskins
> There is no place like 127.0.0.1
>
>
>
>
> On 7/8/14, 10:26 AM, "Florian Dambrine"  wrote:
>
> >I let the tool running for an entire weekend on the test cluster and on
> >Monday it was still saying "failed"...
> >
> >I have 500 Go per Kafka node and it is a 8 nodes cluster.
> >
> >I am also wondering if I am using the tool correctly. Currently I am
> >running the tool to rebalance everything across the entire cluster. As I
> >have 3 replicas the tool requires at least 3 brokers.
> >
> >Should I add 3 new Kafka nodes and rebalance some topics to these new
> >nodes
> >only? I am afraid to unbalance the cluster with this option.
> >
> >Any suggestions?
> >
> >Thanks for your help.
> >
> >
> >On Mon, Jul 7, 2014 at 9:29 PM, Jun Rao  wrote:
> >
> >> The failure could mean that the reassignment is still in progress. If
> >>you
> >> have lots of data, it may take some time to move the data to new
> >>brokers.
> >> You could observe the max lag in each broker to see how far behind new
> >> replicas are (see
> >>http://kafka.apache.org/documentation.html#monito

Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-08 Thread Clark Haskins
Can you copy/paste the json you are passing to the reassignment tool? Plus
the commands. Also do a describe on your topics.
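
For the describe part, something along these lines, with the ZooKeeper connect
string and topic name as placeholders:

bin/kafka-topics.sh --zookeeper ZK_IP:2181 --describe --topic MY_TOPIC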

-Clark

Clark Elliott Haskins III
LinkedIn DDS Site Reliability Engineer
Kafka, Zookeeper, Samza SRE
Mobile: 505.385.1484
BlueJeans: https://www.bluejeans.com/chaskins


chask...@linkedin.com
https://www.linkedin.com/in/clarkhaskins
There is no place like 127.0.0.1




On 7/8/14, 10:26 AM, "Florian Dambrine"  wrote:

>I let the tool running for an entire weekend on the test cluster and on
>Monday it was still saying "failed"...
>
>I have 500 Go per Kafka node and it is a 8 nodes cluster.
>
>I am also wondering if I am using the tool correctly. Currently I am
>running the tool to rebalance everything across the entire cluster. As I
>have 3 replicas the tool requires at least 3 brokers.
>
>Should I add 3 new Kafka nodes and rebalance some topics to these new
>nodes
>only? I am afraid to unbalance the cluster with this option.
>
>Any suggestions?
>
>Thanks for your help.
>
>
>On Mon, Jul 7, 2014 at 9:29 PM, Jun Rao  wrote:
>
>> The failure could mean that the reassignment is still in progress. If
>>you
>> have lots of data, it may take some time to move the data to new
>>brokers.
>> You could observe the max lag in each broker to see how far behind new
>> replicas are (see
>>http://kafka.apache.org/documentation.html#monitoring).
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Mon, Jul 7, 2014 at 4:42 PM, Florian Dambrine 
>> wrote:
>>
>> > When I run the tool with the --verify option it says failed for the
>>some
>> > partitions.
>> >
>> > The problem is I do not know if it is a zookeeper issue or if the tool
>> > really failed.
>> >
>> > I faced one time the zookeeper issue (
>> > https://issues.apache.org/jira/browse/KAFKA-1382) and by killing the
>> > responsible Kafka the partition switched from failed to completed
>> > successfully.
>> >
>> > What should I do when the Kafka tool says that it failed to move the
>> > partition?
>> >
>> >
>> >
>> >
>> > On Mon, Jul 7, 2014 at 4:33 PM, Clark Haskins
>> > > > > wrote:
>> >
>> > > How does it get stuck?
>> > >
>> > > -Clark
>> > >
>> > > Clark Elliott Haskins III
>> > > LinkedIn DDS Site Reliability Engineer
>> > > Kafka, Zookeeper, Samza SRE
>> > > Mobile: 505.385.1484
>> > > BlueJeans: https://www.bluejeans.com/chaskins
>> > >
>> > >
>> > > chask...@linkedin.com
>> > > https://www.linkedin.com/in/clarkhaskins
>> > > There is no place like 127.0.0.1
>> > >
>> > >
>> > >
>> > >
>> > > On 7/7/14, 3:49 PM, "Florian Dambrine"  wrote:
>> > >
>> > > >Hi,
>> > > >
>> > > >I am trying to add new brokers to an existing 8 nodes Kafka
>>cluster.
>> We
>> > > >have around 10 topics and the number of partition is set to 50. In
>> order
>> > > >to
>> > > >test the reassgin-partitions scripts, I tried on a sandbox cluster
>>the
>> > > >following steps.
>> > > >
>> > > >I developed a script which is able to parse the reassignment
>>partition
>> > > >plan
>> > > >given by the Kafka tool in smaller pieces (reassigning maximum 10
>> > > >partitions at a time).
>> > > >
>> > > >Unfortunately I faced some issues with the tool that sometimes get
>> stuck
>> > > >on
>> > > >one partition. In this case I have to kill and restart the three
>> Kafkas
>> > on
>> > > >which the partition has been relocated to unlock the process (One
>> kafka
>> > at
>> > > >a time).
>> > > >
>> > > >Moreover, I have also faced these two issues that are already on
>>Jira:
>> > > >
>> > > >https://issues.apache.org/jira/browse/KAFKA-1382
>> > > >https://issues.apache.org/jira/browse/KAFKA-1479
>> > > >
>> > > >We really need to add new nodes to our Kafka cluster, does anybody
>> have
>> > > >already rebalance a Kafka 0.8.1.1? What could you advise me?
>> > > >
>> > > >Thanks, and feel free to ask me if you need more details.
>> > > >
>> > > >
>> > > >
>> > > >--
>> > > >*Florian Dambrine*  |  Intern, Big Data
>> > > >*GumGum*   |  *Ads that stick*
>> > > >209-797-3994  |  flor...@gumgum.com
>> > >
>> > >
>> >
>> >
>> > --
>> > *Florian Dambrine*  |  Intern, Big Data
>> > *GumGum*   |  *Ads that stick*
>> > 209-797-3994  |  flor...@gumgum.com
>> >
>>
>
>
>
>-- 
>*Florian Dambrine*  |  Intern, Big Data
>*GumGum*   |  *Ads that stick*
>209-797-3994  |  flor...@gumgum.com



Zookeeper Version Mismatch Problem In 0.8.1

2014-07-08 Thread Lung, Paul
Hi All,


I’m seeing the following issues after running a Kafka broker cluster for 2 
months:


[2014-07-07 17:54:19,798] ERROR Conditional update of path 
/brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr":[2129087]}
 and expected version 18 failed d

ue to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode 
= BadVersion for 
/brokers/topics/mini__065active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 (kafka.utils.ZkUtils$)

[2014-07-07 17:54:19,809] ERROR Conditional update of path 
/brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr":[2129087]}
 and expected version 37 failed d

ue to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode 
= BadVersion for 
/brokers/topics/mini__076active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 (kafka.utils.ZkUtils$)

[2014-07-07 17:54:19,817] ERROR Conditional update of path 
/brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/partitions/0/state
 with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":3,"isr":[2129087]}
 and expected version 24 failed due

to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
BadVersion for 
/brokers/topics/mini__042active_80__32__miniactiveitem_chd_qn/partitions/0/state
 (kafka.utils.ZkUtils$)

[2014-07-07 17:54:19,826] ERROR Conditional update of path 
/brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr":[2129087]}
 and expected version 143 failed

due to org.apache.zookeeper.KeeperException$BadVersionException: 
KeeperErrorCode = BadVersion for 
/brokers/topics/mini__034active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 (kafka.utils.ZkUtils$)

[2014-07-07 17:54:19,833] ERROR Conditional update of path 
/brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":4,"isr":[2129087]}
 and expected version 8 failed due

 to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
BadVersion for 
/brokers/topics/bom__066active_80__32__miniactiveitembrn_chd_qn/partitions/0/state
 (kafka.utils.ZkUtils$)

[2014-07-07 17:54:19,841] ERROR Conditional update of path 
/brokers/topics/mini__009active_80__32__mini/partitions/0/state with data 
{"controller_epoch":9,"leader":2129087,"version":1,"leader_epoch":2,"isr":[2129087]}
 and expected version 152 failed due to org.apache.zookee

per.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for 
/brokers/topics/mini__009active_80__32__mini/partitions/0/state 
(kafka.utils.ZkUtils$)



This problem is fixed after we restart the broker. Does anyone know why this 
problem would occur?


Paul Lung


Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-08 Thread Florian Dambrine
I let the tool run for an entire weekend on the test cluster and on
Monday it was still saying "failed"...

I have 500 GB per Kafka node and it is an 8-node cluster.

I am also wondering if I am using the tool correctly. Currently I am
running the tool to rebalance everything across the entire cluster. As I
have 3 replicas the tool requires at least 3 brokers.

Should I add 3 new Kafka nodes and rebalance some topics to these new nodes
only? I am afraid to unbalance the cluster with this option.

Any suggestions?

Thanks for your help.


On Mon, Jul 7, 2014 at 9:29 PM, Jun Rao  wrote:

> The failure could mean that the reassignment is still in progress. If you
> have lots of data, it may take some time to move the data to new brokers.
> You could observe the max lag in each broker to see how far behind new
> replicas are (see http://kafka.apache.org/documentation.html#monitoring).
>
> Thanks,
>
> Jun
>
>
> On Mon, Jul 7, 2014 at 4:42 PM, Florian Dambrine 
> wrote:
>
> > When I run the tool with the --verify option it says failed for the some
> > partitions.
> >
> > The problem is I do not know if it is a zookeeper issue or if the tool
> > really failed.
> >
> > I faced one time the zookeeper issue (
> > https://issues.apache.org/jira/browse/KAFKA-1382) and by killing the
> > responsible Kafka the partition switched from failed to completed
> > successfully.
> >
> > What should I do when the Kafka tool says that it failed to move the
> > partition?
> >
> >
> >
> >
> > On Mon, Jul 7, 2014 at 4:33 PM, Clark Haskins
> >  > > wrote:
> >
> > > How does it get stuck?
> > >
> > > -Clark
> > >
> > > Clark Elliott Haskins III
> > > LinkedIn DDS Site Reliability Engineer
> > > Kafka, Zookeeper, Samza SRE
> > > Mobile: 505.385.1484
> > > BlueJeans: https://www.bluejeans.com/chaskins
> > >
> > >
> > > chask...@linkedin.com
> > > https://www.linkedin.com/in/clarkhaskins
> > > There is no place like 127.0.0.1
> > >
> > >
> > >
> > >
> > > On 7/7/14, 3:49 PM, "Florian Dambrine"  wrote:
> > >
> > > >Hi,
> > > >
> > > >I am trying to add new brokers to an existing 8 nodes Kafka cluster.
> We
> > > >have around 10 topics and the number of partition is set to 50. In
> order
> > > >to
> > > >test the reassgin-partitions scripts, I tried on a sandbox cluster the
> > > >following steps.
> > > >
> > > >I developed a script which is able to parse the reassignment partition
> > > >plan
> > > >given by the Kafka tool in smaller pieces (reassigning maximum 10
> > > >partitions at a time).
> > > >
> > > >Unfortunately I faced some issues with the tool that sometimes get
> stuck
> > > >on
> > > >one partition. In this case I have to kill and restart the three
> Kafkas
> > on
> > > >which the partition has been relocated to unlock the process (One
> kafka
> > at
> > > >a time).
> > > >
> > > >Moreover, I have also faced these two issues that are already on Jira:
> > > >
> > > >https://issues.apache.org/jira/browse/KAFKA-1382
> > > >https://issues.apache.org/jira/browse/KAFKA-1479
> > > >
> > > >We really need to add new nodes to our Kafka cluster, does anybody
> have
> > > >already rebalance a Kafka 0.8.1.1? What could you advise me?
> > > >
> > > >Thanks, and feel free to ask me if you need more details.
> > > >
> > > >
> > > >
> > > >--
> > > >*Florian Dambrine*  |  Intern, Big Data
> > > >*GumGum*   |  *Ads that stick*
> > > >209-797-3994  |  flor...@gumgum.com
> > >
> > >
> >
> >
> > --
> > *Florian Dambrine*  |  Intern, Big Data
> > *GumGum*   |  *Ads that stick*
> > 209-797-3994  |  flor...@gumgum.com
> >
>



-- 
*Florian Dambrine*  |  Intern, Big Data
*GumGum*   |  *Ads that stick*
209-797-3994  |  flor...@gumgum.com


Re: status of 0.8.2

2014-07-08 Thread Joel Koshy
Hi Joe,

I had a question for you in that RB since I don't fully understand it.
Maybe you can help clarify how the fix works.

Thanks,

Joel

On Tue, Jul 08, 2014 at 10:55:56AM -0400, Joe Stein wrote:
> I wrote it, so I can't commit it without other committers agreeing.
> 
> Last I recall I updated the patch from the feedback in the reviewboard but
> haven't looked at it in months.
> 
> I am glad though it resolved the issue you were having and we can figure
> how to get the patch to work with 0.8.1.1 if you run into problems.
> 
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
> 
> 
> On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg  wrote:
> 
> > Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> > patch for 0.8.0 no longer compatible for trunk?  I'm actually going to see
> > if I can get it to work for 0.8.1.1 today.
> >
> > Thanks,
> >
> > Jason
> >
> >
> > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> >
> > > Two biggest features in 0.8.2 are Kafka-based offset management and the
> > new
> > > producer. We are in the final stage of testing them. We also haven't
> > fully
> > > tested the delete topic feature. So, we are probably 4-6 weeks away from
> > > releasing 0.8.2.
> > >
> > > For kafka-1180, the patch hasn't been applied yet and we will need a
> > patch
> > > for trunk.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > wrote:
> > >
> > > > What's the status for an 0.8.2 release?  We are currently using 0.8.0,
> > > and
> > > > would like to upgrade to take advantage of some of the per-topic
> > > retention
> > > > options available now in 0.8.1.
> > > >
> > > > However, we'd also like to take advantage of some fixes coming in 0.8.2
> > > > (e.g. deleting topics).
> > > >
> > > > Also, we have been using a patch for (
> > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to 0.8.0.
> > >  This
> > > > is marked as scheduled for 0.8.2, with a patch available, but I'm not
> > > sure
> > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > >
> > > > Thanks,
> > > >
> > > > Jason
> > > >
> > >
> >



Re: status of 0.8.2

2014-07-08 Thread Joe Stein
I think it should be ok that code as I recall is isolated.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


On Tue, Jul 8, 2014 at 11:07 AM, Jason Rosenberg  wrote:

> Thanks Joe, I'll let you know what I find.  Do you anticipate any issues
> with it working in 0.8.1.1?
>
> Jason
>
>
> On Tue, Jul 8, 2014 at 10:55 AM, Joe Stein  wrote:
>
> > I wrote it, so I can't commit it without other committers agreeing.
> >
> > Last I recall I updated the patch from the feedback in the reviewboard
> but
> > haven't looked at it in months.
> >
> > I am glad though it resolved the issue you were having and we can figure
> > how to get the patch to work with 0.8.1.1 if you run into problems.
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop 
> > /
> >
> >
> > On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg 
> wrote:
> >
> > > Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> > > patch for 0.8.0 no longer compatible for trunk?  I'm actually going to
> > see
> > > if I can get it to work for 0.8.1.1 today.
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> > >
> > > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> > >
> > > > Two biggest features in 0.8.2 are Kafka-based offset management and
> the
> > > new
> > > > producer. We are in the final stage of testing them. We also haven't
> > > fully
> > > > tested the delete topic feature. So, we are probably 4-6 weeks away
> > from
> > > > releasing 0.8.2.
> > > >
> > > > For kafka-1180, the patch hasn't been applied yet and we will need a
> > > patch
> > > > for trunk.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > What's the status for an 0.8.2 release?  We are currently using
> > 0.8.0,
> > > > and
> > > > > would like to upgrade to take advantage of some of the per-topic
> > > > retention
> > > > > options available now in 0.8.1.
> > > > >
> > > > > However, we'd also like to take advantage of some fixes coming in
> > 0.8.2
> > > > > (e.g. deleting topics).
> > > > >
> > > > > Also, we have been using a patch for (
> > > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to
> 0.8.0.
> > > >  This
> > > > > is marked as scheduled for 0.8.2, with a patch available, but I'm
> not
> > > > sure
> > > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jason
> > > > >
> > > >
> > >
> >
>


Re: status of 0.8.2

2014-07-08 Thread Jason Rosenberg
Thanks Joe, I'll let you know what I find.  Do you anticipate any issues
with it working in 0.8.1.1?

Jason


On Tue, Jul 8, 2014 at 10:55 AM, Joe Stein  wrote:

> I wrote it, so I can't commit it without other committers agreeing.
>
> Last I recall I updated the patch from the feedback in the reviewboard but
> haven't looked at it in months.
>
> I am glad though it resolved the issue you were having and we can figure
> how to get the patch to work with 0.8.1.1 if you run into problems.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
>
> On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg  wrote:
>
> > Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> > patch for 0.8.0 no longer compatible for trunk?  I'm actually going to
> see
> > if I can get it to work for 0.8.1.1 today.
> >
> > Thanks,
> >
> > Jason
> >
> >
> > On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
> >
> > > Two biggest features in 0.8.2 are Kafka-based offset management and the
> > new
> > > producer. We are in the final stage of testing them. We also haven't
> > fully
> > > tested the delete topic feature. So, we are probably 4-6 weeks away
> from
> > > releasing 0.8.2.
> > >
> > > For kafka-1180, the patch hasn't been applied yet and we will need a
> > patch
> > > for trunk.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> > wrote:
> > >
> > > > What's the status for an 0.8.2 release?  We are currently using
> 0.8.0,
> > > and
> > > > would like to upgrade to take advantage of some of the per-topic
> > > retention
> > > > options available now in 0.8.1.
> > > >
> > > > However, we'd also like to take advantage of some fixes coming in
> 0.8.2
> > > > (e.g. deleting topics).
> > > >
> > > > Also, we have been using a patch for (
> > > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to 0.8.0.
> > >  This
> > > > is marked as scheduled for 0.8.2, with a patch available, but I'm not
> > > sure
> > > > if this has been committed and applied to the 0.8.2 branch yet.
> > > >
> > > > Thanks,
> > > >
> > > > Jason
> > > >
> > >
> >
>


Re: status of 0.8.2

2014-07-08 Thread Joe Stein
I wrote it, so I can't commit it without other committers agreeing.

Last I recall I updated the patch from the feedback in the reviewboard but
haven't looked at it in months.

I am glad though it resolved the issue you were having and we can figure
how to get the patch to work with 0.8.1.1 if you run into problems.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


On Tue, Jul 8, 2014 at 10:45 AM, Jason Rosenberg  wrote:

> Is there a blocker to getting the patch for kafka-1180 applied?  Is the
> patch for 0.8.0 no longer compatible for trunk?  I'm actually going to see
> if I can get it to work for 0.8.1.1 today.
>
> Thanks,
>
> Jason
>
>
> On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:
>
> > Two biggest features in 0.8.2 are Kafka-based offset management and the
> new
> > producer. We are in the final stage of testing them. We also haven't
> fully
> > tested the delete topic feature. So, we are probably 4-6 weeks away from
> > releasing 0.8.2.
> >
> > For kafka-1180, the patch hasn't been applied yet and we will need a
> patch
> > for trunk.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg 
> wrote:
> >
> > > What's the status for an 0.8.2 release?  We are currently using 0.8.0,
> > and
> > > would like to upgrade to take advantage of some of the per-topic
> > retention
> > > options available now in 0.8.1.
> > >
> > > However, we'd also like to take advantage of some fixes coming in 0.8.2
> > > (e.g. deleting topics).
> > >
> > > Also, we have been using a patch for (
> > > https://issues.apache.org/jira/browse/KAFKA-1180) applied to 0.8.0.
> >  This
> > > is marked as scheduled for 0.8.2, with a patch available, but I'm not
> > sure
> > > if this has been committed and applied to the 0.8.2 branch yet.
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> >
>


Re: status of 0.8.2

2014-07-08 Thread Jason Rosenberg
Is there a blocker to getting the patch for kafka-1180 applied?  Is the
patch for 0.8.0 no longer compatible for trunk?  I'm actually going to see
if I can get it to work for 0.8.1.1 today.

Thanks,

Jason


On Mon, Jul 7, 2014 at 9:41 PM, Jun Rao  wrote:

> Two biggest features in 0.8.2 are Kafka-based offset management and the new
> producer. We are in the final stage of testing them. We also haven't fully
> tested the delete topic feature. So, we are probably 4-6 weeks away from
> releasing 0.8.2.
>
> For kafka-1180, the patch hasn't been applied yet and we will need a patch
> for trunk.
>
> Thanks,
>
> Jun
>
>
> On Mon, Jul 7, 2014 at 7:31 AM, Jason Rosenberg  wrote:
>
> > What's the status for an 0.8.2 release?  We are currently using 0.8.0,
> and
> > would like to upgrade to take advantage of some of the per-topic
> retention
> > options available now in 0.8.1.
> >
> > However, we'd also like to take advantage of some fixes coming in 0.8.2
> > (e.g. deleting topics).
> >
> > Also, we have been using a patch for (
> > https://issues.apache.org/jira/browse/KAFKA-1180) applied to 0.8.0.
>  This
> > is marked as scheduled for 0.8.2, with a patch available, but I'm not
> sure
> > if this has been committed and applied to the 0.8.2 branch yet.
> >
> > Thanks,
> >
> > Jason
> >
>