Broker doesn't get back into ISR when killed and restarted

2014-09-30 Thread florent valdelievre
Hi There,

Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Kafka version: kafka_2.8.0-0.8.1.1



192.168.1.180
zookeeper:2181
broker:9092(broker.id=1)
zookeeper.connect=localhost:2181

192.168.1.190
broker:9092(broker.id=2)
zookeeper.connect=192.168.1.180:2181

1) I start both Kafka servers, everything OK
2) I create a topic

kafka-topics.sh --create --zookeeper 192.168.1.180:2181  --topic
hibe-test-1 --partitions 1 --replication-factor 2

Everything is good, both brokers are in the ISR

[shopmedia@staging2:~] $ kafka-topics.sh --describe --zookeeper 192.168.1.180:2181 --topic hibe-test-1
Topic:hibe-test-1   PartitionCount:1   ReplicationFactor:2   Configs:
    Topic: hibe-test-1   Partition: 0   Leader: 2   Replicas: 2,1   Isr: 2,1

3) I kill the Kafka server on 192.168.1.190 (broker 2) with kill -9, as if
the server had crashed

[shopmedia@staging2:~] $ kafka-topics.sh --describe --zookeeper 192.168.1.180:2181 --topic hibe-test-1
Topic:hibe-test-1   PartitionCount:1   ReplicationFactor:2   Configs:
    Topic: hibe-test-1   Partition: 0   Leader: 1   Replicas: 2,1   Isr: 1

4) I start the Kafka server on 192.168.1.190

I get a few errors in the kafka-server stdout:

[2014-09-30 04:44:54,543] INFO conflict in /controller data:
{"version":1,"brokerid":2,"timestamp":"1412066694524"} stored data:
{"version":1,"brokerid":1,"timestamp":"1412066540118"}
(kafka.utils.ZkUtils$)

and (this one occurs a lot and uses 40% CPU on our server):

[2014-09-30 04:55:32,295] ERROR [ReplicaFetcherThread-0-1], Error for
partition [hibe-test-1,0] to broker 1:class kafka.common.UnknownException
(kafka.server.ReplicaFetcherThread)
[2014-09-30 04:55:32,299] ERROR [KafkaApi-2] error when handling request
Name: FetchRequest; Version: 0; CorrelationId: 215; ClientId:
ReplicaFetcherThread-0-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes;
RequestInfo: [hibe-test-1,0] -> PartitionFetchInfo(3,1048576)
(kafka.server.KafkaApis)
kafka.common.KafkaException: Shouldn't set logEndOffset for replica 2
partition [hibe-test-1,0] since it's local
at kafka.cluster.Replica.logEndOffset_$eq(Replica.scala:46)
at
kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:236)


5) Broker 2 never gets back into the ISR

Any ideas?


After creating a topic, broker gets dropped from ISR

2014-09-30 Thread florent valdelievre
Hi again,

192.168.1.180
Zk: 192.168.1.180:2181
Kafka: 9092
Broker.id = 1
zookeeper.connect=192.168.1.180:2181

--

192.168.1.190
Zk: 192.168.1.180:2181
Kafka: 9092
Broker.id = 2
zookeeper.connect=192.168.1.180:2181

--

I start both Kafka servers.
I create a topic using the following command (launched on 192.168.1.180):

kafka-topics.sh --create --zookeeper 192.168.1.180:2181  --topic
hibe-test-12 --partitions 1 --replication-factor 2

*Stdout on broker.id =1*
[2014-09-30 05:51:01,586] INFO Created log for partition [hibe-test-12,0]
in /home/shopmedia/apps/kafka/log with properties {segment.index.bytes ->
10485760, file.delete.delay.ms -> 6, segment.bytes -> 536870912,
flush.ms -> 9223372036854775807, delete.retention.ms -> 8640,
index.interval.bytes -> 4096, retention.bytes -> -1, cleanup.policy ->
delete, segment.ms -> 60480, max.message.bytes -> 112,
flush.messages -> 9223372036854775807, min.cleanable.dirty.ratio -> 0.5,
retention.ms -> 360}. (kafka.log.LogManager)
[2014-09-30 05:51:01,588] WARN Partition [hibe-test-12,0] on broker 1: No
checkpointed highwatermark is found for partition [hibe-test-12,0]
(kafka.cluster.Partition)
[2014-09-30 05:51:19,366] INFO Partition [hibe-test-12,0] on broker 1:
Shrinking ISR for partition [hibe-test-12,0] from 1,2 to 1
(kafka.cluster.Partition)

--


*Stdout on broker.id =2*

[2014-09-30 05:51:11,952] ERROR [ReplicaFetcherThread-0-1], Error for
partition [hibe-test-12,0] to broker 1:class kafka.common.UnknownException
(kafka.server.ReplicaFetcherThread)
[2014-09-30 05:51:11,954] ERROR [KafkaApi-2] error when handling request
Name: FetchRequest; Version: 0; CorrelationId: 2647; ClientId:
ReplicaFetcherThread-0-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes;
RequestInfo: [hibe-test-12,0] -> PartitionFetchInfo(0,1048576)
(kafka.server.KafkaApis)
kafka.common.KafkaException: Shouldn't set logEndOffset for replica 2
partition [hibe-test-12,0] since it's local

This error causes high CPU usage, roughly 40%, until I kill the Kafka
server.

Please note that I removed the log data from both servers beforehand, as
well as removed the ZK data using zkCli.sh:
rmr /brokers

Broker 2 never gets into the ISR.

Do you have an idea what is causing this error?


Re: AWS EC2 deployment best practices

2014-09-30 Thread Joe Crobak
I didn't know about KAFKA-1215, thanks. I'm not sure it would fully address
my concern of a producer writing to a partition leader in a different AZ,
though.

To answer your question, I was thinking ephemerals with replication, yes.
With a reservation, it's pretty easy to get e.g. two i2.xlarge for an
amortized cost below a single m2.2xlarge with the same amount of EBS
storage and provisioned IOPs.

On Mon, Sep 29, 2014 at 9:40 PM, Philip O'Toole <
philip.oto...@yahoo.com.invalid> wrote:

> If only Kafka had rack awareness... you could run 1 cluster and set up the
> replicas in different AZs.
>
>
> https://issues.apache.org/jira/browse/KAFKA-1215
>
> As for your question about ephemeral versus EBS, I presume you are
> proposing to use ephemeral *with* replicas, right?
>
>
> Philip
>
>
>
> -
> http://www.philipotoole.com
>
>
> On Monday, September 29, 2014 9:45 PM, Joe Crobak 
> wrote:
>
>
>
> We're planning a deploy to AWS EC2, and I was hoping to get some advice on
> best practices. I've seen the Loggly presentation [1], which has some good
> recommendations on instance types and EBS setup. Aside from that, there
> seem to be several options in terms of multi-Availability Zone (AZ)
> deployment. The ones we're considering are:
>
> 1) Treat each AZ as a separate data center. Producers write to the kafka
> cluster in the same AZ. For consumption, two options:
> 1a) designate one cluster the "master" cluster and use mirrormaker. This
> was discussed here [2] where some gotchas related to offset management were
> raised.
> 1b) Build consumers to consume from both clusters (e.g., two Camus jobs,
> one for each cluster).
>
> Pros:
> * if there's a network partition between AZs (or extra latency), the
> consumer(s) will catch up once the event is resolved.
> * If an AZ goes offline, only unprocessed data in that AZ is lost until the
> AZ comes back online. The other AZ is unaffected. (Consumer failover is more
> complicated in 1a, it seems.)
> Cons:
> * Duplicate infrastructure and either more moving parts (1a) or more
> complicated consumers (1b).
> * It's unclear how this scales if one wants to add a second region to the
> mix.
>
> 2) The second option is to treat AZs as the same data center. In this case,
> there's no guarantee that a writer is writing to a node in the same AZ.
>
> Pros:
> * Simplified setup: all data is in one place.
> Cons:
> * Harder to design for availability—what if the leader of the partition is
> in a different AZ than the producer and there's a partition between AZs? If
> latency is high or throughput is low between AZs, write throughput suffers
> if `request.required.acks` = -1
>
>
> Some other considerations:
> * Zookeeper deploy—the best practice seems to be a 3-node cluster across 3
> AZs, but option 1a/b would let us do separate clusters per AZ.
> * EBS / provisioned IOPs—The Loggly presentation predates Kafka 0.8
> replication. Are folks using ephemeral storage instead of EBS now?
> Provisioned IOPs can get expensive pretty quickly.
>
> Any suggestions/experience along these lines (or others!) would be greatly
> appreciated. If there's good feedback, I'd be happy to put together a wiki
> page with the details.
>
> Thanks,
> Joe
>
> [1] http://search-hadoop.com/m/4TaT4BQRJy
> [2] http://search-hadoop.com/m/4TaT49l0Gh/AWS+availability+zone/v=plain
>


Re: AWS EC2 deployment best practices

2014-09-30 Thread Philip O'Toole
OK, yeah, speaking from experience I would be comfortable with using
ephemeral storage if it's replicated across AZs. More and more EC2 instances
have local SSDs, so you'll get great IO. Of course, you had better monitor your
instances, and if an instance terminates, you're vulnerable if a second
instance is lost. That might argue for 3 copies.
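In terms of the create command used elsewhere on this list, running with 3
copies just means raising the replication factor at topic creation (the
ZooKeeper host and topic name below are placeholders):

kafka-topics.sh --create --zookeeper zk1:2181 --topic my-events --partitions 8 --replication-factor 3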

As you correctly pointed out in your original e-mail, the Loggly setup predated
0.8, so there was no replication to worry about. We ran 3-broker clusters
and put one broker of each cluster in a different AZ. This did mean that during
an AZ failure certain brokers would be unavailable (but the messages were
still on disk, ready for processing when the AZ came back online); it also
meant that there were always some Kafka brokers running somewhere that were
reachable, and incoming traffic could be sent there. The producers we wrote
took care of dealing with this. In other words, the pipeline kept moving data.


Of course, in a healthy pipeline, each message was written to ES within a 
matter of seconds, and we had replication there (as outlined in the 
accompanying talk). It all worked very well.


Philip

 
-
http://www.philipotoole.com 



RE: BadVersion state in Kafka Logs

2014-09-30 Thread Seshadri, Balaji
I would love to help you guys make Kafka the best in pub/sub, and will continue
doing that whenever I can.

Do we have a 0.8.1.2 release tag, or should we apply the patch on top of the
0.8.1.1 tag, since we need the KAFKA-1382 JIRA?

Balaji

From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Monday, September 29, 2014 5:21 PM
To: Seshadri, Balaji
Cc: users@kafka.apache.org
Subject: Re: BadVersion state in Kafka Logs

It is difficult to predict an exact date, though all the discussions of the
progress and ETA are on the mailing list. You can follow the discussions to
know the details and/or offer to help out on the outstanding issues.

On Mon, Sep 29, 2014 at 3:48 PM, Seshadri, Balaji wrote:
Neha,

Do you know the date in Oct when 0.8.2 is going to be out?

Thanks,

Balaji

From: Neha Narkhede 
[mailto:neha.narkh...@gmail.com]
Sent: Thursday, September 25, 2014 1:08 PM
To: Seshadri, Balaji
Cc: users@kafka.apache.org

Subject: Re: BadVersion state in Kafka Logs

We are close to the release. I'd probably expect 0.8.2 sometime in October.

On Thu, Sep 25, 2014 at 10:37 AM, Seshadri, Balaji wrote:
Hi Neha,

Do you know when you guys are releasing 0.8.2?

Thanks,

Balaji

-Original Message-
From: Seshadri, Balaji 
[mailto:balaji.sesha...@dish.com]
Sent: Thursday, September 25, 2014 9:41 AM
To: users@kafka.apache.org
Subject: RE: BadVersion state in Kafka Logs

Thanks for the reply.

Please let me know if we can use trunk as 0.8.2 is not yet released.

Balaji

From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Wednesday, September 24, 2014 6:32 PM
To: users@kafka.apache.org
Subject: Re: BadVersion state in Kafka Logs

From the logs you've attached, my guess is it's most likely due to KAFKA-1382.

Thanks,
Neha

On Wed, Sep 24, 2014 at 10:48 AM, Seshadri, Balaji wrote:

> Hi,
>
>
>
> We got the below error in our logs, and our consumers stopped consuming
> any data. It worked only after a restart.
>
>
>
> We would like to confirm that it's because we are running with
> 0.8-beta version and not 0.8 release version to convince "THE MGMT" guys.
>
>
>
> Please let me know if it's this KAFKA-1382 causing the issue.
>
>
>
> Thanks,
>
>
>
> Balaji
>
>
>
> *From:* Gulia, Vikram
> *Sent:* Wednesday, September 24, 2014 8:43 AM
> *To:* Sharma, Navdeep; #IT-MAD DES; #IT-MAA
> *Cc:* Alam, Mohammad Shah
> *Subject:* RE: 9/23 prod issue - offline kafka partitions.
>
>
>
> Adding full MAA distro.
>
>
>
> DES Offshore looked in to the logs on kafka servers and seems like the
> issue we encountered yesterday may be described in these threads,
> please have a look -
>
>
>
> http://permalink.gmane.org/gmane.comp.apache.kafka.user/1904
>
>
>
> https://issues.apache.org/jira/browse/KAFKA-1382 (it describes the
> fix/patch which is available in 0.8.1.2/0.8.2)
>
>
>
> Thank You,
>
> Vikram Gulia
>
>
>
> *From:* Sharma, Navdeep
> *Sent:* Wednesday, September 24, 2014 6:53 AM
> *To:* Gulia, Vikram; #IT-MAD DES
> *Cc:* #IT-MAA Offshore; Alam, Mohammad Shah
> *Subject:* RE: 9/23 prod issue - offline kafka partitions.
>
>
>
> Hi Vikram,
>
>
>
> We analyzed the below-mentioned issue with MAA-Offshore (Abhishek) and
> found that the error occurred only on 23 Sept. This is not historical,
> as we checked the last 4 days of logs.
>
>
>
> It looks like the consumer was stopped on September 22, 2014 for Linux
> patching activity. MAA started the consumer on September 23, 2014 at 1:00 AM.
>
>
>
> The *issue* in the server log is *"BadVersion for
> /brokers/topics/rain-burn-in/partitions/121/state"*, but it is not
> present in the previous 4 days of logs.
>
> More detail of this error can be found at-
>
> http://permalink.gmane.org/gmane.comp.apache.kafka.user/1904
>
>
>
> We are not sure about data loss in this scenario and working on this.
>
>
>
>
>
>
>
> Let us know if any concerns.
>
>
>
>
> Navdeep Sharma
> Developer - offshore,  Middleware Applications & Development o India:
> 0120-4532000 - 2234
> c: +91-9911698102
>
> *From:* Gulia, Vikram
> *Sent:* Tuesday, September 23, 2014 6:17 PM
> *To:* #IT-MAD DES
> *Subject:* FW: 9/23 prod issue - offline kafka partitions.
>
>
>
> DES Offshore dev,
>
>
>
> Please work with MAA offshore to monitor the Kafka brokers, as we had
> an incident where a lot of partitions went offline around 1:45 PM MST
> and MAA had to restart the Kafka servers. We may have lost messages,
> and we need to see if there is a way to figure out the impact.
>
>
>
> Also, check the logs for the Kafka servers and see if we can figure out
> why the partitions went offline or were unavailable. Let us know if you
> find anything.

Re: multi-node and multi-broker kafka cluster setup

2014-09-30 Thread Guozhang Wang
Hello,

In general it is not required to have the Kafka brokers installed on the
same nodes as the ZK servers, and each node can host multiple Kafka
brokers: you just need to make sure they do not share the same port or the
same data dir.
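As a sketch, two brokers co-located on one node could use server.properties
files along these lines (the broker IDs, port numbers, and log paths are just
example values; the ZK hosts follow the naming from the question below):

# server-1.properties
broker.id=1
port=9092
log.dirs=/var/kafka-logs-1
zookeeper.connect=kfserver1:2181,kfserver2:2181

# server-2.properties
broker.id=2
port=9093
log.dirs=/var/kafka-logs-2
zookeeper.connect=kfserver1:2181,kfserver2:2181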

Guozhang

On Mon, Sep 29, 2014 at 8:31 PM, Sa Li  wrote:

> Hi,
> I am kinda newbie to kafka, I plan to build a cluster with multiple nodes,
> and multiple brokers on each node, I can find tutorials for set multiple
> brokers cluster in single node, say
>
> http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
> Also I can find some instructions for multiple node setup, but with single
> broker on each node. I have not seen any documents to teach me how to setup
> multiple nodes cluster and multiple brokers in each node. I notice some
> documents points out: we should install kafka on each node which makes
> sense, and all the brokers in each node should connect to same zookeeper. I
> am confused since I thought I could setup a zookeeper ensemble cluster
> separately, and all the brokers connecting to this zookeeper cluster and
> this zk cluster doesn’t have to be the server hosting the kafka, but some
> tutorial says I should install zookeeper on each kafka node.
>
> Here is my plan:
> - I have three nodes: kfServer1, kfserver2, kfserver3,
> - kfserver1 and kfserver2 are configured as the zookeeper ensemble, which
> i have done.
>   zk.connect=kfserver1:2181,kfserver2:2181
> - broker1, broker2, broker3 are in kfserver1,
>   broker4, broker5, broker6 are on kfserver2,
>   broker7, broker8, broker9 are on kfserver3.
>
> When I am configuring, the ZK dataDir is a local directory on each node
> rather than a directory on the ZK ensemble; is that correct? So far I
> could not make the above scheme work. Has anyone ever made a multi-node,
> multi-broker Kafka cluster setup?
>
> thanks
>
> Alec
>
>
>


-- 
-- Guozhang


Re: BadVersion state in Kafka Logs

2014-09-30 Thread Joe Stein
Does the patch in KAFKA-1382 apply on the 0.8.1 branch? If not, a patch
that does would be great.

I will kick off a discussion for KAFKA-1382 and the Scala 2.11 cross build
for the 0.8.1.2 release (and see what others may think we should do, like
the gradle changes I think we should also make for the src release issues,
and the jars in the repo). I will send that on dev/user in a little bit
(please comment with +1 community support on that thread for the release).

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


Re: rebalance brokers

2014-09-30 Thread Guozhang Wang
In 0.7 you are required to manually create partitions for existing topics
when you add new brokers (i.e., modify the ZK registration path).

On Mon, Sep 29, 2014 at 9:07 PM, Guangle Fan  wrote:

> Guozhang, yes, the version running there is still 0.7; you are right,
> there is no such concept of replicas in it. Is there a way to rebalance
> partitions across all brokers when adding new nodes?
>
> On Mon, Sep 29, 2014 at 6:11 PM, Guozhang Wang  wrote:
>
> > Hi Guangle,
> >
> > Replication is only introduced in 0.8, with 0.7 there should not have
> > leader / follower replicas. Could you verify the version of your Kafka
> > cluster?
> >
> > Guozhang
> >
> > On Mon, Sep 29, 2014 at 4:30 PM, Guangle Fan 
> wrote:
> >
> > > Hi, All
> > >
> > > We have some old kafka nodes that are still running 0.7.
> > >
> > > We disable replication of topics there. Recently we added some broker
> > > nodes to that cluster, and I found the new nodes don't get written to
> > > or read from even though they are registered correctly in ZooKeeper.
> > >
> > > I think in this case, it's because all these nodes became followers of
> > > partitions. I need to rebalance leadership of partitions across all
> > > brokers.
> > >
> > > If my thought is correct, how shall I rebalance leadership?
> > >
> > > Thanks!
> > >
> > > Guangle
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


kafka & docker

2014-09-30 Thread Mingtao Zhang
Hi,

Has anyone gotten dockerized Kafka working?

Should we specify the IP address?

I expected everything to work on just localhost, but saw this: "SEVERE:
Producer connection to 172.17.0.3:9092 unsuccessful".

Thanks in advance!

Best Regards,
Mingtao


RE: BadVersion state in Kafka Logs

2014-09-30 Thread Seshadri, Balaji
Hi Joe,

I did not try it on the 0.8.1 branch; I can try and see if it goes through
when I get some breather.

Thanks for initiating 0.8.1.2.

Thanks,

Balaji


Re: kafka & docker

2014-09-30 Thread Joe Stein
You need to change the advertised hostname.

Take a look at https://registry.hub.docker.com/u/stealthly/docker-kafka/ and
https://registry.hub.docker.com/u/stealthly/docker-zookeeper/; we use them
often for local testing. Here is how to start a broker:
https://github.com/stealthly/docker-kafka/blob/master/start-broker.sh, e.g.
https://github.com/stealthly/metrics-kafka/blob/master/bootstrap.sh#L13
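Concretely, that means setting the advertised address in the broker's
server.properties to something clients outside the container can reach (the
IP below is just an example of a host-reachable address):

advertised.host.name=192.168.1.50
advertised.port=9092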

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/



[DISCUSS] 0.8.1.2 Release

2014-09-30 Thread Joe Stein
Hi, I wanted to kick off a specific discussion on a 0.8.1.2 release.

Here are the JIRAs I would like to propose back-porting (where not already
done) and applying to the 0.8.1 branch for a 0.8.1.2 release:

https://issues.apache.org/jira/browse/KAFKA-1502 (source jar is empty)
https://issues.apache.org/jira/browse/KAFKA-1419 (cross build for scala
2.11)
https://issues.apache.org/jira/browse/KAFKA-1382 (Update zkVersion on
partition state update failures)
https://issues.apache.org/jira/browse/KAFKA-1490 (remove gradlew initial
setup output from source distribution)
https://issues.apache.org/jira/browse/KAFKA-1645 (some more jars in our src
release)

If the community and committers can comment on the patches proposed that
would be great. If I missed any bring them up or if you think any I have
proposed shouldn't be in the release bring that up too please.

Once we have consensus on this thread my thought was that I would apply and
commit the agreed to tickets to the 0.8.1 branch. If any tickets don't
apply of course a back port patch has to happen through our standard
process (not worried about that we have some engineering cycles to
contribute to making that happen). Once that is all done, I will build
0.8.1.2 release artifacts and call a VOTE for RC1.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


Still Stale TopicMetadata

2014-09-30 Thread Christofer Hedbrandh
Hi Kafka users,

Was there ever a JIRA ticket filed for this?

"Re: Stale TopicMetadata"

http://mail-archives.apache.org/mod_mbox/kafka-users/201307.mbox/%3ce238b018f88c39429066fc8c4bfd0c2e019be...@esv4-mbx01.linkedin.biz%3E

As far as I can tell this is still an issue in 0.8.1.1

Using the python client (VERSION 0.2-alpha):
client = KafkaClient(host, port)
request_id = client._next_id()
KafkaProtocol.encode_metadata_request(client.client_id, request_id,
topic_names)
response = client._send_broker_unaware_request(request_id, request)
brokers, topics = KafkaProtocol.decode_metadata_response(response)

the metadata returned tells me only a subset of the replicas are in sync

E.g.
{'test-topic-1': {0: PartitionMetadata(topic='test-topic-1', partition=0,
leader=2018497752, replicas=(2018497752, 915105820, 1417963519),
isr=(2018497752,))}}

but when I fetch metadata with the kafka-topics.sh --describe tool, it
looks like all replicas are in sync.

Topic:test-topic-1 PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=60480
Topic: test-topic-1 Partition: 0 Leader: 2018497752 Replicas:
2018497752,915105820,1417963519 Isr: 2018497752,915105820,1417963519

I looked around for a JIRA ticket for this but couldn't find one. Please
let me know where this bug is tracked.

Thanks,
Christofer
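The ISR discrepancy described above boils down to a set difference between the replica list and the ISR; a small self-contained check (no broker required, values taken from the two outputs above):

```python
def out_of_sync(replicas, isr):
    """Replicas reported as not in sync: replica set minus ISR set."""
    return sorted(set(replicas) - set(isr))

# Values from the python-client metadata response above
python_client = out_of_sync((2018497752, 915105820, 1417963519),
                            (2018497752,))
# Values from kafka-topics.sh --describe above
describe_tool = out_of_sync((2018497752, 915105820, 1417963519),
                            (2018497752, 915105820, 1417963519))

print(python_client)  # [915105820, 1417963519] -- two replicas look stale
print(describe_tool)  # [] -- all replicas in sync
```

When the two sources disagree like this, the metadata served by the contacted broker is stale, which is the bug being reported.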


Re: Still Stale TopicMetadata

2014-09-30 Thread Joe Stein
I believe this is the ticket https://issues.apache.org/jira/browse/KAFKA-972

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Tue, Sep 30, 2014 at 1:00 PM, Christofer Hedbrandh <
christo...@knewton.com> wrote:

> Hi Kafka users,
>
> Was there ever a JIRA ticket filed for this?
>
> "Re: Stale TopicMetadata"
>
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201307.mbox/%3ce238b018f88c39429066fc8c4bfd0c2e019be...@esv4-mbx01.linkedin.biz%3E
>
> As far as I can tell this is still an issue in 0.8.1.1
>
> Using the python client (VERSION 0.2-alpha):
> client = KafkaClient(host, port)
> request_id = client._next_id()
> KafkaProtocol.encode_metadata_request(client.client_id, request_id,
> topic_names)
> response = client._send_broker_unaware_request(request_id, request)
> brokers, topics = KafkaProtocol.decode_metadata_response(response)
>
> the meta data returned tells me only a subset of the replicas are in-sync
>
> E.g.
> {'test-topic-1': {0: PartitionMetadata(topic='test-topic-1', partition=0,
> leader=2018497752, replicas=(2018497752, 915105820, 1417963519),
> isr=(2018497752,))}}
>
> but when I fetch meta data with the kafka-topics.sh --describe tool, it
> looks like all replicas are in sync.
>
> Topic:test-topic-1 PartitionCount:1 ReplicationFactor:3 Configs:
> retention.ms
> =60480
> Topic: test-topic-1 Partition: 0 Leader: 2018497752 Replicas:
> 2018497752,915105820,1417963519 Isr: 2018497752,915105820,1417963519
>
> I looked around for a JIRA ticket for this but couldn't find one. Please
> let me know where this bug is tracked.
>
> Thanks,
> Christofer
>


Created topic by 2 partitions, only can use the one partition

2014-09-30 Thread Jiang Jacky
Hi, Guys
It is very weird: I created a topic with 2 partitions a couple of weeks ago,
and I can only produce messages to partition 0, not partition 1. But just
now I created a new topic again with 2 partitions, and it does work.
So what's the problem with the old topic? I tried to describe the old topic, and I
found the following message

Topic:   Partition: 0Leader: 1   Replicas: 1
Isr: 1
Topic:   Partition: 1Leader: 2   Replicas: 2
Isr: 2

Please let me know if the topic is screwed up.

Thank you
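One likely explanation (hedged, based on the old producer's documented behavior): in 0.8 the producer sends un-keyed messages to a single randomly chosen partition and only re-picks after the metadata refresh interval, while keyed messages are spread by hashing the key. A sketch of that selection logic (Python's built-in hash stands in for the real partitioner's key hash):

```python
import random

def pick_partition(key, num_partitions):
    """Illustrative partition choice, not the actual 0.8 DefaultPartitioner."""
    if key is None:
        # Un-keyed: the old producer picks one random partition and sticks
        # with it until the next metadata refresh, which can look like
        # "everything goes to partition 0".
        return random.randrange(num_partitions)
    # Keyed: deterministic spread across partitions.
    return hash(key) % num_partitions

parts = {pick_partition("user-%d" % i, 2) for i in range(100)}
print(sorted(parts))  # with keys, both partitions receive traffic
```

So producing with keys (or waiting past topic.metadata.refresh.interval.ms) should exercise both partitions; a topic where only partition 0 ever receives data is consistent with un-keyed sends rather than a broken topic.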


Re: kafka & docker

2014-09-30 Thread Buntu Dev
Thanks Joe.. seems quite handy. Is there a 'Kafka->HDFS with Camus' docker
as well one can play around with?

On Tue, Sep 30, 2014 at 9:00 AM, Joe Stein  wrote:

> You need to change the advertised hostname.
>
> Take a look https://registry.hub.docker.com/u/stealthly/docker-kafka/ and
> https://registry.hub.docker.com/u/stealthly/docker-zookeeper/ we use it
> often for local testing here is how to start
> https://github.com/stealthly/docker-kafka/blob/master/start-broker.sh e.g.
> https://github.com/stealthly/metrics-kafka/blob/master/bootstrap.sh#L13
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
> On Tue, Sep 30, 2014 at 11:46 AM, Mingtao Zhang 
> wrote:
>
> > Hi,
> >
> > Any one has dockerized kafka working?
> >
> > Should we specify the ip address?
> >
> > I expected everything working on just localhost but saw this "SEVERE:
> > Producer connection to 172.17.0.3:9092 unsuccessful".
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Mingtao
> >
>


Re: kafka & docker

2014-09-30 Thread Daniel Compton
Hi Joe

What's the story for persisting data with Docker? Do you use a data volume or 
do you just start fresh every time you start the Docker instance?

Daniel.

> On 1/10/2014, at 7:13 am, Buntu Dev  wrote:
> 
> Thanks Joe.. seems quite handy. Is there a 'Kafka->HDFS with Camus' docker
> as well one can play around with?
> 
>> On Tue, Sep 30, 2014 at 9:00 AM, Joe Stein  wrote:
>> 
>> You need to change the advertised hostname.
>> 
>> Take a look https://registry.hub.docker.com/u/stealthly/docker-kafka/ and
>> https://registry.hub.docker.com/u/stealthly/docker-zookeeper/ we use it
>> often for local testing here is how to start
>> https://github.com/stealthly/docker-kafka/blob/master/start-broker.sh e.g.
>> https://github.com/stealthly/metrics-kafka/blob/master/bootstrap.sh#L13
>> 
>> /***
>> Joe Stein
>> Founder, Principal Consultant
>> Big Data Open Source Security LLC
>> http://www.stealth.ly
>> Twitter: @allthingshadoop 
>> /
>> 
>> On Tue, Sep 30, 2014 at 11:46 AM, Mingtao Zhang 
>> wrote:
>> 
>>> Hi,
>>> 
>>> Any one has dockerized kafka working?
>>> 
>>> Should we specify the ip address?
>>> 
>>> I expected everything working on just localhost but saw this "SEVERE:
>>> Producer connection to 172.17.0.3:9092 unsuccessful".
>>> 
>>> Thanks in advance!
>>> 
>>> Best Regards,
>>> Mingtao
>> 


Re: kafka & docker

2014-09-30 Thread Joe Stein
<<  Is there a 'Kafka->HDFS with Camus' docker as well one can play around
with?

Not that I know of.  These folks
http://blog.sequenceiq.com/blog/2014/09/15/hadoop-2-5-1-docker/ have nice
Hadoop docker containers that may be a good starting point.

<< What's the story for persisting data with Docker? Do you use a data
volume or do you just start fresh every time you start the Docker instance?

We only use it for development & testing, so we start fresh and bootstrap
data each time. Volumes should work fine, though, for persisting data if
need be.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Tue, Sep 30, 2014 at 3:00 PM, Daniel Compton 
wrote:

> Hi Joe
>
> What's the story for persisting data with Docker? Do you use a data volume
> or do you just start fresh every time you start the Docker instance?
>
> Daniel.
>
> > On 1/10/2014, at 7:13 am, Buntu Dev  wrote:
> >
> > Thanks Joe.. seems quite handy. Is there a 'Kafka->HDFS with Camus'
> docker
> > as well one can play around with?
> >
> >> On Tue, Sep 30, 2014 at 9:00 AM, Joe Stein 
> wrote:
> >>
> >> You need to change the advertised hostname.
> >>
> >> Take a look https://registry.hub.docker.com/u/stealthly/docker-kafka/
> and
> >> https://registry.hub.docker.com/u/stealthly/docker-zookeeper/ we use it
> >> often for local testing here is how to start
> >> https://github.com/stealthly/docker-kafka/blob/master/start-broker.sh
> e.g.
> >> https://github.com/stealthly/metrics-kafka/blob/master/bootstrap.sh#L13
> >>
> >> /***
> >> Joe Stein
> >> Founder, Principal Consultant
> >> Big Data Open Source Security LLC
> >> http://www.stealth.ly
> >> Twitter: @allthingshadoop 
> >> /
> >>
> >> On Tue, Sep 30, 2014 at 11:46 AM, Mingtao Zhang  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Any one has dockerized kafka working?
> >>>
> >>> Should we specify the ip address?
> >>>
> >>> I expected everything working on just localhost but saw this "SEVERE:
> >>> Producer connection to 172.17.0.3:9092 unsuccessful".
> >>>
> >>> Thanks in advance!
> >>>
> >>> Best Regards,
> >>> Mingtao
> >>
>


Re: [DISCUSS] 0.8.1.2 Release

2014-09-30 Thread Neha Narkhede
Can we discuss the need for 0.8.1.2? I'm wondering if it's related to the
timeline of 0.8.2 in any way? For instance, if we can get 0.8.2 out in the
next 2-3 weeks, do we still need to get 0.8.1.2 out or can people just
upgrade to 0.8.2?

On Tue, Sep 30, 2014 at 9:53 AM, Joe Stein  wrote:

> Hi, I wanted to kick off a specific discussion on a 0.8.1.2 release.
>
> Here are the JIRAs I would like to propose to back port a patch (if not
> already done so) and apply them to the 0.8.1 branch for a 0.8.1.2 release
>
> https://issues.apache.org/jira/browse/KAFKA-1502 (source jar is empty)
> https://issues.apache.org/jira/browse/KAFKA-1419 (cross build for scala
> 2.11)
> https://issues.apache.org/jira/browse/KAFKA-1382 (Update zkVersion on
> partition state update failures)
> https://issues.apache.org/jira/browse/KAFKA-1490 (remove gradlew initial
> setup output from source distribution)
> https://issues.apache.org/jira/browse/KAFKA-1645 (some more jars in our
> src
> release)
>
> If the community and committers can comment on the patches proposed that
> would be great. If I missed any bring them up or if you think any I have
> proposed shouldn't be in the release bring that up too please.
>
> Once we have consensus on this thread my thought was that I would apply and
> commit the agreed to tickets to the 0.8.1 branch. If any tickets don't
> apply of course a back port patch has to happen through our standard
> process (not worried about that we have some engineering cycles to
> contribute to making that happen). Once that is all done, I will build
> 0.8.1.2 release artifacts and call a VOTE for RC1.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>


RE: [DISCUSS] 0.8.1.2 Release

2014-09-30 Thread Seshadri, Balaji
At DISH we are having issues with the 0.8-beta version used in PROD; it's crashing 
every 2 days and becoming a blocker for us.

It would be great if we get 0.8.2 or 0.8.1.2, whichever is faster, as we can't 
wait 3 weeks; our new Order Management system is going to sit on top of 
Kafka.

-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com] 
Sent: Tuesday, September 30, 2014 1:37 PM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: Re: [DISCUSS] 0.8.1.2 Release

Can we discuss the need for 0.8.1.2? I'm wondering if it's related to the 
timeline of 0.8.2 in any way? For instance, if we can get 0.8.2 out in the next 
2-3 weeks, do we still need to get 0.8.1.2 out or can people just upgrade to 
0.8.2?

On Tue, Sep 30, 2014 at 9:53 AM, Joe Stein  wrote:

> Hi, I wanted to kick off a specific discussion on a 0.8.1.2 release.
>
> Here are the JIRAs I would like to propose to back port a patch (if 
> not already done so) and apply them to the 0.8.1 branch for a 0.8.1.2 
> release
>
> https://issues.apache.org/jira/browse/KAFKA-1502 (source jar is empty)
> https://issues.apache.org/jira/browse/KAFKA-1419 (cross build for 
> scala
> 2.11)
> https://issues.apache.org/jira/browse/KAFKA-1382 (Update zkVersion on 
> partition state update failures)
> https://issues.apache.org/jira/browse/KAFKA-1490 (remove gradlew 
> initial setup output from source distribution)
> https://issues.apache.org/jira/browse/KAFKA-1645 (some more jars in 
> our src
> release)
>
> If the community and committers can comment on the patches proposed 
> that would be great. If I missed any bring them up or if you think any 
> I have proposed shouldn't be in the release bring that up too please.
>
> Once we have consensus on this thread my thought was that I would 
> apply and commit the agreed to tickets to the 0.8.1 branch. If any 
> tickets don't apply of course a back port patch has to happen through 
> our standard process (not worried about that we have some engineering 
> cycles to contribute to making that happen). Once that is all done, I 
> will build
> 0.8.1.2 release artifacts and call a VOTE for RC1.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>


Re: [DISCUSS] 0.8.1.2 Release

2014-09-30 Thread Jonathan Weeks
I was one asking for 0.8.1.2 a few weeks back, when 0.8.2 was at least 6-8 
weeks out.

If we truly believe that 0.8.2 will go “golden” and stable in 2-3 weeks, I, for 
one, don’t need a 0.8.1.2, but it depends on the confidence in shipping 0.8.2 
soonish.

YMMV,

-Jonathan


On Sep 30, 2014, at 12:37 PM, Neha Narkhede  wrote:

> Can we discuss the need for 0.8.1.2? I'm wondering if it's related to the
> timeline of 0.8.2 in any way? For instance, if we can get 0.8.2 out in the
> next 2-3 weeks, do we still need to get 0.8.1.2 out or can people just
> upgrade to 0.8.2?
> 
> On Tue, Sep 30, 2014 at 9:53 AM, Joe Stein  wrote:
> 
>> Hi, I wanted to kick off a specific discussion on a 0.8.1.2 release.
>> 
>> Here are the JIRAs I would like to propose to back port a patch (if not
>> already done so) and apply them to the 0.8.1 branch for a 0.8.1.2 release
>> 
>> https://issues.apache.org/jira/browse/KAFKA-1502 (source jar is empty)
>> https://issues.apache.org/jira/browse/KAFKA-1419 (cross build for scala
>> 2.11)
>> https://issues.apache.org/jira/browse/KAFKA-1382 (Update zkVersion on
>> partition state update failures)
>> https://issues.apache.org/jira/browse/KAFKA-1490 (remove gradlew initial
>> setup output from source distribution)
>> https://issues.apache.org/jira/browse/KAFKA-1645 (some more jars in our
>> src
>> release)
>> 
>> If the community and committers can comment on the patches proposed that
>> would be great. If I missed any bring them up or if you think any I have
>> proposed shouldn't be in the release bring that up too please.
>> 
>> Once we have consensus on this thread my thought was that I would apply and
>> commit the agreed to tickets to the 0.8.1 branch. If any tickets don't
>> apply of course a back port patch has to happen through our standard
>> process (not worried about that we have some engineering cycles to
>> contribute to making that happen). Once that is all done, I will build
>> 0.8.1.2 release artifacts and call a VOTE for RC1.
>> 
>> /***
>> Joe Stein
>> Founder, Principal Consultant
>> Big Data Open Source Security LLC
>> http://www.stealth.ly
>> Twitter: @allthingshadoop 
>> /
>> 



Re: BadVersion state in Kafka Logs

2014-09-30 Thread Joe Stein
Have you tried increasing your broker's zookeeper session timeout as a work
around for now to alleviate the issue?  Is that an option for you? Assuming
that is the culprit you are timing zk sessions out and bumping into
KAFKA-1382 on the reconnect? Not knowing enough about what is going on with
the cluster it is hard to say if anything negative will come from it but
seems like it might be an approach to try... if you can figure out what
is causing the session to time out and fix *that* it would be a solution
also. If it is happening every couple of days (as another email thread
states), something is going on that may not just be fixed by a single patch.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Tue, Sep 30, 2014 at 11:49 AM, Seshadri, Balaji  wrote:

> Hi Joe,
>
> I did not try on 0.8.1 branch ,I can try and see if it goes through when I
> get some breather.
>
> Thanks for initiating on 0.8.1.2.
>
> Thanks,
>
> Balaji
>
> -Original Message-
> From: Joe Stein [mailto:joe.st...@stealth.ly]
> Sent: Tuesday, September 30, 2014 9:34 AM
> To: users@kafka.apache.org
> Cc: Neha Narkhede
> Subject: Re: BadVersion state in Kafka Logs
>
> Does the patch in KAFKA-1382 apply on the 0.8.1 branch?  If not if you
> could make a patch that does would be great.
>
> I will kick off a discussion for KAFKA-1382 and the scala 2.11 for 0.8.1.2
> release (and see what others may think we should do like the gradle changes
> I think we should do too for src release issues (and the jars in the
> repo)).  I will send that on dev/user in a little bit (please comment +1
> community support please on that thread for the release).
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
> On Tue, Sep 30, 2014 at 11:10 AM, Seshadri, Balaji <
> balaji.sesha...@dish.com
> > wrote:
>
> > I would love to help you guys to make Kafka best in Pub/Sub, will
> > continue doing that whenever I can.
> >
> > Do we have 0.8.1.2 release tag  or should we apply patch on top of
> > 0.8.1.1 tag because we need this KAFKA-1382 JIRA ?.
> >
> > Balaji
> >
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > Sent: Monday, September 29, 2014 5:21 PM
> > To: Seshadri, Balaji
> > Cc: users@kafka.apache.org
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > It is difficult to predict an exact date. Though all the discussions
> > of the progress and ETA are on the mailing list. You can follow the
> > discussions to know the details and/or offer to help out on the
> > outstanding issues.
> >
> > On Mon, Sep 29, 2014 at 3:48 PM, Seshadri, Balaji <
> > balaji.sesha...@dish.com> wrote:
> > Neha,
> >
> > Do you know the date in Oct when 0.8.2 is going to be out ?.
> >
> > Thanks,
> >
> > Balaji
> >
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com > neha.narkh...@gmail.com>]
> > Sent: Thursday, September 25, 2014 1:08 PM
> > To: Seshadri, Balaji
> > Cc: users@kafka.apache.org
> >
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > We are close to the release. I'd probably expect 0.8.2 sometime in
> October.
> >
> > On Thu, Sep 25, 2014 at 10:37 AM, Seshadri, Balaji <
> > balaji.sesha...@dish.com> wrote:
> > Hi Neha,
> >
> > Do you know when are you guys releasing 0.8.2 ?.
> >
> > Thanks,
> >
> > Balaji
> >
> > -Original Message-
> > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com > balaji.sesha...@dish.com>]
> > Sent: Thursday, September 25, 2014 9:41 AM
> > To: users@kafka.apache.org
> > Subject: RE: BadVersion state in Kafka Logs
> >
> > Thanks for the replay.
> >
> > Please let me know if we can use trunk as 0.8.2 is not yet released.
> >
> > Balaji
> > 
> > From: Neha Narkhede [neha.narkh...@gmail.com > neha.narkh...@gmail.com>]
> > Sent: Wednesday, September 24, 2014 6:32 PM
> > To: users@kafka.apache.org
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > From the logs you've attached, my guess is it's most likely due to
> > KAFKA-1382.
> >
> > Thanks,
> > Neha
> >
> > On Wed, Sep 24, 2014 at 10:48 AM, Seshadri, Balaji <
> > balaji.sesha...@dish.com
> > > wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > We got the below error in our logs and our consumers stopped
> > > consuming any data ?.It worked only after restart.
> > >
> > >
> > >
> > > We would like to confirm that it's because we are running with
> > > 0.8-beta version and not 0.8 release version t

Re: multi-node and multi-broker kafka cluster setup

2014-09-30 Thread Daniel Compton
Hi Sa

While it's possible to run multiple brokers on a single machine, I would be 
interested to hear why you would want to. Kafka is very efficient and can use 
all of the system resources under load. Running multiple brokers would increase 
zookeeper load, force resource sharing between the Kafka processes, and require 
more admin overhead. 

Additionally, you almost certainly want to run three Zookeepers. Two Zookeepers 
gives you no more reliability than one because ZK voting is based on a majority 
vote. If neither ZK can reach a majority on its own then it will fail. More 
info at http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7

Daniel.

> On 1/10/2014, at 4:35 am, Guozhang Wang  wrote:
> 
> Hello,
> 
> In general it is not required to have the kafka brokers installed on the
> same nodes of the zk servers, and each node can host multiple kafka
> brokers: you just need to make sure they do not share the same port and the
> same data dir.
> 
> Guozhang
> 
>> On Mon, Sep 29, 2014 at 8:31 PM, Sa Li  wrote:
>> 
>> Hi,
>> I am kinda newbie to kafka, I plan to build a cluster with multiple nodes,
>> and multiple brokers on each node, I can find tutorials for set multiple
>> brokers cluster in single node, say
>> 
>> http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
>> Also I can find some instructions for multiple node setup, but with single
>> broker on each node. I have not seen any documents to teach me how to setup
>> multiple nodes cluster and multiple brokers in each node. I notice some
>> documents point out: we should install kafka on each node, which makes
>> sense, and all the brokers in each node should connect to same zookeeper. I
>> am confused since I thought I could setup a zookeeper ensemble cluster
>> separately, and all the brokers connecting to this zookeeper cluster and
>> this zk cluster doesn’t have to be the server hosting the kafka, but some
>> tutorial says I should install zookeeper on each kafka node.
>> 
>> Here is my plan:
>> - I have three nodes: kfServer1, kfserver2, kfserver3,
>> - kfserver1 and kfserver2 are configured as the zookeeper ensemble, which
>> i have done.
>>  zk.connect=kfserver1:2181,kfserver2:2181
>> - broker1, broker2, broker3 are in kfserver1,
>>  broker4, broker5, broker6 are on kfserver2,
>>  broker7, broker8, broker9 are on kfserver3.
>> 
>> When I am configuring, the zk DataDir is in the local directory of each node
>> instead of the zk ensemble directory; is that correct? So far, I
>> could not make the above scheme work. Has anyone ever made a multi-node
>> and multi-broker kafka cluster setup?
>> 
>> thanks
>> 
>> Alec
> 
> 
> -- 
> -- Guozhang
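The majority-vote point above can be made concrete: an ensemble of n ZooKeeper servers stays available only while a strict majority is up, so it tolerates floor((n-1)/2) failures; 2 servers therefore tolerate none, the same as 1. A quick sketch:

```python
def tolerated_failures(ensemble_size):
    """Failures a ZooKeeper ensemble survives: a strict majority must remain."""
    return (ensemble_size - 1) // 2

for n in (1, 2, 3, 4, 5):
    print(n, tolerated_failures(n))
# 1 and 2 servers both tolerate 0 failures; 3 tolerates 1, 5 tolerates 2.
```

This is why odd ensemble sizes (3 or 5) are the standard recommendation: adding a second server adds a failure point without adding fault tolerance.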


RE: BadVersion state in Kafka Logs

2014-09-30 Thread Seshadri, Balaji
The zookeeper session timeout is 60 secs, but that did not help.

We are having broker crash and unresponsive, we got the "conditional update" 
failed error when broker crashed which confirmed that it is because of 
KAFKA-1382.

server.log.2014-09-23:2014-09-23 13:54:48 ERROR utils.ZkUtils$ - Conditional 
update of path 
/brokers/topics/dish-promo-application-access/partitions/128/state with data { 
"controller_epoch":40, "isr":[ 6, 1 ], "leader":1, "leader_epoch":99, 
"version":1 } and expected version 150 failed due to 
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
BadVersion for 
/brokers/topics/dish-promo-application-access/partitions/128/state

We are on a very old version (0.8-beta), so it's not just applying a patch but 
switching to a stable release version which also has the patch.
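For reference, the BadVersion error in the log above is ZooKeeper's conditional update (a compare-and-set on the znode version) rejecting a write made with a stale expected version. A toy sketch of those semantics (not the actual ZkUtils or ZooKeeper code):

```python
class BadVersionError(Exception):
    """Stand-in for org.apache.zookeeper.KeeperException.BadVersionException."""

class ZNode:
    """Toy znode with ZooKeeper-style versioned conditional update."""
    def __init__(self, data):
        self.data, self.version = data, 0

    def set_data(self, data, expected_version):
        # ZooKeeper's setData(path, data, version): the write succeeds only
        # if the caller's expected version matches the node's current one.
        if expected_version != self.version:
            raise BadVersionError("expected %d, actual %d"
                                  % (expected_version, self.version))
        self.data = data
        self.version += 1

state = ZNode('{"leader":1,"leader_epoch":99}')
state.set_data('{"leader":6,"leader_epoch":100}', expected_version=0)  # succeeds
try:
    # A broker retrying with an outdated cached version, as in the log above.
    state.set_data('{"leader":1,"leader_epoch":99}', expected_version=0)
except BadVersionError as e:
    print("Conditional update failed:", e)
```

KAFKA-1382 addresses exactly this: updating the cached zkVersion after a failed conditional update so the broker can recover instead of retrying forever.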

-Original Message-
From: Joe Stein [mailto:joe.st...@stealth.ly]
Sent: Tuesday, September 30, 2014 2:01 PM
To: users@kafka.apache.org
Cc: Neha Narkhede
Subject: Re: BadVersion state in Kafka Logs

Have you tried increasing your broker's zookeeper session timeout as a work 
around for now to alleviate the issue?  Is that an option for you? Assuming 
that is the culprit you are timing zk sessions out and bumping into
KAFKA-1382 on the reconnect? Not knowing enough about what is going on with the 
cluster it is hard to say if anything negative will come from it but seems like 
it might be an approach to try... if you can figure out what is causing the 
session to timeout and fix *that* it would be a solution also if it is 
happening every couple days (as another email thread
states) something is going on that may not just be fixed by a single patch.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Tue, Sep 30, 2014 at 11:49 AM, Seshadri, Balaji  wrote:

> Hi Joe,
>
> I did not try on 0.8.1 branch ,I can try and see if it goes through
> when I get some breather.
>
> Thanks for initiating on 0.8.1.2.
>
> Thanks,
>
> Balaji
>
> -Original Message-
> From: Joe Stein [mailto:joe.st...@stealth.ly]
> Sent: Tuesday, September 30, 2014 9:34 AM
> To: users@kafka.apache.org
> Cc: Neha Narkhede
> Subject: Re: BadVersion state in Kafka Logs
>
> Does the patch in KAFKA-1382 apply on the 0.8.1 branch?  If not if you
> could make a patch that does would be great.
>
> I will kick off a discussion for KAFKA-1382 and the scala 2.11 for
> 0.8.1.2 release (and see what others may think we should do like the
> gradle changes I think we should do too for src release issues (and
> the jars in the repo)).  I will send that on dev/user in a little bit
> (please comment +1 community support please on that thread for the release).
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
> On Tue, Sep 30, 2014 at 11:10 AM, Seshadri, Balaji <
> balaji.sesha...@dish.com
> > wrote:
>
> > I would love to help you guys to make Kafka best in Pub/Sub, will
> > continue doing that whenever I can.
> >
> > Do we have 0.8.1.2 release tag  or should we apply patch on top of
> > 0.8.1.1 tag because we need this KAFKA-1382 JIRA ?.
> >
> > Balaji
> >
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > Sent: Monday, September 29, 2014 5:21 PM
> > To: Seshadri, Balaji
> > Cc: users@kafka.apache.org
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > It is difficult to predict an exact date. Though all the discussions
> > of the progress and ETA are on the mailing list. You can follow the
> > discussions to know the details and/or offer to help out on the
> > outstanding issues.
> >
> > On Mon, Sep 29, 2014 at 3:48 PM, Seshadri, Balaji <
> > balaji.sesha...@dish.com> wrote:
> > Neha,
> >
> > Do you know the date in Oct when 0.8.2 is going to be out ?.
> >
> > Thanks,
> >
> > Balaji
> >
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com > neha.narkh...@gmail.com>]
> > Sent: Thursday, September 25, 2014 1:08 PM
> > To: Seshadri, Balaji
> > Cc: users@kafka.apache.org
> >
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > We are close to the release. I'd probably expect 0.8.2 sometime in
> October.
> >
> > On Thu, Sep 25, 2014 at 10:37 AM, Seshadri, Balaji <
> > balaji.sesha...@dish.com> wrote:
> > Hi Neha,
> >
> > Do you know when are you guys releasing 0.8.2 ?.
> >
> > Thanks,
> >
> > Balaji
> >
> > -Original Message-
> > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com > balaji.sesha...@dish.com>]
> > Sent: Thursday, September 25, 2014 9:41 AM
> > To: users@kafka.apache.org

Re: BadVersion state in Kafka Logs

2014-09-30 Thread Joe Stein
It sounds like you have a much deeper rooted problem.  Is zookeeper
swapping?  Something has to be causing this.  After you fix this symptom
you will probably start to see constant leader elections and the ISR
shrinking/growing and constant consumer rebalancing (or at least every
minute) and a herd effect occurring upstream/downstream.  You need to figure out
what is causing the long session timeout and resolve that, IMHO.  Zookeeper
health is the first place to look.  Next would be the network.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Tue, Sep 30, 2014 at 4:57 PM, Seshadri, Balaji 
wrote:

> The zookeeper session timeout is 60 secs, but that did not help.
>
> We are having broker crash and unresponsive, we got the "conditional
> update" failed error when broker crashed which confirmed that it is because
> of KAFKA-1382.
>
> server.log.2014-09-23:2014-09-23 13:54:48 ERROR utils.ZkUtils$ -
> Conditional update of path
> /brokers/topics/dish-promo-application-access/partitions/128/state with
> data { "controller_epoch":40, "isr":[ 6, 1 ], "leader":1,
> "leader_epoch":99, "version":1 } and expected version 150 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for
> /brokers/topics/dish-promo-application-access/partitions/128/state
>
> We are on a very old version (0.8-beta), so it's not just applying a patch but
> switching to a stable release version which also has the patch.
>
> -Original Message-
> From: Joe Stein [mailto:joe.st...@stealth.ly]
> Sent: Tuesday, September 30, 2014 2:01 PM
> To: users@kafka.apache.org
> Cc: Neha Narkhede
> Subject: Re: BadVersion state in Kafka Logs
>
> Have you tried increasing your broker's zookeeper session timeout as a
> work around for now to alleviate the issue?  Is that an option for you?
> Assuming that is the culprit you are timing zk sessions out and bumping into
> KAFKA-1382 on the reconnect? Not knowing enough about what is going on
> with the cluster it is hard to say if anything negative will come from it
> but seems like it might be an approach to try... if you can figure out
> what is causing the session to timeout and fix *that* it would be a
> solution also if it is happening every couple days (as another email
> thread
> states) something is going on that may not just be fixed by a single patch.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
> On Tue, Sep 30, 2014 at 11:49 AM, Seshadri, Balaji <
> balaji.sesha...@dish.com
> > wrote:
>
> > Hi Joe,
> >
> > I did not try on 0.8.1 branch ,I can try and see if it goes through
> > when I get some breather.
> >
> > Thanks for initiating on 0.8.1.2.
> >
> > Thanks,
> >
> > Balaji
> >
> > -Original Message-
> > From: Joe Stein [mailto:joe.st...@stealth.ly]
> > Sent: Tuesday, September 30, 2014 9:34 AM
> > To: users@kafka.apache.org
> > Cc: Neha Narkhede
> > Subject: Re: BadVersion state in Kafka Logs
> >
> > Does the patch in KAFKA-1382 apply on the 0.8.1 branch?  If not if you
> > could make a patch that does would be great.
> >
> > I will kick off a discussion for KAFKA-1382 and the scala 2.11 for
> > 0.8.1.2 release (and see what others may think we should do like the
> > gradle changes I think we should do too for src release issues (and
> > the jars in the repo)).  I will send that on dev/user in a little bit
> > (please comment +1 community support please on that thread for the
> release).
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop 
> > /
> >
> > On Tue, Sep 30, 2014 at 11:10 AM, Seshadri, Balaji <
> > balaji.sesha...@dish.com
> > > wrote:
> >
> > > I would love to help you guys to make Kafka best in Pub/Sub, will
> > > continue doing that whenever I can.
> > >
> > > Do we have 0.8.1.2 release tag  or should we apply patch on top of
> > > 0.8.1.1 tag because we need this KAFKA-1382 JIRA ?.
> > >
> > > Balaji
> > >
> > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > Sent: Monday, September 29, 2014 5:21 PM
> > > To: Seshadri, Balaji
> > > Cc: users@kafka.apache.org
> > > Subject: Re: BadVersion state in Kafka Logs
> > >
> > > It is difficult to predict an exact date, though all the discussions
> > > of the progress and ETA are on the mailing list. You can follow the
> > > discussions to know the details and/or offer to help out on the
> > > outstanding issues.
> > >

Re: Zookeeper reconnect failed due to 'state changed (Expired)'

2014-09-30 Thread Jun Rao
With ack=1, acked messages could be lost when the leader fails.

Thanks,

Jun
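Jun's point can be made concrete with a toy sketch (hypothetical code, not Kafka's actual replication path): with acks=1 the leader acknowledges as soon as the write hits its own log, so a message acked just before the leader dies may never reach the follower that takes over.

```python
# Toy model of acks=1 semantics (illustrative only; not Kafka's real code).
class Broker:
    def __init__(self):
        self.log = []

def produce_acks1(leader, followers, msg, replicate):
    """Leader appends and acks immediately; follower fetch may lag the ack."""
    leader.log.append(msg)      # acks=1: ack after the leader's local write
    if replicate:               # replication happens asynchronously
        for f in followers:
            f.log.append(msg)
    return "acked"

leader, follower = Broker(), Broker()
assert produce_acks1(leader, [follower], "m1", replicate=True) == "acked"
# "m2" is acked, but the leader crashes before the follower fetches it:
assert produce_acks1(leader, [follower], "m2", replicate=False) == "acked"
new_leader = follower           # failover promotes the follower
print("m2" in new_leader.log)   # False: an acked message was lost
```

This is the window Andrew is asking about: the producer saw an ack, yet the surviving replica never held the message.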

On Mon, Sep 29, 2014 at 8:04 AM, Andrew Otto  wrote:

> This happened again to me this weekend. I've done some sleuthing, and
> definitely can see some crazy paging stats when this lock-up happens. For
> the curious, more info can be found here:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=69667. I had tuned
> dirty_expire_centisecs from 30 seconds to 10, but this still happened
> again. I'll continue to troubleshoot and tune. This is slow going because it
> is not regularly reproducible. I have to make a single change and then
> wait a week or two for the timeout to occur.
>
> Here's a related question.  When the timeout happens, we lose some
> messages.  Our producer is varnishkafka, which uses the librdkafka producer
> client.  librdkafka keeps track of produce errors.  We
> have kafka.topic.request.required.acks = 1.  According to librdkafka, all
> messages sent have been ACKed by the leader of the partition to which the
> messages are sent.  Also, when we lose messages due to this timeout, the
> broker that times out is always the controller.  When it attempts to
> reconnect to Zookeeper, we see:
>
>   INFO  kafka.utils.ZkUtils$  - conflict in /controller data:
> {"version":1,"brokerid":21,"timestamp":"1411879981756"} stored data:
> {"version":1,"brokerid":22,"timestamp":"1407187809296"}
>
> In the case when a controller drops out of the ISR for a few seconds, is it
> possible for this confused broker to drop ACKed messages?
>
>
>
> On Thu, Jul 3, 2014 at 12:48 AM, Jun Rao  wrote:
>
> > Are you on Linux? We have seen this pattern (user/sys time low and real
> > time high in GC time) before. In our case, the problem was due to disk
> > I/Os. When there are lots of dirty pages (in our case, this is caused by
> > log4j logging), Linux can draft user threads (in this case GC threads) to
> > flush the dirty pages. So, all that time in real was spent on disk I/Os,
> > rather than real GCs. The fix is to tune dirty_expire_centisecs and
> > dirty_writeback_centisecs
> > to flush dirty pages more frequently to avoid such drafting.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto  wrote:
> >
> > > Hi again!
> > >
> > > I've been having this issue consistently since I first started this
> > thread,
> > > but it was happening infrequently enough for me to brush it aside and
> > just
> > > run an election to rebalance brokers.
> > >
> > > I recently expanded (and reinstalled) our Kafka cluster so that it now
> > has
> > > 4 brokers with a default replication factor of 3 for each partition.  I
> > > also switched over to the G1GC as recommended here:
> > > https://kafka.apache.org/081/ops.html (even though we are still
> running
> > > Kafka 0.8.0; we hope to upgrade soon).
> > >
> > > Now, only one of the 4 brokers (analytics1021, the same problem broker
> we
> > > saw before) gets its ZK connection expired even more frequently.
> > > Previously it was less than once a week; now I am seeing this happen
> > > multiple times a day.
> > >
> > > I've posted all the relevant logs from a recent event here:
> > > https://gist.github.com/ottomata/e42480446c627ea0af22
> > >
> > > This includes the GC log on the offending Kafka broker during the time
> > this
> > > happened.  I am pretty green when it comes to GC tuning, but I do see
> > this
> > > interesting stat:
> > >
> > >
> > >  [Times: user=0.14 sys=0.00, real=11.47 secs]
> > >
> > > Did Kafka's JVM really just take 11.47 secs to do a GC there? I'm
> > > probably missing something, but I don't see which part of that real
> > > time summary makes up the bulk of that GC time.
> > >
> > > This is strange, right? This broker is identically configured to all
> > > its peers, and should be handling on average the exact same amount and
> > > type of traffic. Anyone have any advice?
> > >
> > > Thanks!
> > > -Andrew Otto
> > >
> > >
> > >
> > >
> > > On Fri, Mar 21, 2014 at 6:48 PM, Neha Narkhede <
> neha.narkh...@gmail.com>
> > > wrote:
> > >
> > > > I see, that makes sense. Thanks a lot for clarifying!
> > > >
> > > > -Neha
> > > >
> > > >
> > > > On Fri, Mar 21, 2014 at 11:01 AM, Bae, Jae Hyeon  >
> > > > wrote:
> > > >
> > > > > Let me clarify the situation. I forgot to mention that my case might
> > > > > not be a general one, because Netflix is using Apache Curator as the
> > > > > main zookeeper client and ZkClient in Kafka should be bridged to
> > > > > Apache Curator, so the behavior I have seen might not be a general one.
> > > > >
> > > > > Kafka's ZKSessionExpireListener.handleNewSession() reinstates all
> > > > > ephemeral nodes and watchers, but handleNewSession() was not kicked
> > > > > off in my case. So, I created a Netflix-internal version of ZkClient to
> > > > > replace ephemeral node creation and watcher reinstating.
> > > > >
> > > > > I have a plan to remove all external dependency from Kafka soon
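Jun's dirty-page advice in the quoted thread above maps to the `vm.dirty_*` sysctls. A sketch with illustrative values only (defaults and sensible targets vary by kernel and workload; Andrew's own change was 30 seconds down to 10):

```shell
# Inspect the current writeback settings (values are in centiseconds)
cat /proc/sys/vm/dirty_expire_centisecs     # how old a dirty page may get
cat /proc/sys/vm/dirty_writeback_centisecs  # how often the flusher wakes up

# Flush dirty pages more frequently (example values; tune for your workload)
sysctl -w vm.dirty_expire_centisecs=1000    # 10s instead of the common 30s
sysctl -w vm.dirty_writeback_centisecs=100  # wake the flusher every 1s
```

Flushing more often keeps any single flush small, so GC threads are less likely to be drafted into long disk I/O stalls.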

Re: Created topic by 2 partitions, only can use the one partition

2014-09-30 Thread Guozhang Wang
Hi Jiang,

Which producer client did you use? And did you specify any keys for your
sent messages?

Guozhang

On Tue, Sep 30, 2014 at 10:45 AM, Jiang Jacky  wrote:

> Hi, Guys
> It is very weird: I created a topic with 2 partitions a couple of weeks ago,
> and I could only produce messages to partition 0, not partition 1. But just
> now, I created a new topic, again with 2 partitions, and it does work.
> So what's the problem with the old topic? When I tried to describe the old
> topic, I found the following:
>
> Topic:   Partition: 0Leader: 1   Replicas: 1
> Isr: 1
> Topic:   Partition: 1Leader: 2   Replicas: 2
> Isr: 2
>
> Please let me know if the topic is screwed up.
>
> Thank you
>



-- 
-- Guozhang
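Guozhang's question matters because the partition a message lands on is normally derived from its key. A sketch of the usual hash-partitioning rule (illustrative code, not the producer's actual implementation; note the old Scala producer also pins unkeyed sends to one random partition for a period of time):

```python
def choose_partition(key, num_partitions, hash_fn=hash):
    """Keyed messages map deterministically: same key -> same partition."""
    return hash_fn(key) % num_partitions

# Every message with a given key hits one partition. A constant key (or an
# unkeyed producer caching its random choice) would explain seeing only
# partition 0 ever receive data on a 2-partition topic.
parts = {choose_partition(k, 2) for k in ("user-1", "user-2", "user-3")}
print(parts)  # some subset of {0, 1}
```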


Re: BadVersion state in Kafka Logs

2014-09-30 Thread Joe Stein
Also check for really long/bad GC pauses as another possibility. Not sure
of your JDK and JVM_OPTS, or whether you are setting them as described in
https://kafka.apache.org/documentation.html#java. You need to find
some "spike" somewhere right before that error happens to track down what
is causing the timeouts.
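For reference, settings in the spirit of the linked documentation page look like the fragment below (values are illustrative; size the heap and pause target for your own load, and confirm against the docs for your Kafka version):

```shell
# Example broker JVM settings (illustrative; kafka-run-class.sh reads these)
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35"
# GC logging makes long pauses easy to correlate with session expirations:
# -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```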

On Tue, Sep 30, 2014 at 6:33 PM, Joe Stein  wrote:

> It sounds like you have a much deeper rooted problem.  Is zookeeper
> swapping?  Something has to be causing this.  After you fix this symptom
> you will probably start to see constant leader elections and the isr
> shrinking/growing and constant consumer rebalancing (or at least every
> minute) and a herd effect occurring up/downstream. You need to figure out
> what is causing the long session timeout and resolve that, IMHO.  Zookeeper
> health is the first place to look.  Next would be the network.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
> On Tue, Sep 30, 2014 at 4:57 PM, Seshadri, Balaji <
> balaji.sesha...@dish.com> wrote:
>
>> The zookeeper session timeout is 60 secs, but that did not help.
>>
>> We had a broker crash and become unresponsive. We got the "conditional
>> update failed" error when the broker crashed, which confirmed that it is
>> because of KAFKA-1382.
>>
>> server.log.2014-09-23:2014-09-23 13:54:48 ERROR utils.ZkUtils$ -
>> Conditional update of path
>> /brokers/topics/dish-promo-application-access/partitions/128/state with
>> data { "controller_epoch":40, "isr":[ 6, 1 ], "leader":1,
>> "leader_epoch":99, "version":1 } and expected version 150 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for
>> /brokers/topics/dish-promo-application-access/partitions/128/state
>>
>> We are on a very old version, 0.8-beta, so it's not just a patch but
>> switching to a stable release version, which also has the patch.
>>
>> -Original Message-
>> From: Joe Stein [mailto:joe.st...@stealth.ly]
>> Sent: Tuesday, September 30, 2014 2:01 PM
>> To: users@kafka.apache.org
>> Cc: Neha Narkhede
>> Subject: Re: BadVersion state in Kafka Logs
>>
>> Have you tried increasing your broker's zookeeper session timeout as a
>> workaround for now to alleviate the issue?  Is that an option for you?
>> Assuming that is the culprit, you are timing ZK sessions out and bumping
>> into KAFKA-1382 on the reconnect? Not knowing enough about what is going on
>> with the cluster it is hard to say if anything negative will come from it,
>> but it seems like it might be an approach to try... If you can figure out
>> what is causing the session to time out and fix *that*, that would be a
>> solution as well. If it is happening every couple of days (as another email
>> thread states), something is going on that may not just be fixed by a single
>> patch.
>>
>> /***
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop 
>> /
>>
>> On Tue, Sep 30, 2014 at 11:49 AM, Seshadri, Balaji <
>> balaji.sesha...@dish.com
>> > wrote:
>>
>> > Hi Joe,
>> >
>> > I did not try on the 0.8.1 branch; I can try and see if it goes through
>> > when I get a breather.
>> >
>> > Thanks for initiating on 0.8.1.2.
>> >
>> > Thanks,
>> >
>> > Balaji
>> >
>> > -Original Message-
>> > From: Joe Stein [mailto:joe.st...@stealth.ly]
>> > Sent: Tuesday, September 30, 2014 9:34 AM
>> > To: users@kafka.apache.org
>> > Cc: Neha Narkhede
>> > Subject: Re: BadVersion state in Kafka Logs
>> >
>> > Does the patch in KAFKA-1382 apply on the 0.8.1 branch? If not, it would
>> > be great if you could make a patch that does.
>> >
>> > I will kick off a discussion for KAFKA-1382 and the Scala 2.11 support for
>> > the 0.8.1.2 release (and see what others think we should include, like the
>> > gradle changes I think we should also make for the src release issues, and
>> > the jars in the repo). I will send that on dev/user in a little bit
>> > (please comment with +1 on that thread to show community support for the
>> > release).
>> >
>> >
>> > On Tue, Sep 30, 2014 at 11:10 AM, Seshadri, Balaji <
>> > balaji.sesha...@dish.com
>> > > wrote:
>> >
>> > > I would love to help you guys make Kafka the best in Pub/Sub, and will
>> > > continue doing that whenever I can.
>> > >
>> > > Do we have a 0.8.1.2 release tag, or should we apply the patch on top of
>> > > the 0.8.1.1 tag, because we need this KAFKA-1382 JIRA?
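Joe's workaround discussed above is a broker-side setting. A sketch of the relevant server.properties entries (values are illustrative; the 0.8-era default session timeout is much lower):

```properties
# server.properties -- broker ZooKeeper session settings (example values)
zookeeper.connect=192.168.1.180:2181
zookeeper.session.timeout.ms=60000      # raise if GC/IO pauses expire sessions
zookeeper.connection.timeout.ms=60000
```

Raising the timeout only masks the pauses; as noted in the thread, the underlying GC or disk stalls still need to be found.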

Re: AWS EC2 deployment best practices

2014-09-30 Thread James Cheng
I'm also interested in hearing more about deploying Kafka in AWS.

I was also considering options like your 1a and 2. I ran some calculations and 
one interesting thing I ran across was bandwidth costs between AZs.

In 1a, if you can have your producers and consumers in the same AZ as the 
"master", then you won't have to pay any bandwidth costs for your 
producers/consumers. You will have to pay bandwidth costs for the mirror-maker 
traffic between clusters in different AZs.

In 2, if your producers and consumers are writing/reading to different AZs, 
then you are paying bandwidth costs between AZs for both producers and 
consumers. In my cost calculation for a modest size cluster, my bandwidth costs 
were roughly the same as my (EC2 instance + EBS) costs.

An idea for #2 is to deploy your producers and your consumers so that they 
always are deployed in the AZ that contains the partitions they want to 
read/write. Or, said another way, move your partitions to the brokers in the 
same AZs as where your producers/consumers are. I think it's doable, but it 
means that you'd want to write a Kafka client library that is aware of
your AZs, and also manage the cluster partitions in sync with your
producer/consumer deployments.

With ephemeral disks, I imagine that Kafka would become network bound. In case 
you find it useful, I ran some network performance tests against different EC2 
instances. I only went as far as c3.4xlarge.

https://docs.google.com/spreadsheets/d/1QF-4EO3PQ_YOLbvf6HKpqBTNQ8fyYeRuDMrlDYlK0yQ/pubchart?oid=1634430904&format=interactive

-James
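James's trade-off can be made concrete with back-of-the-envelope arithmetic (traffic volume and the per-GB price below are hypothetical; AWS bills inter-AZ transfer per GB in each direction):

```python
def cross_az_cost_gb(produced_gb, consumed_gb, frac_cross_az, price_per_gb=0.01):
    """Monthly inter-AZ transfer cost for the fraction of traffic crossing AZs."""
    return (produced_gb + consumed_gb) * frac_cross_az * price_per_gb

# Option 1a: producers/consumers colocated with their cluster; only the
# mirror-maker copy of produced data crosses AZs.
mirror = cross_az_cost_gb(10_000, 0, 1.0)
# Option 2: one cluster spanning 2 AZs; roughly half of producer and
# consumer traffic talks to a leader in the other AZ.
spanning = cross_az_cost_gb(10_000, 10_000, 0.5)
print(mirror, spanning)  # prints: 100.0 100.0
```

For these made-up numbers the two options happen to cost the same; the point is that the consumer-side term in option 2 grows with every consumer group, while the mirror-maker term in 1a does not.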

On Sep 30, 2014, at 7:47 AM, Philip O'Toole  
wrote:

> OK, yeah, speaking from experience I would be comfortable with using the 
> ephemeral storage if it's replicated across AZs. More and more EC2 instances 
> have local SSDs, so you'll get great IO. Of course, you'd better monitor your 
> instance, and if an instance terminates, you're vulnerable if a second 
> instance is lost. It might argue for 3 copies.
> 
> As you correctly pointed out in your original e-mail, the Loggly setup 
> predated 0.8 -- so there was no replication to worry about. We ran 3-broker 
> clusters and put each broker of a cluster in a different AZ. This did mean 
> that during an AZ failure certain brokers would be unavailable (but the 
> messages were still on disk, ready for processing when the AZ came back 
> online), but it did mean that there was always some Kafka brokers running 
> somewhere that were reachable, and incoming traffic could be sent there. The 
> Producers we wrote took care of dealing with this. In other words the 
> pipeline kept moving data.
> 
> 
> Of course, in a healthy pipeline, each message was written to ES within a 
> matter of seconds, and we had replication there (as outlined in the 
> accompanying talk). It all worked very well.
> 
> 
> Philip
> 
> 
> -
> http://www.philipotoole.com 
> 
> 
> On Tuesday, September 30, 2014 2:49 PM, Joe Crobak  wrote:
> 
> 
> 
> I didn't know about KAFKA-1215, thanks. I'm not sure it would fully address
> my concerns about a producer writing to the partition leader in a different
> AZ, though.
> 
> To answer your question, I was thinking ephemerals with replication, yes.
> With a reservation, it's pretty easy to get e.g. two i2.xlarge for an
> amortized cost below a single m2.2xlarge with the same amount of EBS
> storage and provisioned IOPs.
> 
> 
> On Mon, Sep 29, 2014 at 9:40 PM, Philip O'Toole <
> philip.oto...@yahoo.com.invalid> wrote:
> 
>> If only Kafka had rack awareness... you could run one cluster and set up the
>> replicas in different AZs.
>> 
>> 
>> https://issues.apache.org/jira/browse/KAFKA-1215
>> 
>> As for your question about ephemeral versus EBS, I presume you are
>> proposing to use ephemeral *with* replicas, right?
>> 
>> 
>> Philip
>> 
>> 
>> 
>> -
>> http://www.philipotoole.com
>> 
>> 
>> On Monday, September 29, 2014 9:45 PM, Joe Crobak 
>> wrote:
>> 
>> 
>> 
>> We're planning a deploy to AWS EC2, and I was hoping to get some advice on
>> best practices. I've seen the Loggly presentation [1], which has some good
>> recommendations on instance types and EBS setup. Aside from that, there
>> seem to be several options in terms of multi-Availability Zone (AZ)
>> deployment. The ones we're considering are:
>> 
>> 1) Treat each AZ as a separate data center. Producers write to the kafka
>> cluster in the same AZ. For consumption, two options:
>> 1a) designate one cluster the "master" cluster and use mirrormaker. This
>> was discussed here [2] where some gotchas related to offset management were
>> raised.
>> 1b) Build consumers to consume from both clusters (e.g. two Camus jobs, one
>> for each cluster).
>> 
>> Pros:
>> * if there's a network partition between AZs (or extra latency), the
>> consumer(s) will catch up once the event is resolved.
>> * If an AZ goes offline, only unprocessed data in that AZ is lost

Re: After creating a topic, broker gets dropped from ISR

2014-09-30 Thread Jun Rao
Could you check that the broker host registered in ZK is the IP that you
are expecting?

Thanks,

Jun
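One way to do Jun's check: read /brokers/ids/&lt;id&gt; from ZooKeeper (e.g. `zkCli.sh` then `get /brokers/ids/2`) and compare the registered host with an address the other broker can actually reach. A sketch of parsing the registration payload (the JSON shape shown is the 0.8-era format and is an assumption, not data from this thread):

```python
import json

def registered_endpoint(znode_data):
    """Extract host:port from a /brokers/ids/<id> registration payload."""
    reg = json.loads(znode_data)
    return reg["host"], reg["port"]

# Example payload in the 0.8-era shape (assumed values):
data = ('{"jmx_port":-1,"timestamp":"1412066694524",'
        '"host":"192.168.1.190","version":1,"port":9092}')
host, port = registered_endpoint(data)
print(host, port)  # prints: 192.168.1.190 9092
# If the registered host is "localhost" or a wrong interface, replica
# fetches between brokers fail the way described above.
```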

On Tue, Sep 30, 2014 at 3:03 AM, florent valdelievre <
florentvaldelie...@gmail.com> wrote:

> Hi again,
>
> 192.168.1.180
> Zk: 192.168.1.180:2181
> Kafka: 9092
> Broker.id = 1
> zookeeper.connect=192.168.1.180:2181
>
> --
>
> 192.168.1.190
> Zk: 192.168.1.180:2181
> Kafka: 9092
> Broker.id = 2
> zookeeper.connect=192.168.1.180:2181
>
> --
>
> I start both Kafka-server
> I create a topic using the following command (the command is launched on
> 192.168.1.180)
>
> kafka-topics.sh --create --zookeeper 192.168.1.180:2181  --topic
> hibe-test-12 --partitions 1 --replication-factor 2
>
> *Stdout on broker.id =1*
> [2014-09-30 05:51:01,586] INFO Created log for partition [hibe-test-12,0]
> in /home/shopmedia/apps/kafka/log with properties {segment.index.bytes ->
> 10485760, file.delete.delay.ms -> 6, segment.bytes -> 536870912,
> flush.ms -> 9223372036854775807, delete.retention.ms -> 8640,
> index.interval.bytes -> 4096, retention.bytes -> -1, cleanup.policy ->
> delete, segment.ms -> 60480, max.message.bytes -> 112,
> flush.messages -> 9223372036854775807, min.cleanable.dirty.ratio -> 0.5,
> retention.ms -> 360}. (kafka.log.LogManager)
> [2014-09-30 05:51:01,588] WARN Partition [hibe-test-12,0] on broker 1: No
> checkpointed highwatermark is found for partition [hibe-test-12,0]
> (kafka.cluster.Partition)
> [2014-09-30 05:51:19,366] INFO Partition [hibe-test-12,0] on broker 1:
> Shrinking ISR for partition [hibe-test-12,0] from 1,2 to 1
> (kafka.cluster.Partition)
>
> --
>
>
> *Stdout on broker.id =2*
>
> [2014-09-30 05:51:11,952] ERROR [ReplicaFetcherThread-0-1], Error for
> partition [hibe-test-12,0] to broker 1:class kafka.common.UnknownException
> (kafka.server.ReplicaFetcherThread)
> [2014-09-30 05:51:11,954] ERROR [KafkaApi-2] error when handling request
> Name: FetchRequest; Version: 0; CorrelationId: 2647; ClientId:
> ReplicaFetcherThread-0-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes;
> RequestInfo: [hibe-test-12,0] -> PartitionFetchInfo(0,1048576)
> (kafka.server.KafkaApis)
> kafka.common.KafkaException: Shouldn't set logEndOffset for replica 2
> partition [hibe-test-12,0] since it's local
>
> This error causes high CPU usage, roughly 40%, until I kill the
> kafka-server process.
>
> Please note that I had removed the log data from both servers beforehand, as
> well as removed the ZK data using zkCli.sh:
> rmr /brokers
>
> Broker.id = 2 never gets in the ISR
>
> Do you have any idea what is causing this error?
>


Re: Still Stale TopicMetadata

2014-09-30 Thread Jun Rao
Actually, this is probably the more relevant JIRA: KAFKA-1367.

Thanks,

Jun

On Tue, Sep 30, 2014 at 10:00 AM, Christofer Hedbrandh <
christo...@knewton.com> wrote:

> Hi Kafka users,
>
> Was there ever a JIRA ticket filed for this?
>
> "Re: Stale TopicMetadata"
>
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201307.mbox/%3ce238b018f88c39429066fc8c4bfd0c2e019be...@esv4-mbx01.linkedin.biz%3E
>
> As far as I can tell this is still an issue in 0.8.1.1
>
> Using the python client (VERSION 0.2-alpha):
> client = KafkaClient(host, port)
> request_id = client._next_id()
> KafkaProtocol.encode_metadata_request(client.client_id, request_id,
> topic_names)
> response = client._send_broker_unaware_request(request_id, request)
> brokers, topics = KafkaProtocol.decode_metadata_response(response)
>
> the metadata returned tells me only a subset of the replicas is in sync
>
> E.g.
> {'test-topic-1': {0: PartitionMetadata(topic='test-topic-1', partition=0,
> leader=2018497752, replicas=(2018497752, 915105820, 1417963519),
> isr=(2018497752,))}}
>
> but when I fetch metadata with the kafka-topics.sh --describe tool, it
> looks like all replicas are in sync.
>
> Topic:test-topic-1 PartitionCount:1 ReplicationFactor:3 Configs:
> retention.ms
> =60480
> Topic: test-topic-1 Partition: 0 Leader: 2018497752 Replicas:
> 2018497752,915105820,1417963519 Isr: 2018497752,915105820,1417963519
>
> I looked around for a JIRA ticket for this but couldn't find one. Please
> let me know where this bug is tracked.
>
> Thanks,
> Christofer
>
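Christofer's comparison boils down to checking `isr` against `replicas` in the returned metadata. A small sketch of that check, using a stand-in tuple rather than the actual kafka-python types (field names assumed):

```python
from collections import namedtuple

# Stand-in for kafka-python's PartitionMetadata (field names assumed).
PartitionMetadata = namedtuple(
    "PartitionMetadata", ["topic", "partition", "leader", "replicas", "isr"])

def under_replicated(meta):
    """Return the replicas that the metadata claims are not in sync."""
    return [r for r in meta.replicas if r not in meta.isr]

pm = PartitionMetadata("test-topic-1", 0, 2018497752,
                       (2018497752, 915105820, 1417963519), (2018497752,))
print(under_replicated(pm))  # prints: [915105820, 1417963519]
# If kafka-topics.sh --describe disagrees with this, the client is likely
# serving stale cached metadata, as tracked in KAFKA-1367.
```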