How many partitions can a single machine handle in Kafka?
Hello everyone, I'm new to Kafka. I'm wondering: what is the maximum number of partitions a single machine can handle in Kafka? Is there a suggested number? Thanks. xiaobinshe
Re: taking broker down and returning it does not restore cluster state (nor rebalance)
Trying to reproduce this failed: after some fairly long minutes I noticed that the partition leaders regained balance again, and the only issue left is that the preferred replica was not balanced as it was before taking the broker down. Meaning, the output of the topic description shows broker 1 (out of 3) as preferred replica (first in ISR) in 66% of the cases instead of the expected 33%. On Mon, Oct 20, 2014 at 11:36 PM, Joel Koshy jjkosh...@gmail.com wrote: As Neha mentioned, with rep factor 2x, this shouldn't normally cause an issue. Taking the broker down will cause the leader to move to another replica; consumers and producers will rediscover the new leader; no rebalances should be triggered. When you bring the broker back up, unless you run a preferred replica leader re-election, the broker will remain a follower. Again, there will be no effect on the producers or consumers (i.e., no rebalances). If you can reproduce this easily, can you please send exact steps to reproduce and send over your consumer logs? Thanks, Joel On Mon, Oct 20, 2014 at 09:13:27PM +0300, Shlomi Hazan wrote: Yes I did. It is set to 2. On Oct 20, 2014 5:38 PM, Neha Narkhede neha.narkh...@gmail.com wrote: Did you ensure that your replication factor was set higher than 1? If so, things should recover automatically after adding the killed broker back into the cluster. On Mon, Oct 20, 2014 at 1:32 AM, Shlomi Hazan shl...@viber.com wrote: Hi, Running some tests on 0811 and wanted to see what happens when a broker is taken down with 'kill'. I bumped into the situation in the subject, where launching the broker again left it a bit out of the game as far as I could see using Stackdriver metrics. Trying to rebalance with 'verify consumer rebalance' returned an error, 'no owner for partition', for all partitions of that topic (128 partitions). Moreover, aside from the issue at hand, changing the group name to a non-existent group returned success. Taking both the consumers and producers down allowed the rebalance to return success... And the question is: how do you restore 100% state after taking down a broker? What is the best practice? What needs to be checked and what needs to be done? Shlomi
Re: How to produce and consume events in 2 DCs?
Thanks Neha, Unfortunately, the maintenance overhead of 2 more clusters is not acceptable to us. Would you accept a pull request on mirror maker that would rename topics on the fly? For example by accepting the parameter rename: --rename src1/dest1,src2/dest2 or, extended with RE support: --rename old_(.*)/new_\1 Kind regards, Erik. On 20 Oct 2014, at 16:43, Neha Narkhede neha.narkh...@gmail.com wrote: Another way to set up this kind of mirroring is by deploying 2 clusters in each DC - a local Kafka cluster and an aggregate Kafka cluster. The mirror maker copies data from both the DC's local clusters into the aggregate clusters. So if you want access to a topic with data from both DC's, you subscribe to the aggregate cluster. Thanks, Neha On Mon, Oct 20, 2014 at 7:07 AM, Erik van oosten e.vanoos...@grons.nl.invalid wrote: Hi, We have 2 data centers that produce events. Each DC has to process events from both DCs. I had the following in mind (ASCII diagram summarized): in each DC, local events are produced into a "Receiver topic"; mirroring copies both DCs' receiver topics into each DC's "Consumer topic", so each consumer topic holds the merged event stream; consumers in each DC read from their local consumer topic. As each DC has a single Kafka cluster, on each DC the receiver topic and consumer topic need to be on the same cluster. Unfortunately, mirror maker does not seem to support mirroring to a topic with another name. Is there another tool we could use? Or, is there another approach for producing and consuming from 2 DCs? Kind regards, Erik. -- Erik van Oosten http://www.day-to-day-stuff.blogspot.nl/
Clean Kafka Queue
Hi guys, Is there a way to clean a Kafka queue after the consumer has consumed the messages? Thanks
Re: Sending Same Message to Two Topics on Same Broker Cluster
I'm not sure I understood your concern about invoking send() twice, once with each topic. Are you worried about the network overhead? Whether Kafka does this transparently or not, sending messages to different topics will carry some overhead. I think the design of the API is much more intuitive and cleaner if a message is sent to a topic partition. On Mon, Oct 20, 2014 at 9:17 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Neha, Yes, I understand that, but when transmitting a single message I cannot set a list of all topics, only a single one. So I will have to add the same message to the buffer with a different topic. If the Kafka protocol allowed adding multiple topics, the message would not have to be re-transmitted over the wire to be added to multiple topics. The ProducerRecord only allows one topic. http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/producer/ProducerRecord.html Thanks for your quick response and I appreciate your help. Thanks, Bhavesh On Mon, Oct 20, 2014 at 9:10 PM, Neha Narkhede neha.narkh...@gmail.com wrote: Not really. You need producers to send data to Kafka. On Mon, Oct 20, 2014 at 9:05 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Team, I would like to send a single message to multiple topics (two for now) without re-transmitting the message from the producer to the brokers. Is this possible? Neither the Scala nor the Java producer allows this. I do not have to do this all the time, only based on an application condition. Thanks in advance for your help! Thanks, Bhavesh
Re: Performance issues
I have a Java test that produces messages and then a consumer consumes them. Consumers are active all the time. There is 1 consumer for 1 producer. I am measuring the time between when the message is successfully written to the queue and when the consumer picks it up. On Tue, Oct 21, 2014 at 8:32 AM, Neha Narkhede neha.narkh...@gmail.com wrote: Can you give more information about the performance test? Which test? Which queue? How did you measure the dequeue latency? On Mon, Oct 20, 2014 at 5:09 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am running a performance test and from what I am seeing, messages are taking about 100ms to pop from the queue itself, hence making the test slow. I am looking for pointers on how I can troubleshoot this issue. There seems to be plenty of CPU and IO available. I am running 22 producers and 22 consumers in the same group.
Re: Clean Kafka Queue
You can use log.retention.hours or log.retention.bytes to prune the log; more info on those configs here: https://kafka.apache.org/08/configuration.html If you want to delete a message after the consumer has processed it, there is no API for that. -Harsha On Tue, Oct 21, 2014, at 08:00 AM, Eduardo Costa Alfaia wrote: [...]
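To make those configs concrete, a minimal server.properties sketch (the values below are arbitrary examples, not recommendations):

    # prune each partition's log by age and/or size
    log.retention.hours=24          # delete log segments older than 24 hours
    log.retention.bytes=1073741824  # or once a partition's log exceeds ~1 GB

Note that retention is enforced by the broker's periodic log cleanup, regardless of whether any consumer has read the data.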
Re: Sending Same Message to Two Topics on Same Broker Cluster
Hi Neha, All, What I am saying is that if the same byte[] or data has to go to two topics, then I have to call send() twice, and the same data has to be transferred over the wire twice (assuming the partitions for the two topics are on the same broker, this is not efficient). If the Kafka protocol allowed setting multiple topics and partitions per request, it would be great. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ProduceRequest ProducerRecord(java.lang.String topic, byte[] key, byte[] value) http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/producer/ProducerRecord.html Thanks, Bhavesh On Tue, Oct 21, 2014 at 8:26 AM, Neha Narkhede neha.narkh...@gmail.com wrote: [...]
Re: Clean Kafka Queue
The concept of a truncate-topic operation comes up a lot. I will add it as an item to https://issues.apache.org/jira/browse/KAFKA-1694 It is a scary feature though; it might be best to wait until authorizations are in place before we release it. With 0.8.2 you can delete topics, so at least you can start fresh more easily. That should work in the meantime. 0.8.2-beta should be out this week :) Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop On Tue, Oct 21, 2014 at 12:03 PM, Harsha ka...@harsha.io wrote: [...]
Re: Clean Kafka Queue
Ok guys, Thanks for the help. Regards On Oct 21, 2014, at 18:30, Joe Stein joe.st...@stealth.ly wrote: [...]
Re: Sending Same Message to Two Topics on Same Broker Cluster
Hey Bhavesh, This would only work if both topics happened to be on the same machine, which generally they wouldn't be. -Jay On Tue, Oct 21, 2014 at 9:14 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: [...]
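Since the thread settles on calling send() once per topic, here is a minimal sketch against the 0.8 Java producer API (the broker address and topic names are placeholders):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class TwoTopicSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");                // placeholder broker
            props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[] payloads
            Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));

            byte[] payload = "same message".getBytes();
            // The same bytes are enqueued once per topic, so they also cross
            // the wire once per topic: exactly the overhead Bhavesh describes.
            producer.send(new KeyedMessage<byte[], byte[]>("topicA", payload));
            producer.send(new KeyedMessage<byte[], byte[]>("topicB", payload));
            producer.close();
        }
    }

The batched variant send(List<KeyedMessage<K,V>>) can carry both topics in one request, but the payload bytes are still duplicated per topic inside that request.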
Re: Performance issues
This is the version I am using: kafka_2.10-0.8.1.1. I think this is a fairly recent version. On Tue, Oct 21, 2014 at 10:57 AM, Jay Kreps jay.kr...@gmail.com wrote: What version of Kafka is this? Can you try the same test against trunk? We fixed a couple of latency related bugs which may be the cause. -Jay On Tue, Oct 21, 2014 at 10:50 AM, Mohit Anchlia mohitanch...@gmail.com wrote: It's consistently close to 100ms, which makes me believe that there are some settings I might have to tweak; however, I am not sure how to confirm that assumption :) On Tue, Oct 21, 2014 at 8:53 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Sizing Cluster
Hi There, I have a question regarding sizing disk for kafka brokers. Let's say I have systems capable of providing 10TB of storage, and they act as Kafka brokers. If I were to deploy two of these nodes, and enable replication in Kafka, would I actually have 10TB available for my producers to write to? Is there any overhead I should be concerned with? I guess I am just wanting to make sure that there are not any major pitfalls in deploying a two-node cluster, versus say a 3-node cluster. Any advice or best-practices would be very helpful! Thanks in advance, -pete -- Pete Wright Systems Architect Rubicon Project pwri...@rubiconproject.com 310.309.9298
0.8.1.2
Hi All, Will version 0.8.1.2 happen? Shlomi
Re: Sizing Cluster
One thing that you have to keep in mind is that moving 10T between nodes takes a long time. If you have a node failure and you need to rebuild (resync) the data, your system is going to be vulnerable to a second node failure. You could mitigate this by using RAID. I think, generally speaking, 3-node clusters are better for production purposes. I. On Tue, Oct 21, 2014 at 11:12 AM, Pete Wright pwri...@rubiconproject.com wrote: [...] -- the sun shines for all
Re: Performance issues
There was a bug that could lead to the fetch request from the consumer hitting its timeout instead of being immediately triggered by the produce request. To see if you are affected by that, set your consumer max wait time to 1 ms and see if the latency drops to 1 ms (or, alternately, try with trunk and see if that fixes the problem). The reason I suspect this problem is that the default timeout in the Java consumer is 100ms. -Jay On Tue, Oct 21, 2014 at 11:06 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Performance issues
Is this a parameter I need to set on the Kafka server or on the client side? Also, can you help point out which one exactly is the consumer max wait time in this list? https://kafka.apache.org/08/configuration.html On Tue, Oct 21, 2014 at 11:35 AM, Jay Kreps jay.kr...@gmail.com wrote: [...]
frequent periods of ~1500 replicas not in sync
Hi. I've got a 5 node cluster running Kafka 0.8.1, with 4697 partitions (2 replicas each) across 564 topics. I'm sending it about 1% of our total messaging load now, and several times a day there is a period where 1~1500 partitions have one replica not in sync. Is this normal? If a consumer is reading from a replica that gets deemed not in sync, does it get redirected to the good replica? Is there a #partitions over which maintenance tasks become infeasible? Relevant config bits:

auto.leader.rebalance.enable=true
leader.imbalance.per.broker.percentage=20
leader.imbalance.check.interval.seconds=30
replica.lag.time.max.ms=1
replica.lag.max.messages=4000
num.replica.fetchers=4
replica.fetch.max.bytes=10485760

Not necessarily correlated to those periods, I see a lot of these errors in the logs:

[2014-10-20 21:23:26,999] 21963614 [ReplicaFetcherThread-3-1] ERROR kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-3-1], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 77423; ClientId: ReplicaFetcherThread-3-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: ...

And a few of these:

[2014-10-20 21:23:39,555] 3467527 [kafka-scheduler-2] ERROR kafka.utils.ZkUtils$ - Conditional update of path /brokers/topics/foo.bar/partitions/3/state with data {controller_epoch:11,leader:3,version:1,leader_epoch:109,isr:[3]} and expected version 197 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/foo.bar/partitions/3/state

And this one I assume is a client closing the connection non-gracefully, thus should probably be a warning, not an error?:

[2014-10-20 21:54:15,599] 23812214 [kafka-processor-9092-3] ERROR kafka.network.Processor - Closing socket for /10.31.0.224 because of error

-neil
Re: frequent periods of ~1500 replicas not in sync
Consumers always read from the leader replica, which is always in sync by definition. So you are good there. The concern would be if the leader crashes during this period. On Tue, Oct 21, 2014 at 2:56 PM, Neil Harkins nhark...@gmail.com wrote: [...]
Re: frequent periods of ~1500 replicas not in sync
Neil, what you are seeing could probably be KAFKA-1407 https://issues.apache.org/jira/browse/KAFKA-1407. On Tue, Oct 21, 2014 at 12:03 PM, Gwen Shapira gshap...@cloudera.com wrote: [...] -- Guozhang
Re: Performance issues
This is a consumer config: fetch.wait.max.ms On Tue, Oct 21, 2014 at 11:39 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...] -- Guozhang
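For anyone following along, a minimal sketch of wiring that config into the 0.8 high-level Java consumer (the ZooKeeper address and group name are placeholders, and 1 ms is only for testing Jay's hypothesis, not a production value):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class LowLatencyConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "latency-test");            // placeholder
            props.put("fetch.wait.max.ms", "1"); // max time a fetch blocks on the broker
            props.put("fetch.min.bytes", "1");   // return as soon as any data is available
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create message streams and consume as usual ...
            consumer.shutdown();
        }
    }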
Re: Sizing Cluster
Thanks Istvan - I think I understand what you are saying here - although I was under the impression that if I ensured each topic was replicated N+1 times, a two-node cluster would ensure each node has a copy of the entire contents of the message bus at any given time. I agree with your assessment that having 3 nodes is a more durable configuration, but was hoping others could explain how they calculate capacity and scaling on their storage subsystems. Cheers, -pete On 10/21/14 11:28, István wrote: [...] -- Pete Wright Systems Architect Rubicon Project pwri...@rubiconproject.com 310.309.9298
Re: How many partitions can a single machine handle in Kafka?
Xiaobin, This FAQ may give you some hints: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: [...] -- Guozhang
Re: Strange behavior during un-clean leader election
Bryan, Did you take down some brokers in your cluster while hitting KAFKA-1028? If yes, you may be hitting KAFKA-1647 also. Guozhang On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher bjb...@gmail.com wrote: Hi everyone, We run a 3-broker Kafka cluster using 0.8.1.1, with all topics having a replication factor of 3, meaning every broker has a replica of every partition. We recently ran into this issue (https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss within Kafka. We understand why it happened and have plans to try to ensure it doesn't happen again. The strange part was that the broker that was chosen for the unclean leader election seemed to drop all of its own data about the partition in the process, as our monitoring shows the broker offset was reset to 0 for a number of partitions. Following the broker's server logs in chronological order for a particular partition that saw data loss, I see this:

2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026
2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss.
2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6]
2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0.
2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00528024.index.deleted
2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6.

I'm not too worried about this since I'm hoping to move to Kafka 0.8.2 ASAP, but I was curious if anyone could explain this behavior. -Bryan -- Guozhang
Partition and Replica assignment for a Topic
I'd like to be able to see a little more detail for a topic. What is the best way to get this information?

Topic   Partition  Replica  Broker
topic1  1          1        3
topic1  1          2        4
topic1  1          3        1
topic1  2          1        1
topic1  2          2        3
topic1  2          3        2

I'd like to be able to create topic allocation dashboards, similar to the index allocation dashboards in the Elasticsearch plugin Marvel. Basically, translating index -> topic, shard -> partition, replica -> replica, node -> broker. -Jonathan
Re: Partition and Replica assignment for a Topic
Anything missing in the output of: kafka-topics.sh --describe --zookeeper localhost:2181 ? On Tue, Oct 21, 2014 at 4:29 PM, Jonathan Creasy jonathan.cre...@turn.com wrote: [...]
Re: Strange behavior during un-clean leader election
Yes, the cluster was to a degree restarted in a rolling fashion, but due to some other events that left the brokers rather confused, the ISR for a number of partitions became empty and a new controller was elected. KAFKA-1647 sounds exactly like the problem I encountered. Thank you. On Tue, Oct 21, 2014 at 3:28 PM, Guozhang Wang wangg...@gmail.com wrote: [...] -- Bryan
Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?
https://issues.apache.org/jira/browse/KAFKA-1647 sounds serious enough to include in 0.8.2-beta if possible.
Re: How many partitions can a single machine handle in Kafka?
As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. -Todd On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: [...]
Re: Performance issues
I set the property to 1 in the consumer config that is passed to createJavaConsumerConnector, but it didn't seem to help: props.put("fetch.wait.max.ms", fetchMaxWait); On Tue, Oct 21, 2014 at 1:21 PM, Guozhang Wang wangg...@gmail.com wrote: [...]
Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?
It doesn't look like a showstopper (all replicas for a partition going down is rare, and a bigger issue if it happens), but it is good for folks to know about it going in, definitely! In either case, I changed the fix version for that ticket to 0.8.2 so it shows up now; as a blocker for the final release, I think yes. I just sent a vote on the dev thread for 0.8.2-beta; feel free to comment/vote on that thread if folks feel differently about having KAFKA-1647 in the beta. To make sure we get the most out of it, we can then roll another RC once it is in. Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop On Tue, Oct 21, 2014 at 4:49 PM, Olson,Andrew aols...@cerner.com wrote: [...]
Re: Partition and Replica assignment for a Topic
Heh, I think I was mis-interpreting that output. Taking this output for example:

Topic:REPL-atl1-us  PartitionCount:256  ReplicationFactor:1  Configs:
    Topic: REPL-atl1-us  Partition: 0  Leader: 32  Replicas: 32  Isr: 32
    Topic: REPL-atl1-us  Partition: 1  Leader: 33  Replicas: 33  Isr: 33
    Topic: REPL-atl1-us  Partition: 2  Leader: 34  Replicas: 34  Isr: 34
    Topic: REPL-atl1-us  Partition: 3  Leader: 35  Replicas: 35  Isr: 35
    [...]

I read that to mean that partition 0 was primary on broker 32, it had 32 replicas (somewhere), and that there were 32 in-sync replicas. After you asked, I went and looked at the docs. I think it does indeed show me exactly what I'm looking for. Thanks! On 10/21/14, 3:32 PM, Gwen Shapira gshap...@cloudera.com wrote: [...]
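If the flat Topic/Partition/Replica/Broker table from the original question is still wanted, one rough way is to post-process the describe output. A sketch only, with whitespace-split field positions assumed from the 0.8.1 output shown above, so adjust to your version:

    kafka-topics.sh --describe --zookeeper localhost:2181 |
      awk '/Partition:/ { n = split($8, r, ","); for (i = 1; i <= n; i++) print $2, $4, i, r[i] }'

This prints one line per replica: topic, partition, replica ordinal, broker id.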
Re: Performance issues
Most of the consumer threads seem to be waiting:

ConsumerFetcherThread-groupA_ip-10-38-19-230-1413925671158-3cc3e22f-0-0 prio=10 tid=0x7f0aa84db800 nid=0x5be9 runnable [0x7f0a5a618000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked 0x9515bec0 (a sun.nio.ch.Util$2)
    - locked 0x9515bea8 (a java.util.Collections$UnmodifiableSet)
    - locked 0x95511d00 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:221)
    - locked 0x9515bd28 (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    - locked 0x95293828 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
    - locked 0x9515bcb0 (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:375)

On Tue, Oct 21, 2014 at 2:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: How many partitions can a single machine handle in Kafka?
On Tue, Oct 21, 2014 at 2:10 PM, Todd Palino tpal...@gmail.com wrote: As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. How many brokers? I'm curious: what kinds of problems would affect a single broker with a large number of partitions, but not affect the entire cluster with even more partitions?
Re: How to produce and consume events in 2 DCs?
I think it doesn't have to be two more clusters; it can be just two more topics. MirrorMaker can copy from the source topics in both regions into one aggregate topic. On Tue, Oct 21, 2014 at 1:54 AM, Erik van oosten e.vanoos...@grons.nl.invalid wrote: [...]
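A hypothetical sketch of that layout with stock MirrorMaker (which keeps topic names as-is): let each DC produce into its own topic, say events-dc1 and events-dc2 (invented names), mirror each DC's topic into the other DC's cluster, and have consumers subscribe to the whitelist regex 'events-.*' so they see both streams. The DC 1 side might look like:

    # consumer.config points at the remote (DC 2) cluster,
    # producer.config points at the local (DC 1) cluster
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config dc2.consumer.properties \
      --producer.config dc1.producer.properties \
      --whitelist 'events-dc2'

with the symmetric process running in DC 2. No renaming is needed because the per-DC topic names never collide.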
Re: taking broker down and returning it does not restore cluster state (nor rebalance)
To balance the leaders, you can run the tool described in http://kafka.apache.org/documentation.html#basic_ops_leader_balancing In the upcoming 0.8.2 release, we have fixed the auto leader balancing logic, so leaders will be balanced automatically. Thanks, Jun On Tue, Oct 21, 2014 at 12:19 AM, Shlomi Hazan shl...@viber.com wrote: [...]
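For reference, the tool behind that link is the preferred-replica election script shipped in the broker's bin/ directory; a minimal invocation (the ZooKeeper address is a placeholder) triggers an election for all partitions:

    bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181

An optional --path-to-json-file argument limits the election to an explicit list of partitions.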
Re: 0.8.1.2
We are voting on an 0.8.2 beta release right now. Thanks, Jun On Tue, Oct 21, 2014 at 11:17 AM, Shlomi Hazan shl...@viber.com wrote: [...]
Re: How many partitions can a single machine handle in Kafka?
Todd, Actually I'm wondering how Kafka handles so many partitions. With one partition there is at least one file on disk, and with 4000 partitions there will be at least 4000 files. When all these partitions receive write requests, how does Kafka keep the write operations on disk sequential (which is emphasized in the design document of Kafka) and make sure disk access stays efficient? Thank you for your reply. xiaobinshe 2014-10-22 5:10 GMT+08:00 Todd Palino tpal...@gmail.com: [...]
Re: Sizing Cluster
Hi Pete, Yes, you are right, both nodes have all of the data. I was just wondering what the scenario is for losing one node; in production it might not fly. If this is for testing only, you are good. Answering your question, I think the retention policy (log.retention.hours) is what controls disk utilization. Disk IO (the log.flush.* section) and network IO (num.network.threads, etc.) saturation you might want to measure during tests and spec based on that. Here is a link with examples for the full list of relevant settings, with more description: https://kafka.apache.org/08/ops.html. I guess the most important question is how many clients you want to support; you could work out how much space you need based on that, under a few assumptions. For more complete documentation refer to: https://kafka.apache.org/08/configuration.html Regards, Istvan On Tue, Oct 21, 2014 at 1:22 PM, Pete Wright pwri...@rubiconproject.com wrote: [...] -- the sun shines for all
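As a back-of-the-envelope sketch of the sizing question (every number below is an assumption for illustration, not a measurement), usable capacity is roughly:

    usable ~= (nodes x disk per node) / replication factor, minus headroom

So 2 nodes x 10 TB at replication factor 2 leaves about 10 TB of logical capacity, and a 3-node cluster at the same factor about 15 TB. Retention then has to fit inside that: at an assumed 10 MB/s aggregate ingest with 7 days of retention, you store about 10 MB/s x 86,400 s x 7 ~= 6 TB of data, or ~12 TB of raw disk at factor 2, which fits the 2 x 10 TB example with roughly 40% headroom left for resyncs and growth.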