How many partitions can a single machine handle in Kafka?
Hello everyone, I'm new to Kafka. I'm wondering: what is the maximum number of partitions a single machine can handle in Kafka? Is there a suggested number? Thanks. xiaobinshe
Re: taking broker down and returning it does not restore cluster state (nor rebalance)
Trying to reproduce this failed: after some fairly long minutes I noticed that the partition leaders regained balance again, and the only issue left is that the preferred replica was not balanced as it was before taking the broker down. Meaning, the output of the topic description shows broker 1 (out of 3) as preferred replica (first in ISR) in 66% of the cases instead of the expected 33%. On Mon, Oct 20, 2014 at 11:36 PM, Joel Koshy jjkosh...@gmail.com wrote: As Neha mentioned, with rep factor 2x, this shouldn't normally cause an issue. Taking the broker down will cause the leader to move to another replica; consumers and producers will rediscover the new leader; no rebalances should be triggered. When you bring the broker back up, unless you run a preferred replica leader re-election, the broker will remain a follower. Again, there will be no effect on the producers or consumers (i.e., no rebalances). If you can reproduce this easily, can you please send exact steps to reproduce and send over your consumer logs? Thanks, Joel On Mon, Oct 20, 2014 at 09:13:27PM +0300, Shlomi Hazan wrote: Yes I did. It is set to 2. On Oct 20, 2014 5:38 PM, Neha Narkhede neha.narkh...@gmail.com wrote: Did you ensure that your replication factor was set higher than 1? If so, things should recover automatically after adding the killed broker back into the cluster. On Mon, Oct 20, 2014 at 1:32 AM, Shlomi Hazan shl...@viber.com wrote: Hi, Running some tests on 0811 and wanted to see what happens when a broker is taken down with 'kill'. I bumped into the situation in the subject, where launching the broker again left it a bit out of the game as far as I could see using Stackdriver metrics. Trying to rebalance with 'verify consumer rebalance' returned an error, 'no owner for partition', for all partitions of that topic (128 partitions). Moreover, aside from the issue at hand, changing the group name to a non-existent group returned success. Taking both the consumers and producers down allowed the rebalance to return success... And the question is: how do you restore 100% state after taking down a broker? What is the best practice? What needs to be checked and what needs to be done? Shlomi
Re: How to produce and consume events in 2 DCs?
Thanks Neha, Unfortunately, the maintenance overhead of 2 more clusters is not acceptable to us. Would you accept a pull request on mirror maker that would rename topics on the fly? For example by accepting the parameter rename: --rename src1/dest1,src2/dest2 or, extended with RE support: --rename old_(.*)/new_\1 Kind regards, Erik. On 20 Oct 2014, at 16:43, Neha Narkhede neha.narkh...@gmail.com wrote: Another way to set up this kind of mirroring is by deploying 2 clusters in each DC - a local Kafka cluster and an aggregate Kafka cluster. The mirror maker copies data from both the DC's local clusters into the aggregate clusters. So if you want access to a topic with data from both DC's, you subscribe to the aggregate cluster. Thanks, Neha On Mon, Oct 20, 2014 at 7:07 AM, Erik van oosten e.vanoos...@grons.nl.invalid wrote: Hi, We have 2 data centers that produce events. Each DC has to process events from both DCs. I had the following in mind (ASCII diagram summarized): in each DC, local events are produced into a "Receiver topic"; mirroring copies both DCs' receiver topics into each DC's "Consumer topic", so each consumer topic holds the merged event stream; consumers in each DC read from their local consumer topic. As each DC has a single Kafka cluster, on each DC the receiver topic and consumer topic need to be on the same cluster. Unfortunately, mirror maker does not seem to support mirroring to a topic with another name. Is there another tool we could use? Or, is there another approach for producing and consuming from 2 DCs? Kind regards, Erik. -- Erik van Oosten http://www.day-to-day-stuff.blogspot.nl/
Clean Kafka Queue
Hi guys, Is there a way to clean a Kafka queue after the consumer has consumed the messages? Thanks
Re: Sending Same Message to Two Topics on Same Broker Cluster
I'm not sure I understood your concern about invoking send() twice, once with each topic. Are you worried about the network overhead? Whether Kafka does this transparently or not, sending messages to different topics will carry some overhead. I think the design of the API is much more intuitive and cleaner if a message is sent to a topic partition. On Mon, Oct 20, 2014 at 9:17 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Neha, Yes, I understand that, but when transmitting a single message I cannot set a list of all topics, only a single one. So I will have to add the same message to the buffer with a different topic. If the Kafka protocol allowed adding multiple topics, the message would not have to be re-transmitted over the wire to be added to multiple topics. The ProducerRecord only allows one topic. http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/producer/ProducerRecord.html Thanks for your quick response and I appreciate your help. Thanks, Bhavesh On Mon, Oct 20, 2014 at 9:10 PM, Neha Narkhede neha.narkh...@gmail.com wrote: Not really. You need producers to send data to Kafka. On Mon, Oct 20, 2014 at 9:05 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Team, I would like to send a single message to multiple topics (two for now) without re-transmitting the message from the producer to the brokers. Is this possible? Neither the Scala nor the Java producer allows this. I do not have to do this all the time, only based on an application condition. Thanks in advance for your help! Thanks, Bhavesh
Re: Performance issues
I have a Java test that produces messages and then a consumer consumes them. Consumers are active all the time. There is 1 consumer for 1 producer. I am measuring the time between when the message is successfully written to the queue and when the consumer picks it up. On Tue, Oct 21, 2014 at 8:32 AM, Neha Narkhede neha.narkh...@gmail.com wrote: Can you give more information about the performance test? Which test? Which queue? How did you measure the dequeue latency? On Mon, Oct 20, 2014 at 5:09 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am running a performance test and from what I am seeing, messages are taking about 100ms to pop from the queue itself, hence making the test slow. I am looking for pointers on how I can troubleshoot this issue. There seems to be plenty of CPU and IO available. I am running 22 producers and 22 consumers in the same group.
Re: Clean Kafka Queue
You can use log.retention.hours or log.retention.bytes to prune the log; more info on those configs here: https://kafka.apache.org/08/configuration.html If you want to delete a message after the consumer has processed it, there is no API for that. -Harsha On Tue, Oct 21, 2014, at 08:00 AM, Eduardo Costa Alfaia wrote: [...]
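To make those configs concrete, a minimal server.properties sketch (the values below are arbitrary examples, not recommendations):

    # prune each partition's log by age and/or size
    log.retention.hours=24          # delete log segments older than 24 hours
    log.retention.bytes=1073741824  # or once a partition's log exceeds ~1 GB

Note that retention is enforced by the broker's periodic log cleanup, regardless of whether any consumer has read the data.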
Re: Sending Same Message to Two Topics on Same Broker Cluster
Hi Neha, All, What I am saying is that if the same byte[] or data has to go to two topics, then I have to call send() twice, and the same data has to be transferred over the wire twice (assuming the partitions for the two topics are on the same broker, this is not efficient). If the Kafka protocol allowed setting multiple topics and partitions per request, it would be great. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ProduceRequest ProducerRecord(java.lang.String topic, byte[] key, byte[] value) http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/producer/ProducerRecord.html Thanks, Bhavesh On Tue, Oct 21, 2014 at 8:26 AM, Neha Narkhede neha.narkh...@gmail.com wrote: [...]
Re: Clean Kafka Queue
The concept of a truncate-topic operation comes up a lot. I will add it as an item to https://issues.apache.org/jira/browse/KAFKA-1694 It is a scary feature though; it might be best to wait until authorizations are in place before we release it. With 0.8.2 you can delete topics, so at least you can start fresh more easily. That should work in the meantime. 0.8.2-beta should be out this week :) Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop On Tue, Oct 21, 2014 at 12:03 PM, Harsha ka...@harsha.io wrote: [...]
Re: Clean Kafka Queue
Ok guys, Thanks for the help. Regards On Oct 21, 2014, at 18:30, Joe Stein joe.st...@stealth.ly wrote: [...]
Re: Sending Same Message to Two Topics on Same Broker Cluster
Hey Bhavesh, This would only work if both topics happened to be on the same machine, which generally they wouldn't be. -Jay On Tue, Oct 21, 2014 at 9:14 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: [...]
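Since the thread settles on calling send() once per topic, here is a minimal sketch against the 0.8 Java producer API (the broker address and topic names are placeholders):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class TwoTopicSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");                // placeholder broker
            props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[] payloads
            Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));

            byte[] payload = "same message".getBytes();
            // The same bytes are enqueued once per topic, so they also cross
            // the wire once per topic: exactly the overhead Bhavesh describes.
            producer.send(new KeyedMessage<byte[], byte[]>("topicA", payload));
            producer.send(new KeyedMessage<byte[], byte[]>("topicB", payload));
            producer.close();
        }
    }

The batched variant send(List<KeyedMessage<K,V>>) can carry both topics in one request, but the payload bytes are still duplicated per topic inside that request.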
Re: Performance issues
This is the version I am using: kafka_2.10-0.8.1.1. I think this is a fairly recent version. On Tue, Oct 21, 2014 at 10:57 AM, Jay Kreps jay.kr...@gmail.com wrote: What version of Kafka is this? Can you try the same test against trunk? We fixed a couple of latency related bugs which may be the cause. -Jay On Tue, Oct 21, 2014 at 10:50 AM, Mohit Anchlia mohitanch...@gmail.com wrote: It's consistently close to 100ms, which makes me believe that there are some settings I might have to tweak; however, I am not sure how to confirm that assumption :) On Tue, Oct 21, 2014 at 8:53 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Sizing Cluster
Hi There, I have a question regarding sizing disk for kafka brokers. Let's say I have systems capable of providing 10TB of storage, and they act as Kafka brokers. If I were to deploy two of these nodes, and enable replication in Kafka, would I actually have 10TB available for my producers to write to? Is there any overhead I should be concerned with? I guess I am just wanting to make sure that there are not any major pitfalls in deploying a two-node cluster, versus say a 3-node cluster. Any advice or best-practices would be very helpful! Thanks in advance, -pete -- Pete Wright Systems Architect Rubicon Project pwri...@rubiconproject.com 310.309.9298
0.8.1.2
Hi All, Will version 0.8.1.2 happen? Shlomi
Re: Sizing Cluster
One thing that you have to keep in mind is that moving 10T between nodes takes a long time. If you have a node failure and you need to rebuild (resync) the data, your system is going to be vulnerable to a second node failure. You could mitigate this by using RAID. I think, generally speaking, 3-node clusters are better for production purposes. I. On Tue, Oct 21, 2014 at 11:12 AM, Pete Wright pwri...@rubiconproject.com wrote: [...] -- the sun shines for all
Re: Performance issues
There was a bug that could lead to the fetch request from the consumer hitting its timeout instead of being immediately triggered by the produce request. To see if you are affected by that, set your consumer max wait time to 1 ms and see if the latency drops to 1 ms (or, alternately, try with trunk and see if that fixes the problem). The reason I suspect this problem is that the default timeout in the Java consumer is 100ms. -Jay On Tue, Oct 21, 2014 at 11:06 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Performance issues
Is this a parameter I need to set on the Kafka server or on the client side? Also, can you help point out which one exactly is the consumer max wait time in this list? https://kafka.apache.org/08/configuration.html On Tue, Oct 21, 2014 at 11:35 AM, Jay Kreps jay.kr...@gmail.com wrote: [...]
frequent periods of ~1500 replicas not in sync
Hi. I've got a 5 node cluster running Kafka 0.8.1, with 4697 partitions (2 replicas each) across 564 topics. I'm sending it about 1% of our total messaging load now, and several times a day there is a period where 1~1500 partitions have one replica not in sync. Is this normal? If a consumer is reading from a replica that gets deemed not in sync, does it get redirected to the good replica? Is there a #partitions over which maintenance tasks become infeasible? Relevant config bits:

auto.leader.rebalance.enable=true
leader.imbalance.per.broker.percentage=20
leader.imbalance.check.interval.seconds=30
replica.lag.time.max.ms=1
replica.lag.max.messages=4000
num.replica.fetchers=4
replica.fetch.max.bytes=10485760

Not necessarily correlated to those periods, I see a lot of these errors in the logs:

[2014-10-20 21:23:26,999] 21963614 [ReplicaFetcherThread-3-1] ERROR kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-3-1], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 77423; ClientId: ReplicaFetcherThread-3-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: ...

And a few of these:

[2014-10-20 21:23:39,555] 3467527 [kafka-scheduler-2] ERROR kafka.utils.ZkUtils$ - Conditional update of path /brokers/topics/foo.bar/partitions/3/state with data {controller_epoch:11,leader:3,version:1,leader_epoch:109,isr:[3]} and expected version 197 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/foo.bar/partitions/3/state

And this one I assume is a client closing the connection non-gracefully, thus should probably be a warning, not an error?:

[2014-10-20 21:54:15,599] 23812214 [kafka-processor-9092-3] ERROR kafka.network.Processor - Closing socket for /10.31.0.224 because of error

-neil
Re: frequent periods of ~1500 replicas not in sync
Consumers always read from the leader replica, which is always in sync by definition. So you are good there. The concern would be if the leader crashes during this period. On Tue, Oct 21, 2014 at 2:56 PM, Neil Harkins nhark...@gmail.com wrote: [...]
Re: frequent periods of ~1500 replicas not in sync
Neil, what you are seeing could probably be KAFKA-1407 https://issues.apache.org/jira/browse/KAFKA-1407. On Tue, Oct 21, 2014 at 12:03 PM, Gwen Shapira gshap...@cloudera.com wrote: [...] -- Guozhang
Re: Performance issues
This is a consumer config: fetch.wait.max.ms On Tue, Oct 21, 2014 at 11:39 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...] -- Guozhang
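For anyone following along, a minimal sketch of wiring that config into the 0.8 high-level Java consumer (the ZooKeeper address and group name are placeholders, and 1 ms is only for testing Jay's hypothesis, not a production value):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class LowLatencyConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "latency-test");            // placeholder
            props.put("fetch.wait.max.ms", "1"); // max time a fetch blocks on the broker
            props.put("fetch.min.bytes", "1");   // return as soon as any data is available
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create message streams and consume as usual ...
            consumer.shutdown();
        }
    }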
Re: Sizing Cluster
Thanks Istvan - I think I understand what you are saying here - although I was under the impression that if I ensured each topic was replicated N+1 times, a two-node cluster would ensure each node has a copy of the entire contents of the message bus at any given time. I agree with your assessment that having 3 nodes is a more durable configuration, but was hoping others could explain how they calculate capacity and scaling on their storage subsystems. Cheers, -pete On 10/21/14 11:28, István wrote: [...] -- Pete Wright Systems Architect Rubicon Project pwri...@rubiconproject.com 310.309.9298
Re: How many partitions can a single machine handle in Kafka?
Xiaobin, This FAQ may give you some hints: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: [...] -- Guozhang
Re: Strange behavior during un-clean leader election
Bryan, Did you take down some brokers in your cluster while hitting KAFKA-1028? If yes, you may be hitting KAFKA-1647 also. Guozhang On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher bjb...@gmail.com wrote: Hi everyone, We run a 3-broker Kafka cluster using 0.8.1.1, with all topics having a replication factor of 3, meaning every broker has a replica of every partition. We recently ran into this issue (https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss within Kafka. We understand why it happened and have plans to try to ensure it doesn't happen again. The strange part was that the broker that was chosen for the unclean leader election seemed to drop all of its own data about the partition in the process, as our monitoring shows the broker offset was reset to 0 for a number of partitions. Following the broker's server logs in chronological order for a particular partition that saw data loss, I see this:

2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026
2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss.
2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6]
2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0.
2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00528024.index.deleted
2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6.

I'm not too worried about this since I'm hoping to move to Kafka 0.8.2 ASAP, but I was curious if anyone could explain this behavior. -Bryan -- Guozhang
Partition and Replica assignment for a Topic
I'd like to be able to see a little more detail for a topic. What is the best way to get this information?

Topic   Partition  Replica  Broker
topic1  1          1        3
topic1  1          2        4
topic1  1          3        1
topic1  2          1        1
topic1  2          2        3
topic1  2          3        2

I'd like to be able to create topic allocation dashboards, similar to the index allocation dashboards in the Elasticsearch plugin Marvel. Basically, translating index -> topic, shard -> partition, replica -> replica, node -> broker. -Jonathan
Re: Partition and Replica assignment for a Topic
Anything missing in the output of: kafka-topics.sh --describe --zookeeper localhost:2181 ? On Tue, Oct 21, 2014 at 4:29 PM, Jonathan Creasy jonathan.cre...@turn.com wrote: [...]
Re: Strange behavior during un-clean leader election
Yes, the cluster was to a degree restarted in a rolling fashion, but due to some other events that left the brokers rather confused, the ISR for a number of partitions became empty and a new controller was elected. KAFKA-1647 sounds exactly like the problem I encountered. Thank you. On Tue, Oct 21, 2014 at 3:28 PM, Guozhang Wang wangg...@gmail.com wrote: [...] -- Bryan
Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?
https://issues.apache.org/jira/browse/KAFKA-1647 sounds serious enough to include in 0.8.2-beta if possible.
Re: How many partitions can a single machine handle in Kafka?
As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. -Todd On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: [...]
Re: Performance issues
I set the property to 1 in the consumer config that is passed to createJavaConsumerConnector, but it didn't seem to help: props.put("fetch.wait.max.ms", fetchMaxWait); On Tue, Oct 21, 2014 at 1:21 PM, Guozhang Wang wangg...@gmail.com wrote: [...]
Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?
It doesn't look like a showstopper (all replicas for a partition going down is rare, and a bigger issue if it happens), but it is good for folks to know about it going in, definitely! In either case, I changed the fix version for that ticket to 0.8.2 so it shows up now; as a blocker for the final release, I think yes. I just sent a vote on the dev thread for 0.8.2-beta; feel free to comment/vote on that thread if folks feel differently about having KAFKA-1647 in the beta. To make sure we get the most out of it, we can then roll another RC once it is in. Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop On Tue, Oct 21, 2014 at 4:49 PM, Olson,Andrew aols...@cerner.com wrote: [...]
Re: Partition and Replica assignment for a Topic
Heh, I think I was mis-interpreting that output. Taking this output for example:

Topic:REPL-atl1-us  PartitionCount:256  ReplicationFactor:1  Configs:
    Topic: REPL-atl1-us  Partition: 0  Leader: 32  Replicas: 32  Isr: 32
    Topic: REPL-atl1-us  Partition: 1  Leader: 33  Replicas: 33  Isr: 33
    Topic: REPL-atl1-us  Partition: 2  Leader: 34  Replicas: 34  Isr: 34
    Topic: REPL-atl1-us  Partition: 3  Leader: 35  Replicas: 35  Isr: 35
    [...]

I read that to mean that partition 0 was primary on broker 32, it had 32 replicas (somewhere), and that there were 32 in-sync replicas. After you asked, I went and looked at the docs. I think it does indeed show me exactly what I'm looking for. Thanks! On 10/21/14, 3:32 PM, Gwen Shapira gshap...@cloudera.com wrote: [...]
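If the flat Topic/Partition/Replica/Broker table from the original question is still wanted, one rough way is to post-process the describe output. A sketch only, with whitespace-split field positions assumed from the 0.8.1 output shown above, so adjust to your version:

    kafka-topics.sh --describe --zookeeper localhost:2181 |
      awk '/Partition:/ { n = split($8, r, ","); for (i = 1; i <= n; i++) print $2, $4, i, r[i] }'

This prints one line per replica: topic, partition, replica ordinal, broker id.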
Re: Performance issues
Most of the consumer threads seem to be waiting:

ConsumerFetcherThread-groupA_ip-10-38-19-230-1413925671158-3cc3e22f-0-0 prio=10 tid=0x7f0aa84db800 nid=0x5be9 runnable [0x7f0a5a618000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked 0x9515bec0 (a sun.nio.ch.Util$2)
    - locked 0x9515bea8 (a java.util.Collections$UnmodifiableSet)
    - locked 0x95511d00 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:221)
    - locked 0x9515bd28 (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    - locked 0x95293828 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
    - locked 0x9515bcb0 (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:375)

On Tue, Oct 21, 2014 at 2:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: How many partitions can a single machine handle in Kafka?
On Tue, Oct 21, 2014 at 2:10 PM, Todd Palino tpal...@gmail.com wrote: As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. How many brokers? I'm curious: what kinds of problems would affect a single broker with a large number of partitions, but not affect the entire cluster with even more partitions?
Re: How to produce and consume events in 2 DCs?
I think it doesn't have to be two more clusters; it can be just two more topics. MirrorMaker can copy from the source topics in both regions into one aggregate topic. On Tue, Oct 21, 2014 at 1:54 AM, Erik van oosten e.vanoos...@grons.nl.invalid wrote: [...]
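A hypothetical sketch of that layout with stock MirrorMaker (which keeps topic names as-is): let each DC produce into its own topic, say events-dc1 and events-dc2 (invented names), mirror each DC's topic into the other DC's cluster, and have consumers subscribe to the whitelist regex 'events-.*' so they see both streams. The DC 1 side might look like:

    # consumer.config points at the remote (DC 2) cluster,
    # producer.config points at the local (DC 1) cluster
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config dc2.consumer.properties \
      --producer.config dc1.producer.properties \
      --whitelist 'events-dc2'

with the symmetric process running in DC 2. No renaming is needed because the per-DC topic names never collide.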
Re: taking broker down and returning it does not restore cluster state (nor rebalance)
To balance the leaders, you can run the tool described in http://kafka.apache.org/documentation.html#basic_ops_leader_balancing In the upcoming 0.8.2 release, we have fixed the auto leader balancing logic, so leaders will be balanced automatically. Thanks, Jun On Tue, Oct 21, 2014 at 12:19 AM, Shlomi Hazan shl...@viber.com wrote: [...]
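For reference, the tool behind that link is the preferred-replica election script shipped in the broker's bin/ directory; a minimal invocation (the ZooKeeper address is a placeholder) triggers an election for all partitions:

    bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181

An optional --path-to-json-file argument limits the election to an explicit list of partitions.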
Re: 0.8.1.2
We are voting on an 0.8.2 beta release right now. Thanks, Jun On Tue, Oct 21, 2014 at 11:17 AM, Shlomi Hazan shl...@viber.com wrote: [...]
Re: How many partitions can a single machine handle in Kafka?
Todd, Actually I'm wondering how Kafka handles so many partitions. With one partition there is at least one file on disk, and with 4000 partitions there will be at least 4000 files. When all these partitions receive write requests, how does Kafka keep the write operations on disk sequential (which is emphasized in the design document of Kafka) and make sure disk access stays efficient? Thank you for your reply. xiaobinshe 2014-10-22 5:10 GMT+08:00 Todd Palino tpal...@gmail.com: [...]
Re: Sizing Cluster
Hi Pete, Yes, you are right, both nodes have all of the data. I was just wondering what the scenario is for losing one node; in production it might not fly. If this is for testing only, you are good. Answering your question, I think the retention policy (log.retention.hours) is what controls disk utilization. Disk IO (the log.flush.* section) and network IO (num.network.threads, etc.) saturation you might want to measure during tests and spec based on that. Here is a link with examples for the full list of relevant settings, with more description: https://kafka.apache.org/08/ops.html. I guess the most important question is how many clients you want to support; you could work out how much space you need based on that, under a few assumptions. For more complete documentation refer to: https://kafka.apache.org/08/configuration.html Regards, Istvan On Tue, Oct 21, 2014 at 1:22 PM, Pete Wright pwri...@rubiconproject.com wrote: [...] -- the sun shines for all
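As a back-of-the-envelope sketch of the sizing question (every number below is an assumption for illustration, not a measurement), usable capacity is roughly:

    usable ~= (nodes x disk per node) / replication factor, minus headroom

So 2 nodes x 10 TB at replication factor 2 leaves about 10 TB of logical capacity, and a 3-node cluster at the same factor about 15 TB. Retention then has to fit inside that: at an assumed 10 MB/s aggregate ingest with 7 days of retention, you store about 10 MB/s x 86,400 s x 7 ~= 6 TB of data, or ~12 TB of raw disk at factor 2, which fits the 2 x 10 TB example with roughly 40% headroom left for resyncs and growth.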