[jira] [Created] (KAFKA-2014) Chaos Monkey / Failure Inducer for Kafka
Mayuresh Gharat created KAFKA-2014:
-----------------------------------
Summary: Chaos Monkey / Failure Inducer for Kafka
Key: KAFKA-2014
URL: https://issues.apache.org/jira/browse/KAFKA-2014
Project: Kafka
Issue Type: Task
Reporter: Mayuresh Gharat
Assignee: Mayuresh Gharat

Implement a Chaos Monkey / failure inducer for Kafka that will help us catch any shortcomings in the test environment before going to production.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector
[ https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357167#comment-14357167 ]

Jiangjie Qin commented on KAFKA-1716:
-------------------------------------

[~ashwin.jayaprakash] It should work when you call it from another thread. From the thread stack trace it looks like the consumer fetcher thread is blocked in processPartitionData(). Can you provide a full thread dump so we can investigate why this happens?

hang during shutdown of ZookeeperConsumerConnector
--------------------------------------------------
Key: KAFKA-1716
URL: https://issues.apache.org/jira/browse/KAFKA-1716
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.8.1.1
Reporter: Sean Fay
Assignee: Neha Narkhede
Attachments: kafka-shutdown-stuck.log

It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to wedge in the case that some consumer fetcher threads receive messages during the shutdown process.

Shutdown thread:
{code}
-- Parking to wait for: java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
    at jrockit/vm/Locks.park0(J)V(Native Method)
    at jrockit/vm/Locks.park(Locks.java:2230)
    at sun/misc/Unsafe.park(ZJ)V(Native Method)
    at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
    at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
    at kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
    at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
    at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
    at scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
    at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
    at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
    at scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
    ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
    at kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
    at kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
    at kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167)
{code}

ConsumerFetcherThread:
{code}
-- Parking to wait for: java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
    at jrockit/vm/Locks.park0(J)V(Native Method)
    at jrockit/vm/Locks.park(Locks.java:2230)
    at sun/misc/Unsafe.park(ZJ)V(Native Method)
    at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
    at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
    at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
    at kafka/consumer/ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
    at kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:130)
    at kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:111)
    at scala/collection/immutable/HashMap$HashMap1.foreach(HashMap.scala:224)
    at scala/collection/immutable/HashMap$HashTrieMap.foreach(HashMap.scala:403)
    at kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:111)
    at kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
    at kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
    at kafka/utils/Utils$.inLock(Utils.scala:538)
{code}
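The two stack traces above describe a classic producer-consumer wedge: the fetcher thread is parked in a bounded queue's put() (via PartitionTopicInfo.enqueue) because nobody is draining the chunk queue anymore, while the shutdown thread holds a lock and waits on the fetcher's shutdown latch, so neither side can make progress. A minimal, self-contained sketch of the same pattern (hypothetical names, not Kafka's code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownDeadlockSketch {
    // Returns true only if the "fetcher" manages to finish within timeoutMs.
    // With a full queue and no consumer, it never does -- the same wedge as
    // the report above, where shutdown() awaits a latch the blocked thread
    // can never count down.
    public static boolean shutdownWedges(long timeoutMs) throws InterruptedException {
        BlockingQueue<String> chunks = new ArrayBlockingQueue<>(1);
        chunks.put("chunk-0"); // queue is now full
        CountDownLatch fetcherDone = new CountDownLatch(1);

        Thread fetcher = new Thread(() -> {
            try {
                chunks.put("chunk-1"); // blocks forever: nobody is draining
                fetcherDone.countDown();
            } catch (InterruptedException e) {
                // an interrupt (or draining the queue) is what unwedges it
            }
        });
        fetcher.start();

        // The "shutdown thread" waits on the fetcher's latch -> deadlock
        boolean finished = fetcherDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        fetcher.interrupt(); // clean up the demo thread
        fetcher.join();
        return finished;
    }
}
```

Draining the queue (or interrupting the blocked put) before awaiting the latch is what breaks the cycle; the sketch returns false because the latch is never counted down within the timeout.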
[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false
[ https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357509#comment-14357509 ]

Jiangjie Qin commented on KAFKA-2011:
-------------------------------------

Yes, 6000 ms sounds too short. Can you try bumping it up to 60? 200MB/s sounds OK for 5 brokers.

Rebalance with auto.leader.rebalance.enable=false
-------------------------------------------------
Key: KAFKA-2011
URL: https://issues.apache.org/jira/browse/KAFKA-2011
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.2.0
Environment: 5 hosts of below config: x86_64 32-bit, 64-bit Little Endian, 24 GenuineIntel CPUs Model 44 1600.000MHz, RAM 189 GB, GNU/Linux
Reporter: K Zakee
Priority: Blocker
Attachments: controller-logs-1.zip, controller-logs-2.zip

Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as below:

auto.leader.rebalance.enable=false
controlled.shutdown.enable=true
controlled.shutdown.max.retries=1
controlled.shutdown.retry.backoff.ms=5000
default.replication.factor=3
log.cleaner.enable=true
log.cleaner.threads=5
log.cleanup.policy=delete
log.flush.scheduler.interval.ms=3000
log.retention.minutes=1440
log.segment.bytes=1073741824
message.max.bytes=100
num.io.threads=14
num.network.threads=14
num.partitions=10
queued.max.requests=500
num.replica.fetchers=4
replica.fetch.max.bytes=1048576
replica.fetch.min.bytes=51200
replica.lag.max.messages=5000
replica.lag.time.max.ms=3
replica.fetch.wait.max.ms=1000
fetch.purgatory.purge.interval.requests=5000
producer.purgatory.purge.interval.requests=5000
delete.topic.enable=true

Logs show rebalance happening well up to 24 hours after the start.

[2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
[2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
...
[2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
...
[2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
...
[2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
...
[2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 31706: Patch for KAFKA-1997
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---

LGTM overall, except one potential issue: consumer metrics colliding.

core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123549
    Should we add the index as a suffix to the consumer id in the consumerConfig to distinguish different connector instances? Otherwise the consumer metrics would collide.

core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123551
    Add a comment like "Creating just one stream per connector instance".

core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123553
    Leave a TODO comment that after KAFKA-1660 this will be changed accordingly.

core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123554
    Why not just import java.util.List?

- Guozhang Wang

On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---
(Updated March 11, 2015, 1:31 a.m.)

Review request for kafka.

Bugs: KAFKA-1997
    https://issues.apache.org/jira/browse/KAFKA-1997

Repository: kafka

Description
---
Addressed Guozhang's comments.
Changed the exit behavior on send failure because close(0) is not ready yet. Will submit a follow-up patch after KAFKA-1660 is checked in.
Expanded imports from _ and * to full class paths.
Incorporated Joel's comments.
Addressed Joel's comments.
Addressed Guozhang's comments.
Diffs
-----
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f
core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad
core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4
core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e
core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e
core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58
core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0

Diff: https://reviews.apache.org/r/31706/diff/

Testing
-------

Thanks,
Jiangjie Qin
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357640#comment-14357640 ]

Guozhang Wang commented on KAFKA-1501:
--------------------------------------

Thanks for the patch [~ewencp]. I have started to look at your patch but want to get some clarifications: TestUtils.choosePorts should return a random port number each time it gets called, but since the socket is then closed, the same port number could be returned on the next call before it gets used, hence causing the conflict?

transient unit tests failures due to port already in use
--------------------------------------------------------
Key: KAFKA-1501
URL: https://issues.apache.org/jira/browse/KAFKA-1501
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Jun Rao
Assignee: Guozhang Wang
Labels: newbie
Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, test-84.out, test-87.out, test-91.out, test-92.out

Saw the following transient failures.

kafka.api.ProducerFailureHandlingTest testTooLargeRecordWithAckOne FAILED
    kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
        at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
        at kafka.network.Acceptor.<init>(SocketServer.scala:141)
        at kafka.network.SocketServer.startup(SocketServer.scala:68)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
        at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
        at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357660#comment-14357660 ]

Sriharsha Chintalapani commented on KAFKA-1684:
-----------------------------------------------

Created reviewboard https://reviews.apache.org/r/31958/diff/ against branch origin/trunk

Implement TLS/SSL authentication
--------------------------------
Key: KAFKA-1684
URL: https://issues.apache.org/jira/browse/KAFKA-1684
Project: Kafka
Issue Type: Sub-task
Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Ivan Lyutov
Attachments: KAFKA-1684.patch, KAFKA-1684.patch

Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing.

SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests.

We will need to take care with the FetchRequests as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
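As a rough illustration of the wrap/unwrap point in the description above: SSLEngine produces and consumes ciphertext through user-space ByteBuffers sized from the SSL session, which is exactly why the zero-copy FileChannel.transferTo path cannot be kept for SSL connections. A sketch under JDK defaults (class and method names here are illustrative, not the patch's Channel/SSLChannel code):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.nio.ByteBuffer;

public class SslBufferSketch {
    // Server-side engine, as an acceptor thread would create it.
    public static SSLEngine serverEngine() throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false); // acceptor (server) side
        return engine;
    }

    // wrap()/unwrap() need buffers sized from the SSL session; all data
    // passes through these user-space buffers, so transferTo-style
    // zero-copy is out for encrypted connections.
    public static ByteBuffer netBuffer(SSLEngine engine) {
        return ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
    }

    public static ByteBuffer appBuffer(SSLEngine engine) {
        return ByteBuffer.allocate(engine.getSession().getApplicationBufferSize());
    }
}
```

The same buffer-passing shape is what makes a uniform wrap/unwrap interface workable for SASL as well, as the description suggests.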
[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357529#comment-14357529 ]

Guozhang Wang commented on KAFKA-1910:
--------------------------------------

[~junrao] Could you take a look at the patch so that I can check in the fix if it LGTY?

Refactor KafkaConsumer
----------------------
Key: KAFKA-1910
URL: https://issues.apache.org/jira/browse/KAFKA-1910
Project: Kafka
Issue Type: Sub-task
Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch

KafkaConsumer now contains all the logic on the consumer side, making it a very large class file; better to refactor it into multiple layers on top of KafkaClient.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings
On March 11, 2015, 8:39 p.m., Jiangjie Qin wrote:
> Works for me, but I still see the following line: "there were 12 feature warning(s); re-run with -feature for details". I tried ./gradlew jar -feature, but it seems it does not work at all. If this is a related issue, can we solve it in this patch? Otherwise we can create another ticket to address it.

Thanks for looking at this Jiangjie. In order for feature warnings to display, you have to add scalaCompileOptions.additionalParameters = ["-feature"] below line 131 in build.gradle at the root of the project. I posted the verbose output from enabling the flag on the ticket: https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14356280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14356280.

Bottom line is: I don't think we can fix these feature warnings until Kafka stops supporting Scala 2.9. I get these build errors when trying to build a 2.9 jar with the language imports brought in:

/Users/blake/src/kafka/core/src/main/scala/kafka/javaapi/Implicits.scala:19: object language is not a member of package scala
import scala.language.implicitConversions

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Does that make sense?

- Blake

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---

On March 11, 2015, 4:35 a.m., Blake Smith wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/
---
(Updated March 11, 2015, 4:35 a.m.)

Review request for kafka.

Bugs: KAFKA-1054
    https://issues.apache.org/jira/browse/KAFKA-1054

Repository: kafka

Description
---
Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) up to date with the latest trunk and fixed 2 more lingering compiler warnings.
Diffs
-----
core/src/main/scala/kafka/admin/AdminUtils.scala b700110
core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc
core/src/main/scala/kafka/server/KafkaServer.scala dddef93
core/src/main/scala/kafka/utils/Utils.scala 738c1af
core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4

Diff: https://reviews.apache.org/r/31925/diff/

Testing
-------
Ran full test suite.

Thanks,
Blake Smith
Re: Can I be added as a contributor?
Grant, I added your perms for Confluence. Grayson, I couldn't find a confluence account for you so couldn't give you perms. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke ghe...@cloudera.com wrote: Thanks Joe. I added a Confluence account. On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein joe.st...@stealth.ly wrote: Grant, I added you. Grayson and Grant, you should both also please setup Confluence accounts and we can grant perms to Confluence also too for your username. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke ghe...@cloudera.com wrote: I am also starting to work with the Kafka codebase with plans to contribute more significantly in the near future. Could I also be added to the contributor list so that I can assign myself tickets? Thank you, Grant On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang wangg...@gmail.com wrote: Added grayson.c...@gmail.com to the list. On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao gc...@linkedin.com.invalid wrote: Hello Kafka devs, I'm working on the ops side of Kafka at LinkedIn (embedded SRE on the Kafka team) and would like to start familiarizing myself with the codebase with a view to eventually making substantial contributions. Could you please add me as a contributor to the Kafka JIRA so that I can assign myself a newbie ticket? Thanks! Grayson -- Grayson Chao Data Infra Streaming SRE -- -- Guozhang -- Grant Henke Solutions Consultant | Cloudera ghe...@cloudera.com | 920-980-8979 twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke -- Grant Henke Solutions Consultant | Cloudera ghe...@cloudera.com | 920-980-8979 twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke
Re: Can I be added as a contributor?
Hi, Sorry to pile on :) but could I be added as a contributor and to confluence as well? I am brocknoland on JIRA and brockn at gmail on confluence. Cheers! Brock On Wed, Mar 11, 2015 at 1:44 PM, Joe Stein joe.st...@stealth.ly wrote: Grant, I added your perms for Confluence. Grayson, I couldn't find a confluence account for you so couldn't give you perms. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke ghe...@cloudera.com wrote: Thanks Joe. I added a Confluence account. On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein joe.st...@stealth.ly wrote: Grant, I added you. Grayson and Grant, you should both also please setup Confluence accounts and we can grant perms to Confluence also too for your username. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke ghe...@cloudera.com wrote: I am also starting to work with the Kafka codebase with plans to contribute more significantly in the near future. Could I also be added to the contributor list so that I can assign myself tickets? Thank you, Grant On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang wangg...@gmail.com wrote: Added grayson.c...@gmail.com to the list. On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao gc...@linkedin.com.invalid wrote: Hello Kafka devs, I'm working on the ops side of Kafka at LinkedIn (embedded SRE on the Kafka team) and would like to start familiarizing myself with the codebase with a view to eventually making substantial contributions. Could you please add me as a contributor to the Kafka JIRA so that I can assign myself a newbie ticket? Thanks! 
Grayson -- Grayson Chao Data Infra Streaming SRE -- -- Guozhang -- Grant Henke Solutions Consultant | Cloudera ghe...@cloudera.com | 920-980-8979 twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke -- Grant Henke Solutions Consultant | Cloudera ghe...@cloudera.com | 920-980-8979 twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke
Re: Review Request 31927: Patch for KAFKA-1461
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/#review76130
---

Yes, we need to set the backoff for the consumerFetcherThread as well. We can just use refreshLeaderBackoffMs.

core/src/main/scala/kafka/server/AbstractFetcherThread.scala
https://reviews.apache.org/r/31927/#comment123577
    Instead of adding the new backOffWaitLatch, we can probably just wait on the existing partitionMapCond. This way, if there is a change in partitions, the thread can wake up immediately. We probably can improve shutdown() a bit to wake up the thread waiting on the partitionMapCond. To do that, we can do the following steps in shutdown():
    1. initiateShutdown()
    2. partitionMapCond.signalAll()
    3. awaitShutdown()
    4. simpleConsumer.close()

- Jun Rao

On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/
---
(Updated March 11, 2015, 5:41 p.m.)

Review request for kafka.

Bugs: KAFKA-1461
    https://issues.apache.org/jira/browse/KAFKA-1461

Repository: kafka

Description
---
KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.

Diffs
-----
core/src/main/scala/kafka/server/AbstractFetcherThread.scala 8c281d4668f92eff95a4a5df3c03c4b5b20e7095
core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973
core/src/main/scala/kafka/server/ReplicaFetcherThread.scala d6d14fbd167fb8b085729cda5158898b1a3ee314
core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317
core/src/test/scala/unit/kafka/utils/TestUtils.scala 52c79201af7c60f9b44a0aaa09cdf968d89a7b87

Diff: https://reviews.apache.org/r/31927/diff/

Testing
-------

Thanks,
Sriharsha Chintalapani
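The shutdown sequence Jun suggests (initiate shutdown, signal the condition the fetcher is backing off on, then await it) can be sketched in plain Java. The field names mirror the review comment, but the class itself is a hypothetical stand-in, not Kafka's AbstractFetcherThread:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BackoffFetcher extends Thread {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition partitionMapCond = lock.newCondition();
    private volatile boolean running = true;
    private final CountDownLatch done = new CountDownLatch(1);

    @Override public void run() {
        lock.lock();
        try {
            while (running) {
                // back off between fetch attempts; shutdown() cuts this short
                partitionMapCond.await(60, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ignored) {
            // treated as a shutdown signal in this sketch
        } finally {
            lock.unlock();
            done.countDown();
        }
    }

    public void shutdown() throws InterruptedException {
        running = false;                      // initiateShutdown()
        lock.lock();
        try {
            partitionMapCond.signalAll();     // wake the backing-off thread
        } finally {
            lock.unlock();
        }
        done.await();                         // awaitShutdown()
    }
}
```

Because shutdown() signals the very condition the thread waits on, shutdown completes immediately instead of waiting out the remainder of the backoff interval, which is the benefit over a dedicated backOffWaitLatch.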
Review Request 31958: Patch for KAFKA-1684
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31958/
---

Review request for kafka.

Bugs: KAFKA-1684
    https://issues.apache.org/jira/browse/KAFKA-1684

Repository: kafka

Description
---
KAFKA-1684. Implement TLS/SSL authentication.

Diffs
-----
core/src/main/scala/kafka/network/Channel.scala PRE-CREATION
core/src/main/scala/kafka/network/SocketServer.scala 76ce41aed6e04ac5ba88395c4d5008aca17f9a73
core/src/main/scala/kafka/network/ssl/SSLChannel.scala PRE-CREATION
core/src/main/scala/kafka/network/ssl/SSLConnectionConfig.scala PRE-CREATION
core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973
core/src/main/scala/kafka/server/KafkaServer.scala dddef938fabae157ed8644536eb1a2f329fb42b7
core/src/main/scala/kafka/utils/SSLAuthUtils.scala PRE-CREATION
core/src/test/scala/unit/kafka/network/SocketServerTest.scala 0af23abf146d99e3d6cf31e5d6b95a9e63318ddb
core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317
core/src/test/scala/unit/kafka/utils/TestSSLUtils.scala PRE-CREATION

Diff: https://reviews.apache.org/r/31958/diff/

Testing
-------

Thanks,
Sriharsha Chintalapani
Re: Review Request 31706: Patch for KAFKA-1997
On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
> core/src/main/scala/kafka/tools/MirrorMaker.scala, line 195
> https://reviews.apache.org/r/31706/diff/6-7/?file=890015#file890015line195
>
> Should we add the index as a suffix to the consumer id in the consumerConfig to distinguish different connector instances? Otherwise the consumer metrics would collide.

Good point!

On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
> core/src/main/scala/kafka/tools/MirrorMaker.scala, lines 477-481
> https://reviews.apache.org/r/31706/diff/6-7/?file=890015#file890015line477
>
> Why not just import java.util.List?

Because List is a native type for Scala; even after we import java.util.List, we still need util.List to avoid the collision.

- Jiangjie

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---

On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---
(Updated March 11, 2015, 1:31 a.m.)

Review request for kafka.

Bugs: KAFKA-1997
    https://issues.apache.org/jira/browse/KAFKA-1997

Repository: kafka

Description
---
Addressed Guozhang's comments.
Changed the exit behavior on send failure because close(0) is not ready yet. Will submit a follow-up patch after KAFKA-1660 is checked in.
Expanded imports from _ and * to full class paths.
Incorporated Joel's comments.
Addressed Joel's comments.
Addressed Guozhang's comments.
Diffs
-----
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f
core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad
core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4
core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e
core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e
core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58
core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0

Diff: https://reviews.apache.org/r/31706/diff/

Testing
-------

Thanks,
Jiangjie Qin
Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---

Works for me, but I still see the following line: "there were 12 feature warning(s); re-run with -feature for details". I tried ./gradlew jar -feature, but it seems it does not work at all. If this is a related issue, can we solve it in this patch? Otherwise we can create another ticket to address it.

- Jiangjie Qin

On March 11, 2015, 4:35 a.m., Blake Smith wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/
---
(Updated March 11, 2015, 4:35 a.m.)

Review request for kafka.

Bugs: KAFKA-1054
    https://issues.apache.org/jira/browse/KAFKA-1054

Repository: kafka

Description
---
Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) up to date with the latest trunk and fixed 2 more lingering compiler warnings.

Diffs
-----
core/src/main/scala/kafka/admin/AdminUtils.scala b700110
core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc
core/src/main/scala/kafka/server/KafkaServer.scala dddef93
core/src/main/scala/kafka/utils/Utils.scala 738c1af
core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4

Diff: https://reviews.apache.org/r/31925/diff/

Testing
-------
Ran full test suite.

Thanks,
Blake Smith
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357673#comment-14357673 ]

Ewen Cheslack-Postava commented on KAFKA-1501:
----------------------------------------------

[~guozhang] Yes, that's correct, so the only way to completely avoid the problem is to allow the kernel to assign the port automatically. I haven't checked, but I also wouldn't be surprised if the kernel actually saves recently freed ports to use for a fast path, which could explain why this happens more frequently than you might think it would given the fairly large range used for ephemeral ports.

transient unit tests failures due to port already in use
--------------------------------------------------------
Key: KAFKA-1501
URL: https://issues.apache.org/jira/browse/KAFKA-1501
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Jun Rao
Assignee: Guozhang Wang
Labels: newbie
Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, test-84.out, test-87.out, test-91.out, test-92.out

Saw the following transient failures.

kafka.api.ProducerFailureHandlingTest testTooLargeRecordWithAckOne FAILED
    kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
        at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
        at kafka.network.Acceptor.<init>(SocketServer.scala:141)
        at kafka.network.SocketServer.startup(SocketServer.scala:68)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
        at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
        at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
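The direction both comments converge on fits in a few lines: bind to port 0 so the kernel picks a currently free port, and read the port back only from the socket that is actually bound (a plain-JDK sketch, not the TestUtils code under review):

```java
import java.net.ServerSocket;

public class KernelAssignedPort {
    // Let the kernel assign an ephemeral port instead of choosing a number
    // ahead of time and re-binding it later (which races with other tests).
    public static int bindAndGetPort() throws Exception {
        try (ServerSocket socket = new ServerSocket(0)) { // 0 = "any free port"
            return socket.getLocalPort();                 // the port actually bound
        }
    }
}
```

Note the caveat implicit in Guozhang's question: closing this probe socket and handing the number to a server that binds later reintroduces exactly the race; the race only disappears when the server under test itself binds to port 0.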
[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani updated KAFKA-1684:
------------------------------------------
Attachment: KAFKA-1684.patch

Implement TLS/SSL authentication
--------------------------------
Key: KAFKA-1684
URL: https://issues.apache.org/jira/browse/KAFKA-1684
Project: Kafka
Issue Type: Sub-task
Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Ivan Lyutov
Attachments: KAFKA-1684.patch, KAFKA-1684.patch

Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing.

SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests.

We will need to take care with the FetchRequests as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357668#comment-14357668 ]

Jun Rao commented on KAFKA-1461:
--------------------------------

[~sriharsha] and [~guozhang], thinking about this a bit more. There are really two types of states that we need to manage in AbstractFetcherThread. The first one is the connection state, i.e., if a connection breaks, we want to back off the reconnection. The second one is the partition state, i.e., if a partition hits an exception, we want to back off that particular partition a bit. The first one is what [~sriharsha]'s current RB is addressing. How about we complete this first, since it affects the performance of the unit tests? Once that's committed, we can address the second one, which is in [~sriharsha]'s initial patch.

Replica fetcher thread does not implement any back-off behavior
---------------------------------------------------------------
Key: KAFKA-1461
URL: https://issues.apache.org/jira/browse/KAFKA-1461
Project: Kafka
Issue Type: Improvement
Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
Labels: newbie++
Fix For: 0.8.3
Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch

The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception, leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
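The two kinds of state Jun distinguishes can be pictured as two independent backoff clocks: one for the connection as a whole, one per partition. A minimal sketch (hypothetical class, not either patch):

```java
import java.util.HashMap;
import java.util.Map;

public class FetcherBackoff {
    private final long backoffMs;
    private long connectionRetryAt = 0;                                   // connection-level clock
    private final Map<Integer, Long> partitionRetryAt = new HashMap<>();  // per-partition clocks

    public FetcherBackoff(long backoffMs) {
        this.backoffMs = backoffMs;
    }

    // A broken connection backs off the whole fetcher.
    public void onConnectionError(long nowMs) {
        connectionRetryAt = nowMs + backoffMs;
    }

    // A partition-level exception backs off only that partition.
    public void onPartitionError(int partition, long nowMs) {
        partitionRetryAt.put(partition, nowMs + backoffMs);
    }

    // A partition is fetchable only when neither clock is still backing off.
    public boolean fetchable(int partition, long nowMs) {
        return nowMs >= connectionRetryAt
            && nowMs >= partitionRetryAt.getOrDefault(partition, 0L);
    }
}
```

Keeping the clocks separate is what lets the two fixes land independently, as the comment proposes: the connection clock alone already stops the tight reconnect loop that slows the unit tests.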
[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions
[ https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-2006: - Priority: Major (was: Blocker) switch the broker server over to the new java protocol definitions -- Key: KAFKA-2006 URL: https://issues.apache.org/jira/browse/KAFKA-2006 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Andrii Biletskyi Fix For: 0.8.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false
[ https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357643#comment-14357643 ] K Zakee commented on KAFKA-2011: Did you mean 600 secs (10 mins)? Rebalance with auto.leader.rebalance.enable=false -- Key: KAFKA-2011 URL: https://issues.apache.org/jira/browse/KAFKA-2011 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2.0 Environment: 5 Hosts of below config: x86_64 32-bit, 64-bit Little Endian 24 GenuineIntel CPUs Model 44 1600.000MHz RAM 189 GB GNU/Linux Reporter: K Zakee Priority: Blocker Attachments: controller-logs-1.zip, controller-logs-2.zip Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as below: auto.leader.rebalance.enable=false controlled.shutdown.enable=true controlled.shutdown.max.retries=1 controlled.shutdown.retry.backoff.ms=5000 default.replication.factor=3 log.cleaner.enable=true log.cleaner.threads=5 log.cleanup.policy=delete log.flush.scheduler.interval.ms=3000 log.retention.minutes=1440 log.segment.bytes=1073741824 message.max.bytes=100 num.io.threads=14 num.network.threads=14 num.partitions=10 queued.max.requests=500 num.replica.fetchers=4 replica.fetch.max.bytes=1048576 replica.fetch.min.bytes=51200 replica.lag.max.messages=5000 replica.lag.time.max.ms=3 replica.fetch.wait.max.ms=1000 fetch.purgatory.purge.interval.requests=5000 producer.purgatory.purge.interval.requests=5000 delete.topic.enable=true Logs show rebalance happening well up to 24 hours after the start. [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController) … [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) ... 
[2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) ... [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController) ... [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2003) Add upgrade tests
[ https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357325#comment-14357325 ] Abhishek Nigam commented on KAFKA-2003: --- Hi Gwen, I did not realize my patch got uploaded to RB but the link was not attached to the jira. I just added it in the comments section of 1888. https://reviews.apache.org/r/30809/ Add upgrade tests - Key: KAFKA-2003 URL: https://issues.apache.org/jira/browse/KAFKA-2003 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Ashish K Singh To test protocol changes, compatibility and upgrade process, we need a good way to test different versions of the product together and to test the end-to-end upgrade process. For example, for an 0.8.2 to 0.8.3 test we want to check: * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers? * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time? * Can 0.8.2 clients run against a cluster of 0.8.3 brokers? There are probably more questions. But an automated framework that can test those and report results will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357263#comment-14357263 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 31927: Patch for KAFKA-1461
On March 11, 2015, 5:06 p.m., Jun Rao wrote: core/src/main/scala/kafka/server/KafkaConfig.scala, line 555 https://reviews.apache.org/r/31927/diff/1/?file=891131#file891131line555 The rest of the timeouts are of type Int. It may be useful to move everything to Long in the future, but that can be done in a separate jira. Thanks for the review. Please check the latest patch. Do we want to add another config for ConsumerFetcherThread? With this patch it doesn't pass any timeout value to AbstractFetcherThread. - Sriharsha --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31927/#review76075 --- On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31927/ --- (Updated March 11, 2015, 5:41 p.m.) Review request for kafka. Bugs: KAFKA-1461 https://issues.apache.org/jira/browse/KAFKA-1461 Repository: kafka Description --- KAFKA-1461. Replica fetcher thread does not implement any back-off behavior. Diffs - core/src/main/scala/kafka/server/AbstractFetcherThread.scala 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973 core/src/main/scala/kafka/server/ReplicaFetcherThread.scala d6d14fbd167fb8b085729cda5158898b1a3ee314 core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317 core/src/test/scala/unit/kafka/utils/TestUtils.scala 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 Diff: https://reviews.apache.org/r/31927/diff/ Testing --- Thanks, Sriharsha Chintalapani
[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions
[ https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-2006: - Assignee: (was: Andrii Biletskyi) switch the broker server over to the new java protocol definitions -- Key: KAFKA-2006 URL: https://issues.apache.org/jira/browse/KAFKA-2006 Project: Kafka Issue Type: Bug Reporter: Joe Stein Fix For: 0.8.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false
[ https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357426#comment-14357426 ] K Zakee commented on KAFKA-2011: 1) The ZK timeout occurred when the controller migration happened. The current ZK timeout setting on the brokers is the default value of 6000 ms. Is increasing the timeout recommended? 2) As for the data loads, we did have high producer loads of up to 200 MB/s for a stretch of hours, gradually reducing to 150 and then 130. But given the 1 Gbps NIC on each of the five brokers and 3 zookeepers, do you think these data loads would be a problem? 3) Do you think changing any of the configuration settings provided above might help? Rebalance with auto.leader.rebalance.enable=false -- Key: KAFKA-2011 URL: https://issues.apache.org/jira/browse/KAFKA-2011 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2.0 Environment: 5 Hosts of below config: x86_64 32-bit, 64-bit Little Endian 24 GenuineIntel CPUs Model 44 1600.000MHz RAM 189 GB GNU/Linux Reporter: K Zakee Priority: Blocker Attachments: controller-logs-1.zip, controller-logs-2.zip Started with clean cluster 0.8.2 with 5 brokers. 
Setting the properties as below: auto.leader.rebalance.enable=false controlled.shutdown.enable=true controlled.shutdown.max.retries=1 controlled.shutdown.retry.backoff.ms=5000 default.replication.factor=3 log.cleaner.enable=true log.cleaner.threads=5 log.cleanup.policy=delete log.flush.scheduler.interval.ms=3000 log.retention.minutes=1440 log.segment.bytes=1073741824 message.max.bytes=100 num.io.threads=14 num.network.threads=14 num.partitions=10 queued.max.requests=500 num.replica.fetchers=4 replica.fetch.max.bytes=1048576 replica.fetch.min.bytes=51200 replica.lag.max.messages=5000 replica.lag.time.max.ms=3 replica.fetch.wait.max.ms=1000 fetch.purgatory.purge.interval.requests=5000 producer.purgatory.purge.interval.requests=5000 delete.topic.enable=true Logs show rebalance happening well up to 24 hours after the start. [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController) … [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) ... [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController) ... [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController) ... [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing
Sorry for not catching up on this thread earlier; I wanted to do this before the KIP got its updates so we could discuss if need be and not waste more time re-writing/re-working things that folks have issues with. I captured all the comments so far here with responses. So fair assignment by count (taking into account the current partition count of each broker) is very good. However, it's worth noting that all partitions are not created equal. We have actually been performing more rebalance work based on the partition size on disk, as given equal retention of all topics, the size on disk is a better indicator of the amount of traffic a partition gets, both in terms of storage and network traffic. Overall, this seems to be a better balance. Agreed, though this is out of scope (imho) for what the motivations for the KIP were. The motivations section is blank (that is on me), but honestly it is because we did all the development, went back and forth with Neha on the testing, and then had to back it all into the KIP process... It's a time/resource/scheduling issue, and I hope to update this soon on the KIP... all of this is in the JIRA and code patch, so it's not like it is not there, just maybe not in the place where folks are looking, since we changed where folks should look. Initial cut at Motivations: the --generate is not used by a lot of folks because they don't trust it. Issues such as giving different results sometimes when you run it. Also other feedback from the community that it does not account for specific use cases like adding new brokers and removing brokers (which is where that patch started https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it after review into just --rebalance https://issues.apache.org/jira/browse/KAFKA-1792). The use case for adding and removing brokers is one that happens in AWS with auto scaling. There are other reasons for this too of course. 
The goal originally was to make what folks are already coding today (with the output of …) available in the project for the community. Based on the discussion in the JIRA with Neha, we all agreed that making it a fair rebalance would fulfill both use cases. In addition to this, I think there is very much a need to have Kafka be rack-aware. That is, to be able to assure that for a given cluster, you never assign all replicas for a given partition in the same rack. This would allow us to guard against maintenance or power failures that affect a full rack of systems (or a given switch). Agreed, though I think this is out of scope for this change and something we can also do in the future. There is more that we have to figure out for rack awareness, specifically how we know what rack a broker is on. I really, really (really) worry that when we keep trying to put too much into a single change, the discussions go into rabbit holes, and good, important features (that are community driven) that could get out there get bogged down with different use cases and scope creep. So, I think rack awareness is its own KIP that has two parts: setting the broker rack and rebalancing for that. That feature doesn't invalidate the need for --rebalance but can be built on top of it. I think it would make sense to implement the reassignment logic as a pluggable component. That way it would be easy to select a scheme when performing a reassignment (count, size, rack aware). Configuring a default scheme for a cluster would allow the brokers to create new topics and partitions in compliance with the requested policy. I don't agree with this, because right now you get back the current state of the partitions, so you can (today) write whatever logic you want (with the information that is there). With --rebalance you also get that back. 
Moving forward we can maybe expose more information so that folks can write whatever different logic they want (like partition number, location (label string for rack), size, throughput average, etc.)... but again, that to me is a different KIP entirely from the motivations of this one. If eventually we want to make it pluggable then we should have a KIP and discussion around it; I just don't see how it relates to the original motivations of the change. Is it possible to describe the proposed partition reassignment algorithm in more detail in the KIP? In fact, it would be really easy to understand if we had some concrete examples comparing partition assignment with the old algorithm and the new. Sure, it is in the JIRA linked to the KIP too https://issues.apache.org/jira/browse/KAFKA-1792 and documented in comments in the patch as requested. Let me know if this is what you are looking for, and we can simply update the KIP with this information or give more detail on specifically what you think might be missing. Would we want to support some kind of automated reassignment of existing partitions? (Personally, no. I want to trigger that manually because it is a very disk and
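For illustration, "fair assignment by count" as discussed in this thread can be sketched as greedy least-loaded placement: each unassigned partition goes to the broker currently holding the fewest partitions. This is a hypothetical sketch of the idea, not the actual KAFKA-1792 algorithm:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of assignment fair by partition count. Real rebalancing
// would also weigh replica placement, size on disk, and (eventually) racks.
public class FairAssignSketch {
    public static Map<Integer, List<Integer>> assign(List<Integer> partitions,
                                                     List<Integer> brokers) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int b : brokers)
            assignment.put(b, new ArrayList<>());
        for (int p : partitions) {
            // Pick the least-loaded broker; ties keep the earliest broker id.
            int target = brokers.get(0);
            for (int b : brokers)
                if (assignment.get(b).size() < assignment.get(target).size())
                    target = b;
            assignment.get(target).add(p);
        }
        return assignment;
    }
}
```

Swapping the count comparison for a size-on-disk comparison gives the size-based balance mentioned earlier in the thread, which is one argument for eventually making the scheme pluggable.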
Re: Review Request 31830: Patch for KAFKA-2009
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31830/ --- (Updated March 11, 2015, 6:26 p.m.) Review request for kafka. Bugs: KAFKA-2009 https://issues.apache.org/jira/browse/KAFKA-2009 Repository: kafka Description (updated) --- Because the send callback can be fired inside producer.send() as well, the unacked offset needs to be added to the unacked offsets list before calling producer.send() Diffs (updated) - core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e Diff: https://reviews.apache.org/r/31830/diff/ Testing --- Thanks, Jiangjie Qin
[jira] [Commented] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker
[ https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357351#comment-14357351 ] Jiangjie Qin commented on KAFKA-2009: - Just submitted another small fix patch. I just realized that I still need to add the unacked offset into the unacked offsets list before calling producer.send(), because the callback can also be fired in producer.send(). The reason I added the unacked offset after calling producer.send() was that I was worried that if something went wrong in producer.send(), the offset would never be removed; but since we exit on any exception in the producer thread, it should not cause further problems, as the entire mirror maker will exit. Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker Key: KAFKA-2009 URL: https://issues.apache.org/jira/browse/KAFKA-2009 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch This ticket is to fix the mirror maker problem on current trunk which is introduced in KAFKA-1650. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
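The ordering issue described above boils down to: producer.send() may invoke the ack callback synchronously, so the offset must be registered in the unacked set before the call, or the removal can race ahead of the insertion. A minimal sketch of the pattern, with hypothetical names standing in for the MirrorMaker logic:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of the add-before-send ordering fix: the offset is inserted into the
// unacked set *before* send(), because the ack callback (which removes it) can
// fire synchronously inside send(). Adding it after would leave a stale entry.
public class UnackedOffsets {
    private final Set<Long> unacked = new ConcurrentSkipListSet<>();

    // producerSend stands in for producer.send(record, callback).
    public void send(long offset, Runnable producerSend) {
        unacked.add(offset);   // must happen first
        producerSend.run();    // the callback may remove the offset immediately
    }

    // Called from the producer's ack callback.
    public void onAck(long offset) {
        unacked.remove(offset);
    }

    public int size() {
        return unacked.size();
    }
}
```

With add-after-send, a synchronous callback would call onAck before the offset was ever added, leaving it stuck in the set forever.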
Re: Review Request 31927: Patch for KAFKA-1461
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31927/ --- (Updated March 11, 2015, 5:41 p.m.) Review request for kafka. Bugs: KAFKA-1461 https://issues.apache.org/jira/browse/KAFKA-1461 Repository: kafka Description --- KAFKA-1461. Replica fetcher thread does not implement any back-off behavior. Diffs (updated) - core/src/main/scala/kafka/server/AbstractFetcherThread.scala 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973 core/src/main/scala/kafka/server/ReplicaFetcherThread.scala d6d14fbd167fb8b085729cda5158898b1a3ee314 core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317 core/src/test/scala/unit/kafka/utils/TestUtils.scala 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 Diff: https://reviews.apache.org/r/31927/diff/ Testing --- Thanks, Sriharsha Chintalapani
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-11_10:41:26.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357366#comment-14357366 ] Guozhang Wang commented on KAFKA-1910: -- Got some problems with RB, uploading the patch here for a quick review: {code} diff --git a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java index e972efb..436f9b2 100644 --- a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java +++ b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java @@ -129,7 +129,7 @@ public final class Coordinator { // process the response JoinGroupResponse response = new JoinGroupResponse(resp.responseBody()); -// TODO: needs to handle disconnects and errors +// TODO: needs to handle disconnects and errors, should not just throw exceptions Errors.forCode(response.errorCode()).maybeThrow(); this.consumerId = response.consumerId(); diff --git a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java index 27c78b8..8b71fba 100644 --- a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java +++ b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java @@ -231,11 +231,12 @@ public class Fetcher<K, V> { log.debug("Fetched offset {} for partition {}", offset, topicPartition); return offset; } else if (errorCode == Errors.NOT_LEADER_FOR_PARTITION.code() -|| errorCode == Errors.LEADER_NOT_AVAILABLE.code()) { +|| errorCode == Errors.UNKNOWN_TOPIC_OR_PARTITION.code()) { log.warn("Attempt to fetch offsets for partition {} failed due to obsolete leadership information, retrying.", topicPartition); awaitMetadataUpdate(); } else { +// TODO: we should not just throw exceptions but should handle and log it. 
Errors.forCode(errorCode).maybeThrow(); } } diff --git a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java index af704f3..f706086 100644 --- a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java +++ b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java @@ -45,7 +45,9 @@ public class ListOffsetResponse extends AbstractRequestResponse { /** * Possible error code: * - * TODO + * UNKNOWN_TOPIC_OR_PARTITION (3) + * NOT_LEADER_FOR_PARTITION (6) + * UNKNOWN (-1) */ private static final String OFFSETS_KEY_NAME = "offsets"; diff --git a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala index fed37e3..8eae1ab 100644 --- a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala +++ b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala @@ -260,8 +260,10 @@ class ConsumerTest extends IntegrationTestHarness with Logging { var iter: Int = 0 override def doWork(): Unit = { - killRandomBroker() + info("Killed broker %d".format(killRandomBroker())) + Thread.sleep(500) restartDeadBrokers() + info("Restarted all brokers") iter += 1 if (iter == numIters) {code} Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file, better re-factoring it to have multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker
[ https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-2009: Attachment: KAFKA-2009_2015-03-11_11:26:57.patch Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker Key: KAFKA-2009 URL: https://issues.apache.org/jira/browse/KAFKA-2009 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch This ticket is to fix the mirror maker problem on current trunk which is introduced in KAFKA-1650. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 31830: Patch for KAFKA-2009
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31830/#review76145 --- Ship it! Ship It! - Onur Karaman On March 11, 2015, 6:26 p.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31830/ --- (Updated March 11, 2015, 6:26 p.m.) Review request for kafka. Bugs: KAFKA-2009 https://issues.apache.org/jira/browse/KAFKA-2009 Repository: kafka Description --- Because the send callback can be fired inside producer.send() as well, the unacked offset needs to be added to the unacked offsets list before calling producer.send() Diffs - core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e Diff: https://reviews.apache.org/r/31830/diff/ Testing --- Thanks, Jiangjie Qin
[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357725#comment-14357725 ] Jun Rao commented on KAFKA-1910: Are the changes in ConsumerTest needed? The extra logging and sleep seem to be just for debugging. Other than that. +1. Ran the tests locally 20 times and they all pass. Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file, better re-factoring it to have multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31925/#review76149 --- Ship it! Makes sense. Not a committer but looks good to me :) Unit tests passed and compilation warnings went away. - Jiangjie Qin On March 11, 2015, 4:35 a.m., Blake Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31925/ --- (Updated March 11, 2015, 4:35 a.m.) Review request for kafka. Bugs: KAFKA-1054 https://issues.apache.org/jira/browse/KAFKA-1054 Repository: kafka Description --- Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) up to date with the latest trunk and fixed 2 more lingering compiler warnings. Diffs - core/src/main/scala/kafka/admin/AdminUtils.scala b700110 core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc core/src/main/scala/kafka/server/KafkaServer.scala dddef93 core/src/main/scala/kafka/utils/Utils.scala 738c1af core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 Diff: https://reviews.apache.org/r/31925/diff/ Testing --- Ran full test suite. Thanks, Blake Smith
Re: Review Request 30809: Patch for KAFKA-1888
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76157 --- This looks like a very good start. I think the framework is flexible enough to allow us to add a variety of upgrade tests. I'm looking forward to it. I have a few comments, but mostly I'm still confused about how this will be used. Perhaps more comments or even a README are in order. You wrote that we invoke test.sh dir1 dir2; what should each directory contain? Just the Kafka jars of different versions, or an entire installation (including bin/ and conf/)? Which of the directories should be the newer and which the older? Does it matter? Which version of the clients will be used? Perhaps a more descriptive name than test.sh would help too. I'm guessing we'll have a whole collection of these test scripts soon. Gwen build.gradle https://reviews.apache.org/r/30809/#comment123636 This should probably be a test dependency (if needed at all). Packaging Guava will be a pain, since so many systems use different versions of Guava and they are all incompatible. core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment123635 Do we really want to do this? We are using joptsimple for a bunch of other tools. It is easier to read and maintain, with nice error messages, a help screen, etc. system_test/broker_upgrade/bin/kafka-run-class.sh https://reviews.apache.org/r/30809/#comment123638 Why did we decide to duplicate this entire file? - Gwen Shapira On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. 
Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357794#comment-14357794 ] Guozhang Wang commented on KAFKA-1910: -- Thanks Jun, incorporated the comments and commit to trunk. Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file, better re-factoring it to have multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-11_18:17:51.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357927#comment-14357927 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
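The tight retry loop described in KAFKA-1461 is the textbook case for exponential back-off between failed fetch attempts. A minimal, self-contained sketch of the idea follows; all names here are illustrative and none of this is the actual AbstractFetcherThread code:

```java
// Illustrative back-off policy for a fetcher-style retry loop:
// instead of spinning on connection-refused errors, double the wait
// after each failure up to a cap, and reset on success.
public class BackoffFetcher {
    private final long initialBackoffMs;
    private final long maxBackoffMs;
    private long currentBackoffMs;

    public BackoffFetcher(long initialBackoffMs, long maxBackoffMs) {
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.currentBackoffMs = initialBackoffMs;
    }

    // How long to sleep before the next attempt after a failure.
    public long nextBackoffMs() {
        long backoff = currentBackoffMs;
        currentBackoffMs = Math.min(currentBackoffMs * 2, maxBackoffMs);
        return backoff;
    }

    // Call once a fetch succeeds so the next failure starts small again.
    public void reset() {
        currentBackoffMs = initialBackoffMs;
    }
}
```

The actual patch under review wires a (fixed, configurable) back-off into the fetcher thread; the exponential variant above is just one common shape of the same fix.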
Re: Review Request 31927: Patch for KAFKA-1461
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31927/ --- (Updated March 12, 2015, 1:17 a.m.) Review request for kafka. Bugs: KAFKA-1461 https://issues.apache.org/jira/browse/KAFKA-1461 Repository: kafka Description --- KAFKA-1461. Replica fetcher thread does not implement any back-off behavior. Diffs (updated) - core/src/main/scala/kafka/consumer/ConsumerFetcherThread.scala ee6139c901082358382c5ef892281386bf6fc91b core/src/main/scala/kafka/server/AbstractFetcherThread.scala 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973 core/src/main/scala/kafka/server/ReplicaFetcherThread.scala d6d14fbd167fb8b085729cda5158898b1a3ee314 core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317 core/src/test/scala/unit/kafka/utils/TestUtils.scala 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 Diff: https://reviews.apache.org/r/31927/diff/ Testing --- Thanks, Sriharsha Chintalapani
Re: Review Request 31706: Patch for KAFKA-1997
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ --- (Updated March 11, 2015, 10:20 p.m.) Review request for kafka. Bugs: KAFKA-1997 https://issues.apache.org/jira/browse/KAFKA-1997 Repository: kafka Description (updated) --- Addressed Guozhang's comments. Changed the exit behavior on send failure because close(0) is not ready yet. Will submit followup patch after KAFKA-1660 is checked in. Expanded imports from _ and * to full class path Incorporated Joel's comments. Addressed Joel's comments. Addressed Guozhang's comments. Incorporated Guozhang's comments. Diffs (updated) - clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4 core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0 Diff: https://reviews.apache.org/r/31706/diff/ Testing --- Thanks, Jiangjie Qin
[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357714#comment-14357714 ] Jiangjie Qin commented on KAFKA-1997: - Updated reviewboard https://reviews.apache.org/r/31706/diff/ against branch origin/trunk Refactor Mirror Maker - Key: KAFKA-1997 URL: https://issues.apache.org/jira/browse/KAFKA-1997 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch Refactor mirror maker based on KIP-3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183 This is essentially a sync approach, can we use a callback to do this? Abhishek Nigam wrote: This is intentional. We want to make sure the event has successfully reached the brokers. This enables us to form a reasonable expectation of what the consumer should expect. The callback should be able to make sure everything goes well, otherwise you can choose to stop sending or do whatever you want. One big issue about this approach is that you will only send a single message out for each batch, and that would be very slow, especially if you don't set the linger time to something very small, or even zero. Ideally the test case should work with different producer settings, I would say at least ack=1 and ack=-1, also with different batch size and linger time. Sending a single message out for each batch does not look like a very useful test case. On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184 When a send fails, should we at least log the sequence number? Abhishek Nigam wrote: I log the exception and the logger gives me the timestamp in the logs. Maybe I am missing something. Can you explain the rationale for why we would want to log the sequence number on the producer side when a send fails? Suppose someone is reading the log because something went wrong, wouldn't it be much faster to see how many and which messages are lost if you have the sequence numbers logged? For example, with sequence numbers, you can see the producer saying that messages 1, 2, 3 were sent successfully while message 4 failed. And the consumer would say, I expect to see 1, 2, 3 but I just got 1, 3. 2 is lost. 
In your current log, what you can see is just a message wasn't sent successfully, and one message was not consumed as expected. It's more complicated to debug, right? - Jiangjie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76173 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
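The sequence-number argument above can be made concrete with a hypothetical gap detector for such a validation harness (all names here are invented for illustration, not from the patch under review): given the sequence numbers actually consumed, it reports exactly which expected messages went missing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for a producer/consumer validation test:
// report which sequence numbers in [firstExpected, lastExpected]
// never showed up on the consumer side.
public class SequenceChecker {
    public static List<Long> missing(long firstExpected, long lastExpected, List<Long> seen) {
        Set<Long> seenSet = new HashSet<>(seen);
        List<Long> gaps = new ArrayList<>();
        for (long seq = firstExpected; seq <= lastExpected; seq++) {
            if (!seenSet.contains(seq)) {
                gaps.add(seq);  // e.g. expected 1,2,3 but saw 1,3 -> reports [2]
            }
        }
        return gaps;
    }
}
```

With per-message sequence numbers logged on both sides, the debugging session Jiangjie describes becomes a single diff instead of timestamp correlation.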
Build failed in Jenkins: KafkaPreCommit #36
See https://builds.apache.org/job/KafkaPreCommit/36/changes Changes: [wangguoz] KAFKA-1910 Follow-up again; fix ListOffsetResponse handling for the expected error codes -- [...truncated 1643 lines...] at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40) kafka.integration.PrimitiveApiTest testPipelinedProduceRequests FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:40) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44) at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:40) at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33) at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40) kafka.integration.AutoOffsetResetTest testResetToEarliestWhenOffsetTooHigh FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44) at kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46) kafka.integration.AutoOffsetResetTest testResetToEarliestWhenOffsetTooLow FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44) at kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46) kafka.integration.AutoOffsetResetTest testResetToLatestWhenOffsetTooHigh FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32) at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44) at kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46) kafka.integration.AutoOffsetResetTest testResetToLatestWhenOffsetTooLow FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at
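The repeated `java.net.BindException: Address already in use` failures in this build are characteristic of a test harness binding a hard-coded ZooKeeper/broker port while another run still holds it. A common remedy, sketched here independently of the actual Kafka test utilities, is to ask the OS for a free ephemeral port by binding to port 0:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: let the OS pick a free port so concurrent test runs
// don't collide on a fixed port number.
public class PortChooser {
    public static int freePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {  // port 0 = OS assigns one
            return socket.getLocalPort();
        }
    }
}
```

There is an inherent race (the port could be grabbed between release and reuse), but in practice this eliminates most "Address already in use" flakiness in CI.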
Re: Review Request 30809: Patch for KAFKA-1888
On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183 This is essentially a sync approach, can we use a callback to do this? This is intentional. We want to make sure the event has successfully reached the brokers. This enables us to form a reasonable expectation of what the consumer should expect. On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184 When a send fails, should we at least log the sequence number? I log the exception and the logger gives me the timestamp in the logs. Maybe I am missing something. Can you explain the rationale for why we would want to log the sequence number on the producer side when a send fails? On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 321 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line321 Similar to producer, can we log the expected sequence number and the seq we actually saw? Sure, in the cases where there is a mismatch I could do that. On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 386 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line386 Can we use KafkaThread here? I will take a look at that. - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76173 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. 
Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
Jenkins build is back to normal : Kafka-trunk #423
See https://builds.apache.org/job/Kafka-trunk/423/changes
Re: Review Request 30809: Patch for KAFKA-1888
On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: This looks like a very good start. I think the framework is flexible enough to allow us to add a variety of upgrade tests. I'm looking forward to it. I have a few comments, but mostly I'm still confused on how this will be used. Perhaps more comments or even a readme is in order. You wrote that we invoke test.sh dir1 dir2; what should each directory contain? Just the kafka jars of different versions? Or an entire installation (including bin/ and conf/)? Which one of the directories should be the newer and which is older? Does it matter? Which version of the clients will be used? Perhaps a more descriptive name for test.sh can help too. I'm guessing we'll have a whole collection of those test scripts soon. Gwen The directory containing the kafka jars. kafka_2.10-0.8.3-SNAPSHOT.jar kafka-clients-0.8.3-SNAPSHOT.jar The other jars are shared between both the kafka brokers. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: build.gradle, line 209 https://reviews.apache.org/r/30809/diff/3/?file=889854#file889854line209 This should probably be a test dependency (if needed at all) Packaging Guava will be a pain, since so many systems use different versions of Guava and they are all incompatible. Guava provides an excellent rate limiter which I am using in the test and have used in the past. When you talk about packaging, we are already pulling in other external libraries like zookeeper with a specific version which the applications might be using extensively and might similarly run into conflicts. If you have a suggestion for a rate-limiting library less popular than guava, then I can use that instead; otherwise I will move this dependency to the test for now. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 409-440 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line409 Do we really want to do this? 
We are using joptsimple for a bunch of other tools. It is easier to read and maintain, with nice error messages, a help screen, etc. Thanks, I will switch to joptsimple. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: system_test/broker_upgrade/bin/kafka-run-class.sh, lines 152-156 https://reviews.apache.org/r/30809/diff/3/?file=889856#file889856line152 Why did we decide to duplicate this entire file? The only difference is that it takes an additional argument which contains the directory from which the kafka jars should be pulled. Would you recommend adding it to the original script as an optional argument? - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76157 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
Review Request 31967: Patch for KAFKA-1546
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31967/ --- Review request for kafka. Bugs: KAFKA-1546 https://issues.apache.org/jira/browse/KAFKA-1546 Repository: kafka Description --- Patch for KAFKA-1546 Diffs - core/src/main/scala/kafka/cluster/Partition.scala c4bf48a801007ebe7497077d2018d6dffe1677d4 core/src/main/scala/kafka/cluster/Replica.scala bd13c20338ce3d73113224440e858a12814e5adb core/src/main/scala/kafka/log/Log.scala 06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973 core/src/main/scala/kafka/server/ReplicaManager.scala c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 92152358c95fa9178d71bd1c079af0a0bd8f1da8 core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317 core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 Diff: https://reviews.apache.org/r/31967/diff/ Testing --- Thanks, Aditya Auradkar
Re: Review Request 31967: Patch for KAFKA-1546
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31967/ --- (Updated March 12, 2015, 1:39 a.m.) Review request for kafka. Bugs: KAFKA-1546 https://issues.apache.org/jira/browse/KAFKA-1546 Repository: kafka Description (updated) --- Patch for KAFKA-1546 Brief summary of changes: - Added a lagBegin metric inside Replica to track the lag in terms of time since the replica last read from the LEO - Using lag begin value in the check for ISR expand and shrink - Removed the max lag messages config since it is no longer necessary - Returning the initialLogEndOffset in LogReadResult corresponding to the LEO before actually reading from the log. - Unit test cases to test ISR shrinkage and expansion Diffs - core/src/main/scala/kafka/cluster/Partition.scala c4bf48a801007ebe7497077d2018d6dffe1677d4 core/src/main/scala/kafka/cluster/Replica.scala bd13c20338ce3d73113224440e858a12814e5adb core/src/main/scala/kafka/log/Log.scala 06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973 core/src/main/scala/kafka/server/ReplicaManager.scala c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 92152358c95fa9178d71bd1c079af0a0bd8f1da8 core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317 core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 Diff: https://reviews.apache.org/r/31967/diff/ Testing --- Thanks, Aditya Auradkar
0.8.3 release plan
There hasn't been any public discussion about the 0.8.3 release plan. There seems to be a lot of work in flight: work with patches and review that could/should get committed but is now just pending KIPs, and work without a KIP that is in trunk already (e.g. the new Consumer) that would be in the release but is missing the KIP for the release... What does this mean for the 0.8.3 release? What are we trying to get out and when? Also, looking at https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there seem to be things we are getting earlier (which is great of course), so are we going to try to up the version and go with 0.9.0? 0.8.2.0 ended up getting very bloated and that delayed it much longer than we had originally communicated to the community. I want to make sure we take that feedback from the community and try to improve upon it. Thanks! ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - -
Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer
+1 (binding) On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid wrote: https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer -- -- Guozhang
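The semantics KIP-15 votes on — block at most a bounded time for in-flight sends to drain, where a timeout of 0 means "don't wait" — can be sketched generically. This is not the KafkaProducer implementation; the class and method names below are illustrative only:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Generic close-with-timeout sketch: wait up to timeoutMs for
// outstanding work to finish, then report whether it all drained.
public class BoundedCloser {
    private final CountDownLatch pendingWork;

    public BoundedCloser(int outstanding) {
        this.pendingWork = new CountDownLatch(outstanding);
    }

    // Called by the background sender as each piece of work completes.
    public void workCompleted() {
        pendingWork.countDown();
    }

    // Unlike an unbounded close() that can block forever, this returns
    // after at most timeoutMs; close(0) gives up immediately.
    public boolean close(long timeoutMs) throws InterruptedException {
        return pendingWork.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

The return value lets the caller distinguish a clean shutdown from one that abandoned unsent records, which is the behavioral contract the KIP is after.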
Re: 0.8.3 release plan
Yeah, I'd be in favor of a quicker, smaller release, but I think as long as we have these big things in flight we should probably keep the release criteria feature-based rather than time-based (e.g. when X works, not every other month). Ideally the next release would have at least a beta version of the new consumer. I think having a new hunk of code like that available but marked as beta is maybe a good way to go, as it gets it into people's hands for testing. This way we can declare the API not fully locked down until the final release too, since mostly users only look at stuff after we release it. Maybe we can try to construct a schedule around this? -Jay On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote: There hasn't been any public discussion about the 0.8.3 release plan. There seems to be a lot of work in flight, work with patches and review that could/should get committed but now just pending KIPs, work without a KIP but that is in trunk already (e.g. the new Consumer) that would be in the release but missing the KIP for the release... What does this mean for the 0.8.3 release? What are we trying to get out and when? Also looking at https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there seems to be things we are getting earlier (which is great of course) so are we going to try to up the version and go with 0.9.0? 0.8.2.0 ended up getting very bloated and that delayed it much longer than we had originally communicated to the community and want to make sure we take that feedback from the community and try to improve upon it. Thanks! ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - -
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358063#comment-14358063 ] Jay Kreps commented on KAFKA-1546: -- Well iiuc the wonderfulness of this feature is that it actually doesn't add any new configs, it removes an old one that was impossible to set correctly and slightly modifies the meaning of an existing one to do what it sounds like it does. So I do think that for 99.5% of the world this will be like, wow, Kafka replication is much more robust. I do think it is definitely a bug fix not a feature. But hey, I love me some KIPs, so I can't object to a nice write-up if you think it would be good to have. Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
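The time-based lag semantics discussed in KAFKA-1546 — track when a replica last caught up to the leader's log end offset (LEO), and shrink the ISR once that timestamp is older than the allowed lag — can be sketched independently of the actual Replica/Partition code. All names below are illustrative, not Kafka's:

```java
// Illustrative time-based replica lag check: a replica is in sync as
// long as it has read up to the leader's LEO within the last maxLagMs,
// regardless of whether the topic is high- or low-volume.
public class ReplicaLagTracker {
    private long lastCaughtUpTimeMs;

    public ReplicaLagTracker(long nowMs) {
        this.lastCaughtUpTimeMs = nowMs;
    }

    // Record each fetch: only a fetch that reaches the leader's LEO
    // counts as "caught up" and resets the clock.
    public void onFetch(long fetchedOffset, long leaderEndOffset, long nowMs) {
        if (fetchedOffset >= leaderEndOffset) {
            lastCaughtUpTimeMs = nowMs;
        }
    }

    // ISR shrink condition: the replica hasn't caught up for maxLagMs.
    public boolean isOutOfSync(long nowMs, long maxLagMs) {
        return nowMs - lastCaughtUpTimeMs > maxLagMs;
    }
}
```

This is why the message-count config can be removed: a single time threshold behaves correctly for both a quiet topic (the replica trivially stays at the LEO) and a busy one (a replica persistently behind the LEO eventually times out).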
[jira] [Updated] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1546: - Fix Version/s: 0.8.3 Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2001) OffsetCommitTest hang during setup
[ https://issues.apache.org/jira/browse/KAFKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-2001: - Fix Version/s: 0.8.3 OffsetCommitTest hang during setup -- Key: KAFKA-2001 URL: https://issues.apache.org/jira/browse/KAFKA-2001 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Joel Koshy Fix For: 0.8.3 OffsetCommitTest seems to hang in trunk reliably. The following is the stacktrace. It seems to hang during setup. Test worker prio=5 tid=7fb5ab154800 nid=0x11198e000 waiting on condition [11198c000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:59) at kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:58) at scala.collection.immutable.Stream.dropWhile(Stream.scala:825) at kafka.server.OffsetCommitTest.setUp(OffsetCommitTest.scala:58) at junit.framework.TestCase.runBare(TestCase.java:132) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:91) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1986) Producer request failure rate should not include InvalidMessageSizeException and OffsetOutOfRangeException
[ https://issues.apache.org/jira/browse/KAFKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1986: - Fix Version/s: 0.8.3 Producer request failure rate should not include InvalidMessageSizeException and OffsetOutOfRangeException -- Key: KAFKA-1986 URL: https://issues.apache.org/jira/browse/KAFKA-1986 Project: Kafka Issue Type: Sub-task Reporter: Aditya Auradkar Assignee: Aditya Auradkar Fix For: 0.8.3 Attachments: KAFKA-1986.patch InvalidMessageSizeException and OffsetOutOfRangeException should not be counted as failedProduce and failedFetch requests since they are client side errors. They should be treated similarly to UnknownTopicOrPartitionException or NotLeaderForPartitionException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1938) [doc] Quick start example should reference appropriate Kafka version
[ https://issues.apache.org/jira/browse/KAFKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1938: - Fix Version/s: 0.8.3 [doc] Quick start example should reference appropriate Kafka version Key: KAFKA-1938 URL: https://issues.apache.org/jira/browse/KAFKA-1938 Project: Kafka Issue Type: Improvement Components: website Affects Versions: 0.8.2.0 Reporter: Stevo Slavic Assignee: Manikumar Reddy Priority: Trivial Fix For: 0.8.3 Attachments: lz4-compression.patch, remove-081-references-1.patch, remove-081-references.patch Kafka 0.8.2.0 documentation, quick start example on https://kafka.apache.org/documentation.html#quickstart in step 1 links and instructs reader to download Kafka 0.8.1.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1925) Coordinator Node Id set to INT_MAX breaks coordinator metadata updates
[ https://issues.apache.org/jira/browse/KAFKA-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1925: - Fix Version/s: 0.8.3 Coordinator Node Id set to INT_MAX breaks coordinator metadata updates -- Key: KAFKA-1925 URL: https://issues.apache.org/jira/browse/KAFKA-1925 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-1925.v1.patch KafkaConsumer used INT_MAX to mimic a new socket for the coordinator (details can be found in KAFKA-1760). However, this behavior breaks the coordinator because the underlying NetworkClient uses only the node id to determine when to initiate a new connection:
{code}
if (connectionStates.canConnect(node.id(), now))
    // if we are interested in sending to a node and we don't have a connection to it, initiate one
    initiateConnect(node, now);
{code}
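The failure mode is easy to see with a toy model of per-node connection state. In this sketch (class and method names are illustrative, not the actual NetworkClient code), keying state on one fixed fake id means the client cannot distinguish the old coordinator from a new one, so no fresh connection is initiated after the coordinator moves:

```java
import java.util.HashMap;
import java.util.Map;

public class CoordinatorIdDemo {
    // Toy stand-in for NetworkClient's per-node connection state.
    static class ConnectionStates {
        private final Map<Integer, Boolean> connected = new HashMap<>();
        boolean canConnect(int nodeId) { return !connected.getOrDefault(nodeId, false); }
        void connect(int nodeId) { connected.put(nodeId, true); }
    }

    static final int FAKE_COORDINATOR_ID = Integer.MAX_VALUE; // mimics KAFKA-1760

    public static void main(String[] args) {
        ConnectionStates states = new ConnectionStates();

        // Coordinator discovered on broker 1; connect under the fake id.
        if (states.canConnect(FAKE_COORDINATOR_ID)) states.connect(FAKE_COORDINATOR_ID);

        // Coordinator moves to broker 2. The id is still INT_MAX, so the
        // state lookup says "already connected" and no new connection is made.
        System.out.println("would initiate new connection: " + states.canConnect(FAKE_COORDINATOR_ID));
    }
}
```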
[jira] [Updated] (KAFKA-1943) Producer request failure rate should not include MessageSetSizeTooLarge and MessageSizeTooLargeException
[ https://issues.apache.org/jira/browse/KAFKA-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1943: - Fix Version/s: 0.8.3 Producer request failure rate should not include MessageSetSizeTooLarge and MessageSizeTooLargeException Key: KAFKA-1943 URL: https://issues.apache.org/jira/browse/KAFKA-1943 Project: Kafka Issue Type: Sub-task Reporter: Aditya A Auradkar Assignee: Aditya Auradkar Fix For: 0.8.3 Attachments: KAFKA-1943.patch If MessageSetSizeTooLargeException or MessageSizeTooLargeException is thrown from Log, then ReplicaManager counts it as a failed produce request. My understanding is that this metric should only count failures as a result of broker issues and not bad requests sent by the clients. If the message or message set is too large, then it is a client side error and should not be reported. (similar to NotLeaderForPartitionException, UnknownTopicOrPartitionException). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
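This ticket and KAFKA-1986 come down to the same rule: only broker-side failures should bump the failure-rate metric. A minimal sketch of that classification (the class and methods are hypothetical, not ReplicaManager's actual code; the exception names follow the JIRA discussion):

```java
import java.util.Set;

public class ProduceFailureMetric {
    // Client-side errors that should not count as broker produce failures.
    private static final Set<String> CLIENT_SIDE_ERRORS = Set.of(
            "MessageSizeTooLargeException",
            "MessageSetSizeTooLargeException",
            "InvalidMessageSizeException",
            "OffsetOutOfRangeException",
            "NotLeaderForPartitionException",
            "UnknownTopicOrPartitionException");

    private int failedProduceRequests = 0;

    void recordError(String exceptionName) {
        // Only genuine broker-side failures bump the metric.
        if (!CLIENT_SIDE_ERRORS.contains(exceptionName)) {
            failedProduceRequests++;
        }
    }

    int failedProduceRequests() { return failedProduceRequests; }
}
```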
[jira] [Updated] (KAFKA-1957) code doc typo
[ https://issues.apache.org/jira/browse/KAFKA-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1957: - Fix Version/s: 0.8.3 code doc typo - Key: KAFKA-1957 URL: https://issues.apache.org/jira/browse/KAFKA-1957 Project: Kafka Issue Type: Improvement Components: config Reporter: Yaguo Zhou Priority: Trivial Fix For: 0.8.3 Attachments: 0001-fix-typo-SO_SNDBUFF-SO_SNDBUF-SO_RCVBUFF-SO_RCVBUF.patch There are doc typos in kafka.server.KafkaConfig.scala: SO_SNDBUFF should be SO_SNDBUF and SO_RCVBUFF should be SO_RCVBUF.
[jira] [Updated] (KAFKA-1948) kafka.api.consumerTests are hanging
[ https://issues.apache.org/jira/browse/KAFKA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1948: - Fix Version/s: 0.8.3 kafka.api.consumerTests are hanging --- Key: KAFKA-1948 URL: https://issues.apache.org/jira/browse/KAFKA-1948 Project: Kafka Issue Type: Bug Reporter: Gwen Shapira Assignee: Guozhang Wang Fix For: 0.8.3 Attachments: KAFKA-1948.patch Noticed today that very often when I run the full test suite, it hangs on kafka.api.consumerTest (not always same test though). It doesn't reproduce 100% of the time, but enough to be very annoying. I also saw it happening on trunk after KAFKA-1333: https://builds.apache.org/view/All/job/Kafka-trunk/389/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1959) Class CommitThread overwrite group of Thread class causing compile errors
[ https://issues.apache.org/jira/browse/KAFKA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1959: - Fix Version/s: 0.8.3 Class CommitThread overwrite group of Thread class causing compile errors - Key: KAFKA-1959 URL: https://issues.apache.org/jira/browse/KAFKA-1959 Project: Kafka Issue Type: Bug Components: core Environment: scala 2.10.4 Reporter: Tong Li Assignee: Tong Li Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1959.patch, compileError.png
{code}
class CommitThread(id: Int, partitionCount: Int, commitIntervalMs: Long, zkClient: ZkClient) extends ShutdownableThread("commit-thread") with KafkaMetricsGroup { private val group = "group-" + id
{code}
The val group overrides the group member of class Thread, causing the following compile error: overriding variable group in class Thread of type ThreadGroup; value group has weaker access privileges; it should not be private
[jira] [Updated] (KAFKA-1969) NPE in unit test for new consumer
[ https://issues.apache.org/jira/browse/KAFKA-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1969: - Fix Version/s: 0.8.3 NPE in unit test for new consumer - Key: KAFKA-1969 URL: https://issues.apache.org/jira/browse/KAFKA-1969 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: Guozhang Wang Labels: newbie Fix For: 0.8.3 Attachments: stack.out
{code}
kafka.api.ConsumerTest testConsumptionWithBrokerFailures FAILED
    java.lang.NullPointerException
        at org.apache.kafka.clients.consumer.KafkaConsumer.ensureCoordinatorReady(KafkaConsumer.java:1238)
        at org.apache.kafka.clients.consumer.KafkaConsumer.initiateCoordinatorRequest(KafkaConsumer.java:1189)
        at org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:777)
        at org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:816)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:704)
        at kafka.api.ConsumerTest.consumeWithBrokerFailures(ConsumerTest.scala:167)
        at kafka.api.ConsumerTest.testConsumptionWithBrokerFailures(ConsumerTest.scala:152)
{code}
[jira] [Updated] (KAFKA-1964) testPartitionReassignmentCallback hangs occasionally
[ https://issues.apache.org/jira/browse/KAFKA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1964: - Fix Version/s: 0.8.3 testPartitionReassignmentCallback hangs occasionally - Key: KAFKA-1964 URL: https://issues.apache.org/jira/browse/KAFKA-1964 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Guozhang Wang Labels: newbie++ Fix For: 0.8.3 Attachments: stack.out -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1960) .gitignore does not exclude test generated files and folders.
[ https://issues.apache.org/jira/browse/KAFKA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1960: - Fix Version/s: 0.8.3 .gitignore does not exclude test generated files and folders. - Key: KAFKA-1960 URL: https://issues.apache.org/jira/browse/KAFKA-1960 Project: Kafka Issue Type: Bug Components: build Reporter: Tong Li Assignee: Tong Li Priority: Minor Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1960.patch gradle test can create quite a few files and folders; .gitignore should exclude these so that committing changes is easier.
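For illustration, entries along these lines would cover typical gradle and test output (the paths are assumptions; the actual KAFKA-1960 patch may list different ones):

```
# build output and test-generated files (illustrative; see KAFKA-1960.patch for the real list)
build/
.gradle/
*.log
*.iml
*.ipr
*.iws
```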
[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1997: Attachment: KAFKA-1997_2015-03-11_15:20:18.patch Refactor Mirror Maker - Key: KAFKA-1997 URL: https://issues.apache.org/jira/browse/KAFKA-1997 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch Refactor mirror maker based on KIP-3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76173 ---
core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment123652
This is essentially a sync approach, can we use a callback to do this?
core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment123653
When a send fails, should we at least log the sequence number?
core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment123659
Similar to the producer, can we log the expected sequence number and the seq we actually saw?
core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment123655
Can we use KafkaThread here?
- Jiangjie Qin
On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh's comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357947#comment-14357947 ] Aditya A Auradkar commented on KAFKA-1546: -- Created reviewboard https://reviews.apache.org/r/31967/diff/ against branch origin/trunk Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Attachments: KAFKA-1546.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya A Auradkar updated KAFKA-1546: - Attachment: KAFKA-1546.patch Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Attachments: KAFKA-1546.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
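The approach in the description translates naturally into a small computation: derive a per-partition message-count threshold from the partition's observed throughput. A sketch of that idea (the config name `replica.lag.max.ms` follows the description; the method itself is hypothetical, not the submitted patch):

```java
public class ReplicaLagThreshold {
    // Convert a time-based lag bound into a per-partition message count,
    // scaled by the partition's observed produce rate.
    static long maxLagMessages(long replicaLagMaxMs, double messagesPerSec) {
        long messages = Math.round(messagesPerSec * replicaLagMaxMs / 1000.0);
        return Math.max(1, messages); // never flag replicas of a nearly idle partition
    }

    public static void main(String[] args) {
        // High-volume topic: 5000 msg/s with a 10 s bound tolerates 50000 messages.
        System.out.println(maxLagMessages(10_000, 5_000));
        // Low-volume topic: the same 10 s bound tolerates only a handful.
        System.out.println(maxLagMessages(10_000, 0.5));
    }
}
```

With a fixed message-count config, one of these two partitions always gets the wrong threshold; deriving the count from throughput gives each the bound the operator actually meant.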
Re: Review Request 31706: Patch for KAFKA-1997
On March 12, 2015, 1:22 a.m., Guozhang Wang wrote: Hit this unit test failure, is this relevant? --
kafka.consumer.ZookeeperConsumerConnectorTest testConsumerRebalanceListener FAILED
    junit.framework.AssertionFailedError: expected:List((0,group1_consumer1-0), (1,group1_consumer2-0)) but was:ArrayBuffer((1,group1_consumer2-0))
        at junit.framework.Assert.fail(Assert.java:47)
        at junit.framework.Assert.failNotEquals(Assert.java:277)
        at junit.framework.Assert.assertEquals(Assert.java:64)
        at junit.framework.Assert.assertEquals(Assert.java:71)
        at kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393)
It is a bug but irrelevant to this patch, I believe. The reason is that consumer 2 finishes rebalance before consumer 1 does, so zookeeper did not have the complete ownership info yet when we did the assertion. I fixed this by consuming one message from consumer 1 before checking ownership info from zookeeper. - Jiangjie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/#review76189 --- On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ --- (Updated March 11, 2015, 10:20 p.m.) Review request for kafka. Bugs: KAFKA-1997 https://issues.apache.org/jira/browse/KAFKA-1997 Repository: kafka Description --- Addressed Guozhang's comments. Changed the exit behavior on send failure because close(0) is not ready yet. Will submit a followup patch after KAFKA-1660 is checked in. Expanded imports from _ and * to full class paths. Incorporated Joel's comments. Addressed Joel's comments. Addressed Guozhang's comments. Incorporated Guozhang's comments.
Diffs - clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4 core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0 Diff: https://reviews.apache.org/r/31706/diff/ Testing --- Thanks, Jiangjie Qin
[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1997: Attachment: KAFKA-1997_2015-03-11_19:10:53.patch Refactor Mirror Maker - Key: KAFKA-1997 URL: https://issues.apache.org/jira/browse/KAFKA-1997 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, KAFKA-1997_2015-03-11_19:10:53.patch Refactor mirror maker based on KIP-3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357993#comment-14357993 ] Jiangjie Qin commented on KAFKA-1997: - Updated reviewboard https://reviews.apache.org/r/31706/diff/ against branch origin/trunk Refactor Mirror Maker - Key: KAFKA-1997 URL: https://issues.apache.org/jira/browse/KAFKA-1997 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, KAFKA-1997_2015-03-11_19:10:53.patch Refactor mirror maker based on KIP-3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358039#comment-14358039 ] Jay Kreps commented on KAFKA-1546: -- Personally I don't think it really needs a KIP; it subtly changes the meaning of one config, but it actually changes it to mean what everyone thinks it currently means. What do you think? I think this one is less about user expectations or our opinions and more about whether it actually works. Speaking of which... [~auradkar] What is the test plan for this? It is trivially easy to reproduce the problems with the old approach. Start a server with default settings and 1-2 replicas and use the perf test to generate a ton of load with itty bitty messages and just watch the replicas drop in and out of sync. We should concoct the most brutal case of this and validate that, unless the follower actually falls behind, it is never falsely detected as out of the ISR. But we also need to check the reverse condition: that both a soft death and a lag are still detected. You can cause a soft death by setting the zk session timeout to something massive and just using unix signals to pause the process. You can cause lag by just running some commands on one of the followers to eat up all the cpu or I/O while a load test is running until the follower falls behind. Both cases should get caught. Anyhow, awesome job getting this done. I think this is one of the biggest stability issues in Kafka right now. The patch lgtm, but it would be good for [~junrao] and [~nehanarkhede] to take a look.
Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358040#comment-14358040 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~charmalloc] since there aren't any interface changes I am not sure a KIP is necessary. Of course, we added a new config for replica.fetch.backoff.ms. If this warrants a KIP then I can write one up. Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception, leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some.
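The difference between the tight loop and the patched behavior is simply a pause between failed attempts. A minimal sketch, assuming a `replica.fetch.backoff.ms`-style value (this is illustrative, not the actual AbstractFetcherThread code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class FetchBackoff {
    // Retry `fetch` until it succeeds or `maxAttempts` is reached, sleeping
    // `backoffMs` after each failure instead of spinning in a tight loop.
    static int fetchWithBackoff(Supplier<Boolean> fetch, long backoffMs, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fetch.get()) return attempt;
            Thread.sleep(backoffMs);
        }
        return -1; // still failing, caller decides what to do next
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated fetch that fails twice (e.g. connection refused), then succeeds.
        AtomicInteger calls = new AtomicInteger();
        int attempts = fetchWithBackoff(() -> calls.incrementAndGet() >= 3, 1, 10);
        System.out.println("succeeded on attempt " + attempts);
    }
}
```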
[VOTE] KIP-15 add a close method with timeout to KafkaProducer
https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer
Could the KIP confluence page please be updated with the discussion thread link, thanks... could you also remove the template boilerplate at the top *This page is meant as a template ..* so we can capture it for the release cleanly. Also, I don't really/fully understand how this is different from flush(time); close(), and why close has its own timeout as well. Lastly, what is the forceClose flag? It isn't documented in the public interface, so it isn't clear how to completely use the feature just by reading the KIP. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang wangg...@gmail.com wrote: +1 (binding) On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid wrote: https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer -- -- Guozhang
Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer
Yeah guys, I'd like to second that. I'd really really love to get the quality of these to the point where we could broadly solicit user input and use them as a permanent document of the alternatives and rationale. I know it is a little painful to have process, but I think we all saw what happened to the previous clients as public interfaces so I really really really want us to just be incredibly thoughtful and disciplined as we make changes. I think we all want to avoid another client rewrite. To second Joe's question in a more specific way, I think an alternative I don't see considered to give close() a bounded time is just to enforce the request time on the client side, which will cause all requests to be failed after the request timeout expires. This was the same behavior as for flush. In the case where the user just wants to ensure close doesn't block forever I think that may be sufficient? So one alternative might be to just do that request timeout feature and add a new producer config that is something like abort.on.failure=false which causes the producer to hard exit if it can't send a request. Which I think is closer to what you want, with this just being a way to implement that behavior. I'm not sure if this is better or worse, but we should be sure before we make the change. I also have a concern about producer.close(0, TimeUnit.MILLISECONDS) not meaning close with a timeout of 0 ms. I realize this exists in other java apis, but it is so confusing it even confused us into having that recent producer bug because of course all the other numbers mean wait that long. 
I'd propose:
- close() -- block until all completed
- close(0, TimeUnit.MILLISECONDS) -- block for 0 ms
- close(5, TimeUnit.MILLISECONDS) -- block for 5 ms
- close(-1, TimeUnit.MILLISECONDS) -- error because blocking for negative ms would mean completing in the past :-)
-Jay
On Wed, Mar 11, 2015 at 8:31 PM, Joe Stein joe.st...@stealth.ly wrote: Could the KIP confluence please have updated the discussion thread link, thanks... could you also remove the template boilerplate at the top *This page is meant as a template ..* so we can capture it for the release cleanly. Also I don't really/fully understand how this is different than flush(time); close() and why close has its own timeout also? Lastly, what is the forceClose flag? This isn't documented in the public interface so it isn't clear how to completely use the feature just by reading the KIP. ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang wangg...@gmail.com wrote: +1 (binding) On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid wrote: https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer -- -- Guozhang
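Jay's proposed semantics can be pinned down in a few lines. This sketch maps the (timeout, unit) pair to a blocking duration exactly as his proposal describes; it is an illustration of the proposal, not the actual KafkaProducer API:

```java
import java.util.concurrent.TimeUnit;

public class CloseSemantics {
    static final long BLOCK_FOREVER = Long.MAX_VALUE;

    // close() with no arguments: wait until all outstanding sends complete.
    static long closeBlockMillis() { return BLOCK_FOREVER; }

    // close(t, unit): block for exactly t, where 0 means "do not block at all"
    // and a negative timeout is rejected outright.
    static long closeBlockMillis(long timeout, TimeUnit unit) {
        if (timeout < 0) {
            throw new IllegalArgumentException("cannot complete in the past: " + timeout);
        }
        return unit.toMillis(timeout);
    }
}
```

The point of the sketch is that every numeric timeout means "wait that long", so close(0, ...) is not a special "wait forever" case.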
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358058#comment-14358058 ] Joe Stein commented on KAFKA-1546: -- we could also mark the JIRA as a bug instead of an improvement Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition.
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358074#comment-14358074 ] Aditya Auradkar commented on KAFKA-1546: I'll write a short KIP on this and circulate it tomorrow. In the meantime, I guess Jun/Neha can also review it since the actual fix has been discussed in enough detail on this jira. Automate replica lag tuning --- Key: KAFKA-1546 URL: https://issues.apache.org/jira/browse/KAFKA-1546 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0, 0.8.1, 0.8.1.1 Reporter: Neha Narkhede Assignee: Aditya Auradkar Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch Currently, there is no good way to tune the replica lag configs to automatically account for high and low volume topics on the same cluster. For the low-volume topic it will take a very long time to detect a lagging replica, and for the high-volume topic it will have false-positives. One approach to making this easier would be to have the configuration be something like replica.lag.max.ms and translate this into a number of messages dynamically based on the throughput of the partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.8.3 release plan
With regard to mm, I was kind of assuming just based on the amount of work that that would go in for sure, but yeah I agree it is important. -Jay On Wed, Mar 11, 2015 at 9:39 PM, Jay Kreps jay.kr...@gmail.com wrote: What I was trying to say was let's do a real release whenever either consumer or authn is done, whichever happens first (or both if they can happen close together)--not sure which is more likely to slip. WRT the beta thing I think the question for people is whether the beta period was helpful or not in getting a more stable release? We could either do a beta release again or we could just do a normal release and call the consumer feature experimental or whatever...basically something to get it in people's hands before it is supposed to work perfectly and never change again. -Jay On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira gshap...@cloudera.com wrote: So basically you are suggesting - let's do a beta release whenever we feel the new consumer is done? This can definitely work. I'd prefer holding for MM improvements too. IMO, it's not just more improvements like flush() and compression optimization. The current MirrorMaker can lose data, which makes it pretty useless for its job. We hear lots of requests for a robust MM from our customers, so I can imagine it's pretty important to the Kafka community (unless I have a completely skewed sample). Gwen On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah the real question is always what will we block on? I don't think we should try to hold back smaller changes. In this bucket I would include most things you described: mm improvements, replica assignment tool improvements, flush, purgatory improvements, compression optimization, etc. Likely these will all get done in time, as well as many things that kind of pop up from users but probably aren't worth doing a release for on their own. If one of them slips, that's fine.
I also don't think we should try to hold back work that is done if it isn't on a list. I would consider either SSL+SASL or the consumer worthy of a release on its own. If they finish close to the same time that is great. We can maybe just assess as these evolve where the other one is at and make a call whether it will be one or both? -Jay On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com wrote: If we are going in terms of features, I can see the following features getting in in the next month or two: * New consumer * Improved Mirror Maker (I've seen tons of interest) * Centralized admin requests (aka KIP-4) * Nicer replica-reassignment tool * SSL (and perhaps also SASL)? I think this collection will make a nice release. Perhaps we can cap it there and focus (as a community) on getting these in, we can have a release without too much scope creep in the not-very-distant-future? Even just 3 out of these 5 will still make a nice incremental improvement. Gwen On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah I'd be in favor of a quicker, smaller release but I think as long as we have these big things in flight we should probably keep the release criteria feature-based rather than time-based, though (e.g. when X works not every other month). Ideally the next release would have at least a beta version of the new consumer. I think having a new hunk of code like that available but marked as beta is maybe a good way to go, as it gets it into peoples hands for testing. This way we can declare the API not fully locked down until the final release too, since mostly users only look at stuff after we release it. Maybe we can try to construct a schedule around this? -Jay On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote: There hasn't been any public discussion about the 0.8.3 release plan. 
There seems to be a lot of work in flight: work with patches and reviews that could/should get committed but is now just pending KIPs, and work without a KIP that is in trunk already (e.g. the new Consumer) that would be in the release but is missing a KIP for the release... What does this mean for the 0.8.3 release? What are we trying to get out, and when? Also, looking at https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there seem to be things we are getting earlier (which is great of course), so are we going to try to up the version and go with 0.9.0? 0.8.2.0 ended up getting very bloated, and that delayed it much longer than we had originally communicated to the community, and we want to make sure we take that feedback from the community and try to improve upon it. Thanks! ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - -
Re: 0.8.3 release plan
What I was trying to say was let's do a real release whenever either consumer or authn is done, whichever happens first (or both if they can happen close together)--not sure which is more likely to slip. WRT the beta thing I think the question for people is whether the beta period was helpful or not in getting a more stable release? We could either do a beta release again or we could just do a normal release and call the consumer feature experimental or whatever...basically something to get it in people's hands before it is supposed to work perfectly and never change again. -Jay On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira gshap...@cloudera.com wrote: So basically you are suggesting - let's do a beta release whenever we feel the new consumer is done? This can definitely work. I'd prefer holding for MM improvements too. IMO, it's not just more improvements like flush() and compression optimization. Current MirrorMaker can lose data, which makes it pretty useless for its job. We hear lots of requests for robust MM from our customers, so I can imagine it's pretty important to the Kafka community (unless I have a completely skewed sample). Gwen On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah the real question is always what will we block on? I don't think we should try to hold back smaller changes. In this bucket I would include most things you described: mm improvements, replica assignment tool improvements, flush, purgatory improvements, compression optimization, etc. Likely these will all get done in time, as well as many things that kind of pop up from users but probably aren't worth doing a release for on their own. If one of them slips that's fine. I also don't think we should try to hold back work that is done if it isn't on a list. I would consider either SSL+SASL or the consumer worthy of a release on its own. If they finish close to the same time that is great.
We can maybe just assess as these evolve where the other one is at and make a call whether it will be one or both? -Jay On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com wrote: If we are going in terms of features, I can see the following features getting in in the next month or two: * New consumer * Improved Mirror Maker (I've seen tons of interest) * Centralized admin requests (aka KIP-4) * Nicer replica-reassignment tool * SSL (and perhaps also SASL)? I think this collection will make a nice release. Perhaps we can cap it there and focus (as a community) on getting these in, so we can have a release without too much scope creep in the not-very-distant future? Even just 3 out of these 5 will still make a nice incremental improvement. Gwen On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah I'd be in favor of a quicker, smaller release but I think as long as we have these big things in flight we should probably keep the release criteria feature-based rather than time-based, though (e.g. when X works, not every other month). Ideally the next release would have at least a beta version of the new consumer. I think having a new hunk of code like that available but marked as beta is maybe a good way to go, as it gets it into people's hands for testing. This way we can declare the API not fully locked down until the final release too, since mostly users only look at stuff after we release it. Maybe we can try to construct a schedule around this? -Jay On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote: There hasn't been any public discussion about the 0.8.3 release plan. There seems to be a lot of work in flight: work with patches and review that could/should get committed but is now just pending KIPs, and work without a KIP but that is in trunk already (e.g. the new Consumer) that would be in the release but is missing the KIP for the release... What does this mean for the 0.8.3 release? What are we trying to get out and when?
Also looking at https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there seem to be things we are getting earlier (which is great of course), so are we going to try to up the version and go with 0.9.0? 0.8.2.0 ended up getting very bloated and that delayed it much longer than we had originally communicated to the community, and we want to make sure we take that feedback from the community and try to improve upon it. Thanks! ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - -
[jira] [Comment Edited] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358096#comment-14358096 ] Honghai Chen edited comment on KAFKA-1646 at 3/12/15 4:48 AM: -- Hey, [~jkreps] Would you like to help check the review at https://reviews.apache.org/r/29091/diff/7/ ? Really appreciate it, thanks. was (Author: waldenchen): Het, [~jkreps] Would you like help check the review at https://reviews.apache.org/r/29091/diff/7/ , really appreciate, thanks. Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Assignee: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be contiguous on disk and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or methods used only on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358096#comment-14358096 ] Honghai Chen commented on KAFKA-1646: - Hey, [~jkreps] Would you like to help check the review at https://reviews.apache.org/r/29091/diff/7/ ? Really appreciate it, thanks. Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Assignee: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be contiguous on disk and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or methods used only on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [KIP-DISCUSSION] KIP-13 Quotas
1. Cool 2. Yeah I just wanted to flag the dependency/interaction. 3. Cool, I think we are in agreement then that a pluggable system could possibly be nice but we can get to know it operationally before deciding to expose such a thing. 4. Yeah, I agree, let's do it as a separate discussion. We actually had a full discussion and vote back when we started down the path with metrics, but I think there were some concerns so let's talk about it a bit more and see. 5. Yeah I think my concern was just the resulting api. Basically because the logic for each quota is different--at the very least a different metric to check and a different request type to compute the value from--it seems that the seemingly generic api just masks the fact that we handle each case separately. I.e. the implementation of the method internally would be def check(request: T) { if(request.isInstanceOf[ProduceRequest]) [check produce request] if(request.isInstanceOf[FetchRequest]) [check fetch request] .. } So basically we have logic specific to each request, but rather than putting that logic into the method for handling that request we kind of put it into a big case statement. So it seems like this doesn't really abstract things, since any time you add a new thing to quota you have to jump inside the big case statement and add a new case, right? I think I may be misunderstanding though...in any case I'm not arguing that we want to just shove this into the existing methods, I just want to make sure that if we introduce an abstraction it's a good one. 6. Yes, I think it is preferable not to have the seesaw effect in the delay time. So if you need to impose 20 seconds of delay it is better to delay all 200 requests 100 ms each rather than 199 requests 0 ms and one request 20 seconds. Several reasons for this: a. it gives predictable latency to the producer. b. it avoids hitting the request timeout on the one slow request. c. there is a trade-off between window size and delay time.
If the window is too small the estimate will be inaccurate and you will accidentally penalize an okay client (e.g. imagine a 100 ms window; one big request could overflow it). If the window is too large you will allow the system to be brought to its knees for a long period of time before the throttling kicks in. The other important question here is the details of the windowing policy. If the window resets every 30 seconds, the client exhausts it in 10 seconds, then is throttled for 20, then it resets and the client starts blitzing again. The result is basically 10-second outages every 30 seconds as the throttling expires and the client goes full tilt, crushing the server. So the quotas don't really do their job very well. -Jay On Mon, Mar 9, 2015 at 6:22 PM, Aditya Auradkar aaurad...@linkedin.com.invalid wrote: Thanks for the comments Jay and Todd. Firstly, I'd like to address some of Jay's points. 1. You are right that we don't need to disable quotas on a per-client basis. 2. Configuration management: I must admit, I haven't read Joe's proposal in detail. My thinking was that we can keep configuration management separate from this discussion since this is already quite a meaty topic. Let me spend some time reading that KIP and then I can add more detail to the quota KIP. 3. Custom Quota implementations: I don't think it is necessarily a bad idea to have an interface called QuotaManager (RequestThrottler). This doesn't necessarily mean exposing the interface as a public API. It is a mechanism to limit code changes to 1-2 specific classes. It prevents quota logic from bleeding into multiple places in the code, as happens in any big piece of code. I fully agree that we should not expose this as a public API unless there is a very strong reason to. This seems to be more of an implementation detail. 4. Metrics Package: I'll add a section on the wiki about using things from the metrics package. Currently, the quota stuff is located in clients/common/metrics.
This means that we will have to migrate all that functionality into core. Does this also mean that we will need to replace the existing metrics code in core with the newly imported package as a part of this project? If so, that's a relatively large undertaking and it needs to be discussed separately IMO. 5. Request Throttler vs QuotaManager - I wanted my quota manager to do something similar to what you proposed. Inside KafkaApis, I could do: if(quotaManager.check()) // process request else return Internally, QuotaManager.check() could do exactly what you suggested: try { quotaMetric.record(newVal) } catch (QuotaException e) { // logic to calculate delay requestThrottler.add(new DelayedResponse(...), ...) return } This approach gives us the flexibility of deciding what metric we want to record inside QuotaManager. This brings us back to the same discussion of pluggable quota policies. It's a bit hard to articulate, but for
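The windowed-rate bookkeeping and delay calculation being debated here can be sketched in a few lines. This is a hypothetical illustration, not the KIP-13 implementation: the class and method names (SimpleQuota, recordAndGetDelayMs) are made up, and the actual proposal records rates through the metrics package rather than tracking bytes by hand. The delay is chosen so that, measured over the window plus the delay, the client's observed rate comes back down to the quota, which is what produces the smooth "delay every request a little" behavior Jay argues for in point 6.

```java
// Hypothetical sketch of a windowed byte-rate quota that converts an
// overage into a per-request delay, so the penalty is spread smoothly
// across many requests instead of hitting one request with a huge delay.
class SimpleQuota {
    private final double quotaBytesPerSec;
    private final long windowMs;
    private long windowStartMs;
    private double bytesInWindow;

    SimpleQuota(double quotaBytesPerSec, long windowMs) {
        this.quotaBytesPerSec = quotaBytesPerSec;
        this.windowMs = windowMs;
    }

    /**
     * Record bytes for a request arriving at nowMs and return how long
     * its response should be delayed (0 if the client is within quota).
     */
    long recordAndGetDelayMs(long nowMs, double bytes) {
        if (nowMs - windowStartMs >= windowMs) { // roll to a fresh window
            windowStartMs = nowMs;
            bytesInWindow = 0;
        }
        bytesInWindow += bytes;
        double observedRate = bytesInWindow / (windowMs / 1000.0);
        if (observedRate <= quotaBytesPerSec)
            return 0;
        // Delay just long enough that the rate, measured over
        // windowMs + delay, comes back down to the quota.
        double targetMs = bytesInWindow / quotaBytesPerSec * 1000.0;
        return Math.round(targetMs - windowMs);
    }
}
```

For example, with a 1000 B/s quota and a 1-second window, a client that pushes 2000 bytes into the window gets a 1-second delay, stretching its effective rate back to 1000 B/s rather than cutting it off entirely.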
[jira] [Updated] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Kreps updated KAFKA-1646: - Reviewer: (was: Jay Kreps) Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Assignee: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be contiguous on disk and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or methods used only on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358125#comment-14358125 ] Jay Kreps commented on KAFKA-1646: -- Hey [~waldenchen] this patch is adding a TON of Windows-specific if/else statements. I don't think that is sustainable. I think if we are going to do this we need to try to make it the same strategy across OS's, just for maintainability. That said, are you sure NTFS can't just be tuned to accomplish the same thing? Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Assignee: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be contiguous on disk and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or methods used only on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
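For readers following the thread, the allocate-ahead idea under discussion can be sketched with plain java.io. This is an illustrative toy, not the actual patch (the real change lives in Kafka's log layer and is gated on the OS check): it reserves the full segment size up front so the filesystem can lay the file out contiguously, then trims the unused zero-filled tail on close. The class and method names here are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of preallocate-on-roll: reserve the whole segment's space when
// a new log segment is created (so concurrent replicas don't interleave
// and fragment each other's files), then truncate the unused tail when
// the segment is closed.
class PreallocatedSegment {
    private final RandomAccessFile file;
    private long written = 0;

    PreallocatedSegment(File path, long preallocBytes) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(preallocBytes); // reserve the space immediately
    }

    void append(byte[] bytes) throws IOException {
        file.seek(written); // setLength does not move the file pointer
        file.write(bytes);
        written += bytes.length;
    }

    // On close, truncate the trailing zeros left by preallocation.
    void close() throws IOException {
        file.setLength(written);
        file.close();
    }
}
```

The trailing-zeros truncation also hints at why one of the attached patches deals with broker restart: a segment that was preallocated but never cleanly closed still carries a zero-filled tail that must be trimmed on recovery.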
Re: 0.8.3 release plan
Yeah the real question is always what will we block on? I don't think we should try to hold back smaller changes. In this bucket I would include most things you described: mm improvements, replica assignment tool improvements, flush, purgatory improvements, compression optimization, etc. Likely these will all get done in time, as well as many things that kind of pop up from users but probably aren't worth doing a release for on their own. If one of them slips that's fine. I also don't think we should try to hold back work that is done if it isn't on a list. I would consider either SSL+SASL or the consumer worthy of a release on its own. If they finish close to the same time that is great. We can maybe just assess as these evolve where the other one is at and make a call whether it will be one or both? -Jay On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com wrote: If we are going in terms of features, I can see the following features getting in in the next month or two: * New consumer * Improved Mirror Maker (I've seen tons of interest) * Centralized admin requests (aka KIP-4) * Nicer replica-reassignment tool * SSL (and perhaps also SASL)? I think this collection will make a nice release. Perhaps we can cap it there and focus (as a community) on getting these in, so we can have a release without too much scope creep in the not-very-distant future? Even just 3 out of these 5 will still make a nice incremental improvement. Gwen On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah I'd be in favor of a quicker, smaller release but I think as long as we have these big things in flight we should probably keep the release criteria feature-based rather than time-based, though (e.g. when X works, not every other month). Ideally the next release would have at least a beta version of the new consumer. I think having a new hunk of code like that available but marked as beta is maybe a good way to go, as it gets it into people's hands for testing.
This way we can declare the API not fully locked down until the final release too, since mostly users only look at stuff after we release it. Maybe we can try to construct a schedule around this? -Jay On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote: There hasn't been any public discussion about the 0.8.3 release plan. There seems to be a lot of work in flight: work with patches and review that could/should get committed but is now just pending KIPs, and work without a KIP but that is in trunk already (e.g. the new Consumer) that would be in the release but is missing the KIP for the release... What does this mean for the 0.8.3 release? What are we trying to get out and when? Also looking at https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there seem to be things we are getting earlier (which is great of course), so are we going to try to up the version and go with 0.9.0? 0.8.2.0 ended up getting very bloated and that delayed it much longer than we had originally communicated to the community, and we want to make sure we take that feedback from the community and try to improve upon it. Thanks! ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - -
[jira] [Updated] (KAFKA-1930) Move server over to new metrics library
[ https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Auradkar updated KAFKA-1930: --- Assignee: Aditya Auradkar Move server over to new metrics library --- Key: KAFKA-1930 URL: https://issues.apache.org/jira/browse/KAFKA-1930 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Aditya Auradkar We are using org.apache.kafka.common.metrics on the clients, but using Coda Hale metrics on the server. We should move the server over to the new metrics package as well. This will help to make all our metrics self-documenting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1930) Move server over to new metrics library
[ https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358077#comment-14358077 ] Aditya Auradkar commented on KAFKA-1930: I plan to work on this ticket since this has been called out as a pre-requisite for implementing quotas (KIP 13) in Kafka. I shall circulate a KIP once I understand the scope of the change well enough. Move server over to new metrics library --- Key: KAFKA-1930 URL: https://issues.apache.org/jira/browse/KAFKA-1930 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps We are using org.apache.kafka.common.metrics on the clients, but using Coda Hale metrics on the server. We should move the server over to the new metrics package as well. This will help to make all our metrics self-documenting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
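As a rough illustration of what "self-documenting" buys, consider a registry where every metric must be registered together with a human-readable description, so a documentation page can be generated from the registry itself. The classes below are hypothetical, not the actual org.apache.kafka.common.metrics API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative-only sketch of a self-documenting metrics registry:
// a metric cannot exist without a description, so docs can always be
// generated straight from the running registry.
class MetricsRegistry {
    private final Map<String, String> descriptions = new LinkedHashMap<>();
    private final Map<String, Double> values = new LinkedHashMap<>();

    void register(String name, String description) {
        descriptions.put(name, description);
        values.put(name, 0.0);
    }

    void record(String name, double value) {
        values.put(name, value);
    }

    double value(String name) {
        return values.get(name);
    }

    /** Render documentation for every registered metric. */
    String document() {
        StringBuilder sb = new StringBuilder();
        descriptions.forEach((n, d) -> sb.append(n).append(": ").append(d).append('\n'));
        return sb.toString();
    }
}
```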
Re: [KIP-DISCUSSION] KIP-13 Quotas
Hey Todd, Yeah it is kind of weird to do the quota check after taking a request, but since the penalty is applied during that request and it just delays you to the right rate, I think it isn't exactly wrong. I admit it is weird, though. What you say about closing the connection makes sense. The issue is that our current model for connections is totally transient. The clients are supposed to handle any kind of transient connection loss and just re-establish. So basically all existing clients would likely just retry all the same whether you closed the connection or not, so at the moment there would be no way to know a retried request is actually a retry. Your point about the REST proxy is a good one; I don't think we had considered that. Currently the java producer just has a single client.id for all requests, so the rest proxy would be a single client. But actually what you want is the original sender to be the client. This is technically very hard to do because the client will actually be batching records from all senders together into one request, so the only way to get the client id right would be to make a new producer for each rest proxy client, and this would mean a lot of memory and connections. This needs thought; not sure what solution there is. I am not 100% convinced we need to obey the request timeout. The configuration issue actually isn't a problem because the request timeout is sent with the request, so the broker actually knows it now even without a handshake. However, the question is: if someone sets a pathologically low request timeout do we need to obey it? And if so, won't that mean we can't quota them? I claim the answer is no! I think we should redefine request timeout to mean replication timeout, which is actually what it is today. Even today if you interact with a slow server it may take longer than that timeout (say because the fs write queues up for a long-ass time).
I think we need a separate client timeout which should be fairly long and unlikely to be hit (default to 30 secs or something). -Jay On Tue, Mar 10, 2015 at 10:12 AM, Todd Palino tpal...@gmail.com wrote: Thanks, Jay. On the interface, I agree with Aditya (and you, I believe) that we don't need to expose the public API contract at this time, but structuring the internal logic to allow for it later with low cost is a good idea. Glad you explained the thoughts on where to hold requests. While my gut reaction is to not like processing a produce request that is over quota, it makes sense to do it that way if you are going to have your quota action be a delay. On the delay, I see your point on the bootstrap cases. However, one of the places I differ, and part of the reason that I prefer the error, is that I would never allow a producer who is over quota to resend a produce request. A producer should identify itself at the start of its connection, and at that point if it is over quota, the broker would return an error and close the connection. The same goes for a consumer. I'm a fan, in general, of pushing all error cases and handling down to the client and doing as little special work to accommodate those cases on the broker side as possible. A case to consider here is what does this mean for REST endpoints to Kafka? Are you going to hold the HTTP connection open as well? Is the endpoint going to queue and hold requests? I think the point that we can only delay as long as the producer's timeout is a valid one, especially given that we do not have any means for the broker and client to negotiate settings, whether that is timeouts or message sizes or anything else. There are a lot of things that you have to know when setting up a Kafka client about what your settings should be, when much of that should be provided for in the protocol handshake.
It's not as critical in an environment like ours, where we have central configuration for most clients, but we still see issues with it. I think being able to have the client and broker negotiate a minimum timeout allowed would make the delay more palatable. I'm still not sure it's the right solution, and that we're not just going with what's fast and cheap as opposed to what is good (or right). But given the details of where to hold the request, I have less of a concern with the burden on the broker. -Todd On Mon, Mar 9, 2015 at 5:01 PM, Jay Kreps jay.kr...@gmail.com wrote: Hey Todd, Nice points, let me try to respond: Plugins Yeah let me explain what I mean about plugins. The contracts that matter for us are public contracts, i.e. the protocol, the binary format, stuff in zk, and the various plug-in apis we expose. Adding an internal interface later will not be hard--the quota check is going to be done in 2-6 places in the code which would need to be updated, all internal to the broker. The challenge with making things pluggable up front is that the policy is usually fairly trivial to plug in but each policy
[jira] [Updated] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker
[ https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-2009: - Fix Version/s: 0.8.3 Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker Key: KAFKA-2009 URL: https://issues.apache.org/jira/browse/KAFKA-2009 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Fix For: 0.8.3 Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch This ticket is to fix the mirror maker problem on current trunk which was introduced in KAFKA-1650. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1914) Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics
[ https://issues.apache.org/jira/browse/KAFKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1914: - Fix Version/s: 0.8.3 Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics - Key: KAFKA-1914 URL: https://issues.apache.org/jira/browse/KAFKA-1914 Project: Kafka Issue Type: Sub-task Components: core Reporter: Aditya A Auradkar Assignee: Aditya Auradkar Fix For: 0.8.3 Attachments: KAFKA-1914.patch, KAFKA-1914.patch, KAFKA-1914_2015-02-17_15:46:27.patch Currently the BrokerTopicMetrics only counts the failedProduceRequestRate and the failedFetchRequestRate. We should add 2 metrics to count the overall produce/fetch request rates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1865) Add a flush() call to the new producer API
[ https://issues.apache.org/jira/browse/KAFKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1865: - Fix Version/s: 0.8.3 Add a flush() call to the new producer API -- Key: KAFKA-1865 URL: https://issues.apache.org/jira/browse/KAFKA-1865 Project: Kafka Issue Type: Bug Reporter: Jay Kreps Assignee: Jay Kreps Fix For: 0.8.3 Attachments: KAFKA-1865.patch, KAFKA-1865_2015-02-21_15:36:54.patch, KAFKA-1865_2015-02-22_16:26:46.patch, KAFKA-1865_2015-02-23_18:29:16.patch, KAFKA-1865_2015-02-25_17:15:26.patch, KAFKA-1865_2015-02-26_10:37:16.patch The postconditions of this would be that any record enqueued prior to flush() would have completed being sent (either successfully or not). An open question is whether you can continue sending new records while this call is executing (on other threads). We should only do this if it doesn't add inefficiencies for people who don't use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
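The flush() postcondition described in the ticket can be modeled with a toy accumulator (hypothetical names, not the producer's internals): flush() snapshots the sends that were in flight at call time and waits for exactly those to complete, successfully or not, so records enqueued concurrently on other threads are not blocked on. This also illustrates why the call need not add cost for callers who never flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model of the flush() contract: block until every send enqueued
// *before* the call has completed (either successfully or not), while
// other threads may keep enqueuing new sends concurrently.
class ToyAccumulator {
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    // Called by the sending side; the returned future completes when
    // the send finishes (ack or failure).
    synchronized CompletableFuture<Void> enqueue() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        inFlight.add(f);
        return f;
    }

    void flush() {
        List<CompletableFuture<Void>> snapshot;
        synchronized (this) {
            // Only wait for what was enqueued before flush() was called;
            // later records are outside the postcondition.
            snapshot = new ArrayList<>(inFlight);
        }
        for (CompletableFuture<Void> f : snapshot)
            f.handle((v, e) -> null).join(); // wait regardless of outcome
    }
}
```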
[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1852: - Fix Version/s: 0.8.3 OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Fix For: 0.8.3 Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch, KAFKA-1852_2015-02-27_13:50:34.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1910: - Fix Version/s: 0.8.3 Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.3 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file; it would be better to re-factor it into multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1831) Producer does not provide any information about which host the data was sent to
[ https://issues.apache.org/jira/browse/KAFKA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1831: - Fix Version/s: 0.8.2.0 Producer does not provide any information about which host the data was sent to --- Key: KAFKA-1831 URL: https://issues.apache.org/jira/browse/KAFKA-1831 Project: Kafka Issue Type: Improvement Components: producer Affects Versions: 0.8.1.1 Reporter: Mark Payne Assignee: Jun Rao Fix For: 0.8.2.0 For traceability purposes and for troubleshooting, when sending data to Kafka, the Producer should provide information about which host the data was sent to. This works well already in the SimpleConsumer, which provides host() and port() methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 31706: Patch for KAFKA-1997
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/#review76189 --- Hit this unit test failure, is this relevant? -- kafka.consumer.ZookeeperConsumerConnectorTest testConsumerRebalanceListener FAILED junit.framework.AssertionFailedError: expected:List((0,group1_consumer1-0), (1,group1_consumer2-0)) but was:ArrayBuffer((1,group1_consumer2-0)) at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:277) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:71) at kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393) - Guozhang Wang On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ --- (Updated March 11, 2015, 10:20 p.m.) Review request for kafka. Bugs: KAFKA-1997 https://issues.apache.org/jira/browse/KAFKA-1997 Repository: kafka Description --- Addressed Guozhang's comments. Changed the exit behavior on send failure because close(0) is not ready yet. Will submit followup patch after KAFKA-1660 is checked in. Expanded imports from _ and * to full class path Incorporated Joel's comments. Addressed Joel's comments. Addressed Guozhang's comments. Incorporated Guozhang's comments. 
Diffs
-----

  clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e
  core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0

Diff: https://reviews.apache.org/r/31706/diff/

Testing
-------

Thanks,

Jiangjie Qin
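[Editor's note] The rebalance-listener failure quoted above looks like a timing race: the assertion ran before the second consumer's partition assignment had settled, which matches the later revision of this patch noting a fix for a transient bug in that test. The usual remedy in Kafka's test suite is to poll the condition with a deadline instead of asserting immediately (the style of TestUtils.waitUntilTrue). A minimal stdlib-only sketch of that pattern, with hypothetical names:

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    /**
     * Poll `condition` until it returns true or `timeoutMs` elapses.
     * Returns true iff the condition became true within the deadline.
     */
    static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs, long pollIntervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollIntervalMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true ~50 ms in, simulating a rebalance completing asynchronously.
        boolean ok = waitUntilTrue(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println(ok); // prints true
    }
}
```

In the test above, the polled condition would be "the expected owner list is visible", with the equality assertion made only after the wait succeeds, so a slow rebalance no longer fails the test.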
Re: Review Request 31967: Patch for KAFKA-1546
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31967/ ---

(Updated March 12, 2015, 1:48 a.m.)

Review request for kafka.

Bugs: KAFKA-1546
    https://issues.apache.org/jira/browse/KAFKA-1546

Repository: kafka

Description (updated)
---------------------

PATCH for KAFKA-1546

Diffs (updated)
---------------

  core/src/main/scala/kafka/cluster/Partition.scala c4bf48a801007ebe7497077d2018d6dffe1677d4
  core/src/main/scala/kafka/cluster/Replica.scala bd13c20338ce3d73113224440e858a12814e5adb
  core/src/main/scala/kafka/server/KafkaConfig.scala 48e33626695ad8a28b0018362ac225f11df94973
  core/src/main/scala/kafka/server/ReplicaManager.scala c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 92152358c95fa9178d71bd1c079af0a0bd8f1da8
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala c124c8df5b5079e5ffbd0c4ea359562a66aaf317
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 92d6b2c672f74cdd526f2e98da8f7fb3696a88e3
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala efb457334bd87e4c5b0dd2c66ae1995993cd0bc1

Diff: https://reviews.apache.org/r/31967/diff/

Testing
-------

Thanks,

Aditya Auradkar
[jira] [Commented] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357956#comment-14357956 ]

Aditya A Auradkar commented on KAFKA-1546:
------------------------------------------

Updated reviewboard https://reviews.apache.org/r/31967/diff/ against branch origin/trunk

Automate replica lag tuning
---------------------------

                Key: KAFKA-1546
                URL: https://issues.apache.org/jira/browse/KAFKA-1546
            Project: Kafka
         Issue Type: Improvement
         Components: replication
   Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
           Reporter: Neha Narkhede
           Assignee: Aditya Auradkar
             Labels: newbie++
        Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch

Currently, there is no good way to tune the replica lag configs to automatically account for high- and low-volume topics on the same cluster. For a low-volume topic it will take a very long time to detect a lagging replica, while for a high-volume topic it will produce false positives. One approach to making this easier would be to have a configuration such as replica.lag.max.ms and translate it into a number of messages dynamically based on the throughput of the partition.
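[Editor's note] The core idea in KAFKA-1546 is to make lag detection time-based: a follower is out of sync only if it has gone longer than the configured interval without catching up to the leader's log end offset, which adapts naturally to both high- and low-volume partitions. A stdlib-only sketch of that rule follows; the class and method names (ReplicaLagTracker, lastCaughtUpTimeMs, and the replica.lag.max.ms config name itself, which was still under discussion) are assumptions for illustration, not the final Kafka implementation.

```java
public class ReplicaLagTracker {
    private final long replicaLagMaxMs;   // time-based threshold, e.g. replica.lag.max.ms
    private long lastCaughtUpTimeMs;      // last time this follower read up to the leader's LEO

    ReplicaLagTracker(long replicaLagMaxMs, long nowMs) {
        this.replicaLagMaxMs = replicaLagMaxMs;
        this.lastCaughtUpTimeMs = nowMs;
    }

    /** Record a fetch: the follower is "caught up" when it reads up to the leader's log end offset. */
    void onFetch(long fetchOffset, long leaderLogEndOffset, long nowMs) {
        if (fetchOffset >= leaderLogEndOffset) {
            lastCaughtUpTimeMs = nowMs;
        }
    }

    /** Out of sync iff the replica has not been caught up for longer than the threshold. */
    boolean isOutOfSync(long nowMs) {
        return nowMs - lastCaughtUpTimeMs > replicaLagMaxMs;
    }

    public static void main(String[] args) {
        ReplicaLagTracker tracker = new ReplicaLagTracker(10_000, 0);
        tracker.onFetch(100, 100, 1_000);               // caught up at t=1s
        System.out.println(tracker.isOutOfSync(5_000));  // prints false: within the 10s window
        tracker.onFetch(100, 500, 8_000);               // still behind; does not reset the clock
        System.out.println(tracker.isOutOfSync(12_000)); // prints true: 11s since last caught up
    }
}
```

Unlike a fixed message-count threshold, this check never flags a slow but idle partition (the follower stays caught up), and a burst on a hot partition only triggers it if the follower actually fails to keep up for the full interval.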
[jira] [Updated] (KAFKA-1546) Automate replica lag tuning
[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya A Auradkar updated KAFKA-1546:
-------------------------------------
    Attachment: KAFKA-1546_2015-03-11_18:48:09.patch

Automate replica lag tuning
---------------------------

                Key: KAFKA-1546
                URL: https://issues.apache.org/jira/browse/KAFKA-1546
            Project: Kafka
         Issue Type: Improvement
         Components: replication
   Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
           Reporter: Neha Narkhede
           Assignee: Aditya Auradkar
             Labels: newbie++
        Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch

Currently, there is no good way to tune the replica lag configs to automatically account for high- and low-volume topics on the same cluster. For a low-volume topic it will take a very long time to detect a lagging replica, while for a high-volume topic it will produce false positives. One approach to making this easier would be to have a configuration such as replica.lag.max.ms and translate it into a number of messages dynamically based on the throughput of the partition.
Re: Review Request 31706: Patch for KAFKA-1997
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ ---

(Updated March 12, 2015, 2:10 a.m.)

Review request for kafka.

Bugs: KAFKA-1997
    https://issues.apache.org/jira/browse/KAFKA-1997

Repository: kafka

Description (updated)
---------------------

Addressed Guozhang's comments.
Changed the exit behavior on send failure because close(0) is not ready yet. Will submit a follow-up patch after KAFKA-1660 is checked in.
Expanded imports from _ and * to full class paths.
Incorporated Joel's comments.
Addressed Joel's comments.
Addressed Guozhang's comments.
Incorporated Guozhang's comments.
Fix a transient bug in ZookeeperConsumerConnectorTest.

Diffs (updated)
---------------

  clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e
  core/src/main/scala/kafka/tools/MirrorMaker.scala bafa379ff57bc46458ea8409406f5046dc9c973e
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 19640cc55b5baa0a26a808d708b7f4caf491c9f0

Diff: https://reviews.apache.org/r/31706/diff/

Testing
-------

Thanks,

Jiangjie Qin