[jira] [Created] (KAFKA-1623) kafka is sometimes slow to accept connections
Shlomi Hazan created KAFKA-1623: --- Summary: kafka is sometimes slow to accept connections Key: KAFKA-1623 URL: https://issues.apache.org/jira/browse/KAFKA-1623 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Shlomi Hazan From SocketServer.scala:144, the Acceptor can wait up to 500 ms before processing the accumulated FDs. Also, the backlog of the acceptor socket does not appear to be set, which may be problematic if the full 500 ms elapse before the thread wakes. Setting the backlog is doable using the appropriate ServerSocket constructor, and it might best be provisioned via configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
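For reference, the change the description proposes would look roughly like the following (a minimal sketch, assuming a hypothetical "socket.request.backlog" broker property; no such config exists today):

import java.net.InetSocketAddress
import java.nio.channels.ServerSocketChannel

// Sketch only: bind the acceptor socket with an explicit backlog instead of
// the JDK default of 50. The property name below is an assumption.
val backlog = 100 // e.g. read from a "socket.request.backlog" broker property
val serverChannel = ServerSocketChannel.open()
serverChannel.configureBlocking(false)
// ServerSocket.bind(SocketAddress, int) takes the backlog as its second argument
serverChannel.socket().bind(new InetSocketAddress(9092), backlog)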
Re: [DISCUSS] 0.8.2 release branch, unofficial release candidate(s), 0.8.1.2 release
Just made a pass over the unresolved tickets tagged for 0.8.2; I think many of them can be pushed to 0.8.3 / 0.9.

On Wed, Sep 3, 2014 at 8:05 PM, Jonathan Weeks jonathanbwe...@gmail.com wrote:

+1 on a 0.8.1.2 release as described. I manually applied patches to cobble together a working gradle build of Kafka for Scala 2.11, but would really appreciate an official release, i.e. 0.8.1.2, as we also have other dependent libraries we use (e.g. akka-kafka) that would be much easier to migrate and support if the build were public and official. There were at least several others on the "users" list who expressed interest in Scala 2.11 support; who knows how many more "lurkers" are out there. Best Regards, -Jonathan

Hey, I wanted to take a quick pulse to see if we are getting closer to a branch for 0.8.2.

1) There still seem to be a lot of open issues (https://issues.apache.org/jira/browse/KAFKA/fixforversion/12326167/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel), and our 30-day summary is showing 51 issues created and *34* resolved. I am not sure how much of that we could really just decide to push off to 0.8.3 or 0.9.0 versus working on 0.8.2 as a stable release. There is already so much goodness on trunk. I appreciate the double-commit pain, especially as trunk and branch drift (ugh).

2) Also, I wanted to float the idea of doing some unofficial release candidates after making the 0.8.2 branch, for folks to test prior to the release vote. What I was thinking is that I would build, upload, and stage as if I were preparing artifacts for a vote, but let the community know to go in and have at it well prior to the vote. We don't get a lot of community votes during a release, but we do get issues reported after (which is natural because of how things are done). I have seen four Apache projects do this very successfully: not only have they had fewer iterations of RC votes (sensitive to that myself), but the community kicked back the issues they saw, because the pre-release window gave them time to run the bits through their own test and staging environments as the release came about.

3) Checking again on whether we should have a 0.8.1.2 release, if folks in the community find important features they don't want to, or can't, wait for, and which wouldn't be too painful or dangerous to back-port (this might be best asked on the user list, not sure). Two things that spring to the top of my head are Scala 2.11 support and fixing the source jars. Both of these are easy to patch; personally I don't mind, but I want to gauge more from the community on this too. I have heard gripes ad hoc from folks in direct communication, but no real complaints in the public forum, and wanted to open the floor if folks have a need.

4) 0.9 work, I feel, is being held up some (or at least the resourcing of it, from my perspective). We decided to hold off on including SSL (even though we have a path for it). Jay did a nice update recently to the Security wiki, which I think we should move forward with. I have some more to add/change/update and want to start getting down to more details and getting specific people working on specific tasks, but without knowing what we are doing when, it is hard to manage.

5) I just updated https://issues.apache.org/jira/browse/KAFKA-1555. I think it is a really important feature update; it doesn't have to be in 0.8.2, but we need consensus (no pun intended). It fundamentally allows for a minimum-two-rack requirement for data, which a lot of data requires for a successful save to occur.
/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
***/

--
-- Guozhang
[jira] [Updated] (KAFKA-1623) kafka is sometimes slow to accept connections
[ https://issues.apache.org/jira/browse/KAFKA-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1623: - Affects Version/s: (was: 0.8.1.1) 0.9.0 kafka is sometimes slow to accept connections - Key: KAFKA-1623 URL: https://issues.apache.org/jira/browse/KAFKA-1623 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.9.0 Reporter: Shlomi Hazan Labels: performance From SocketServer.scala:144, the Acceptor can wait up to 500 ms before processing the accumulated FDs. Also, the backlog of the acceptor socket does not appear to be set, which may be problematic if the full 500 ms elapse before the thread wakes. Setting the backlog is doable using the appropriate ServerSocket constructor, and it might best be provisioned via configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1590) Binarize trace level request logging along with debug level text logging
[ https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121497#comment-14121497 ] Abhishek Sharma commented on KAFKA-1590: Please assign it to me. I want to give this a try. Binarize trace level request logging along with debug level text logging Key: KAFKA-1590 URL: https://issues.apache.org/jira/browse/KAFKA-1590 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 With trace-level logging, the request handling logs can grow very fast depending on the client behavior (e.g. a consumer with 0 maxWait, which hence keeps sending fetch requests). We previously changed it to debug level, which only provides a summary of the requests, omitting request details. However, this does not work perfectly, since summaries are not sufficient for troubleshooting, and turning on trace level only once an issue appears is too late. The proposed solution is to default to debug-level logging while printing trace-level logging in binary format at the same time. The generated binary files can then be further compressed / rolled. When needed, we can then decompress / parse the trace logs into text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
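A minimal sketch of the dual-logging idea (class and method names here are illustrative, not the actual implementation): keep the one-line summary at debug level, and append the raw request bytes, length-prefixed, to a binary file that can be parsed back into text later.

import java.io.{BufferedOutputStream, DataOutputStream, FileOutputStream}

class BinaryTraceLog(path: String) {
  private val out = new DataOutputStream(
    new BufferedOutputStream(new FileOutputStream(path, true)))

  // Length-prefix each record so the file can be re-read and rendered as text.
  def append(requestBytes: Array[Byte]): Unit = synchronized {
    out.writeInt(requestBytes.length)
    out.write(requestBytes)
  }

  def close(): Unit = synchronized { out.flush(); out.close() }
}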
[jira] Subscription: outstanding kafka patches
Issue Subscription
Filter: outstanding kafka patches (127 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-1620 Make kafka api protocol implementation public https://issues.apache.org/jira/browse/KAFKA-1620
KAFKA-1619 perf dir can be removed https://issues.apache.org/jira/browse/KAFKA-1619
KAFKA-1616 Purgatory Size and Num.Delayed.Request metrics are incorrect https://issues.apache.org/jira/browse/KAFKA-1616
KAFKA-1614 Partition log directory name and segments information exposed via JMX https://issues.apache.org/jira/browse/KAFKA-1614
KAFKA-1611 Improve system test configuration https://issues.apache.org/jira/browse/KAFKA-1611
KAFKA-1610 Local modifications to collections generated from mapValues will be lost https://issues.apache.org/jira/browse/KAFKA-1610
KAFKA-1604 System Test for Transaction Management https://issues.apache.org/jira/browse/KAFKA-1604
KAFKA-1601 ConsoleConsumer/SimpleConsumerPerformance should be transaction-aware https://issues.apache.org/jira/browse/KAFKA-1601
KAFKA-1600 Controller failover not working correctly. https://issues.apache.org/jira/browse/KAFKA-1600
KAFKA-1597 New metrics: ResponseQueueSize and BeingSentResponses https://issues.apache.org/jira/browse/KAFKA-1597
KAFKA-1596 Exception in KafkaScheduler while shutting down https://issues.apache.org/jira/browse/KAFKA-1596
KAFKA-1591 Clean-up Unnecessary stack trace in error/warn logs https://issues.apache.org/jira/browse/KAFKA-1591
KAFKA-1586 support sticky partitioning in the new producer https://issues.apache.org/jira/browse/KAFKA-1586
KAFKA-1585 Client: Infinite conflict in /consumers/ https://issues.apache.org/jira/browse/KAFKA-1585
KAFKA-1583 Kafka API Refactoring https://issues.apache.org/jira/browse/KAFKA-1583
KAFKA-1569 Tool for performance and correctness of transactions end-to-end https://issues.apache.org/jira/browse/KAFKA-1569
KAFKA-1561 Data Loss for Incremented Replica Factor and Leader Election https://issues.apache.org/jira/browse/KAFKA-1561
KAFKA-1543 Changing replication factor https://issues.apache.org/jira/browse/KAFKA-1543
KAFKA-1541 Add transactional request definitions to clients package https://issues.apache.org/jira/browse/KAFKA-1541
KAFKA-1528 Normalize all the line endings https://issues.apache.org/jira/browse/KAFKA-1528
KAFKA-1527 SimpleConsumer should be transaction-aware https://issues.apache.org/jira/browse/KAFKA-1527
KAFKA-1526 Producer performance tool should have an option to enable transactions https://issues.apache.org/jira/browse/KAFKA-1526
KAFKA-1525 DumpLogSegments should print transaction IDs https://issues.apache.org/jira/browse/KAFKA-1525
KAFKA-1524 Implement transactional producer https://issues.apache.org/jira/browse/KAFKA-1524
KAFKA-1523 Implement transaction manager module https://issues.apache.org/jira/browse/KAFKA-1523
KAFKA-1517 Messages is a required argument to Producer Performance Test https://issues.apache.org/jira/browse/KAFKA-1517
KAFKA-1510 Force offset commits when migrating consumer offsets from zookeeper to kafka https://issues.apache.org/jira/browse/KAFKA-1510
KAFKA-1509 Restart of destination broker after unreplicated partition move leaves partitions without leader https://issues.apache.org/jira/browse/KAFKA-1509
KAFKA-1507 Using GetOffsetShell against non-existent topic creates the topic unintentionally https://issues.apache.org/jira/browse/KAFKA-1507
KAFKA-1500 adding new consumer requests using the new protocol https://issues.apache.org/jira/browse/KAFKA-1500
KAFKA-1499 Broker-side compression configuration https://issues.apache.org/jira/browse/KAFKA-1499
KAFKA-1496 Using batch message in sync producer only sends the first message if we use a Scala Stream as the argument https://issues.apache.org/jira/browse/KAFKA-1496
KAFKA-1481 Stop using dashes AND underscores as separators in MBean names https://issues.apache.org/jira/browse/KAFKA-1481
KAFKA-1477 add authentication layer and initial JKS x509 implementation for brokers, producers and consumer for network communication https://issues.apache.org/jira/browse/KAFKA-1477
KAFKA-1476 Get a list of consumer groups https://issues.apache.org/jira/browse/KAFKA-1476
KAFKA-1475 Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know https://issues.apache.org/jira/browse/KAFKA-1475
KAFKA-1471 Add Producer Unit Tests for LZ4 and LZ4HC compression https://issues.apache.org/jira/browse/KAFKA-1471
KAFKA-1468 Improve perf tests
[jira] [Commented] (KAFKA-328) Write unit test for kafka server startup and shutdown API
[ https://issues.apache.org/jira/browse/KAFKA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121932#comment-14121932 ] BalajiSeshadri commented on KAFKA-328: -- [~nehanarkhede] can you please advise on my previous comment? Write unit test for kafka server startup and shutdown API -- Key: KAFKA-328 URL: https://issues.apache.org/jira/browse/KAFKA-328 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: BalajiSeshadri Labels: newbie Background discussion in KAFKA-320. People often try to embed KafkaServer in an application that ends up calling startup() and shutdown() repeatedly and sometimes in odd ways. To ensure this works correctly we have to be very careful about cleaning up resources. This is a good practice for making unit tests reliable anyway. A good first step would be to add some unit tests on startup and shutdown to cover various cases:
1. A Kafka server can start up if it is not already starting up, if it is not currently being shut down, or if it hasn't already been started.
2. A Kafka server can shut down if it is not already shutting down, if it is not currently starting up, or if it hasn't already been shut down.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
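A rough shape for one of these tests (a hedged sketch; the TestUtils helpers and exact signatures are approximated from the test suite of that era, and a real test would run inside a ZooKeeper test harness):

import kafka.server.{KafkaConfig, KafkaServer}
import kafka.utils.TestUtils
import org.scalatest.junit.JUnit3Suite

class ServerStartupShutdownTest extends JUnit3Suite {
  def testShutdownIsIdempotent() {
    val props = TestUtils.createBrokerConfig(0, TestUtils.choosePort())
    val server = new KafkaServer(new KafkaConfig(props))
    server.startup()
    server.shutdown()
    server.awaitShutdown()
    server.shutdown() // a second shutdown must be a safe no-op
  }
}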
[jira] [Assigned] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri reassigned KAFKA-1618: - Assignee: BalajiSeshadri (was: Gwen Shapira) Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: BalajiSeshadri When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
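One possible shape for the fix (illustrative only, not the committed patch): tolerate a missing port by falling back to a default instead of indexing past the end of the split result; the 9092 default below is an assumption.

// "localhost" becomes ("localhost", 9092) instead of throwing
// ArrayIndexOutOfBoundsException when the ":port" part is absent.
def parseBrokerList(brokerList: String, defaultPort: Int = 9092): Seq[(String, Int)] =
  brokerList.split(",").map(_.trim).map { entry =>
    entry.split(":") match {
      case Array(host)       => (host, defaultPort)
      case Array(host, port) => (host, port.toInt)
      case _                 => throw new IllegalArgumentException("Unparseable broker entry: " + entry)
    }
  }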
[jira] [Commented] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect
[ https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121935#comment-14121935 ] Guozhang Wang commented on KAFKA-1616: -- Updated reviewboard https://reviews.apache.org/r/25155/diff/ against branch origin/trunk Purgatory Size and Num.Delayed.Request metrics are incorrect Key: KAFKA-1616 URL: https://issues.apache.org/jira/browse/KAFKA-1616 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch, KAFKA-1616_2014-09-04_13:26:02.patch The request purgatory used two atomic integers, "watched" and "unsatisfied", to record the purgatory size ( = watched + unsatisfied) and the number of delayed requests ( = unsatisfied). But due to some race conditions these two atomic integers are not updated correctly, resulting in incorrect metrics. Proposed solution: to get cleaner semantics, we can define the purgatory size to be just the number of elements in the watched lists, and the number of delayed requests to be just the length of the expiry queue. Then, instead of using two atomic integers, we just compute the size of the lists / queue on the fly each time the metrics are pulled. This may cost some more CPU cycles for these two metrics, but that should be minor, and correctness is guaranteed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
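In code, the proposed semantics reduce to something like this sketch (the field names are assumptions, not the actual RequestPurgatory internals):

import java.util.concurrent.{Delayed, DelayQueue}

class PurgatoryMetricsSketch[T <: Delayed](
    watchersForKey: collection.Map[Any, Seq[T]],
    expiryQueue: DelayQueue[T]) {

  // Purgatory size: count the elements in the watched lists at read time,
  // instead of maintaining a racy "watched + unsatisfied" counter pair.
  def purgatorySize: Int = watchersForKey.values.map(_.size).sum

  // Number of delayed requests: simply the expiry queue length.
  def numDelayedRequests: Int = expiryQueue.size
}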
Re: Review Request 25155: Fix KAFKA-1616
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25155/ --- (Updated Sept. 4, 2014, 8:26 p.m.) Review request for kafka. Bugs: KAFKA-1616 https://issues.apache.org/jira/browse/KAFKA-1616 Repository: kafka Description (updated) --- Purgatory size to be the sum of watched list sizes; delayed request to be the expiry queue length; remove atomic integers for metrics; add a unit test for watched list sizes and enqueued requests Diffs (updated) - core/src/main/scala/kafka/server/RequestPurgatory.scala ce06d2c381348deef8559374869fcaed923da1d1 core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 168712de241125982d556c188c76514fceb93779 Diff: https://reviews.apache.org/r/25155/diff/ Testing --- Thanks, Guozhang Wang
[jira] [Updated] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect
[ https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1616: - Attachment: KAFKA-1616_2014-09-04_13:26:02.patch Purgatory Size and Num.Delayed.Request metrics are incorrect Key: KAFKA-1616 URL: https://issues.apache.org/jira/browse/KAFKA-1616 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch, KAFKA-1616_2014-09-04_13:26:02.patch The request purgatory used two atomic integers, "watched" and "unsatisfied", to record the purgatory size ( = watched + unsatisfied) and the number of delayed requests ( = unsatisfied). But due to some race conditions these two atomic integers are not updated correctly, resulting in incorrect metrics. Proposed solution: to get cleaner semantics, we can define the purgatory size to be just the number of elements in the watched lists, and the number of delayed requests to be just the length of the expiry queue. Then, instead of using two atomic integers, we just compute the size of the lists / queue on the fly each time the metrics are pulled. This may cost some more CPU cycles for these two metrics, but that should be minor, and correctness is guaranteed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1533) transient unit test failure in ProducerFailureHandlingTest
[ https://issues.apache.org/jira/browse/KAFKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122008#comment-14122008 ] Guozhang Wang commented on KAFKA-1533: -- Hi [~junrao], do you have time to fix this minor thing before the 0.8.2 release? transient unit test failure in ProducerFailureHandlingTest -- Key: KAFKA-1533 URL: https://issues.apache.org/jira/browse/KAFKA-1533 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Guozhang Wang Fix For: 0.8.2 Attachments: KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533_2014-07-21_15:45:58.patch, kafka.threads, stack.out Occasionally, saw the test hang on tear down. The following is the stack trace.
"Test worker" prio=5 tid=7f9246956000 nid=0x10e078000 in Object.wait() [10e075000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7f4e69578> (a org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1344)
- locked <7f4e69578> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:732)
at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
at org.I0Itec.zkclient.ZkClient$8.call(ZkClient.java:720)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.delete(ZkClient.java:716)
at kafka.utils.ZkUtils$.deletePath(ZkUtils.scala:416)
at kafka.utils.ZkUtils$.deregisterBrokerInZk(ZkUtils.scala:184)
at kafka.server.KafkaHealthcheck.shutdown(KafkaHealthcheck.scala:50)
at kafka.server.KafkaServer$$anonfun$shutdown$2.apply$mcV$sp(KafkaServer.scala:243)
at kafka.utils.Utils$.swallow(Utils.scala:172)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:45)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:243)
at kafka.api.ProducerFailureHandlingTest.tearDown(ProducerFailureHandlingTest.scala:90)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1352) Reduce logging on the server
[ https://issues.apache.org/jira/browse/KAFKA-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122009#comment-14122009 ] Guozhang Wang commented on KAFKA-1352: -- We will be fixing this issue all together with https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Error+Handling+and+Logging in 0.9. Reduce logging on the server Key: KAFKA-1352 URL: https://issues.apache.org/jira/browse/KAFKA-1352 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.8.1 Reporter: Neha Narkhede Assignee: Ivan Lyutov Labels: newbie, usability Fix For: 0.8.2 Attachments: KAFKA-1352.patch, KAFKA-1352.patch, KAFKA-1352_2014-04-04_21:20:31.patch We have excessive logging in the server, making the logs unreadable and also affecting the performance of the server in practice. We need to clean the logs to address these issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1352) Reduce logging on the server
[ https://issues.apache.org/jira/browse/KAFKA-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1352: - Fix Version/s: (was: 0.8.2) 0.9.0 Reduce logging on the server Key: KAFKA-1352 URL: https://issues.apache.org/jira/browse/KAFKA-1352 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.8.1 Reporter: Neha Narkhede Assignee: Ivan Lyutov Labels: newbie, usability Fix For: 0.9.0 Attachments: KAFKA-1352.patch, KAFKA-1352.patch, KAFKA-1352_2014-04-04_21:20:31.patch We have excessive logging in the server, making the logs unreadable and also affecting the performance of the server in practice. We need to clean the logs to address these issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1377) transient unit test failure in LogOffsetTest
[ https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122012#comment-14122012 ] Guozhang Wang commented on KAFKA-1377: -- Pushing to 0.9 for now. [~omkreddy], is it still a consistently reproducible issue on your side? transient unit test failure in LogOffsetTest Key: KAFKA-1377 URL: https://issues.apache.org/jira/browse/KAFKA-1377 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Jun Rao Fix For: 0.8.2 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, KAFKA-1377_2014-04-11_18:14:45.patch Saw the following transient unit test failure.
kafka.server.LogOffsetTest testGetOffsetsBeforeEarliestTime FAILED
junit.framework.AssertionFailedError: expected:<List(0)> but was:<Vector()>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:71)
at kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
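Flaky assertions of this shape are usually fixed by polling for the expected condition instead of asserting immediately; a generic helper along these lines (not the actual KAFKA-1377 patch):

def waitUntilTrue(condition: () => Boolean, timeoutMs: Long = 5000L): Boolean = {
  val deadline = System.currentTimeMillis + timeoutMs
  while (System.currentTimeMillis < deadline) {
    if (condition()) return true
    Thread.sleep(100) // back off briefly between polls
  }
  condition() // one final check at the deadline
}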
[jira] [Updated] (KAFKA-1377) transient unit test failure in LogOffsetTest
[ https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1377: - Fix Version/s: (was: 0.8.2) 0.9.0 transient unit test failure in LogOffsetTest Key: KAFKA-1377 URL: https://issues.apache.org/jira/browse/KAFKA-1377 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Jun Rao Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, KAFKA-1377_2014-04-11_18:14:45.patch Saw the following transient unit test failure.
kafka.server.LogOffsetTest testGetOffsetsBeforeEarliestTime FAILED
junit.framework.AssertionFailedError: expected:<List(0)> but was:<Vector()>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:71)
at kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1377) transient unit test failure in LogOffsetTest
[ https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1377: - Labels: newbie (was: ) transient unit test failure in LogOffsetTest Key: KAFKA-1377 URL: https://issues.apache.org/jira/browse/KAFKA-1377 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Jun Rao Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, KAFKA-1377_2014-04-11_18:14:45.patch Saw the following transient unit test failure.
kafka.server.LogOffsetTest testGetOffsetsBeforeEarliestTime FAILED
junit.framework.AssertionFailedError: expected:<List(0)> but was:<Vector()>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:71)
at kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Comment: was deleted (was: Please review patch that defaults to 9091 when no port is specified.) Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Status: Open (was: Patch Available) Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Fix Version/s: 0.8.2 Reviewer: Joe Stein Labels: newbie (was: ) Affects Version/s: 0.8.1.1 Status: Patch Available (was: Open) Please review patch that defaults to 9091 when no port is specified. Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Reviewer: Joe Stein (was: Joe Stein) Status: Patch Available (was: Open) Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1618.patch When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Attachment: KAFKA-1618.patch Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1618.patch When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122018#comment-14122018 ] Guozhang Wang commented on KAFKA-686: - This is a nice-to-have for 0.8.2; if [~phargett] and [~viktortnk] do not have time now, we can push it to 0.9. 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper -- Key: KAFKA-686 URL: https://issues.apache.org/jira/browse/KAFKA-686 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jay Kreps Priority: Blocker Labels: newbie, patch Fix For: 0.8.2 Attachments: KAFAK-686-null-pointer-fix.patch, KAFKA-686-null-pointer-fix-2.patch People will not know that the zookeeper paths are not compatible. When you try to start the 0.8 broker pointed at a 0.7 zookeeper you get a NullPointerException. We should detect this and give a saner error. Error:
kafka.common.KafkaException: Can't parse json string: null
at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
at kafka.utils.Json$.parseFull(Json.scala:16)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
at kafka.controller.KafkaController.startup(KafkaController.scala:381)
at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
at kafka.Kafka$.main(Kafka.scala:46)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.lang.NullPointerException
at scala.util.parsing.combinator.lexical.Scanners$Scanner.<init>(Scanners.scala:52)
at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
... 16 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
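The fix could be as simple as a guard of this shape (a sketch under the assumption that an 0.7 layout shows up to an 0.8 broker as missing topic JSON; not the committed patch):

import kafka.common.KafkaException

def requireTopicAssignmentJson(jsonOpt: Option[String], zkPath: String): String =
  jsonOpt.filter(_.nonEmpty).getOrElse {
    // Fail with an actionable message instead of letting the JSON parser NPE.
    throw new KafkaException("No replica-assignment JSON found at " + zkPath +
      ". This usually means the broker is pointed at a ZooKeeper namespace written by " +
      "Kafka 0.7, whose paths are not compatible with 0.8.")
  }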
[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1618: -- Resolution: Fixed Status: Resolved (was: Patch Available) Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: BalajiSeshadri Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1618.patch When running console producer with just localhost as the broker list, I get ArrayIndexOutOfBounds exception. I expect either a clearer error about arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:59)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1190) create a draw performance graph script
[ https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122020#comment-14122020 ] Guozhang Wang commented on KAFKA-1190: -- Moving out of 0.8.2 (again...) create a draw performance graph script -- Key: KAFKA-1190 URL: https://issues.apache.org/jira/browse/KAFKA-1190 Project: Kafka Issue Type: Improvement Reporter: Joe Stein Fix For: 0.9.0 Attachments: KAFKA-1190.patch This will be an R script to draw relevant graphs given a bunch of csv files from the above tools. The output of this script will be a bunch of png files that can be combined with some html to act as a perf report. Here are the graphs that would be good to see:
* Latency histogram for producer
* MB/sec and messages/sec produced
* MB/sec and messages/sec consumed
* Flush time
* Errors (should not be any)
* Consumer cache hit ratio (both the bytes and count, specifically 1 - #physical_reads / #requests and 1 - physical_bytes_read / bytes_read)
* Write merge ratio (num_physical_writes/num_produce_requests and avg_request_size/avg_physical_write_size)
* CPU, network, io, etc
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1190) create a draw performance graph script
[ https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1190: - Fix Version/s: (was: 0.8.2) 0.9.0 create a draw performance graph script -- Key: KAFKA-1190 URL: https://issues.apache.org/jira/browse/KAFKA-1190 Project: Kafka Issue Type: Improvement Reporter: Joe Stein Fix For: 0.9.0 Attachments: KAFKA-1190.patch This will be an R script to draw relevant graphs given a bunch of csv files from the above tools. The output of this script will be a bunch of png files that can be combined with some html to act as a perf report. Here are the graphs that would be good to see:
* Latency histogram for producer
* MB/sec and messages/sec produced
* MB/sec and messages/sec consumed
* Flush time
* Errors (should not be any)
* Consumer cache hit ratio (both the bytes and count, specifically 1 - #physical_reads / #requests and 1 - physical_bytes_read / bytes_read)
* Write merge ratio (num_physical_writes/num_produce_requests and avg_request_size/avg_physical_write_size)
* CPU, network, io, etc
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122022#comment-14122022 ] Guozhang Wang commented on KAFKA-1420: -- Can some committer take another look and commit this one? It looks good to me now. Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Jun Rao Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1420.patch, KAFKA-1420_2014-07-30_11:18:26.patch, KAFKA-1420_2014-07-30_11:24:55.patch, KAFKA-1420_2014-08-02_11:04:15.patch, KAFKA-1420_2014-08-10_14:12:05.patch, KAFKA-1420_2014-08-10_23:03:46.patch This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
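For anyone picking this up, the substitution is roughly the following (signatures approximated from the test utilities of that era; illustrative before/after only):

import kafka.admin.AdminUtils
import kafka.server.KafkaServer
import kafka.utils.TestUtils
import org.I0Itec.zkclient.ZkClient

// Before: writes the assignment to ZK but does not wait for leader election,
// so tests can race against metadata propagation.
def before(zkClient: ZkClient, topic: String): Unit =
  AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkClient, topic, Map(0 -> Seq(0, 1)))

// After: TestUtils.createTopic also waits until leaders are elected on the
// given servers before returning.
def after(zkClient: ZkClient, topic: String, servers: Seq[KafkaServer]): Unit =
  TestUtils.createTopic(zkClient, topic, numPartitions = 1, replicationFactor = 2, servers = servers)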
[jira] [Updated] (KAFKA-1300) Added WaitForReplaction admin tool.
[ https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1300: - Fix Version/s: (was: 0.8.2) 0.9.0 Added WaitForReplaction admin tool. --- Key: KAFKA-1300 URL: https://issues.apache.org/jira/browse/KAFKA-1300 Project: Kafka Issue Type: New Feature Components: tools Affects Versions: 0.8.0 Environment: Ubuntu 12.04 Reporter: Brenden Matthews Labels: patch Fix For: 0.9.0 Attachments: 0001-Added-WaitForReplaction-admin-tool.patch I have created a tool similar to the broker shutdown tool for doing rolling restarts of Kafka clusters. The tool watches the max replica lag of the specified broker, and waits until the lag drops to 0 before exiting. To do a rolling restart, here's the process we use:
for (broker <- brokers) {
  run shutdown tool for broker
  terminate broker
  start new broker
  run wait for replication tool on new broker
}
Here's an example command line use:
./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1300) Added WaitForReplaction admin tool.
[ https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122023#comment-14122023 ] Guozhang Wang commented on KAFKA-1300: -- Moving out of 0.8.2 for now. Added WaitForReplaction admin tool. --- Key: KAFKA-1300 URL: https://issues.apache.org/jira/browse/KAFKA-1300 Project: Kafka Issue Type: New Feature Components: tools Affects Versions: 0.8.0 Environment: Ubuntu 12.04 Reporter: Brenden Matthews Labels: patch Fix For: 0.9.0 Attachments: 0001-Added-WaitForReplaction-admin-tool.patch I have created a tool similar to the broker shutdown tool for doing rolling restarts of Kafka clusters. The tool watches the max replica lag of the specified broker, and waits until the lag drops to 0 before exiting. To do a rolling restart, here's the process we use:
for (broker <- brokers) {
  run shutdown tool for broker
  terminate broker
  start new broker
  run wait for replication tool on new broker
}
Here's an example command line use:
./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option
[ https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122028#comment-14122028 ] Guozhang Wang commented on KAFKA-1215: -- Moving out of 0.8.2 for now. Rack-Aware replica assignment option Key: KAFKA-1215 URL: https://issues.apache.org/jira/browse/KAFKA-1215 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0 Reporter: Joris Van Remoortere Assignee: Jun Rao Fix For: 0.9.0 Attachments: rack_aware_replica_assignment_v1.patch, rack_aware_replica_assignment_v2.patch Adding a rack-id to kafka config. This rack-id can be used during replica assignment by using the max-rack-replication argument in the admin scripts (create topic, etc.). By default the original replication assignment algorithm is used because max-rack-replication defaults to -1. max-rack-replication > -1 is not honored if you are doing manual replica assignment (preferred). If this looks good I can add some test cases specific to the rack-aware assignment. I can also port this to trunk. We are currently running 0.8.0 in production and need this, so I wrote the patch against that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
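As a toy illustration of the idea only (not the patch's actual algorithm): pick each successive replica from a different rack before reusing one.

// Assumes at least one rack and enough brokers per rack; purely illustrative.
def assignAcrossRacks(brokersByRack: Map[String, Seq[Int]], replicas: Int): Seq[Int] = {
  val racks = brokersByRack.keys.toSeq.sorted
  (0 until replicas).map { i =>
    val brokersInRack = brokersByRack(racks(i % racks.size))
    brokersInRack((i / racks.size) % brokersInRack.size)
  }
}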
[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122025#comment-14122025 ] Guozhang Wang commented on KAFKA-1481: -- [~junrao] do you think we can check in this ticket in time for 0.8.2? Stop using dashes AND underscores as separators in MBean names -- Key: KAFKA-1481 URL: https://issues.apache.org/jira/browse/KAFKA-1481 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Otis Gospodnetic Labels: patch Fix For: 0.8.2 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch MBeans should not use dashes or underscores as separators because these characters are allowed in hostnames, topics, group and consumer IDs, etc., and these are embedded in MBeans names making it impossible to parse out individual bits from MBeans. Perhaps a pipe character should be used to avoid the conflict. This looks like a major blocker because it means nobody can write Kafka 0.8.x monitoring tools unless they are doing it for themselves AND do not use dashes AND do not use underscores. See: http://search-hadoop.com/m/4TaT4lonIW -- This message was sent by Atlassian JIRA (v6.3.4#6332)
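To make the problem concrete (the names below are made up): with dashes as separators there is no way to tell where a host name ends and a topic begins, whereas key=value ObjectName properties remain parseable.

import javax.management.ObjectName

// Ambiguous: is the host "my-host" with topic "1-mytopic", or "my-host-1"
// with topic "mytopic"? A monitoring tool cannot know.
val ambiguous = new ObjectName("kafka:type=ConsumerTopicMetrics,name=my-host-1-mytopic-BytesPerSec")

// Unambiguous: each component is its own property.
val parseable = new ObjectName(
  "kafka.consumer:type=ConsumerTopicMetrics,clientId=my-host-1,topic=mytopic,name=BytesPerSec")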
[jira] [Updated] (KAFKA-1215) Rack-Aware replica assignment option
[ https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1215: - Fix Version/s: (was: 0.8.2) 0.9.0 Rack-Aware replica assignment option Key: KAFKA-1215 URL: https://issues.apache.org/jira/browse/KAFKA-1215 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.0 Reporter: Joris Van Remoortere Assignee: Jun Rao Fix For: 0.9.0 Attachments: rack_aware_replica_assignment_v1.patch, rack_aware_replica_assignment_v2.patch Adding a rack-id to kafka config. This rack-id can be used during replica assignment by using the max-rack-replication argument in the admin scripts (create topic, etc.). By default the original replication assignment algorithm is used because max-rack-replication defaults to -1. max-rack-replication -1 is not honored if you are doing manual replica assignment (preffered). If this looks good I can add some test cases specific to the rack-aware assignment. I can also port this to trunk. We are currently running 0.8.0 in production and need this, so i wrote the patch against that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-404) When using chroot path, create chroot on startup if it doesn't exist
[ https://issues.apache.org/jira/browse/KAFKA-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122030#comment-14122030 ] Guozhang Wang commented on KAFKA-404: - [~marek.dolgos] do you have some time to finish this before the 0.8.2 release? When using chroot path, create chroot on startup if it doesn't exist Key: KAFKA-404 URL: https://issues.apache.org/jira/browse/KAFKA-404 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1 Environment: CentOS 5.5, Linux 2.6.18-194.32.1.el5 x86_64 GNU/Linux Reporter: Jonathan Creasy Labels: newbie, patch Fix For: 0.8.2 Attachments: KAFKA-404-0.7.1.patch, KAFKA-404-0.8.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-i.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v2.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled
[ https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122033#comment-14122033 ] Guozhang Wang commented on KAFKA-1305: -- Moving out of 0.8.2 for now. Controller can hang on controlled shutdown with auto leader balance enabled --- Key: KAFKA-1305 URL: https://issues.apache.org/jira/browse/KAFKA-1305 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Assignee: Neha Narkhede Priority: Blocker Fix For: 0.8.2 This is relatively easy to reproduce, especially when doing a rolling bounce. What happened here is as follows:
1. The previous controller was bounced and broker 265 became the new controller.
2. I went on to do a controlled shutdown of broker 265 (the new controller).
3. In the meantime, the automatically scheduled preferred replica leader election process started doing its thing and began sending LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers) (t@113 below).
4. While that was happening, the controlled shutdown process on 265 succeeded and proceeded to deregister itself from ZooKeeper and shut down the socket server.
5. (ReplicaStateMachine actually removes deregistered brokers from the controller channel manager's list of brokers to send requests to. However, that removal cannot take place (t@18 below) because the preferred replica leader election task owns the controller lock.)
6. So the request thread to broker 265 gets into infinite retries.
7. The entire broker shutdown process is blocked on controller shutdown for the same reason (it needs to acquire the controller lock).
Relevant portions from the thread dump:
"Controller-265-to-broker-265-send-thread" - Thread t@113
java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
at kafka.utils.Utils$.swallow(Utils.scala:167)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:46)
at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
- locked java.lang.Object@6dbf14a7
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
Locked ownable synchronizers:
- None
...
"Thread-4" - Thread t@17
java.lang.Thread.State: WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: kafka-scheduler-0
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at kafka.utils.Utils$.inLock(Utils.scala:536)
at kafka.controller.KafkaController.shutdown(KafkaController.scala:642)
at kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:242)
at kafka.utils.Utils$.swallow(Utils.scala:167)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:46)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:242)
at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:46)
at kafka.Kafka$$anon$1.run(Kafka.scala:42)
...
"kafka-scheduler-0" - Thread t@117
java.lang.Thread.State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1dc407fc
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
at kafka.controller.ControllerChannelManager.sendRequest(ControllerChannelManager.scala:57)
- locked java.lang.Object@578b748f
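For context, the inLock utility both threads funnel through has this shape (paraphrased from kafka.utils.Utils): because the scheduled election task holds the lock while its request send blocks, the shutdown path parks on the same lock indefinitely.

import java.util.concurrent.locks.Lock

// Acquire, run, release in finally; every controller operation, including
// shutdown, serializes on the controller's single ReentrantLock.
def inLock[T](lock: Lock)(fun: => T): T = {
  lock.lock()
  try {
    fun
  } finally {
    lock.unlock()
  }
}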
[jira] [Updated] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled
[ https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1305: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests
[ https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1342: - Fix Version/s: (was: 0.8.2) 0.9.0

Slow controlled shutdowns can result in stale shutdown requests
---
Key: KAFKA-1342
URL: https://issues.apache.org/jira/browse/KAFKA-1342
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Joel Koshy
Priority: Blocker
Fix For: 0.9.0

I don't think this is a bug introduced in 0.8.1, but it is triggered by the fact that controlled shutdown seems to have become slower in 0.8.1 (will file a separate ticket to investigate that). When doing a rolling bounce, it is possible for a bounced broker to stop all its replica fetchers because the previous PID's shutdown requests are still being processed.
- 515 is the controller.
- Controlled shutdown is initiated for 503.
- The controller starts the controlled shutdown for 503.
- The controlled shutdown takes a long time moving leaders and moving follower replicas on 503 to the offline state.
- So 503's read from the shutdown channel times out and a new channel is created. It issues another shutdown request. This request (since it is a new channel) is accepted at the controller's socket server but then waits on the broker shutdown lock held by the previous controlled shutdown, which is still in progress.
- The above step repeats for the remaining retries (six more requests).
- 503 hits a SocketTimeout exception on reading the response of the last shutdown request and proceeds to do an unclean shutdown.
- The controller's onBrokerFailure callback fires and moves 503's replicas to offline (not too important in this sequence).
- 503 is brought back up.
- The controller's onBrokerStartup callback fires and moves its replicas (and partitions) to the online state. 503 starts its replica fetchers.
- Unfortunately, the (phantom) shutdown requests are still being handled and the controller sends StopReplica requests to 503.
- The first shutdown request finally finishes (after 76 minutes in my case!).
- The remaining shutdown requests also execute and do the same thing (send StopReplica requests for all partitions to 503).
- The remaining requests complete quickly because they end up not having to touch ZooKeeper paths - there are no leaders left on the broker and no need to shrink the ISR in ZooKeeper since that has already been done by the first shutdown request.
- So in the end state 503 is up, but effectively idle due to the previous PID's shutdown requests.

There are some obvious fixes that can be made to controlled shutdown to help address the above issue. E.g., we don't really need to move follower partitions to Offline. We did that as an optimization so the broker falls out of ISR sooner - which is helpful when producers set required.acks to -1. However, it adds a lot of latency to controlled shutdown. Also, (more importantly) we should have a mechanism to abort any stale shutdown process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
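One possible shape for the abort mechanism the ticket asks for, as a hedged sketch: tag each controlled-shutdown request with the requesting broker's generation and drop requests from an older generation. The brokerEpoch idea and every name below are assumptions for illustration; nothing like this existed in the 0.8.x protocol:
{code}
import java.util.concurrent.ConcurrentHashMap

// Hedged sketch only: brokerEpoch and these method names are hypothetical.
class ControlledShutdownTracker {
  // latest generation (e.g. a ZK czxid or a startup counter) seen per broker
  private val latestEpoch = new ConcurrentHashMap[Int, java.lang.Long]()

  def onBrokerStartup(brokerId: Int, epoch: Long): Unit =
    latestEpoch.put(brokerId, epoch)

  // Called before doing any work for a queued shutdown request: a request
  // carrying an older generation is stale and should be aborted, so a
  // restarted broker is not hit by phantom StopReplica requests.
  def shouldProcess(brokerId: Int, requestEpoch: Long): Boolean =
    Option(latestEpoch.get(brokerId)).forall(latest => latest <= requestEpoch)
}
{code}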
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122037#comment-14122037 ] Guozhang Wang commented on KAFKA-1493: -- Could we still have that for 0.8.2? Use a well-documented LZ4 compression format and remove redundant LZ4HC option -- Key: KAFKA-1493 URL: https://issues.apache.org/jira/browse/KAFKA-1493 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2 Reporter: James Oliver Priority: Blocker Fix For: 0.8.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution
[ https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122041#comment-14122041 ] Guozhang Wang commented on KAFKA-1490: -- Should this be done before 0.8.2? remove gradlew initial setup output from source distribution Key: KAFKA-1490 URL: https://issues.apache.org/jira/browse/KAFKA-1490 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Ivan Lyutov Priority: Blocker Fix For: 0.8.2 Our current source release contains lots of stuff in the gradle folder that we do not need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122042#comment-14122042 ] Viktor Taranenko commented on KAFKA-686: [~guozhang] I'm happy to put some effort there over this weekend. When do you guys want to release 0.8.2? I see that 'Exception.failAsValue' does not exist in Scala 2.8. I can make it compatible if required.

0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
--
Key: KAFKA-686
URL: https://issues.apache.org/jira/browse/KAFKA-686
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
Labels: newbie, patch
Fix For: 0.8.2
Attachments: KAFAK-686-null-pointer-fix.patch, KAFKA-686-null-pointer-fix-2.patch

People will not know that the zookeeper paths are not compatible. When you try to start the 0.8 broker pointed at a 0.7 zookeeper you get a NullPointerException. We should detect this and give a more sane error.

Error:
kafka.common.KafkaException: Can't parse json string: null
at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
at kafka.utils.Json$.parseFull(Json.scala:16)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
at kafka.controller.KafkaController.startup(KafkaController.scala:381)
at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
at kafka.Kafka$.main(Kafka.scala:46)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.lang.NullPointerException
at scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
... 16 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
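For illustration, a hedged sketch of what a saner error could look like at the JSON-parsing boundary (readTopicRegistration is a hypothetical wrapper; Json.parseFull and KafkaException are existing Kafka classes, but this is not the attached patch):
{code}
import kafka.common.KafkaException
import kafka.utils.Json

// Hedged sketch, not the committed fix or the attached patch.
object ZkCompatSketch {
  def readTopicRegistration(topic: String, data: String): Map[String, Any] = {
    if (data == null)
      throw new KafkaException(
        ("No registration data found for topic %s. This can happen when an 0.8 " +
         "broker is pointed at a ZooKeeper ensemble written by an 0.7 broker; " +
         "the ZooKeeper layouts of the two versions are not compatible.").format(topic))
    Json.parseFull(data) match {
      case Some(m: Map[String, Any] @unchecked) => m
      case _ => throw new KafkaException("Can't parse json string: %s for topic %s".format(data, topic))
    }
  }
}
{code}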
[jira] [Resolved] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request
[ https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-703. - Resolution: Fixed

A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request
---
Key: KAFKA-703
URL: https://issues.apache.org/jira/browse/KAFKA-703
Project: Kafka
Issue Type: Bug
Components: purgatory
Affects Versions: 0.8.1
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
Priority: Blocker
Fix For: 0.8.2

When a producer request is handled, the fetch purgatory is checked to ensure any fetch requests are satisfied. When the produce request is satisfied we do the check again, and if the same fetch request is still in the fetch purgatory it ends up double counting the bytes received.
Possible solutions:
1. In the delayed produce request case, do the check only after the produce request is satisfied. This could potentially delay the fetch request from being satisfied.
2. Remove the dependency of fetch requests on produce requests and just look at the last logical log offset (which should mostly be cached). This would need replica.fetch.min.bytes to be a number of messages rather than bytes. This also helps KAFKA-671 in that we would no longer need to pass the ProduceRequest object to the producer purgatory and hence not have to consume any memory.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request
[ https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122043#comment-14122043 ] Guozhang Wang commented on KAFKA-703: - This problem is resolved in the purgatory / API redesign: KAFKA-1583. Closing now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122049#comment-14122049 ] Joe Stein commented on KAFKA-686: - 0.8.2 drops support for Scala 2.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1585) Client: Infinite conflict in /consumers/
[ https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-1585. -- Resolution: Fixed Client: Infinite conflict in /consumers/ -- Key: KAFKA-1585 URL: https://issues.apache.org/jira/browse/KAFKA-1585 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 0.8.1.1 Reporter: Artur Denysenko Priority: Critical Fix For: 0.8.2 Attachments: kafka_consumer_ephemeral_node_extract.zip Periodically our Kafka consumers get stuck cycling through "conflict in /consumers/..." and "I wrote this conflicted ephemeral node" messages. Please see the attached log extract. After restarting the process the consumers work perfectly. We are using ZooKeeper 3.4.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1585) Client: Infinite conflict in /consumers/
[ https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122048#comment-14122048 ] Guozhang Wang commented on KAFKA-1585: -- I think this issue is already resolved by KAFKA-1451. Please re-open it with 0.9.0 version if you still see it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-665) Outgoing responses delayed on a busy Kafka broker
[ https://issues.apache.org/jira/browse/KAFKA-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-665: Fix Version/s: (was: 0.8.2) 0.9.0

Outgoing responses delayed on a busy Kafka broker
--
Key: KAFKA-665
URL: https://issues.apache.org/jira/browse/KAFKA-665
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Neha Narkhede
Priority: Critical
Labels: replication-performance
Fix For: 0.9.0

In a long running test, I observed that after a few hours of operation, a few requests start timing out, mainly because they spend a very long time sitting in the response queue -
[2012-12-07 22:05:56,670] TRACE Completed request with correlation id 3965966 and client : TopicMetadataRequest:4009, queueTime:1, localTime:28, remoteTime:0, sendTime:3980 (kafka.network.RequestChannel$)
[2012-12-07 22:04:12,046] TRACE Completed request with correlation id 3962561 and client : TopicMetadataRequest:3449, queueTime:0, localTime:29, remoteTime:0, sendTime:3420 (kafka.network.RequestChannel$)
[2012-12-07 22:05:56,670] TRACE Completed request with correlation id 3965966 and client : TopicMetadataRequest:4009, queueTime:1, localTime:28, remoteTime:0, sendTime:3980 (kafka.network.RequestChannel$)
We might have a problem in the way we process outgoing responses. Basically, if the processor thread blocks on enqueuing requests in the request queue, it doesn't come around to processing its responses which are ready to go out.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
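A hedged sketch of the failure mode, with hypothetical names (requestQueue, responseQueue, and the loop structure are simplified well beyond SocketServer.scala): while the processor is parked in requestQueue.put, completed responses sit unwritten; bounding the put with offer plus a timeout keeps the loop servicing responses:
{code}
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Hedged sketch of the processor loop described above; queue names and the
// loop are illustrative assumptions, not SocketServer.scala itself.
class ProcessorSketch(requestQueue: ArrayBlockingQueue[AnyRef],
                      responseQueue: ArrayBlockingQueue[AnyRef]) {
  def runOnce(newRequest: Option[AnyRef]): Unit = {
    // Problem: a plain requestQueue.put(req) can park this thread while
    // responseQueue fills with responses that are ready to be written out.
    newRequest.foreach { req =>
      while (!requestQueue.offer(req, 10, TimeUnit.MILLISECONDS))
        drainResponses() // keep responses flowing while waiting for space
    }
    drainResponses()
  }

  private def drainResponses(): Unit = {
    var resp = responseQueue.poll()
    while (resp != null) {
      // write resp to its socket channel here
      resp = responseQueue.poll()
    }
  }
}
{code}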
[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122050#comment-14122050 ] Guozhang Wang commented on KAFKA-1313: -- Moving to 0.9.

Support adding replicas to existing topic partitions
Key: KAFKA-1313
URL: https://issues.apache.org/jira/browse/KAFKA-1313
Project: Kafka
Issue Type: New Feature
Components: tools
Affects Versions: 0.8.0
Reporter: Marc Labbe
Priority: Critical
Fix For: 0.9.0

There is currently no easy way to add replicas to existing topic partitions. For example, topic create-test has been created with ReplicationFactor=1:
Topic:create-test  PartitionCount:3  ReplicationFactor:1  Configs:
  Topic: create-test  Partition: 0  Leader: 1  Replicas: 1  Isr: 1
  Topic: create-test  Partition: 1  Leader: 2  Replicas: 2  Isr: 2
  Topic: create-test  Partition: 2  Leader: 3  Replicas: 3  Isr: 3
I would like to increase it to ReplicationFactor=2 (or more) so it shows up like this instead:
Topic:create-test  PartitionCount:3  ReplicationFactor:2  Configs:
  Topic: create-test  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
  Topic: create-test  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2,3
  Topic: create-test  Partition: 2  Leader: 3  Replicas: 3,1  Isr: 3,1
Use cases for this:
- adding brokers and thus increasing fault tolerance
- fixing human errors for topics created with wrong values
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time
[ https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1194: - Fix Version/s: (was: 0.8.2) 0.9.0

The kafka broker cannot delete the old log files after the configured time
--
Key: KAFKA-1194
URL: https://issues.apache.org/jira/browse/KAFKA-1194
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 0.8.1
Environment: Windows
Reporter: Tao Qin
Assignee: Jay Kreps
Labels: features, patch
Fix For: 0.9.0
Attachments: KAFKA-1194.patch, kafka-1194-v1.patch
Original Estimate: 72h
Remaining Estimate: 72h

We tested it in a Windows environment and set log.retention.hours to 24 hours:
# The minimum age of a log file to be eligible for deletion
log.retention.hours=24
After several days, the Kafka broker still cannot delete the old log files, and we get the following exceptions:
[2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
kafka.common.KafkaStorageException: Failed to change the log file suffix from  to .deleted for log segment 1516723
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249)
at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638)
at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629)
at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:76)
at kafka.log.Log.deleteOldSegments(Log.scala:418)
at kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284)
at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316)
at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
at scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:314)
at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143)
at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
I think this error happens because Kafka tries to rename the log file while it is still open, so we should close the file before renaming it. The index file uses a special data structure, the MappedByteBuffer. The Javadoc describes it as: "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."
Fortunately, I found a forceUnmap function in the Kafka code, and perhaps it can be used to free the MappedByteBuffer.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
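For illustration, a minimal sketch of explicit unmapping before the rename, assuming the pre-Java-9 sun.nio.ch.DirectBuffer/sun.misc.Cleaner internals (this mirrors the idea behind the forceUnmap the reporter mentions, not the attached patch):
{code}
import java.nio.MappedByteBuffer

// Hedged sketch: explicitly unmap a MappedByteBuffer so Windows will allow
// the underlying index file to be renamed or deleted before the buffer is
// garbage-collected. Relies on pre-Java-9 JDK internals.
object UnmapSketch {
  def forceUnmap(buffer: MappedByteBuffer): Unit =
    try {
      buffer match {
        case db: sun.nio.ch.DirectBuffer if db.cleaner() != null =>
          db.cleaner().clean() // releases the file mapping immediately
        case _ => // not a direct buffer; nothing to unmap
      }
    } catch {
      case _: Throwable => // best effort: fall back to GC-driven unmapping
    }
}
{code}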
[jira] [Commented] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time
[ https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122053#comment-14122053 ] Guozhang Wang commented on KAFKA-1194: -- Moving again to 0.9 for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1193) Data loss if broker is killed using kill -9
[ https://issues.apache.org/jira/browse/KAFKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-1193. -- Resolution: Fixed

Data loss if broker is killed using kill -9
---
Key: KAFKA-1193
URL: https://issues.apache.org/jira/browse/KAFKA-1193
Project: Kafka
Issue Type: Bug
Components: replication
Affects Versions: 0.8.0, 0.8.1
Environment: Centos 6.3
Reporter: Hanish Bansal
Fix For: 0.8.2

We have a Kafka cluster of 2 nodes (using Kafka 0.8.0). Replication factor: 2. Number of partitions: 2.
Actual behaviour: out of the two nodes, if the leader node goes down, data loss happens.
Steps to reproduce:
1. Create a 2 node kafka cluster with replication factor 2.
2. Start the Kafka cluster.
3. Create a topic, let's say test-trunk111.
4. Restart any one node.
5. Check the topic status using the kafka-list-topic tool. The topic ISR status is:
topic: test-trunk111  partition: 0  leader: 0  replicas: 1,0  isr: 0,1
topic: test-trunk111  partition: 1  leader: 0  replicas: 0,1  isr: 0,1
If there is only one broker node in the ISR list, wait for some time and check the ISR status of the topic again; there should be 2 brokers in the ISR list.
6. Start producing data.
7. Kill the leader node (broker-0 in our case) while data is being produced.
8. After all data is produced, start the consumer.
9. Observe the behaviour: there is data loss.
After the leader goes down, the topic ISR status is:
topic: test-trunk111  partition: 0  leader: 1  replicas: 1,0  isr: 1
topic: test-trunk111  partition: 1  leader: 1  replicas: 0,1  isr: 1
We have tried the following things to avoid data loss:
1. Configured request.required.acks=-1 in the producer configuration, because as mentioned in the documentation http://kafka.apache.org/documentation.html#producerconfigs, setting this value to -1 provides a guarantee that no messages will be lost.
2. Increased message.send.max.retries from 3 to 10 in the producer configuration.
3. Set controlled.shutdown.enable to true in the broker configuration.
4. Tested with Kafka 0.8.1 after applying the KAFKA-1188.patch available on https://issues.apache.org/jira/browse/KAFKA-1188.
None of the above worked when the leader node is killed using kill -9 pid.
Expected behaviour: no data should be lost.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
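For reference, a minimal sketch of the 0.8 producer settings the reporter tried (the property keys are the real 0.8 producer configs; the broker list is a placeholder):
{code}
import java.util.Properties

object ProducerSettingsSketch {
  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092,broker2:9092") // placeholder
  props.put("request.required.acks", "-1")    // wait for all in-sync replicas
  props.put("message.send.max.retries", "10") // raised from the default of 3
  // Note: with acks=-1 a write is acknowledged once every replica currently
  // in the ISR has it; if the ISR has already shrunk to the leader alone,
  // kill -9 on the leader can still lose acknowledged messages, which is
  // consistent with what this ticket reports.
}
{code}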
[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122054#comment-14122054 ] Viktor Taranenko commented on KAFKA-686: So can I consider 2.9 the minimal version while solving this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122056#comment-14122056 ] Joe Stein commented on KAFKA-1313: -- Don't we already have this feature with the reassign partition tool? Or am I missing something? If so we can close this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
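For what it's worth, raising the replication factor does work by feeding kafka-reassign-partitions.sh an assignment that lists more replicas per partition. A sketch for the create-test example from the ticket (the file name is hypothetical):
{code}
{"version": 1,
 "partitions": [
   {"topic": "create-test", "partition": 0, "replicas": [1, 2]},
   {"topic": "create-test", "partition": 1, "replicas": [2, 3]},
   {"topic": "create-test", "partition": 2, "replicas": [3, 1]}
 ]}
{code}
Saved as increase-rf.json, this would be applied with bin/kafka-reassign-partitions.sh --zookeeper <zk> --reassignment-json-file increase-rf.json --execute, after which the controller creates the new replicas and brings them into the ISR.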
[jira] [Commented] (KAFKA-1193) Data loss if broker is killed using kill -9
[ https://issues.apache.org/jira/browse/KAFKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122055#comment-14122055 ] Guozhang Wang commented on KAFKA-1193: -- [~hanish.bansal.agarwal] I am closing this ticket now. Please feel free to re-open it if you observe this issue again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122058#comment-14122058 ] Joe Stein commented on KAFKA-686: - 2.9.1 yes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1057) Trim whitespaces from user specified configs
[ https://issues.apache.org/jira/browse/KAFKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122059#comment-14122059 ] Guozhang Wang commented on KAFKA-1057: -- Moving out of 0.8.2 for now. Trim whitespaces from user specified configs Key: KAFKA-1057 URL: https://issues.apache.org/jira/browse/KAFKA-1057 Project: Kafka Issue Type: Bug Components: config Reporter: Neha Narkhede Assignee: Neha Narkhede Labels: newbie Fix For: 0.9.0 Whitespace in configs is a common source of config errors. It would be nice if Kafka could trim whitespace from configs automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
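A minimal sketch of the requested behavior (trimProps is a hypothetical helper, not an existing Kafka utility):
{code}
import java.util.Properties

object ConfigTrimSketch {
  // Hedged sketch of the requested behavior: trim keys and values.
  def trimProps(props: Properties): Properties = {
    val cleaned = new Properties()
    val names = props.stringPropertyNames().iterator()
    while (names.hasNext) {
      val key = names.next()
      // e.g. " log.retention.hours " -> "log.retention.hours", " 24 " -> "24"
      cleaned.setProperty(key.trim, props.getProperty(key).trim)
    }
    cleaned
  }
}
{code}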
[jira] [Updated] (KAFKA-1057) Trim whitespaces from user specified configs
[ https://issues.apache.org/jira/browse/KAFKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1057: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged
[ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1108: - Fix Version/s: (was: 0.8.2) 0.9.0

when controlled shutdown attempt fails, the reason is not always logged
---
Key: KAFKA-1108
URL: https://issues.apache.org/jira/browse/KAFKA-1108
Project: Kafka
Issue Type: Bug
Reporter: Jason Rosenberg
Fix For: 0.9.0

In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and then if there's a failure, it will retry the controlledShutdown. Looking at the code, there are 2 ways a retry could fail: one is with an error response from the controller, with this messaging code:
{code}
info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
info("Error code from controller: %d".format(shutdownResponse.errorCode))
{code}
Alternatively, there could be an IOException, with this code executed:
{code}
catch {
  case ioe: java.io.IOException =>
    channel.disconnect()
    channel = null
    // ignore and try again
}
{code}
And then finally, in either case:
{code}
if (!shutdownSuceeded) {
  Thread.sleep(config.controlledShutdownRetryBackoffMs)
  warn("Retrying controlled shutdown after the previous attempt failed...")
}
{code}
It would be nice if the nature of the IOException were logged in either case (I'd be happy with an ioe.getMessage() instead of a full stack trace, as kafka in general tends to be too willing to dump IOException stack traces!). I suspect, in my case, the actual IOException is a socket timeout, as the time between the initial "Starting controlled shutdown" and the first "Retrying..." message is usually about 35 seconds (the socket timeout + the controlled shutdown retry backoff). So it would seem that really, the issue in this case is that controlled shutdown is taking too long. It would seem sensible instead to have the controller report back to the server (before the socket timeout) that more time is needed, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
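A hedged sketch of the small change the report asks for, logging why the previous attempt failed before retrying (the helper below is illustrative glue around the quoted retry block, not the committed change):
{code}
import java.io.IOException

object ShutdownRetryLogSketch {
  // Hedged sketch, not the committed change: surface ioe.getMessage() in the
  // retry warning instead of silently swallowing the IOException.
  def logRetry(warn: String => Unit,
               backoffMs: Long,
               lastFailure: Option[IOException]): Unit = {
    Thread.sleep(backoffMs)
    val reason = lastFailure.map(e => " (" + e.getMessage + ")").getOrElse("")
    warn("Retrying controlled shutdown after the previous attempt failed" + reason + "...")
  }
}
{code}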
[jira] [Commented] (KAFKA-1147) Consumer socket timeout should be greater than fetch max wait
[ https://issues.apache.org/jira/browse/KAFKA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122061#comment-14122061 ] Guozhang Wang commented on KAFKA-1147: -- This is quite an easy fix and could be reviewed / checked in by any committer I think. Consumer socket timeout should be greater than fetch max wait - Key: KAFKA-1147 URL: https://issues.apache.org/jira/browse/KAFKA-1147 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Assignee: Guozhang Wang Fix For: 0.8.2, 0.9.0 Attachments: KAFKA-1147.patch, KAFKA-1147_2013-12-07_18:22:18.patch, KAFKA-1147_2013-12-09_09:14:24.patch, KAFKA-1147_2013-12-10_14:31:46.patch From the mailing list: the consumer-config documentation states that "The actual timeout set will be max.fetch.wait + socket.timeout.ms." - however, that change seems to have been lost in the code a while ago - we should either fix the doc or re-introduce the addition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
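A minimal sketch of the invariant this implies, assuming the 0.8 consumer config names socket.timeout.ms and fetch.wait.max.ms (the check itself is hypothetical, not the consumer's actual validation code):
{code}
object TimeoutCheckSketch extends App {
  // Hedged sketch of the invariant the ticket asks for.
  case class ConsumerTimeouts(socketTimeoutMs: Int, fetchWaitMaxMs: Int) {
    def validate(): Unit =
      require(socketTimeoutMs > fetchWaitMaxMs,
        "socket.timeout.ms (%d) should exceed fetch.wait.max.ms (%d); otherwise a long-polling fetch can time out the socket before the broker responds"
          .format(socketTimeoutMs, fetchWaitMaxMs))
  }

  // e.g. the 0.8 defaults: socket.timeout.ms=30000, fetch.wait.max.ms=100
  ConsumerTimeouts(socketTimeoutMs = 30000, fetchWaitMaxMs = 100).validate()
}
{code}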
[jira] [Commented] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged
[ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122062#comment-14122062 ] Guozhang Wang commented on KAFKA-1108: -- Moving to 0.9 for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122066#comment-14122066 ] Guozhang Wang commented on KAFKA-1056: -- Moving to 0.9 for now. Evenly Distribute Intervals in OffsetIndex -- Key: KAFKA-1056 URL: https://issues.apache.org/jira/browse/KAFKA-1056 Project: Kafka Issue Type: Improvement Components: log Reporter: Guozhang Wang Assignee: Guozhang Wang Labels: newbie++ Fix For: 0.8.2 Today a new entry is created in the OffsetIndex for each produce request, regardless of the number of messages it contains. It would be better to evenly distribute the intervals between index entries for index search efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
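A minimal sketch of byte-based index spacing, a common way to even out the intervals (indexIntervalBytes mirrors the idea behind the log index-interval config, but the class below is illustrative, not Kafka's Log/OffsetIndex code):
{code}
import scala.collection.mutable.ArrayBuffer

// Hedged sketch: add an index entry only after indexIntervalBytes of log
// data have been appended, rather than once per produce request, so the
// entries end up evenly spaced through the segment.
class SpacedOffsetIndexSketch(indexIntervalBytes: Int) {
  private val entries = ArrayBuffer.empty[(Long, Long)] // (offset, filePosition)
  private var bytesSinceLastEntry = 0L

  def onAppend(firstOffset: Long, filePosition: Long, messageSetBytes: Int): Unit = {
    if (bytesSinceLastEntry > indexIntervalBytes) {
      entries += ((firstOffset, filePosition))
      bytesSinceLastEntry = 0
    }
    bytesSinceLastEntry += messageSetBytes
  }

  // floor lookup: the last index entry at or before the target offset
  def lookupFloor(offset: Long): Option[(Long, Long)] =
    entries.takeWhile(_._1 <= offset).lastOption
}
{code}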
[jira] [Commented] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release
[ https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122068#comment-14122068 ] Guozhang Wang commented on KAFKA-1054: -- Moving to 0.9. Eliminate Compilation Warnings for 0.8 Final Release --- Key: KAFKA-1054 URL: https://issues.apache.org/jira/browse/KAFKA-1054 Project: Kafka Issue Type: Improvement Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 Currently we have a total of 38 warnings for the source code compilation of 0.8: 1) 3 from unchecked type patterns, 2) 6 from unchecked conversions, 3) 29 from deprecated Hadoop API functions. It would be better to finish these before the final release of 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release
[ https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1054: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1056: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue
[ https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122070#comment-14122070 ] Guozhang Wang commented on KAFKA-1043: -- Moving to 0.9. Time-consuming FetchRequest could block other request in the response queue --- Key: KAFKA-1043 URL: https://issues.apache.org/jira/browse/KAFKA-1043 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.2 Since in SocketServer the processor that takes a request is also responsible for writing the response for that request, each processor owns its own response queue. If a FetchRequest takes an irregularly long time to write to the channel buffer, it blocks all other responses in that queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue
[ https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1043: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1354) Failed to load class org.slf4j.impl.StaticLoggerBinder
[ https://issues.apache.org/jira/browse/KAFKA-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122076#comment-14122076 ] Guozhang Wang commented on KAFKA-1354: -- Moving to 0.9 for tracking. Failed to load class org.slf4j.impl.StaticLoggerBinder Key: KAFKA-1354 URL: https://issues.apache.org/jira/browse/KAFKA-1354 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.1 Environment: RHEL Reporter: RakeshAcharya Assignee: Jay Kreps Labels: newbie, patch, usability Fix For: 0.9.0 Original Estimate: 672h Remaining Estimate: 672h Getting the errors below during Kafka startup:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2014-03-31 18:55:36,488] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1354) Failed to load class org.slf4j.impl.StaticLoggerBinder
[ https://issues.apache.org/jira/browse/KAFKA-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1354: - Fix Version/s: (was: 0.8.2) 0.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1348) Producer's Broker Discovery Interface
[ https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122085#comment-14122085 ] Guozhang Wang commented on KAFKA-1348: -- Closing this ticket now. Producer's Broker Discovery Interface - Key: KAFKA-1348 URL: https://issues.apache.org/jira/browse/KAFKA-1348 Project: Kafka Issue Type: Improvement Components: producer Reporter: Jay Bae Assignee: Jun Rao Fix For: 0.8.2 The producer has a static 'broker.list' configuration property. I need to be able to override this behavior, e.g. with the Netflix Eureka discovery module. Let me contribute this; please add it to the 0.8.1.1 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1348) Producer's Broker Discovery Interface
[ https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-1348. -- Resolution: Not a Problem -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist
[ https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1019: - Fix Version/s: (was: 0.8.2) 0.9.0 kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist -- Key: KAFKA-1019 URL: https://issues.apache.org/jira/browse/KAFKA-1019 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 From Libo Yu: I tried to run kafka-preferred-replica-election.sh on our kafka cluster. But I got this exception: Failed to start preferred replica election org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/uattoqaaa.default/partitions I checked zookeeper and there is no /brokers/topics/uattoqaaa.default/partitions. All I found is /brokers/topics/uattoqaaa.default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist
[ https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122087#comment-14122087 ] Guozhang Wang commented on KAFKA-1019: -- Moving to 0.9 now. kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist -- Key: KAFKA-1019 URL: https://issues.apache.org/jira/browse/KAFKA-1019 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 From Libo Yu: I tried to run kafka-preferred-replica-election.sh on our kafka cluster. But I got this exception: Failed to start preferred replica election org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/uattoqaaa.default/partitions I checked zookeeper and there is no /brokers/topics/uattoqaaa.default/partitions. All I found is /brokers/topics/uattoqaaa.default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
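A clearer failure mode is easy to sketch: verify the partitions path before triggering the election, so the tool can report a meaningful error instead of a raw ZkNoNodeException. A minimal sketch using the zkclient library Kafka already depends on:
{code}
import org.I0Itec.zkclient.ZkClient

// Hypothetical pre-check for the preferred replica election tool
def checkPartitionsPath(zkClient: ZkClient, topic: String): Unit = {
  val path = "/brokers/topics/" + topic + "/partitions"
  if (!zkClient.exists(path))
    throw new IllegalStateException("Topic " + topic + " has no partition assignment at " +
      path + "; was the topic fully created?")
}
{code}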
[jira] [Commented] (KAFKA-1034) Improve partition reassignment to optimize writes to zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122088#comment-14122088 ] Guozhang Wang commented on KAFKA-1034: -- [~nehanarkhede] Could you provide some more context on this one? And if it is still an issue shall we move to 0.9? Improve partition reassignment to optimize writes to zookeeper -- Key: KAFKA-1034 URL: https://issues.apache.org/jira/browse/KAFKA-1034 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Fix For: 0.8.2 For ReassignPartition tool, check if optimizing the writes to ZK after every replica reassignment is possible -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1002) Delete aliveLeaders field from LeaderAndIsrRequest
[ https://issues.apache.org/jira/browse/KAFKA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122090#comment-14122090 ] Guozhang Wang commented on KAFKA-1002: -- I think we still need aliveLeaders today, so this may not be an issue any more. [~junrao] [~swapnilghike] Could you comment on this one? Delete aliveLeaders field from LeaderAndIsrRequest -- Key: KAFKA-1002 URL: https://issues.apache.org/jira/browse/KAFKA-1002 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Swapnil Ghike Fix For: 0.8.2 After KAFKA-999 is committed, we don't need aliveLeaders in LeaderAndIsrRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-853) Allow OffsetFetchRequest to initialize offsets
[ https://issues.apache.org/jira/browse/KAFKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-853: Fix Version/s: (was: 0.8.2) 0.9.0 Allow OffsetFetchRequest to initialize offsets -- Key: KAFKA-853 URL: https://issues.apache.org/jira/browse/KAFKA-853 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.8.1 Reporter: David Arthur Fix For: 0.9.0 Original Estimate: 24h Remaining Estimate: 24h It would be nice for the OffsetFetchRequest API to have the option to initialize offsets instead of returning unknown_topic_or_partition. It could mimic the Offsets API by adding the time field and then follow the same code path on the server as the Offsets API. In this case, the response would need to include a boolean to indicate whether the returned offset was initialized or fetched from ZK. This would simplify the client logic when dealing with new topics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-853) Allow OffsetFetchRequest to initialize offsets
[ https://issues.apache.org/jira/browse/KAFKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122093#comment-14122093 ] Guozhang Wang commented on KAFKA-853: - Moving to 0.9 now. Allow OffsetFetchRequest to initialize offsets -- Key: KAFKA-853 URL: https://issues.apache.org/jira/browse/KAFKA-853 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.8.1 Reporter: David Arthur Fix For: 0.9.0 Original Estimate: 24h Remaining Estimate: 24h It would be nice for the OffsetFetchRequest API to have the option to initialize offsets instead of returning unknown_topic_or_partition. It could mimic the Offsets API by adding the time field and then follow the same code path on the server as the Offsets API. In this case, the response would need to include a boolean to indicate whether the returned offset was initialized or fetched from ZK. This would simplify the client logic when dealing with new topics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
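To make the proposal concrete, here is a sketch of the client logic it would remove; fetchCommittedOffset and fetchLogEndOffset are hypothetical stand-ins for the OffsetFetch and Offsets API round trips, not real client methods:
{code}
// Hypothetical helpers standing in for the OffsetFetch and Offsets API calls
def fetchCommittedOffset(topic: String, partition: Int): Option[Long] = ???
def fetchLogEndOffset(topic: String, partition: Int): Long = ???

// Today: the client falls back manually when the committed offset is unknown
def startingOffset(topic: String, partition: Int): Long =
  fetchCommittedOffset(topic, partition).getOrElse(fetchLogEndOffset(topic, partition))

// Proposed: a single OffsetFetch call that initializes the offset server-side,
// with a boolean in the response marking it as initialized rather than fetched.
{code}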
[jira] [Resolved] (KAFKA-1439) transient unit test failure in testDeleteTopicDuringAddPartition
[ https://issues.apache.org/jira/browse/KAFKA-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-1439. -- Resolution: Duplicate transient unit test failure in testDeleteTopicDuringAddPartition Key: KAFKA-1439 URL: https://issues.apache.org/jira/browse/KAFKA-1439 Project: Kafka Issue Type: Bug Reporter: Jun Rao Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.2 Saw the following transient unit test failure. kafka.admin.DeleteTopicTest testDeleteTopicDuringAddPartition FAILED junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test path not deleted even after a replica is restarted at junit.framework.Assert.fail(Assert.java:47) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333) at kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1235) Enable server to indefinitely retry on controlled shutdown
[ https://issues.apache.org/jira/browse/KAFKA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122095#comment-14122095 ] Guozhang Wang commented on KAFKA-1235: -- Moving to 0.9. Enable server to indefinitely retry on controlled shutdown -- Key: KAFKA-1235 URL: https://issues.apache.org/jira/browse/KAFKA-1235 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1235.patch, KAFKA-1235_2014-02-20_11:28:36.patch, KAFKA-1235_2014-02-20_13:08:23.patch Today the Kafka server can exit silently if it hits an exception that is swallowed during controlled shutdown, or if controlled.shutdown.max.retries has been exhausted. It is better to add an option to let it retry indefinitely. This will also fix some other loose-check bugs in the socket-closing logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
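A sketch of the proposed option, with the shutdown attempt passed in as a hypothetical stand-in and a negative maxRetries meaning retry forever:
{code}
// Hypothetical retry loop for controlled shutdown
def controlledShutdownWithRetry(attempt: () => Boolean, maxRetries: Int, backoffMs: Long): Unit = {
  var attempts = 1
  var done = attempt()
  while (!done && (maxRetries < 0 || attempts < maxRetries)) {
    Thread.sleep(backoffMs)
    done = attempt()
    attempts += 1
  }
  if (!done)
    throw new RuntimeException("controlled shutdown failed after " + attempts + " attempts")
}
{code}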
[jira] [Updated] (KAFKA-451) follower replica may need to backoff the fetching if leader is not ready yet
[ https://issues.apache.org/jira/browse/KAFKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-451: Fix Version/s: (was: 0.8.2) 0.9.0 follower replica may need to backoff the fetching if leader is not ready yet Key: KAFKA-451 URL: https://issues.apache.org/jira/browse/KAFKA-451 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Prashanth Menon Labels: optimization Fix For: 0.9.0 Original Estimate: 48h Remaining Estimate: 48h Currently, when a follower starts fetching from a new leader, it just keeps sending fetch requests even if the requests fail because the leader is not ready yet. We probably should let the follower back off a bit in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
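A sketch of the suggested behavior: capped exponential backoff while the leader keeps rejecting fetches; fetchOnce is a hypothetical stand-in for a single fetch attempt:
{code}
// Hypothetical backoff loop for a follower whose new leader is not ready yet
def fetchWithBackoff(fetchOnce: () => Boolean, initialBackoffMs: Long, maxBackoffMs: Long): Unit = {
  var backoffMs = initialBackoffMs
  while (!fetchOnce()) {                              // false while the leader rejects the fetch
    Thread.sleep(backoffMs)
    backoffMs = math.min(backoffMs * 2, maxBackoffMs) // back off exponentially, capped
  }
}
{code}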
[jira] [Commented] (KAFKA-570) Kafka should not need snappy jar at runtime
[ https://issues.apache.org/jira/browse/KAFKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122099#comment-14122099 ] Guozhang Wang commented on KAFKA-570: - This issue will be automatically fixed when the server code moves to the clients' common libraries; moving to 0.9 for now. Kafka should not need snappy jar at runtime --- Key: KAFKA-570 URL: https://issues.apache.org/jira/browse/KAFKA-570 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Swapnil Ghike Labels: bugs Fix For: 0.9.0 CompressionFactory imports the snappy jar in a pattern match. The purpose of importing it this way seems to be avoiding the import unless snappy compression is actually required. However, kafka throws a ClassNotFoundException if the snappy jar is removed from lib_managed at runtime. This exception can be easily seen by producing some data with the console producer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
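Until then, one way to keep the dependency optional is to probe for the class reflectively and fail with a clear error only when snappy compression is actually requested. A minimal sketch (org.xerial.snappy.SnappyOutputStream is the real class shipped in the snappy-java jar; the surrounding code is illustrative):
{code}
object SnappySupport {
  // Probe once; true only if snappy-java is on the classpath
  lazy val available: Boolean =
    try { Class.forName("org.xerial.snappy.SnappyOutputStream"); true }
    catch { case _: ClassNotFoundException => false }

  def requireSnappy(): Unit =
    if (!available)
      throw new IllegalStateException(
        "snappy compression requested but snappy-java is not on the classpath")
}
{code}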
[jira] [Updated] (KAFKA-566) Add last modified time to the TopicMetadataRequest
[ https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-566: Fix Version/s: (was: 0.8.2) 0.9.0 Add last modified time to the TopicMetadataRequest -- Key: KAFKA-566 URL: https://issues.apache.org/jira/browse/KAFKA-566 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.9.0 To support KAFKA-560 it would be nice to have a last modified time in the TopicMetadataRequest. This would be the timestamp of the last append to the log as taken from stat on the final log segment. Implementation would involve 1. Adding a new field to TopicMetadataRequest 2. Adding a method Log.lastModified: Long to get the last modified time from a log This timestamp would, of course, be subject to error in the event that the file was touched without modification, but I think that is actually okay since it provides a manual way to avoid gc'ing a topic that you know you will want. It is debatable whether this should go in 0.8. It would be nice to add the field to the metadata request, at least, as that change should be easy and would avoid needing to bump the version in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
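A sketch of the proposed Log.lastModified, under the assumption that the last append time is approximated by the mtime of the newest .log segment in the partition directory:
{code}
import java.io.File

// Hypothetical helper: last-modified time of the newest .log segment
def logLastModified(logDir: File): Long = {
  val segments = Option(logDir.listFiles).getOrElse(Array.empty[File])
    .filter(_.getName.endsWith(".log"))
  if (segments.isEmpty) 0L else segments.map(_.lastModified).max
}
{code}
As the ticket notes, touching a file without modifying it skews the value, which is acceptable here and even doubles as a manual way to protect a topic from gc.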
[jira] [Updated] (KAFKA-95) Create Jenkins readable test output
[ https://issues.apache.org/jira/browse/KAFKA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-95: --- Fix Version/s: (was: 0.8.2) 0.9.0 Create Jenkins readable test output --- Key: KAFKA-95 URL: https://issues.apache.org/jira/browse/KAFKA-95 Project: Kafka Issue Type: Improvement Components: packaging Reporter: Chris Burroughs Fix For: 0.9.0 Jenkins likes XML. See http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122100#comment-14122100 ] Guozhang Wang commented on KAFKA-493: - Moving the fix of 2) to 0.9 High CPU usage on inactive server - Key: KAFKA-493 URL: https://issues.apache.org/jira/browse/KAFKA-493 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.8.2 Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, backtraces.txt, stacktrace.txt I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background task? I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. Then I run a test that essential does 10k publish operations followed by immediate consume operations (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU essentially forever. hprof results: THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, group=system) THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 21, name=main, group=main) THREAD START (obj=53ae, id = 27, name=Thread-2, group=main) THREAD START (obj=53ae, id = 28, name=Thread-3, group=main) THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, group=main) THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, group=main) THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main) THREAD START (obj=574b, id = 200012, name=ZkClient-EventThread-20-localhost:2181, group=main) THREAD START (obj=576e, id = 200014, name=main-SendThread(), group=main) THREAD START (obj=576d, id = 200013, name=main-EventThread, group=main) THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, group=main) THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, group=main) THREAD START (obj=53ae, id = 200017, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200018, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, group=main) THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, group=main) THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main) THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main) THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on broker 1, , group=main) THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on broker 1, , group=main) THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on broker 1, , group=main) THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on broker 1, , group=main) THREAD START (obj=53ae, id = 200028, name=SIGINT handler, group=system) THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main) THREAD START (obj=574b, id = 200030, name=Thread-1, group=main) THREAD START (obj=574b, id = 200031, name=Thread-0, group=main) THREAD END (id = 200031) THREAD END (id = 200029) THREAD END (id = 200020) THREAD END (id = 200019) THREAD END (id = 28) THREAD END (id = 200021) THREAD END (id = 27) THREAD END (id = 200022) THREAD 
END (id = 200018) THREAD END (id = 200017) THREAD END (id = 200012) THREAD END (id = 200013) THREAD END (id = 200014) THREAD END (id = 200025) THREAD END (id = 200023) THREAD END (id = 200026) THREAD END (id = 200024) THREAD END (id = 200011) THREAD END (id = 29) THREAD END (id = 200010) THREAD END (id = 200030) THREAD END (id = 200028) TRACE 301281: sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218) sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
[jira] [Updated] (KAFKA-740) Improve crash-safety of log segment swap
[ https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-740: Fix Version/s: (was: 0.8.2) 0.9.0 Improve crash-safety of log segment swap Key: KAFKA-740 URL: https://issues.apache.org/jira/browse/KAFKA-740 Project: Kafka Issue Type: Bug Components: log Reporter: Jay Kreps Assignee: Jay Kreps Fix For: 0.9.0 Currently Log.replaceSegments has a bug that can cause a swap containing multiple segments to partially complete. This would lead to duplicate data in the log. The proposed fix is to use a name like offset1_and_offset2.swap for a segment meant to replace segments with base offsets offset1 and offset2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-740) Improve crash-safety of log segment swap
[ https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122101#comment-14122101 ] Guozhang Wang commented on KAFKA-740: - Moving to 0.9. Improve crash-safety of log segment swap Key: KAFKA-740 URL: https://issues.apache.org/jira/browse/KAFKA-740 Project: Kafka Issue Type: Bug Components: log Reporter: Jay Kreps Assignee: Jay Kreps Fix For: 0.9.0 Currently Log.replaceSegments has a bug that can cause a swap containing multiple segments to partially complete. This would lead to duplicate data in the log. The proposed fix is to use a name like offset1_and_offset2.swap for a segment meant to replace segments with base offsets offset1 and offset2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
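A sketch of the proposed naming scheme: encoding every replaced base offset in the swap file name lets recovery tell whether a multi-segment swap completed; the helper names are illustrative:
{code}
// Hypothetical encoding/decoding for the offset1_and_offset2.swap naming scheme
def swapFileName(baseOffsets: Seq[Long]): String =
  baseOffsets.sorted.mkString("_and_") + ".swap"

def baseOffsetsOfSwap(fileName: String): Seq[Long] =
  fileName.stripSuffix(".swap").split("_and_").map(_.toLong).toSeq

// swapFileName(Seq(0L, 1500L)) == "0_and_1500.swap"
{code}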
[jira] [Updated] (KAFKA-747) RequestChannel re-design
[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-747: Fix Version/s: (was: 0.8.2) 0.9.0 RequestChannel re-design Key: KAFKA-747 URL: https://issues.apache.org/jira/browse/KAFKA-747 Project: Kafka Issue Type: New Feature Components: network Reporter: Jay Kreps Assignee: Neha Narkhede Fix For: 0.9.0 We have had some discussion around how to handle queuing requests. There are two competing concerns: 1. We need to maintain request order on a per-socket basis. 2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O, request processing continues. Two Approaches We Have Considered 1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering, have the network layer only read one request at a time. 2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues. Problems With These Approaches In the first case you are not able to get any per-producer parallelism. That is, you can't read the next request while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be. In the second case there are two problems. The first is that when an I/O thread gets blocked, all request processing for sockets attached to that I/O thread will grind to a halt. If you have 10,000 connections and 10 I/O threads, then each blockage will stop 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking), this will cause big latency blips for all producers using that I/O thread. The next problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues. A Proposed Better Solution The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered. Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing. Instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-493: Fix Version/s: (was: 0.8.2) 0.9.0 High CPU usage on inactive server - Key: KAFKA-493 URL: https://issues.apache.org/jira/browse/KAFKA-493 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.9.0 Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, backtraces.txt, stacktrace.txt I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background task? I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. Then I run a test that essential does 10k publish operations followed by immediate consume operations (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU essentially forever. hprof results: THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, group=system) THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 21, name=main, group=main) THREAD START (obj=53ae, id = 27, name=Thread-2, group=main) THREAD START (obj=53ae, id = 28, name=Thread-3, group=main) THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, group=main) THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, group=main) THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main) THREAD START (obj=574b, id = 200012, name=ZkClient-EventThread-20-localhost:2181, group=main) THREAD START (obj=576e, id = 200014, name=main-SendThread(), group=main) THREAD START (obj=576d, id = 200013, name=main-EventThread, group=main) THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, group=main) THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, group=main) THREAD START (obj=53ae, id = 200017, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200018, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, group=main) THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, group=main) THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main) THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main) THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on broker 1, , group=main) THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on broker 1, , group=main) THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on broker 1, , group=main) THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on broker 1, , group=main) THREAD START (obj=53ae, id = 200028, name=SIGINT handler, group=system) THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main) THREAD START (obj=574b, id = 200030, name=Thread-1, group=main) THREAD START (obj=574b, id = 200031, name=Thread-0, group=main) THREAD END (id = 200031) THREAD END (id = 200029) THREAD END (id = 200020) THREAD END (id = 200019) THREAD END (id = 28) THREAD END (id = 200021) THREAD END (id = 27) THREAD END (id = 200022) THREAD END (id = 200018) THREAD END (id = 200017) THREAD 
END (id = 200012) THREAD END (id = 200013) THREAD END (id = 200014) THREAD END (id = 200025) THREAD END (id = 200023) THREAD END (id = 200026) THREAD END (id = 200024) THREAD END (id = 200011) THREAD END (id = 29) THREAD END (id = 200010) THREAD END (id = 200030) THREAD END (id = 200028) TRACE 301281: sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218) sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
[jira] [Commented] (KAFKA-747) RequestChannel re-design
[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122102#comment-14122102 ] Guozhang Wang commented on KAFKA-747: - Moving to 0.9. RequestChannel re-design Key: KAFKA-747 URL: https://issues.apache.org/jira/browse/KAFKA-747 Project: Kafka Issue Type: New Feature Components: network Reporter: Jay Kreps Assignee: Neha Narkhede Fix For: 0.9.0 We have had some discussion around how to handle queuing requests. There are two competing concerns: 1. We need to maintain request order on a per-socket basis. 2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O, request processing continues. Two Approaches We Have Considered 1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering, have the network layer only read one request at a time. 2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues. Problems With These Approaches In the first case you are not able to get any per-producer parallelism. That is, you can't read the next request while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be. In the second case there are two problems. The first is that when an I/O thread gets blocked, all request processing for sockets attached to that I/O thread will grind to a halt. If you have 10,000 connections and 10 I/O threads, then each blockage will stop 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking), this will cause big latency blips for all producers using that I/O thread. The next problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues. A Proposed Better Solution The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered. Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing. Instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
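A minimal sketch of the proposed structure: one shared list plus an in-flight key set, so a dequeue skips requests whose socket key is already being processed. Names are hypothetical; the real RequestChannel refactor would differ in detail:
{code}
import scala.collection.mutable

// Hypothetical ordered request list with per-key mutual exclusion
class KeyedRequestList[K, R] {
  private val requests = new java.util.LinkedList[(K, R)]()
  private val inFlight = mutable.Set.empty[K]

  def add(key: K, request: R): Unit = synchronized { requests.add((key, request)) }

  // Take the first request whose key has nothing in flight: requests for the
  // same key stay ordered, while other keys are free to proceed.
  def poll(): Option[(K, R)] = synchronized {
    val it = requests.iterator()
    while (it.hasNext) {
      val entry = it.next()
      if (!inFlight(entry._1)) {
        it.remove()
        inFlight += entry._1
        return Some(entry)
      }
    }
    None
  }

  // Called once the response for this key has been sent
  def complete(key: K): Unit = synchronized { inFlight -= key }
}
{code}
With this, a stalled I/O thread only blocks further requests from the sockets it is already serving, and backpressure can be applied per key rather than across all network threads.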
[jira] [Updated] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR
[ https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-661: Fix Version/s: (was: 0.8.2) 0.9.0 Prevent a shutting down broker from re-entering the ISR --- Key: KAFKA-661 URL: https://issues.apache.org/jira/browse/KAFKA-661 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Fix For: 0.9.0 There is a timing issue in controlled shutdown that affects low-volume topics. The leader that is being shut down receives a leaderAndIsrRequest informing it that it is no longer the leader, and thus starts up a follower which begins issuing fetch requests to the new leader. We then shrink the ISR and send a StopReplicaRequest to the shutting down broker. However, the new leader, upon receiving the fetch request, expands the ISR again. The impact is not critical: it can cause producers to that topic to time out, but since this primarily affects low-volume topics there are probably very few or no produce requests coming in. The shutdown logic itself seems to be working correctly in that the leader has been successfully moved. One possible approach would be to use the callback feature in the ControllerBrokerRequestBatch and wait until the StopReplicaRequest has been processed by the shutting down broker before shrinking the ISR; there are probably other ways as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1324) Debian packaging
[ https://issues.apache.org/jira/browse/KAFKA-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1324: - Fix Version/s: (was: 0.8.2) 0.9.0 Debian packaging Key: KAFKA-1324 URL: https://issues.apache.org/jira/browse/KAFKA-1324 Project: Kafka Issue Type: Improvement Components: packaging Environment: linux Reporter: David Stendardi Priority: Minor Labels: deb, debian, fpm, packaging Fix For: 0.9.0 Attachments: packaging.patch The following patch adds a releaseDeb task to the gradle build: ./gradlew releaseDeb This task should create a debian package in core/build/distributions using fpm: https://github.com/jordansissel/fpm. We decided to use fpm so other package types would be easy to provide in further iterations (e.g. rpm). *Some implementation details*: - We split releaseTarGz into two tasks: distDir and releaseTarGz. - We tried to use gradle builtin variables (project.name etc...) - By default the service will not start automatically, so the user is free to set up the service with custom configuration. Notes: * FPM is required and should be in the path. * FPM does not yet allow declaring /etc/default/kafka as a conffile (see: https://github.com/jordansissel/fpm/issues/668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1324) Debian packaging
[ https://issues.apache.org/jira/browse/KAFKA-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122104#comment-14122104 ] Guozhang Wang commented on KAFKA-1324: -- Moving to 0.9 for now. Debian packaging Key: KAFKA-1324 URL: https://issues.apache.org/jira/browse/KAFKA-1324 Project: Kafka Issue Type: Improvement Components: packaging Environment: linux Reporter: David Stendardi Priority: Minor Labels: deb, debian, fpm, packaging Fix For: 0.9.0 Attachments: packaging.patch The following patch adds a releaseDeb task to the gradle build: ./gradlew releaseDeb This task should create a debian package in core/build/distributions using fpm: https://github.com/jordansissel/fpm. We decided to use fpm so other package types would be easy to provide in further iterations (e.g. rpm). *Some implementation details*: - We split releaseTarGz into two tasks: distDir and releaseTarGz. - We tried to use gradle builtin variables (project.name etc...) - By default the service will not start automatically, so the user is free to set up the service with custom configuration. Notes: * FPM is required and should be in the path. * FPM does not yet allow declaring /etc/default/kafka as a conffile (see: https://github.com/jordansissel/fpm/issues/668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-334) Some tests fail when building on a Windows box
[ https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-334: Fix Version/s: (was: 0.8.2) 0.9.0 Some tests fail when building on a Windows box -- Key: KAFKA-334 URL: https://issues.apache.org/jira/browse/KAFKA-334 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.7 Environment: Windows 7 - reproduces under command shell, cygwin, and MINGW32 (Git Bash) Reporter: Roman Garcia Priority: Minor Labels: build-failure, test-fail Fix For: 0.9.0 Trying to create a ZIP distro from sources failed. On Win7. On cygwin, command shell and git bash. Tried with incubator-src download from ASF download page, as well as fresh checkout from latest trunk (r1329547). Once I tried the same on a Linux box, everything was working ok. svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-0.7.0 ./sbt update (OK) ./sbt package (OK) ./sbt release-zip (FAIL) Tests failing: [error] Error running kafka.integration.LazyInitProducerTest: Test FAILED [error] Error running kafka.zk.ZKLoadBalanceTest: Test FAILED [error] Error running kafka.javaapi.producer.ProducerTest: Test FAILED [error] Error running kafka.producer.ProducerTest: Test FAILED [error] Error running test: One or more subtasks failed [error] Error running doc: Scaladoc generation failed Stacks: [error] Test Failed: testZKSendWithDeadBroker junit.framework.AssertionFailedError: Message set should have another message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendWithDeadBroker(ProducerTest.scala:448) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:416) [error] Test Failed: testLoadBalance(kafka.zk.ZKLoadBalanceTest) junit.framework.AssertionFailedError: expected:5 but was:0 at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:277) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at kafka.zk.ZKLoadBalanceTest.checkSetEqual(ZKLoadBalanceTest.scala:121) at kafka.zk.ZKLoadBalanceTest.testLoadBalance(ZKLoadBalanceTest.scala:89) [error] Test Failed: testPartitionedSendToNewTopic java.lang.AssertionError: Unexpected method call send(test-topic1, 0, ByteBufferMessageSet(MessageAndOffset(message(magic = 1, attributes = 0, crc = 2326977762, payload = java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]),15), )): close(): expected: 1, actual: 0 at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45) at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73) at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92) at kafka.producer.SyncProducer$$EnhancerByCGLIB$$4385e618.send(generated) at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43) at kafka.producer.ProducerPool.send(ProducerPool.scala:100) at kafka.producer.Producer.zkSend(Producer.scala:137) at kafka.producer.Producer.send(Producer.scala:99) at kafka.producer.ProducerTest.testPartitionedSendToNewTopic(ProducerTest.scala:576) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:429) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-334) Some tests fail when building on a Windows box
[ https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122105#comment-14122105 ] Guozhang Wang commented on KAFKA-334: - Moving the fix to post 0.8.2 for now. Some tests fail when building on a Windows box -- Key: KAFKA-334 URL: https://issues.apache.org/jira/browse/KAFKA-334 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.7 Environment: Windows 7 - reproduces under command shell, cygwin, and MINGW32 (Git Bash) Reporter: Roman Garcia Priority: Minor Labels: build-failure, test-fail Fix For: 0.9.0 Trying to create a ZIP distro from sources failed. On Win7. On cygwin, command shell and git bash. Tried with incubator-src download from ASF download page, as well as fresh checkout from latest trunk (r1329547). Once I tried the same on a Linux box, everything was working ok. svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-0.7.0 ./sbt update (OK) ./sbt package (OK) ./sbt release-zip (FAIL) Tests failing: [error] Error running kafka.integration.LazyInitProducerTest: Test FAILED [error] Error running kafka.zk.ZKLoadBalanceTest: Test FAILED [error] Error running kafka.javaapi.producer.ProducerTest: Test FAILED [error] Error running kafka.producer.ProducerTest: Test FAILED [error] Error running test: One or more subtasks failed [error] Error running doc: Scaladoc generation failed Stacks: [error] Test Failed: testZKSendWithDeadBroker junit.framework.AssertionFailedError: Message set should have another message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendWithDeadBroker(ProducerTest.scala:448) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:416) [error] Test Failed: testLoadBalance(kafka.zk.ZKLoadBalanceTest) junit.framework.AssertionFailedError: expected:5 but was:0 at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:277) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at kafka.zk.ZKLoadBalanceTest.checkSetEqual(ZKLoadBalanceTest.scala:121) at kafka.zk.ZKLoadBalanceTest.testLoadBalance(ZKLoadBalanceTest.scala:89) [error] Test Failed: testPartitionedSendToNewTopic java.lang.AssertionError: Unexpected method call send(test-topic1, 0, ByteBufferMessageSet(MessageAndOffset(message(magic = 1, attributes = 0, crc = 2326977762, payload = java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]),15), )): close(): expected: 1, actual: 0 at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45) at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73) at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92) at kafka.producer.SyncProducer$$EnhancerByCGLIB$$4385e618.send(generated) at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43) at kafka.producer.ProducerPool.send(ProducerPool.scala:100) at kafka.producer.Producer.zkSend(Producer.scala:137) at kafka.producer.Producer.send(Producer.scala:99) at kafka.producer.ProducerTest.testPartitionedSendToNewTopic(ProducerTest.scala:576) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:429) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-313) Add JSON output and looping options to ConsumerOffsetChecker
[ https://issues.apache.org/jira/browse/KAFKA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122107#comment-14122107 ] Guozhang Wang commented on KAFKA-313: - [~jjkoshy] Do we still need this feature today? Add JSON output and looping options to ConsumerOffsetChecker Key: KAFKA-313 URL: https://issues.apache.org/jira/browse/KAFKA-313 Project: Kafka Issue Type: Improvement Reporter: Dave DeMaagd Priority: Minor Labels: patch Fix For: 0.8.2 Attachments: KAFKA-313-2012032200.diff Adds: * '--loop N' - causes the program to loop forever, sleeping for up to N seconds between loops (loop time minus collection time, unless that's less than 0, at which point it will just run again immediately) * '--asjson' - display as a JSON string instead of the more human-readable output format. The two options are independent of each other (you can loop with the human-readable output, or do a single-shot execution with JSON output). Existing behavior/output is maintained if neither of the above is used. Diff attached. Impacted files: core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332)
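The loop semantics described above (sleep for the interval minus the collection time, floored at zero) come down to a few lines; collect here is a hypothetical stand-in for one round of offset gathering:
{code}
// Hypothetical driver for the --loop N behavior
def runLoop(intervalSec: Int)(collect: () => Unit): Unit = {
  while (true) {
    val start = System.currentTimeMillis()
    collect()
    val sleepMs = intervalSec * 1000L - (System.currentTimeMillis() - start)
    if (sleepMs > 0) Thread.sleep(sleepMs) // otherwise run again immediately
  }
}
{code}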
[jira] [Resolved] (KAFKA-1313) Support adding replicas to existing topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-1313. -- Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.2 Support adding replicas to existing topic partitions Key: KAFKA-1313 URL: https://issues.apache.org/jira/browse/KAFKA-1313 Project: Kafka Issue Type: New Feature Components: tools Affects Versions: 0.8.0 Reporter: Marc Labbe Priority: Critical Fix For: 0.8.2 There is currently no easy way to add replicas to existing topic partitions. For example, topic create-test has been created with ReplicationFactor=1: Topic:create-test PartitionCount:3 ReplicationFactor:1 Configs: Topic: create-test Partition: 0 Leader: 1 Replicas: 1 Isr: 1 Topic: create-test Partition: 1 Leader: 2 Replicas: 2 Isr: 2 Topic: create-test Partition: 2 Leader: 3 Replicas: 3 Isr: 3 I would like to increase the ReplicationFactor to 2 (or more) so it shows up like this instead: Topic:create-test PartitionCount:3 ReplicationFactor:2 Configs: Topic: create-test Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2 Topic: create-test Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3 Topic: create-test Partition: 2 Leader: 3 Replicas: 3,1 Isr: 3,1 Use cases for this: - adding brokers and thus increasing fault tolerance - fixing human errors for topics created with wrong values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122115#comment-14122115 ] Guozhang Wang commented on KAFKA-1313: -- My bad. Closing now. Support adding replicas to existing topic partitions Key: KAFKA-1313 URL: https://issues.apache.org/jira/browse/KAFKA-1313 Project: Kafka Issue Type: New Feature Components: tools Affects Versions: 0.8.0 Reporter: Marc Labbe Priority: Critical Fix For: 0.8.2 There is currently no easy way to add replicas to existing topic partitions. For example, topic create-test has been created with ReplicationFactor=1: Topic:create-test PartitionCount:3 ReplicationFactor:1 Configs: Topic: create-test Partition: 0 Leader: 1 Replicas: 1 Isr: 1 Topic: create-test Partition: 1 Leader: 2 Replicas: 2 Isr: 2 Topic: create-test Partition: 2 Leader: 3 Replicas: 3 Isr: 3 I would like to increase the ReplicationFactor to 2 (or more) so it shows up like this instead: Topic:create-test PartitionCount:3 ReplicationFactor:2 Configs: Topic: create-test Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2 Topic: create-test Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3 Topic: create-test Partition: 2 Leader: 3 Replicas: 3,1 Isr: 3,1 Use cases for this: - adding brokers and thus increasing fault tolerance - fixing human errors for topics created with wrong values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
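For readers looking for the supported path: raising the replication factor is done with the partition reassignment tool by supplying a plan whose replica lists are larger than the current ones. A sketch of such a plan for the example above (exact invocation details vary by version):
{code}
{"version": 1,
 "partitions": [
   {"topic": "create-test", "partition": 0, "replicas": [1, 2]},
   {"topic": "create-test", "partition": 1, "replicas": [2, 3]},
   {"topic": "create-test", "partition": 2, "replicas": [3, 1]}
 ]}
{code}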
[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work
[ https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122134#comment-14122134 ] Sriharsha Chintalapani commented on KAFKA-1558: --- [~guozhang] I am waiting for some logs that show the supposed issue. I can put in some time this week to get this done if someone can give me reproduction steps or logs. Thanks AdminUtils.deleteTopic does not work Key: KAFKA-1558 URL: https://issues.apache.org/jira/browse/KAFKA-1558 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Reporter: Henning Schmiedehausen Assignee: Sriharsha Chintalapani Fix For: 0.8.2 the AdminUtils.deleteTopic method is implemented as {code} def deleteTopic(zkClient: ZkClient, topic: String) { ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic)) } {code} but the DeleteTopicCommand actually does {code} zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer) zkClient.deleteRecursive(ZkUtils.getTopicPath(topic)) {code} so I guess that the 'createPersistentPath' above should actually be {code} def deleteTopic(zkClient: ZkClient, topic: String) { ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic)) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1482) Transient test failures for kafka.admin.DeleteTopicTest
[ https://issues.apache.org/jira/browse/KAFKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122136#comment-14122136 ] Sriharsha Chintalapani commented on KAFKA-1482: --- [~guozhang] Are you working on this JIRA or can I give it a shot. Thanks Transient test failures for kafka.admin.DeleteTopicTest --- Key: KAFKA-1482 URL: https://issues.apache.org/jira/browse/KAFKA-1482 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Jun Rao Labels: newbie Fix For: 0.8.2 A couple of test cases have timing related transient test failures: kafka.admin.DeleteTopicTest testPartitionReassignmentDuringDeleteTopic FAILED junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test path not deleted even after a replica is restarted at junit.framework.Assert.fail(Assert.java:47) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333) at kafka.admin.DeleteTopicTest.testPartitionReassignmentDuringDeleteTopic(DeleteTopicTest.scala:197) kafka.admin.DeleteTopicTest testDeleteTopicDuringAddPartition FAILED junit.framework.AssertionFailedError: Replica logs not deleted after delete topic is complete at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:338) at kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216) kafka.admin.DeleteTopicTest testRequestHandlingDuringDeleteTopic FAILED org.scalatest.junit.JUnitTestFailedError: fails with exception at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:102) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:664) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:123) Caused by: org.scalatest.junit.JUnitTestFailedError: Test should fail because the topic is being deleted at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:644) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:120) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka
[ https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122243#comment-14122243 ] Joel Koshy commented on KAFKA-1510: --- [~nmarasoi] Your patch looks good to me - however, it does not cleanly apply on the latest trunk. Would you mind rebasing? If you don't have time I can take care of it as well. Force offset commits when migrating consumer offsets from zookeeper to kafka Key: KAFKA-1510 URL: https://issues.apache.org/jira/browse/KAFKA-1510 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Joel Koshy Labels: newbie Fix For: 0.8.2 Attachments: Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch, Unfiltered_to_kafka,_Incremental_to_Zookeeper.patch When migrating consumer offsets from ZooKeeper to kafka, we have to turn on dual-commit (i.e., the consumers will commit offsets to both zookeeper and kafka) in addition to setting offsets.storage to kafka. However, when we commit offsets we only commit offsets if they have changed (since the last commit). For low-volume topics or for topics that receive data in bursts offsets may not move for a long period of time. Therefore we may want to force the commit (even if offsets have not changed) when migrating (i.e., when dual-commit is enabled) - we can add a minimum interval threshold (say force commit after every 10 auto-commits) as well as on rebalance and shutdown. Also, I think it is safe to switch the default for offsets.storage from zookeeper to kafka and set the default to dual-commit (for people who have not migrated yet). We have deployed this to the largest consumers at linkedin and have not seen any issues so far (except for the migration caveat that this jira will resolve). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1482) Transient test failures for kafka.admin.DeleteTopicTest
[ https://issues.apache.org/jira/browse/KAFKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122420#comment-14122420 ] Guozhang Wang commented on KAFKA-1482: -- I am not working on that JIRA; not sure if Jun is looking into it, though. Transient test failures for kafka.admin.DeleteTopicTest --- Key: KAFKA-1482 URL: https://issues.apache.org/jira/browse/KAFKA-1482 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Jun Rao Labels: newbie Fix For: 0.8.2 A couple of test cases have timing-related transient test failures: kafka.admin.DeleteTopicTest testPartitionReassignmentDuringDeleteTopic FAILED junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test path not deleted even after a replica is restarted at junit.framework.Assert.fail(Assert.java:47) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333) at kafka.admin.DeleteTopicTest.testPartitionReassignmentDuringDeleteTopic(DeleteTopicTest.scala:197) kafka.admin.DeleteTopicTest testDeleteTopicDuringAddPartition FAILED junit.framework.AssertionFailedError: Replica logs not deleted after delete topic is complete at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:338) at kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216) kafka.admin.DeleteTopicTest testRequestHandlingDuringDeleteTopic FAILED org.scalatest.junit.JUnitTestFailedError: fails with exception at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:102) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:664) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:123) Caused by: org.scalatest.junit.JUnitTestFailedError: Test should fail because the topic is being deleted at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:644) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:120) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka
[ https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nicu marasoiu updated KAFKA-1510: - Attachment: (was: Unfiltered_to_kafka,_Incremental_to_Zookeeper.patch) Force offset commits when migrating consumer offsets from zookeeper to kafka Key: KAFKA-1510 URL: https://issues.apache.org/jira/browse/KAFKA-1510 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Joel Koshy Labels: newbie Fix For: 0.8.2 Attachments: Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch When migrating consumer offsets from ZooKeeper to kafka, we have to turn on dual-commit (i.e., the consumers will commit offsets to both zookeeper and kafka) in addition to setting offsets.storage to kafka. However, when we commit offsets we only commit offsets if they have changed (since the last commit). For low-volume topics or for topics that receive data in bursts offsets may not move for a long period of time. Therefore we may want to force the commit (even if offsets have not changed) when migrating (i.e., when dual-commit is enabled) - we can add a minimum interval threshold (say force commit after every 10 auto-commits) as well as on rebalance and shutdown. Also, I think it is safe to switch the default for offsets.storage from zookeeper to kafka and set the default to dual-commit (for people who have not migrated yet). We have deployed this to the largest consumers at linkedin and have not seen any issues so far (except for the migration caveat that this jira will resolve). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka
[ https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nicu marasoiu updated KAFKA-1510: - Attachment: Unfiltered_offsets_commit_to_kafka_rebased.patch Attached rebased patch Force offset commits when migrating consumer offsets from zookeeper to kafka Key: KAFKA-1510 URL: https://issues.apache.org/jira/browse/KAFKA-1510 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Joel Koshy Labels: newbie Fix For: 0.8.2 Attachments: Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch, Unfiltered_offsets_commit_to_kafka_rebased.patch When migrating consumer offsets from ZooKeeper to kafka, we have to turn on dual-commit (i.e., the consumers will commit offsets to both zookeeper and kafka) in addition to setting offsets.storage to kafka. However, when we commit offsets we only commit offsets if they have changed (since the last commit). For low-volume topics or for topics that receive data in bursts offsets may not move for a long period of time. Therefore we may want to force the commit (even if offsets have not changed) when migrating (i.e., when dual-commit is enabled) - we can add a minimum interval threshold (say force commit after every 10 auto-commits) as well as on rebalance and shutdown. Also, I think it is safe to switch the default for offsets.storage from zookeeper to kafka and set the default to dual-commit (for people who have not migrated yet). We have deployed this to the largest consumers at linkedin and have not seen any issues so far (except for the migration caveat that this jira will resolve). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
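For anyone following along, the migration described here comes down to two consumer settings during the transition window; a sketch of the relevant properties (the surrounding consumer construction is elided):
{code}
val props = new java.util.Properties()
props.put("offsets.storage", "kafka")    // commit offsets to Kafka-based storage
props.put("dual.commit.enabled", "true") // also commit to ZooKeeper while migrating
{code}
Once every consumer in the group is on Kafka-based storage, dual.commit.enabled can be turned off.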
[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector
[ https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122456#comment-14122456 ] nicu marasoiu commented on KAFKA-1282: -- Hi, I am not completely sure I fully understood your solution in point 2: you mean to close at most one connection per iteration, right? This is OK; the worst-case scenario is closing 100K old connections in 10 hours, one per select. As for storing the time to close in a local variable: accessing the oldest entry every iteration is O(1) and super cheap, so I would skip this optimization. Disconnect idle socket connection in Selector - Key: KAFKA-1282 URL: https://issues.apache.org/jira/browse/KAFKA-1282 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Neha Narkhede Labels: newbie++ Fix For: 0.9.0 Attachments: KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch, idleDisconnect.patch To reduce # socket connections, it would be useful for the new producer to close socket connections that are idle. We can introduce a new producer config for the idle time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
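The O(1) oldest-entry access under discussion falls out naturally from an access-ordered LinkedHashMap, whose iteration starts at the least recently used entry. A sketch of a tracker that closes at most one idle connection per select iteration (names hypothetical):
{code}
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical idle-connection tracker for the selector loop
class IdleConnectionTracker(maxIdleMs: Long) {
  // access-ordered: iteration starts at the least recently used connection
  private val lastUsed = new JLinkedHashMap[String, java.lang.Long](16, 0.75f, true)

  def touch(connectionId: String, nowMs: Long): Unit = lastUsed.put(connectionId, nowMs)

  def remove(connectionId: String): Unit = lastUsed.remove(connectionId)

  // Called once per select iteration; returns at most one expired connection to close
  def pollExpired(nowMs: Long): Option[String] = {
    val it = lastUsed.entrySet().iterator()
    if (!it.hasNext) return None
    val oldest = it.next()
    if (nowMs - oldest.getValue > maxIdleMs) { it.remove(); Some(oldest.getKey) }
    else None
  }
}
{code}
Closing at most one connection per iteration keeps the select loop's hot path cheap while still draining a large idle backlog over time.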