[jira] [Updated] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119
[ https://issues.apache.org/jira/browse/KAFKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-922:
----------------------------
    Attachment: kafka-922-v1.patch

Uploaded kafka-922-v1.patch.

> System Test - set retry.backoff.ms=300 to testcase_0119
> --------------------------------------------------------
>
>                 Key: KAFKA-922
>                 URL: https://issues.apache.org/jira/browse/KAFKA-922
>             Project: Kafka
>          Issue Type: Task
>            Reporter: John Fung
>         Attachments: kafka-922-v1.patch
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119
[ https://issues.apache.org/jira/browse/KAFKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-922:
----------------------------
    Status: Patch Available  (was: Open)

Trivial patch to set retry.backoff.ms=300.
[jira] [Commented] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers
[ https://issues.apache.org/jira/browse/KAFKA-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668994#comment-13668994 ]

Jun Rao commented on KAFKA-921:
-------------------------------
Thanks for the patch. Could we add the max lag in one place, in AbstractFetcherThread and AbstractFetcherManager? We can pass in the proper metrics name.

> Expose max lag mbean for consumers and replica fetchers
> -------------------------------------------------------
>
>                 Key: KAFKA-921
>                 URL: https://issues.apache.org/jira/browse/KAFKA-921
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: KAFKA-921-v1.patch
>
> We have a ton of consumer mbeans with names that are derived from the
> consumer id, broker being fetched from, fetcher id, etc. This makes it
> difficult to do basic monitoring of consumer/replica fetcher lag - since the
> mbean to monitor can change. A more useful metric for monitoring purposes is
> the maximum lag across all fetchers.
[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-903:
--------------------------
    Attachment: kafka-903_v3.patch

> [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1]
> (Logging.scala:109) - Attempt to swap the new high watermark file with the
> old one failed
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-903
>                 URL: https://issues.apache.org/jira/browse/KAFKA-903
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>         Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2,
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>            Reporter: Rob Withers
>            Priority: Blocker
>         Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch,
> kafka-903_v2.patch, kafka-903_v3.patch
>
> This FATAL shuts down both brokers on windows,
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1]
> (Logging.scala:109) - Attempt to swap the new high watermark file with the
> old one failed
> {2013-05-10 18:23:57,739} INFO [Thread-4] (Logging.scala:67) - [Kafka
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156} INFO [Thread-1] (Logging.scala:67) - [Kafka Server
> 0], started
> {2013-05-10 19:14:52,157} INFO [ZkClient-EventThread-32-localhost:2181]
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181]
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181]
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of
> /controller_epoch changed sent to
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0]
> (FinalRequestProcessor.java:78) - Processing request::
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0]
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists
> cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)]
> (ClientCnxn.java:838) - Reading reply sessionid:0x13e9127882e0001, packet::
> clientPath:nu
[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-903:
--------------------------
    Attachment:     (was: kafka-903_v3.patch)
[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-903:
--------------------------
    Attachment: kafka-903_v3.patch

Attaching patch v3. To address Jay's concern, instead of using a generic renameTo util, we only fall back to the non-atomic renameTo when checkpointing the high watermark file. Since both files are in the same dir and we control the naming, the other causes you listed that can make renameTo fail won't happen. I didn't do the OS-level checking since I am not sure it works well for environments like cygwin. We could guard this under a broker config parameter, but I am not sure it's worth it. Regarding Sriram's concern, this seems to be a problem for at least some versions of Java on Windows, since other projects like Hadoop (https://issues.apache.org/jira/browse/HADOOP-959) have also seen it before.
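The fallback described above can be sketched as follows. This is an illustrative sketch, not the actual patch: the class and method names are hypothetical, and the real change lives in Kafka's high watermark checkpoint code. The key point is that File.renameTo cannot atomically replace an existing target on Windows, so when the atomic rename fails we delete the old file and retry.

```java
import java.io.File;
import java.io.IOException;

public class CheckpointSwap {
    /**
     * Swap a freshly written temp file into place (hypothetical helper).
     * Fast path: a plain renameTo, which replaces the target atomically on
     * POSIX filesystems. Fallback: Windows refuses to rename over an existing
     * file, so delete the target first and rename again (non-atomic).
     */
    public static void swap(File tmp, File target) throws IOException {
        if (tmp.renameTo(target)) {
            return;  // atomic replace succeeded
        }
        // Non-atomic fallback. Both files live in the same directory and the
        // caller controls the naming, so the other renameTo failure modes
        // (cross-device moves, missing parent dirs, etc.) do not apply here.
        if (!target.delete() || !tmp.renameTo(target)) {
            throw new IOException("Failed to swap " + tmp + " into " + target);
        }
    }
}
```

A non-atomic swap leaves a small window where the checkpoint file is missing, which is why the patch confines the fallback to this one call site rather than a generic rename utility.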
[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668885#comment-13668885 ]

Swapnil Ghike commented on KAFKA-917:
-------------------------------------
Makes sense, opened KAFKA-924.

> Expose zk.session.timeout.ms in console consumer
> ------------------------------------------------
>
>                 Key: KAFKA-917
>                 URL: https://issues.apache.org/jira/browse/KAFKA-917
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.7, 0.8
>            Reporter: Swapnil Ghike
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-917.patch
>
[jira] [Created] (KAFKA-924) Specify console consumer properties via a single --property command line parameter
Swapnil Ghike created KAFKA-924:
-------------------------------
             Summary: Specify console consumer properties via a single --property command line parameter
                 Key: KAFKA-924
                 URL: https://issues.apache.org/jira/browse/KAFKA-924
             Project: Kafka
          Issue Type: Bug
            Reporter: Swapnil Ghike

Quoting Neha from KAFKA-917:

I think the right way to add access to all consumer properties is to specify them through a single --property command line parameter that takes in a "key1=value1,key2=value2,..." list. That will make sure we don't have to keep changing the console consumer as we add/remove config options on the consumer. Some configs make sense to be top level for the console consumer though, things like topic, from-beginning, groupid, etc. The rest can be specified through the "property" parameter.
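The proposed parsing can be sketched as below, assuming a plain "key1=value1,key2=value2" string with no escaping of commas or equals signs inside values. The helper name is hypothetical; the actual console consumer implementation may differ.

```java
import java.util.Properties;

public class PropertyArg {
    /**
     * Parse a "key1=value1,key2=value2,..." argument, as proposed for a
     * single --property console-consumer flag, into java.util.Properties.
     * Hypothetical sketch; not part of Kafka.
     */
    public static Properties parse(String arg) {
        Properties props = new Properties();
        if (arg == null || arg.isEmpty()) {
            return props;
        }
        for (String pair : arg.split(",")) {
            int eq = pair.indexOf('=');  // split on the first '=' only
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            props.setProperty(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return props;
    }
}
```

With such a flag, exposing a new consumer config (like zk.session.timeout.ms from KAFKA-917) would need no console-consumer code change, e.g. `--property zk.session.timeout.ms=6000,group.id=test`.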
KAFKA-259
Hi,

I have submitted a patch for KAFKA-259 and it has been approved by legal. Could you please verify it and let me know how to proceed?

https://issues.apache.org/jira/browse/KAFKA-259

Thanks for all your help,
Ashwanth
[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668846#comment-13668846 ]

Neha Narkhede commented on KAFKA-917:
-------------------------------------
This, by itself, looks ok. I think the right way to add access to all consumer properties is to specify them through a single --property command line parameter that takes in a "key1=value1,key2=value2,..." list. That will make sure we don't have to keep changing the console consumer as we add/remove config options on the consumer. Some configs make sense to be top level for the console consumer though, things like topic, from-beginning, groupid, etc. The rest can be specified through the "property" parameter. Can you file a JIRA to clean that up for 0.8? For now, we can accept this.
[jira] [Updated] (KAFKA-923) Improve controller failover latency
[ https://issues.apache.org/jira/browse/KAFKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-923:
--------------------------------
    Attachment: kafka-923-v1.patch

There are probably more improvements possible in controller failover, but at this point the aim is to focus on the low-hanging fruit and avoid affecting stability. This patch:
- optimizes the replica state machine startup by not reading the partition information for all topics from zookeeper again
- optimizes the partition state machine startup by not reading the leadership information for all partitions from zookeeper again

> Improve controller failover latency
> -----------------------------------
>
>                 Key: KAFKA-923
>                 URL: https://issues.apache.org/jira/browse/KAFKA-923
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>              Labels: kafka-0.8
>         Attachments: kafka-923-v1.patch
>
> During controller failover, we do the following things:
> 1. Increment controller epoch
> 2. Initialize controller context
> 3. Initialize replica state machine
> 4. Initialize partition state machine
> During step 2 above, we read the information of all topics and partitions,
> the replica assignments and leadership information. During steps 3 and 4, we
> re-read some of this information from zookeeper. Since the zookeeper reads
> are proportional to the number of topics and the reads are serial, it is
> important to optimize this. The zookeeper reads in steps 3 and 4 are not
> required.
[jira] [Updated] (KAFKA-923) Improve controller failover latency
[ https://issues.apache.org/jira/browse/KAFKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-923:
--------------------------------
    Status: Patch Available  (was: Open)
[jira] [Created] (KAFKA-923) Improve controller failover latency
Neha Narkhede created KAFKA-923:
-------------------------------
             Summary: Improve controller failover latency
                 Key: KAFKA-923
                 URL: https://issues.apache.org/jira/browse/KAFKA-923
             Project: Kafka
          Issue Type: Improvement
          Components: controller
    Affects Versions: 0.8
            Reporter: Neha Narkhede
            Assignee: Neha Narkhede
            Priority: Critical
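The intended optimization can be illustrated with a toy model (all names here are hypothetical, not Kafka's controller code): ZooKeeper is read once per topic while building the controller context, and the state machines then start from that cached context instead of issuing their own serial per-topic re-reads.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ControllerFailoverSketch {
    static int zkReads = 0;  // counts simulated ZooKeeper round trips

    // Simulates the serial per-topic reads: one round trip per topic.
    static Map<String, List<Integer>> readAssignmentsFromZk(List<String> topics) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String t : topics) {
            zkReads++;
            assignment.put(t, Arrays.asList(0, 1));  // dummy replica list
        }
        return assignment;
    }

    static Map<String, List<Integer>> failover(List<String> topics) {
        // Step 2: initialize controller context (the only ZooKeeper pass).
        Map<String, List<Integer>> context = readAssignmentsFromZk(topics);
        // Steps 3-4: replica/partition state machines start from the cached
        // context; no further ZooKeeper reads are needed.
        return context;
    }
}
```

Since failover cost was dominated by these serial reads, collapsing three passes over all topics into one directly cuts failover latency roughly in proportion to the topic count.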
[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668781#comment-13668781 ]

Swapnil Ghike commented on KAFKA-917:
-------------------------------------
Hi, could someone review this? Thanks!
[jira] [Created] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119
John Fung created KAFKA-922:
---------------------------
             Summary: System Test - set retry.backoff.ms=300 to testcase_0119
                 Key: KAFKA-922
                 URL: https://issues.apache.org/jira/browse/KAFKA-922
             Project: Kafka
          Issue Type: Task
            Reporter: John Fung
[jira] [Commented] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668746#comment-13668746 ]

Timothy Chen commented on KAFKA-903:
------------------------------------
Do you know when this is going to be pushed to the 0.8 branch?
[jira] [Updated] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers
[ https://issues.apache.org/jira/browse/KAFKA-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-921:
-----------------------------
    Attachment: KAFKA-921-v1.patch

This provides a max lag mbean for both the consumer fetcher manager and the replica fetcher manager, although I think it is more useful for monitoring consumers: for replica fetchers we need to closely monitor all the fetchers anyway, i.e., that set of mbeans is static. I can reduce the scope to just consumers if others agree.
[jira] [Created] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers
Joel Koshy created KAFKA-921: Summary: Expose max lag mbean for consumers and replica fetchers Key: KAFKA-921 URL: https://issues.apache.org/jira/browse/KAFKA-921 Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Joel Koshy Fix For: 0.8 We have a ton of consumer mbeans with names that are derived from the consumer id, broker being fetched from, fetcher id, etc. This makes it difficult to do basic monitoring of consumer/replica fetcher lag - since the mbean to monitor can change. A more useful metric for monitoring purposes is the maximum lag across all fetchers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
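The issue above proposes replacing a shifting set of per-fetcher mbeans with one statically named metric: the maximum lag across all fetchers. A minimal sketch of that aggregation follows; the class and method names are illustrative, not Kafka's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch (not Kafka's actual API): per-fetcher lags are
// recorded under keys derived from consumer id, broker, fetcher id, etc.,
// which change across rebalances, while a single statically named gauge
// exposes the max over all of them.
public class FetcherLagStats {
    private final Map<String, Long> lagByFetcher = new ConcurrentHashMap<>();

    public void record(String fetcherKey, long lag) {
        lagByFetcher.put(fetcherKey, lag);
    }

    // Value for a "MaxLag" gauge: the mbean name stays fixed even as
    // per-fetcher entries come and go.
    public long maxLag() {
        return lagByFetcher.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
    }
}
```

A monitoring system then watches one mbean instead of tracking mbeans whose names change whenever a consumer rebalances or a fetcher moves to a different broker.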
Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)
we apologize if you receive multiple copies of this message

===
CALL FOR PAPERS

2013 Workshop on Middleware for HPC and Big Data Systems (MHPC '13)
as part of Euro-Par 2013, Aachen, Germany
===

Date: August 27, 2013
Workshop URL: http://m-hpc.org
Springer LNCS

SUBMISSION DEADLINE:
June 10, 2013 - LNCS full paper submission (extended)
June 28, 2013 - Lightning talk abstracts

SCOPE

Extremely large, diverse, and complex data sets are generated from scientific applications, the Internet, social media and other applications. Data may be physically distributed and shared by an ever larger community. Collecting, aggregating, storing and analyzing large data volumes presents major challenges, and processing such amounts of data efficiently has become a bottleneck for scientific discovery and technological advancement. In addition, making the data accessible, understandable and interoperable poses unsolved problems. Novel middleware architectures, algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the intersection of HPC and Big Data with regard to middleware handling and optimizations. The scope covers existing and proposed middleware for HPC and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects, middleware and framework developers, and data-intensive application developers, as well as users from the scientific and engineering community, to exchange their experience in processing large datasets and to report their scientific achievements and innovative ideas. The workshop also offers a dedicated forum for these researchers to access the state of the art, to discuss problems and requirements, to identify gaps in current and planned designs, and to collaborate on strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20 min paper presentations, each followed by a 10 min discussion.
Presentations may be accompanied by interactive demonstrations.

TOPICS

Topics of interest include, but are not limited to:
- Middleware including Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig, Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data-intensive middleware architecture
- Libraries/frameworks including Apache Mahout, Giraph, UIMA and GraphLab
- NG databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system-level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis and evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling middleware, focusing on indexing/fast storing or retrieval between compute and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience with popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
June 10, 2013 - Full paper submission (extended)
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date

TPC CHAIR
Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE
Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guo, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory V
[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
[ https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668466#comment-13668466 ] Sriram Subramanian commented on KAFKA-911: -- I suggest we wait for my patch. My patch changes quite a bit of this logic and it just adds to the merge problem. > Bug in controlled shutdown logic in controller leads to controller not > sending out some state change request > - > > Key: KAFKA-911 > URL: https://issues.apache.org/jira/browse/KAFKA-911 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8 >Reporter: Neha Narkhede >Assignee: Neha Narkhede >Priority: Blocker > Labels: kafka-0.8, p1 > Attachments: kafka-911-v1.patch, kafka-911-v2.patch > > > The controlled shutdown logic in the controller first tries to move the > leaders from the broker being shutdown. Then it tries to remove the broker > from the isr list. During that operation, it does not synchronize on the > controllerLock. This causes a race condition while dispatching data using the > controller's channel manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
[ https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-911: Attachment: kafka-911-v2.patch I agree with Joel's suggestion. Removing the shutting down brokers from the ISR is better. This patch sends the LeaderAndIsrRequest with the reduced isr to the new leader for the partitions on the shutting down brokers. This ensures the leader will remove the shutting down broker from the isr in zookeeper. This also makes it unnecessary for the shrunk isr zookeeper write to happen during the controlled shutdown on the controller. > Bug in controlled shutdown logic in controller leads to controller not > sending out some state change request > - > > Key: KAFKA-911 > URL: https://issues.apache.org/jira/browse/KAFKA-911 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8 >Reporter: Neha Narkhede >Assignee: Neha Narkhede >Priority: Blocker > Labels: kafka-0.8, p1 > Attachments: kafka-911-v1.patch, kafka-911-v2.patch > > > The controlled shutdown logic in the controller first tries to move the > leaders from the broker being shutdown. Then it tries to remove the broker > from the isr list. During that operation, it does not synchronize on the > controllerLock. This causes a race condition while dispatching data using the > controller's channel manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
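The v2 patch description above amounts to filtering the shutting-down broker ids out of the ISR before sending the LeaderAndIsrRequest, so the new leader persists the shrunk ISR to ZooKeeper itself. A hedged sketch of that filtering step follows; the names are illustrative and this is not the actual controller code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch (not the actual controller code): before sending
// the LeaderAndIsrRequest for a partition hosted on a shutting-down
// broker, drop the shutting-down broker ids from the ISR. The new leader
// then removes those brokers from the ISR in ZooKeeper, so the controller
// does not need to perform that write during controlled shutdown.
public class IsrShrink {
    public static List<Integer> reducedIsr(List<Integer> isr, Set<Integer> shuttingDown) {
        return isr.stream()
                .filter(brokerId -> !shuttingDown.contains(brokerId))
                .collect(Collectors.toList());
    }
}
```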
[jira] [Closed] (KAFKA-916) Deadlock between fetcher shutdown and handling partitions with error
[ https://issues.apache.org/jira/browse/KAFKA-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Koshy closed KAFKA-916. Thanks for the review. Committed to 0.8 > Deadlock between fetcher shutdown and handling partitions with error > > > Key: KAFKA-916 > URL: https://issues.apache.org/jira/browse/KAFKA-916 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8 >Reporter: Joel Koshy > Fix For: 0.8 > > Attachments: KAFKA-916-v1.patch > > > Here is another consumer deadlock that we encountered. All consumers are > vulnerable to this during a rebalance if there happen to be partitions in > error. > On a rebalance, the fetcher manager closes all fetchers and this holds on to > the fetcher thread map's lock. (mapLock in AbstractFetcherManager). [t1] > While the fetcher manager is iterating over fetchers to stop them, a fetcher > that is yet to be stopped hits an error on a partition and proceeds to > handle partitions with error [t2]. This handling involves looking up the > fetcher for that partition and then removing it from the fetcher's set of > partitions to consume. This requires grabbing the same map lock in [t1], > hence the deadlock. 
> [t1] > 2013-05-22_20:23:11.95767 "main" prio=10 tid=0x7f1b24007800 nid=0x573b > waiting on condition [0x7f1b2bd38000] > 2013-05-22_20:23:11.95767java.lang.Thread.State: WAITING (parking) > 2013-05-22_20:23:11.95767 at sun.misc.Unsafe.park(Native Method) > 2013-05-22_20:23:11.95767 - parking to wait for <0x7f1a25780598> (a > java.util.concurrent.CountDownLatch$Sync) > 2013-05-22_20:23:11.95767 at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > 2013-05-22_20:23:11.95767 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207) > 2013-05-22_20:23:11.95768 at > kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:68) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:79) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:78) > 2013-05-22_20:23:11.95769 at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) > 2013-05-22_20:23:11.95769 at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) > 2013-05-22_20:23:11.95770 at > scala.collection.Iterator$class.foreach(Iterator.scala:631) > 2013-05-22_20:23:11.95770 at > scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161) > 2013-05-22_20:23:11.95770 at > 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194) > 2013-05-22_20:23:11.95770 at > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > 2013-05-22_20:23:11.95771 at > scala.collection.mutable.HashMap.foreach(HashMap.scala:80) > 2013-05-22_20:23:11.95771 at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:78) > ---> 2013-05-22_20:23:11.95771- locked <0x7f1a2ae92510> (a > java.lang.Object) > 2013-05-22_20:23:11.95771 at > kafka.consumer.ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:156) > 2013-05-22_20:23:11.95771 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$closeFetchersForQueues(ZookeeperConsumerConnector.scala:488) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.closeFetchers(ZookeeperConsumerConnector.scala:525) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:422) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) > 2013-05-22_20:23:11.95772 at > scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282) > 2013-05-22_20:23:11.95773 at > scala.col
[jira] [Resolved] (KAFKA-916) Deadlock between fetcher shutdown and handling partitions with error
[ https://issues.apache.org/jira/browse/KAFKA-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Koshy resolved KAFKA-916. -- Resolution: Fixed > Deadlock between fetcher shutdown and handling partitions with error > > > Key: KAFKA-916 > URL: https://issues.apache.org/jira/browse/KAFKA-916 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8 >Reporter: Joel Koshy > Fix For: 0.8 > > Attachments: KAFKA-916-v1.patch > > > Here is another consumer deadlock that we encountered. All consumers are > vulnerable to this during a rebalance if there happen to be partitions in > error. > On a rebalance, the fetcher manager closes all fetchers and this holds on to > the fetcher thread map's lock. (mapLock in AbstractFetcherManager). [t1] > While the fetcher manager is iterating over fetchers to stop them, a fetcher > that is yet to be stopped hits an error on a partition and proceeds to > handle partitions with error [t2]. This handling involves looking up the > fetcher for that partition and then removing it from the fetcher's set of > partitions to consume. This requires grabbing the same map lock in [t1], > hence the deadlock. 
> [t1] > 2013-05-22_20:23:11.95767 "main" prio=10 tid=0x7f1b24007800 nid=0x573b > waiting on condition [0x7f1b2bd38000] > 2013-05-22_20:23:11.95767java.lang.Thread.State: WAITING (parking) > 2013-05-22_20:23:11.95767 at sun.misc.Unsafe.park(Native Method) > 2013-05-22_20:23:11.95767 - parking to wait for <0x7f1a25780598> (a > java.util.concurrent.CountDownLatch$Sync) > 2013-05-22_20:23:11.95767 at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > 2013-05-22_20:23:11.95767 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) > 2013-05-22_20:23:11.95768 at > java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207) > 2013-05-22_20:23:11.95768 at > kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:68) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:79) > 2013-05-22_20:23:11.95769 at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:78) > 2013-05-22_20:23:11.95769 at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) > 2013-05-22_20:23:11.95769 at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) > 2013-05-22_20:23:11.95770 at > scala.collection.Iterator$class.foreach(Iterator.scala:631) > 2013-05-22_20:23:11.95770 at > scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161) > 2013-05-22_20:23:11.95770 at > 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194) > 2013-05-22_20:23:11.95770 at > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > 2013-05-22_20:23:11.95771 at > scala.collection.mutable.HashMap.foreach(HashMap.scala:80) > 2013-05-22_20:23:11.95771 at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:78) > ---> 2013-05-22_20:23:11.95771- locked <0x7f1a2ae92510> (a > java.lang.Object) > 2013-05-22_20:23:11.95771 at > kafka.consumer.ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:156) > 2013-05-22_20:23:11.95771 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$closeFetchersForQueues(ZookeeperConsumerConnector.scala:488) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.closeFetchers(ZookeeperConsumerConnector.scala:525) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:422) > 2013-05-22_20:23:11.95772 at > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) > 2013-05-22_20:23:11.95772 at > scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282) > 2013-05-22_20:23:11.95773 at > scala.collection.immutable.Range$$an
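The deadlock described above follows a classic pattern: shutdown joins fetcher threads while holding the fetcher map's lock, and a fetcher's error-handling path needs that same lock before it can exit. One common remedy, sketched below under assumed names (an illustration of the pattern, not necessarily the committed KAFKA-916 fix), is to snapshot the fetchers under the lock and await their shutdown outside it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of breaking the mapLock deadlock: never block on a
// fetcher thread's termination while holding the lock that the fetcher's
// own error path needs.
public class FetcherManager {
    private final Object mapLock = new Object();
    private final Map<String, Thread> fetchers = new HashMap<>();

    public void addFetcher(String id, Thread t) {
        synchronized (mapLock) { fetchers.put(id, t); }
        t.start();
    }

    // Called from a fetcher's error path; needs mapLock. This would
    // deadlock if closeAllFetchers() joined threads while holding mapLock.
    public void removePartitions(String id) {
        synchronized (mapLock) { fetchers.remove(id); }
    }

    // Remedy: snapshot under the lock, then interrupt and join outside it,
    // so a fetcher blocked on removePartitions() can still make progress.
    public void closeAllFetchers() {
        List<Thread> snapshot;
        synchronized (mapLock) {
            snapshot = new ArrayList<>(fetchers.values());
            fetchers.clear();
        }
        for (Thread t : snapshot) t.interrupt();
        for (Thread t : snapshot) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}
```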