[jira] [Updated] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119

2013-05-28 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-922:


Attachment: kafka-922-v1.patch

Uploaded kafka-922-v1.patch.

> System Test - set retry.backoff.ms=300 to testcase_0119
> ---
>
> Key: KAFKA-922
> URL: https://issues.apache.org/jira/browse/KAFKA-922
> Project: Kafka
>  Issue Type: Task
>Reporter: John Fung
> Attachments: kafka-922-v1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119

2013-05-28 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-922:


Status: Patch Available  (was: Open)

Trivial patch to set retry.backoff.ms=300.
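For context, retry.backoff.ms is the producer-side pause before retrying a 
failed send. A minimal sketch of the setting in question (hedged: the actual 
patch edits the testcase_0119 system-test config, whose file layout isn't 
shown here):

    import java.util.Properties

    val props = new Properties()
    // other required 0.8 producer settings omitted for brevity; the value
    // the patch sets for testcase_0119 is the backoff below
    props.put("retry.backoff.ms", "300")  // wait 300 ms before retrying a failed send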

> System Test - set retry.backoff.ms=300 to testcase_0119
> ---
>
> Key: KAFKA-922
> URL: https://issues.apache.org/jira/browse/KAFKA-922
> Project: Kafka
>  Issue Type: Task
>Reporter: John Fung
> Attachments: kafka-922-v1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers

2013-05-28 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668994#comment-13668994
 ] 

Jun Rao commented on KAFKA-921:
---

Thanks for the patch. Could we add the max lag in one place in 
AbstractFetcherThread and AbstractFetcherManager? We can pass in the proper 
metrics name.
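For illustration, a hedged sketch of the consolidation being suggested, with 
stand-in types (the real classes live in kafka.server; none of these names are 
the committed code):

    import scala.collection.mutable

    // Stand-in for a fetcher thread that tracks its own lag per partition.
    class FetcherThread {
      val lagByPartition = mutable.Map[String, Long]()
      def maxLag: Long =
        if (lagByPartition.isEmpty) 0L else lagByPartition.values.max
    }

    // One place to compute max lag across all fetchers; the consumer and
    // replica fetcher managers would each pass in their own metric name.
    class FetcherManager(metricName: String) {
      val fetchers = mutable.Map[Int, FetcherThread]()
      def maxLag: Long =
        if (fetchers.isEmpty) 0L else fetchers.values.map(_.maxLag).max
      // registering maxLag as a gauge named metricName + "-MaxLag" is left
      // to the metrics library (Kafka 0.8 ships metrics-core for this)
    }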

> Expose max lag mbean for consumers and replica fetchers
> ---
>
> Key: KAFKA-921
> URL: https://issues.apache.org/jira/browse/KAFKA-921
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Joel Koshy
> Fix For: 0.8
>
> Attachments: KAFKA-921-v1.patch
>
>
> We have a ton of consumer mbeans with names that are derived from the 
> consumer id, broker being fetched from, fetcher id, etc. This makes it 
> difficult to do basic monitoring of consumer/replica fetcher lag - since the 
> mbean to monitor can change. A more useful metric for monitoring purposes is 
> the maximum lag across all fetchers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-05-28 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-903:
--

Attachment: kafka-903_v3.patch

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on Windows:
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of 
> /controller_epoch changed sent to 
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:78) - Processing request:: 
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe 
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists 
> cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] 
> (ClientCnxn.java:838) - Reading reply sessionid:0x13e9127882e0001, packet:: 
> clientPath:nu

[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-05-28 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-903:
--

Attachment: (was: kafka-903_v3.patch)

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on Windows:
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of 
> /controller_epoch changed sent to 
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:78) - Processing request:: 
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe 
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists 
> cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] 
> (ClientCnxn.java:838) - Reading reply sessionid:0x13e9127882e0001, packet:: 
> cl

[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-05-28 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-903:
--

Attachment: kafka-903_v3.patch

Attached patch v3.

To address Jay's concern, instead of using a generic renameTo util, this patch 
only falls back to the non-atomic renameTo when checkpointing the high 
watermark file. Since both files are in the same dir and we control the naming, 
the other causes you listed that can make renameTo fail won't happen. I didn't 
do the OS-level check since I am not sure that it works well in environments 
like Cygwin. We could guard this under a broker config parameter, but I am not 
sure it's worth it.
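A hedged sketch of the fallback described above, using plain java.io.File 
(illustrative, not the patch itself):

    import java.io.{File, IOException}

    // Try the atomic rename first. On Windows, renameTo fails when the
    // destination exists, so fall back to delete-then-rename. Both files
    // are in the same directory under our naming control, so the other
    // renameTo failure modes don't apply here.
    def swapCheckpointFile(tmp: File, dest: File) {
      if (!tmp.renameTo(dest)) {
        dest.delete()
        if (!tmp.renameTo(dest))
          throw new IOException("Failed to swap " + tmp + " with " + dest)
      }
    }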

For Sriram's concern, this seems to be a problem at least for some versions of 
Java on Windows, since other projects like Hadoop 
(https://issues.apache.org/jira/browse/HADOOP-959) have seen it before.

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch, kafka-903_v3.patch
>
>
> This FATAL shuts down both brokers on Windows:
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - De

[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer

2013-05-28 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668885#comment-13668885
 ] 

Swapnil Ghike commented on KAFKA-917:
-

Makes sense, opened KAFKA-924.

> Expose zk.session.timeout.ms in console consumer
> 
>
> Key: KAFKA-917
> URL: https://issues.apache.org/jira/browse/KAFKA-917
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.7, 0.8
>Reporter: Swapnil Ghike
>Assignee: Swapnil Ghike
>Priority: Blocker
>  Labels: bugs
> Fix For: 0.8
>
> Attachments: kafka-917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (KAFKA-924) Specify console consumer properties via a single --property command line parameter

2013-05-28 Thread Swapnil Ghike (JIRA)
Swapnil Ghike created KAFKA-924:
---

 Summary: Specify console consumer properties via a single 
--property command line parameter
 Key: KAFKA-924
 URL: https://issues.apache.org/jira/browse/KAFKA-924
 Project: Kafka
  Issue Type: Bug
Reporter: Swapnil Ghike


Quoting Neha from KAFKA-917:

I think the right way to add access to all consumer properties is to specify 
them through a single --property command line parameter that takes in a 
"key1=value1,key2=value2,..." list. That will make sure we don't have to keep 
changing the console consumer as we add/remove config options on the consumer. 
Some configs make sense to be top level for the console consumer though, things 
like topic, from-beginning, groupid etc. The rest can be specified through the 
"property" parameter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


KAFKA-259

2013-05-28 Thread Ashwanth Fernando
Hi,
I have submitted a patch for KAFKA-259 and it's been approved by legal. Can you 
please go ahead and verify it and let me know how to proceed?

https://issues.apache.org/jira/browse/KAFKA-259

Thanks for all your help,
Ashwanth


[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer

2013-05-28 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668846#comment-13668846
 ] 

Neha Narkhede commented on KAFKA-917:
-

This, by itself, looks ok. I think the right way to add access to all consumer 
properties is to specify them through a single --property command line 
parameter that takes in a "key1=value1,key2=value2,..." list. That will make 
sure we don't have to keep changing the console consumer as we add/remove 
config options on the consumer. Some configs make sense to be top level for the 
console consumer though, things like topic, from-beginning, groupid etc. The 
rest can be specified through the "property" parameter. Can you file a JIRA to 
clean that up for 0.8? For now, we can accept this.

> Expose zk.session.timeout.ms in console consumer
> 
>
> Key: KAFKA-917
> URL: https://issues.apache.org/jira/browse/KAFKA-917
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.7, 0.8
>Reporter: Swapnil Ghike
>Assignee: Swapnil Ghike
>Priority: Blocker
>  Labels: bugs
> Fix For: 0.8
>
> Attachments: kafka-917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-923) Improve controller failover latency

2013-05-28 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-923:


Attachment: kafka-923-v1.patch

There are probably more improvements possible in controller failover, but at 
this point the aim is to focus on the low-hanging fruit and avoid affecting 
stability.

This patch (sketched below)
- optimizes the replica state machine startup by not re-reading the partition 
information for all topics from zookeeper
- optimizes the partition state machine startup by not re-reading the 
leadership information for all partitions from zookeeper
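A hedged sketch of the shape of the change, with stand-in types (all names 
illustrative): the state machines take their initial state from the context 
cached in step 2 instead of re-reading zookeeper.

    case class PartitionInfo(leader: Int, isr: List[Int])

    class ControllerContext {
      // populated once, in step 2, via the serial zookeeper reads
      var partitionLeadershipInfo = Map[String, PartitionInfo]()
    }

    class PartitionStateMachine(context: ControllerContext) {
      var state = Map[String, String]()
      // step 4: initialize from the in-memory cache -- no zookeeper reads
      def startup() {
        state = context.partitionLeadershipInfo.map { case (p, info) =>
          p -> (if (info.leader >= 0) "Online" else "Offline")
        }
      }
    }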


> Improve controller failover latency
> ---
>
> Key: KAFKA-923
> URL: https://issues.apache.org/jira/browse/KAFKA-923
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>Priority: Critical
>  Labels: kafka-0.8
> Attachments: kafka-923-v1.patch
>
>
> During controller failover, we do the following things -
> 1. Increment controller epoch 
> 2. Initialize controller context
> 3. Initialize replica state machine
> 4. Initialize partition state machine
> During step 2 above, we read the information of all topics and partitions, 
> the replica assignments and leadership information. During step 3 and 4, we 
> re-read some of this information from zookeeper. Since the zookeeper reads 
> are proportional to the number of topics and the reads are serial, it is 
> important to optimize this. The zookeeper reads in steps 3 and 4 are not 
> required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-923) Improve controller failover latency

2013-05-28 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-923:


Status: Patch Available  (was: Open)

> Improve controller failover latency
> ---
>
> Key: KAFKA-923
> URL: https://issues.apache.org/jira/browse/KAFKA-923
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>Priority: Critical
>  Labels: kafka-0.8
>
> During controller failover, we do the following things -
> 1. Increment controller epoch 
> 2. Initialize controller context
> 3. Initialize replica state machine
> 4. Initialize partition state machine
> During step 2 above, we read the information of all topics and partitions, 
> the replica assignments and leadership information. During step 3 and 4, we 
> re-read some of this information from zookeeper. Since the zookeeper reads 
> are proportional to the number of topics and the reads are serial, it is 
> important to optimize this. The zookeeper reads in steps 3 and 4 are not 
> required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (KAFKA-923) Improve controller failover latency

2013-05-28 Thread Neha Narkhede (JIRA)
Neha Narkhede created KAFKA-923:
---

 Summary: Improve controller failover latency
 Key: KAFKA-923
 URL: https://issues.apache.org/jira/browse/KAFKA-923
 Project: Kafka
  Issue Type: Improvement
  Components: controller
Affects Versions: 0.8
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical


During controller failover, we do the following things -

1. Increment controller epoch 
2. Initialize controller context
3. Initialize replica state machine
4. Initialize partition state machine

During step 2 above, we read the information of all topics and partitions, the 
replica assignments and leadership information. During step 3 and 4, we re-read 
some of this information from zookeeper. Since the zookeeper reads are 
proportional to the number of topics and the reads are serial, it is important 
to optimize this. The zookeeper reads in steps 3 and 4 are not required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer

2013-05-28 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668781#comment-13668781
 ] 

Swapnil Ghike commented on KAFKA-917:
-

Hi, could someone review this? Thanks!

> Expose zk.session.timeout.ms in console consumer
> 
>
> Key: KAFKA-917
> URL: https://issues.apache.org/jira/browse/KAFKA-917
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.7, 0.8
>Reporter: Swapnil Ghike
>Assignee: Swapnil Ghike
>Priority: Blocker
>  Labels: bugs
> Fix For: 0.8
>
> Attachments: kafka-917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (KAFKA-922) System Test - set retry.backoff.ms=300 to testcase_0119

2013-05-28 Thread John Fung (JIRA)
John Fung created KAFKA-922:
---

 Summary: System Test - set retry.backoff.ms=300 to testcase_0119
 Key: KAFKA-922
 URL: https://issues.apache.org/jira/browse/KAFKA-922
 Project: Kafka
  Issue Type: Task
Reporter: John Fung




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-05-28 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668746#comment-13668746
 ] 

Timothy Chen commented on KAFKA-903:


Do you know when this is going to be pushed to the 0.8 branch?

> [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> ---
>
> Key: KAFKA-903
> URL: https://issues.apache.org/jira/browse/KAFKA-903
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8
> Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
> probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
> commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
> jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
> jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
> kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>Reporter: Rob Withers
>Priority: Blocker
> Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, 
> kafka-903_v2.patch
>
>
> This FATAL shuts down both brokers on Windows:
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
> messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
> sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
> broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
> (Logging.scala:109) - Attempt to swap the new high watermark file with the 
> old one failed
> {2013-05-10 18:23:57,739}  INFO [Thread-4] (Logging.scala:67) - [Kafka 
> Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156}  INFO [Thread-1] (Logging.scala:67) - [Kafka Server 
> 0], started
> {2013-05-10 19:14:52,157}  INFO [ZkClient-EventThread-32-localhost:2181] 
> (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] 
> (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of 
> /controller_epoch changed sent to 
> kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:78) - Processing request:: 
> sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe 
> txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] 
> (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists 
> cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] 
> (ClientCn

[jira] [Updated] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers

2013-05-28 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy updated KAFKA-921:
-

Attachment: KAFKA-921-v1.patch

This provides a max lag mbean for both the consumer fetcher manager and the 
replica fetcher manager, although I think it is more useful for monitoring 
consumers. For replica fetchers we need to closely monitor all replica fetchers 
anyway; i.e., the set of mbeans is static. I can reduce the scope to just 
consumers if others agree.

> Expose max lag mbean for consumers and replica fetchers
> ---
>
> Key: KAFKA-921
> URL: https://issues.apache.org/jira/browse/KAFKA-921
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Joel Koshy
> Fix For: 0.8
>
> Attachments: KAFKA-921-v1.patch
>
>
> We have a ton of consumer mbeans with names that are derived from the 
> consumer id, broker being fetched from, fetcher id, etc. This makes it 
> difficult to do basic monitoring of consumer/replica fetcher lag - since the 
> mbean to monitor can change. A more useful metric for monitoring purposes is 
> the maximum lag across all fetchers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (KAFKA-921) Expose max lag mbean for consumers and replica fetchers

2013-05-28 Thread Joel Koshy (JIRA)
Joel Koshy created KAFKA-921:


 Summary: Expose max lag mbean for consumers and replica fetchers
 Key: KAFKA-921
 URL: https://issues.apache.org/jira/browse/KAFKA-921
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Joel Koshy
 Fix For: 0.8


We have a ton of consumer mbeans with names that are derived from the consumer 
id, broker being fetched from, fetcher id, etc. This makes it difficult to do 
basic monitoring of consumer/replica fetcher lag - since the mbean to monitor 
can change. A more useful metric for monitoring purposes is the maximum lag 
across all fetchers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-05-28 Thread MHPC 2013
We apologize if you receive multiple copies of this message.

===

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINE:

June 10, 2013 - LNCS Full paper submission (extended)
June 28, 2013 - Lightning Talk abstracts


SCOPE

Extremely large, diverse, and complex data sets are generated from
scientific applications, the Internet, social media and other sources.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
presents major challenges. Processing such volumes of data efficiently
has become a bottleneck for scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable poses unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data with regard to middleware handling
and optimizations. In scope are existing and proposed middleware for HPC
and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievements and innovative ideas. The workshop also offers a dedicated forum
for these researchers to access the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate in strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20-minute paper
presentations, each followed by a 10-minute discussion.
Presentations may be accompanied by interactive demonstrations.


TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig,
Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
June 10, 2013 - Full paper submission (extended)
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date


TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guo, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory, USA
V

[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-28 Thread Sriram Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668466#comment-13668466
 ] 

Sriram Subramanian commented on KAFKA-911:
--

I suggest we wait for my patch. My patch changes quite a bit of this logic, and 
this one just adds to the merge problem.

> Bug in controlled shutdown logic in controller leads to controller not 
> sending out some state change request 
> -
>
> Key: KAFKA-911
> URL: https://issues.apache.org/jira/browse/KAFKA-911
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>Priority: Blocker
>  Labels: kafka-0.8, p1
> Attachments: kafka-911-v1.patch, kafka-911-v2.patch
>
>
> The controlled shutdown logic in the controller first tries to move the 
> leaders from the broker being shutdown. Then it tries to remove the broker 
> from the isr list. During that operation, it does not synchronize on the 
> controllerLock. This causes a race condition while dispatching data using the 
> controller's channel manager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-28 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-911:


Attachment: kafka-911-v2.patch

I agree with Joel's suggestion. Removing the shutting-down brokers from the ISR 
is better. This patch sends the LeaderAndIsrRequest with the reduced isr to the 
new leader for the partitions on the shutting-down brokers. This ensures the 
leader will remove the shutting-down broker from the isr in zookeeper. It also 
makes the shrunk-isr zookeeper write during controlled shutdown on the 
controller unnecessary.
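A hedged sketch of the isr shrinking described above (helper names are 
hypothetical):

    // Exclude the shutting-down broker from the isr; the result goes to
    // the new leader in a LeaderAndIsrRequest, and the leader, not the
    // controller, then persists the shrunk isr to zookeeper.
    def shrinkIsr(currentIsr: List[Int], shuttingDownBroker: Int): List[Int] =
      currentIsr.filterNot(_ == shuttingDownBroker)

    // e.g. shrinkIsr(List(1, 2, 3), shuttingDownBroker = 2) == List(1, 3)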

> Bug in controlled shutdown logic in controller leads to controller not 
> sending out some state change request 
> -
>
> Key: KAFKA-911
> URL: https://issues.apache.org/jira/browse/KAFKA-911
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>Priority: Blocker
>  Labels: kafka-0.8, p1
> Attachments: kafka-911-v1.patch, kafka-911-v2.patch
>
>
> The controlled shutdown logic in the controller first tries to move the 
> leaders from the broker being shutdown. Then it tries to remove the broker 
> from the isr list. During that operation, it does not synchronize on the 
> controllerLock. This causes a race condition while dispatching data using the 
> controller's channel manager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Closed] (KAFKA-916) Deadlock between fetcher shutdown and handling partitions with error

2013-05-28 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy closed KAFKA-916.



Thanks for the review. Committed to 0.8.

> Deadlock between fetcher shutdown and handling partitions with error
> 
>
> Key: KAFKA-916
> URL: https://issues.apache.org/jira/browse/KAFKA-916
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Joel Koshy
> Fix For: 0.8
>
> Attachments: KAFKA-916-v1.patch
>
>
> Here is another consumer deadlock that we encountered. All consumers are
> vulnerable to this during a rebalance if there happen to be partitions in
> error.
> On a rebalance, the fetcher manager closes all fetchers and this holds on to
> the fetcher thread map's lock. (mapLock in AbstractFetcherManager). [t1]
> While the fetcher manager is iterating over fetchers to stop them, a fetcher
> that is yet to be stopped hits an error on a partition and proceeds to
> handle partitions with error [t2]. This handling involves looking up the
> fetcher for that partition and then removing it from the fetcher's set of
> partitions to consume. This requires grabbing the same map lock in [t1],
> hence the deadlock.
> [t1]
> 2013-05-22_20:23:11.95767 "main" prio=10 tid=0x7f1b24007800 nid=0x573b 
> waiting on condition [0x7f1b2bd38000]
> 2013-05-22_20:23:11.95767java.lang.Thread.State: WAITING (parking)
> 2013-05-22_20:23:11.95767 at sun.misc.Unsafe.park(Native Method)
> 2013-05-22_20:23:11.95767 - parking to wait for  <0x7f1a25780598> (a 
> java.util.concurrent.CountDownLatch$Sync)
> 2013-05-22_20:23:11.95767 at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> 2013-05-22_20:23:11.95767 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
> 2013-05-22_20:23:11.95768 at 
> kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:68)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:79)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:78)
> 2013-05-22_20:23:11.95769 at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
> 2013-05-22_20:23:11.95769 at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.Iterator$class.foreach(Iterator.scala:631)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> 2013-05-22_20:23:11.95771 at 
> scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
> 2013-05-22_20:23:11.95771 at 
> kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:78)
> ---> 2013-05-22_20:23:11.95771- locked <0x7f1a2ae92510> (a 
> java.lang.Object)
> 2013-05-22_20:23:11.95771 at 
> kafka.consumer.ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:156)
> 2013-05-22_20:23:11.95771 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$closeFetchersForQueues(ZookeeperConsumerConnector.scala:488)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.closeFetchers(ZookeeperConsumerConnector.scala:525)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:422)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374)
> 2013-05-22_20:23:11.95772 at 
> scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
> 2013-05-22_20:23:11.95773 at 
> scala.col
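One common remedy for the lock-ordering problem described in this issue, 
sketched below with illustrative names (not necessarily the committed fix): 
snapshot the fetcher references while holding the map lock, then block on their 
shutdown after releasing it, so an error-handling thread can still acquire the 
lock and make progress.

    import scala.collection.mutable

    class FetcherThread { def shutdown() {} }

    val mapLock = new Object
    val fetcherThreadMap = mutable.Map[Int, FetcherThread]()

    def closeAllFetchers() {
      // copy references under the lock...
      val fetchers = mapLock.synchronized { fetcherThreadMap.values.toList }
      // ...but wait for shutdown outside it, so threads that need mapLock
      // to handle partitions in error are not blocked against us
      fetchers.foreach(_.shutdown())
      mapLock.synchronized { fetcherThreadMap.clear() }
    }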

[jira] [Resolved] (KAFKA-916) Deadlock between fetcher shutdown and handling partitions with error

2013-05-28 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy resolved KAFKA-916.
--

Resolution: Fixed

> Deadlock between fetcher shutdown and handling partitions with error
> 
>
> Key: KAFKA-916
> URL: https://issues.apache.org/jira/browse/KAFKA-916
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Joel Koshy
> Fix For: 0.8
>
> Attachments: KAFKA-916-v1.patch
>
>
> Here is another consumer deadlock that we encountered. All consumers are
> vulnerable to this during a rebalance if there happen to be partitions in
> error.
> On a rebalance, the fetcher manager closes all fetchers and this holds on to
> the fetcher thread map's lock. (mapLock in AbstractFetcherManager). [t1]
> While the fetcher manager is iterating over fetchers to stop them, a fetcher
> that is yet to be stopped hits an error on a partition and proceeds to
> handle partitions with error [t2]. This handling involves looking up the
> fetcher for that partition and then removing it from the fetcher's set of
> partitions to consume. This requires grabbing the same map lock in [t1],
> hence the deadlock.
> [t1]
> 2013-05-22_20:23:11.95767 "main" prio=10 tid=0x7f1b24007800 nid=0x573b 
> waiting on condition [0x7f1b2bd38000]
> 2013-05-22_20:23:11.95767java.lang.Thread.State: WAITING (parking)
> 2013-05-22_20:23:11.95767 at sun.misc.Unsafe.park(Native Method)
> 2013-05-22_20:23:11.95767 - parking to wait for  <0x7f1a25780598> (a 
> java.util.concurrent.CountDownLatch$Sync)
> 2013-05-22_20:23:11.95767 at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> 2013-05-22_20:23:11.95767 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> 2013-05-22_20:23:11.95768 at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
> 2013-05-22_20:23:11.95768 at 
> kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:68)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:79)
> 2013-05-22_20:23:11.95769 at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$1.apply(AbstractFetcherManager.scala:78)
> 2013-05-22_20:23:11.95769 at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
> 2013-05-22_20:23:11.95769 at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.Iterator$class.foreach(Iterator.scala:631)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
> 2013-05-22_20:23:11.95770 at 
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> 2013-05-22_20:23:11.95771 at 
> scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
> 2013-05-22_20:23:11.95771 at 
> kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:78)
> ---> 2013-05-22_20:23:11.95771- locked <0x7f1a2ae92510> (a 
> java.lang.Object)
> 2013-05-22_20:23:11.95771 at 
> kafka.consumer.ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:156)
> 2013-05-22_20:23:11.95771 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$closeFetchersForQueues(ZookeeperConsumerConnector.scala:488)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.closeFetchers(ZookeeperConsumerConnector.scala:525)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:422)
> 2013-05-22_20:23:11.95772 at 
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374)
> 2013-05-22_20:23:11.95772 at 
> scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
> 2013-05-22_20:23:11.95773 at 
> scala.collection.immutable.Range$$an