[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803893#comment-15803893
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

We are going to run an MD5 checksum check on every file this morning.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread 
> with regard to the timeindex files. From the log of the log cleaner we see 
> after startup that it tries to clean up a topic called xdc_listing-status-v2 
> [1]. The topic is set up with log compaction [2] and the Kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because it finds messages which don't have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I'm not sure where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue than random disk failures. We didn't have the 
> problem with 0.10.0 for around half a year; it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner, "cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970", also 
> confuses me a bit. Does that mean it does not find any log segments that can 
> be cleaned up, or that the last timestamp of the last log segment is somehow 
> broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error 
> with one topic. I would at least expect it to keep cleaning up other topics, 
> but apparently it doesn't; e.g. it's not even cleaning __consumer_offsets 
> anymore.
> Does anybody have the same issue, or can anyone explain what's going on? 
> Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> ka
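A minimal sketch of the invariant behind the InvalidOffsetException above, assuming nothing beyond what the stack trace shows (ToyTimeIndex is a made-up stand-in, not kafka.log.TimeIndex): the time index accepts only strictly increasing offsets, so the -1 offset computed for the cleaned segment trips the same guard.

{code}
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for the time index append check (illustrative only): every
// appended entry must carry an offset strictly larger than the previous one,
// which is why appending offset -1 after 11832 fails during cleaning.
class ToyTimeIndex {
  private val entries = ArrayBuffer.empty[(Long, Long)] // (timestamp, offset)

  def maybeAppend(timestamp: Long, offset: Long): Unit = {
    if (entries.nonEmpty && offset <= entries.last._2)
      throw new IllegalStateException(
        s"Attempt to append an offset ($offset) no larger than the last offset appended (${entries.last._2})")
    entries += ((timestamp, offset))
  }
}

object ToyTimeIndexDemo extends App {
  val idx = new ToyTimeIndex
  idx.maybeAppend(1480693550000L, 11832L) // fine: offsets increase
  idx.maybeAppend(1480693551000L, -1L)    // throws, mirroring the cleaner error
}
{code}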

[jira] [Updated] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4595:
---
Priority: Critical  (was: Major)

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> In our test env, we found the controller stopped working after a delete-topic 
> operation and a network issue; the stack trace is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
>   - <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> "Controller-1001-to-broker-1003-send-thread" #88 prio=5 os_prio=0 
> tid=0x7fb778342000 nid=0x5a4c waiting on condition [0x7fb761de]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acqui

[jira] [Updated] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4595:
---
Labels: reliability  (was: )

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> In our test env, we found the controller stopped working after a delete-topic 
> operation and a network issue; the stack trace is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
>   - <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> "Controller-1001-to-broker-1003-send-thread" #88 prio=5 os_prio=0 
> tid=0x7fb778342000 nid=0x5a4c waiting on condition [0x7fb761de]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.

[jira] [Issue Comment Deleted] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Andre Pearce (IG) updated KAFKA-4497:
-
Comment: was deleted

(was: [~junrao] We are 99.9% sure the jars are good; see above, we double-checked 
the process classpath again when we hit the issue last night, after reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. As you can see, all the 
Kafka jars are 0.10.1.1.

We can offer to deploy a custom-built jar with additional logging/debugging for 
you if you wish (as we have this in a testing env), or we can even host a WebEx 
or similar so you can look at the system yourself.

 )

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread 
> with regard to the timeindex files. From the log of the log cleaner we see 
> after startup that it tries to clean up a topic called xdc_listing-status-v2 
> [1]. The topic is set up with log compaction [2] and the Kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because it finds messages which don't have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I'm not sure where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue than random disk failures. We didn't have the 
> problem with 0.10.0 for around half a year; it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner, "cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970", also 
> confuses me a bit. Does that mean it does not find any log segments that can 
> be cleaned up, or that the last timestamp of the last log segment is somehow 
> broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error 
> with one topic. I would at least expect it to keep cleaning up other topics, 
> but apparently it doesn't; e.g. it's not even cleaning __consumer_offsets 
> anymore.
> Does anybody have the same issue, or can anyone explain what's going on? 
> Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonf

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] We are 99.9% sure the jars are good; see above, we double-checked the 
process classpath again when we hit the issue last night, after reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. As you can see, all the 
Kafka jars are 0.10.1.1.

We can offer to deploy a custom-built jar with additional logging/debugging for 
you if you wish (as we have this in a testing env), or we can even host a WebEx 
or similar so you can look at the system yourself.

 

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread 
> with regard to the timeindex files. From the log of the log cleaner we see 
> after startup that it tries to clean up a topic called xdc_listing-status-v2 
> [1]. The topic is set up with log compaction [2] and the Kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because it finds messages which don't have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I'm not sure where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue than random disk failures. We didn't have the 
> problem with 0.10.0 for around half a year; it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner, "cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970", also 
> confuses me a bit. Does that mean it does not find any log segments that can 
> be cleaned up, or that the last timestamp of the last log segment is somehow 
> broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error 
> with one topic. I would at least expect it to keep cleaning up other topics, 
> but apparently it doesn't; e.g. it's not even cleaning __consumer_offsets 
> anymore.
> Does anybody have the same issue, or can anyone explain what's going on? 
> Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803830#comment-15803830
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] We are 99.9% sure the jars are good; see above, we double-checked the 
process classpath again when we hit the issue last night, after reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. As you can see, all the 
Kafka jars are 0.10.1.1.

We can offer to deploy a custom-built jar with additional logging/debugging for 
you if you wish (as we have this in a testing env), or we can even host a WebEx 
or similar so you can look at the system yourself.

 

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread 
> with regard to the timeindex files. From the log of the log cleaner we see 
> after startup that it tries to clean up a topic called xdc_listing-status-v2 
> [1]. The topic is set up with log compaction [2] and the Kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because it finds messages which don't have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I'm not sure where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue than random disk failures. We didn't have the 
> problem with 0.10.0 for around half a year; it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner, "cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970", also 
> confuses me a bit. Does that mean it does not find any log segments that can 
> be cleaned up, or that the last timestamp of the last log segment is somehow 
> broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error 
> with one topic. I would at least expect it to keep cleaning up other topics, 
> but apparently it doesn't; e.g. it's not even cleaning __consumer_offsets 
> anymore.
> Does anybody have the same issue, or can anyone explain what's going on? 
> Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> k

[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread Pengwei (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803816#comment-15803816
 ] 

Pengwei commented on KAFKA-4595:


[~huxi_2b]

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
> Fix For: 0.10.2.0
>
>
> In our test env, we found the controller stopped working after a delete-topic 
> operation and a network issue; the stack trace is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
>   - <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> "Controller-1001-to-broker-1003-send-thread" #88 prio=5 os_prio=0 
> tid=0x7fb778342000 nid=0x5a4c waiting on condition [0x7fb761de]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:87

[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread Pengwei (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803812#comment-15803812
 ] 

Pengwei commented on KAFKA-4595:


Yes, you are right.

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
> Fix For: 0.10.2.0
>
>
> In our test env, we found the controller stopped working after a delete-topic 
> operation and a network issue; the stack trace is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
>   - <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> "Controller-1001-to-broker-1003-send-thread" #88 prio=5 os_prio=0 
> tid=0x7fb778342000 nid=0x5a4c waiting on condition [0x7fb761de]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc02587f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronize

[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803804#comment-15803804
 ] 

huxi commented on KAFKA-4595:
-

Yes, it seems the sequence might look like this:
1. At some point, a delete-topic operation began, which resumed the 
DeleteTopicsThread. That thread acquired the controller lock and ran 
successfully, then released the lock, but the request send thread had not yet 
begun to execute the callback.
2. Later, something went wrong with the network, so the controller triggered 
the ZK listener thread to remove the dead brokers. That thread acquired the 
controller lock, shut down the request send thread for the dead broker, and 
waited.
3. The request send thread then began to execute deleteTopicStopReplicaCallback, 
which also tries to acquire the controller lock, so it waited forever and could 
never finish shutting down.
4. Hence the deadlock (sketched below).
[~pengwei] Does that make sense?
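A minimal, self-contained sketch of that deadlock (illustrative only; these are not Kafka's actual classes, just the locking pattern): the "listener" holds the controller lock while waiting on the send thread's shutdown latch, and the send thread's callback needs that same lock, so the program below hangs by design.

{code}
import java.util.concurrent.CountDownLatch
import java.util.concurrent.locks.ReentrantLock

object ControllerDeadlockSketch {
  private val controllerLock = new ReentrantLock()
  private val shutdownLatch  = new CountDownLatch(1)

  // Stand-in for the request send thread executing deleteTopicStopReplicaCallback.
  private val sendThread = new Thread(new Runnable {
    override def run(): Unit = {
      Thread.sleep(100)            // the callback fires a little later (step 1)
      controllerLock.lock()        // step 3: the callback needs the controller lock ...
      try println("callback ran")  // ... but the listener already holds it
      finally controllerLock.unlock()
      shutdownLatch.countDown()    // never reached, so shutdown never completes
    }
  }, "send-thread")

  def main(args: Array[String]): Unit = {
    sendThread.start()
    controllerLock.lock()          // step 2: the ZK listener thread takes the lock ...
    try {
      println("listener: removing dead broker, waiting for send thread to stop")
      shutdownLatch.await()        // ... and blocks forever on the latch (step 4)
    } finally controllerLock.unlock()
  }
}
{code}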

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
> Fix For: 0.10.2.0
>
>
> In our test env, we found the controller stopped working after a delete-topic 
> operation and a network issue; the stack trace is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
> 

Re: Wiki permissions

2017-01-05 Thread Dongjin Lee
Thanks Wang,
I was able to submit my KIP thanks to your help.

Regards,
Dongjin

On Fri, Jan 6, 2017 at 2:52 AM, Guozhang Wang  wrote:

> Dongjin
>
> I have granted you the permission under apache id dongjin
>
>
> Guozhang
>
>
> On Wed, Jan 4, 2017 at 11:51 PM, Dongjin Lee  wrote:
>
> > Hello. I just completed a KIP for KAFKA-4514[^1] but cannot create any
> > page on the wiki. Could anyone give me the permission?
> >
> > Regards,
> > Dongjin
> >
> > [^1]: https://issues.apache.org/jira/browse/KAFKA-4514
> >
> > --
> > *Dongjin Lee*
> >
> >
> > *Software developer in Line+. So interested in massive-scale machine
> > learning.*
> > facebook: www.facebook.com/dongjin.lee.kr
> > linkedin: kr.linkedin.com/in/dongjinleekr
> > github: github.com/dongjinleekr
> > twitter: www.twitter.com/dongjinleekr
> >
>
>
>
> --
> -- Guozhang
>



-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine learning.*
facebook: www.facebook.com/dongjin.lee.kr
linkedin: kr.linkedin.com/in/dongjinleekr
github: github.com/dongjinleekr
twitter: www.twitter.com/dongjinleekr


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Vahid S Hashemian
One thing that probably needs some clarification is what is implied by 
"deprecated" in the Kafka project.
I googled it a bit and it doesn't seem that deprecation conventionally 
implies termination of support (or anything that could negatively impact 
existing users). That's my interpretation too.
It would be good to know if Kafka follows a different interpretation of 
the term.

If my understanding of the term is correct, since we are not yet targeting 
a certain major release in which the old consumer will be removed, I don't 
see any harm in marking it as deprecated.
There will be enough time to plan and implement the migration, if the 
community decides that's the way to go, before phasing it out.

At a minimum, new Kafka users will pick the Java consumer without any 
confusion, and existing users will know that Kafka is preparing for the 
old consumer's retirement.

--Vahid




From:   Joel Koshy 
To: "dev@kafka.apache.org" 
Date:   01/05/2017 06:55 PM
Subject:Re: [DISCUSS] KIP-109: Old Consumer Deprecation



While I realize this only marks the old consumer as deprecated and not a
complete removal, I agree that it is somewhat premature to do this prior 
to
having a migration process implemented. Onur has described this in detail
in the earlier thread: http://markmail.org/message/ekv352zy7xttco5s and 
I'm
surprised that more companies aren't affected by (or aware of?) the issue.

On Thu, Jan 5, 2017 at 4:40 PM, radai  wrote:

> I cant speak for anyone else, but a rolling upgrade is definitely how we
> (LinkedIn) will do the migration.
>
> On Thu, Jan 5, 2017 at 4:28 PM, Gwen Shapira  wrote:
>
> > it sounds good to have
> > it, but that's probably not how people will end up migrati
> >
>






[jira] [Commented] (KAFKA-4603) argument error,and command parsed error

2017-01-05 Thread Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803365#comment-15803365
 ] 

Xin commented on KAFKA-4603:


In Kafka, none of the OptionParser constructor calls pass the allowAbbreviations 
argument.
I tried some other commands and found that, like zookeeper-security-migration.sh, 
they also accept abbreviated prefixes of their arguments (see the sketch below).
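A small sketch of that abbreviation behaviour (illustrative only, not the real ZkSecurityMigrator code; it just registers two similarly prefixed options and needs jopt-simple on the classpath):

{code}
import joptsimple.OptionParser

object AbbreviationSketch {
  def main(args: Array[String]): Unit = {
    // Built with allowAbbreviations = false, the parser rejects the unknown
    // option outright instead of treating "--zookeeper.connection" as an
    // abbreviation of "--zookeeper.connection.timeout".
    val strict = new OptionParser(false)
    strict.accepts("zookeeper.connect").withRequiredArg()
    strict.accepts("zookeeper.connection.timeout").withRequiredArg().ofType(classOf[Integer])
    strict.parse("--zookeeper.connection=localhost:2181") // throws UnrecognizedOptionException

    // With the default constructor, the same token is abbreviation-matched to
    // zookeeper.connection.timeout, and "localhost:2181" then fails to parse
    // as an Integer, as in the stack trace quoted below.
  }
}
{code}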

> argument error,and command parsed error
> ---
>
> Key: KAFKA-4603
> URL: https://issues.apache.org/jira/browse/KAFKA-4603
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, documentation
>Affects Versions: 0.10.0.1, 0.10.2.0
> Environment: suse
>Reporter: Xin
>Priority: Minor
>
> according to section 7.6.2 "Migrating clusters" of the documentation:
> ./zookeeper-security-migration.sh --zookeeper.acl=secure 
> --zookeeper.connection=localhost:2181
> joptsimple.OptionArgumentConversionException: Cannot parse argument 
> 'localhost:2181' of option zookeeper.connection.timeout
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93)
>   at 
> joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274)
>   at joptsimple.OptionSet.valuesOf(OptionSet.java:223)
>   at joptsimple.OptionSet.valueOf(OptionSet.java:172)
>   at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111)
>   at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119)
>   at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)
> Caused by: joptsimple.internal.ReflectionException: 
> java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> joptsimple.internal.Reflection.reflectionException(Reflection.java:140)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:122)
>   at 
> joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48)
>   at joptsimple.internal.Reflection.convertWith(Reflection.java:128)
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90)
>   ... 6 more
> Caused by: java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:492)
>   at java.lang.Integer.valueOf(Integer.java:582)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:119)
>   ... 9 more
> ===> the argument "zookeeper.connection" has been parsed as 
> "zookeeper.connection.timeout".
> Using --help I found that the arguments are:
> --zookeeper.connect            Sets the ZooKeeper connect string 
>                                (ensemble). This parameter takes a 
>                                comma-separated list of host:port 
>                                pairs. (default: localhost:2181)
> --zookeeper.connection.timeout Sets the ZooKeeper connection timeout.
> The documentation describes the option incorrectly, and the code also has a 
> problem: in ZkSecurityMigrator.scala,
>   val parser = new OptionParser()
> ==> any of --v, --ve, ... are accepted on the command line and treated as 
> though you had typed --verbose.
> To suppress this behavior, use the OptionParser constructor 
> OptionParser(boolean allowAbbreviations) and pass a value of false.





[jira] [Commented] (KAFKA-4603) argument error,and command parsed error

2017-01-05 Thread Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803353#comment-15803353
 ] 

Xin commented on KAFKA-4603:


I changed the OptionParser constructor in ZkSecurityMigrator.scala:

var jaasFile = System.getProperty(JaasUtils.JAVA_LOGIN_CONFIG_PARAM)
val parser = new OptionParser(false)


I got the result:
 # ./zookeeper-security-migration.sh --zookeeper.acl=secure 
--zookeeper.connection=localhost:2181
Picked up JAVA_TOOL_OPTIONS: -Xmx8g
joptsimple.UnrecognizedOptionException: zookeeper.connection is not a 
recognized option
at 
joptsimple.OptionException.unrecognizedOption(OptionException.java:108)
at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:510)
at 
joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56)
at joptsimple.OptionParser.parse(OptionParser.java:396)
at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:80)
at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:117)
at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)

> argument error,and command parsed error
> ---
>
> Key: KAFKA-4603
> URL: https://issues.apache.org/jira/browse/KAFKA-4603
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, documentation
>Affects Versions: 0.10.0.1, 0.10.2.0
> Environment: suse
>Reporter: Xin
>Priority: Minor
>
> according to section 7.6.2 "Migrating clusters" of the documentation:
> ./zookeeper-security-migration.sh --zookeeper.acl=secure 
> --zookeeper.connection=localhost:2181
> joptsimple.OptionArgumentConversionException: Cannot parse argument 
> 'localhost:2181' of option zookeeper.connection.timeout
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93)
>   at 
> joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274)
>   at joptsimple.OptionSet.valuesOf(OptionSet.java:223)
>   at joptsimple.OptionSet.valueOf(OptionSet.java:172)
>   at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111)
>   at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119)
>   at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)
> Caused by: joptsimple.internal.ReflectionException: 
> java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> joptsimple.internal.Reflection.reflectionException(Reflection.java:140)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:122)
>   at 
> joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48)
>   at joptsimple.internal.Reflection.convertWith(Reflection.java:128)
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90)
>   ... 6 more
> Caused by: java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:492)
>   at java.lang.Integer.valueOf(Integer.java:582)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:119)
>   ... 9 more
> ===> the argument "zookeeper.connection" has been parsed as 
> "zookeeper.connection.timeout".
> Using --help I found that the arguments are:
> --zookeeper.connect            Sets the ZooKeeper connect string 
>                                (ensemble). This parameter takes a 
>                                comma-separated list of host:port 
>                                pairs. (default: localhost:2181)
> --zookeeper.connection.timeout Sets the ZooKeeper connection timeout.
> The documentation describes the option incorrectly, and the code also has a 
> problem: in ZkSecurityMigrator.scala,
>   val parser = new OptionParser()
> ==> any of --v, --ve, ... are accepted on the command line and treated as 
> though you had typed --verbose.
> To suppress this behavior, use the OptionParser constructor 
> OptionParser(boolean allowAbbreviations) and pass a value of false.





[jira] [Commented] (KAFKA-4400) Prefix for sink task consumer groups should be configurable

2017-01-05 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803350#comment-15803350
 ] 

Ewen Cheslack-Postava commented on KAFKA-4400:
--

[~skyahead] Did you see the comments on the PR? I'd love to get this fixed 
since anyone deploying multiple Connect clusters on the same Kafka cluster can 
be affected by this if they have connectors with the same name in both Connect 
clusters. A KIP would probably be super easy and minimal, any interest in 
pushing this forward?

> Prefix for sink task consumer groups should be configurable
> ---
>
> Key: KAFKA-4400
> URL: https://issues.apache.org/jira/browse/KAFKA-4400
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Currently the prefix for creating consumer groups is fixed. This means that 
> if you run multiple Connect clusters using the same Kafka cluster and create 
> connectors with the same name, sink tasks in different clusters will join the 
> same group. Making this prefix configurable at the worker level would protect 
> against this.
> An alternative would be to define unique cluster IDs for each connect 
> cluster, which would allow us to construct a unique name for the group 
> without requiring yet another config (but presents something of a 
> compatibility challenge).
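A tiny sketch of the collision described above (illustrative only; the literal "connect-" prefix and the idea of a per-worker prefix setting are assumptions for the example, not Connect's actual configuration surface):

{code}
object SinkGroupIdSketch {
  def groupId(prefix: String, connectorName: String): String = prefix + connectorName

  def main(args: Array[String]): Unit = {
    val fixedPrefix = "connect-"              // assumed fixed prefix shared by every worker today
    println(groupId(fixedPrefix, "my-sink"))  // Connect cluster A
    println(groupId(fixedPrefix, "my-sink"))  // Connect cluster B -> same group, collision
    // With a worker-level prefix (or a unique cluster ID), the groups stay distinct:
    println(groupId("connect-clusterA-", "my-sink"))
    println(groupId("connect-clusterB-", "my-sink"))
  }
}
{code}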





[jira] [Commented] (KAFKA-4603) argument error,and command parsed error

2017-01-05 Thread Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803345#comment-15803345
 ] 

Xin commented on KAFKA-4603:


I think that new OptionParser() is not suitable for Kafka: many Kafka command 
arguments share similar prefixes, which may lead to commands being parsed 
incorrectly.

To suppress this behavior, use the OptionParser constructor 
OptionParser(boolean allowAbbreviations) and pass a value of false.

> argument error,and command parsed error
> ---
>
> Key: KAFKA-4603
> URL: https://issues.apache.org/jira/browse/KAFKA-4603
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, documentation
>Affects Versions: 0.10.0.1, 0.10.2.0
> Environment: suse
>Reporter: Xin
>Priority: Minor
>
> according to section 7.6.2 "Migrating clusters" of the documentation:
> ./zookeeper-security-migration.sh --zookeeper.acl=secure 
> --zookeeper.connection=localhost:2181
> joptsimple.OptionArgumentConversionException: Cannot parse argument 
> 'localhost:2181' of option zookeeper.connection.timeout
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93)
>   at 
> joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274)
>   at joptsimple.OptionSet.valuesOf(OptionSet.java:223)
>   at joptsimple.OptionSet.valueOf(OptionSet.java:172)
>   at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111)
>   at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119)
>   at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)
> Caused by: joptsimple.internal.ReflectionException: 
> java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> joptsimple.internal.Reflection.reflectionException(Reflection.java:140)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:122)
>   at 
> joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48)
>   at joptsimple.internal.Reflection.convertWith(Reflection.java:128)
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90)
>   ... 6 more
> Caused by: java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:492)
>   at java.lang.Integer.valueOf(Integer.java:582)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:119)
>   ... 9 more
> ===> the argument "zookeeper.connection" has been parsed as
> "zookeeper.connection.timeout"
> Using help, I found that the relevant arguments are:
> --zookeeper.connect                Sets the ZooKeeper connect string
>                                    (ensemble). This parameter takes a
>                                    comma-separated list of host:port
>                                    pairs. (default: localhost:2181)
> --zookeeper.connection.timeout     Sets the ZooKeeper connection timeout.
> The documentation is wrong, and the code also has a problem:
>  in ZkSecurityMigrator.scala,
>   val parser = new OptionParser()  ==>
> Any of --v, --ve, ... are accepted on the command line and treated as though 
> you had typed --verbose.
> To suppress this behavior, use the OptionParser constructor 
> OptionParser(boolean allowAbbreviations) and pass a value of false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4603) argument error,and command parsed error

2017-01-05 Thread Xin (JIRA)
Xin created KAFKA-4603:
--

 Summary: argument error,and command parsed error
 Key: KAFKA-4603
 URL: https://issues.apache.org/jira/browse/KAFKA-4603
 Project: Kafka
  Issue Type: Bug
  Components: admin, documentation
Affects Versions: 0.10.0.1, 0.10.2.0
 Environment: suse
Reporter: Xin
Priority: Minor



According to section 7.6.2 "Migrating clusters" of the documentation:

./zookeeper-security-migration.sh --zookeeper.acl=secure 
--zookeeper.connection=localhost:2181
joptsimple.OptionArgumentConversionException: Cannot parse argument 
'localhost:2181' of option zookeeper.connection.timeout
at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93)
at 
joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274)
at joptsimple.OptionSet.valuesOf(OptionSet.java:223)
at joptsimple.OptionSet.valueOf(OptionSet.java:172)
at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111)
at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119)
at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)
Caused by: joptsimple.internal.ReflectionException: 
java.lang.NumberFormatException: For input string: "localhost:2181"
at 
joptsimple.internal.Reflection.reflectionException(Reflection.java:140)
at joptsimple.internal.Reflection.invoke(Reflection.java:122)
at 
joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48)
at joptsimple.internal.Reflection.convertWith(Reflection.java:128)
at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90)
... 6 more
Caused by: java.lang.NumberFormatException: For input string: "localhost:2181"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at joptsimple.internal.Reflection.invoke(Reflection.java:119)
... 9 more



===> the argument "zookeeper.connection" has been parsed as
"zookeeper.connection.timeout"

Using help, I found that the relevant arguments are:
--zookeeper.connect                Sets the ZooKeeper connect string
                                   (ensemble). This parameter takes a
                                   comma-separated list of host:port
                                   pairs. (default: localhost:2181)
--zookeeper.connection.timeout     Sets the ZooKeeper connection timeout.

The documentation is wrong, and the code also has a problem:

 in ZkSecurityMigrator.scala,
  val parser = new OptionParser()  ==>

Any of --v, --ve, ... are accepted on the command line and treated as though 
you had typed --verbose.

To suppress this behavior, use the OptionParser constructor 
OptionParser(boolean allowAbbreviations) and pass a value of false.
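
For anyone reproducing this, a small standalone sketch of the prefix matching described above and of the constructor-based fix quoted from the jopt-simple docs (option names taken from the help output; the class and output here are illustrative):

{code}
import joptsimple.OptionParser;
import joptsimple.OptionSet;

// Standalone sketch, not the actual ZkSecurityMigrator code. "--zookeeper.connection" is not
// a declared option, but with abbreviations enabled (the default) it is a unique prefix of
// "zookeeper.connection.timeout", so "localhost:2181" gets converted to an Integer and fails,
// as in the stack trace above.
public class AbbreviationSketch {

    private static void declareOptions(OptionParser parser) {
        parser.accepts("zookeeper.connect", "Sets the ZooKeeper connect string (ensemble).")
              .withRequiredArg().defaultsTo("localhost:2181");
        parser.accepts("zookeeper.connection.timeout", "Sets the ZooKeeper connection timeout.")
              .withRequiredArg().ofType(Integer.class);
    }

    public static void main(String[] args) {
        OptionParser abbreviating = new OptionParser();   // abbreviations allowed by default
        declareOptions(abbreviating);
        OptionSet opts = abbreviating.parse("--zookeeper.connection=localhost:2181");
        try {
            // argument conversion happens lazily here, matching the reported stack trace
            opts.valueOf("zookeeper.connection.timeout");
        } catch (Exception e) {
            System.out.println("abbreviating parser: " + e);
        }

        OptionParser strict = new OptionParser(false);     // allowAbbreviations = false
        declareOptions(strict);
        try {
            strict.parse("--zookeeper.connection=localhost:2181");
        } catch (Exception e) {
            System.out.println("strict parser: " + e);     // rejected as an unrecognized option
        }
    }
}
{code}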



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Joel Koshy
While I realize this only marks the old consumer as deprecated and not a
complete removal, I agree that it is somewhat premature to do this prior to
having a migration process implemented. Onur has described this in detail
in the earlier thread: http://markmail.org/message/ekv352zy7xttco5s and I'm
surprised that more companies aren't affected by (or aware of?) the issue.

On Thu, Jan 5, 2017 at 4:40 PM, radai  wrote:

> I cant speak for anyone else, but a rolling upgrade is definitely how we
> (LinkedIn) will do the migration.
>
> On Thu, Jan 5, 2017 at 4:28 PM, Gwen Shapira  wrote:
>
> > it sounds good to have
> > it, but that's probably not how people will end up migrati
> >
>


[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803315#comment-15803315
 ] 

huxi commented on KAFKA-4599:
-

Which Storm-supplied Kafka client do you use: storm-kafka or
storm-kafka-client? And which version?

> KafkaConsumer encounters SchemaException when Kafka broker stopped
> --
>
> Key: KAFKA-4599
> URL: https://issues.apache.org/jira/browse/KAFKA-4599
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Andrew Olson
>
> We recently observed an issue in production that can apparently occur a small 
> percentage of the time when a Kafka broker is stopped. We're using version 
> 0.9.0.1 for all brokers and clients.
> During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
> ran into the following SchemaException within a few seconds of instructing 
> the broker to shutdown.
> {noformat}
> 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
> Error reading field 'responses': Error reading array of size 2774863, only 62 
> bytes available
>   at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {noformat}
> The exception message was slightly different for one consumer,
> {{Error reading field 'responses': Error reading array of size 2774863, only 
> 260 bytes available}}
> The exception was not caught and caused the Storm Executor thread to restart, 
> so it's not clear if it would have been transient or fatal for the 
> KafkaConsumer.
> Here are the initial broker shutdown logs,
> {noformat}
> 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> shutting down
> 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutting down
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Stopped 
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutdown completed
> 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-3-30], Shutting down
> 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> Controlled shutdown succeeded
> 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutting down
> 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutdown completed
> {noformat}
> We've found one very similar reported occurrence,
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E
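
Not a fix for the root cause, but a minimal sketch of the kind of defensive handling the report implies was missing, assuming the error is transient enough that recreating the consumer is a reasonable recovery (SchemaException extends KafkaException):

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

// Workaround sketch only: catch KafkaException around poll() so an error such as the
// SchemaException above does not escape and kill the worker thread, and rebuild the consumer.
public class ResilientPollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic"));
        while (true) {
            try {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                // process records ...
            } catch (KafkaException e) {
                // e.g. a SchemaException during a broker shutdown: close and recreate the consumer
                consumer.close();
                consumer = new KafkaConsumer<>(props);
                consumer.subscribe(Collections.singletonList("example-topic"));
            }
        }
    }
}
{code}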



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Jason Gustafson
Hey Gwen,

I'm not super strong on this, but I think the case for a longer timeout as
the default behavior is weaker for the consumer. For the producer, it means
we might lose messages that the application tried to send. For the
consumer, it means we might lose offset commits, which means duplicates
later on. But if we're not attempting coordinator rediscovery on connection
failures or retrying offset commits anyway, does the extra time help that
much? I'm not sure, but I'd rather have a reasonable bound on the shutdown
time as the default behavior and let users who want to wait longer provide
their own timeout. I've seen a few too many cases in the mail lists where
users complain about services taking too long to shutdown. The other issue,
as Jay pointed out, is that the current behavior of close() has an
effectively small timeout, so this changes the behavior of existing code,
which seems best to avoid.

-Jason
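
For anyone following along, a minimal usage sketch of the API being voted on, assuming the new consumer method mirrors the producer's close(long, TimeUnit) signature (illustrative, not taken from the KIP):

{code}
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch only: assumes KIP-102 adds close(long, TimeUnit) to KafkaConsumer, mirroring the
// producer, with the no-arg close() bounded by a 30 second default as described in the thread.
public class CloseWithTimeoutSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // ... consume ...

        // Bound shutdown explicitly: wait at most 5 seconds for the final offset commit
        // and leave-group notification, then give up and release resources.
        consumer.close(5, TimeUnit.SECONDS);

        // Under the proposal, the no-arg form would be equivalent to:
        // consumer.close(30 * 1000, TimeUnit.MILLISECONDS);
    }
}
{code}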

On Thu, Jan 5, 2017 at 3:09 PM, Gwen Shapira  wrote:

> I hate going back and forth on this, but KafkaProducer.close() (with
> no timeout) is equivalent to close(Long.MAX_VALUE,
> TimeUnit.MILLISECONDS), while the KafkaConsumer.close() is equivalent
> to close(30*1000,TimeUnit.MILLISECONDS).
>
> Isn't this kind of inconsistency best to avoid?
>
> On Thu, Jan 5, 2017 at 2:08 PM, Rajini Sivaram 
> wrote:
> > Thank you, Ismael. I have sent another one. Hopefully that will appear in
> > its own thread.
> >
> > Rajini
> >
> > On Thu, Jan 5, 2017 at 9:30 PM, Ismael Juma  wrote:
> >
> >> Thanks Rajini. This seems to be happening a lot lately: Gmail is showing
> >> the vote message in the discuss thread.
> >>
> >> Ismael
> >>
> >> On Thu, Jan 5, 2017 at 9:23 PM, Rajini Sivaram  >
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to start the voting process for *KIP-102 - Add close with
> >> > timeout for consumers:*
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 102+-+Add+close+with+timeout+for+consumers
> >> >
> >> >
> >> > This KIP adds a new close method with a timeout for consumers similar
> to
> >> > the close method in the producer. As described in the discussion
> thread
> >> > <http://mail-archives.apache.org/mod_mbox/kafka-dev/201612.mbox/%3cCAG_+
> >> > n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e>,
> >> > the changes are only in the close code path and hence the impact is
> not
> >> too
> >> > big. The existing close() method without a timeout will use a default
> >> > timeout of 30 seconds.
> >> >
> >> > Thank you...
> >> >
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >>
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


[DISCUSS] KIP-110: Add Codec for ZStandard Compression

2017-01-05 Thread Dongjin Lee
Hi all,

I've just posted a new KIP "KIP-110: Add Codec for ZStandard Compression"
for
discussion:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression

Please have a look when you are free.

Best,
Dongjin

-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine learning.*
facebook: www.facebook.com/dongjin.lee.kr
linkedin: kr.linkedin.com/in/dongjinleekr
github: github.com/dongjinleekr
twitter: www.twitter.com/dongjinleekr


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-05 Thread Joel Koshy
(adding the dev list back - as it seems to have gotten dropped earlier in
this thread)

On Thu, Jan 5, 2017 at 6:36 PM, Joel Koshy  wrote:

> +1
>
> This is a very well-written KIP!
> Minor: there is still a mix of terms in the doc that references the
> earlier LeaderGenerationRequest (which is what I'm assuming it was
> called in previous versions of the wiki). Same for the diagrams which I'm
> guessing are a little harder to make consistent with the text.
>
>
>
> On Thu, Jan 5, 2017 at 5:54 PM, Jun Rao  wrote:
>
>> Hi, Ben,
>>
>> Thanks for the updated KIP. +1
>>
>> 1) In OffsetForLeaderEpochResponse, start_offset probably should be
>> end_offset since it's the end offset of that epoch.
>> 3) That's fine. We can fix KAFKA-1120 separately.
>>
>> Jun
>>
>>
>> On Thu, Jan 5, 2017 at 11:11 AM, Ben Stopford  wrote:
>>
>> > Hi Jun
>> >
>> > Thanks for raising these points. Thorough as ever!
>> >
>> > 1) Changes made as requested.
>> > 2) Done.
>> > 3) My plan for handling returning leaders is simply to force the
>> Leader
>> > Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120
>> as
>> > part of this KIP. It is really a separate issue with wider implications.
>> > I'd be happy to add KAFKA-1120 into the release though if we have time.
>> > 4) Agreed. Not sure exactly how that's going to play out, but I think
>> we're
>> > on the same page.
>> >
>> > Please could
>> >
>> > Cheers
>> > B
>> >
>> > On Thu, Jan 5, 2017 at 12:50 AM Jun Rao  wrote:
>> >
>> > > Hi, Ben,
>> > >
>> > > Thanks for the proposal. Looks good overall. A few comments below.
>> > >
>> > > 1. For LeaderEpochRequest, we need to include topic right? We probably
>> > want
>> > > to follow other requests by nesting partition inside topic? For
>> > > LeaderEpochResponse,
>> > > do we need to return leader_epoch? I was thinking that we could just
>> > return
>> > > an end_offset, which is the next offset of the last message in the
>> > > requested leader generation. Finally, would
>> OffsetForLeaderEpochRequest
>> > be
>> > > a better name?
>> > >
>> > > 2. We should bump up both the produce request and the fetch request
>> > > protocol version since both include the message set.
>> > >
>> > > 3. Extending LeaderEpoch to include Returning Leaders: To support
>> this,
>> > do
>> > > you plan to use the approach that stores  CZXID in the broker
>> > registration
>> > > and including the CZXID of the leader in /brokers/topics/[topic]/
>> > > partitions/[partitionId]/state in ZK?
>> > >
>> > > 4. Since there are a few other KIPs involving message format too, it
>> > would
>> > > be useful to consider if we could combine the message format changes
>> in
>> > the
>> > > same release.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > >
>> > > On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford 
>> wrote:
>> > >
>> > > > Hi All
>> > > >
>> > > > We’re having some problems with this thread being subsumed by the
>> > > > [Discuss] thread. Hopefully this one will appear distinct. If you
>> see
>> > > more
>> > > > than one, please use this one.
>> > > >
>> > > > KIP-101 should now be ready for a vote. As a reminder the KIP
>> proposes
>> > a
>> > > > change to the replication protocol to remove the potential for
>> replicas
>> > > to
>> > > > diverge.
>> > > >
>> > > > The KIP can be found here:  https://cwiki.apache.org/confl
>> > > > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
>> > > > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
>> > > > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
>> > > > than+High+Watermark+for+Truncation>
>> > > >
>> > > > Please let us know your vote.
>> > > >
>> > > > B
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>
>
>
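
To keep the protocol points above straight, here is a rough data-shape sketch of the request/response as discussed in this thread; the class and field names are assumed from the emails, not taken from the KIP wiki:

{code}
import java.util.List;

// Rough sketch of OffsetForLeaderEpoch as discussed above: the follower asks "what is the end
// offset of this leader epoch?" and truncates to the returned end offset instead of to its
// high watermark. Names are illustrative.
public class OffsetForLeaderEpochSketch {

    static class RequestTopic {
        String topic;
        List<RequestPartition> partitions;  // partitions nested inside the topic, per point 1
    }

    static class RequestPartition {
        int partition;
        int leaderEpoch;                    // the epoch whose end offset is being requested
    }

    static class ResponsePartition {
        int partition;
        long endOffset;                     // next offset after the last message of that epoch
    }
}
{code}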


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2017-01-05 Thread Ewen Cheslack-Postava
+1

On Thu, Jan 5, 2017 at 1:30 PM, Ismael Juma  wrote:

> Thanks for the updates, +1
>
> Ismael
>
> On Thu, Jan 5, 2017 at 6:14 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com
> > wrote:
>
> > I'd like to start another round of voting on KIP-88 after recent changes
> > to the KIP (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 88%3A+OffsetFetch+Protocol+Update
> >
> > ).
> >
> > The discussion thread can be found here:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
> >
> > The discussion around recent changes to the KIP starts here:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg62357.html
> >
> > Thank you for your feedback.
> >
> > Regards,
> > --Vahid
> >
> >
>


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Ewen Cheslack-Postava
For 0.10.2.0, this can absolutely make it in if we get the reviews done
quickly enough. The cutoff for getting a new feature in would be the
feature freeze on Jan 13 (when we generate the release branch and start
having to cherry-pick commits, so we want to limit to stabilization and
important bug fixes).

See
https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
for a more in depth explanation. I just sent out the announcement to make
sure everyone had a couple of weeks of warning.

-Ewen

On Thu, Jan 5, 2017 at 3:09 PM, Gwen Shapira  wrote:

> I hate going back and forth on this, but KafkaProducer.close() (with
> no timeout) is equivalent to close(Long.MAX_VALUE,
> TimeUnit.MILLISECONDS), while the KafkaConsumer.close() is equivalent
> to close(30*1000,TimeUnit.MILLISECONDS).
>
> Isn't this kind of inconsistency best to avoid?
>
> On Thu, Jan 5, 2017 at 2:08 PM, Rajini Sivaram 
> wrote:
> > Thank you, Ismael. I have sent another one. Hopefully that will appear in
> > its own thread.
> >
> > Rajini
> >
> > On Thu, Jan 5, 2017 at 9:30 PM, Ismael Juma  wrote:
> >
> >> Thanks Rajini. This seems to be happening a lot lately: Gmail is showing
> >> the vote message in the discuss thread.
> >>
> >> Ismael
> >>
> >> On Thu, Jan 5, 2017 at 9:23 PM, Rajini Sivaram  >
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to start the voting process for *KIP-102 - Add close with
> >> > timeout for consumers:*
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 102+-+Add+close+with+timeout+for+consumers
> >> >
> >> >
> >> > This KIP adds a new close method with a timeout for consumers similar
> to
> >> > the close method in the producer. As described in the discussion
> thread
> >> > <http://mail-archives.apache.org/mod_mbox/kafka-dev/201612.mbox/%3cCAG_+
> >> > n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e>,
> >> > the changes are only in the close code path and hence the impact is
> not
> >> too
> >> > big. The existing close() method without a timeout will use a default
> >> > timeout of 30 seconds.
> >> >
> >> > Thank you...
> >> >
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >>
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Sriram Subramanian
+1

On Thu, Jan 5, 2017 at 6:30 PM, Ewen Cheslack-Postava 
wrote:

> +1
>
> -Ewen
>
> On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede  wrote:
>
> > +1 (binding)
> >
> > On Thu, Jan 5, 2017 at 2:07 PM Rajini Sivaram 
> > wrote:
> >
> > > Hi all,
> > >
> > >
> > > I would like to start the voting process for *KIP-102 - Add close with
> > > timeout for consumers*:
> > >
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 102+-+Add+close+with+timeout+for+consumers
> > >
> > >
> > >
> > > This KIP adds a new close method with a timeout for consumers similar
> to
> > > the close method in the producer. As described in the discussion thread
> > > <
> > > http://mail-archives.apache.org/mod_mbox/kafka-dev/201612.
> mbox/%3cCAG_+
> > n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e
> > > >,
> > > the changes are only in the close code path and hence the impact is not
> > too
> > > big. The existing close() method without a timeout will use a default
> > > timeout of 30 seconds.
> > >
> > >
> > > Thank you
> > >
> > >
> > > Regards,
> > >
> > >
> > > Rajini
> > >
> > --
> > Thanks,
> > Neha
> >
>


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Ewen Cheslack-Postava
+1

-Ewen

On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede  wrote:

> +1 (binding)
>
> On Thu, Jan 5, 2017 at 2:07 PM Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> >
> > I would like to start the voting process for *KIP-102 - Add close with
> > timeout for consumers*:
> >
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 102+-+Add+close+with+timeout+for+consumers
> >
> >
> >
> > This KIP adds a new close method with a timeout for consumers similar to
> > the close method in the producer. As described in the discussion thread
> > <
> > http://mail-archives.apache.org/mod_mbox/kafka-dev/201612.mbox/%3cCAG_+
> n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e
> > >,
> > the changes are only in the close code path and hence the impact is not
> too
> > big. The existing close() method without a timeout will use a default
> > timeout of 30 seconds.
> >
> >
> > Thank you
> >
> >
> > Regards,
> >
> >
> > Rajini
> >
> --
> Thanks,
> Neha
>


Re: [VOTE] KIP-72 - Allow putting a bound on memory consumed by Incoming requests

2017-01-05 Thread Jun Rao
Hi, Radai,

The new metrics look good. +1 on the KIP.

Thanks,

Jun

On Fri, Dec 16, 2016 at 4:46 PM, radai  wrote:

> I've added the 3 new metrics/sensors i've implemented to the KIP.
>
> at this point I would need to re-validate the functionality (which i expect
> to do early january).
>
> code reviews welcome ;-)
>
> On Mon, Nov 28, 2016 at 10:37 AM, radai 
> wrote:
>
> > will do (only added a single one so far, the rest TBD)
> >
> > On Mon, Nov 28, 2016 at 10:04 AM, Jun Rao  wrote:
> >
> >> Hi, Radai,
> >>
> >> Could you add a high level description of the newly added metrics to the
> >> KIP wiki?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Wed, Nov 23, 2016 at 3:45 PM, radai 
> >> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > I've added the sensor you requested (or at least I think I did )
> >> >
> >> > On Fri, Nov 18, 2016 at 12:37 PM, Jun Rao  wrote:
> >> >
> >> > > KafkaRequestHandlerPool
> >> >
> >>
> >
> >
>


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Neha Narkhede
+1 (binding)

On Thu, Jan 5, 2017 at 2:07 PM Rajini Sivaram 
wrote:

> Hi all,
>
>
> I would like to start the voting process for *KIP-102 - Add close with
> timeout for consumers*:
>
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-102+-+Add+close+with+timeout+for+consumers
>
>
>
> This KIP adds a new close method with a timeout for consumers similar to
> the close method in the producer. As described in the discussion thread
> <
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201612.mbox/%3ccag_+n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e
> >,
> the changes are only in the close code path and hence the impact is not too
> big. The existing close() method without a timeout will use a default
> timeout of 30 seconds.
>
>
> Thank you
>
>
> Regards,
>
>
> Rajini
>
-- 
Thanks,
Neha


Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-05 Thread Neha Narkhede
+1 (binding)

On Wed, Jan 4, 2017 at 2:36 PM Shikhar Bhushan  wrote:

> I do plan on introducing a new `connect:transforms` module (which
> `connect:runtime` will depend on), so they will live in a separate module
> in the source tree and output.
>
> ( https://github.com/apache/kafka/pull/2299 )
>
> On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava 
> wrote:
>
> > +1
> >
> > Gwen, re: bundling transformations, would it help at all to isolate them
> to
> > a separate jar or is the concern purely about maintaining them as part of
> > Kafka?
> >
> > -Ewen
> >
> > On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian 
> > wrote:
> >
> > > +1
> > >
> > > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira 
> wrote:
> > >
> > > > I would have preferred not to bundle transformations, but since SMT
> > > > capability is a much needed feature, I'll take it in its current
> form.
> > > >
> > > > +1
> > > >
> > > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <
> shik...@confluent.io
> > >
> > > > wrote:
> > > > > Hi all,
> > > > >
> > > > > I'd like to start voting on KIP-66:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > > >
> > > > > Best,
> > > > >
> > > > > Shikhar
> > > >
> > > >
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > >
> >
>
-- 
Thanks,
Neha


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803184#comment-15803184
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], I did the following test. (1) Run Kafka 0.10.1.1 
binary from the website on the latest log you provided. The cleaner ran 
successfully. (2) Run Kafka 0.10.1.0 binary from the website on the same log. 
The cleaner failed on the same exception you reported. So, I am wondering if 
you can check your environment again to make sure the right jar is picked up at 
runtime. Also, you mentioned that someone else reported the same issue on 
0.10.1.1. Do you know where it is reported?
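
A quick, generic way to confirm which jar a class was actually loaded from at runtime (plain JVM reflection, not a Kafka tool):

{code}
// Generic check: print the jar the JVM loaded a given class from, e.g. the log cleaner,
// to rule out a stale 0.10.1.0 jar lingering on the classpath. Requires the kafka jar on
// the classpath of whatever JVM runs this.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        Class<?> cleaner = Class.forName("kafka.log.LogCleaner");
        System.out.println(cleaner.getProtectionDomain().getCodeSource().getLocation());
        // e.g. file:/opt/kafka/libs/kafka_2.11-0.10.1.1.jar
    }
}
{code}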

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$

[jira] [Commented] (KAFKA-1817) AdminUtils.createTopic vs kafka-topics.sh --create with partitions

2017-01-05 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803174#comment-15803174
 ] 

Jeff Widman commented on KAFKA-1817:


So can this be closed?

> AdminUtils.createTopic vs kafka-topics.sh --create with partitions
> --
>
> Key: KAFKA-1817
> URL: https://issues.apache.org/jira/browse/KAFKA-1817
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.8.2.0
> Environment: debian linux current version  up to date
>Reporter: Jason Kania
>
> When topics are created using AdminUtils.createTopic in code, no partitions 
> folder is created. The zookeeper shell shows this.
> ls /brokers/topics/foshizzle
> []
> However, when kafka-topics.sh --create is run, the partitions folder is 
> created:
> ls /brokers/topics/foshizzle
> [partitions]
> The unfortunately useless error message "KeeperErrorCode = NoNode for 
> /brokers/topics/periodicReading/partitions" makes it unclear what to do. When 
> the topics are listed via kafka-topics.sh, they appear to have been created 
> fine. It would be good if the exception was wrapped by Kafka to suggest 
> looking in the zookeeper shell so a person didn't have to dig around to 
> understand what the meaning of this path is...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #1151

2017-01-05 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4575: ensure topic created before starting sink for

--
[...truncated 7963 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTe

[GitHub] kafka pull request #2321: MINOR: update JavaDoc for simple helper interfaces...

2017-01-05 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2321

MINOR: update JavaDoc for simple helper interfaces of KStream and KTable 
operators



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka javaDocImprovements3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2321


commit eeb39aceb0797c081b45ae945085ab31c3fa914e
Author: Matthias J. Sax 
Date:   2017-01-06T01:06:44Z

MINOR: update JavaDoc for simple helper interfaces of KStream and KTable 
operators




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread radai
I cant speak for anyone else, but a rolling upgrade is definitely how we
(LinkedIn) will do the migration.

On Thu, Jan 5, 2017 at 4:28 PM, Gwen Shapira  wrote:

> it sounds good to have
> it, but that's probably not how people will end up migrati
>


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Gwen Shapira
Since the APIs are super different, I expect migrating from the old to
the new consumer will involve some re-write of the app that does the
consuming. In most such cases, the upgrade path involves running both
versions side-by-side for a while, validating results and then
retiring the old version. Sometimes migration of offsets is needed and
Grant published a tool for that a while back.

Having a rolling upgrade plan between both APIs is pretty involved and
I'm not sure there's a real demand for it (i.e. it sounds good to have
it, but that's probably not how people will end up migrating).

Gwen
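
As a rough illustration of the offset-migration step in such a side-by-side cutover, here is a minimal sketch using only the public new-consumer API (the topic, partition, and offset values are placeholders for whatever the old group had committed, e.g. as exported by an offset-migration tool):

{code}
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch of seeding the new consumer group with offsets taken from the old consumer group,
// so the rewritten application resumes roughly where the old one left off. Illustrative only.
public class OffsetSeedSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "migrated-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("example-topic", 0);
        consumer.assign(Arrays.asList(tp));
        consumer.seek(tp, 42000L);   // placeholder: last offset committed by the old consumer group
        consumer.commitSync();       // commit it under the new group before normal consumption starts
        consumer.close();
    }
}
{code}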

On Thu, Jan 5, 2017 at 3:42 PM, radai  wrote:
> im all for (working towards) getting rid of old code, but there's still no
> solid migration path - you'll be "stranding" users on deprecated, no longer
> maintained code with no "safe" way out that does not involve downtime
> (specifically old and new consumers cannot correctly divide up partitions
> between themselves if both operate within the same group on the same topic).
>
> On Thu, Jan 5, 2017 at 3:10 PM, Gwen Shapira  wrote:
>
>> Very strong support from me too :)
>>
>> On Thu, Jan 5, 2017 at 12:09 PM, Vahid S Hashemian
>>  wrote:
>> > Hi all,
>> >
>> > There was some discussion recently on deprecating the old consumer (
>> > https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
>> > Ismael suggested to cover the discussion and voting of major deprecations
>> > like this under a KIP.
>> >
>> > So I started KIP-109 (
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 109%3A+Old+Consumer+Deprecation
>> > ) and look forward to your feedback and comments.
>> >
>> > We'd like to implement this deprecation in the upcoming 0.10.2.0 release.
>> >
>> > Thanks.
>> > --Vahid
>> >
>>
>>
>>
>> --
>> Gwen Shapira
>> Product Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter | blog
>>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Created] (KAFKA-4602) KIP-72 Allow putting a bound on memory consumed by Incoming requests

2017-01-05 Thread radai rosenblatt (JIRA)
radai rosenblatt created KAFKA-4602:
---

 Summary: KIP-72 Allow putting a bound on memory consumed by 
Incoming requests
 Key: KAFKA-4602
 URL: https://issues.apache.org/jira/browse/KAFKA-4602
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: radai rosenblatt
Assignee: radai rosenblatt


this issue tracks the implementation of KIP-72, as outlined here - 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802932#comment-15802932
 ] 

Bill Bejeck commented on KAFKA-4601:


Thanks for the explanation. Avoiding duplicated partitioning sounds more 
involved than I may have time to devote to it.

> Avoid duplicated repartitioning in KStream DSL
> --
>
> Key: KAFKA-4601
> URL: https://issues.apache.org/jira/browse/KAFKA-4601
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: performance
>
> Consider the following DSL:
> {code}
> Stream source = builder.stream(Serdes.String(), 
> Serdes.String(), "topic1").map(..);
> KTable counts = source
> .groupByKey()
> .count("Counts");
> KStream sink = source.leftJoin(counts, ..);
> {code}
> The resulted topology looks like this:
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-FILTER-04, KSTREAM-FILTER-07]
>   KSTREAM-FILTER-04:
>   children:   
> [KSTREAM-SINK-03]
>   KSTREAM-SINK-03:
>   topic:  X-Counts-repartition
>   KSTREAM-FILTER-07:
>   children:   
> [KSTREAM-SINK-06]
>   KSTREAM-SINK-06:
>   topic:  
> X-KSTREAM-MAP-01-repartition
> ProcessorTopology:
>   KSTREAM-SOURCE-08:
>   topics: 
> [X-KSTREAM-MAP-01-repartition]
>   children:   
> [KSTREAM-LEFTJOIN-09]
>   KSTREAM-LEFTJOIN-09:
>   states: [Counts]
>   KSTREAM-SOURCE-05:
>   topics: [X-Counts-repartition]
>   children:   
> [KSTREAM-AGGREGATE-02]
>   KSTREAM-AGGREGATE-02:
>   states: [Counts]
> {code}
> I.e. there are two repartition topics, one for the aggregate and one for the 
> join, which not only introduce unnecessary overheads but also mess up the 
> processing ordering (users are expecting each record to go through 
> aggregation first then the join operator). And in order to get the following 
> simpler topology users today need to add a {{through}} operator after {{map}} 
> manually to enforce repartitioning.
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-SINK-02]
>   KSTREAM-SINK-02:
>   topic:  topic 2
> ProcessorTopology:
>   KSTREAM-SOURCE-03:
>   topics: [topic 2]
>   children:   
> [KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
>   KSTREAM-AGGREGATE-04:
>   states: [Counts]
>   KSTREAM-LEFTJOIN-05:
>   states: [Counts]
> {code} 
> This kind of optimization should be automatic in Streams, which we can 
> consider doing when extending from one-operator-at-a-time translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-4601:
---
Assignee: (was: Bill Bejeck)

> Avoid duplicated repartitioning in KStream DSL
> --
>
> Key: KAFKA-4601
> URL: https://issues.apache.org/jira/browse/KAFKA-4601
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: performance
>
> Consider the following DSL:
> {code}
> Stream source = builder.stream(Serdes.String(), 
> Serdes.String(), "topic1").map(..);
> KTable counts = source
> .groupByKey()
> .count("Counts");
> KStream sink = source.leftJoin(counts, ..);
> {code}
> The resulted topology looks like this:
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-FILTER-04, KSTREAM-FILTER-07]
>   KSTREAM-FILTER-04:
>   children:   
> [KSTREAM-SINK-03]
>   KSTREAM-SINK-03:
>   topic:  X-Counts-repartition
>   KSTREAM-FILTER-07:
>   children:   
> [KSTREAM-SINK-06]
>   KSTREAM-SINK-06:
>   topic:  
> X-KSTREAM-MAP-01-repartition
> ProcessorTopology:
>   KSTREAM-SOURCE-08:
>   topics: 
> [X-KSTREAM-MAP-01-repartition]
>   children:   
> [KSTREAM-LEFTJOIN-09]
>   KSTREAM-LEFTJOIN-09:
>   states: [Counts]
>   KSTREAM-SOURCE-05:
>   topics: [X-Counts-repartition]
>   children:   
> [KSTREAM-AGGREGATE-02]
>   KSTREAM-AGGREGATE-02:
>   states: [Counts]
> {code}
> I.e. there are two repartition topics, one for the aggregate and one for the 
> join, which not only introduce unnecessary overheads but also mess up the 
> processing ordering (users are expecting each record to go through 
> aggregation first then the join operator). And in order to get the following 
> simpler topology users today need to add a {{through}} operator after {{map}} 
> manually to enforce repartitioning.
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-SINK-02]
>   KSTREAM-SINK-02:
>   topic:  topic 2
> ProcessorTopology:
>   KSTREAM-SOURCE-03:
>   topics: [topic 2]
>   children:   
> [KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
>   KSTREAM-AGGREGATE-04:
>   states: [Counts]
>   KSTREAM-LEFTJOIN-05:
>   states: [Counts]
> {code} 
> This kind of optimization should be automatic in Streams, which we can 
> consider doing when extending from one-operator-at-a-time translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802904#comment-15802904
 ] 

Guozhang Wang commented on KAFKA-4601:
--

[~bbejeck] Note that it may be more complex than it sounds: today we translate 
the DSL operators to underlying processor nodes in topology independently (i.e. 
one at a time), so when we are translating a join after an aggregate, we do not 
know what processors have been created so far. To solve this specific issue we 
can make some workaround, but a general solution would be extending the 
translation mechanism to be "global": this can be as complex as query 
optimization.

> Avoid duplicated repartitioning in KStream DSL
> --
>
> Key: KAFKA-4601
> URL: https://issues.apache.org/jira/browse/KAFKA-4601
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: performance
>
> Consider the following DSL:
> {code}
> Stream source = builder.stream(Serdes.String(), 
> Serdes.String(), "topic1").map(..);
> KTable counts = source
> .groupByKey()
> .count("Counts");
> KStream sink = source.leftJoin(counts, ..);
> {code}
> The resulted topology looks like this:
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-FILTER-04, KSTREAM-FILTER-07]
>   KSTREAM-FILTER-04:
>   children:   
> [KSTREAM-SINK-03]
>   KSTREAM-SINK-03:
>   topic:  X-Counts-repartition
>   KSTREAM-FILTER-07:
>   children:   
> [KSTREAM-SINK-06]
>   KSTREAM-SINK-06:
>   topic:  
> X-KSTREAM-MAP-01-repartition
> ProcessorTopology:
>   KSTREAM-SOURCE-08:
>   topics: 
> [X-KSTREAM-MAP-01-repartition]
>   children:   
> [KSTREAM-LEFTJOIN-09]
>   KSTREAM-LEFTJOIN-09:
>   states: [Counts]
>   KSTREAM-SOURCE-05:
>   topics: [X-Counts-repartition]
>   children:   
> [KSTREAM-AGGREGATE-02]
>   KSTREAM-AGGREGATE-02:
>   states: [Counts]
> {code}
> I.e. there are two repartition topics, one for the aggregate and one for the 
> join, which not only introduce unnecessary overheads but also mess up the 
> processing ordering (users are expecting each record to go through 
> aggregation first then the join operator). And in order to get the following 
> simpler topology users today need to add a {{through}} operator after {{map}} 
> manually to enforce repartitioning.
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-SINK-02]
>   KSTREAM-SINK-02:
>   topic:  topic 2
> ProcessorTopology:
>   KSTREAM-SOURCE-03:
>   topics: [topic 2]
>   children:   
> [KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
>   KSTREAM-AGGREGATE-04:
>   states: [Counts]
>   KSTREAM-LEFTJOIN-05:
>   states: [Counts]
> {code} 
> This kind of optimization should be automatic in Streams, which we can 
> consider doing when extending from one-operator-at-a-time translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread radai
im all for (working towards) getting rid of old code, but there's still no
solid migration path - you'll be "stranding" users on deprecated, no longer
maintained code with no "safe" way out that does not involve downtime
(specifically old and new consumers cannot correctly divide up partitions
between themselves if both operate within the same group on the same topic).

On Thu, Jan 5, 2017 at 3:10 PM, Gwen Shapira  wrote:

> Very strong support from me too :)
>
> On Thu, Jan 5, 2017 at 12:09 PM, Vahid S Hashemian
>  wrote:
> > Hi all,
> >
> > There was some discussion recently on deprecating the old consumer (
> > https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
> > Ismael suggested to cover the discussion and voting of major deprecations
> > like this under a KIP.
> >
> > So I started KIP-109 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 109%3A+Old+Consumer+Deprecation
> > ) and look forward to your feedback and comments.
> >
> > We'd like to implement this deprecation in the upcoming 0.10.2.0 release.
> >
> > Thanks.
> > --Vahid
> >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


[GitHub] kafka pull request #2320: MINOR: Maybe decorate inner topics for SourceNode

2017-01-05 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/2320

MINOR: Maybe decorate inner topics for SourceNode

When creating the source node in TopologyBuilder, we need to decorate its 
input topics with the prefix if they are inner (i.e. repartition) topics.

Also did some minor cleanup in the printing function for better 
visualization in debugging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka KMinor-source-topic-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2320






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802878#comment-15802878
 ] 

Bill Bejeck commented on KAFKA-4601:


If it's ok, picking this one up.

> Avoid duplicated repartitioning in KStream DSL
> --
>
> Key: KAFKA-4601
> URL: https://issues.apache.org/jira/browse/KAFKA-4601
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: performance
>
> Consider the following DSL:
> {code}
> Stream source = builder.stream(Serdes.String(), 
> Serdes.String(), "topic1").map(..);
> KTable counts = source
> .groupByKey()
> .count("Counts");
> KStream sink = source.leftJoin(counts, ..);
> {code}
> The resulted topology looks like this:
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-FILTER-04, KSTREAM-FILTER-07]
>   KSTREAM-FILTER-04:
>   children:   
> [KSTREAM-SINK-03]
>   KSTREAM-SINK-03:
>   topic:  X-Counts-repartition
>   KSTREAM-FILTER-07:
>   children:   
> [KSTREAM-SINK-06]
>   KSTREAM-SINK-06:
>   topic:  
> X-KSTREAM-MAP-01-repartition
> ProcessorTopology:
>   KSTREAM-SOURCE-08:
>   topics: 
> [X-KSTREAM-MAP-01-repartition]
>   children:   
> [KSTREAM-LEFTJOIN-09]
>   KSTREAM-LEFTJOIN-09:
>   states: [Counts]
>   KSTREAM-SOURCE-05:
>   topics: [X-Counts-repartition]
>   children:   
> [KSTREAM-AGGREGATE-02]
>   KSTREAM-AGGREGATE-02:
>   states: [Counts]
> {code}
> I.e. there are two repartition topics, one for the aggregate and one for the 
> join, which not only introduce unnecessary overheads but also mess up the 
> processing ordering (users are expecting each record to go through 
> aggregation first then the join operator). And in order to get the following 
> simpler topology users today need to add a {{through}} operator after {{map}} 
> manually to enforce repartitioning.
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-SINK-02]
>   KSTREAM-SINK-02:
>   topic:  topic 2
> ProcessorTopology:
>   KSTREAM-SOURCE-03:
>   topics: [topic 2]
>   children:   
> [KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
>   KSTREAM-AGGREGATE-04:
>   states: [Counts]
>   KSTREAM-LEFTJOIN-05:
>   states: [Counts]
> {code} 
> This kind of optimization should be automatic in Streams, which we can 
> consider doing when extending from one-operator-at-a-time translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-4601:
--

Assignee: Bill Bejeck

> Avoid duplicated repartitioning in KStream DSL
> --
>
> Key: KAFKA-4601
> URL: https://issues.apache.org/jira/browse/KAFKA-4601
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: performance
>
> Consider the following DSL:
> {code}
> Stream source = builder.stream(Serdes.String(), 
> Serdes.String(), "topic1").map(..);
> KTable counts = source
> .groupByKey()
> .count("Counts");
> KStream sink = source.leftJoin(counts, ..);
> {code}
> The resulted topology looks like this:
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-FILTER-04, KSTREAM-FILTER-07]
>   KSTREAM-FILTER-04:
>   children:   
> [KSTREAM-SINK-03]
>   KSTREAM-SINK-03:
>   topic:  X-Counts-repartition
>   KSTREAM-FILTER-07:
>   children:   
> [KSTREAM-SINK-06]
>   KSTREAM-SINK-06:
>   topic:  
> X-KSTREAM-MAP-01-repartition
> ProcessorTopology:
>   KSTREAM-SOURCE-08:
>   topics: 
> [X-KSTREAM-MAP-01-repartition]
>   children:   
> [KSTREAM-LEFTJOIN-09]
>   KSTREAM-LEFTJOIN-09:
>   states: [Counts]
>   KSTREAM-SOURCE-05:
>   topics: [X-Counts-repartition]
>   children:   
> [KSTREAM-AGGREGATE-02]
>   KSTREAM-AGGREGATE-02:
>   states: [Counts]
> {code}
> I.e. there are two repartition topics, one for the aggregate and one for the 
> join, which not only introduce unnecessary overhead but also mess up the 
> processing order (users expect each record to go through the aggregation 
> first and then the join operator). In order to get the following simpler 
> topology, users today need to add a {{through}} operator after {{map}} 
> manually to enforce a single repartitioning.
> {code}
> ProcessorTopology:
>   KSTREAM-SOURCE-00:
>   topics: [topic1]
>   children:   [KSTREAM-MAP-01]
>   KSTREAM-MAP-01:
>   children:   
> [KSTREAM-SINK-02]
>   KSTREAM-SINK-02:
>   topic:  topic 2
> ProcessorTopology:
>   KSTREAM-SOURCE-03:
>   topics: [topic 2]
>   children:   
> [KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
>   KSTREAM-AGGREGATE-04:
>   states: [Counts]
>   KSTREAM-LEFTJOIN-05:
>   states: [Counts]
> {code} 
> This kind of optimization should be automatic in Streams, which we can 
> consider doing when extending from one-operator-at-a-time translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4601) Avoid duplicated repartitioning in KStream DSL

2017-01-05 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4601:


 Summary: Avoid duplicated repartitioning in KStream DSL
 Key: KAFKA-4601
 URL: https://issues.apache.org/jira/browse/KAFKA-4601
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Consider the following DSL:

{code}
KStream source = builder.stream(Serdes.String(), 
Serdes.String(), "topic1").map(..);

KTable counts = source
.groupByKey()
.count("Counts");

KStream sink = source.leftJoin(counts, ..);
{code}

The resulting topology looks like this:

{code}
ProcessorTopology:
KSTREAM-SOURCE-00:
topics: [topic1]
children:   [KSTREAM-MAP-01]
KSTREAM-MAP-01:
children:   
[KSTREAM-FILTER-04, KSTREAM-FILTER-07]
KSTREAM-FILTER-04:
children:   
[KSTREAM-SINK-03]
KSTREAM-SINK-03:
topic:  X-Counts-repartition
KSTREAM-FILTER-07:
children:   
[KSTREAM-SINK-06]
KSTREAM-SINK-06:
topic:  
X-KSTREAM-MAP-01-repartition

ProcessorTopology:
KSTREAM-SOURCE-08:
topics: 
[X-KSTREAM-MAP-01-repartition]
children:   
[KSTREAM-LEFTJOIN-09]
KSTREAM-LEFTJOIN-09:
states: [Counts]
KSTREAM-SOURCE-05:
topics: [X-Counts-repartition]
children:   
[KSTREAM-AGGREGATE-02]
KSTREAM-AGGREGATE-02:
states: [Counts]
{code}

I.e. there are two repartition topics, one for the aggregate and one for the 
join, which not only introduce unnecessary overhead but also mess up the 
processing order (users expect each record to go through the aggregation 
first and then the join operator). In order to get the following simpler 
topology, users today need to add a {{through}} operator after {{map}} manually 
to enforce a single repartitioning.

{code}
ProcessorTopology:
KSTREAM-SOURCE-00:
topics: [topic1]
children:   [KSTREAM-MAP-01]
KSTREAM-MAP-01:
children:   
[KSTREAM-SINK-02]
KSTREAM-SINK-02:
topic:  topic 2

ProcessorTopology:
KSTREAM-SOURCE-03:
topics: [topic 2]
children:   
[KSTREAM-AGGREGATE-04, KSTREAM-LEFTJOIN-05]
KSTREAM-AGGREGATE-04:
states: [Counts]
KSTREAM-LEFTJOIN-05:
states: [Counts]
{code} 

This kind of optimization should be automatic in Streams, which we can consider 
doing when extending from one-operator-at-a-time translation.
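As a rough illustration of the manual workaround mentioned above, here is a hedged sketch against the 0.10.x DSL. It assumes default String serdes are configured, that the intermediate topic "topic2" has been created manually with the desired partition count, and the map/join lambdas are made up for the example; the eventual automatic optimization may look nothing like this.

{code}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

KStreamBuilder builder = new KStreamBuilder();

// Force a single repartition right after the key-changing map() by piping the
// stream through an explicitly created intermediate topic.
KStream<String, String> source = builder
        .stream(Serdes.String(), Serdes.String(), "topic1")
        .map((key, value) -> KeyValue.pair(value, value))   // example key-changing map
        .through("topic2");                                  // one explicit repartition

KTable<String, Long> counts = source
        .groupByKey()
        .count("Counts");

// Both the aggregation and the join now read from the single repartitioned stream,
// so records pass through the aggregation before the join as users expect.
KStream<String, String> sink =
        source.leftJoin(counts, (value, count) -> value + "@" + count);
{code}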



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (KAFKA-4468) Correctly calculate the window end timestamp after read from state stores

2017-01-05 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4468 stopped by Bill Bejeck.
--
> Correctly calculate the window end timestamp after read from state stores
> -
>
> Key: KAFKA-4468
> URL: https://issues.apache.org/jira/browse/KAFKA-4468
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: architecture
>
> When storing the WindowedStore on the persistent KV store, we only use the 
> start timestamp of the window as part of the combo key, i.e. (start-timestamp, 
> key). The reason we do not add the end timestamp as well is that we can always 
> calculate it as start timestamp + window_length, and hence we can save 8 bytes 
> per key on the persistent KV store.
> However, after reading it back (via {{WindowedDeserializer}}) we do not set its 
> end timestamp correctly but instead treat it as an {{UnlimitedWindow}}. We 
> should fix this by calculating the end timestamp as described above.
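A minimal sketch of the intended calculation, using a hypothetical helper rather than the actual Kafka internals, and assuming the reader of the store knows the window length:

{code}
// Hedged illustration: rebuild the window end from the stored start timestamp and
// a known window length instead of treating the window as unlimited after read.
final class WindowEndCalculator {
    static long endFor(long windowStartMs, long windowLengthMs) {
        return windowStartMs + windowLengthMs;   // the 8 bytes we avoided storing per key
    }

    public static void main(String[] args) {
        long startMs = 1_483_228_800_000L;       // example start timestamp from the combo key
        long windowLengthMs = 60_000L;           // example window length
        System.out.println(endFor(startMs, windowLengthMs));
    }
}
{code}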



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4468) Correctly calculate the window end timestamp after read from state stores

2017-01-05 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-4468:
---
Assignee: (was: Bill Bejeck)

> Correctly calculate the window end timestamp after read from state stores
> -
>
> Key: KAFKA-4468
> URL: https://issues.apache.org/jira/browse/KAFKA-4468
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
>
> When storing the WindowedStore on the persistent KV store, we only use the 
> start timestamp of the window as part of the combo-key as (start-timestamp, 
> key). The reason that we do not add the end-timestamp as well is that we can 
> always calculate it from the start timestamp + window_length, and hence we 
> can save 8 bytes per key on the persistent KV store.
> However, after read it (via {{WindowedDeserializer}}) we do not set its end 
> timestamp correctly but just read it as an {{UnlimitedWindow}}. We should fix 
> this by calculating its end timestamp as mentioned above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming sink connector

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802848#comment-15802848
 ] 

ASF GitHub Bot commented on KAFKA-4575:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2313


> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming sink connector
> --
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.2.0
>
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming sink connector

2017-01-05 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-4575.
--
   Resolution: Fixed
Fix Version/s: 0.10.2.0

Issue resolved by pull request 2313
[https://github.com/apache/kafka/pull/2313]

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming sink connector
> --
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.2.0
>
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2313: KAFKA-4575: ensure topic created before starting s...

2017-01-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2313


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Gwen Shapira
Very strong support from me too :)

On Thu, Jan 5, 2017 at 12:09 PM, Vahid S Hashemian
 wrote:
> Hi all,
>
> There was some discussion recently on deprecating the old consumer (
> https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
> Ismael suggested to cover the discussion and voting of major deprecations
> like this under a KIP.
>
> So I started KIP-109 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-109%3A+Old+Consumer+Deprecation
> ) and look forward to your feedback and comments.
>
> We'd like to implement this deprecation in the upcoming 0.10.2.0 release.
>
> Thanks.
> --Vahid
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Gwen Shapira
I hate going back and forth on this, but KafkaProducer.close() (with
no timeout) is equivalent to close(Long.MAX_VALUE, TimeUnit.MILLISECONDS),
while KafkaConsumer.close() is equivalent to
close(30*1000, TimeUnit.MILLISECONDS).

Isn't this kind of inconsistency best to avoid?

On Thu, Jan 5, 2017 at 2:08 PM, Rajini Sivaram  wrote:
> Thank you, Ismael. I have sent another one. Hopefully that will appear in
> its own thread.
>
> Rajini
>
> On Thu, Jan 5, 2017 at 9:30 PM, Ismael Juma  wrote:
>
>> Thanks Rajini. This seems to be happening a lot lately: Gmail is showing
>> the vote message in the discuss thread.
>>
>> Ismael
>>
>> On Thu, Jan 5, 2017 at 9:23 PM, Rajini Sivaram 
>> wrote:
>>
>> > Hi all,
>> >
>> > I would like to start the voting process for *KIP-102 - Add close with
>> > timeout for consumers:*
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 102+-+Add+close+with+timeout+for+consumers
>> >
>> >
>> > This KIP adds a new close method with a timeout for consumers similar to
>> > the close method in the producer. As described in the discussion thread
>> > > > n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e>,
>> > the changes are only in the close code path and hence the impact is not
>> too
>> > big. The existing close() method without a timeout will use a default
>> > timeout of 30 seconds.
>> >
>> > Thank you...
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Commented] (KAFKA-4588) QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable is occasionally failing on jenkins

2017-01-05 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802744#comment-15802744
 ] 

Matthias J. Sax commented on KAFKA-4588:


One more instance: 
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/541/testReport/junit/org.apache.kafka.streams.integration/QueryableStateIntegrationTest/shouldNotMakeStoreAvailableUntilAllStoresAvailable_1_/

> QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable
>  is occasionally failing on jenkins
> ---
>
> Key: KAFKA-4588
> URL: https://issues.apache.org/jira/browse/KAFKA-4588
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
> Fix For: 0.10.2.0
>
>
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] FAILED
> java.lang.AssertionError: Condition not met within timeout 3. waiting 
> for store count-by-key
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:259)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable(QueryableStateIntegrationTest.java:502)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4222) Transient failure in QueryableStateIntegrationTest.queryOnRebalance

2017-01-05 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802735#comment-15802735
 ] 

Matthias J. Sax commented on KAFKA-4222:


One more instance: 
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/537/testReport/junit/org.apache.kafka.streams.integration/QueryableStateIntegrationTest/queryOnRebalance_0_/

> Transient failure in QueryableStateIntegrationTest.queryOnRebalance
> ---
>
> Key: KAFKA-4222
> URL: https://issues.apache.org/jira/browse/KAFKA-4222
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Jason Gustafson
>Assignee: Eno Thereska
> Fix For: 0.10.1.0
>
>
> Seen here: https://builds.apache.org/job/kafka-trunk-jdk8/915/console
> {code}
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[1] FAILED
> java.lang.AssertionError: Condition not met within timeout 3. waiting 
> for metadata, store and value to be non null
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:276)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:263)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:342)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-105: Addition of Record Level for Sensors

2017-01-05 Thread Eno Thereska
Thanks Jay, will fix the motivation. We have a microbenchmark and perf graph in 
the PR:
https://github.com/apache/kafka/pull/1446#issuecomment-268106260 


I'll need to think some more about point 3.

Thanks
Eno

> On 5 Jan 2017, at 19:18, Jay Kreps  wrote:
> 
> This is great! A couple of quick comments:
> 
>   1. It'd be good to make the motivation a bit more clear. I think the
>   motivation is "We want to have lots of partition/task/etc metrics but we're
>   concerned about the performance impact so we want to disable them by
>   default." Currently the motivation section is more about the proposed
>   change and doesn't really make clear why.
>   2. Do we have a microbenchmark that shows the performance of (1)
>   enabled metrics, (2) disabled metrics, (3) no metrics? This would help
>   build the case for needing this extra knob. Obviously if metrics are cheap
>   you would always just leave them enabled and not worry about it. I think
>   there should be some cost because we are at least taking a lock during the
>   recording but I'm not sure how material that is.
>   3. One consideration in how this is exposed: we always found the ability to
>   dynamically change the logging level in JMX for log4j pretty useful. I
>   think if we want to leave the door open to add this ability to enable
>   metrics at runtime it may have some impact on the decision around how
>   metrics are registered/reported.
> 
> -Jay
> 
> On Thu, Jan 5, 2017 at 9:59 AM, Guozhang Wang  wrote:
> 
>> I thought about "not registering at all" and left a comment on the PR as
>> well regarding this idea. My concern is that it may not be very
>> straightforward to implement via the MetricsReporter interface; if
>> Eno and Aarti have a good approach to tackle it, I would love it.
>> 
>> 
>> Guozhang
>> 
>> On Thu, Jan 5, 2017 at 5:34 AM, Eno Thereska 
>> wrote:
>> 
>>> Updated KIP for 1. Waiting to hear from Guozhang on 2 and then we can
>>> proceed.
>>> 
>>> Thanks
>>> Eno
 On 5 Jan 2017, at 12:27, Ismael Juma  wrote:
 
 Thanks Eno. It would be good to update the KIP as well with regards to
>> 1.
 
 About 2, I am not sure how adding a field could break existing tools.
 Having said that, your suggestion not to register sensors at all if
>> their
 record level is below what is configured works for me.
 
 Ismael
 
 On Thu, Jan 5, 2017 at 12:07 PM, Eno Thereska 
 wrote:
 
> Thanks Ismael. Already addressed 1. in the PR.
> 
> As for 2. I'd prefer not adding extra info to the metrics reporters at
> this point, since it might break existing tools out there (e.g., if we
>>> add
> things like configured level). Existing tools might be expecting the
>>> info
> to be reported in a particular format.
> 
> If the current way is confusing, I think the next best alternative is
>> to
> not register sensors at all if their record level is below what is
> configured. That way they don't show up at all. This will require some
>>> more
> code in Sensors.java to check at every step, but I think it's clean
>> from
> the user's point of view.
> 
> Eno
> 
> 
>> On 5 Jan 2017, at 11:23, Ismael Juma  wrote:
>> 
>> Thanks for the KIP, it seems like a good improvement. A couple of
> comments:
>> 
>> 1. As Jun asked in the PR, do we need a broker config as well? The
>>> broker
>> uses Kafka Metrics for some metrics, but we probably don't have any
>>> debug
>> sensors at the broker yet. Either way, it would be good to describe
>> the
>> thinking around this.
>> 
>> 2. The behaviour with regards to the metrics reporter could be
> surprising.
>> It would be good to elaborate a little more on this aspect. For
>>> example,
>> maybe we want to expose the sensor level and the current configured
>>> level
>> to metric reporters. That could then be used to build the debug/info
>> dashboard that Eno mentioned (while making it clear that some metrics
>> exist, but are not currently being recorded).
>> 
>> Ismael
>> 
>> On Thu, Jan 5, 2017 at 10:37 AM, Eno Thereska <
>> eno.there...@gmail.com>
>> wrote:
>> 
>>> Correct on 2. Guozhang: the sensor will be registered and polled by
>> a
>>> reporter, but the metrics associated with it will not be updated.
>>> 
>>> That would allow a user to have, for example, a debug dashboard and
>> an
>>> info dashboard.
>>> 
>>> Updated KIP to make this clear.
>>> 
>>> Thanks
>>> Eno
>>> 
>>> 
 On 4 Jan 2017, at 18:00, Aarti Gupta 
>> wrote:
 
 Thanks for the review, Guozhang,
 
 Addressed 2 out of the three comments,
 
 1. Fixed and updated KIP (swapped code variable name
 METRICS_RECORD_LEVEL_CONFIG with config name metrics.record.level)
 
>

Re: [DISCUSS] KIP-105: Addition of Record Level for Sensors

2017-01-05 Thread Eno Thereska
Guozhang,

I was thinking of doing this in Sensor.java, without touching the MetricsReporter 
interface. Basically I'd go for not adding a KafkaMetric at all with this 
approach. But perhaps I'm missing something.

Thanks
Eno
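
For readers following along, a hedged sketch of the "not registering at all" idea under discussion; the class and method names below are illustrative stand-ins, not the actual Sensor/Metrics code in the PR:

// Hypothetical registry: a sensor whose record level is filtered out is never
// created or registered, so metrics reporters never see it at all.
enum RecordLevel { INFO, DEBUG }

final class FilteringSensorRegistry {
    private final RecordLevel configuredLevel;
    private final java.util.Map<String, Object> registered = new java.util.HashMap<>();

    FilteringSensorRegistry(RecordLevel configuredLevel) {
        this.configuredLevel = configuredLevel;
    }

    // Mirrors the shouldRecord() logic from the KIP: a DEBUG config records everything,
    // otherwise only sensors at the configured level are recorded.
    Object sensor(String name, RecordLevel sensorLevel) {
        boolean shouldRecord = configuredLevel == RecordLevel.DEBUG
                || configuredLevel == sensorLevel;
        if (!shouldRecord) {
            return null;   // nothing registered, nothing polled by reporters
        }
        return registered.computeIfAbsent(name, n -> new Object());
    }
}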
> On 5 Jan 2017, at 17:59, Guozhang Wang  wrote:
> 
> I thought about "not registering at all" and left a comment on the PR as
> well regarding this idea. My concern is that it may be not very
> straight-forward to implement though via the MetricsReporter interface, if
> Eno and Aarti has a good approach to tackle it I would love it.
> 
> 
> Guozhang
> 
> On Thu, Jan 5, 2017 at 5:34 AM, Eno Thereska  wrote:
> 
>> Updated KIP for 1. Waiting to hear from Guozhang on 2 and then we can
>> proceed.
>> 
>> Thanks
>> Eno
>>> On 5 Jan 2017, at 12:27, Ismael Juma  wrote:
>>> 
>>> Thanks Eno. It would be good to update the KIP as well with regards to 1.
>>> 
>>> About 2, I am not sure how adding a field could break existing tools.
>>> Having said that, your suggestion not to register sensors at all if their
>>> record level is below what is configured works for me.
>>> 
>>> Ismael
>>> 
>>> On Thu, Jan 5, 2017 at 12:07 PM, Eno Thereska 
>>> wrote:
>>> 
 Thanks Ismael. Already addressed 1. in the PR.
 
 As for 2. I'd prefer not adding extra info to the metrics reporters at
 this point, since it might break existing tools out there (e.g., if we
>> add
 things like configured level). Existing tools might be expecting the
>> info
 to be reported in a particular format.
 
 If the current way is confusing, I think the next best alternative is to
 not register sensors at all if their record level is below what is
 configured. That way they don't show up at all. This will require some
>> more
 code in Sensors.java to check at every step, but I think it's clean from
 the user's point of view.
 
 Eno
 
 
> On 5 Jan 2017, at 11:23, Ismael Juma  wrote:
> 
> Thanks for the KIP, it seems like a good improvement. A couple of
 comments:
> 
> 1. As Jun asked in the PR, do we need a broker config as well? The
>> broker
> uses Kafka Metrics for some metrics, but we probably don't have any
>> debug
> sensors at the broker yet. Either way, it would be good to describe the
> thinking around this.
> 
> 2. The behaviour with regards to the metrics reporter could be
 surprising.
> It would be good to elaborate a little more on this aspect. For
>> example,
> maybe we want to expose the sensor level and the current configured
>> level
> to metric reporters. That could then be used to build the debug/info
> dashboard that Eno mentioned (while making it clear that some metrics
> exist, but are not currently being recorded).
> 
> Ismael
> 
> On Thu, Jan 5, 2017 at 10:37 AM, Eno Thereska 
> wrote:
> 
>> Correct on 2. Guozhang: the sensor will be registered and polled by a
>> reporter, but the metrics associated with it will not be updated.
>> 
>> That would allow a user to have, for example, a debug dashboard and an
>> info dashboard.
>> 
>> Updated KIP to make this clear.
>> 
>> Thanks
>> Eno
>> 
>> 
>>> On 4 Jan 2017, at 18:00, Aarti Gupta  wrote:
>>> 
>>> Thanks for the review, Guozhang,
>>> 
>>> Addressed 2 out of the three comments,
>>> 
>>> 1. Fixed and updated KIP (swapped code variable name
>>> METRICS_RECORD_LEVEL_CONFIG with config name metrics.record.level)
>>> 
>>> 3. >>Could you elaborate on the "shouldRecord()" function, e.g. which
>> class
>>> it will be added to? Does it contain any parameters?
>>> 
>>> Added more details on shouldRecord()  on the KIP
>>> 
>>> In Sensor.java the shouldRecord() method is used to compare the value of the
>>> metric record level in the consumer config with the RecordLevel associated
>>> with the sensor, to determine if metrics should be recorded.
>>> 
>>> From Sensor.java
>>> 
>>> /**
>>> * @return true if the sensor's record level indicates that the metric
>>> will be recorded, false otherwise
>>> */
>>> public boolean shouldRecord() {
>>> return this.recordLevel.shouldRecord(config.recordLevel());
>>> }
>>> 
>>> From nested enum, Sensor.RecordLevel
>>> 
>>> public boolean shouldRecord(final RecordLevel configLevel) {
>>> if (configLevel.equals(DEBUG)) {
>>> return true;
>>> } else {
>>> return configLevel.equals(this);
>>> }
>>> }
>>> 
>>> 
>>> 2. Need to discuss with Eno.
>>> 
>>> 
>>> Thanks!
>>> 
>>> aarti
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jan 3, 2017 at 2:27 PM, Guozhang Wang 
>> wrote:
>>> 
 LGTM overall. A few comments below:
 
 1. "METRICS_RECORD_LEVEL_CONFIG" is the internal code's variable
>> name,
>> but
 not the 

Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Rajini Sivaram
Thank you, Ismael. I have sent another one. Hopefully that will appear in
its own thread.

Rajini

On Thu, Jan 5, 2017 at 9:30 PM, Ismael Juma  wrote:

> Thanks Rajini. This seems to be happening a lot lately: Gmail is showing
> the vote message in the discuss thread.
>
> Ismael
>
> On Thu, Jan 5, 2017 at 9:23 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I would like to start the voting process for *KIP-102 - Add close with
> > timeout for consumers:*
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 102+-+Add+close+with+timeout+for+consumers
> >
> >
> > This KIP adds a new close method with a timeout for consumers similar to
> > the close method in the producer. As described in the discussion thread
> >  > n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e>,
> > the changes are only in the close code path and hence the impact is not
> too
> > big. The existing close() method without a timeout will use a default
> > timeout of 30 seconds.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>


[VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Rajini Sivaram
Hi all,


I would like to start the voting process for *KIP-102 - Add close with
timeout for consumers*:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-102+-+Add+close+with+timeout+for+consumers



This KIP adds a new close method with a timeout for consumers similar to
the close method in the producer. As described in the discussion thread
,
the changes are only in the close code path and hence the impact is not too
big. The existing close() method without a timeout will use a default
timeout of 30 seconds.


Thank you


Regards,


Rajini
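
For context, a hedged usage sketch of the proposed API; the two-argument signature mirrors the producer's existing close(long, TimeUnit), but treat the exact signature as defined in the KIP rather than this sketch:

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
try {
    consumer.subscribe(Collections.singletonList("topic1"));
    // ... poll loop ...
} finally {
    // Bounded shutdown: wait at most 5 seconds for pending commits and the
    // leave-group request, then give up. The no-argument close() would keep
    // working, but with the 30-second default described above.
    consumer.close(5, TimeUnit.SECONDS);
}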


Re: [jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread James Cheng

> On Jan 5, 2017, at 3:47 AM, Edoardo Comar  wrote:
> 
> "Jiangjie Qin (JIRA)"  wrote on 05/01/2017 08:06:58:
> 
>>[ https://issues.apache.org/jira/browse/KAFKA-4497?
>> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
>> tabpanel&focusedCommentId=15800696#comment-15800696 ] 
>> 
>> Jiangjie Qin commented on KAFKA-4497:
>> -
>> 
>> [~michael.andre.pearce] Thanks for helping test this again. 
>> 
>> If possible, could you enable trace level logging on 
>> kafka.log.LogSegment. This will print out the timestamp and the 
>> offset of the message with the largest timestamp, but it will 
>> probably flood the log4j log. So enabling this logging is optional and
>> preferred only when the issue can be reproduced pretty quickly.
>> 
> 
> Is it possible to *dynamically* reconfigure log4j so that one could turn 
> tracing on/off for selected components 
> without editing the log4j.properties and restarting Kafka ? 
> 

I've done it before using JMX. 

If you have JMX enabled, you can use jconsole, connect to Kafka, and invoke 
setLoggerLevel() to change the logging level of a particular package. The change 
takes effect on the live server without a restart, but it is not persisted across 
a restart.

I found this page that has the steps (with screenshots) that show how to do it:
http://www.javaworld.com/article/2073316/dynamic-java-log-levels-with-jmx-loggingmxbean--jconsole--visualvm--and-groovy.html
 


-James
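
If you'd rather script this than click through jconsole, here is a hedged sketch using plain JMX from Java. The MBean name "kafka:type=kafka.Log4jController" and its setLogLevel operation are assumptions based on the broker's log4j controller MBean; verify the object name on your broker (for example, by browsing it in jconsole) before relying on it.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SetKafkaLogLevel {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with remote JMX enabled on port 9999.
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Assumed MBean exposed by the broker's Log4jController.
            ObjectName logController = new ObjectName("kafka:type=kafka.Log4jController");
            // Turn on TRACE for kafka.log.LogSegment on the live broker; not persisted.
            conn.invoke(logController, "setLogLevel",
                    new Object[]{"kafka.log.LogSegment", "TRACE"},
                    new String[]{"java.lang.String", "java.lang.String"});
        }
    }
}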


> The log4j API supports a watch-file-and-reload
> http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.html#configureAndWatch(java.lang.String)
> but apparently that cannot happen declaratively by just putting the 
> properties file on the classpath.
> 
> Worth a new JIRA/KIP ?
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



Re: [DISCUSS] KIP-108: Create Topic Policy

2017-01-05 Thread Colin McCabe
Yeah, I agree... having a validation mode would be nice.  We should be
explicit that passing validation doesn't 100% guarantee that a
subsequent call to create the topic will succeed, though.  There is an
obvious race condition there-- for example, with a plugin which consults
some external authentication system, there could be a change to the
privileges in between validation and attempted creation.

It also seems like we should try to provide a helpful exception message
for the cases where topic creation fails.  This might involve adding
more detail about error conditions to CreateTopicsRequest... right now
it just returns an error code, but a text message would be a nice
addition.

cheers,
Colin
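
To make the dry-run idea concrete, here is a hedged sketch of the kind of pluggable check being discussed; the interface and exception names are illustrative stand-ins, not the API the KIP will actually define.

import java.util.Map;

// Illustrative policy plugin: reject topics with too little replication.
interface TopicCreationPolicy {
    void validate(String topic, int partitions, short replicationFactor,
                  Map<String, String> configs) throws PolicyViolationException;
}

class PolicyViolationException extends Exception {
    PolicyViolationException(String message) { super(message); }
}

class MinReplicationPolicy implements TopicCreationPolicy {
    @Override
    public void validate(String topic, int partitions, short replicationFactor,
                         Map<String, String> configs) throws PolicyViolationException {
        if (replicationFactor < 3) {
            throw new PolicyViolationException(
                    "Topic " + topic + " must have a replication factor of at least 3");
        }
    }
}

A dry-run/validate endpoint could call the same validate() method without creating the topic; as noted above, passing it would still be no guarantee against a race with a later creation attempt.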


On Thu, Jan 5, 2017, at 13:41, dan wrote:
> it would be nice to have a dry-run or validate ability added to this kip.
> since we are offloading validation to a 3rd party implementor a random
> user
> can't know a priori (based solely on kafka configs) whether a call should
> succeed without actually creating the topic.
> 
> a similar case is in connect where there is a separate endpoint
> 
> to attempt to validate a connect configuration without actually creating
> the connector.
> 
> thanks
> dan
> 
> 
> On Thu, Jan 5, 2017 at 7:34 AM, Ismael Juma  wrote:
> 
> > Hi all,
> >
> > We've posted "KIP-108: Create Topic Policy" for discussion:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 108%3A+Create+Topic+Policy
> >
> > Please take a look. Your feedback is appreciated.
> >
> > Thanks,
> > Ismael
> >


Re: [DISCUSS] KIP-108: Create Topic Policy

2017-01-05 Thread dan
it would be nice to have a dry-run or validate ability added to this kip.
since we are offloading validation to a 3rd-party implementor, a random user
can't know a priori (based solely on kafka configs) whether a call will
succeed without actually creating the topic.

a similar case is in connect where there is a separate endpoint

to attempt to validate a connect configuration without actually creating
the connector.

thanks
dan


On Thu, Jan 5, 2017 at 7:34 AM, Ismael Juma  wrote:

> Hi all,
>
> We've posted "KIP-108: Create Topic Policy" for discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 108%3A+Create+Topic+Policy
>
> Please take a look. Your feedback is appreciated.
>
> Thanks,
> Ismael
>


Build failed in Jenkins: kafka-trunk-jdk8 #1150

2017-01-05 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] MINOR: Update JavaDoc for KTable helper interfaces

--
[...truncated 7970 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTes

Re: [VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Ismael Juma
Thanks Rajini. This seems to be happening a lot lately: Gmail is showing
the vote message in the discuss thread.

Ismael

On Thu, Jan 5, 2017 at 9:23 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> I would like to start the voting process for *KIP-102 - Add close with
> timeout for consumers:*
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 102+-+Add+close+with+timeout+for+consumers
>
>
> This KIP adds a new close method with a timeout for consumers similar to
> the close method in the producer. As described in the discussion thread
>  n9us5ohthwmyai9pz4s2j62fmils2ufj8oie9jpmyaf...@mail.gmail.com%3e>,
> the changes are only in the close code path and hence the impact is not too
> big. The existing close() method without a timeout will use a default
> timeout of 30 seconds.
>
> Thank you...
>
> Regards,
>
> Rajini
>


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2017-01-05 Thread Ismael Juma
Thanks for the updates, +1

Ismael

On Thu, Jan 5, 2017 at 6:14 PM, Vahid S Hashemian  wrote:

> I'd like to start another round of voting on KIP-88 after recent changes
> to the KIP (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 88%3A+OffsetFetch+Protocol+Update
>
> ).
>
> The discussion thread can be found here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
>
> The discussion around recent changes to the KIP starts here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg62357.html
>
> Thank you for your feedback.
>
> Regards,
> --Vahid
>
>


[jira] [Created] (KAFKA-4600) Consumer proceeds on when ConsumerRebalanceListener fails

2017-01-05 Thread Braedon Vickers (JIRA)
Braedon Vickers created KAFKA-4600:
--

 Summary: Consumer proceeds on when ConsumerRebalanceListener fails
 Key: KAFKA-4600
 URL: https://issues.apache.org/jira/browse/KAFKA-4600
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.1.1
Reporter: Braedon Vickers


One of the use cases for a ConsumerRebalanceListener is to load state necessary 
for processing a partition when it is assigned. However, when 
ConsumerRebalanceListener.onPartitionsAssigned() fails for some reason (i.e. 
the state isn't loaded), the error is logged and the consumer proceeds as if 
nothing happened, happily consuming messages from the new partition. When the 
state is relied upon for correct processing, this can be very bad; e.g. data 
loss can occur.

It would be better if the error was propagated up so it could be dealt with 
normally. At the very least the assignment should fail so the consumer doesn't 
see any messages from the new partitions, and the rebalance can be reattempted.
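
To make the failure mode concrete, here is a hedged sketch of the state-loading use case plus a defensive workaround under the current behaviour; loadStateFor() is a hypothetical application hook, and pausing the partition is one possible mitigation, not the fix proposed here:

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class StateLoadingListener implements ConsumerRebalanceListener {
    private final KafkaConsumer<?, ?> consumer;
    private final Set<TopicPartition> failed = new HashSet<>();

    StateLoadingListener(KafkaConsumer<?, ?> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // flush or commit any per-partition state here
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            try {
                loadStateFor(tp);                           // hypothetical state loader
            } catch (RuntimeException e) {
                // Today the consumer keeps going regardless, so at least stop fetching
                // from this partition until the state load can be retried and resumed.
                failed.add(tp);
                consumer.pause(Collections.singleton(tp));
            }
        }
    }

    private void loadStateFor(TopicPartition tp) {
        // application-specific; may throw if the state store is unavailable
    }
}
{code}

The listener runs on the consumer's polling thread, so calling pause() from it is safe; the poll loop can later retry the load and resume() the partition.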



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2017-01-05 Thread Jason Gustafson
Thanks for the updates. +1

On Thu, Jan 5, 2017 at 11:42 AM, Gwen Shapira  wrote:

> +1
>
> On Thu, Jan 5, 2017 at 10:14 AM, Vahid S Hashemian
>  wrote:
> > I'd like to start another round of voting on KIP-88 after recent changes
> > to the KIP (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 88%3A+OffsetFetch+Protocol+Update
> >
> > ).
> >
> > The discussion thread can be found here:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
> >
> > The discussion around recent changes to the KIP starts here:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg62357.html
> >
> > Thank you for your feedback.
> >
> > Regards,
> > --Vahid
> >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


[VOTE] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Rajini Sivaram
Hi all,

I would like to start the voting process for *KIP-102 - Add close with
timeout for consumers:*

https://cwiki.apache.org/confluence/display/KAFKA/KIP-102+-+Add+close+with+timeout+for+consumers


This KIP adds a new close method with a timeout for consumers similar to
the close method in the producer. As described in the discussion thread
,
the changes are only in the close code path and hence the impact is not too
big. The existing close() method without a timeout will use a default
timeout of 30 seconds.

Thank you...

Regards,

Rajini


Re: [DISCUSS] KIP-102 - Add close with timeout for consumers

2017-01-05 Thread Rajini Sivaram
Thank you, Jason. I will start the vote.

On Thu, Jan 5, 2017 at 5:52 PM, Jason Gustafson  wrote:

> Yeah, if you start a vote soon, I think it has a chance to get into 0.10.2.
> I guess it's up to Ewen, but I'm happy to help review.
>
> -Jason
>
> On Wed, Jan 4, 2017 at 11:42 AM, Rajini Sivaram 
> wrote:
>
> > Hi Jason,
> >
> > Yes, we do potentially timeout even before sending pending commits after
> > the request timeout (default is > 5 minutes, so this should only happen
> > when there are real issues or when brokers are shut down). I have updated
> > the KIP to use a default timeout of 30 seconds for the existing close()
> > method.
> >
> > Since the code changes are limited to the close() code path, can we
> include
> > this in 0.10.2.0? If so, I can initiate the vote tomorrow.
> >
> > Thank you...
> >
> >
> > On Wed, Jan 4, 2017 at 5:35 PM, Jason Gustafson 
> > wrote:
> >
> > > Hi Rajini,
> > >
> > > Thanks for the clarification. I looked again at the patch and I see
> what
> > > you're saying now. I was confused because I assumed the request timeout
> > was
> > > being enforced on the requests themselves, but it is more that the
> > request
> > > timeout bounds the attempt to send them in addition to the time to
> > receive
> > > a response, right? So it is possible that we timeout before even
> getting
> > a
> > > chance to send the OffsetCommit (for example).
> > >
> > > I think I'd still prefer timing out quicker by default if possible. The
> > one
> > > case where it might be worthwhile waiting longer is when there are
> > pending
> > > offset commits sent through commitSync() or commitAsync(). But if we're
> > not
> > > actually doing retries or coordinator rediscovery, I'm not sure the
> > > additional time helps that much.
> > >
> > > -Jason
> > >
> > > On Wed, Jan 4, 2017 at 8:27 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Thank you for the review.
> > > >
> > > > During close(), if there is a rebalance and the coordinator has to be
> > > > rediscovered, close terminates without trying to find the
> coordinator.
> > > The
> > > > poll() loop within close terminates if the coordinator is not known
> (as
> > > it
> > > > does now) or if the timeout expires. At the moment, that timeout is a
> > > > hard-coded 5 second timeout. The PR changes that to min(closeTimeout,
> > > > requestTimeout). So even if there are pending commits, the maximum
> wait
> > > > will be requestTimeout in the final poll() loop of close().
> > > >
> > > > In addition to this, before the poll loop, there is a
> > > > maybeAutoCommitOffsetsSync(). At the moment, this does not have a
> > timeout
> > > > and can wait indefinitely. The PR introduces a timeout for this
> commit
> > > > invoked from close(). The timeout is min(closeTimeout,
> requestTimeout).
> > > > Hence the maximum timeout of (2 * requestTimeout) for any close.
> Have I
> > > > missed something?
> > > >
> > > > I had chosen Long.MAX_VALUE as default close timeout to be consistent
> > > with
> > > > Producer. But perhaps a lower timeout of 30 seconds is more
> meaningful
> > > for
> > > > Consumer since consumer typically has less to do. Even with (2 *
> > > > requestTimeout), the default would be 20 minutes, which is perhaps
> too
> > > high
> > > > anyway. I will update the KIP.
> > > >
> > > >
> > > > On Wed, Jan 4, 2017 at 3:16 AM, Jason Gustafson 
> > > > wrote:
> > > >
> > > > > Hey Rajini,
> > > > >
> > > > > Thanks for the KIP. I had a quick look at the patch and the impact
> > > > doesn't
> > > > > seem too bad. Just wanted to clarify one point. This is from the
> KIP:
> > > > >
> > > > > The existing close() method without a timeout will attempt to close
> > the
> > > > > > consumer gracefully with a timeout of Long.MAX_VALUE. Since
> commit
> > > and
> > > > > > leave group requests are timed out after the request timeout, the
> > > upper
> > > > > > bound will be approximately 2*request.timeout.ms (around 10
> > minutes
> > > by
> > > > > > default).
> > > > >
> > > > >
> > > > > I don't think this is quite right. There could be one or more
> pending
> > > > > OffsetCommit requests (sent using commitAsync) that we may have to
> > > await.
> > > > > We could also be in the middle of a group rebalance. The other
> > > > complication
> > > > > is what happens in the event of a request timeout. Usually the
> > consumer
> > > > > will rediscover the coordinator. Would we do that as well in
> close()
> > > and
> > > > > retry any failed requests if there is time remaining, or would we
> > just
> > > > fail
> > > > > the remaining requests and return? In any case, it may not be so
> easy
> > > to
> > > > > set an upper bound on the default timeout.
> > > > >
> > > > > With that in mind, I'm wondering whether waiting indefinitely
> should
> > be
> > > > the
> > > > > default. In the case of the OffsetCommit before closing (when
> > > autocommit
> > > > is
> > > > > enabled) or the LeaveGroup, it's 

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-01-05 Thread Apurva Mehta
Hello all,

I updated the KIP document with the changes discussed here. In particular:


   1. The consumer config was renamed from 'fetch.mode' to
   'isolation.level'. The values were changed appropriately as well.
   2. Clarified the transactional guarantees for the consumer.
   3. Added a subsection about the streams use case for transactions, as
   this is a driving motivation behind the current proposal. The existing
   motivation was not strong enough, as evidenced by some of the discussions
   that took place here.
   4. Miscellaneous minor clarifications that have been pointed out in
   the thread by multiple folks.


I have not yet updated the 'Rejected alternatives' section, since we have a
bunch of higher-level proposals that are still fairly open. I think the
proposals bifurcate into doing the buffering client side vs. complicating the
server side when handling transactions. We plan on doing a KIP call this
month where we can discuss our options in this regard, at which point we
will update the document (and rejected alternatives) to reflect the
collective decision.

Thanks for all the comments, it has been a great discussion so far! Here is
the KIP link, for convenience:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

Apurva
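
As a concrete illustration of point 1 above, a hedged sketch of the consumer side under the renamed config; the value names follow the KIP as it currently stands and may still change before the feature ships:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "txn-aware-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Renamed from the earlier 'fetch.mode' proposal: 'read_committed' hides messages
// from open or aborted transactions, while 'read_uncommitted' keeps today's behaviour.
props.put("isolation.level", "read_committed");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topic1"));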

On Wed, Nov 30, 2016 at 2:19 PM, Guozhang Wang  wrote:

> Hi all,
>
> I have just created KIP-98 to enhance Kafka with exactly once delivery
> semantics:
>
> *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 98+-+Exactly+Once+Delivery+and+Transactional+Messaging
>  98+-+Exactly+Once+Delivery+and+Transactional+Messaging>*
>
> This KIP adds a transactional messaging mechanism along with an idempotent
> producer implementation to make sure that 1) duplicated messages sent from
> the same identified producer can be detected on the broker side, and 2) a
> group of messages sent within a transaction will atomically be either
> reflected and fetchable to consumers or not as a whole.
>
> The above wiki page provides a high-level view of the proposed changes as
> well as summarized guarantees. Initial draft of the detailed implementation
> design is described in this Google doc:
>
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF
> 0wSw9ra8
>
>
> We would love to hear your comments and suggestions.
>
> Thanks,
>
> -- Guozhang
>


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Tom Crayford
It's not time for voting, but a huge +1 from me.

On Thu, Jan 5, 2017 at 8:25 PM, Vahid S Hashemian  wrote:

> Thanks Ismael. I added that to the KIP.
>
>
>
>
> From:   Ismael Juma 
> To: dev@kafka.apache.org
> Date:   01/05/2017 12:16 PM
> Subject:Re: [DISCUSS] KIP-109: Old Consumer Deprecation
> Sent by:isma...@gmail.com
>
>
>
> Thanks Vahid, +1 (predictably). Worth mentioning in the KIP that
> compatibility with older brokers (0.10.0 and later) is another feature
> that
> will only be supported by the Java consumer.
>
> Ismael
>
> On 5 Jan 2017 8:10 pm, "Vahid S Hashemian" 
> wrote:
>
> > Hi all,
> >
> > There was some discussion recently on deprecating the old consumer (
> > https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
> > Ismael suggested to cover the discussion and voting of major
> deprecations
> > like this under a KIP.
> >
> > So I started KIP-109 (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-109%3A+Old+Consumer+
> > Deprecation
> > ) and look forward to your feedback and comments.
> >
> > We'd like to implement this deprecation in the upcoming 0.10.2.0
> release.
> >
> > Thanks.
> > --Vahid
> >
> >
>
>
>
>
>


Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Vahid S Hashemian
Thanks Ismael. I added that to the KIP.




From:   Ismael Juma 
To: dev@kafka.apache.org
Date:   01/05/2017 12:16 PM
Subject:Re: [DISCUSS] KIP-109: Old Consumer Deprecation
Sent by:isma...@gmail.com



Thanks Vahid, +1 (predictably). Worth mentioning in the KIP that
compatibility with older brokers (0.10.0 and later) is another feature 
that
will only be supported by the Java consumer.

Ismael

On 5 Jan 2017 8:10 pm, "Vahid S Hashemian" 
wrote:

> Hi all,
>
> There was some discussion recently on deprecating the old consumer (
> https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
> Ismael suggested to cover the discussion and voting of major 
deprecations
> like this under a KIP.
>
> So I started KIP-109 (
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-109%3A+Old+Consumer+
> Deprecation
> ) and look forward to your feedback and comments.
>
> We'd like to implement this deprecation in the upcoming 0.10.2.0 
release.
>
> Thanks.
> --Vahid
>
>






[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802412#comment-15802412
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Confirming the classpath via the running process: 

conflue+ 14435 14401  1 13:21 ?00:05:53 
/opt/projects/java/sun-jdk8-1.8.0.66/jre/bin/java -cp 
:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/xz-1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-api-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-log4j-appender-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/zkclient-0.9.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-scaladoc.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-examples-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/zookeeper-3.4.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-javadoc.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects
/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-client-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-common-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-core-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-guava-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/metr

Re: [DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Ismael Juma
Thanks Vahid, +1 (predictably). Worth mentioning in the KIP that
compatibility with older brokers (0.10.0 and later) is another feature that
will only be supported by the Java consumer.

Ismael

On 5 Jan 2017 8:10 pm, "Vahid S Hashemian" 
wrote:

> Hi all,
>
> There was some discussion recently on deprecating the old consumer (
> https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
> Ismael suggested covering the discussion and voting of major deprecations
> like this under a KIP.
>
> So I started KIP-109 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-109%3A+Old+Consumer+
> Deprecation
> ) and look forward to your feedback and comments.
>
> We'd like to implement this deprecation in the upcoming 0.10.2.0 release.
>
> Thanks.
> --Vahid
>
>


[jira] [Commented] (KAFKA-3264) Mark the old Scala consumer and related classes as deprecated

2017-01-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802401#comment-15802401
 ] 

Ismael Juma commented on KAFKA-3264:


Excellent, thanks!

> Mark the old Scala consumer and related classes as deprecated
> -
>
> Key: KAFKA-3264
> URL: https://issues.apache.org/jira/browse/KAFKA-3264
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Grant Henke
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> Once the new consumer is out of beta, we should consider deprecating the old 
> Scala consumers to encourage use of the new consumer and facilitate the 
> removal of the old consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] KIP-109: Old Consumer Deprecation

2017-01-05 Thread Vahid S Hashemian
Hi all,

There was some discussion recently on deprecating the old consumer (
https://www.mail-archive.com/dev@kafka.apache.org/msg59084.html).
Ismael suggested covering the discussion and voting of major deprecations 
like this under a KIP.

So I started KIP-109 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-109%3A+Old+Consumer+Deprecation
) and look forward to your feedback and comments.

We'd like to implement this deprecation in the upcoming 0.10.2.0 release.

Thanks.
--Vahid



[jira] [Commented] (KAFKA-3264) Mark the old Scala consumer and related classes as deprecated

2017-01-05 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802393#comment-15802393
 ] 

Vahid Hashemian commented on KAFKA-3264:


That's what I thought. The discussion link you sent does not show all the 
entries for me, which is why I had missed that part. I used an alternative 
discussion link in the KIP that should hopefully work better :) Thanks for 
clarifying.

> Mark the old Scala consumer and related classes as deprecated
> -
>
> Key: KAFKA-3264
> URL: https://issues.apache.org/jira/browse/KAFKA-3264
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Grant Henke
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> Once the new consumer is out of beta, we should consider deprecating the old 
> Scala consumers to encourage use of the new consumer and facilitate the 
> removal of the old consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802394#comment-15802394
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] [~becket_qin] Is there anything else you need? As it's just happened, 
there's more chance of capturing anything extra.

[2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
20 no larger than the last offset appended (50346) to 
/var/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned.
at 
kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
at kafka.log.LogSegment.append(LogSegment.scala:106)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.clean(LogCleaner.scala:363)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2017-01-05 19:37:07,144] INFO [kafka-log-cleaner-thread-0], Stopped  
(kafka.log.LogCleaner)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,9

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802390#comment-15802390
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

log4j log-cleaner.log with TRACE enabled: 
https://issues.apache.org/jira/secure/attachment/12845849/log-cleaner.log.1.zip

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cl

[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Andre Pearce (IG) updated KAFKA-4497:
-
Attachment: log-cleaner.log.1.zip

Log cleaner log for the error on 0.10.1.1 at 2017-01-05 19:37:07,144

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$

[jira] [Commented] (KAFKA-3264) Mark the old Scala consumer and related classes as deprecated

2017-01-05 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802373#comment-15802373
 ] 

Ismael Juma commented on KAFKA-3264:


I think it's fine to do that later. We'll have to decide if it's a requirement 
before we remove the old high level consumer (it should not affect the removal 
of the simple consumer). Deprecation in the meantime seems fine according to 
the discussion in the mailing list (and I agree).

> Mark the old Scala consumer and related classes as deprecated
> -
>
> Key: KAFKA-3264
> URL: https://issues.apache.org/jira/browse/KAFKA-3264
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Grant Henke
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> Once the new consumer is out of beta, we should consider deprecating the old 
> Scala consumers to encourage use of the new consumer and facilitate the 
> removal of the old consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802365#comment-15802365
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

https://issues.apache.org/jira/secure/attachment/12845847/com_ig_trade_v1_position_event--demo--compacted-14.tar.gz

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner

[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Andre Pearce (IG) updated KAFKA-4497:
-
Attachment: com_ig_trade_v1_position_event--demo--compacted-14.tar.gz

Partition/segment logs for the issue on 0.10.1.1 - 2017-01-05 19:37:07,144
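
For anyone inspecting the attached partition/segment files, below is a rough sketch 
that dumps the entries of a .timeindex file. It is not a Kafka tool; it assumes the 
0.10.x on-disk layout of fixed 12-byte entries (an 8-byte timestamp followed by a 
4-byte offset relative to the segment base offset, which is taken from the file name).

{code}
import java.io.RandomAccessFile
import java.nio.channels.FileChannel

// Toy .timeindex dumper: prints (timestamp, absolute offset) per 12-byte slot.
object DumpTimeIndex {
  def main(args: Array[String]): Unit = {
    val path = args(0)              // e.g. 00000000000000049860.timeindex
    val baseOffset = args(1).toLong // base offset parsed from the file name
    val raf = new RandomAccessFile(path, "r")
    try {
      val buf = raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, raf.length())
      var slot = 0
      while (buf.remaining() >= 12) {
        val timestamp = buf.getLong()      // 8-byte timestamp
        val relativeOffset = buf.getInt()  // 4-byte offset relative to the segment base
        println(s"slot $slot: timestamp=$timestamp offset=${baseOffset + relativeOffset}")
        slot += 1
      }
    } finally raf.close()
  }
}
{code}

If the file was copied while the segment was still active, trailing zero entries 
(timestamp 0) are usually just preallocated, unused slots.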

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802326#comment-15802326
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[2017-01-05 19:37:07,137] TRACE Inserting 88493 bytes at offset 49647 at 
position 4393661 with largest timestamp 1480518430821 at offset 49859 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49860 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
07 15:06:26 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 15951 bytes at offset 49863 at 
position 4482154 with largest timestamp 1481123186887 at offset 49971 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49972 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
14 14:22:00 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 7496 bytes at offset 49972 at 
position 4498105 with largest timestamp 1481722619382 at offset 50151 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50155 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
21 15:26:33 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 12632 bytes at offset 50158 at 
position 4505601 with largest timestamp 1482321739128 at offset 50346 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50349 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
28 16:07:21 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,140] TRACE Inserting 11743 bytes at offset 50350 at 
position 4518233 with largest timestamp 1482933300393 at offset -1 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
20 no larger than the last offset appended (50346) to 
/var/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned.
at 
kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
at kafka.log.LogSegment.append(LogSegment.scala:106)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.clean(LogCleaner.scala:363)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
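
For context, a minimal sketch (not the actual Kafka source) of the guard in 
TimeIndex.maybeAppend that produces the exception above. The cleaner ends up handing 
the index an offset of -1 for a segment whose largest timestamp carries no valid 
offset, and -1 is of course smaller than the last offset already appended (50346):

{code}
// Toy approximation of the time index append check; names and details are illustrative.
class InvalidOffsetException(msg: String) extends RuntimeException(msg)

class ToyTimeIndex(file: String) {
  private var lastOffset: Long = -1L
  private var entries: Int = 0

  def maybeAppend(timestamp: Long, offset: Long): Unit = {
    // Offsets appended to the time index must be increasing.
    if (entries != 0 && offset < lastOffset)
      throw new InvalidOffsetException(
        s"Attempt to append an offset ($offset) to slot $entries no larger than " +
          s"the last offset appended ($lastOffset) to $file.")
    lastOffset = offset
    entries += 1
  }
}

object TimeIndexRepro {
  def main(args: Array[String]): Unit = {
    val idx = new ToyTimeIndex(".timeindex.cleaned")
    idx.maybeAppend(1482321739128L, 50346L) // fine: offset is increasing
    idx.maybeAppend(1482933300393L, -1L)    // throws, matching the error logged above
  }
}
{code}

So the interesting question is why the cleaner computed -1 as the offset of the 
segment's largest timestamp in the first place.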

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created comple

Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2017-01-05 Thread Gwen Shapira
+1

On Thu, Jan 5, 2017 at 10:14 AM, Vahid S Hashemian
 wrote:
> I'd like to start another round of voting on KIP-88 after recent changes
> to the KIP (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-88%3A+OffsetFetch+Protocol+Update
>
> ).
>
> The discussion thread can be found here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
>
> The discussion around recent changes to the KIP starts here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg62357.html
>
> Thank you for your feedback.
>
> Regards,
> --Vahid
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802319#comment-15802319
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Perfect, that triggered it to run, and it has now occurred again; just going to 
capture all the bits for you again.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSe

[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802307#comment-15802307
 ] 

Andrew Olson commented on KAFKA-4599:
-

Another similar occurrence recently reported on the mailing list:
http://mail-archives.apache.org/mod_mbox/kafka-users/201612.mbox/%3CBY2PR02MB282F72F26267CAE415F2BA6D69B0%40BY2PR02MB282.namprd02.prod.outlook.com%3E

Could the consumer code possibly be missing logic for handling a rare broker 
response? (For example like the issue fixed by 
https://github.com/apache/kafka/pull/2070)
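
For illustration only (this is not Kafka's Schema implementation), a toy parser 
showing how a size-prefixed array read fails with this kind of message when the 
response buffer is truncated or misaligned:

{code}
import java.nio.ByteBuffer

object TruncatedResponseDemo {
  // Read a 4-byte element count, then that many 4-byte ints.
  def readIntArray(buf: ByteBuffer): Array[Int] = {
    val size = buf.getInt()
    if (size.toLong * 4 > buf.remaining())
      throw new IllegalStateException(
        s"Error reading array of size $size, only ${buf.remaining()} bytes available")
    Array.fill(size)(buf.getInt())
  }

  def main(args: Array[String]): Unit = {
    // A buffer that claims 2774863 elements but only carries 62 bytes of payload.
    val buf = ByteBuffer.allocate(4 + 62)
    buf.putInt(2774863)
    buf.rewind()
    readIntArray(buf) // throws, analogous to the SchemaException reported above
  }
}
{code}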

> KafkaConsumer encounters SchemaException when Kafka broker stopped
> --
>
> Key: KAFKA-4599
> URL: https://issues.apache.org/jira/browse/KAFKA-4599
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Andrew Olson
>
> We recently observed an issue in production that can apparently occur a small 
> percentage of the time when a Kafka broker is stopped. We're using version 
> 0.9.0.1 for all brokers and clients.
> During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
> ran into the following SchemaException within a few seconds of instructing 
> the broker to shutdown.
> {noformat}
> 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
> Error reading field 'responses': Error reading array of size 2774863, only 62 
> bytes available
>   at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {noformat}
> The exception message was slightly different for one consumer,
> {{Error reading field 'responses': Error reading array of size 2774863, only 
> 260 bytes available}}
> The exception was not caught and caused the Storm Executor thread to restart, 
> so it's not clear if it would have been transient or fatal for the 
> KafkaConsumer.
> Here are the initial broker shutdown logs,
> {noformat}
> 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> shutting down
> 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutting down
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Stopped 
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutdown completed
> 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-3-30], Shutting down
> 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> Controlled shutdown succeeded
> 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutting down
> 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutdown completed
> {noformat}
> We've found one very similar reported occurrence,
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4551) StreamsSmokeTest.test_streams intermittent failure

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802299#comment-15802299
 ] 

ASF GitHub Bot commented on KAFKA-4551:
---

Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/2319


> StreamsSmokeTest.test_streams intermittent failure
> --
>
> Key: KAFKA-4551
> URL: https://issues.apache.org/jira/browse/KAFKA-4551
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Damian Guy
>Priority: Blocker
>  Labels: system-test-failure
> Fix For: 0.10.1.2
>
>
> {code}
> test_id:
> kafkatest.tests.streams.streams_smoke_test.StreamsSmokeTest.test_streams
> status: FAIL
> run time:   4 minutes 44.872 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_smoke_test.py",
>  line 78, in test_streams
> node.account.ssh("grep SUCCESS %s" % self.driver.STDOUT_FILE, 
> allow_fail=False)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 253, in ssh
> raise RemoteCommandError(self, cmd, exit_status, stderr.read())
> RemoteCommandError: ubuntu@worker6: Command 'grep SUCCESS 
> /mnt/streams/streams.stdout' returned non-zero exit status 1.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-15--001.1481794587--apache--trunk--7049938/StreamsSmokeTest/test_streams/91.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2319: KAFKA-4551: StreamsSmokeTest.test_streams intermit...

2017-01-05 Thread dguy
Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/2319


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2315: MINOR: Update JavaDoc for KTable helper interfaces

2017-01-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2315


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4551) StreamsSmokeTest.test_streams intermittent failure

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802275#comment-15802275
 ] 

ASF GitHub Bot commented on KAFKA-4551:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2319

KAFKA-4551: StreamsSmokeTest.test_streams intermittent failure

Remove use of TestTimestampExtractor as it causes the logs to roll and 
segments to get deleted.
Remove the wcnt example as it is dependent on the TestTimestampExtractor - 
windowed counting is covered elsewhere.
Change all aggregate operations to use TimeWindow, as the use of UnlimitedWindow 
was causing logs to roll and segments to be deleted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka smoke-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2319


commit 4c84fcf0b63e0f007f6ccf32e11eb4b36ac81480
Author: Damian Guy 
Date:   2017-01-04T17:44:49Z

make smoketest reliable




> StreamsSmokeTest.test_streams intermittent failure
> --
>
> Key: KAFKA-4551
> URL: https://issues.apache.org/jira/browse/KAFKA-4551
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Damian Guy
>Priority: Blocker
>  Labels: system-test-failure
> Fix For: 0.10.1.2
>
>
> {code}
> test_id:
> kafkatest.tests.streams.streams_smoke_test.StreamsSmokeTest.test_streams
> status: FAIL
> run time:   4 minutes 44.872 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_smoke_test.py",
>  line 78, in test_streams
> node.account.ssh("grep SUCCESS %s" % self.driver.STDOUT_FILE, 
> allow_fail=False)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 253, in ssh
> raise RemoteCommandError(self, cmd, exit_status, stderr.read())
> RemoteCommandError: ubuntu@worker6: Command 'grep SUCCESS 
> /mnt/streams/streams.stdout' returned non-zero exit status 1.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-15--001.1481794587--apache--trunk--7049938/StreamsSmokeTest/test_streams/91.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2319: KAFKA-4551: StreamsSmokeTest.test_streams intermit...

2017-01-05 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2319

KAFKA-4551: StreamsSmokeTest.test_streams intermittent failure

Remove use of TestTimestampExtractor as it causes the logs to roll and 
segments to get deleted.
Remove the wcnt example as it is dependent on the TestTimestampExtractor - 
windowed counting is covered elsewhere.
Change all aggregate operations to use TimeWindow, as the use of UnlimitedWindow 
was causing logs to roll and segments to be deleted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka smoke-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2319


commit 4c84fcf0b63e0f007f6ccf32e11eb4b36ac81480
Author: Damian Guy 
Date:   2017-01-04T17:44:49Z

make smoketest reliable




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-105: Addition of Record Level for Sensors

2017-01-05 Thread Jay Kreps
This is great! A couple of quick comments:

   1. It'd be good to make the motivation a bit more clear. I think the
   motivation is "We want to have lots of partition/task/etc metrics, but we're
   concerned about the performance impact, so we want to disable them by
   default." Currently the motivation section is more about the proposed
   change and doesn't really make the why clear.
   2. Do we have a microbenchmark that compares the performance of (1)
   enabled metrics, (2) disabled metrics, and (3) no metrics? This would help
   build the case for needing this extra knob. Obviously if metrics are cheap
   you would always just leave them enabled and not worry about it. I think
   there should be some cost because we are at least taking a lock during the
   recording, but I'm not sure how material that is.
   3. One consideration in how this is exposed: we always found the ability to
   dynamically change the logging level in JMX for log4j pretty useful. I
   think if we want to leave the door open to add this ability to enable
   metrics at runtime, it may have some impact on the decision around how
   metrics are registered/reported.

-Jay
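
For readers following along, a minimal sketch of the per-sensor recording-level gate 
being discussed; the names and types below are illustrative assumptions, not the 
final KIP-105 API:

{code}
// Sketch only: a sensor records metrics only if its level is at or below the configured level.
object RecordLevelSketch {
  sealed abstract class RecordingLevel(val id: Int)
  case object Info  extends RecordingLevel(0)
  case object Debug extends RecordingLevel(1)

  final case class Sensor(name: String, level: RecordingLevel) {
    def shouldRecord(configured: RecordingLevel): Boolean = level.id <= configured.id

    def record(value: Double, configured: RecordingLevel): Unit =
      if (shouldRecord(configured)) {
        // update the underlying metrics; skipped entirely when filtered out by level
        println(s"$name recorded $value")
      }
  }

  def main(args: Array[String]): Unit = {
    val configured = Info // e.g. metrics.record.level=INFO
    Sensor("commit-latency", Info).record(3.2, configured)      // recorded
    Sensor("task-poll-latency", Debug).record(1.1, configured)  // skipped
  }
}
{code}

Whether filtered sensors should also skip registration entirely (so reporters never 
see them) is exactly the trade-off discussed in the thread below.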

On Thu, Jan 5, 2017 at 9:59 AM, Guozhang Wang  wrote:

> I thought about "not registering at all" and left a comment on the PR
> regarding this idea as well. My concern is that it may not be very
> straightforward to implement via the MetricsReporter interface, though; if
> Eno and Aarti have a good approach to tackle it, I would love it.
>
>
> Guozhang
>
> On Thu, Jan 5, 2017 at 5:34 AM, Eno Thereska 
> wrote:
>
> > Updated KIP for 1. Waiting to hear from Guozhang on 2 and then we can
> > proceed.
> >
> > Thanks
> > Eno
> > > On 5 Jan 2017, at 12:27, Ismael Juma  wrote:
> > >
> > > Thanks Eno. It would be good to update the KIP as well with regards to
> 1.
> > >
> > > About 2, I am not sure how adding a field could break existing tools.
> > > Having said that, your suggestion not to register sensors at all if
> their
> > > record level is below what is configured works for me.
> > >
> > > Ismael
> > >
> > > On Thu, Jan 5, 2017 at 12:07 PM, Eno Thereska 
> > > wrote:
> > >
> > >> Thanks Ismael. Already addressed 1. in the PR.
> > >>
> > >> As for 2. I'd prefer not adding extra info to the metrics reporters at
> > >> this point, since it might break existing tools out there (e.g., if we
> > add
> > >> things like configured level). Existing tools might be expecting the
> > info
> > >> to be reported in a particular format.
> > >>
> > >> If the current way is confusing, I think the next best alternative is
> to
> > >> not register sensors at all if their record level is below what is
> > >> configured. That way they don't show up at all. This will require some
> > more
> > >> code in Sensors.java to check at every step, but I think it's clean
> from
> > >> the user's point of view.
> > >>
> > >> Eno
> > >>
> > >>
> > >>> On 5 Jan 2017, at 11:23, Ismael Juma  wrote:
> > >>>
> > >>> Thanks for the KIP, it seems like a good improvement. A couple of
> > >> comments:
> > >>>
> > >>> 1. As Jun asked in the PR, do we need a broker config as well? The
> > broker
> > >>> uses Kafka Metrics for some metrics, but we probably don't have any
> > debug
> > >>> sensors at the broker yet. Either way, it would be good to describe
> the
> > >>> thinking around this.
> > >>>
> > >>> 2. The behaviour with regards to the metrics reporter could be
> > >> surprising.
> > >>> It would be good to elaborate a little more on this aspect. For
> > example,
> > >>> maybe we want to expose the sensor level and the current configured
> > level
> > >>> to metric reporters. That could then be used to build the debug/info
> > >>> dashboard that Eno mentioned (while making it clear that some metrics
> > >>> exist, but are not currently being recorded).
> > >>>
> > >>> Ismael
> > >>>
> > >>> On Thu, Jan 5, 2017 at 10:37 AM, Eno Thereska <
> eno.there...@gmail.com>
> > >>> wrote:
> > >>>
> >  Correct on 2. Guozhang: the sensor will be registered and polled by
> a
> >  reporter, but the metrics associated with it will not be updated.
> > 
> >  That would allow a user to have, for example, a debug dashboard and
> an
> >  info dashboard.
> > 
> >  Updated KIP to make this clear.
> > 
> >  Thanks
> >  Eno
> > 
> > 
> > > On 4 Jan 2017, at 18:00, Aarti Gupta 
> wrote:
> > >
> > > Thanks for the review, Guozhang,
> > >
> > > Addressed 2 out of the three comments,
> > >
> > > 1. Fixed and updated KIP (swapped code variable name
> > > METRICS_RECORD_LEVEL_CONFIG with config name metrics.record.level)
> > >
> > > 3. >>Could you elaborate on the "shouldRecord()" function, e.g.
> which
> >  class
> > > it will be added to? Does it contain any parameters?
> > >
> > > Added more details on shouldRecord()  on the KIP
> > >
> > > In Sensor.java the shouldRecord() method is used to compare the
> value
> > >> of
> > >

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 7:14 PM:
--

[~junrao], we took it from the website, via the links. (Edited to remove the exact 
mirror link, as I'm actually not 100% sure we downloaded it from the website; I will 
confirm exactly which mirror tomorrow if it makes any difference.)

Will look to reduce that down then.


was (Author: michael.andre.pearce):
[~junrao] , we took it from the website, via the links, downloading from the 
first mirror site, 
http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz

Will look to reduce that down then.


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] , we took it from the website, via the links, downloading from the 
first mirror site, 
http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz

Will look to reduce that down then.


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-05 Thread Ben Stopford
Hi Jun

Thanks for raising these points. Thorough as ever!

1) Changes made as requested.
2) Done.
3) My plan for handling returning leaders is simply to force the Leader
Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120 as
part of this KIP. It is really a separate issue with wider implications.
I'd be happy to add KAFKA-1120 into the release though if we have time.
4) Agreed. Not sure exactly how that's going to play out, but I think we're
on the same page.

Please could

Cheers
B

On Thu, Jan 5, 2017 at 12:50 AM Jun Rao  wrote:

> Hi, Ben,
>
> Thanks for the proposal. Looks good overall. A few comments below.
>
> 1. For LeaderEpochRequest, we need to include topic right? We probably want
> to follow other requests by nesting partition inside topic? For
> LeaderEpochResponse,
> do we need to return leader_epoch? I was thinking that we could just return
> an end_offset, which is the next offset of the last message in the
> requested leader generation. Finally, would OffsetForLeaderEpochRequest be
> a better name?
>
> 2. We should bump up both the produce request and the fetch request
> protocol version since both include the message set.
>
> 3. Extending LeaderEpoch to include Returning Leaders: To support this, do
> you plan to use the approach that stores  CZXID in the broker registration
> and including the CZXID of the leader in /brokers/topics/[topic]/
> partitions/[partitionId]/state in ZK?
>
> 4. Since there are a few other KIPs involving message format too, it would
> be useful to consider if we could combine the message format changes in the
> same release.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford  wrote:
>
> > Hi All
> >
> > We’re having some problems with this thread being subsumed by the
> > [Discuss] thread. Hopefully this one will appear distinct. If you see
> more
> > than one, please use this one.
> >
> > KIP-101 should now be ready for a vote. As a reminder the KIP proposes a
> > change to the replication protocol to remove the potential for replicas
> to
> > diverge.
> >
> > The KIP can be found here:  https://cwiki.apache.org/confl
> > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> > than+High+Watermark+for+Truncation>
> >
> > Please let us know your vote.
> >
> > B
> >
> >
> >
> >
> >
>
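To make the end_offset semantics in point 1 concrete, here is a toy sketch of an
epoch-to-start-offset cache in which the end offset for a requested epoch is the
start offset of the next known epoch, or the log end offset if the requested epoch
is the latest. This only illustrates the idea; it is not the broker implementation.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LeaderEpochCacheSketch {

    // epoch -> first offset written in that epoch
    private final NavigableMap<Integer, Long> epochStartOffsets = new TreeMap<>();
    private long logEndOffset = 0L;

    void assign(int epoch, long startOffset) {
        epochStartOffsets.put(epoch, startOffset);
    }

    void onAppend(long nextOffset) {
        logEndOffset = nextOffset;
    }

    // End offset for an epoch = start offset of the next known epoch,
    // or the log end offset if the requested epoch is the latest.
    long endOffsetFor(int requestedEpoch) {
        Map.Entry<Integer, Long> next = epochStartOffsets.higherEntry(requestedEpoch);
        return next != null ? next.getValue() : logEndOffset;
    }

    public static void main(String[] args) {
        LeaderEpochCacheSketch cache = new LeaderEpochCacheSketch();
        cache.assign(0, 0L);    // epoch 0 starts at offset 0
        cache.assign(1, 100L);  // epoch 1 starts at offset 100
        cache.onAppend(150L);   // log end offset is now 150
        System.out.println(cache.endOffsetFor(0)); // 100
        System.out.println(cache.endOffsetFor(1)); // 150
    }
}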


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-01-05 Thread Jason Gustafson
Hi Jun,

Let me start picking off some of your questions (we're giving the shadow
log suggestion a bit more thought before responding).

100. Security: It seems that if an app is mistakenly configured with the
> appId of an existing producer, it can take over the pid and prevent the
> existing app from publishing. So, I am wondering if it makes sense to add
> ACLs at the TransactionResource level just like we do for
> ConsumerGroupResource. So, a user can only do transactions under a
> particular appId if he/she has the write permission to the
> TransactionResource
> associated with the appId.


I think this makes sense in general. There are a couple points worth
mentioning:

1. Because we only use the AppID in requests to the transaction
coordinator, that's the only point at which we can do authorization in the
current proposal. It is possible for a malicious producer to hijack another
producer's PID and use it to write data. It wouldn't be able to commit or
abort transactions, but it could effectively fence the legitimate producer
from a partition by forcing an epoch bump. We could add the AppID to the
ProduceRequest schema, but we would still need to protect its binding to
the PID somehow. This is one argument in favor of dropping the PID and
using the AppID in the log message format. However, there are still ways in
the current proposal to give better protection if we added the AppID
authorization at the transaction coordinator as you suggest. Note that a
malicious producer would have to be authorized to write to the same topics
used by the transactional producer. So one way to protect those topics is
to only allow write access by the authorized transactional producers. The
transactional producers could still interfere with each other, but perhaps
that's a smaller concern (it's similar in effect to the limitations of
consumer group authorization).

2. It's a bit unfortunate that we don't have something like the consumer's
groupId to use for authorization. The AppID is really more of an instance
ID (we were reluctant to introduce any formal notion of a producer group).
I guess distributed applications could use a common prefix and a wildcard
authorization policy. I don't think we currently support general wildcards,
but that might be helpful for this use case.

-Jason

On Wed, Jan 4, 2017 at 12:55 PM, Jay Kreps  wrote:

> Hey Jun,
>
> We had a proposal like this previously. The suppression scheme was slightly
> different. Rather than attempting to recopy or swap, there was instead
> an aborted offset index maintained along with each segment containing a
> sequential list of aborted offsets. The filtering would happen at fetch
> time and would just ensure that fetch requests never span an aborted
> transaction. That is, if you did a fetch request which would include
> offsets 7,8,9,10,11, but offsets 7 and 10 appear in the index of aborted
> transactions, then the fetch would return 8,9 only even if there was more
> space in the fetch response. This leads to minimal overhead, but
> potentially would give back smaller fetch responses if transactions are
> being continually aborted.
>
> One downside to this approach (both your proposal and the variation that I
> just described) is that it does not allow the possibility of consuming in
> transaction commit order. Consuming in transaction commit order means that
> the only delay you incur is the delay in committing a given transaction.
> Consuming in offset order means you cannot consume a given offset until ALL
> previously begun transactions are committed or aborted. KIP-98 doesn't
> propose making this change now, but since it is consumer side it is
> possible.
>
> -Jay
>
> On Tue, Jan 3, 2017 at 7:50 AM, Jun Rao  wrote:
>
> > Just to follow up on Radai's idea of pushing the buffering logic to the
> > broker. It may be possible to do this efficiently if we assume aborted
> > transactions are rare. The following is a draft proposal. For each
> > partition, the broker maintains the last stable offset (LSO) as described
> > in the document, and only exposes messages up to this point if the reader
> > is in the read-committed mode. When a new stable offset (NSO) is
> > determined, if there is no aborted message in this window, the broker
> > simply advances the LSO to the NSO. If there is at least one aborted
> > message, the broker first replaces the current log segment with new log
> > segments excluding the aborted messages and then advances the LSO. To
> make
> > the replacement efficient, we can replace the current log segment with 3
> > new segments: (1) a new "shadow" log segment that simply references the
> > portion of the current log segment from the beginning to the LSO, (2) a
> log
> > segment created by copying only committed messages between the LSO and
> the
> > NSO, (3) a new "shadow" log segment that references the portion of the
> > current log segment from the NSO (open ended). Note that only (2)
> involves
> > real data copying. If 
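As a toy illustration of the fetch-time filtering described above, treating offsets
as plain longs and the per-segment aborted-offset index as a sorted set (a sketch of
the idea only, not Kafka code):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class AbortedFetchFilterSketch {

    // Trim a candidate fetch range so it neither returns aborted offsets
    // nor spans past a later aborted offset.
    static List<Long> filterFetch(long fetchStart, long fetchEndInclusive,
                                  NavigableSet<Long> abortedOffsets) {
        List<Long> result = new ArrayList<>();
        for (long off = fetchStart; off <= fetchEndInclusive; off++) {
            if (abortedOffsets.contains(off)) {
                if (result.isEmpty())
                    continue; // skip leading aborted offsets
                break;        // stop before spanning a later aborted transaction
            }
            result.add(off);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<Long> aborted = new TreeSet<>(Arrays.asList(7L, 10L));
        // Matches the example above: a fetch for offsets 7..11 returns only 8 and 9.
        System.out.println(filterFetch(7L, 11L, aborted)); // [8, 9]
    }
}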

[jira] [Created] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-4599:
---

 Summary: KafkaConsumer encounters SchemaException when Kafka 
broker stopped
 Key: KAFKA-4599
 URL: https://issues.apache.org/jira/browse/KAFKA-4599
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Andrew Olson


We recently observed an issue in production that can apparently occur a small 
percentage of the time when a Kafka broker is stopped. We're using version 
0.9.0.1 for all brokers and clients.

During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
ran into the following SchemaException within a few seconds of instructing the 
broker to shut down.

{noformat}
2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
Error reading field 'responses': Error reading array of size 2774863, only 62 
bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
{noformat}

The exception message was slightly different for one consumer,
{{Error reading field 'responses': Error reading array of size 2774863, only 
260 bytes available}}

The exception was not caught and caused the Storm Executor thread to restart, 
so it's not clear if it would have been transient or fatal for the 
KafkaConsumer.

Here are the initial broker shutdown logs,

{noformat}
2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
shutting down
2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Shutting down
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Stopped 
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Shutdown completed
2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-3-30], Shutting down
2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
Controlled shutdown succeeded
2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
Broker 4], Shutting down
2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
Broker 4], Shutdown completed
{noformat}

We've found one very similar reported occurrence,
http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E
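As a defensive-coding note while the root cause is investigated (not a fix for the bug
itself): the exception is unchecked and, as the stack trace shows, surfaces from poll(),
so callers can catch it around the poll loop and decide whether to retry or rebuild the
consumer instead of letting the processing thread die. A rough sketch, with placeholder
broker/topic/group names and an application-specific recovery policy:

{noformat}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

public class ResilientPollSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "example-group");         // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic")); // placeholder

        while (true) {
            try {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // process record ...
                }
            } catch (KafkaException e) {
                // SchemaException extends KafkaException in the clients library.
                // Whether such an error is transient is application-specific; here we
                // recreate the consumer rather than letting the processing thread die.
                consumer.close();
                consumer = new KafkaConsumer<>(props);
                consumer.subscribe(Collections.singletonList("example-topic"));
            }
        }
    }
}
{noformat}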





[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802131#comment-15802131
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], thanks for the update. Did you get the 0.10.1.1 binary 
from the Apache Kafka website or from maven? Also, if you reduce 
min.cleanable.dirty.ratio (this can be done dynamically on a topic), it will force 
the log cleaner to kick in sooner.
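
For reference, one way to lower the ratio dynamically is the kafka-configs.sh tool
shipped with 0.10.x; the ZooKeeper address, topic name and ratio below are placeholders
to adapt to your cluster:

{noformat}
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name xdc_listing-status-v2 \
  --add-config min.cleanable.dirty.ratio=0.01
{noformat}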


[jira] [Commented] (KAFKA-4598) Create new SourceTask commit callback method that takes offsets param

2017-01-05 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802110#comment-15802110
 ] 

Shikhar Bhushan commented on KAFKA-4598:


In the meantime the workaround is to use 
{{SourceTask.commitRecord(SourceRecord)}} to keep track of committable offset 
state.
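
A rough sketch of that workaround in a hypothetical source task: remember the source
offset of the last record acknowledged via commitRecord(), and treat it as the
committable position when commit() fires. The topic, schema and offset keys below are
placeholders, not part of any real connector.

{noformat}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class OffsetTrackingSourceTask extends SourceTask {

    private final AtomicReference<Map<String, ?>> lastAckedOffset = new AtomicReference<>();
    private long nextValue = 0L;

    @Override
    public String version() {
        return "0.0.1-sketch";
    }

    @Override
    public void start(Map<String, String> props) {
        // read config, open the source system, etc.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        long value = nextValue++;
        Map<String, ?> sourcePartition = Collections.singletonMap("source", "example");
        Map<String, ?> sourceOffset = Collections.singletonMap("position", value);
        SourceRecord record = new SourceRecord(sourcePartition, sourceOffset,
                "example-topic", Schema.INT64_SCHEMA, value);
        return Collections.singletonList(record);
    }

    @Override
    public void commitRecord(SourceRecord record) {
        // Called once the record has been written to Kafka, so its source
        // offset is safe to treat as committable.
        lastAckedOffset.set(record.sourceOffset());
    }

    @Override
    public void commit() {
        Map<String, ?> committable = lastAckedOffset.get();
        // Acknowledge up to 'committable' in the source system, if it needs an explicit ack.
    }

    @Override
    public void stop() {
        // close source system resources
    }
}
{noformat}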






[jira] [Created] (KAFKA-4598) Create new SourceTask commit callback method that takes offsets param

2017-01-05 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4598:
--

 Summary: Create new SourceTask commit callback method that takes 
offsets param
 Key: KAFKA-4598
 URL: https://issues.apache.org/jira/browse/KAFKA-4598
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


{{SourceTask.commit()}} can be invoked concurrently with a 
{{SourceTask.poll()}} in progress. Thus it is currently not possible to know 
what offset state the commit call corresponds to.





[VOTE] KIP-88: OffsetFetch Protocol Update

2017-01-05 Thread Vahid S Hashemian
I'd like to start another round of voting on KIP-88 after recent changes 
to the KIP (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-88%3A+OffsetFetch+Protocol+Update
).

The discussion thread can be found here: 
https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html

The discussion around recent changes to the KIP starts here:
https://www.mail-archive.com/dev@kafka.apache.org/msg62357.html

Thank you for your feedback.
 
Regards,
--Vahid



Re: KafkaConnect SinkTask::put

2017-01-05 Thread Shikhar Bhushan
Hi David,

You can override the underlying consumer's `max.poll.records` setting for
this. E.g.
consumer.max.poll.records=500

Best,

Shikhar

On Thu, Jan 5, 2017 at 3:59 AM  wrote:

> Is there any way of limiting the number of events that are passed into the
> call to the put(Collection) method?
>
> I'm writing a set of events to Kafka via a source Connector/Task and
> reading these from a sink Connector/Task.
> If I generate on the order of 10k events, the number of SinkRecords passed
> to the put method starts off very low but quickly rises in large increments
> such that 9k events are passed to a later invocation of the put method.
>
> Furthermore, processing a large number of events in a single call (I'm
> writing to Elasticsearch) appears to cause the source task poll() method to
> time out, raising a CommitFailedException which, incidentally, I can't see
> how to catch.
>
> Thanks for any help you can provide,
> David
>

