Build failed in Jenkins: kafka-trunk-jdk8 #1122

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-4485; Follower should be in the isr if its FetchRequest has

--
[...truncated 17555 lines...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] FAILED
java.lang.AssertionError: Condition not met within timeout 6. Did not 
receive 1 number of records
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:259)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinValuesRecordsReceived(IntegrationTestUtils.java:251)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.waitUntilAtLeastNumRecordProcessed(QueryableStateIntegrationTest.java:669)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:350)

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest 

Question about KIP-73 pr merge

2016-12-20 Thread Json Tu
Hi all,
We are currently running Kafka 0.9.0 in our production environment. After adding one 
broker to the cluster and executing a partition reassignment across all brokers, we 
found that our network and disk I/O became very high.
I know KIP-73 addresses this problem, but I wonder whether I can merge it into my 
Kafka 0.9.0 deployment directly, because I have two concerns:
1. https://github.com/apache/kafka/pull/1776 was created against trunk, 
and Kafka 0.9.0 has diverged from trunk.
2. Because of that divergence, there may be follow-up bug fixes associated 
with this PR; how can I avoid that risk?

Any suggestion is appreciated. Thanks in advance.
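
(For reference: on broker versions that already ship KIP-73, that is 0.10.1.0 and later, a reassignment can be throttled roughly as follows. The rate and the ZooKeeper address below are placeholders, not values taken from this thread.)

{code}
# Illustrative only; requires brokers on 0.10.1.0 or later.
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassignment.json \
  --execute --throttle 50000000

# Once the reassignment has finished, --verify removes the throttle configs.
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassignment.json \
  --verify
{code}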

Jenkins build is back to normal : kafka-trunk-jdk7 #1775

2016-12-20 Thread Apache Jenkins Server
See 



Re: [VOTE] 0.10.1.1 RC1

2016-12-20 Thread Guozhang Wang
+1 from myself as well.

This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes.

+1 votes

PMC Members:
* Gwen Shapira
* Jun Rao
* Guozhang Wang

Community:
* Vahid Hashemian
* Swen Moczarski

0 votes
* No votes

-1 votes
* No votes

Vote thread:
http://markmail.org/message/urbrfmaxofmbpgl2

I'll continue with the release process and the release announcement will
follow later.


Guozhang


On Tue, Dec 20, 2016 at 7:58 AM, Moczarski, Swen <
smoczar...@ebay-kleinanzeigen.de> wrote:
>
> +1 (non-binding)
>
> Thanks for preparing the release. We installed it on our test system; at
> first glance it looks good.
>
> Am 12/15/16, 10:29 PM schrieb "Guozhang Wang" :
>
> Hello Kafka users, developers and client-developers,
>
> This is the second, and hopefully the last, candidate for the release of
> Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
> includes fixes and improvements from 30 JIRAs. See the release notes for
> more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 20 December, 8pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> NOTE: the artifacts include ones built with Scala 2.12.1 and Java 8,
> which are treated as pre-alpha artifacts for the Scala community to try
> and test out:
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.12/0.10.1.1/
>
> We will formally add Scala 2.12 support in future minor releases.
>
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc1 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c3638376708ee6c02dfe4e57747acae0126fa6e7
>
>
> Thanks,
> Guozhang
>
> --
> -- Guozhang
>
>



--
-- Guozhang


[jira] [Commented] (KAFKA-4485) Follower should be in the isr if its FetchRequest has fetched up to the logEndOffset of leader

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766182#comment-15766182
 ] 

ASF GitHub Bot commented on KAFKA-4485:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2208


> Follower should be in the isr if its FetchRequest has fetched up to the 
> logEndOffset of leader
> --
>
> Key: KAFKA-4485
> URL: https://issues.apache.org/jira/browse/KAFKA-4485
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.2.0
>
>
> As of the current implementation, we will exclude a follower from the ISR if the 
> begin offset of the FetchRequest from this follower has been smaller than the 
> logEndOffset of the leader for more than replicaLagTimeMaxMs. Also, we will add a 
> follower to the ISR if the begin offset of the FetchRequest from this follower is 
> equal to or larger than the high watermark of this partition.
> This is problematic for the following reasons:
> 1) The criteria for ISR membership are inconsistent between maybeExpandIsr() and 
> maybeShrinkIsr(). Therefore a follower may be repeatedly removed from and added to 
> the ISR (e.g. in the scenario described below).
> 2) A follower may be removed from the ISR even if its fetch rate can keep up 
> with the produce rate. Suppose a producer keeps producing a lot of small requests 
> at a high request rate but a low byte rate (e.g. many mirror makers), and the 
> follower is always able to read all the available data at the time the leader 
> receives it. However, the begin offset of the fetch request will always be 
> smaller than the logEndOffset of the leader. Thus the follower will be removed from 
> the ISR after replicaLagTimeMaxMs.
> In the following we describe the solution to this problem.
> Terminology:
> - Definition of replica lag: we say a replica lags behind the leader by X ms if 
> its current log end offset is equivalent to the log end offset of the leader X ms 
> ago.
> - Definition of pseudo-ISR set: pseudo-ISR set of a partition = { replica | 
> replica belongs to the given partition AND replica's lag <= 
> replicaLagTimeMaxMs}
> - Definition of high-watermark of a partition: high-watermark of a partition 
> is the max(current high-watermark of the partition, min(offset of replicas in 
> the pseudo-ISR set of this partition))
> - Definition of ISR set: ISR set of a partition = {replica | replica is in the 
> pseudo-ISR set of the given partition AND log end offset of replica >= 
> high-watermark of the given partition}
> Guarantees:
> 1) If a follower is close enough to the leader, in the sense that its replica 
> lag <= replicaLagTimeMaxMs, then this follower will be in the pseudo-ISR set. 
> Thus the high-watermark will stop increasing until this follower's log end 
> offset >= high-watermark, at which moment this follower will be added to the 
> ISR set. This allows us to solve the 2nd problem described above.
> 2) If a follower lags behind the leader by more than X ms, it will be removed 
> from the ISR set.
> 3) The high watermark of a partition will never decrease.
> 4) For any replica in the ISR set, its log end offset >= high-watermark. 
> Implementation:
> 1) For each follower, the leader keeps track of the time of the last fetch 
> request from this follower. Let's call it lastFetchTime. In addition, the 
> leader also maintains the log end offset of the leader at the lastFetchTime 
> for each follower. Let's call it lastFetchLeaderLEO. Both variables are 
> updated after the leader has processed a FetchRequest from the follower.
> 2) When the leader receives a FetchRequest from a follower, if the begin offset 
> of the FetchRequest >= the current leader's LEO, the follower's lastCatchUpTimeMs 
> will be set to the current system time. Otherwise, if the begin offset of the 
> FetchRequest >= lastFetchLeaderLEO, the follower's lastCatchUpTimeMs will be set 
> to lastFetchTime. Replica's lag = current system time - lastCatchUpTimeMs.
> 3) The leader can update the pseudo-ISR set, high-watermark and ISR set of the 
> partition based on the lag of the replicas of this partition, according to the 
> definitions described above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
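
(To make the bookkeeping in implementation steps 1) and 2) above concrete: a minimal sketch, with illustrative class and method names rather than the actual broker code.)

{code}
// Minimal sketch of the per-follower bookkeeping described in steps 1) and 2).
// Class and method names are illustrative, not the actual broker code.
public class FollowerLagTracker {
    private long lastFetchTimeMs = -1L;     // time of this follower's previous fetch
    private long lastFetchLeaderLEO = -1L;  // leader's log end offset at that fetch
    private long lastCatchUpTimeMs = -1L;   // last time the follower was known to be caught up

    // Called by the leader after it has processed a FetchRequest from this follower.
    public synchronized void onFetchRequest(long fetchOffset, long leaderLEO, long nowMs) {
        if (fetchOffset >= leaderLEO) {
            // The follower asked for everything the leader has: caught up right now.
            lastCatchUpTimeMs = nowMs;
        } else if (fetchOffset >= lastFetchLeaderLEO) {
            // The follower has read past the leader's LEO as of the previous fetch,
            // so it was caught up at the time of that previous fetch.
            lastCatchUpTimeMs = lastFetchTimeMs;
        }
        lastFetchTimeMs = nowMs;
        lastFetchLeaderLEO = leaderLEO;
    }

    // Replica lag as defined above: current time minus the last catch-up time.
    public synchronized long lagMs(long nowMs) {
        return lastCatchUpTimeMs < 0 ? Long.MAX_VALUE : nowMs - lastCatchUpTimeMs;
    }
}
{code}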


[jira] [Resolved] (KAFKA-4485) Follower should be in the isr if its FetchRequest has fetched up to the logEndOffset of leader

2016-12-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-4485.

Resolution: Fixed

Issue resolved by pull request 2208
[https://github.com/apache/kafka/pull/2208]

> Follower should be in the isr if its FetchRequest has fetched up to the 
> logEndOffset of leader
> --
>
> Key: KAFKA-4485
> URL: https://issues.apache.org/jira/browse/KAFKA-4485
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.2.0
>
>
> As of the current implementation, we will exclude a follower from the ISR if the 
> begin offset of the FetchRequest from this follower has been smaller than the 
> logEndOffset of the leader for more than replicaLagTimeMaxMs. Also, we will add a 
> follower to the ISR if the begin offset of the FetchRequest from this follower is 
> equal to or larger than the high watermark of this partition.
> This is problematic for the following reasons:
> 1) The criteria for ISR membership are inconsistent between maybeExpandIsr() and 
> maybeShrinkIsr(). Therefore a follower may be repeatedly removed from and added to 
> the ISR (e.g. in the scenario described below).
> 2) A follower may be removed from the ISR even if its fetch rate can keep up 
> with the produce rate. Suppose a producer keeps producing a lot of small requests 
> at a high request rate but a low byte rate (e.g. many mirror makers), and the 
> follower is always able to read all the available data at the time the leader 
> receives it. However, the begin offset of the fetch request will always be 
> smaller than the logEndOffset of the leader. Thus the follower will be removed from 
> the ISR after replicaLagTimeMaxMs.
> In the following we describe the solution to this problem.
> Terminology:
> - Definition of replica lag: we say a replica lags behind the leader by X ms if 
> its current log end offset is equivalent to the log end offset of the leader X ms 
> ago.
> - Definition of pseudo-ISR set: pseudo-ISR set of a partition = { replica | 
> replica belongs to the given partition AND replica's lag <= 
> replicaLagTimeMaxMs}
> - Definition of high-watermark of a partition: high-watermark of a partition 
> is the max(current high-watermark of the partition, min(offset of replicas in 
> the pseudo-ISR set of this partition))
> - Definition of ISR set: ISR set of a partition = {replica | replica is in the 
> pseudo-ISR set of the given partition AND log end offset of replica >= 
> high-watermark of the given partition}
> Guarantees:
> 1) If a follower is close enough to the leader, in the sense that its replica 
> lag <= replicaLagTimeMaxMs, then this follower will be in the pseudo-ISR set. 
> Thus the high-watermark will stop increasing until this follower's log end 
> offset >= high-watermark, at which moment this follower will be added to the 
> ISR set. This allows us to solve the 2nd problem described above.
> 2) If a follower lags behind the leader by more than X ms, it will be removed 
> from the ISR set.
> 3) The high watermark of a partition will never decrease.
> 4) For any replica in the ISR set, its log end offset >= high-watermark. 
> Implementation:
> 1) For each follower, the leader keeps track of the time of the last fetch 
> request from this follower. Let's call it lastFetchTime. In addition, the 
> leader also maintains the log end offset of the leader at the lastFetchTime 
> for each follower. Let's call it lastFetchLeaderLEO. Both variables are 
> updated after the leader has processed a FetchRequest from the follower.
> 2) When the leader receives a FetchRequest from a follower, if the begin offset 
> of the FetchRequest >= the current leader's LEO, the follower's lastCatchUpTimeMs 
> will be set to the current system time. Otherwise, if the begin offset of the 
> FetchRequest >= lastFetchLeaderLEO, the follower's lastCatchUpTimeMs will be set 
> to lastFetchTime. Replica's lag = current system time - lastCatchUpTimeMs.
> 3) The leader can update the pseudo-ISR set, high-watermark and ISR set of the 
> partition based on the lag of the replicas of this partition, according to the 
> definitions described above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2208: KAFKA-4485; Follower should be in the isr if its F...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2208


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-trunk-jdk8 #1121

2016-12-20 Thread Apache Jenkins Server
See 



Build failed in Jenkins: kafka-trunk-jdk7 #1774

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[jason] MINOR: Replace TopicAndPartition with TopicPartition in `Log` and

[wangguoz] MINOR: Clarify log deletion configuration options in 
server.properties

--
[...truncated 7935 lines...]

kafka.server.KafkaConfigTest > testVersionConfiguration STARTED

kafka.server.KafkaConfigTest > testVersionConfiguration PASSED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol PASSED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent STARTED

kafka.server.KafkaMetricReporterClusterIdTest > testClusterIdPresent PASSED

kafka.server.DynamicConfigTest > shouldFailFollowerConfigsWithInvalidValues 
STARTED

kafka.server.DynamicConfigTest > shouldFailFollowerConfigsWithInvalidValues 
PASSED

kafka.server.DynamicConfigTest > shouldFailWhenChangingUserUnknownConfig STARTED

kafka.server.DynamicConfigTest > shouldFailWhenChangingUserUnknownConfig PASSED

kafka.server.DynamicConfigTest > shouldFailLeaderConfigsWithInvalidValues 
STARTED

kafka.server.DynamicConfigTest > shouldFailLeaderConfigsWithInvalidValues PASSED

kafka.server.DynamicConfigTest > shouldFailWhenChangingClientIdUnknownConfig 
STARTED

kafka.server.DynamicConfigTest > shouldFailWhenChangingClientIdUnknownConfig 
PASSED

kafka.server.DynamicConfigTest > shouldFailWhenChangingBrokerUnknownConfig 
STARTED

kafka.server.DynamicConfigTest > shouldFailWhenChangingBrokerUnknownConfig 
PASSED

kafka.server.DynamicConfigChangeTest > testProcessNotification STARTED

kafka.server.DynamicConfigChangeTest > testProcessNotification PASSED

kafka.server.DynamicConfigChangeTest > 
shouldParseWildcardReplicationQuotaProperties STARTED

kafka.server.DynamicConfigChangeTest > 
shouldParseWildcardReplicationQuotaProperties PASSED

kafka.server.DynamicConfigChangeTest > testDefaultClientIdQuotaConfigChange 
STARTED

kafka.server.DynamicConfigChangeTest > testDefaultClientIdQuotaConfigChange 
PASSED

kafka.server.DynamicConfigChangeTest > testQuotaInitialization STARTED

kafka.server.DynamicConfigChangeTest > testQuotaInitialization PASSED

kafka.server.DynamicConfigChangeTest > testUserQuotaConfigChange STARTED

kafka.server.DynamicConfigChangeTest > testUserQuotaConfigChange PASSED

kafka.server.DynamicConfigChangeTest > testClientIdQuotaConfigChange STARTED

kafka.server.DynamicConfigChangeTest > testClientIdQuotaConfigChange PASSED

kafka.server.DynamicConfigChangeTest > testUserClientIdQuotaChange STARTED

kafka.server.DynamicConfigChangeTest > testUserClientIdQuotaChange PASSED

kafka.server.DynamicConfigChangeTest > shouldParseReplicationQuotaProperties 
STARTED

kafka.server.DynamicConfigChangeTest > shouldParseReplicationQuotaProperties 
PASSED

kafka.server.DynamicConfigChangeTest > 
shouldParseRegardlessOfWhitespaceAroundValues STARTED

kafka.server.DynamicConfigChangeTest > 
shouldParseRegardlessOfWhitespaceAroundValues PASSED

kafka.server.DynamicConfigChangeTest > testDefaultUserQuotaConfigChange STARTED

kafka.server.DynamicConfigChangeTest > testDefaultUserQuotaConfigChange PASSED

kafka.server.DynamicConfigChangeTest > shouldParseReplicationQuotaReset STARTED

kafka.server.DynamicConfigChangeTest > shouldParseReplicationQuotaReset PASSED

kafka.server.DynamicConfigChangeTest > testDefaultUserClientIdQuotaConfigChange 
STARTED

kafka.server.DynamicConfigChangeTest > testDefaultUserClientIdQuotaConfigChange 
PASSED

kafka.server.DynamicConfigChangeTest > testConfigChangeOnNonExistingTopic 
STARTED

kafka.server.DynamicConfigChangeTest > testConfigChangeOnNonExistingTopic PASSED

kafka.server.DynamicConfigChangeTest > testConfigChange STARTED

kafka.server.DynamicConfigChangeTest > testConfigChange PASSED

kafka.server.ServerGenerateBrokerIdTest > testGetSequenceIdMethod STARTED

kafka.server.ServerGenerateBrokerIdTest > testGetSequenceIdMethod PASSED

kafka.server.ServerGenerateBrokerIdTest > testBrokerMetadataOnIdCollision 
STARTED

kafka.server.ServerGenerateBrokerIdTest > testBrokerMetadataOnIdCollision PASSED

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps PASSED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
STARTED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
PASSED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps 

Re: Kafka controlled shutdown hangs when there are large number of topics in the cluster

2016-12-20 Thread James Cheng

> On Dec 19, 2016, at 10:13 AM, Gwen Shapira  wrote:
> 
> Can you try setting auto.leader.rebalance.enable=false in your
> configuration (for all brokers) and see if it solves this problem?
> We've had some reports regarding this feature interfering with
> controlled shutdown.
> 

Gwen, are there any JIRAs about this? Or can you point me to messages where I 
can read more about it?

If we did this, then we would have to manually run 
kafka-preferred-replica-election.sh after we're done restarting the cluster, 
right?

-James

> On Mon, Dec 19, 2016 at 5:02 AM, Robin, Martin (Nokia - IN/Bangalore)
>  wrote:
>> Hi
>> 
>> We have 9 broker instances in a Kafka cluster spread across 3 Linux 
>> machines. The 1st machine has 4 broker instances, the 2nd machine has 4 broker 
>> instances, and the 3rd one has 1 broker instance. There are around 101 topics 
>> created in the cluster.
>> 
>> We start the broker as follows
>>All 4 brokers are started on first machine
>>All 4 brokers are started on 2nd machine
>>1 broker started on 3rd machine
>> 
>> After the brokers had been running for some time, we tried to shut down the brokers as 
>> below:
>>All 4 brokers are stopped on the 1st machine
>>4 brokers are stopped on the 2nd machine. While we do this, the Kafka 
>> controlled shutdown hangs.
>> 
>> This same issue was not seen with 25 topics.
>> 
>> Please let us know if any solution is known to this issue
>> 
>> Thanks
>> Martin
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
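
(For reference, triggering a preferred replica election manually in this era of Kafka looked roughly like the following; the ZooKeeper connect string is a placeholder.)

{code}
# Illustrative only; the ZooKeeper connect string is a placeholder.
bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181

# An optional --path-to-json-file argument restricts the election to specific partitions.
{code}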



Build failed in Jenkins: kafka-trunk-jdk8 #1120

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Support auto-incrementing offsets in MemoryRecordsBuilder

[jason] MINOR: Replace TopicAndPartition with TopicPartition in `Log` and

--
[...truncated 7944 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED


[jira] [Commented] (KAFKA-1972) JMX Tool output for CSV format does not handle attributes with comma in their value

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765908#comment-15765908
 ] 

ASF GitHub Bot commented on KAFKA-1972:
---

Github user rekhajoshm closed the pull request at:

https://github.com/apache/kafka/pull/45


> JMX Tool output for CSV format does not handle attributes with comma in their 
> value
> ---
>
> Key: KAFKA-1972
> URL: https://issues.apache.org/jira/browse/KAFKA-1972
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Jonathan Rafalski
>Assignee: Jonathan Rafalski
>Priority: Minor
>  Labels: newbie
>
> When JMXTool outputs all attributes in a comma-delimited format, it does 
> not have an escape character or any other way to handle attributes that contain 
> commas in their value. This could potentially limit the use of the output to 
> single-value attributes only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
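
(A minimal sketch of RFC 4180 style quoting for a CSV cell that may contain commas; illustrative only, not the actual JMXTool code.)

{code}
// Illustrative only; not the actual JMXTool code.
public static String escapeCsvCell(String value) {
    if (value == null) return "";
    if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
        // Quote the cell and double any embedded quotes, per RFC 4180.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
    return value;
}
{code}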


[GitHub] kafka pull request #45: KAFKA-1972: JMXTool multiple attributes

2016-12-20 Thread rekhajoshm
Github user rekhajoshm closed the pull request at:

https://github.com/apache/kafka/pull/45


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #49: KAFKA-269: run-test.sh async test

2016-12-20 Thread rekhajoshm
Github user rekhajoshm closed the pull request at:

https://github.com/apache/kafka/pull/49


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-269) ./system_test/producer_perf/bin/run-test.sh without --async flag does not run

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765906#comment-15765906
 ] 

ASF GitHub Bot commented on KAFKA-269:
--

Github user rekhajoshm closed the pull request at:

https://github.com/apache/kafka/pull/49


> ./system_test/producer_perf/bin/run-test.sh  without --async flag does not run
> --
>
> Key: KAFKA-269
> URL: https://issues.apache.org/jira/browse/KAFKA-269
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, core
>Affects Versions: 0.7, 0.8.1.1
> Environment: Linux 2.6.18-238.1.1.el5 , x86_64 x86_64 x86_64 GNU/Linux
> ext3 file system with raid10 
>Reporter: Praveen Ramachandra
>  Labels: newbie, performance
>
> When I run the tests without the --async option, the tests don't produce even a 
> single message. 
> The following defaults were changed in server.properties:
> num.threads=Tried with 8, 10, 100
> num.partitions=10



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1773

2016-12-20 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-242) Subsequent calls of ConsumerConnector.createMessageStreams cause Consumer offset to be incorrect

2016-12-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765835#comment-15765835
 ] 

Jeff Widman commented on KAFKA-242:
---

Does this bug still exist in 0.10?

> Subsequent calls of ConsumerConnector.createMessageStreams cause Consumer 
> offset to be incorrect
> 
>
> Key: KAFKA-242
> URL: https://issues.apache.org/jira/browse/KAFKA-242
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.7
>Reporter: David Arthur
> Attachments: kafka.log
>
>
> When calling ConsumerConnector.createMessageStreams in rapid succession, the 
> Consumer offset is incorrectly advanced causing the consumer to lose 
> messages. This seems to happen when createMessageStreams is called before the 
> rebalancing triggered by the previous call to createMessageStreams has 
> completed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1833) OfflinePartitionLeaderSelector may return null leader when ISR and Assigned Broker have no common broker

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765833#comment-15765833
 ] 

ASF GitHub Bot commented on KAFKA-1833:
---

Github user tedxia closed the pull request at:

https://github.com/apache/kafka/pull/39


> OfflinePartitionLeaderSelector may return null leader when ISR and Assigned 
> Broker have no common broker
> -
>
> Key: KAFKA-1833
> URL: https://issues.apache.org/jira/browse/KAFKA-1833
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.0
>Reporter: xiajun
>Assignee: Neha Narkhede
>  Labels: easyfix, reliability
> Attachments: KAFKA-1883.patch
>
>
> In OfflinePartitionLeaderSelector::selectLeader, when liveBrokerInIsr is not 
> empty and has no common broker with liveAssignedReplicas, selectLeader will 
> return no leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #39: KAFKA-1833: OfflinePartitionLeaderSelector may retur...

2016-12-20 Thread tedxia
Github user tedxia closed the pull request at:

https://github.com/apache/kafka/pull/39


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #27: Clarify log deletion configuration options in server...

2016-12-20 Thread MarkRose
Github user MarkRose closed the pull request at:

https://github.com/apache/kafka/pull/27


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #28: Clarify log deletion configuration options in server...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/28


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2268: MINOR: Replace TopicAndPartition with TopicPartiti...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2268


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2282: MINOR: Support auto-incrementing offsets in Memory...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2282


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2231: NOMERGE: Test new Jenkins PR plugin v4

2016-12-20 Thread ijuma
Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/2231


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-101: Alter Replication Protocol to use Leader Generation rather than High Watermark for Truncation

2016-12-20 Thread Apurva Mehta
Hi Ben,

Thanks for the KIP. It is very well written and explains the problem and
solution very nicely. I have one --very minor-- question. In the 'steps'
section, you write:

> 4.6 The follower starts fetching from the leader from its log end offset.

The use of 'its' is a bit ambiguous here. I presume that you mean that the
follower fetches from the log end offset of the follower (and not the
leader). Might be worth clarifying whose log end offset is referred to
here.

While the perceived ambiguity may be put down to my English skills, I still
feel it would be better to leave no room for doubt.

Thanks,
Apurva

On Sun, Dec 11, 2016 at 4:30 AM, Ben Stopford  wrote:

> Hi All
>
> Please find the below KIP which describes a proposed solution to a couple
> of issues that have been observed with the replication protocol.
>
> In short, the proposal replaces the use of the High Watermark, for
> follower log truncation, with an alternate Generation Marker. This
> uniquely defines which leader each message was acknowledged by.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 101+-+Alter+Replication+Protocol+to+use+Leader+
> Generation+rather+than+High+Watermark+for+Truncation <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 101+-+Alter+Replication+Protocol+to+use+Leader+
> Generation+rather+than+High+Watermark+for+Truncation>
>
> All comments and suggestions greatly appreciated.
>
> Ben Stopford
> Confluent, http://www.confluent.io 
>
>
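
(To make the proposed change concrete: a rough sketch of the truncation rule as described, where the follower truncates to wherever its last-known leader generation ended on the leader, capped by what it has locally, and then resumes fetching from its own log end offset. The helper name and types are illustrative, not the final KIP-101 API.)

{code}
// Illustrative only; not the final KIP-101 API.
static long truncationOffsetOnBecomingFollower(long leaderEndOffsetForFollowersLastGeneration,
                                               long followerLogEndOffset) {
    // Truncate to where the follower's last-known leader generation ended on the
    // leader, but never beyond what the follower actually has locally. The follower
    // then fetches from its own (post-truncation) log end offset, per step 4.6.
    return Math.min(leaderEndOffsetForFollowersLastGeneration, followerLogEndOffset);
}
{code}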


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Edoardo Comar
+1 (non-binding) 
thanks!
--
Edoardo Comar
IBM MessageHub
eco...@uk.ibm.com
IBM UK Ltd, Hursley Park, SO21 2JN

IBM United Kingdom Limited Registered in England and Wales with number 
741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 
3AU



From:   Ismael Juma 
To: dev@kafka.apache.org
Date:   20/12/2016 22:59
Subject:Re: [VOTE] KIP-90 Remove zkClient dependency from Streams
Sent by:isma...@gmail.com



Thanks for the KIP, +1 (binding).

On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
wrote:

> Hi all,
>
> Seems that there is no opposition to this KIP. This email is to start 
the
> voting for this KIP.
> Once again the KIP is for removing zkClient dependency from Streams. 
Please
> check out the KIP page:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 90+-+Remove+zkClient+
> dependency+from+Streams
>
> Thanks,
> --Hojjat
>



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Apurva Mehta
+1

On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
wrote:

> Hi all,
>
> Seems that there is no opposition to this KIP. This email is to start the
> voting for this KIP.
> Once again the KIP is for removing zkClient dependency from Streams. Please
> check out the KIP page:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 90+-+Remove+zkClient+
> dependency+from+Streams
>
> Thanks,
> --Hojjat
>


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Jay Kreps
+1

On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
wrote:

> Hi all,
>
> Seems that there is no opposition to this KIP. This email is to start the
> voting for this KIP.
> Once again the KIP is for removing zkClient dependency from Streams. Please
> check out the KIP page:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 90+-+Remove+zkClient+
> dependency+from+Streams
>
> Thanks,
> --Hojjat
>


[jira] [Created] (KAFKA-4564) When the destination brokers are down or misconfigured in config, Streams should fail fast

2016-12-20 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4564:


 Summary: When the destination brokers are down or misconfigured in 
config, Streams should fail fast
 Key: KAFKA-4564
 URL: https://issues.apache.org/jira/browse/KAFKA-4564
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Today, if Kafka is down or users misconfigure the bootstrap list, Streams may 
just hang for a while without any error messages, even with log4j enabled, 
which is quite confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Ismael Juma
Thanks for the KIP, +1 (binding).

On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
wrote:

> Hi all,
>
> Seems that there is no opposition to this KIP. This email is to start the
> voting for this KIP.
> Once again the KIP is for removing zkClient dependency from Streams. Please
> check out the KIP page:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 90+-+Remove+zkClient+
> dependency+from+Streams
>
> Thanks,
> --Hojjat
>


[jira] [Commented] (KAFKA-4198) Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures

2016-12-20 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765468#comment-15765468
 ] 

Guozhang Wang commented on KAFKA-4198:
--

Saw another case of this failure, but with different exception message: 
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/279/console

{code}
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
java.lang.IllegalArgumentException: You can only check the position for 
partitions assigned to this consumer.
at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
at 
kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
at 
kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
{code}

> Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures
> 
>
> Key: KAFKA-4198
> URL: https://issues.apache.org/jira/browse/KAFKA-4198
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>  Labels: transient-unit-test-failure
>
> The issue seems to be that we call `startup` while `shutdown` is still taking 
> place.
> {code}
> java.lang.AssertionError: expected:<107> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:91)
>   at 
> kafka.api.ConsumerBounceTest$$anonfun$consumeWithBrokerFailures$2.apply(ConsumerBounceTest.scala:90)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:90)
>   at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> {code}
> {code}
> java.lang.IllegalStateException: Kafka server is still shutting down, cannot 
> re-start!
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at 
> kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
>   at 
> kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
>   at 
> kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread radai
When the leader decides to commit a TX (of X msgs, known at this point), it
writes an "intent to append X msgs" msg (control?) followed by the X msgs
(at this point it is the leader and therefore the point of sync, so this can be
done with no "foreign" msgs in between).
If there's a crash/change of leadership, the new leader can just roll back
(remove what partial contents it had) if it sees the "intent" msg but doesn't
see X msgs belonging to the TX after it. The watermark does not advance
into the middle of a TX - so nothing is visible to any consumer until the
whole thing is committed and replicated (or crashes and is rolled back). Which
means I don't think TX storage needs replication, and atomicity to consumers
is retained.

I can't argue with the latency argument, but:

1. If TXs can be done in-mem, maybe a TX per msg isn't that expensive?
2. I think a logical clock approach (with broker-side dedup based on the
clock) could provide the same exactly-once semantics without requiring
transactions at all?

However, I concede that as you describe it (long-running TXs where commits
are actually "checkpoints" spaced to optimize overhead vs RPO/RTO) you
would require read uncommitted to minimize latency.
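
(A minimal sketch of the broker-side dedup idea in point 2, just to make it concrete; illustrative only, not the design proposed in the KIP.)

{code}
import java.util.HashMap;
import java.util.Map;

// Broker-side dedup keyed by producer id and a monotonically increasing
// logical clock (sequence). Illustrative only; not the KIP-98 design.
public class PerProducerDedup {
    private final Map<Long, Long> lastSequencePerProducer = new HashMap<>();

    // Returns true if the record should be appended, false if it is a duplicate.
    public synchronized boolean shouldAppend(long producerId, long sequence) {
        Long last = lastSequencePerProducer.get(producerId);
        if (last != null && sequence <= last) {
            return false; // already appended, drop the duplicate
        }
        lastSequencePerProducer.put(producerId, sequence);
        return true;
    }
}
{code}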

On Tue, Dec 20, 2016 at 1:24 PM, Apurva Mehta  wrote:

> durably at the moment we enter the pre-commit phase. If we
> don't have durable persistence of these messages, we can't have idempotent
> and atomic copying into the main  log, and your proposal to date does not
> show otherwise.
>


[GitHub] kafka pull request #2283: HOTFIX: Convert exception to warning since clearly...

2016-12-20 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2283

HOTFIX: Convert exception to warning since clearly it is happening.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka hotfix-state-transition-warn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2283


commit 79079abdad2687882270a6f66906b29dc67aaa5c
Author: Eno Thereska 
Date:   2016-12-20T22:30:28Z

Convert exception to warning since clearly it is happening.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4563) State transitions error PARTITIONS_REVOKED to NOT_RUNNING

2016-12-20 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-4563:

Description: 
When starting and stopping streams quickly, the following exception is thrown:

java.lang.IllegalStateException: Incorrect state transition from 
PARTITIONS_REVOKED to NOT_RUNNING
at 
org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:164)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:414)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:366)

A temporary fix is to convert the exception into a warning, since clearly not 
all state transitions are thought through yet.

  was:
When starting and stopping streams quickly, the following exception is thrown:

java.lang.IllegalStateException: Incorrect state transition from 
PARTITIONS_REVOKED to NOT_RUNNING
at 
org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:164)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:414)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:366)


> State transitions error PARTITIONS_REVOKED to NOT_RUNNING
> -
>
> Key: KAFKA-4563
> URL: https://issues.apache.org/jira/browse/KAFKA-4563
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
> Fix For: 0.10.2.0
>
>
> When starting and stopping streams quickly, the following exception is thrown:
> java.lang.IllegalStateException: Incorrect state transition from 
> PARTITIONS_REVOKED to NOT_RUNNING
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:164)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:414)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:366)
> A temporary fix is to convert the exception into a warning, since clearly not 
> all state transitions are thought through yet.
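
(A minimal sketch of what "convert the exception into a warning" amounts to; the names are illustrative, not the actual StreamThread.setState code.)

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: demote an unexpected state transition from an
// IllegalStateException to a warning, as described in the temporary fix.
class StateTransitionCheck {
    private static final Logger log = LoggerFactory.getLogger(StateTransitionCheck.class);

    static void checkTransition(String oldState, String newState, boolean isValid) {
        if (!isValid) {
            // Previously: throw new IllegalStateException("Incorrect state transition ...")
            log.warn("Unexpected state transition from {} to {}", oldState, newState);
        }
    }
}
{code}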



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Damian Guy
+1

On Tue, 20 Dec 2016 at 21:16 Sriram Subramanian  wrote:

> +1
>
> On Tue, Dec 20, 2016 at 1:13 PM, Guozhang Wang  wrote:
>
> > +1. Thanks!
> >
> > On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
> > wrote:
> >
> > > Hi all,
> > >
> > > Seems that there is no opposition to this KIP. This email is to start
> the
> > > voting for this KIP.
> > > Once again the KIP is for removing zkClient dependency from Streams.
> > Please
> > > check out the KIP page:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 90+-+Remove+zkClient+
> > > dependency+from+Streams
> > >
> > > Thanks,
> > > --Hojjat
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-20 Thread Damian Guy
Hi Guozhang,

Thanks for your input. Answers below, but I'm thinking we should remove
joins from GlobalKTables for the time being and revisit them if necessary in
the future.

1. With a global table the joins are never really materialized (at least
as I see it); rather, they are just views on the existing global tables.
I've deliberately taken this approach so we don't have to create yet
another state store, changelog topic, etc. These all consume resources
that I believe are unnecessary. So I don't really see the point of having
a materialize method. Further, one of the major benefits of joining two
global tables is being able to query them via Interactive Queries. For this
you need the name, so I think it makes sense to provide it with the join.

2. This has been discussed already in this thread (with Michael), and
outerJoin is deliberately not part of the KIP. To be able to join both
ways, as you suggest, requires that both inputs are able to map to the same
key. This is not always going to be possible, i.e., relationships can be
one-way, so for that reason I felt it was best not to go down that path, as
we'd not be able to resolve it at the time that
globalTable.join(otherGlobalTable,...) was called, and this would result in
possible confusion. Also, to support this we'd need to physically
materialize a StateStore that represents the join (which I think is a waste
of resources), or we'd need to provide another interface where we can map
from the key of the resulting global table to the keys of both of the
joined tables.

3. The intention is that the GlobalKTables are in a single topology that is
owned and updated by a single thread. So yes it is necessary that they can
be created separately.

4. Bootstrapping and maintaining of the state of GlobalKTables are done on
a single thread. This thread will run simultaneously with the current
StreamThreads. It doesn't make sense to move the bootstrapping of the
StandbyTasks to this thread as they are logically part of a StreamThread,
they are 'assigned' to the StreamThread. With GlobalKTables there is no
assignment as such, the thread just maintains all of them.

5. Yes, I'll update the KIP - the state directory will be under the same
path as StreamsConfig.STATE_DIR_CONFIG, but it will be a specific
directory, i.e., global_state, rather than being a task directory.

6. The whole point of GlobalKTables is to have a copy of ALL of the data on
each node. I don't think it makes sense to be able to reset the starting
position.

Thanks,
Damian

On Tue, 20 Dec 2016 at 20:00 Guozhang Wang  wrote:

> One more thing to add:
>
> 6. For KGlobalTable, it is always bootstrapped from the beginning while for
> other KTables, we are enabling users to override their resetting position
> as in
>
> https://github.com/apache/kafka/pull/2007
>
> Should we consider doing the same for KGlobalTable as well?
>
>
> Guozhang
>
>
> On Tue, Dec 20, 2016 at 11:39 AM, Guozhang Wang 
> wrote:
>
> > Thanks for the very well written proposal, and sorry for the very-late
> > review. I have a few comments here:
> >
> > 1. We are introducing a "queryableViewName" in the GlobalTable join
> > results, while I'm wondering if we should just add a more general
> function
> > like "materialize" to KTable and KGlobalTable with the name to be used in
> > queries?
> >
> > 2. For KGlobalTable's own "join" and "leftJoin": since we are only
> passing
> > the KeyValueMapper keyMapper it seems that for either case only
> > the left hand side will logically "trigger" the join, which is different
> to
> > KTable's join semantics. I'm wondering if it would be more consistent to
> > have them as:
> >
> >
> >  GlobalKTable join(final GlobalKTable other,
> > final KeyValueMapper
> > leftkeyMapper,
> > final KeyValueMapper
> > rightkeyMapper,
> > final ValueJoiner
> joiner
> > final String queryableViewName);
> >
> >  GlobalKTable outerJoin(final GlobalKTable
> other,
> >  final KeyValueMapper
> > leftkeyMapper,
> >  final KeyValueMapper
> > rightkeyMapper,
> >  final ValueJoiner
> > joiner,
> >  final String queryableViewName);
> >
> >  GlobalKTable leftJoin(final GlobalKTable other,
> > final KeyValueMapper
> > keyMapper,
> > final ValueJoiner
> > joiner,
> > final String queryableViewName);
> >
> >
> > I.e. add another directional key mapper to join and also to outerJoin.
> >
> >
> 
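
(The mail archive has stripped the generic type parameters from the signatures quoted above. With guessed type parameters restored, the proposed join presumably reads roughly like the following; this is a reconstruction, not the exact text of the proposal.)

{code}
// Reconstruction with guessed generics; not the exact text of the proposal.
public interface GlobalKTable<K, V> {

    <K1, V1, R> GlobalKTable<K, R> join(final GlobalKTable<K1, V1> other,
                                        final KeyValueMapper<K, V, K1> leftKeyMapper,
                                        final KeyValueMapper<K1, V1, K> rightKeyMapper,
                                        final ValueJoiner<V, V1, R> joiner,
                                        final String queryableViewName);
}
{code}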

Build failed in Jenkins: kafka-trunk-jdk8 #1119

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4540: Suspended tasks that are not assigned to the StreamThread

[wangguoz] MINOR: Add more exception information in ProcessorStateManager

[jason] KAFKA-4554; Fix ReplicaBuffer.verifyChecksum to use iterators instead of

--
[...truncated 7942 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread Apurva Mehta
@Radai, regarding the replication for inflight transactional messages.

I think Jay and Joel have addressed the need for transactional messages to
be persisted durably at the moment we enter the pre-commit phase. If we
don't have durable persistence of these messages, we can't have idempotent
and atomic copying into the main  log, and your proposal to date does not
show otherwise.

Additionally, I would like to point out that both the proposed solutions
for the copy operation in the transaction journal approach are pretty
invasive changes at the core of the Kafka log manager layer and below: you
either have to 'splice in' segments, or else you have to guarantee that a set
of messages will be copied from one log to another idempotently and
atomically even in the case of failures, which means reliably keeping track
of the messages already copied, reliably knowing where to resume the copy
from, etc.

The proposal in the KIP does not require major changes to Kafka at the Log
manager level and below: every partition involved in the transaction
(including the transaction log) is just another partition, so we inherit
all the durability guarantees for these partitions.

I don't think significantly complicating the log manager level is a deal
breaker, but I would like to point out the costs of the two-log approach
from an implementation perspective.

@Joel,

I read over your wiki, and apart from the introduction of the notion of
journal partitions --whose pros and cons are already being discussed-- you
also introduce the notion of a 'producer group' which enables multiple
producers to participate in a single transaction. This is the complete
opposite of the model in the KIP, where a transaction is defined by a
producer id, and hence there is a 1-1 mapping between producers and
transactions. Further, each producer can have exactly one in-flight
transaction at a time in the KIP.

The motivation for the model in the KIP is the streams use-case, where a
1-1 mapping between producers and transactions is natural. I am curious
about the use cases you have in mind for a many-to-one mapping between
producers and transactions.

@all,

As Jay and Sriram have alluded to, the current proposal is geared toward
enabling transactions for streaming applications. However, the details of
these use-cases and the features they need are missing from the KIP. In
particular, enabling deep stream topologies with low end-to-end processing
time necessitates speculative execution, which is one of the driving factors
behind the present proposal. We will update the document with these
details.

Regards,
Apurva


On Tue, Dec 20, 2016 at 11:28 AM, Jay Kreps  wrote:

> I don't think the simple approach of writing to a local store (in memory or
> on disk) and then copying out to the destination topics would work but
> there could well be more sophisticated things that would. As you say, it is
> fine for the data to be un-replicated while you are accumulating the
> transaction, because you can always just abort the transaction if that node
> fails, but once you decided to commit and begin the process of copying out
> data you must guarantee you eventually will copy out the full transaction.
> If you have a non-durable store on one broker, and that broker crashes in
> the middle of copying out the transaction to the destination brokers, if it
> is possible that some of the writes have already succeeded, and the others
> are now lost, then you would violate atomicity.
>
> This is similar in classic two-phase commit protocols: a post-condition of
> a successful prepare commit is a promise that the transaction will
> eventually be successfully committed if requested so full durability is
> required in the pre-commit phase.
>
> But the flaw in the simple approach doesn't mean there isn't some less
> obvious solution that hasn't been thought of yet.
>
> For latency, yeah you're exactly right. We're assuming the latency of
> transactions can be pushed down to almost the duration of the transaction
> and obviously it can't be less than that. Let me try to flesh out the
> motivation for caring about latency (I think Sriram touched on this):
>
>- We're primarily motivated by uses that fit a generalized notion of
>correct, stateful stream processing. That is you consume/process/produce
>potentially with associated local state in the processing. This fits KS
> and
>Samza, but potentially a whole world of things that do transformation of
>data. I think this is a really general notion of stream processing as a
>kind of "protocol" and the proposed semantics give a kind of "closure"
> to
>Kafka's producer and consumer protocols so they can be correctly
> chained.
>- These use cases end up being a kind of DAG of transformations, often
>even a fairly simple flow will have a depth of 5 stages and more
> realistic
>flows can be more like 10.
>- The transaction size is proportional to the efficiency since the
>

Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Sriram Subramanian
+1

On Tue, Dec 20, 2016 at 1:13 PM, Guozhang Wang  wrote:

> +1. Thanks!
>
> On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
> wrote:
>
> > Hi all,
> >
> > Seems that there is no opposition to this KIP. This email is to start the
> > voting for this KIP.
> > Once again the KIP is for removing zkClient dependency from Streams.
> Please
> > check out the KIP page:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 90+-+Remove+zkClient+
> > dependency+from+Streams
> >
> > Thanks,
> > --Hojjat
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Guozhang Wang
+1. Thanks!

On Tue, Dec 20, 2016 at 1:01 PM, Hojjat Jafarpour 
wrote:

> Hi all,
>
> Seems that there is no opposition to this KIP. This email is to start the
> voting for this KIP.
> Once again the KIP is for removing zkClient dependency from Streams. Please
> check out the KIP page:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 90+-+Remove+zkClient+
> dependency+from+Streams
>
> Thanks,
> --Hojjat
>



-- 
-- Guozhang


[VOTE] KIP-90 Remove zkClient dependency from Streams

2016-12-20 Thread Hojjat Jafarpour
Hi all,

Seems that there is no opposition to this KIP. This email is to start the
voting for this KIP.
Once again the KIP is for removing zkClient dependency from Streams. Please
check out the KIP page:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-90+-+Remove+zkClient+
dependency+from+Streams

Thanks,
--Hojjat


Build failed in Jenkins: kafka-trunk-jdk8 #1118

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4473: RecordCollector should handle retriable exceptions more

--
[...truncated 32680 lines...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] FAILED
java.lang.AssertionError: Condition not met within timeout 6. Did not 
receive 1 number of records
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:283)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinValuesRecordsReceived(IntegrationTestUtils.java:251)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.waitUntilAtLeastNumRecordProcessed(QueryableStateIntegrationTest.java:669)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:350)

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted STARTED


Build failed in Jenkins: kafka-trunk-jdk7 #1772

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-4554; Fix ReplicaBuffer.verifyChecksum to use iterators instead of

--
[...truncated 3870 lines...]

kafka.log.LogTest > testReadWithTooSmallMaxLength STARTED

kafka.log.LogTest > testReadWithTooSmallMaxLength PASSED

kafka.log.LogTest > testOverCompactedLogRecovery STARTED

kafka.log.LogTest > testOverCompactedLogRecovery PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved STARTED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages STARTED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload STARTED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog STARTED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset STARTED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate STARTED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName STARTED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles STARTED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testRebuildTimeIndexForOldMessages STARTED

kafka.log.LogTest > testRebuildTimeIndexForOldMessages PASSED

kafka.log.LogTest > testSizeBasedLogRoll STARTED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > shouldNotDeleteSizeBasedSegmentsWhenUnderRetentionSize 
STARTED

kafka.log.LogTest > shouldNotDeleteSizeBasedSegmentsWhenUnderRetentionSize 
PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter STARTED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName STARTED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo STARTED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile STARTED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogTest > testBuildTimeIndexWhenNotAssigningOffsets STARTED

kafka.log.LogTest > testBuildTimeIndexWhenNotAssigningOffsets PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] STARTED


Re: [DISCUSS] KIP-101: Alter Replication Protocol to use Leader Generation rather than High Watermark for Truncation

2016-12-20 Thread Ben Stopford
Hi all

So having gone through a few extra failure scenarios it appears it is still
possible for logs to diverge if the unclean.leader.election setting is
enabled. The protocol could be evolved further to protect against this. The
issue is that it adds significant complexity, and potentially impacts other
primitives like log compaction. As a result the most pragmatic solution is
to *limit the guarantees this KIP provides to clusters where unclean leader
election is disabled*.

If anyone has any strong feelings on this, or useful insights, that would
be awesome. Otherwise I'll update the KIP to reflect this stance (along
with the example below).

All the best
B

*Divergent Logs with Leader Epochs & Unclean Leader Election*
It should be possible to still corrupt the log, even with Leader epochs, if
min.isr=1 and unclean.leader.election=true. Consider two brokers A,B, a
single topic, a single partition, reps=2, min.isr=1.

Intuitively the issue can be seen as:
-> The first two writes create a divergent log at offset 0 on completely
isolated brokers.
-> The second two writes “cover up” that first divergent write so the
LeaderEpoch request doesn’t see it.

Scenario:
1. [LeaderEpoch0] Write a message to A (offset A:0), Stop broker A. Bring
up broker B which becomes leader
2. [LeaderEpoch1] Write a message to B (offset B:0), Stop broker B. Bring
up broker A which becomes leader
3. [LeaderEpoch2] Write a message to A (offset A:1), Stop broker A. Bring
up broker B which becomes leader
4. [LeaderEpoch3] Write a message to B (offset B:1),
5. Bring up broker A. It sends a Epoch Request for Epoch 2 to broker B. B
has only epochs 1,3, not 2, so it replies with the first offset of Epoch 3
(which is 1). So offset 0 is divergent.

The underlying problem here is that, whilst B can tell something is wrong,
it can't tell where in the log the divergence started.
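
To make step 5 concrete, here is a rough sketch of the lookup as described
above. The "reply with the first offset of the next epoch the leader knows
about" rule is taken from the scenario; the data structure and names are
illustrative only, not the actual protocol code:

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class EpochLookupSketch {
    // epoch -> start offset of that epoch in the leader's log
    static long firstOffsetForEpoch(NavigableMap<Integer, Long> epochStartOffsets, int requestedEpoch) {
        Map.Entry<Integer, Long> entry = epochStartOffsets.ceilingEntry(requestedEpoch);
        return entry == null ? -1L : entry.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Long> brokerB = new TreeMap<>();
        brokerB.put(1, 0L); // B only knows epoch 1, starting at offset 0
        brokerB.put(3, 1L); // ... and epoch 3, starting at offset 1
        // A asks about its own epoch 2; B answers 1, so A never truncates the
        // divergent record at offset 0.
        System.out.println(firstOffsetForEpoch(brokerB, 2)); // prints 1
    }
}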

One solution is to detect the break by comparing the complete epoch lineage
between brokers, then truncate either to (a) zero or (b) the point of
divergence, then refetch. However, compacted topics make both of these
options hard, as arbitrary epoch & offset information can be 'lost' from
the log. This information could be retained and managed in the LeaderEpoch
file instead, but the whole solution is becoming quite complex. Hence it
seems sensible to forgo this guarantee for the unclean leader election
case, or at least push it to a subsequent KIP.


On Wed, Dec 14, 2016 at 6:45 PM Jun Rao  wrote:

Hi, Onur,

The reason for keeping track of the CZXID of the broker registration path
is the following. There is one corner case bug (KAFKA-1120) that Ben
mentioned where the controller could miss a ZK watcher event if the broker
deregisters and registers quickly. Always triggering a leader election (and
thus increasing the leader epoch) on broker registration event may work,
but we have to think through the controller failover logic. When the
controller initializes, it simply reads all current broker registration
from ZK. The controller doesn't know whether any broker registration has
changed since the previous controller has failed. Just blindly forcing
leader election on all partitions during the controller failover probably
adds too much overhead.

So, the idea is to have the controller track the broker -> CZXID mapping in
memory. Every time the controller changes the leader for a partition, the
controller stores the CZXID of the leader together with the leader broker
id (and leader epoch, controller epoch etc) in memory and in
/brokers/topics/[topic]/partitions/[partitionId]/state
(this is missing in the KIP wiki). Now if the controller gets a broker
registration event or when there is a controller failover, the controller
just needs to force a leader election if the CZXID of the broker
registration doesn't match the CZXID associated with the leader in
/brokers/topics/[topic]/partitions/[partitionId]/state.
This way, we will only do leader election when it's truly necessary.
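
As a rough illustration of that rule (the partition-state record and the
names below are assumptions for the example, not the actual controller
code):

public class CzxidCheckSketch {
    static final class PartitionState {
        final int leaderBrokerId;
        final long leaderCzxid; // CZXID of the leader's registration when it was elected
        PartitionState(int leaderBrokerId, long leaderCzxid) {
            this.leaderBrokerId = leaderBrokerId;
            this.leaderCzxid = leaderCzxid;
        }
    }

    // Called on a broker registration event or during controller failover:
    // force a leader election only if the broker has re-registered (new CZXID)
    // since this leader was chosen.
    static boolean needsLeaderElection(int brokerId, long registrationCzxid, PartitionState state) {
        return state.leaderBrokerId == brokerId && state.leaderCzxid != registrationCzxid;
    }
}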

The reason why this change is related to this KIP is that it also addresses
the issue of keeping the replicas identical during correlated failures. If
all replicas are down and the leader replica is the first being restarted,
by forcing the increase of leader epoch even though the leader remains on
the same replica, we can distinguish the data written since the leader
replica was restarted from the data written by the same leader replica before
it was restarted. This allows us to keep all replicas identical
even in the correlated failure case.

Thanks,

Jun

On Sun, Dec 11, 2016 at 3:54 PM, Onur Karaman 
wrote:

> Pretty happy to see a KIP tackling this problem! One comment below.
>
> The "Extending LeaderEpoch to include Returning Leaders" states:
> "To protect against this eventuality the controller will maintain a cached
> mapping of [broker -> Zookeeper CZXID] (CZXID is a unique and monotonic
> 64-bit number) for the broker’s registration in 

[GitHub] kafka pull request #2282: MINOR: Support auto-incrementing offsets in Memory...

2016-12-20 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2282

MINOR: Support auto-incrementing offsets in MemoryRecordsBuilder



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka builder-autoincrement-offsets

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2282


commit 834873961ea81873e79583a6f2bd23d56ff23572
Author: Jason Gustafson 
Date:   2016-12-20T20:01:14Z

MINOR: Support auto-incrementing offsets in MemoryRecordsBuilder




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-20 Thread Guozhang Wang
One more thing to add:

6. For KGlobalTable, it is always bootstrapped from the beginning while for
other KTables, we are enabling users to override their resetting position
as in

https://github.com/apache/kafka/pull/2007

Should we consider doing the same for KGlobalTable as well?


Guozhang


On Tue, Dec 20, 2016 at 11:39 AM, Guozhang Wang  wrote:

> Thanks for the very well written proposal, and sorry for the very-late
> review. I have a few comments here:
>
> 1. We are introducing a "queryableViewName" in the GlobalTable join
> results, while I'm wondering if we should just add a more general function
> like "materialize" to KTable and KGlobalTable with the name to be used in
> queries?
>
> 2. For KGlobalTable's own "join" and "leftJoin": since we are only passing
> the KeyValueMapper keyMapper it seems that for either case only
> the left hand side will logically "trigger" the join, which is different to
> KTable's join semantics. I'm wondering if it would be more consistent to
> have them as:
>
>
>  GlobalKTable join(final GlobalKTable other,
> final KeyValueMapper
> leftkeyMapper,
> final KeyValueMapper
> rightkeyMapper,
> final ValueJoiner joiner
> final String queryableViewName);
>
>  GlobalKTable outerJoin(final GlobalKTable other,
>  final KeyValueMapper
> leftkeyMapper,
>  final KeyValueMapper
> rightkeyMapper,
>  final ValueJoiner
> joiner,
>  final String queryableViewName);
>
>  GlobalKTable leftJoin(final GlobalKTable other,
> final KeyValueMapper
> keyMapper,
> final ValueJoiner
> joiner,
> final String queryableViewName);
>
>
> I.e. add another directional key mapper to join and also to outerJoin.
>
>
> 3. For "TopologyBuilder.buildGlobalStateTopology", is it necessary to
> have a separate function from "TopologyBuilder.build" itself? With global
> tables, is there any scenarios that we want to build the topology without
> the embedded global tables (i.e. still calling "build")?
>
> 4. As for implementation, you mentioned that global table bootstraping
> will be done in another dedicated thread. Could we also consider moving the
> logic of bootstrapping the standby-replica state stores into this thread as
> well, which can then leverage on the existing "restoreConsumer" that does
> not participate in the consumer group protocol? By doing this I think we
> can still avoid thread-synchronization while making the logic more clear
> (ideally the standby restoration do not really need to be in part of the
> stream thread's main loops).
>
> 5. Also for the global table's state directory, I'm assuming it will not
> be under the per-task directory as it is per instance. But could you
> elaborate a bit in the wiki about its directory as well? Also could we
> consider adding https://issues.apache.org/jira/browse/KAFKA-3522 along
> with this feature since we may need to change the directory path / storage
> schema formats for these different types of stores moving forward.
>
>
>
> Guozhang
>
>
> On Fri, Dec 9, 2016 at 4:21 AM, Damian Guy  wrote:
>
>> Thanks for the update Michael.
>>
>> I just wanted to add that there is one crucial piece of information that
>> i've failed to add (I apologise).
>>
>> To me, the join between 2 Global Tables just produces a view on top of the
>> underlying tables (this is the same as it works for KTables today). So
>> that
>> means there is no Physical StateStore that backs the join result, it is
>> just a Virtual StateStore that knows how to resolve the join when it is
>> required. I've deliberately taken this path so that we don't end up having
>> yet another copy of the data, stored on local disk, and sent to another
>> change-log topic. This also reduces the memory overhead from creating
>> RocksDBStores and reduces load on the Thread based caches we have. So it
>> is
>> a resource optimization.
>>
>> So while it is technically possible to support outer joins, we would need
>> to physically materialize the StateStore (and create a changelog-topic for
>> it), or, we'd need to provide another interface where the user could map
>> from the outerJoin key to both of the other table keys. This is because
>> the
>> key of the outerJoin table could be either the key of the lhs table, or
>> the
>> rhs tables, or something completely different.
>>
>> With this and what you have mentioned above in mind i think we should park
>> 

Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-20 Thread Guozhang Wang
Thanks for the very well written proposal, and sorry for the very-late
review. I have a few comments here:

1. We are introducing a "queryableViewName" in the GlobalTable join
results, while I'm wondering if we should just add a more general function
like "materialize" to KTable and KGlobalTable with the name to be used in
queries?

2. For KGlobalTable's own "join" and "leftJoin": since we are only passing
the KeyValueMapper keyMapper it seems that for either case only
the left hand side will logically "trigger" the join, which is different to
KTable's join semantics. I'm wondering if it would be more consistent to
have them as:


GlobalKTable join(final GlobalKTable other,
                  final KeyValueMapper leftkeyMapper,
                  final KeyValueMapper rightkeyMapper,
                  final ValueJoiner joiner,
                  final String queryableViewName);

GlobalKTable outerJoin(final GlobalKTable other,
                       final KeyValueMapper leftkeyMapper,
                       final KeyValueMapper rightkeyMapper,
                       final ValueJoiner joiner,
                       final String queryableViewName);

GlobalKTable leftJoin(final GlobalKTable other,
                      final KeyValueMapper keyMapper,
                      final ValueJoiner joiner,
                      final String queryableViewName);


I.e. add another directional key mapper to join and also to outerJoin.


3. For "TopologyBuilder.buildGlobalStateTopology", is it necessary to have
a separate function from "TopologyBuilder.build" itself? With global
tables, are there any scenarios where we want to build the topology without
the embedded global tables (i.e. still calling "build")?

4. As for implementation, you mentioned that global table bootstrapping will
be done in another dedicated thread. Could we also consider moving the
logic of bootstrapping the standby-replica state stores into this thread as
well, which can then leverage the existing "restoreConsumer" that does
not participate in the consumer group protocol? By doing this I think we
can still avoid thread synchronization while making the logic clearer
(ideally the standby restoration does not really need to be part of the
stream thread's main loop).

5. Also for the global table's state directory, I'm assuming it will not be
under the per-task directory as it is per instance. But could you elaborate
a bit in the wiki about its directory as well? Also could we consider
adding https://issues.apache.org/jira/browse/KAFKA-3522 along with this
feature since we may need to change the directory path / storage schema
formats for these different types of stores moving forward.



Guozhang


On Fri, Dec 9, 2016 at 4:21 AM, Damian Guy  wrote:

> Thanks for the update Michael.
>
> I just wanted to add that there is one crucial piece of information that
> i've failed to add (I apologise).
>
> To me, the join between 2 Global Tables just produces a view on top of the
> underlying tables (this is the same as it works for KTables today). So that
> means there is no Physical StateStore that backs the join result, it is
> just a Virtual StateStore that knows how to resolve the join when it is
> required. I've deliberately taken this path so that we don't end up having
> yet another copy of the data, stored on local disk, and sent to another
> change-log topic. This also reduces the memory overhead from creating
> RocksDBStores and reduces load on the Thread based caches we have. So it is
> a resource optimization.
>
> So while it is technically possible to support outer joins, we would need
> to physically materialize the StateStore (and create a changelog-topic for
> it), or, we'd need to provide another interface where the user could map
> from the outerJoin key to both of the other table keys. This is because the
> key of the outerJoin table could be either the key of the lhs table, or the
> rhs tables, or something completely different.
>
> With this and what you have mentioned above in mind i think we should park
> outerJoin support for this KIP and re-visit if and when we need it in the
> future.
>
> I'll update the KIP with this.
>
> Thanks,
> Damian
>
> On Fri, 9 Dec 2016 at 09:53 Michael Noll  wrote:
>
> > Damian and I briefly chatted offline (thanks, Damian!), and here's the
> > summary of my thoughts and conclusion.
> >
> > TL;DR: Let's skip outer join support for global tables.
> >
> > In more detail:
> >
> > - We agreed that, technically, we can add OUTER JOIN support.  However,
> > outer joins only work if certain 

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread Jay Kreps
I don't think the simple approach of writing to a local store (in memory or
on disk) and then copying out to the destination topics would work but
there could well be more sophisticated things that would. As you say, it is
fine for the data to be un-replicated while you are accumulating the
transaction, because you can always just abort the transaction if that node
fails, but once you decided to commit and begin the process of copying out
data you must guarantee you eventually will copy out the full transaction.
If you have a non-durable store on one broker, and that broker crashes in
the middle of copying out the transaction to the destination brokers, then it
is possible that some of the writes have already succeeded while the others
are now lost, and you would violate atomicity.

This is similar to classic two-phase commit protocols: a post-condition of
a successful prepare is a promise that the transaction will
eventually be successfully committed if requested, so full durability is
required in the pre-commit phase.

But the flaw in the simple approach doesn't mean there isn't some less
obvious solution that hasn't been thought of yet.

For latency, yeah you're exactly right. We're assuming the latency of
transactions can be pushed down to almost the duration of the transaction
and obviously it can't be less than that. Let me try to flesh out the
motivation for caring about latency (I think Sriram touched on this):

   - We're primarily motivated by uses that fit a generalized notion of
   correct, stateful stream processing. That is you consume/process/produce
   potentially with associated local state in the processing. This fits KS and
   Samza, but potentially a whole world of things that do transformation of
   data. I think this is a really general notion of stream processing as a
   kind of "protocol" and the proposed semantics give a kind of "closure" to
   Kafka's producer and consumer protocols so they can be correctly chained.
   - These use cases end up being a kind of DAG of transformations, often
   even a fairly simple flow will have a depth of 5 stages and more realistic
   flows can be more like 10.
   - The transaction size is proportional to the efficiency since the
   overhead of the transaction is fixed irrespective of the number of
   messages. A transaction with two messages will be extremely inefficient,
   but one with a few thousand should be much better. So you can't comfortably
   make the transactions too small but yes you probably wouldn't need them to
   be multisecond.
   - The latency of the transactions stacks up with the stages in the DAG in
   a naive usage. Say you commit every 100ms: if you have 10 stages, your
   latency is going to be 1 second.
   - This latency is definitely a concern in many domains. This is why we
   are interested in having the option of supporting speculative execution.
   For speculative execution you assume likely processes won't fail and you go
   ahead and compute downstream results but co-ordinate the commit. This
   trades more work rolling back when there are failures for lower latency.
   This lets you push the end-to-end latency closer to 100ms rather than the
   100ms*num_stages.

Hopefully that gives a bit more color on the latency concern and desire for
"read uncommitted".

-Jay

On Tue, Dec 20, 2016 at 10:33 AM, radai  wrote:

> obviously anything committed would need to be replicated to all followers -
> just like current msgs.
>
> what im trying to say is that in-flight data (written as part of an ongoing
> TX and not committed yet) does not necessarily need to be replicated, or
> even written out to disk. taken to the extreme it means i can buffer in
> memory on the leader alone and incur no extra writes at all.
>
> if you dont want to just buffer in-memory on the leader (or are forced to
> spool to disk because of size) you could still avoid a double write by
> messing around with segment files (so the TX file becomes part of the
> "linked-list" of segment files instead of reading it and appending it's
> contents verbatim to the current segment file).
>
> the area when this does inevitably come short is latency and "read
> uncommitted" (which are related). the added delay (after cutting all the
> corners above) would really be the "time span" of a TX - the amount of time
> from the moment the producer started the TX to the time when it was
> committed. in my mind this time span is very short. am I failing to
> understand the proposed "typical" use case? is the plan to use long-running
> transactions and only commit at, say, 5 minute "checkpoints" ?
>
> On Tue, Dec 20, 2016 at 10:00 AM, Jay Kreps  wrote:
>
> > Cool. It sounds like you guys will sync up and come up with a specific
> > proposal. I think point (3) does require full replication of the
> pre-commit
> > transaction, but I'm not sure, and I would be very happy to learn
> > otherwise. That was actually the blocker on 

[jira] [Commented] (KAFKA-4554) ReplicaBuffer.verifyChecksum should use use iterators instead of iterables

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765006#comment-15765006
 ] 

ASF GitHub Bot commented on KAFKA-4554:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2280


> ReplicaBuffer.verifyChecksum should use use iterators instead of iterables
> --
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
> Fix For: 0.10.2.0
>
>
> This regressed in 
> https://github.com/apache/kafka/commit/b58b6a1bef0ecdc2107a415e222af099fcd9bce3
>  causing the following system test failure:
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2280: KAFKA-4554: Fix ReplicaBuffer.verifyChecksum to us...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2280


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4554) ReplicaBuffer.verifyChecksum should use use iterators instead of iterables

2016-12-20 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4554:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2280
[https://github.com/apache/kafka/pull/2280]

> ReplicaBuffer.verifyChecksum should use use iterators instead of iterables
> --
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
> Fix For: 0.10.2.0
>
>
> This regressed in 
> https://github.com/apache/kafka/commit/b58b6a1bef0ecdc2107a415e222af099fcd9bce3
>  causing the following system test failure:
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4561) Ordering of operations in StreamThread.shutdownTasksAndState may void at-least-once guarantees

2016-12-20 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4561:
--
Status: Patch Available  (was: In Progress)

> Ordering of operations in StreamThread.shutdownTasksAndState may void 
> at-least-once guarantees
> --
>
> Key: KAFKA-4561
> URL: https://issues.apache.org/jira/browse/KAFKA-4561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> In {{shutdownTasksAndState}} we currently commit offsets as the first step.
> If a subsequent step, e.g. flushing the producer, throws an exception, then
> this would violate the at-least-once guarantees.
> We need to commit after all other state has been flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4561) Ordering of operations in StreamThread.shutdownTasksAndState may void at-least-once guarantees

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764971#comment-15764971
 ] 

ASF GitHub Bot commented on KAFKA-4561:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2281

KAFKA-4561: Ordering of operations in StreamThread.shutdownTasksAndState 
may void at-least-once guarantees

In `shutdownTasksAndState` and `suspendTasksAndState` we commit offsets
BEFORE we flush any state. This is wrong: if an exception occurs during a
flush, we may violate the at-least-once guarantees, that is, we would have
committed some offsets but NOT sent the processed data on to other sinks.
Also, during suspend and shutdown, we should try to complete all tasks even
when exceptions occur. We should just keep track of the exception and rethrow
it at the end if necessary. This helps with ensuring that StateStores etc. are
closed.
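
A rough sketch of the intended ordering (the helper methods are no-op
stand-ins for illustration, not the real StreamThread internals):

public class ShutdownOrderingSketch {
    void flushAllStateStores() { /* flush every task's state stores */ }
    void flushProducer()       { /* make sure all produced records are actually sent */ }
    void commitOffsets()       { /* commit consumed offsets */ }
    void closeAllTasks()       { /* close tasks and their state stores */ }

    void shutdownTasksAndState() {
        RuntimeException firstException = null;
        try {
            flushAllStateStores();  // 1. flush state first
            flushProducer();        // 2. then flush the producer
        } catch (RuntimeException e) {
            firstException = e;     // remember the failure but keep shutting down
        }
        if (firstException == null) {
            commitOffsets();        // 3. commit only after everything was flushed successfully
        }
        closeAllTasks();            // always close, even after a failure
        if (firstException != null) {
            throw firstException;   // surface the first failure at the end
        }
    }
}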

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4561

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2281


commit e60f0d020d837f6c10406bdecd92d77a3b6c089b
Author: Damian Guy 
Date:   2016-12-20T18:59:14Z

change ordering of suspend and shutdown in StreamThread so that we dont 
lose data. Always try and run all the steps even if there are exceptions




> Ordering of operations in StreamThread.shutdownTasksAndState may void 
> at-least-once guarantees
> --
>
> Key: KAFKA-4561
> URL: https://issues.apache.org/jira/browse/KAFKA-4561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> In {{shutdownTasksAndState}} we currently commit offsets as the first step.
> If a subsequent step, e.g. flushing the producer, throws an exception, then
> this would violate the at-least-once guarantees.
> We need to commit after all other state has been flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2281: KAFKA-4561: Ordering of operations in StreamThread...

2016-12-20 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2281

KAFKA-4561: Ordering of operations in StreamThread.shutdownTasksAndState 
may void at-least-once guarantees

In `shutdownTasksAndState` and `suspendTasksAndState` we commit offsets
BEFORE we flush any state. This is wrong: if an exception occurs during a
flush, we may violate the at-least-once guarantees, that is, we would have
committed some offsets but NOT sent the processed data on to other sinks.
Also, during suspend and shutdown, we should try to complete all tasks even
when exceptions occur. We should just keep track of the exception and rethrow
it at the end if necessary. This helps with ensuring that StateStores etc. are
closed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4561

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2281


commit e60f0d020d837f6c10406bdecd92d77a3b6c089b
Author: Damian Guy 
Date:   2016-12-20T18:59:14Z

change ordering of suspend and shutdown in StreamThread so that we dont 
lose data. Always try and run all the steps even if there are exceptions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Kafka Summit 2017

2016-12-20 Thread Gwen Shapira
Hi Kafka Fans,

Just in case you didn't hear / read:

Last year was the first Kafka Summit and it was quite successful. So
we are doing two this year: May in NYC and Aug in SF.

You can read more details here: https://kafka-summit.org/ and you can
use the code "kafkcom17" for a $50 community discount. Early bird
registration for NYC is ending soon :)

I also encourage you to share your Kafka experience with the wider
community - got a cool use-case? Kafka tips and tricks? Amazing
streams application? The best PHP client? The call for papers is open and
the conference committee is waiting for your abstracts :)

-- 
Gwen Shapira


[jira] [Updated] (KAFKA-4540) Suspended tasks that are not assigned to the StreamThread need to be closed before new active and standby tasks are created

2016-12-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4540:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2266
[https://github.com/apache/kafka/pull/2266]

> Suspended tasks that are not assigned to the StreamThread need to be closed 
> before new active and standby tasks are created
> ---
>
> Key: KAFKA-4540
> URL: https://issues.apache.org/jira/browse/KAFKA-4540
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> When partition assignment happens we first try and add the active tasks and
> then add the standby tasks. The problem with this is that a new active task
> might already be an existing suspended standby task. If this is the case then
> when the active task initialises it will throw an exception from RocksDB:
> {{Caused by: org.rocksdb.RocksDBException: IO error: lock 
> /tmp/kafka-streams-7071/kafka-music-charts/1_1/rocksdb/all-songs/LOCK: No 
> locks available}}
> We need to make sure we have removed and closed any no-longer-assigned
> suspended tasks before creating new tasks.
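
A rough sketch of the intended fix (task ids and helpers below are
placeholders for illustration, not the actual StreamThread code):

import java.util.HashSet;
import java.util.Set;

public class CloseSuspendedFirstSketch {
    // suspended task ids carried over from the previous assignment
    final Set<String> suspended = new HashSet<>();

    void onPartitionsAssigned(Set<String> newActive, Set<String> newStandby) {
        // 1. Close any suspended task that is no longer assigned so its RocksDB
        //    locks are released before new tasks try to open the same stores.
        suspended.removeIf(id -> {
            if (!newActive.contains(id) && !newStandby.contains(id)) {
                closeTask(id);
                return true;
            }
            return false;
        });
        // 2. Only then create or resume the new active and standby tasks.
        newActive.forEach(this::createOrResumeActiveTask);
        newStandby.forEach(this::createOrResumeStandbyTask);
    }

    void closeTask(String id) { /* release state store locks etc. */ }
    void createOrResumeActiveTask(String id) { /* create or resume an active task */ }
    void createOrResumeStandbyTask(String id) { /* create or resume a standby task */ }
}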



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2266: KAFKA-4540: Suspended tasks that are not assigned ...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2266


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2276: MINOR: Add more exception information in Processor...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2276


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread radai
obviously anything committed would need to be replicated to all followers -
just like current msgs.

what im trying to say is that in-flight data (written as part of an ongoing
TX and not committed yet) does not necessarily need to be replicated, or
even written out to disk. taken to the extreme it means i can buffer in
memory on the leader alone and incur no extra writes at all.

if you dont want to just buffer in-memory on the leader (or are forced to
spool to disk because of size) you could still avoid a double write by
messing around with segment files (so the TX file becomes part of the
"linked-list" of segment files instead of reading it and appending it's
contents verbatim to the current segment file).

the area when this does inevitably come short is latency and "read
uncommitted" (which are related). the added delay (after cutting all the
corners above) would really be the "time span" of a TX - the amount of time
from the moment the producer started the TX to the time when it was
committed. in my mind this time span is very short. am I failing to
understand the proposed "typical" use case? is the plan to use long-running
transactions and only commit at, say, 5 minute "checkpoints" ?

On Tue, Dec 20, 2016 at 10:00 AM, Jay Kreps  wrote:

> Cool. It sounds like you guys will sync up and come up with a specific
> proposal. I think point (3) does require full replication of the pre-commit
> transaction, but I'm not sure, and I would be very happy to learn
> otherwise. That was actually the blocker on that alternate proposal. From
> my point of view 2x overhead is kind of a deal breaker since it makes
> correctness so expensive you'd have to think very hard before turning it
> on, but if there is a way to do it with less and there aren't too many
> other negative side effects that would be very appealing. I think we can
> also dive a bit into why we are so perf and latency sensitive as it relates
> to the stream processing use cases...I'm not sure how much of that is
> obvious from the proposal.
>
> -Jay
>
>
>
>
>
> On Tue, Dec 20, 2016 at 9:11 AM, Joel Koshy  wrote:
>
> > Just got some time to go through most of this thread and KIP - great to
> see
> > this materialize and discussed!!
> > I will add more comments in the coming days on some of the other "tracks"
> > in this thread; but since Radai brought up the double-journaling approach
> > that we had discussed I thought I would move over some content from
> > our internal
> > wiki on double-journalling
> >  > Double+journaling+with+local+data+copy>
> > It is thin on details with a few invalid statements because I don't think
> > we dwelt long enough on it - it was cast aside as being too expensive
> from
> > a storage and latency perspective. As the immediately preceding emails
> > state, I tend to agree that those are compelling enough reasons to take a
> > hit in complexity/increased memory usage in the consumer. Anyway, couple
> of
> > us at LinkedIn can spend some time today brainstorming a little more on
> > this today.
> >
> > 1. on write amplification: i dont see x6 the writes, at worst i see x2
> the
> > > writes - once to the "tx log", then read and again to the destination
> > > partition. if you have some != 1 replication factor than both the 1st
> and
> > > the 2nd writes get replicated, but it is still a relative factor of x2.
> > > what am I missing?
> > >
> >
> > I think that's right - it would be six total copies if we are doing RF 3.
> >
> >
> > > 3. why do writes to a TX need the same guarantees as "plain" writes? in
> > > cases where the user can live with a TX rollback on change of
> > > leadership/broker crash the TX log can be unreplicated, and even live
> in
> > > the leader's memory. that would cut down on writes. this is also an
> > > acceptable default in SQL - if your socket connection to a DB dies
> mid-TX
> > > your TX is toast (mysql is even worse)
> > >
> >
> > I may have misunderstood - while the above may be true for transactions
> > in-flight, it definitely needs the same guarantees at the point of commit
> > and the straightforward way to achieve that is to rely on the same
> > guarantees while the transaction is in flight.
> >
> > 4. even if we replicate the TX log, why do we need to re-read it and
> > > re-write it to the underlying partition? if its already written to disk
> > all
> > > I would need is to make that file the current segment of the "real"
> > > partition and i've avoided the double write (at the cost of
> complicating
> > > segment management). if the data is replicated fetchers could do the
> > same.
> > >
> >
> > I think we had considered the above as well - i.e., if you abstract the
> > partition's segments into segments that contain non-transactional
> messages
> > and those that contain transactional messages then it should be possible
> to
> > jump from one to the other and back. It does add quite a 

[jira] [Commented] (KAFKA-4540) Suspended tasks that are not assigned to the StreamThread need to be closed before new active and standby tasks are created

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764886#comment-15764886
 ] 

ASF GitHub Bot commented on KAFKA-4540:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2266


> Suspended tasks that are not assigned to the StreamThread need to be closed 
> before new active and standby tasks are created
> ---
>
> Key: KAFKA-4540
> URL: https://issues.apache.org/jira/browse/KAFKA-4540
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> When partition assignment happens we first try and add the active tasks and 
> then add the standby tasks. The problem with this is that a new active task 
> might already be an existing suspended standby task. if this is the case then 
> when the active task initialises it will throw an exception from RocksDB:
> {{Caused by: org.rocksdb.RocksDBException: IO error: lock 
> /tmp/kafka-streams-7071/kafka-music-charts/1_1/rocksdb/all-songs/LOCK: No 
> locks available}}
> We need to make sure we have removed and closed any no-longer-assigned 
> suspended tasks before creating new tasks.
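
A minimal sketch of the ordering fix described above, using hypothetical task
and assignment types; this is not the actual StreamThread code.

{code}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative only: close suspended tasks that are no longer assigned *before*
// creating the newly assigned active/standby tasks, so that their state-store
// locks (e.g. RocksDB's LOCK file) are released first.
public class SuspendedTaskCleanupSketch {
    interface Task { void close(); }

    private final Map<Integer, Task> suspendedTasks = new HashMap<>();

    void onNewAssignment(Set<Integer> newAssignment) {
        Iterator<Map.Entry<Integer, Task>> it = suspendedTasks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Task> entry = it.next();
            if (!newAssignment.contains(entry.getKey())) {
                entry.getValue().close(); // releases the underlying store lock
                it.remove();
            }
        }
        // Only now is it safe to create the new active and standby tasks.
    }
}
{code}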



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4563) State transitions error PARTITIONS_REVOKED to NOT_RUNNING

2016-12-20 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4563:
---

 Summary: State transitions error PARTITIONS_REVOKED to NOT_RUNNING
 Key: KAFKA-4563
 URL: https://issues.apache.org/jira/browse/KAFKA-4563
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
Assignee: Eno Thereska
 Fix For: 0.10.2.0


When starting and stopping streams quickly, the following exception is thrown:

java.lang.IllegalStateException: Incorrect state transition from 
PARTITIONS_REVOKED to NOT_RUNNING
at 
org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:164)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:414)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:366)
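
For context, the exception above comes from a guard that only allows certain
state transitions. Below is a generic sketch of that kind of guard; the
transition table and any state names beyond the two in the stack trace are
assumptions, not the actual StreamThread rules.

{code}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Generic sketch of a state-transition guard like the one throwing above.
// The transition table is illustrative only.
public class StateTransitionSketch {
    enum State { CREATED, RUNNING, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, NOT_RUNNING }

    private static final Map<State, EnumSet<State>> VALID = new EnumMap<>(State.class);
    static {
        VALID.put(State.PARTITIONS_REVOKED, EnumSet.of(State.PARTITIONS_ASSIGNED));
        // ... entries for the other states omitted
    }

    private State state = State.PARTITIONS_REVOKED;

    synchronized void setState(State newState) {
        EnumSet<State> allowed = VALID.getOrDefault(state, EnumSet.noneOf(State.class));
        if (!allowed.contains(newState)) {
            // A quick start/stop can request NOT_RUNNING while the thread is still
            // in PARTITIONS_REVOKED, which a table like this rejects.
            throw new IllegalStateException(
                    "Incorrect state transition from " + state + " to " + newState);
        }
        state = newState;
    }
}
{code}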



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4473) RecordCollector should handle retriable exceptions more strictly

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764822#comment-15764822
 ] 

ASF GitHub Bot commented on KAFKA-4473:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2249


> RecordCollector should handle retriable exceptions more strictly
> 
>
> Key: KAFKA-4473
> URL: https://issues.apache.org/jira/browse/KAFKA-4473
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Thomas Schulz
>Assignee: Damian Guy
>Priority: Critical
>  Labels: architecture
> Fix For: 0.10.2.0
>
>
> see: https://groups.google.com/forum/#!topic/confluent-platform/DT5bk1oCVk8
> There is probably a bug in the RecordCollector as described in my detailed 
> Cluster test published in the aforementioned post.
> The class RecordCollector has the following behavior:
> - if there is no exception, add the message offset to a map
> - otherwise, do not add the message offset and instead log the above statement
> Is it possible that this offset map contains the latest offset to commit? If 
> so, a message that fails might be overridden by a successful (later) message 
> and the consumer commits every message up to the latest offset?
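
A hedged sketch of the stricter handling being asked for here. This is not the
actual RecordCollector; the "remember the first failure and refuse to commit"
policy is one possible reading of the report, and the callback shape simply
mirrors the public producer Callback interface.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.TopicPartition;

// Illustrative only: track offsets on success, remember the first send failure,
// and surface it before offsets are committed, so a later successful record
// cannot silently mask an earlier failed one.
public class StrictRecordCollectorSketch {
    private final Map<TopicPartition, Long> offsets = new HashMap<>();
    private final AtomicReference<Exception> sendException = new AtomicReference<>();

    Callback trackingCallback() {
        return (RecordMetadata metadata, Exception exception) -> {
            if (exception == null) {
                offsets.put(new TopicPartition(metadata.topic(), metadata.partition()), metadata.offset());
            } else {
                sendException.compareAndSet(null, exception); // keep the first failure
            }
        };
    }

    Map<TopicPartition, Long> offsetsToCommit() {
        Exception e = sendException.get();
        if (e != null) {
            throw new RuntimeException("A record could not be sent; refusing to commit offsets", e);
        }
        return offsets;
    }
}
{code}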



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2249: KAFKA-4473: RecordCollector should handle retriabl...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2249


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread Jay Kreps
Cool. It sounds like you guys will sync up and come up with a specific
proposal. I think point (3) does require full replication of the pre-commit
transaction, but I'm not sure, and I would be very happy to learn
otherwise. That was actually the blocker on that alternate proposal. From
my point of view 2x overhead is kind of a deal breaker since it makes
correctness so expensive you'd have to think very hard before turning it
on, but if there is a way to do it with less and there aren't too many
other negative side effects that would be very appealing. I think we can
also dive a bit into why we are so perf and latency sensitive as it relates
to the stream processing use cases...I'm not sure how much of that is
obvious from the proposal.

-Jay





On Tue, Dec 20, 2016 at 9:11 AM, Joel Koshy  wrote:

> Just got some time to go through most of this thread and KIP - great to see
> this materialize and discussed!!
> I will add more comments in the coming days on some of the other "tracks"
> in this thread; but since Radai brought up the double-journaling approach
> that we had discussed I thought I would move over some content from
> our internal
> wiki on double-journalling
>  Double+journaling+with+local+data+copy>
> It is thin on details with a few invalid statements because I don't think
> we dwelt long enough on it - it was cast aside as being too expensive from
> a storage and latency perspective. As the immediately preceding emails
> state, I tend to agree that those are compelling enough reasons to take a
> hit in complexity/increased memory usage in the consumer. Anyway, couple of
> us at LinkedIn can spend some time today brainstorming a little more on
> this today.
>
> 1. on write amplification: i dont see x6 the writes, at worst i see x2 the
> > writes - once to the "tx log", then read and again to the destination
> > partition. if you have some != 1 replication factor than both the 1st and
> > the 2nd writes get replicated, but it is still a relative factor of x2.
> > what am I missing?
> >
>
> I think that's right - it would be six total copies if we are doing RF 3.
>
>
> > 3. why do writes to a TX need the same guarantees as "plain" writes? in
> > cases where the user can live with a TX rollback on change of
> > leadership/broker crash the TX log can be unreplicated, and even live in
> > the leader's memory. that would cut down on writes. this is also an
> > acceptable default in SQL - if your socket connection to a DB dies mid-TX
> > your TX is toast (mysql is even worse)
> >
>
> I may have misunderstood - while the above may be true for transactions
> in-flight, it definitely needs the same guarantees at the point of commit
> and the straightforward way to achieve that is to rely on the same
> guarantees while the transaction is in flight.
>
> 4. even if we replicate the TX log, why do we need to re-read it and
> > re-write it to the underlying partition? if its already written to disk
> all
> > I would need is to make that file the current segment of the "real"
> > partition and i've avoided the double write (at the cost of complicating
> > segment management). if the data is replicated fetchers could do the
> same.
> >
>
> I think we had considered the above as well - i.e., if you abstract the
> partition's segments into segments that contain non-transactional messages
> and those that contain transactional messages then it should be possible to
> jump from one to the other and back. It does add quite a bit of complexity
> though and you still need to do buffering on reads so the upside perhaps
> isn't worth the effort. I'm not convinced about that though - i.e., may
> help to spend more time thinking this one through.
>
>
> > 5. on latency - youre right, what im suggesting would result in tx
> ordering
> > of messages ,"read committed" semantics and therefore higher latency.
>
>
> *"read committed"* only if you do the copy back to actual log. If you don't
> do that (your point 4) then I think you still need to do buffering to
> achieve read-committed semantics.
>
>
>
> > 6. the added delay (vs your read uncommitted) would be roughly the time
> > span of a TX.
>
>
> I think it would be significantly less given that this is local copying.
>
>
>
> >
> >
> > On Mon, Dec 19, 2016 at 3:15 PM, Guozhang Wang 
> wrote:
> >
> > > One more thing about the double journal proposal: when discussing about
> > > this method back at LinkedIn, another raised issue besides double
> writing
> > > was that it will void the offset ordering and enforce people to accept
> > > "transaction ordering", that is, consumer will not see messages from
> the
> > > same partition in the order where they were produced, but only in the
> > order
> > > of when the corresponding transaction was committed. For some
> scenarios,
> > we
> > > believe that offset ordering would still be preferred than transaction
> > > ordering and that 

[jira] [Updated] (KAFKA-4473) RecordCollector should handle retriable exceptions more strictly

2016-12-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4473:
-
   Resolution: Fixed
Fix Version/s: 0.10.2.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2249
[https://github.com/apache/kafka/pull/2249]

> RecordCollector should handle retriable exceptions more strictly
> 
>
> Key: KAFKA-4473
> URL: https://issues.apache.org/jira/browse/KAFKA-4473
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Thomas Schulz
>Assignee: Damian Guy
>Priority: Critical
>  Labels: architecture
> Fix For: 0.10.2.0
>
>
> see: https://groups.google.com/forum/#!topic/confluent-platform/DT5bk1oCVk8
> There is probably a bug in the RecordCollector as described in my detailed 
> Cluster test published in the aforementioned post.
> The class RecordCollector has the following behavior:
> - if there is no exception, add the message offset to a map
> - otherwise, do not add the message offset and instead log the above statement
> Is it possible that this offset map contains the latest offset to commit? If 
> so, a message that fails might be overridden by a successful (later) message 
> and the consumer commits every message up to the latest offset?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-20 Thread Joel Koshy
Just got some time to go through most of this thread and KIP - great to see
this materialize and get discussed!
I will add more comments in the coming days on some of the other "tracks"
in this thread; but since Radai brought up the double-journaling approach
that we had discussed I thought I would move over some content from
our internal
wiki on double-journalling

It is thin on details with a few invalid statements because I don't think
we dwelt long enough on it - it was cast aside as being too expensive from
a storage and latency perspective. As the immediately preceding emails
state, I tend to agree that those are compelling enough reasons to take a
hit in complexity/increased memory usage in the consumer. Anyway, a couple of
us at LinkedIn can spend some time brainstorming a little more on
this today.

1. on write amplification: i dont see x6 the writes, at worst i see x2 the
> writes - once to the "tx log", then read and again to the destination
> partition. if you have some != 1 replication factor than both the 1st and
> the 2nd writes get replicated, but it is still a relative factor of x2.
> what am I missing?
>

I think that's right - it would be six total copies if we are doing RF 3.


> 3. why do writes to a TX need the same guarantees as "plain" writes? in
> cases where the user can live with a TX rollback on change of
> leadership/broker crash the TX log can be unreplicated, and even live in
> the leader's memory. that would cut down on writes. this is also an
> acceptable default in SQL - if your socket connection to a DB dies mid-TX
> your TX is toast (mysql is even worse)
>

I may have misunderstood - while the above may be true for transactions
in-flight, it definitely needs the same guarantees at the point of commit
and the straightforward way to achieve that is to rely on the same
guarantees while the transaction is in flight.

4. even if we replicate the TX log, why do we need to re-read it and
> re-write it to the underlying partition? if its already written to disk all
> I would need is to make that file the current segment of the "real"
> partition and i've avoided the double write (at the cost of complicating
> segment management). if the data is replicated fetchers could do the same.
>

I think we had considered the above as well - i.e., if you abstract the
partition's segments into segments that contain non-transactional messages
and those that contain transactional messages then it should be possible to
jump from one to the other and back. It does add quite a bit of complexity
though and you still need to do buffering on reads so the upside perhaps
isn't worth the effort. I'm not convinced about that though - i.e., may
help to spend more time thinking this one through.


> 5. on latency - youre right, what im suggesting would result in tx ordering
> of messages ,"read committed" semantics and therefore higher latency.


*"read committed"* only if you do the copy back to actual log. If you don't
do that (your point 4) then I think you still need to do buffering to
achieve read-committed semantics.
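
A conceptual sketch of what that consumer-side buffering could look like. The
Record shape and the commit/abort markers here are assumptions made purely for
illustration, not something taken from the KIP as written.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: hold back records of an open transaction and release
// them once a commit marker for that producer is seen; drop them on abort.
public class ReadCommittedBufferSketch {
    static class Record {
        long producerId;
        boolean commitMarker;
        boolean abortMarker;
        String value;
    }

    private final Map<Long, List<Record>> pending = new HashMap<>();

    List<Record> onRecord(Record r) {
        if (r.commitMarker) {
            List<Record> committed = pending.remove(r.producerId);
            return committed == null ? Collections.<Record>emptyList() : committed;
        }
        if (r.abortMarker) {
            pending.remove(r.producerId);   // discard the aborted transaction
            return Collections.emptyList();
        }
        pending.computeIfAbsent(r.producerId, id -> new ArrayList<>()).add(r);
        return Collections.emptyList();     // buffered until commit/abort
    }
}
{code}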



> 6. the added delay (vs your read uncommitted) would be roughly the time
> span of a TX.


I think it would be significantly less given that this is local copying.



>
>
> On Mon, Dec 19, 2016 at 3:15 PM, Guozhang Wang  wrote:
>
> > One more thing about the double journal proposal: when discussing about
> > this method back at LinkedIn, another raised issue besides double writing
> > was that it will void the offset ordering and enforce people to accept
> > "transaction ordering", that is, consumer will not see messages from the
> > same partition in the order where they were produced, but only in the
> order
> > of when the corresponding transaction was committed. For some scenarios,
> we
> > believe that offset ordering would still be preferred than transaction
> > ordering and that is why in KIP-98 proposal we default to the former
> while
> > leave the door open if users want to switch to the latter case.
> >
> >
> > Guozhang
> >
> > On Mon, Dec 19, 2016 at 10:56 AM, Jay Kreps  wrote:
> >
> > > Hey Radai,
> > >
> > > I'm not sure if I fully understand what you are proposing, but I
> > > interpreted it to be similar to a proposal we worked through back at
> > > LinkedIn. The proposal was to commit to a central txlog topic, and then
> > > recopy to the destination topic upon transaction commit. The
> observation
> > on
> > > that approach at the time were the following:
> > >
> > >1. It is cleaner since the output topics have only committed data!
> > >2. You need full replication on the txlog topic to ensure atomicity.
> > We
> > >weren't able to come up with a solution where you buffer in memory
> or
> > > use
> > >renaming tricks the way you are describing. The reason is that once
> > you
> > >begin committing you must ensure 

Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-12-20 Thread Manikumar
Hi Devs,

If there are no more comments, I will start vote on this KIP later this
week.


Thanks

On Fri, Dec 16, 2016 at 12:28 PM, Manikumar 
wrote:

> Hi,
>
>
>> Can you add a sample Jaas configuration using delegation tokens to the
>> KIP?
>>
>
> Will add sample Jaas configuration.
>
>
>> To make sure I have understood correctly, KAFKA-3712 is aimed at enabling
>> a
>> superuser to impersonate another (single) user, say alice. A producer
>> using
>> impersonation will authenticate with superuser credentials. All requests
>> from the producer will be run with the principal alice. But alice is not
>> involved in the authentication and alice's credentials are not actually
>> provided to the broker?
>>
>>
>  Yes, this matches my understanding of the impersonation work. Even in
> this approach
>  we have to handle all the producer-related issues mentioned by you. Yes, this
> is a big change
>  and can be implemented if there is a pressing need. I hope we are all in
> agreement that
>  this can be done in a separate KIP.
>
>
>  I request others give any suggestions/concerns on this KIP.
>
>
> Thanks,
>
>
>
>>
>> On Thu, Dec 15, 2016 at 9:04 AM, Manikumar 
>> wrote:
>>
>> > @Gwen, @Rajini,
>> >
>> > As mentioned in the KIP, main motivation for this KIP is to reduce load
>> on
>> > Kerberos
>> > server on large kafka deployments with large number of clients.
>> >
>> > Also it looks like we are combining two overlapping concepts
>> > 1. Single client sending requests with multiple users/authentications
>> > 2. Impersonation
>> >
>> > Option 1, is definitely useful in some use cases and can be used to
>> > implement workaround for
>> > impersonation
>> >
>> > In Impersonation, a super user can send requests on behalf of another
>> > user(Alice) in a secured way.
>> > superuser has credentials but user Alice doesn't have any. The requests
>> are
>> > required
>> > to run as user Alice and accesses/ACLs on Broker are required to be
>> done as
>> > user Alice.
>> > It is required that user Alice can connect to the Broker on a connection
>> > authenticated with
>> > superuser's credentials. In other words superuser is impersonating the
>> user
>> > Alice.
>> >
>> > The approach mentioned by Harsha in previous mail is implemented in
>> hadoop,
>> > storm etc..
>> >
>> > Some more details here:
>> > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-
>> > dist/hadoop-common/Superusers.html
>> >
>> >
>> > @Rajini
>> >
> > Thanks for your comments on SASL/SCRAM usage. I am thinking of sending the
> > tokenHmac (salted-hashed version)
> > as the password for authentication and the tokenID for retrieval of the
> > tokenHmac at the server side.
> > Does the above sound OK?
>> >
>> >
>> > Thanks,
>> > Manikumar
>> >
>> > On Wed, Dec 14, 2016 at 10:33 PM, Harsha Chintalapani 
>> > wrote:
>> >
>> > > @Gwen @Mani  Not sure why we want to authenticate at every request.
>> Even
>> > if
>> > > the token exchange is cheap it still a few calls that need to go
>> through
>> > > round trip.  Impersonation doesn't require authentication for every
>> > > request.
>> > >
>> > > "So a centralized app can create few producers, do the metadata
>> request
>> > and
>> > > broker discovery with its own user auth, but then use delegation
>> tokens
>> > to
>> > > allow performing produce/fetch requests as different users? Instead of
>> > > having to re-connect for each impersonated user?"
>> > >
>> > > Yes. But what we will have is this centralized user as impersonation
>> user
>> > > on behalf of other users. When it authenticates initially we will
>> create
>> > a
>> > > "Subject" and from there on wards centralized user can do
>> > > Subject.doAsPrivileged
>> > > on behalf, other users.
>> > > On the server side, we can retrieve two principals out of this one is
>> the
>> > > authenticated user (centralized user) and another is impersonated
>> user.
>> > We
>> > > will first check if the authenticated user allowed to impersonate and
>> > then
>> > > move on to check if the user Alice has access to the topic "X" to
>> > > read/write.
>> > >
>> > > @Rajini Intention of this KIP is to support token auth via SASL/SCRAM,
>> > not
>> > > just with TLS.  What you raised is a good point let me take a look and
>> > add
>> > > details.
>> > >
>> > > It will be easier to add impersonation once we reach agreement on this
>> > KIP.
>> > >
>> > >
>> > > On Wed, Dec 14, 2016 at 5:51 AM Ismael Juma 
>> wrote:
>> > >
>> > > > Hi Rajini,
>> > > >
>> > > > I think it would definitely be valuable to have a KIP for
>> > impersonation.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Wed, Dec 14, 2016 at 4:03 AM, Rajini Sivaram <
>> rsiva...@pivotal.io>
>> > > > wrote:
>> > > >
>> > > > > It would clearly be very useful to enable clients to send
>> requests on
>> > > > > behalf of multiple users. A separate KIP makes sense, but it may
>> be
>> > > worth
>> > > > > thinking through some of the 

Re: [VOTE] 0.10.1.1 RC1

2016-12-20 Thread Moczarski, Swen
+1 (non-binding)

Thanks for preparing the release. We installed it on our test system; at first 
glance it looks good.

On 12/15/16, 10:29 PM, "Guozhang Wang" wrote:

Hello Kafka users, developers and client-developers,

This is the second, and hopefully the last candidate for the release of
Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
includes fixes and improvements from 30 JIRAs. See the release notes for
more details:

http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Tuesday, 20 December, 8pm PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

NOTE the artifacts include the ones built from Scala 2.12.1 and Java8,
which are treated as pre-alpha artifacts for the Scala community to try and
test it out:


https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.12/0.10.1.1/

We will formally add the scala 2.12 support in future minor releases.


* Javadoc:
http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/

* Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc1 tag:

https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c3638376708ee6c02dfe4e57747acae0126fa6e7


Thanks,
Guozhang

-- 
-- Guozhang




Jenkins build is back to normal : kafka-trunk-jdk8 #1117

2016-12-20 Thread Apache Jenkins Server
See 



[jira] [Work started] (KAFKA-4561) Ordering of operations in StreamThread.shutdownTasksAndState may void at-least-once guarantees

2016-12-20 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4561 started by Damian Guy.
-
> Ordering of operations in StreamThread.shutdownTasksAndState may void 
> at-least-once guarantees
> --
>
> Key: KAFKA-4561
> URL: https://issues.apache.org/jira/browse/KAFKA-4561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> In {{shutdownTasksAndState}} we currently commit offsets as the first step. 
> If a subsequent step throws an exception, e.g., flushing the producer, then 
> this would violate the at-least-once guarantees.
> We need to commit only after all other state has been flushed.
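
A minimal sketch of the reordering described above; the method names are
illustrative, not the actual StreamThread code.

{code}
// Illustrative only: commit offsets last, so a failure while flushing state or
// the producer cannot leave committed offsets pointing past records that were
// never actually flushed, which would break at-least-once.
public class ShutdownOrderingSketch {
    void shutdownTasksAndState() {
        flushStateStores();   // may throw
        flushProducer();      // may throw
        commitOffsets();      // only reached if all flushes succeeded
        closeTasks();
    }

    void flushStateStores() { /* flush each task's state stores */ }
    void flushProducer()    { /* flush pending produce requests */ }
    void commitOffsets()    { /* commit consumed offsets */ }
    void closeTasks()       { /* close tasks and release resources */ }
}
{code}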



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4554) ReplicaBuffer.verifyChecksum should use iterators instead of iterables

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4554:
---
Description: 
This regressed in 
https://github.com/apache/kafka/commit/b58b6a1bef0ecdc2107a415e222af099fcd9bce3 
causing the following system test failure:

{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/

  was:
{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/


> ReplicaBuffer.verifyChecksum should use iterators instead of iterables
> --
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
> Fix For: 0.10.2.0
>
>
> This regressed in 
> https://github.com/apache/kafka/commit/b58b6a1bef0ecdc2107a415e222af099fcd9bce3
>  causing the following system test failure:
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in 

[jira] [Updated] (KAFKA-4554) ReplicaBuffer.verifyChecksum should use iterators instead of iterables

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4554:
---
Summary: ReplicaBuffer.verifyChecksum should use iterators instead of 
iterables  (was: ReplicaVerificationToolTest.test_replica_lags system test 
failure)

> ReplicaBuffer.verifyChecksum should use iterators instead of iterables
> --
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
> Fix For: 0.10.2.0
>
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4554:
---
Fix Version/s: 0.10.2.0

> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
> Fix For: 0.10.2.0
>
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4554:
---
Reviewer: Jason Gustafson
  Status: Patch Available  (was: Reopened)

> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #1116

2016-12-20 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Remove incomplete gradle wrapper infrastructure

--
[...truncated 17461 lines...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldReturnFalseOnCloseWhenThreadsHaventTerminated PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] FAILED
java.lang.AssertionError: Condition not met within timeout 3. waiting 
for store count-by-key
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:283)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable(QueryableStateIntegrationTest.java:502)

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testNoMessagesSentExceptionFromOverlappingPatterns STARTED

[GitHub] kafka pull request #2222: KAFKA-4500: code improvements

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4500) Kafka Code Improvements

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764121#comment-15764121
 ] 

ASF GitHub Bot commented on KAFKA-4500:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/


> Kafka Code Improvements
> ---
>
> Key: KAFKA-4500
> URL: https://issues.apache.org/jira/browse/KAFKA-4500
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.2
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Code Corrections on clients module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4500) Kafka Code Improvements

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4500.

   Resolution: Fixed
Fix Version/s: 0.10.2.0

Issue resolved by pull request 
[https://github.com/apache/kafka/pull/]

> Kafka Code Improvements
> ---
>
> Key: KAFKA-4500
> URL: https://issues.apache.org/jira/browse/KAFKA-4500
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.2
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Code Corrections on clients module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764099#comment-15764099
 ] 

ASF GitHub Bot commented on KAFKA-4554:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2280

KAFKA-4554: Fix `ReplicaBuffer.verifyChecksum` to use iterators instead of 
iterables

This was changed in b58b6a1bef0 and caused the 
`ReplicaVerificationToolTest.test_replica_lags`
system test to start failing.

I also added a unit test and a couple of other minor clean-ups.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4554-fix-replica-buffer-verify-checksum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2280


commit f85d8c46eb22b88f26b8f6219f8287c8544f9cf9
Author: Ismael Juma 
Date:   2016-12-20T12:26:25Z

Remove `JTestUtils.partitionRecordsBuffer` in favour of 
`MemoryRecords.withRecords`

commit 185e9048d282ba6e68ba55594aa13568d0f2d035
Author: Ismael Juma 
Date:   2016-12-20T12:27:33Z

Fix regression in `verifyChecksum` where iterators had incorrectly been 
replaced by iterables

Also add unit test.

commit 3ddfad07bbf064ff90a3856c7a8233e4e20d8222
Author: Ismael Juma 
Date:   2016-12-20T12:28:27Z

Use `long` instead of `Long` in `KafkaStreamsTest.onChange`

Fixes a compilation failure when compiling in IntelliJ with Java 8.
Not sure why, but `long` is better than `Long` in this case anyway.




> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2280: KAFKA-4554: Fix `ReplicaBuffer.verifyChecksum` to ...

2016-12-20 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2280

KAFKA-4554: Fix `ReplicaBuffer.verifyChecksum` to use iterators instead of 
iterables

This was changed in b58b6a1bef0 and caused the 
`ReplicaVerificationToolTest.test_replica_lags`
system test to start failing.

I also added a unit test and a couple of other minor clean-ups.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4554-fix-replica-buffer-verify-checksum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2280


commit f85d8c46eb22b88f26b8f6219f8287c8544f9cf9
Author: Ismael Juma 
Date:   2016-12-20T12:26:25Z

Remove `JTestUtils.partitionRecordsBuffer` in favour of 
`MemoryRecords.withRecords`

commit 185e9048d282ba6e68ba55594aa13568d0f2d035
Author: Ismael Juma 
Date:   2016-12-20T12:27:33Z

Fix regression in `verifyChecksum` where iterators had incorrectly been 
replaced by iterables

Also add unit test.

commit 3ddfad07bbf064ff90a3856c7a8233e4e20d8222
Author: Ismael Juma 
Date:   2016-12-20T12:28:27Z

Use `long` instead of `Long` in `KafkaStreamsTest.onChange`

Fixes a compilation failure when compiling in IntelliJ with Java 8.
Not sure why, but `long` is better than `Long` in this case anyway.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3808.

Resolution: Fixed
  Assignee: (was: Ismael Juma)

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763960#comment-15763960
 ] 

Ismael Juma edited comment on KAFKA-3808 at 12/20/16 11:17 AM:
---

The new failures are being tracked in KAFKA-4554 as it's a new issue. I am 
going to close this one for now as we haven't seen the transient failure since 
October. If it happens again, we can reopen.


was (Author: ijuma):
The new failures are being tracked in KAFKA-4285 as it's a new issue. I am 
going to close this one for now as we haven't seen the transient failure since 
October. If it happens again, we can reopen.

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763960#comment-15763960
 ] 

Ismael Juma commented on KAFKA-3808:


The new failures are being tracked in KAFKA-4285 as it's a new issue. I am 
going to close this one for now as we haven't seen the transient failure since 
October. If it happens again, we can reopen.

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>Assignee: Ismael Juma
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763957#comment-15763957
 ] 

Ismael Juma commented on KAFKA-4554:


I reopened this because it's a new issue and not the same as KAFKA-3808.

> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-4554:

  Assignee: Ismael Juma

> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>Reporter: Roger Hoover
>Assignee: Ismael Juma
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1722: remove incomplete gradle wrapper infrastructure

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1722


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4562) deadlock heartbeat, metadata-manager, request-handler

2016-12-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763903#comment-15763903
 ] 

Ismael Juma commented on KAFKA-4562:


Thanks for the report. Is this the same as KAFKA-3994/KAFKA-4478? If so, 
0.10.1.1 will contain the fix.

> deadlock heartbeat, metadata-manager, request-handler
> -
>
> Key: KAFKA-4562
> URL: https://issues.apache.org/jira/browse/KAFKA-4562
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: rhel7, java 1.8.0_77, vmware, two broker setup
>Reporter: Hans Kowallik
>
> Found one Java-level deadlock:
> =
> "executor-Heartbeat":
>   waiting to lock monitor 0x7f8c9c954378 (object 0x0006cd17dd18, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> "group-metadata-manager-0":
>   waiting to lock monitor 0x7f8d60002bc8 (object 0x0006d9e386e8, a 
> java.util.LinkedList),
>   which is held by "kafka-request-handler-1"
> "kafka-request-handler-1":
>   waiting to lock monitor 0x7f8c9c954378 (object 0x0006cd17dd18, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> When this happens, RAM usage, network connections, and thread count increase 
> linearly.
> The controller can't talk to the local broker:
> [2016-12-19 16:22:44,639] INFO 
> [Controller-614897-to-broker-614897-send-thread], Controller 614897 connected 
> to kafka-dev-614897.lhotse.ov.otto.de:9092 (id: 614897 rack: null) for sending
>  state change requests (kafka.controller.RequestSendThread)
> The replication thread can't talk to the remote broker:
> [2016-12-19 16:22:42,014] WARN [ReplicaFetcherThread-0-614897], Error in 
> fetch kafka.server.ReplicaFetcherThread$FetchRequest@6cae17f6 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 614897 was disconnected before the 
> response was read
> No failover happens until the machine runs out of swap space or Kafka is 
> restarted manually.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4562) deadlock heartbeat, metadata-manager, request-handler

2016-12-20 Thread Hans Kowallik (JIRA)
Hans Kowallik created KAFKA-4562:


 Summary: deadlock heartbeat, metadata-manager, request-handler
 Key: KAFKA-4562
 URL: https://issues.apache.org/jira/browse/KAFKA-4562
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.1.0
 Environment: rhel7, java 1.8.0_77, vmware, two broker setup
Reporter: Hans Kowallik



Found one Java-level deadlock:
=
"executor-Heartbeat":
  waiting to lock monitor 0x7f8c9c954378 (object 0x0006cd17dd18, a 
kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"
"group-metadata-manager-0":
  waiting to lock monitor 0x7f8d60002bc8 (object 0x0006d9e386e8, a 
java.util.LinkedList),
  which is held by "kafka-request-handler-1"
"kafka-request-handler-1":
  waiting to lock monitor 0x7f8c9c954378 (object 0x0006cd17dd18, a 
kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"

When this happens, RAM usage, network connections, and thread count increase linearly.

The controller can't talk to the local broker:
[2016-12-19 16:22:44,639] INFO 
[Controller-614897-to-broker-614897-send-thread], Controller 614897 connected 
to kafka-dev-614897.lhotse.ov.otto.de:9092 (id: 614897 rack: null) for sending
 state change requests (kafka.controller.RequestSendThread)

The replication thread can't talk to the remote broker:
[2016-12-19 16:22:42,014] WARN [ReplicaFetcherThread-0-614897], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@6cae17f6 
(kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 614897 was disconnected before the response 
was read

No failover happens until the machine runs out of swap space or Kafka is 
restarted manually.
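
The dump describes a circular wait: the request handler holds the 
java.util.LinkedList monitor and wants the GroupMetadata monitor, while the 
metadata manager holds GroupMetadata and wants the list monitor (the heartbeat 
executor is a third thread queued on the same GroupMetadata lock). A minimal 
sketch of that lock-ordering cycle, using plain Python locks as stand-ins for 
the JVM monitors rather than the actual Kafka code:

{code}
# Two threads acquiring the same two locks in opposite order, mirroring the
# cycle in the jstack output above. The names echo the thread dump, but this
# is only an illustration, not Kafka code. Running it blocks forever.
import threading
import time

group_metadata_lock = threading.Lock()  # stand-in for the GroupMetadata monitor
delayed_ops_lock = threading.Lock()     # stand-in for the LinkedList monitor


def request_handler():
    # "kafka-request-handler-1": takes the list lock first, then GroupMetadata.
    with delayed_ops_lock:
        time.sleep(0.1)
        with group_metadata_lock:
            pass


def metadata_manager():
    # "group-metadata-manager-0": takes GroupMetadata first, then the list lock.
    with group_metadata_lock:
        time.sleep(0.1)
        with delayed_ops_lock:
            pass


for target in (request_handler, metadata_manager):
    threading.Thread(target=target).start()
# Each thread now holds the lock the other needs, which is exactly what the
# JVM reports as "Found one Java-level deadlock".
{code}

The usual remedy for this pattern is to make every code path acquire the locks 
in a consistent order, or to avoid nested acquisition entirely.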



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4561) Ordering of operations in StreamThread.shutdownTasksAndState may void at-least-once guarantees

2016-12-20 Thread Damian Guy (JIRA)
Damian Guy created KAFKA-4561:
-

 Summary: Ordering of operations in 
StreamThread.shutdownTasksAndState may void at-least-once guarantees
 Key: KAFKA-4561
 URL: https://issues.apache.org/jira/browse/KAFKA-4561
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Damian Guy
Assignee: Damian Guy
 Fix For: 0.10.2.0


In {{shutdownTasksAndState}} we currently commit offsets as the first step. If 
a subsequent step then throws an exception, e.g. flushing the producer, this 
would violate the at-least-once guarantees.
We need to commit only after all other state has been flushed.
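
A hedged sketch of the ordering argued for here, with hypothetical task, 
producer, and consumer objects standing in for the real StreamThread 
internals: flush everything first, and commit offsets only once every flush 
has succeeded, so a failure leaves the offsets uncommitted and the records are 
reprocessed.

{code}
# Illustrative shutdown ordering for at-least-once delivery. The objects and
# method names are hypothetical stand-ins, not the actual Kafka Streams API.
def shutdown_tasks_and_state(tasks, producer, consumer):
    # 1. Flush state stores and the producer first. If any of these raise,
    #    no offsets have been committed yet, so the records will be
    #    re-consumed after restart.
    for task in tasks:
        task.flush_state()
    producer.flush()

    # 2. Commit offsets only after every flush has succeeded. Committing
    #    before flushing (the current ordering) can mark records as done
    #    even though their results were never durably produced.
    for task in tasks:
        consumer.commit(task.checkpointable_offsets())
{code}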



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)