Jenkins build is back to normal : kafka-2.1-jdk8 #146

2019-03-08 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : kafka-2.0-jdk8 #238

2019-03-08 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-8083) Flaky Test DelegationTokenRequestsTest#testDelegationTokenRequests

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8083:
--

 Summary: Flaky Test 
DelegationTokenRequestsTest#testDelegationTokenRequests
 Key: KAFKA-8083
 URL: https://issues.apache.org/jira/browse/KAFKA-8083
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/61/testReport/junit/kafka.server/DelegationTokenRequestsTest/testDelegationTokenRequests/]
{quote}java.lang.AssertionError: Partition [__consumer_offsets,0] metadata not propagated after 15000 ms
at kafka.utils.TestUtils$.fail(TestUtils.scala:381)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791)
at kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:880)
at kafka.utils.TestUtils$.$anonfun$createTopic$3(TestUtils.scala:318)
at kafka.utils.TestUtils$.$anonfun$createTopic$3$adapted(TestUtils.scala:317)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at kafka.utils.TestUtils$.createTopic(TestUtils.scala:317)
at kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:375)
at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:95)
at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
at kafka.server.DelegationTokenRequestsTest.setUp(DelegationTokenRequestsTest.scala:46){quote}
STDOUT
{quote}[2019-03-09 04:01:31,789] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka1872564121337557452.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011)
[2019-03-09 04:01:31,789] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74)
[2019-03-09 04:01:31,793] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka1872564121337557452.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011)
[2019-03-09 04:01:31,794] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8082) Flaky Test ProducerFailureHandlingTest#testNotEnoughReplicasAfterBrokerShutdown

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8082:
--

 Summary: Flaky Test 
ProducerFailureHandlingTest#testNotEnoughReplicasAfterBrokerShutdown
 Key: KAFKA-8082
 URL: https://issues.apache.org/jira/browse/KAFKA-8082
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/61/testReport/junit/kafka.api/ProducerFailureHandlingTest/testNotEnoughReplicasAfterBrokerShutdown/]
{quote}java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotEnoughReplicasAfterAppendException: Messages are written to the log, but to fewer in-sync replicas than required.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:98)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:67)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
at kafka.api.ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown(ProducerFailureHandlingTest.scala:270){quote}
STDOUT
{quote}[2019-03-09 03:59:24,897] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 03:59:28,028] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 03:59:42,046] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition minisrtest-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 03:59:42,245] ERROR [ReplicaManager broker=1] Error processing append operation on partition minisrtest-0 (kafka.server.ReplicaManager:76) org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(1, 0) is insufficient to satisfy the min.isr requirement of 3 for partition minisrtest-0
[2019-03-09 04:00:01,212] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:02,214] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:03,216] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:23,144] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:24,146] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:25,148] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2019-03-09 04:00:44,607] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition minisrtest2-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.{quote}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2019-03-08 Thread Boyang Chen
Hi Mike,

Yes that's the plan!


From: Mike Freyberger 
Sent: Saturday, March 9, 2019 10:04 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
specifying member id

Hi Boyang,

Is this work targeted for Kafka 2.3? I am eager to use this new feature.

Thanks,

Mike Freyberger

On 12/21/18, 1:21 PM, "Mayuresh Gharat"  wrote:

Hi Boyang,

Regarding "However, we shall still attempt to remove the member static info
if the given `member.id` points to an existing `group.instance.id` upon
LeaveGroupRequest, because I could think of the possibility that in long
term we could want to add static membership leave group logic for more
fine-grained use cases."

> I think, there is some confusion here. I am probably not putting it
> right.
>
I agree, if a static member sends a LeaveGroupRequest, it should be removed
> from the group.
>
Now getting back to downgrade of static membership to Dynamic membership,
> with the example described earlier  (copying it again for ease of reading)
> :
>

>>1. Let's say we have 4 consumers: c1, c2, c3, c4 in the static group.
>>2. The group.instance.id for each of these is as follows:
>>   - c1 -> gc1, c2 -> gc2, c3 -> gc3, c4 -> gc4
>>3. The mapping on the GroupCoordinator would be:
>>   - gc1 -> mc1, gc2 -> mc2, gc3 -> mc3, gc4 -> mc4, where mc1, mc2,
>>   mc3, mc4 are the randomly generated memberIds for c1, c2, c3, c4
>>   respectively, by the GroupCoordinator.
>>4. Now we do a restart to move the group to dynamic membership.
>>5. We bounce c1 first and it rejoins with UNKNOWN_MEMBERID (since we
>>don't persist the previously assigned memberId mc1 anywhere on the 
c1).
>>
> - We agree that there is no way to recognize that c1 was a part of the
> group, *earlier*.  If yes, the statement : "The dynamic member rejoins
> the group without `group.instance.id`. It will be accepted since it is a
> known member." is not necessarily true, right?
>


> - Now I *agree* with "However, we shall still attempt to remove the
> member static info if the given `member.id` points to an existing `
> group.instance.id` upon LeaveGroupRequest, because I could think of the
> possibility that in long term we could want to add static membership leave
> group logic for more fine-grained use cases."
>
But that would only happen if the GroupCoordinator allocates the same
> member.id (mc1) to the consumer c1, when it rejoins the group in step 5
> above as a dynamic member, which is very rare as it is randomly generated,
> but possible.
>


> - This raises another question, if the GroupCoordinator assigns a
> member.id (mc1~) to consumer c1 after step 5. It will join the group and
> rebalance and the group will become stable, eventually. Now the
> GroupCoordinator still maintains a mapping of  "group.instance.id ->
> member.id" (c1 -> gc1, c2 -> gc2, c3 -> gc3, c4 -> gc4) internally and
> after some time, it realizes that it has not received heartbeat from the
> consumer with "group.instance.id" = gc1. In that case, it will trigger
> another rebalance assuming that a static member has left the group (when
> actually it (c1) has not left the group but moved to dynamic membership).
> This can result in multiple rebalances as the same will happen for c2, c3,
> c4.
>

Thoughts ???
One thing, I can think of right now is to run :
removeMemberFromGroup(String groupId, list
groupInstanceIdsToRemove, RemoveMemberFromGroupOptions options)
with groupInstanceIdsToRemove =  once we have bounced
all the members in the group. This assumes that we will be able to complete
the bounces before the GroupCoordinator realizes that it has not received a
heartbeat for any of . This is tricky and error prone.
Will have to think more on this.
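
For illustration, a minimal sketch of how that cleanup call could look from an
operator's tooling. The removeMemberFromGroup signature quoted above is still
under discussion, so the GroupAdmin facade and the options class below are
assumptions, not the final Admin API:

import java.util.Arrays;
import java.util.List;

public class StaticMemberCleanup {

    // Hypothetical admin facade mirroring the call discussed above;
    // this is NOT the final Kafka AdminClient API.
    interface GroupAdmin {
        void removeMemberFromGroup(String groupId,
                                   List<String> groupInstanceIdsToRemove,
                                   RemoveMemberFromGroupOptions options);
    }

    // Hypothetical options holder, assumed to carry e.g. a timeout.
    static class RemoveMemberFromGroupOptions {
        final int timeoutMs;
        RemoveMemberFromGroupOptions(int timeoutMs) { this.timeoutMs = timeoutMs; }
    }

    // After all consumers c1..c4 have been bounced without group.instance.id,
    // explicitly drop their stale static registrations (gc1..gc4) so that the
    // coordinator does not trigger extra rebalances when those sessions expire.
    static void cleanupAfterDowngrade(GroupAdmin admin) {
        List<String> staleInstanceIds = Arrays.asList("gc1", "gc2", "gc3", "gc4");
        admin.removeMemberFromGroup("my-group", staleInstanceIds,
                new RemoveMemberFromGroupOptions(30_000));
    }
}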

Thanks,

Mayuresh




Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-08 Thread Matthias J. Sax
SGTM.

I also had the impression that those duplicates are rather an error than
a case of eventual consistency. Using hashing to avoid sending the
payload is a good idea IMHO.

@Adam: can you update the KIP accordingly?

 - add the optimization to not send a reply from RHS to LHS on
unsubscribe (if not a tombstone)
 - explain why using offsets to avoid duplicates does not work
 - add hashing to avoid duplicates
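
For the hashing point, a minimal sketch of the idea being discussed: the LHS
sends a hash of its serialized value instead of the full payload, and a stale
response from the RHS is dropped when the hash no longer matches. The MD5
choice and the method names are illustrative, not the actual Streams code:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ForeignKeyJoinHashing {

    // Hash of the serialized LHS value; sent to the RHS instead of the full payload.
    static byte[] valueHash(byte[] serializedLeftValue) {
        try {
            return MessageDigest.getInstance("MD5").digest(serializedLeftValue);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Back on the LHS: only emit the join result if the hash echoed by the RHS
    // still matches the current LHS value. If the value changed in the meantime
    // (v1 -> v2), the stale v1-based response is dropped, so downstream sees
    // (k1, (f-k1, v2-v3)) exactly once instead of a seeming duplicate.
    static boolean shouldEmit(byte[] echoedHash, byte[] currentLeftValueBytes) {
        return Arrays.equals(echoedHash, valueHash(currentLeftValueBytes));
    }
}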

Beside this, I don't have any further comments. Excited to finally get
this in!

Let us know when you have updated the KIP so we can move forward with
the VOTE. Thanks a lot for your patience! This was a very long shot!


-Matthias

On 3/8/19 8:47 AM, John Roesler wrote:
> Hi all,
> 
> This proposal sounds good to me, especially since we observe that people
> are already confused when they see duplicate results coming out of 1:1 joins
> (which is a bug). I take this as "evidence" that we're better off
> eliminating those duplicates from the start. Guozhang's proposal seems like
> a lightweight solution to the problem, so FWIW, I'm in favor.
> 
> Thanks,
> -John
> 
> On Fri, Mar 8, 2019 at 7:59 AM Adam Bellemare 
> wrote:
> 
>> Hi Guozhang
>>
>> That would certainly work for eliminating those duplicate values. As it
>> stands right now, this would be consistent with swallowing changes due to
>> out-of-order processing with multiple threads, and seems like a very
>> reasonable way forward. Thank you for the suggestion!
>>
>> I have been trying to think if there are any other scenarios where we can
>> end up with duplicates, though I have not been able to identify any others
>> at the moment. I will think on it a bit more, but if anyone else has any
>> ideas, please chime in.
>>
>> Thanks,
>> Adam
>>
>>
>>
>> On Thu, Mar 7, 2019 at 8:19 PM Guozhang Wang  wrote:
>>
>>> One more thought regarding *c-P2: Duplicates)*: first I want to separate
>>> this issue with the more general issue that today (not only foreign-key,
>>> but also co-partition primary-key) table-table joins is still not
>> strictly
>>> respecting the timestamp ordering since the two changelog streams may be
>>> fetched and hence processed out-of-order and we do not allow a record to
>> be
>>> joined with the other table at any given time snapshot yet. So ideally
>> when
>>> there are two changelog records (k1, (f-k1, v1)), (k1, (f-k1, v2)) coming
>>> at the left hand table and one record (f-k1, v3) at the right hand table,
>>> depending on the processing ordering we may get:
>>>
>>> (k1, (f-k1, v2-v3))
>>>
>>> or
>>>
>>> (k1, (f-k1, v1-v3))
>>> (k1, (f-k1, v2-v3))
>>>
>>> And this is not to be addressed by this KIP.
>>>
>>> What I would advocate is to fix the issue that is introduced in this KIP
>>> alone, that is we may have
>>>
>>> (k1, (f-k1, v2-v3))   // this should actually be v1-v3
>>> (k1, (f-k1, v2-v3))
>>>
>>> I admit that it does not have a correctness issue from the semantics alone,
>>> comparing it with "discarding the first result", but it may be confusing
>>> to users who do not expect to see the seeming duplicates.
>>> On the other hand, I think there's a light solution to avoid it, which is
>>> that we can still optimize away to not send the full payload of "v1" from
>>> left hand side to right hand side, but instead of just trimming off the
>>> whole bytes, we can send, e.g., an MD5 hash of the bytes (I'm using MD5
>>> here just as an example, we can definitely replace it with other
>>> functions), by doing which we can discard the join operation if the hash
>>> value sent back from the right hand side does not match with the left
>> hand
>>> side any more, i.e. we will only send:
>>>
>>> (k1, (f-k1, v2-v3))
>>>
>>> to down streams once.
>>>
>>> WDYT?
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Mar 6, 2019 at 7:58 AM Adam Bellemare 
>>> wrote:
>>>
 Ah yes, I recall it all now. That answers that question as to why I had
 caching disabled. I can certainly re-enable it since I believe the main
 concern was simply about reconciling those two iterators. A lack of
 knowledge there on my part.


 Thank you John for weighing in - we certainly both do appreciate it. I
 think that John hits it on the head though with his comment of "If it
>>> turns
 out we're wrong about this, then it should be possible to fix the
>>> semantics
 in place, without messing with the API."

 If anyone else would like to weigh in, your thoughts would be greatly
 appreciated.

 Thanks

 On Tue, Mar 5, 2019 at 6:05 PM Matthias J. Sax 
 wrote:

>>> I don't know how to range scan over a caching store; probably one had
>>> to open 2 iterators and merge them.
>
> That happens automatically. If you query a cached KTable, it ranges
>>> over
> the cache and the underlying RocksDB and performs the merging under
>> the
> hood.
>
>>> Other than that, I still think even the regular join is broken with
>>> caching enabled, right?

[jira] [Created] (KAFKA-8081) Flaky Test TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitions

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8081:
--

 Summary: Flaky Test 
TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitions
 Key: KAFKA-8081
 URL: https://issues.apache.org/jira/browse/KAFKA-8081
 Project: Kafka
  Issue Type: Bug
  Components: admin, unit tests
Affects Versions: 2.3.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3450/tests]
{quote}java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at 
kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderMinIsrPartitions(TopicCommandWithAdminClientTest.scala:560){quote}
STDOUT
{quote}[2019-03-09 00:13:23,129] ERROR [ReplicaFetcher replicaId=2, leaderId=0, 
fetcherId=0] Error for partition testAlterPartitionCount-yrX0KHVdgf-1 at offset 
0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:13:23,130] ERROR [ReplicaFetcher replicaId=3, leaderId=5, 
fetcherId=0] Error for partition testAlterPartitionCount-yrX0KHVdgf-0 at offset 
0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:13:33,184] ERROR [ReplicaFetcher replicaId=4, leaderId=1, 
fetcherId=0] Error for partition testAlterPartitionCount-yrX0KHVdgf-2 at offset 
0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:13:36,319] WARN Unable to read additional data from client 
sessionid 0x10433c934dc0006, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:13:46,311] WARN Unable to read additional data from client 
sessionid 0x10433c98cb10003, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:13:46,356] WARN Unable to read additional data from client 
sessionid 0x10433c98cb10004, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:06,066] WARN Unable to read additional data from client 
sessionid 0x10433c9b17d0005, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:06,457] WARN Unable to read additional data from client 
sessionid 0x10433c9b17d0001, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:11,206] ERROR [ReplicaFetcher replicaId=2, leaderId=3, 
fetcherId=0] Error for partition kafka.testTopic1-1 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:14:11,218] ERROR [ReplicaFetcher replicaId=4, leaderId=1, 
fetcherId=0] Error for partition __consumer_offsets-1 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:14:22,096] WARN Unable to read additional data from client 
sessionid 0x10433c9f1210004, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:28,290] WARN Unable to read additional data from client 
sessionid 0x10433ca28de0005, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:28,733] WARN Unable to read additional data from client 
sessionid 0x10433ca28de0006, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:29,529] WARN Unable to read additional data from client 
sessionid 0x10433ca28de, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:31,841] WARN Unable to read additional data from client 
sessionid 0x10433ca39ed0002, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376)
[2019-03-09 00:14:40,221] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=0] Error for partition 
testCreateAlterTopicWithRackAware-IQe98agDrW-16 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:14:40,222] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=0] Error for partition 
testCreateAlterTopicWithRackAware-IQe98agDrW-10 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-09 00:14:40,227] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
fetcherId=0] Error for pa

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2019-03-08 Thread Mike Freyberger
Hi Boyang,

Is this work targeted for Kafka 2.3? I am eager to use this new feature.

Thanks,

Mike Freyberger

On 12/21/18, 1:21 PM, "Mayuresh Gharat"  wrote:

Hi Boyang,

Regarding "However, we shall still attempt to remove the member static info
if the given `member.id` points to an existing `group.instance.id` upon
LeaveGroupRequest, because I could think of the possibility that in long
term we could want to add static membership leave group logic for more
fine-grained use cases."

> I think, there is some confusion here. I am probably not putting it
> right.
>
I agree, if a static member sends a LeaveGroupRequest, it should be removed
> from the group.
>
Now getting back to downgrade of static membership to Dynamic membership,
> with the example described earlier  (copying it again for ease of reading)
> :
>

>>1. Let's say we have 4 consumers: c1, c2, c3, c4 in the static group.
>>2. The group.instance.id for each of these is as follows:
>>   - c1 -> gc1, c2 -> gc2, c3 -> gc3, c4 -> gc4
>>3. The mapping on the GroupCoordinator would be:
>>   - gc1 -> mc1, gc2 -> mc2, gc3 -> mc3, gc4 -> mc4, where mc1, mc2,
>>   mc3, mc4 are the randomly generated memberIds for c1, c2, c3, c4
>>   respectively, by the GroupCoordinator.
>>4. Now we do a restart to move the group to dynamic membership.
>>5. We bounce c1 first and it rejoins with UNKNOWN_MEMBERID (since we
>>don't persist the previously assigned memberId mc1 anywhere on the 
c1).
>>
> - We agree that there is no way to recognize that c1 was a part of the
> group, *earlier*.  If yes, the statement : "The dynamic member rejoins
> the group without `group.instance.id`. It will be accepted since it is a
> known member." is not necessarily true, right?
>


> - Now I *agree* with "However, we shall still attempt to remove the
> member static info if the given `member.id` points to an existing `
> group.instance.id` upon LeaveGroupRequest, because I could think of the
> possibility that in long term we could want to add static membership leave
> group logic for more fine-grained use cases."
>
But that would only happen if the GroupCoordinator allocates the same
> member.id (mc1) to the consumer c1, when it rejoins the group in step 5
> above as a dynamic member, which is very rare as it is randomly generated,
> but possible.
>


> - This raises another question, if the GroupCoordinator assigns a
> member.id (mc1~) to consumer c1 after step 5. It will join the group and
> rebalance and the group will become stable, eventually. Now the
> GroupCoordinator still maintains a mapping of  "group.instance.id ->
> member.id" (c1 -> gc1, c2 -> gc2, c3 -> gc3, c4 -> gc4) internally and
> after some time, it realizes that it has not received heartbeat from the
> consumer with "group.instance.id" = gc1. In that case, it will trigger
> another rebalance assuming that a static member has left the group (when
> actually it (c1) has not left the group but moved to dynamic membership).
> This can result in multiple rebalances as the same will happen for c2, c3,
> c4.
>

Thoughts ???
One thing, I can think of right now is to run :
removeMemberFromGroup(String groupId, list
groupInstanceIdsToRemove, RemoveMemberFromGroupOptions options)
with groupInstanceIdsToRemove =  once we have bounced
all the members in the group. This assumes that we will be able to complete
the bounces before the GroupCoordinator realizes that it has not received a
heartbeat for any of . This is tricky and error prone.
Will have to think more on this.

Thanks,

Mayuresh




Build failed in Jenkins: kafka-trunk-jdk8 #3450

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[bbejeck] MINOR: cleanup deprectaion annotations (#6290)

--
[...truncated 2.33 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardInit 
STA

Build failed in Jenkins: kafka-2.2-jdk8 #52

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] KAFKA-7980 - Fix timing issue in

[bill] KAFKA-8040: Streams handle initTransactions timeout (#6372)

[rajinisivaram] KAFKA-8070: Increase consumer startup timeout in system tests 
(#6405)

[jason] KAFKA-8069; Fix early expiration of offsets due to invalid loading of

--
[...truncated 2.73 MB...]
kafka.controller.PartitionStateMachineTest > 
testNonexistentPartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > testUpdatingOfflinePartitionsCount 
STARTED

kafka.controller.PartitionStateMachineTest > testUpdatingOfflinePartitionsCount 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testUpdatingOfflinePartitionsCountDuringTopicDeletion STARTED

kafka.controller.PartitionStateMachineTest > 
testUpdatingOfflinePartitionsCountDuringTopicDeletion PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
PASSED

kafka.controller.PartitionStateMachineTest > 
testNoOfflinePartitionsChangeForTopicsBeingDeleted STARTED

kafka.controller.PartitionStateMachineTest > 
testNoOfflinePartitionsChangeForTopicsBeingDeleted PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition PASSED

kafka.controller.ControllerFailoverTest > testHandleIllegalState

[jira] [Created] (KAFKA-8080) Remove streams_eos_test system test

2019-03-08 Thread John Roesler (JIRA)
John Roesler created KAFKA-8080:
---

 Summary: Remove streams_eos_test system test
 Key: KAFKA-8080
 URL: https://issues.apache.org/jira/browse/KAFKA-8080
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


After KAFKA-7944 / [https://github.com/apache/kafka/pull/6382] , the system 
test streams_eos_test.py is mostly redundant.

Quoting my analysis from 
[https://github.com/apache/kafka/pull/6382#discussion_r263536548,] 
{quote}Ok, so the smoke test and the eos test are similar, but not identical.

The smoke test application has more features in it, though. So, we have more 
feature coverage under eos when we test with the smoke test.

The eos test evaluates two topologies: one with no repartition, and one with a 
repartition. The smoke test topology contains repartitions, so it only tests 
_with_ repartition. I think that it should be sufficient to test only _with_ 
repartition.

The eos test verification specifically checks that "all transactions finished" 
({{org.apache.kafka.streams.tests.EosTestDriver#verifyAllTransactionFinished}}).
 I'm not clear on exactly what we're looking for here. It looks like we create 
a transactional producer and send a record to each partition and then expect to 
get all those records back, without seeing any other records. But I'm not sure 
why we're doing this. If we want to drop the eos test, then we might need to 
add this to the smoke test verification.
{quote}
And [~guozhang]'s reply:
{quote}{{verifyAllTransactionFinished}} aimed to avoid a situation where some 
dangling txn stays open forever, without committing or aborting. Because the 
consumer needs to guarantee offset ordering when returning data, with 
read-committed they will also be blocked on those open txn's data forever (this 
usually indicates a broker-side issue, not streams though, but still).

I think we should still retain this check if we want to merge in the EosTest to 
SmokeTest.
{quote}
 

As described above, the task is simply to add a similar check to the end of the 
verification logic in the SmokeTestDriver. Then, we can remove the EOS system 
test, as well as all the Java code that supports it.
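
A rough, self-contained sketch of that kind of check, following the description
above: commit one marker record per partition transactionally, then read with
read_committed and expect to reach every marker, since a dangling open
transaction would stall the consumer. Topic setup, marker keys, and timeouts are
illustrative; the real EosTestDriver logic is more involved:

import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.*;

public class TransactionFinishedCheck {

    // Returns true only if a read_committed consumer can reach a freshly
    // committed marker on every partition within the timeout, i.e. no
    // partition is stuck behind an open transaction.
    static boolean allTransactionsFinished(String bootstrapServers,
                                           List<TopicPartition> partitions,
                                           long timeoutMs) {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "verify-" + UUID.randomUUID());
        try (KafkaProducer<String, String> producer =
                 new KafkaProducer<>(pp, new StringSerializer(), new StringSerializer())) {
            producer.initTransactions();
            producer.beginTransaction();
            for (TopicPartition tp : partitions)   // one marker per partition
                producer.send(new ProducerRecord<>(tp.topic(), tp.partition(), "marker", "marker"));
            producer.commitTransaction();
        }

        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "verify-" + UUID.randomUUID());
        cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        cp.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(cp, new StringDeserializer(), new StringDeserializer())) {
            consumer.assign(partitions);
            Set<TopicPartition> pending = new HashSet<>(partitions);
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!pending.isEmpty() && System.currentTimeMillis() < deadline) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500)))
                    if ("marker".equals(record.key()))
                        pending.remove(new TopicPartition(record.topic(), record.partition()));
            }
            return pending.isEmpty();
        }
    }
}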

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Using Kafka Internals from Within Plugin

2019-03-08 Thread Gwen Shapira
Since you have the Kafka configuration, you can open your own connection to
ZK.
You also have the advertised listeners from the same file, if you want to
connect back to the Kafka cluster to check things.
I'd use that if possible for your use case; accessing the log files
directly seems a bit risky to me.
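
As a sketch of that direction, a callback handler could build its own client
from the configs it is handed instead of touching broker internals. Whether the
broker's "listeners" value is present in the map passed to configure(), and in
what format, is an assumption to verify against the Kafka version in use:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import javax.security.auth.callback.Callback;
import javax.security.auth.login.AppConfigurationEntry;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.security.auth.AuthenticateCallbackHandler;

public class SideChannelCallbackHandler implements AuthenticateCallbackHandler {

    private AdminClient admin;

    @Override
    public void configure(Map<String, ?> configs, String saslMechanism,
                          List<AppConfigurationEntry> jaasConfigEntries) {
        // Assumption: the config map exposes the broker's "listeners" value,
        // e.g. "PLAINTEXT://host:9092" (single listener, simplified).
        Object listeners = configs.get("listeners");
        if (listeners != null) {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                      listeners.toString().replaceAll("^[A-Z_]+://", ""));
            admin = AdminClient.create(props);   // side channel back into the cluster
        }
    }

    @Override
    public void handle(Callback[] callbacks) {
        // Authentication decisions could consult `admin` (or a ZooKeeper client
        // opened from zookeeper.connect) here instead of reading log files directly.
    }

    @Override
    public void close() {
        if (admin != null) admin.close();
    }
}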


I'd love to hear what your authentication workflow looks like where you
need to actually read data from disk.

On Fri, Mar 8, 2019 at 1:20 PM Christopher Vollick
 wrote:

> Hello! I'm experimenting with an implementation of
> AuthenticateCallbackHandler in an external JAR I’m loading, and I'd like to
> use some of the methods / properties from ReplicaManager (or KafkaServer
> which has a ReplicaManager), but I don't see anything that's passed to me
> or any singletons that will give me access to those objects from my class.
>
> I figure that’s probably intentional, but I wanted to ask just in case I’m
> missing a hook I don’t know is there.
> Specifically, I was looking to either get access to the logs on disk
> (ReplicaManager’s fetchMessages method) so I could read some things, or
> alternatively the ZooKeeper connection.
>
> Since I’m in a JAR, any solution which involves changing the core Kafka
> code-base isn’t something I’m interested in.
> Thanks!



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap



Using Kafka Internals from Within Plugin

2019-03-08 Thread Christopher Vollick
Hello! I'm experimenting with an implementation of AuthenticateCallbackHandler 
in an external JAR I’m loading, and I'd like to use some of the methods / 
properties from ReplicaManager (or KafkaServer which has a ReplicaManager), but 
I don't see anything that's passed to me or any singletons that will give me 
access to those objects from my class.

I figure that’s probably intentional, but I wanted to ask just in case I’m 
missing a hook I don’t know is there.
Specifically, I was looking to either get access to the logs on disk 
(ReplicaManager’s fetchMessages method) so I could read some things, or 
alternatively the ZooKeeper connection.

Since I’m in a JAR, any solution which involves changing the core Kafka 
code-base isn’t something I’m interested in.
Thanks!

[jira] [Reopened] (KAFKA-7288) Transient failure in SslSelectorTest.testCloseConnectionInClosingState

2019-03-08 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reopened KAFKA-7288:
---

The check added to fix this issue was wrong; submitted a new PR to fix it properly.

> Transient failure in SslSelectorTest.testCloseConnectionInClosingState
> --
>
> Key: KAFKA-7288
> URL: https://issues.apache.org/jira/browse/KAFKA-7288
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 2.3.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.2.0
>
>
> Noticed this failure in SslSelectorTest.testCloseConnectionInClosingState a 
> few times in unit tests in Jenkins:
> {quote}
> java.lang.AssertionError: Channel not expired expected null, but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotNull(Assert.java:755)
> at org.junit.Assert.assertNull(Assert.java:737)
> at org.apache.kafka.common.network.SelectorTest.testCloseConnectionInClosingState(SelectorTest.java:341)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Speeding up integration tests

2019-03-08 Thread Ron Dagostino
It's a classic problem: you can't string N things together serially and
expect high reliability.  5,000 tests in a row isn't going to give you a
bunch of 9's.  It feels to me that the test frameworks themselves should
support a more robust model -- like a way to tag a test as "retry me up to
N times before you really consider me a failure" or something like that.
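
As a sketch of what such a tag could look like with JUnit 4 (Kafka does not
ship this; the rule name and logging are illustrative):

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class RetryRule implements TestRule {

    private final int maxAttempts;

    public RetryRule(int maxAttempts) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Throwable last = null;
                for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                    try {
                        base.evaluate();   // run the actual test body
                        return;            // passed on this attempt
                    } catch (Throwable t) {
                        last = t;
                        System.err.println(description + " failed attempt " + attempt);
                    }
                }
                throw last;                // retries exhausted: surface the real failure
            }
        };
    }
}

A test class would opt in with `@Rule public RetryRule retry = new RetryRule(3);`,
which keeps the retry budget visible next to the test instead of hiding it in CI.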

Ron

On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski 
wrote:

> > We have had an internal improvement for half a year now which reruns the
> flaky test classes at the end of the test gradle task, lets you know that
> they were rerun and probably flaky. It fails the build only if the second
> run of the test class was also unsuccessful. I think it works pretty good,
> we mostly have green builds. If there is interest, I can try to contribute
> that.
>
> That does sound very intriguing. Does it rerun the test classes that failed
> or some known, marked classes? If it is the former, I can see a lot of
> value in having that automated in our PR builds. I wonder what others think
> of this
>
> On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> wrote:
>
> > Hey All,
> >
> > Thanks for the loads of ideas.
> >
> > @Stanislav, @Sonke
> > I probably left it out of my email but I really imagined this as a
> > case-by-case change. If we think that it wouldn't cause problems,
> > then it might be applied. That way we'd limit the blast radius somewhat.
> > The 1 hour gain is really just the most optimistic scenario, I'm almost
> > sure that not every test could be transformed to use a common cluster.
> > We have had an internal improvement for half a year now which reruns the
> > flaky test classes at the end of the test gradle task, lets you know that
> > they were rerun and probably flaky. It fails the build only if the second
> > run of the test class was also unsuccessful. I think it works pretty
> good,
> > we mostly have green builds. If there is interest, I can try to
> contribute
> > that.
> >
> > >I am also extremely annoyed at times by the amount of coffee I have to
> > drink before tests finish
> > Just please don't get a heart attack :)
> >
> > @Ron, @Colin
> > You bring up a very good point that it is easier and frees up more
> > resources if we just run change-specific tests and it's good to know
> that a
> > similar solution (meaning using a shared resource for testing) has
> failed
> > elsewhere. I second Ron on the test categorization though, although as a
> > first attempt I think using a flaky retry + running only the necessary
> > tests would help in both time saving and effectiveness. Also it would be
> > easier to achieve.
> >
> > @Ismael
> > Yea, it'd be interesting to profile the startup/shutdown, I've never done
> > that. Perhaps I'll set some time apart for that :). It's definitely true
> > though that if we see a significant delay there we wouldn't just improve
> > the efficiency of the tests but also customer experience.
> >
> > Best,
> > Viktor
> >
> >
> >
> > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma  wrote:
> >
> > > It's an idea that has come up before and worth exploring eventually.
> > > However, I'd first try to optimize the server startup/shutdown process.
> > If
> > > we measure where the time is going, maybe some opportunities will
> present
> > > themselves.
> > >
> > > Ismael
> > >
> > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I've been observing lately that unit tests usually take 2.5 hours to
> > run
> > > > and a very big portion of these are the core tests where a new
> cluster
> > is
> > > > spun up for every test. This takes most of the time. I ran a test
> > > > (TopicCommandWithAdminClient with 38 tests inside) through the
> profiler
> > > and
> > > > it shows for instance that running the whole class itself took 10
> > minutes
> > > > and 37 seconds where the useful time was 5 minutes 18 seconds.
> That's a
> > > > 100% overhead. Without profiler the whole class takes 7 minutes and
> 48
> > > > seconds, so the useful time would be between 3-4 minutes. This is a
> > > bigger
> > > > test though, most of them won't take this much.
> > > > There are 74 classes that implement KafkaServerTestHarness and just
> > > running
> > > > :core:integrationTest takes almost 2 hours.
> > > >
> > > > I think we could greatly speed up these integration tests by just
> > > creating
> > > > the cluster once per class and performing the tests in separate
> methods. I
> > > > know that this a little bit contradicts to the principle that tests
> > > should
> > > > be independent but it seems like recreating clusters for each is a
> very
> > > > expensive operation. Also if the tests are acting on different
> > resources
> > > > (different topics, etc.) then it might not hurt their independence.
> > There
> > > > might be cases of course where this is not possible but I think there
> > > could

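A minimal sketch of the per-class shared cluster idea from the (truncated)
message above, in JUnit 4 style; EmbeddedCluster is a hypothetical stand-in for
the real harness (e.g. KafkaServerTestHarness), included only to keep the
sketch self-contained:

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SharedClusterTest {

    // Hypothetical stand-in for the real cluster harness; defined here only
    // so the sketch compiles on its own.
    static class EmbeddedCluster {
        EmbeddedCluster(int numBrokers) { }
        void start() { }
        void stop() { }
        void createTopic(String name, int partitions, int replicationFactor) { }
    }

    private static EmbeddedCluster cluster;

    @BeforeClass
    public static void startCluster() {
        cluster = new EmbeddedCluster(3);   // pay the broker startup cost once per class
        cluster.start();
    }

    @AfterClass
    public static void stopCluster() {
        cluster.stop();
    }

    @Test
    public void testOnTopicA() {
        cluster.createTopic("topic-a", 1, 3);   // each test acts on its own topic
        // ... exercise topic-a only, so tests stay independent of each other
    }

    @Test
    public void testOnTopicB() {
        cluster.createTopic("topic-b", 1, 3);
        // ... exercise topic-b only
    }
}
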
Jenkins build is back to normal : kafka-trunk-jdk8 #3448

2019-03-08 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-8079) Flaky Test EpochDrivenReplicationProtocolAcceptanceTest#shouldSurviveFastLeaderChange

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8079:
--

 Summary: Flaky Test 
EpochDrivenReplicationProtocolAcceptanceTest#shouldSurviveFastLeaderChange
 Key: KAFKA-8079
 URL: https://issues.apache.org/jira/browse/KAFKA-8079
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.3.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3445/tests]
{quote}java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at 
kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest.$anonfun$shouldSurviveFastLeaderChange$2(EpochDrivenReplicationProtocolAcceptanceTest.scala:294)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at 
kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest.shouldSurviveFastLeaderChange(EpochDrivenReplicationProtocolAcceptanceTest.scala:273){quote}
STDOUT
{quote}[2019-03-08 01:16:02,452] ERROR [ReplicaFetcher replicaId=101, 
leaderId=100, fetcherId=0] Error for partition topic1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-08 01:16:23,677] ERROR [ReplicaFetcher replicaId=101, leaderId=100, 
fetcherId=0] Error for partition topic1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-08 01:16:35,779] ERROR [Controller id=100] Error completing preferred 
replica leader election for partition topic1-0 
(kafka.controller.KafkaController:76)
kafka.common.StateChangeFailedException: Failed to elect leader for partition 
topic1-0 under strategy PreferredReplicaPartitionLeaderElectionStrategy
at 
kafka.controller.PartitionStateMachine.$anonfun$doElectLeaderForPartitions$9(PartitionStateMachine.scala:390)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:388)
at 
kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:315)
at 
kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:225)
at 
kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:141)
at 
kafka.controller.KafkaController.kafka$controller$KafkaController$$onPreferredReplicaElection(KafkaController.scala:649)
at 
kafka.controller.KafkaController.$anonfun$checkAndTriggerAutoLeaderRebalance$6(KafkaController.scala:1008)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
at 
kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance(KafkaController.scala:989)
at 
kafka.controller.KafkaController$AutoPreferredReplicaLeaderElection$.process(KafkaController.scala:1020)
at 
kafka.controller.ControllerEventManager$ControllerEventThread.$anonfun$doWork$1(ControllerEventManager.scala:95)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at 
kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:95)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
Dumping /tmp/kafka-2158669830092629415/topic1-0/.log
Starting offset: 0
baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false 
isControl: false position: 0 CreateTime: 1552007783877 size: 141 magic: 2 
compresscodec: SNAPPY crc: 2264724941 isvalid: true
baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false 
isControl: false position: 141 CreateTime: 1552007784731 size: 141 magic: 2 
compresscodec: SNAPPY crc: 14988968 isvalid: true
baseOffset: 2 lastOffset: 2 count: 1 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false 
isControl: false position: 282 CreateTime: 1552007784734 size: 141 magic: 2 
compresscodec: SNAPPY crc: 460124078 isvalid: true
baseOffset: 3 lastOffset: 3 count: 1 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false 
isControl: false position: 423 CreateTime: 1552007784737 size: 141 magic: 2 
compresscodec: SNAPPY crc: 2692183172 isvalid: true
baseOffset: 4 lastOffset: 4 count: 1 baseSeq

[jira] [Created] (KAFKA-8078) Flaky Test TableTableJoinIntegrationTest#testInnerInner

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8078:
--

 Summary: Flaky Test TableTableJoinIntegrationTest#testInnerInner
 Key: KAFKA-8078
 URL: https://issues.apache.org/jira/browse/KAFKA-8078
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Affects Versions: 2.3.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3445/tests]
{quote}java.lang.AssertionError: Condition not met within timeout 15000. Never 
received expected final result.
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
at 
org.apache.kafka.streams.integration.AbstractJoinIntegrationTest.runTest(AbstractJoinIntegrationTest.java:246)
at 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerInner(TableTableJoinIntegrationTest.java:196){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8077) Flaky Test AdminClientIntegrationTest#testConsumeAfterDeleteRecords

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8077:
--

 Summary: Flaky Test 
AdminClientIntegrationTest#testConsumeAfterDeleteRecords
 Key: KAFKA-8077
 URL: https://issues.apache.org/jira/browse/KAFKA-8077
 Project: Kafka
  Issue Type: Bug
  Components: admin, unit tests
Affects Versions: 2.0.1
Reporter: Matthias J. Sax
 Fix For: 2.0.2, 2.3.0, 2.1.2, 2.2.1


[https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/237/tests]
{quote}java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at 
kafka.api.AdminClientIntegrationTest$$anonfun$sendRecords$1.apply(AdminClientIntegrationTest.scala:994)
at 
kafka.api.AdminClientIntegrationTest$$anonfun$sendRecords$1.apply(AdminClientIntegrationTest.scala:994)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
kafka.api.AdminClientIntegrationTest.sendRecords(AdminClientIntegrationTest.scala:994)
at 
kafka.api.AdminClientIntegrationTest.testConsumeAfterDeleteRecords(AdminClientIntegrationTest.scala:909)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
This server does not host this topic-partition.{quote}
STDERR
{quote}Exception in thread "Thread-1638" 
org.apache.kafka.common.errors.InterruptException: 
java.lang.InterruptedException
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeThrowInterruptException(ConsumerNetworkClient.java:504)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:287)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1247)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1187)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at 
kafka.api.AdminClientIntegrationTest$$anon$1.run(AdminClientIntegrationTest.scala:1132)
Caused by: java.lang.InterruptedException
... 7 more{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

2019-03-08 Thread Robert Yokota
Thanks for the great KIP Konstantine!

+1 (non-binding)

Robert

On Thu, Mar 7, 2019 at 2:56 PM Guozhang Wang  wrote:

> Thanks Konstantine, I've read the updated section on
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> and it lgtm.
>
> I'm +1 on the KIP.
>
>
> Guozhang
>
>
> On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks Guozhang. This is a valid observation regarding the current status
> > of the PR.
> >
> > I updated the KIP to explicitly call out how the downgrade process should
> > work in the section Compatibility, Deprecation, and Migration.
> >
> > Additionally, I reduced the configuration modes for the connect.protocol
> to
> > only two: eager and compatible.
> > That's because there's no way at the moment to select a protocol based on
> > simple majority and not unanimity across at least one option for the
> > sub-protocol.
> > Therefore there's no way to lock a group of workers in a cooperative-only
> > mode at the moment, if we account for accidental joins of workers running
> > at an older version.
> >
> > The changes have been reflected in the KIP doc and will be reflected in
> the
> > PR in a subsequent commit.
> >
> > Thanks,
> > Konstantine
> >
> >
> > On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang  wrote:
> >
> > > Hi Konstantine,
> > >
> > > Thanks for the updated KIP and the PR as well (which is huge :) I
> briefly
> > > looked through it as well as the KIP, and I have one minor comment to
> add
> > > (otherwise I'm binding +1 on it as well) about the backward
> > compatibility.
> > > I'll use one example to illustrate the issue:
> > >
> > > 1) Suppose you have workerA and B on newer version and configured the
> > > connect.protocol as "compatible", they will send both V0/V1 to the
> leader
> > > (say it's workerA) who will choose V1 as the current protocol, this
> will
> > be
> > > sent back to A and B who would remember the current protocol version is
> > > already V1. So after this rebalance everyone remembers that V1 can be
> > used,
> > > which means that upon prepareJoin they will not revoke all the assigned
> > > tasks.
> > >
> > > 2) Now let's say a new worker joins but with old version V0
> (practically
> > > this is rare, but for illustration purposes some common scenarios may
> > fall
> > > into this, e.g. an existing worker being downgraded, which is
> essentially
> > > as being kicked out of the group, and then rejoined as a new member on
> > the
> > > older version), the leader realizes that at least one of the members
> does
> > > not know V1 and hence would fall back to use version V0 to perform
> > > assignment. V0 algorithm would do eager rebalance which may move some
> > tasks
> > > to the new comer immediately from the existing members, as it assumes
> > that
> > > everyone would revoke everything before join (a.k.a the sync-barrier)
> but
> > > this is actually not true, since everyone other than the old versioned
> > new
> > > comer would still follow the behavior of V1 --- not revoking anything
> ---
> > > before sending the join group request.
> > >
> > > This could be solvable though, e.g. when the leader realizes that it needs
> to
> > > use V0, while the previous "currentProtocol" value is V1, instead of
> just
> > > blindly following the algorithm of V0 it could just reassign the existing
> > > partitions without migrating anything, while at the same time telling
> > everyone
> > > that the currentProtocol version is downgraded to V0; and then they can
> > > trigger another rebalance based on V0 where everyone will revoke the
> > > tasks before sending join group requests.
> > >
> > >
> > > Guozhang
> > >
> > > On Wed, Mar 6, 2019 at 2:28 PM Konstantine Karantasis <
> > > konstant...@confluent.io> wrote:
> > >
> > > > I'd like to open the vote on KIP-415: Incremental Cooperative
> > Rebalancing
> > > > in Kafka Connect
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > >
> > > > a proposal that will allow Kafka Connect to scale significantly the
> > > number
> > > > of connectors and tasks it can run in a cluster of Connect workers.
> > > >
> > > > Thanks,
> > > > Konstantine
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>
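
A rough sketch of the downgrade handling Guozhang describes above, under the assumption that the leader picks the highest protocol version every member supports. None of the names below (Member, keepExistingAssignments, scheduleFollowUpRebalance, and so on) come from the actual Connect assignor; they are stand-ins for illustration only.

{code:java}
import java.util.List;

// Illustrative only: these names do not come from the actual Connect assignor.
public class DowngradeSketch {

    static final short EAGER_V0 = 0;
    static final short COOPERATIVE_V1 = 1;

    static class Member {
        final String id;
        final short maxSupportedVersion;

        Member(String id, short maxSupportedVersion) {
            this.id = id;
            this.maxSupportedVersion = maxSupportedVersion;
        }
    }

    void assign(List<Member> members, short previousVersion) {
        // Unanimity: the group can only run a version every member supports.
        short selected = COOPERATIVE_V1;
        for (Member m : members) {
            if (m.maxSupportedVersion < selected) {
                selected = m.maxSupportedVersion;
            }
        }

        if (selected < previousVersion) {
            // Downgrade (e.g. V1 -> V0): the V1 members did not revoke anything
            // before this join, so do not migrate tasks in this round. Keep the
            // existing assignment, announce the downgraded version, and let a
            // follow-up rebalance run with full eager (V0) semantics.
            keepExistingAssignments(members);
            announceVersion(selected);
            scheduleFollowUpRebalance();
        } else if (selected == COOPERATIVE_V1) {
            incrementalCooperativeAssign(members);
        } else {
            eagerAssign(members);
        }
    }

    // Placeholders standing in for the real assignment machinery.
    void keepExistingAssignments(List<Member> members) { }
    void announceVersion(short version) { }
    void scheduleFollowUpRebalance() { }
    void incrementalCooperativeAssign(List<Member> members) { }
    void eagerAssign(List<Member> members) { }
}
{code}

The point of the sketch is that a downgrade round keeps the existing assignment untouched, announces the downgraded version, and defers the eager reassignment to a follow-up rebalance, since the V1 members have not revoked anything before joining.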


Re: [VOTE] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-03-08 Thread Kevin Lu
Hi All,

Thanks for voting!

I have moved the KIP to ACCEPTED. Stay tuned for the PR.

Regards,
Kevin

On Fri, Mar 8, 2019 at 9:43 AM Dongjin Lee  wrote:

> With 3 binding and 3 non-binding +1 votes, this proposal has now passed.
>
> Thanks for the KIP, Kevin!
>
> On Sat, 9 Mar 2019 at 2:40 AM Harsha  wrote:
>
> >
> >  +1 (binding)
> >
> > -Harsha
> >
> > On Thu, Mar 7, 2019, at 6:48 PM, hacker win7 wrote:
> > > +1 (non-binding)
> > >
> > > > On Mar 8, 2019, at 02:32, Stanislav Kozlovski <
> stanis...@confluent.io>
> > wrote:
> > > >
> > > > Thanks for the KIP, Kevin! This change will be a good improvement to
> > > > Kafka's observability story
> > > >
> > > > +1 (non-binding)
> > > >
> > > > On Thu, Mar 7, 2019 at 4:49 AM Vahid Hashemian <
> > vahid.hashem...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks for the KIP Kevin.
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >> --Vahid
> > > >>
> > > >> On Wed, Mar 6, 2019 at 8:39 PM Dongjin Lee 
> > wrote:
> > > >>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> On Wed, Mar 6, 2019, 3:14 AM Dong Lin  wrote:
> > > >>>
> > >  Hey Kevin,
> > > 
> > >  Thanks for the KIP!
> > > 
> > >  +1 (binding)
> > > 
> > >  Thanks,
> > >  Dong
> > > 
> > >  On Tue, Mar 5, 2019 at 9:38 AM Kevin Lu 
> > wrote:
> > > 
> > > > Hi All,
> > > >
> > > > I would like to start the vote thread for KIP-427: Add AtMinIsr
> > topic
> > > > partition category (new metric & TopicCommand option).
> > > >
> > > >
> > > 
> > > >>>
> > > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
> > > >
> > > > Thanks!
> > > >
> > > > Regards,
> > > > Kevin
> > > >
> > > 
> > > >>>
> > > >>
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > >
> > >
> >
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck:
> speakerdeck.com/dongjin
> *
>


Build failed in Jenkins: kafka-2.1-jdk8 #145

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8069; Fix early expiration of offsets due to invalid loading of

--
[...truncated 91.03 KB...]

  ^
  required: Serializer
  found:Serializer
  where K is a type-variable:
K extends Object declared in class OptimizableRepartitionNode
:79:
 warning: [unchecked] unchecked conversion
final Deserializer keyDeserializer = keySerde != null ? 
keySerde.deserializer() : null;

^
  required: Deserializer
  found:Deserializer
  where K is a type-variable:
K extends Object declared in class OptimizableRepartitionNode
:80:
 warning: [unchecked] unchecked cast
topologyBuilder.addGlobalStore((StoreBuilder) 
storeBuilder,
 ^
  required: StoreBuilder
  found:StoreBuilder
  where S is a type-variable:
S extends StateStore declared in class TableSourceNode
:133:
 warning: [unchecked] unchecked conversion
this.consumedInternal = consumedInternal;
^
  required: ConsumedInternal
  found:ConsumedInternal
  where K,V are type-variables:
K extends Object declared in class TableSourceNodeBuilder
V extends Object declared in class TableSourceNodeBuilder
:61:
 warning: [unchecked] unchecked method invocation: constructor  in class 
WindowedStreamPartitioner is applied to given types
final StreamPartitioner windowedPartitioner = 
(StreamPartitioner) new WindowedStreamPartitioner((WindowedSerializer) keySerializer);

  ^
  required: WindowedSerializer
  found: WindowedSerializer
  where K is a type-variable:
K extends Object declared in class WindowedStreamPartitioner
:61:
 warning: [unchecked] unchecked conversion
final StreamPartitioner windowedPartitioner = 
(StreamPartitioner) new WindowedStreamPartitioner((WindowedSerializer) keySerializer);

   ^
  required: WindowedSerializer
  found:WindowedSerializer
  where K is a type-variable:
K extends Object declared in class WindowedStreamPartitioner
:61:
 warning: [unchecked] unchecked cast
final StreamPartitioner windowedPartitioner = 
(StreamPartitioner) new WindowedStreamPartitioner((WindowedSerializer) keySerializer);

  ^
  required: StreamPartitioner
  found:WindowedStreamPartitioner
  where V,K are type-variables:
V extends Object declared in class StreamSinkNode
K extends Object declared in class StreamSinkNode
:88:
 warning: [unchecked] unchecked method invocation: constructor  in class 
KeyValueStoreMaterializer is applied to given types
= new 
KeyValueStoreMaterializer<>(materializedInternal).materialize();
  ^
  required: MaterializedInternal>
  found: MaterializedInternal
  where K,V are type-variables:
K extends Object declared in class KeyValueStoreMaterializer
V extends Object declared in class KeyValueStoreMaterializer
:88:
 warning: [unchecked] unchecked conversion
= new 
KeyValueStoreMaterializer<>(materializedInternal).materialize();
  ^
  required: MaterializedInternal>
  found:MaterializedInternal
  where K,V are type-variables:
K extends Object declared in class KeyValueStoreMaterializer

Re: [VOTE] 2.2.0 RC1 [CANCELED]

2019-03-08 Thread Matthias J. Sax
A blocker was found: https://issues.apache.org/jira/browse/KAFKA-8069

I am canceling this VOTE in favor of a new RC.


-Matthias

On 3/7/19 10:13 AM, Jonathan Santilli wrote:
> Hello Matthias, I have run the tests without issue.
> Also, I have performed the quick start successfully, thanks for the release.
> 
> 
> Cheers!
> --
> Jonathan
> 
> 
> 
> On Wed, Mar 6, 2019 at 8:58 PM Matthias J. Sax 
> wrote:
> 
>> Thanks for testing the RC Jakub!
>>
>> I agree, that both issue are no reason to cancel the current RC. I'll
>> make sure that the doc fixes go into 2.2 webpage when we do the release
>> though.
>>
>> Thanks for the patches!
>>
>>
>> -Matthias
>>
>>
>> On 3/6/19 11:58 AM, Jakub Scholz wrote:
>>> Thanks for driving the release Matthias. I noticed two very minor issues
>>> while testing the RC1 and opened PRs against master:
>>> * https://github.com/apache/kafka/pull/6379
>>> * https://github.com/apache/kafka/pull/6381
>>>
>>> I assume it might not be worth including these into 2.2 since it is
>> nothing
>>> serious. Otherwise the RC1 worked fine for me.
>>>
>>> Jakub
>>>
>>>
>>> On Fri, Mar 1, 2019 at 8:48 PM Matthias J. Sax 
>>> wrote:
>>>
 Hello Kafka users, developers and client-developers,

 This is the second candidate for release of Apache Kafka 2.2.0.

 - Added SSL support for custom principal name
  - Allow SASL connections to periodically re-authenticate
  - Improved consumer group management
- default group.id is `null` instead of empty string
  - Add --under-min-isr option to describe topics command
  - API improvement
- Producer: introduce close(Duration)
- AdminClient: introduce close(Duration)
- Kafka Streams: new flatTransform() operator in Streams DSL
- KafkaStreams (and other classes) now implement AutoCloseable to
 support try-with-resources
- New Serdes and default method implementations
  - Kafka Streams exposed internal client.id via ThreadMetadata
  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
 output `NaN` as default value

 Release notes for the 2.2.0 release:
 https://home.apache.org/~mjsax/kafka-2.2.0-rc1/RELEASE_NOTES.html

 *** Please download, test and vote by Thursday, March 7, 9am PST

 Kafka's KEYS file containing PGP keys we use to sign the release:
 https://kafka.apache.org/KEYS

 * Release artifacts to be voted upon (source and binary):
 https://home.apache.org/~mjsax/kafka-2.2.0-rc1/

 * Maven artifacts to be voted upon:
 https://repository.apache.org/content/groups/staging/org/apache/kafka/

 * Javadoc:
 https://home.apache.org/~mjsax/kafka-2.2.0-rc1/javadoc/

 * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
 https://github.com/apache/kafka/releases/tag/2.2.0-rc1

 * Documentation:
 https://kafka.apache.org/22/documentation.html

 * Protocol:
 https://kafka.apache.org/22/protocol.html

 * Jenkins builds for the 2.2 branch:
 Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/

 * System tests for the 2.2 branch:
 https://jenkins.confluent.io/job/system-test-kafka/job/2.2/


 /**

 Thanks,


 -Matthias


>>>
>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [VOTE] KIP-436 Add a metric indicating start time

2019-03-08 Thread Dongjin Lee
With 3 binding and 3 non-binding +1 votes, this proposal has now passed.

On Sat, 9 Mar 2019 at 2:37 AM Harsha  wrote:

> +1 (binding)
>
> Thanks,
> Harsha
>
> On Fri, Mar 8, 2019, at 2:55 AM, Dongjin Lee wrote:
> > +1 (non binding)
> >
> > 2 bindings, 3 non-bindings until now. (Colin, Manikumar / Satish,
> Mickael,
> > Dongjin)
> >
> > On Fri, Mar 8, 2019 at 7:44 PM Mickael Maison 
> > wrote:
> >
> > > +1 (non binding)
> > > Thanks
> > >
> > > On Fri, Mar 8, 2019 at 6:39 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks for the KIP,
> > > > +1 (non-binding)
> > > >
> > > > ~Satish.
> > > >
> > > > On Thu, Mar 7, 2019 at 11:58 PM Manikumar  >
> > > wrote:
> > > >
> > > > > +1 (binding).
> > > > >
> > > > > Thanks for the KIP.
> > > > >
> > > > > Thanks,
> > > > > Manikumar
> > > > >
> > > > >
> > > > > On Thu, Mar 7, 2019 at 11:52 PM Colin McCabe 
> > > wrote:
> > > > >
> > > > > > +1 (binding).
> > > > > >
> > > > > > Thanks, Stanislav.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > On Tue, Mar 5, 2019, at 05:23, Stanislav Kozlovski wrote:
> > > > > > > Hey everybody,
> > > > > > >
> > > > > > > I'd like to start a vote thread about the lightweight KIP-436
> > > > > > > KIP: KIP-436
> > > > > > > <
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-436%3A+Add+a+metric+indicating+start+time
> > > > > > >
> > > > > > > JIRA: KAFKA-7992 <
> https://issues.apache.org/jira/browse/KAFKA-7992
> > > >
> > > > > > > Pull Request: 6318 
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  github.com/dongjinleekr
> > linkedin:
> kr.linkedin.com/in/dongjinleekr
> > speakerdeck:
> speakerdeck.com/dongjin
> > *
> >
>
-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin
*


Re: [VOTE] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-03-08 Thread Dongjin Lee
With 3 binding and 3 non-binding +1 votes, this proposal has now passed.

Thanks for the KIP, Kevin!

On Sat, 9 Mar 2019 at 2:40 AM Harsha  wrote:

>
>  +1 (binding)
>
> -Harsha
>
> On Thu, Mar 7, 2019, at 6:48 PM, hacker win7 wrote:
> > +1 (non-binding)
> >
> > > On Mar 8, 2019, at 02:32, Stanislav Kozlovski 
> wrote:
> > >
> > > Thanks for the KIP, Kevin! This change will be a good improvement to
> > > Kafka's observability story
> > >
> > > +1 (non-binding)
> > >
> > > On Thu, Mar 7, 2019 at 4:49 AM Vahid Hashemian <
> vahid.hashem...@gmail.com>
> > > wrote:
> > >
> > >> Thanks for the KIP Kevin.
> > >>
> > >> +1 (binding)
> > >>
> > >> --Vahid
> > >>
> > >> On Wed, Mar 6, 2019 at 8:39 PM Dongjin Lee 
> wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> On Wed, Mar 6, 2019, 3:14 AM Dong Lin  wrote:
> > >>>
> >  Hey Kevin,
> > 
> >  Thanks for the KIP!
> > 
> >  +1 (binding)
> > 
> >  Thanks,
> >  Dong
> > 
> >  On Tue, Mar 5, 2019 at 9:38 AM Kevin Lu 
> wrote:
> > 
> > > Hi All,
> > >
> > > I would like to start the vote thread for KIP-427: Add AtMinIsr
> topic
> > > partition category (new metric & TopicCommand option).
> > >
> > >
> > 
> > >>>
> > >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
> > >
> > > Thanks!
> > >
> > > Regards,
> > > Kevin
> > >
> > 
> > >>>
> > >>
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> >
> >
>
-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin
*


Re: [VOTE] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-03-08 Thread Harsha


 +1 (binding)

-Harsha

On Thu, Mar 7, 2019, at 6:48 PM, hacker win7 wrote:
> +1 (non-binding)
> 
> > On Mar 8, 2019, at 02:32, Stanislav Kozlovski  
> > wrote:
> > 
> > Thanks for the KIP, Kevin! This change will be a good improvement to
> > Kafka's observability story
> > 
> > +1 (non-binding)
> > 
> > On Thu, Mar 7, 2019 at 4:49 AM Vahid Hashemian 
> > wrote:
> > 
> >> Thanks for the KIP Kevin.
> >> 
> >> +1 (binding)
> >> 
> >> --Vahid
> >> 
> >> On Wed, Mar 6, 2019 at 8:39 PM Dongjin Lee  wrote:
> >> 
> >>> +1 (non-binding)
> >>> 
> >>> On Wed, Mar 6, 2019, 3:14 AM Dong Lin  wrote:
> >>> 
>  Hey Kevin,
>  
>  Thanks for the KIP!
>  
>  +1 (binding)
>  
>  Thanks,
>  Dong
>  
>  On Tue, Mar 5, 2019 at 9:38 AM Kevin Lu  wrote:
>  
> > Hi All,
> > 
> > I would like to start the vote thread for KIP-427: Add AtMinIsr topic
> > partition category (new metric & TopicCommand option).
> > 
> > 
>  
> >>> 
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
> > 
> > Thanks!
> > 
> > Regards,
> > Kevin
> > 
>  
> >>> 
> >> 
> > 
> > 
> > -- 
> > Best,
> > Stanislav
> 
>


Re: [VOTE] KIP-436 Add a metric indicating start time

2019-03-08 Thread Harsha
+1 (binding)

Thanks,
Harsha

On Fri, Mar 8, 2019, at 2:55 AM, Dongjin Lee wrote:
> +1 (non binding)
> 
> 2 bindings, 3 non-bindings until now. (Colin, Manikumar / Satish, Mickael,
> Dongjin)
> 
> On Fri, Mar 8, 2019 at 7:44 PM Mickael Maison 
> wrote:
> 
> > +1 (non binding)
> > Thanks
> >
> > On Fri, Mar 8, 2019 at 6:39 AM Satish Duggana 
> > wrote:
> > >
> > > Thanks for the KIP,
> > > +1 (non-binding)
> > >
> > > ~Satish.
> > >
> > > On Thu, Mar 7, 2019 at 11:58 PM Manikumar 
> > wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > Thanks,
> > > > Manikumar
> > > >
> > > >
> > > > On Thu, Mar 7, 2019 at 11:52 PM Colin McCabe 
> > wrote:
> > > >
> > > > > +1 (binding).
> > > > >
> > > > > Thanks, Stanislav.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > On Tue, Mar 5, 2019, at 05:23, Stanislav Kozlovski wrote:
> > > > > > Hey everybody,
> > > > > >
> > > > > > I'd like to start a vote thread about the lightweight KIP-436
> > > > > > KIP: KIP-436
> > > > > > <
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-436%3A+Add+a+metric+indicating+start+time
> > > > > >
> > > > > > JIRA: KAFKA-7992  > >
> > > > > > Pull Request: 6318 
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> >
> 
> 
> -- 
> *Dongjin Lee*
> 
> *A hitchhiker in the mathematical world.*
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck: speakerdeck.com/dongjin
> *
>


[jira] [Resolved] (KAFKA-8069) Committed offsets get cleaned up right after the coordinator loading them back from __consumer_offsets in broker with old inter-broker protocol version (< 2.2)

2019-03-08 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8069.

   Resolution: Fixed
Fix Version/s: 2.1.2
   2.0.2

> Committed offsets get cleaned up right after the coordinator loading them 
> back from __consumer_offsets in broker with old inter-broker protocol version 
> (< 2.2)
> ---
>
> Key: KAFKA-8069
> URL: https://issues.apache.org/jira/browse/KAFKA-8069
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.2.1
>Reporter: Zhanxiang (Patrick) Huang
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Critical
> Fix For: 2.2.0, 2.0.2, 2.1.2
>
>
> After the 2.1 release, if the broker hasn't been upgraded to the latest 
> inter-broker protocol version, 
> the committed offsets stored in the __consumer_offset topic will get cleaned 
> up way earlier than they should be when the offsets are loaded back from the 
> __consumer_offset topic in GroupCoordinator, which will happen during 
> leadership transition or after broker bounce.
> TL;DR
> For V1 on-disk format for __consumer_offsets, we have the *expireTimestamp* 
> field and if the inter-broker protocol (IBP) version is prior to 2.1 (prior 
> to 
> [KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets])
>  for a kafka 2.1 broker, the logic of getting the expired offsets looks like:
> {code:java}
> def getExpiredOffsets(baseTimestamp: CommitRecordMetadataAndOffset => Long): 
> Map[TopicPartition, OffsetAndMetadata] = {
>  offsets.filter {
>  case (topicPartition, commitRecordMetadataAndOffset) =>
>  ... && {
>  commitRecordMetadataAndOffset.offsetAndMetadata.expireTimestamp match {
>  case None =>
>  // current version with no per partition retention
>  currentTimestamp - baseTimestamp(commitRecordMetadataAndOffset) >= 
> offsetRetentionMs
>  case Some(expireTimestamp) =>
>  // older versions with explicit expire_timestamp field => old expiration 
> semantics is used
>  currentTimestamp >= expireTimestamp
>  }
>  }
>  }
>  }
> {code}
> The expireTimestamp in the on-disk offset record can only be set when storing 
> the committed offset in the __consumer_offset topic. But the GroupCoordinator 
> also has to keep an in-memory representation of the expireTimestamp (see the 
> code above), which can be set in the following two cases:
>  # Upon the GroupCoordinator receiving OffsetCommitRequest, the 
> expireTimestamp is set using the following logic:
> {code:java}
> expireTimestamp = offsetCommitRequest.retentionTime match {
>  case OffsetCommitRequest.DEFAULT_RETENTION_TIME => None
>  case retentionTime => Some(currentTimestamp + retentionTime)
> }
> {code}
> In all the latest client versions, the consumer will send out 
> OffsetCommitRequest with DEFAULT_RETENTION_TIME so the expireTimestamp will 
> always be None in this case. *This means any committed offset set in this 
> case will always hit the "case None" in the "getExpiredOffsets(...)" when 
> coordinator is doing the cleanup, which is correct.*
>  # Upon the GroupCoordinator loading the committed offsets stored in 
> the __consumer_offsets topic from disk, the expireTimestamp is set using the 
> following logic if IBP<2.1:
> {code:java}
> val expireTimestamp = 
> value.get(OFFSET_VALUE_EXPIRE_TIMESTAMP_FIELD_V1).asInstanceOf[Long]
> {code}
> and the logic to persist the expireTimestamp is:
> {code:java}
> // OffsetCommitRequest.DEFAULT_TIMESTAMP = -1
> value.set(OFFSET_VALUE_EXPIRE_TIMESTAMP_FIELD_V1, 
> offsetAndMetadata.expireTimestamp.getOrElse(OffsetCommitRequest.DEFAULT_TIMESTAMP))
> {code}
> Since the in-memory expireTimestamp will always be None in our case as 
> mentioned in 1), we will always store -1 on-disk. Therefore, when the offset 
> is loaded from the __consumer_offsets topic, the in-memory expireTimestamp 
> will always be set to -1. *This means any committed offset set in this case 
> will always hit "case Some(expireTimestamp)" in the "getExpiredOffsets(...)" 
> when coordinator is doing the cleanup, which basically indicates we will 
> always expire the committed offset on the first expiration check (which is 
> shortly after they are loaded from __consumer_offsets topic)*.
> I am able to reproduce this bug on my local box with one broker using 2.*,1.* 
> and 0.11.* consumer. The consumer will see null committed offset after the 
> broker is bounced.
> This bug was introduced by [PR-5690|https://github.com/apache/kafka/pull/5690] 
> in the Kafka 2.1 release and the fix is very straightforward, which is 
> basically to set the expireTimestamp to None if it is -1 in
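
A hedged sketch of the fix described above; the real change lives in the broker's Scala group-metadata loading code, and the names here are made up for illustration. The idea is simply to map the persisted -1 sentinel back to "no explicit expiration" when offsets are loaded from __consumer_offsets, so they fall into the retention-based branch of getExpiredOffsets instead of expiring immediately.

{code:java}
import java.util.Optional;

// Illustrative only: -1 was written by brokers on the old IBP as "no expire
// timestamp", so loading code should treat it as absent rather than as a real
// expiration time in the past.
final class ExpireTimestampSketch {
    static final long DEFAULT_TIMESTAMP = -1L;

    static Optional<Long> expireTimestampFromRecord(long persistedExpireTimestamp) {
        if (persistedExpireTimestamp == DEFAULT_TIMESTAMP) {
            return Optional.empty();
        }
        return Optional.of(persistedExpireTimestamp);
    }
}
{code}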

Build failed in Jenkins: kafka-2.1-jdk8 #144

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[bill] KAFKA-8040: Streams handle initTransactions timeout (#6372)

--
[...truncated 92.11 KB...]
  where K,V are type-variables:
K extends Object declared in interface ReadOnlyWindowStore
V extends Object declared in interface ReadOnlyWindowStore
:550:
 warning: [unchecked] unchecked method invocation: constructor  in class 
KTableImpl is applied to given types
return new KTableImpl, R>(
   ^
  required: 
String,Serde,Serde,Set,String,boolean,ProcessorSupplier,StreamsGraphNode,InternalStreamsBuilder
  found: 
String,Serde,Serde,Set,String,boolean,KTableKTableJoinMerger,KTableKTableJoinNode,Change,Change>,InternalStreamsBuilder
  where K,V,R,V1 are type-variables:
K extends Object declared in class KTableImpl
V extends Object declared in class KTableImpl
R extends Object declared in method 
buildJoin(AbstractStream,ValueJoiner,boolean,boolean,String,String,MaterializedInternal)
V1 extends Object declared in method 
buildJoin(AbstractStream,ValueJoiner,boolean,boolean,String,String,MaterializedInternal)
:88:
 warning: [unchecked] unchecked method invocation: constructor  in class 
KeyValueStoreMaterializer is applied to given types
= new 
KeyValueStoreMaterializer<>(materializedInternal).materialize();
  ^
  required: MaterializedInternal>
  found: MaterializedInternal
  where K,V are type-variables:
K extends Object declared in class KeyValueStoreMaterializer
V extends Object declared in class KeyValueStoreMaterializer
:88:
 warning: [unchecked] unchecked conversion
= new 
KeyValueStoreMaterializer<>(materializedInternal).materialize();
  ^
  required: MaterializedInternal>
  found:MaterializedInternal
  where K,V are type-variables:
K extends Object declared in class KeyValueStoreMaterializer
V extends Object declared in class KeyValueStoreMaterializer
:88:
 warning: [unchecked] unchecked conversion
= new 
KeyValueStoreMaterializer<>(materializedInternal).materialize();

   ^
  required: StoreBuilder>
  found:StoreBuilder
  where K,VR are type-variables:
K extends Object declared in class KTableKTableJoinNode
VR extends Object declared in class KTableKTableJoinNode
:80:
 warning: [unchecked] unchecked cast
topologyBuilder.addGlobalStore((StoreBuilder) 
storeBuilder,
 ^
  required: StoreBuilder
  found:StoreBuilder
  where S is a type-variable:
S extends StateStore declared in class TableSourceNode
:133:
 warning: [unchecked] unchecked conversion
this.consumedInternal = consumedInternal;
^
  required: ConsumedInternal
  found:ConsumedInternal
  where K,V are type-variables:
K extends Object declared in class TableSourceNodeBuilder
V extends Object declared in class TableSourceNodeBuilder
:61:
 warning: [unchecked] unchecked method invocation: constructor  in class 
WindowedStreamPartitioner is applied to given types
final StreamPartitioner windowedPartitioner = 
(StreamPartitioner) new WindowedStreamPartitioner((WindowedSerializer) keySerializer);

  ^
  required: WindowedSerializer
  found: WindowedSerializer
  where K is a type-variable:
K extends Object declared in class WindowedStreamPartitioner
:61:
 warning: [unchecked] unchecked conversion
final StreamPartitioner windowedPartitioner = 
(StreamPartitioner) new WindowedStreamPartitio

[jira] [Reopened] (KAFKA-2933) Failure in kafka.api.PlaintextConsumerTest.testMultiConsumerDefaultAssignment

2019-03-08 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-2933:


Reopening this. Failed again in 2.2: 
[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/56/testReport/junit/kafka.api/PlaintextConsumerTest/testMultiConsumerDefaultAssignment/]
{quote}java.lang.AssertionError: Did not get valid initial assignment for 
partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-0, topic1-3, topic1-1] 
at kafka.utils.TestUtils$.fail(TestUtils.scala:381) at 
kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791) at 
kafka.api.PlaintextConsumerTest.validateGroupAssignment(PlaintextConsumerTest.scala:1772)
 at 
kafka.api.PlaintextConsumerTest.testMultiConsumerDefaultAssignment(PlaintextConsumerTest.scala:948){quote}
STDOUT
{quote}[2019-03-08 01:46:21,512] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:21,522] ERROR 
[ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition 
__consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:21,730] ERROR 
[ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition 
topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:22,034] ERROR 
[ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition 
topicWithNewMessageFormat-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:27,375] ERROR 
[ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition 
topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:27,384] ERROR 
[ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition 
topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,283] ERROR 
[ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
__consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,496] ERROR 
[ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,509] ERROR 
[ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition 
topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,516] ERROR 
[ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition 
topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,792] ERROR 
[ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition 
other-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:32,858] ERROR 
[ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
other-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:33,145] ERROR 
[ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition 
other-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:39,130] ERROR 
[ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition 
__consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:46:39,317] ERROR 
[ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for p

[jira] [Created] (KAFKA-8076) Flaky Test ProduceRequestTest#testSimpleProduceRequest

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8076:
--

 Summary: Flaky Test ProduceRequestTest#testSimpleProduceRequest
 Key: KAFKA-8076
 URL: https://issues.apache.org/jira/browse/KAFKA-8076
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/56/testReport/junit/kafka.server/ProduceRequestTest/testSimpleProduceRequest/]
{quote}java.lang.AssertionError: Partition [topic,0] metadata not propagated 
after 15000 ms at kafka.utils.TestUtils$.fail(TestUtils.scala:381) at 
kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791) at 
kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:880) at 
kafka.utils.TestUtils$.$anonfun$createTopic$3(TestUtils.scala:318) at 
kafka.utils.TestUtils$.$anonfun$createTopic$3$adapted(TestUtils.scala:317) at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
scala.collection.immutable.Range.foreach(Range.scala:158) at 
scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
kafka.utils.TestUtils$.createTopic(TestUtils.scala:317) at 
kafka.server.ProduceRequestTest.createTopicAndFindPartitionWithLeader(ProduceRequestTest.scala:91)
 at 
kafka.server.ProduceRequestTest.testSimpleProduceRequest(ProduceRequestTest.scala:42)
{quote}
STDOUT
{quote}[2019-03-08 01:42:24,797] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
fetcherId=0] Error for partition topic-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2019-03-08 01:42:38,287] WARN Unable to 
read additional data from client sessionid 0x100712b09280002, likely client has 
closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8075) Flaky Test GroupAuthorizerIntegrationTest#testTransactionalProducerTopicAuthorizationExceptionInCommit

2019-03-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8075:
--

 Summary: Flaky Test 
GroupAuthorizerIntegrationTest#testTransactionalProducerTopicAuthorizationExceptionInCommit
 Key: KAFKA-8075
 URL: https://issues.apache.org/jira/browse/KAFKA-8075
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/56/testReport/junit/kafka.api/GroupAuthorizerIntegrationTest/testTransactionalProducerTopicAuthorizationExceptionInCommit/]
{quote}org.apache.kafka.common.errors.TimeoutException: Timeout expired while 
initializing transactional state in 3000ms.{quote}
STDOUT
{quote}[2019-03-08 01:48:45,226] ERROR [Consumer clientId=consumer-99, 
groupId=my-group] Offset commit failed on partition topic-0 at offset 5: Not 
authorized to access topics: [Topic authorization failed.] 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:812) 
[2019-03-08 01:48:45,227] ERROR [Consumer clientId=consumer-99, 
groupId=my-group] Not authorized to commit to topics [topic] 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:850) 
[2019-03-08 01:48:57,870] ERROR [KafkaApi-0] Error when handling request: 
clientId=0, correlationId=0, api=UPDATE_METADATA, 
body=\{controller_id=0,controller_epoch=1,broker_epoch=25,topic_states=[],live_brokers=[{id=0,end_points=[{port=43610,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
 (kafka.server.KafkaApis:76) 
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=0, connectionId=127.0.0.1:43610-127.0.0.1:44870-0, 
session=Session(Group:testGroup,/127.0.0.1), 
listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, buffer=null) 
is not authorized. [2019-03-08 01:49:14,858] ERROR [KafkaApi-0] Error when 
handling request: clientId=0, correlationId=0, api=UPDATE_METADATA, 
body=\{controller_id=0,controller_epoch=1,broker_epoch=25,topic_states=[],live_brokers=[{id=0,end_points=[{port=44107,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
 (kafka.server.KafkaApis:76) 
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=0, connectionId=127.0.0.1:44107-127.0.0.1:38156-0, 
session=Session(Group:testGroup,/127.0.0.1), 
listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, buffer=null) 
is not authorized. [2019-03-08 01:49:21,984] ERROR [KafkaApi-0] Error when 
handling request: clientId=0, correlationId=0, api=UPDATE_METADATA, 
body=\{controller_id=0,controller_epoch=1,broker_epoch=25,topic_states=[],live_brokers=[{id=0,end_points=[{port=39025,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
 (kafka.server.KafkaApis:76) 
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=0, connectionId=127.0.0.1:39025-127.0.0.1:41474-0, 
session=Session(Group:testGroup,/127.0.0.1), 
listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, buffer=null) 
is not authorized. [2019-03-08 01:49:39,438] ERROR [KafkaApi-0] Error when 
handling request: clientId=0, correlationId=0, api=UPDATE_METADATA, 
body=\{controller_id=0,controller_epoch=1,broker_epoch=25,topic_states=[],live_brokers=[{id=0,end_points=[{port=44798,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
 (kafka.server.KafkaApis:76) 
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=0, connectionId=127.0.0.1:44798-127.0.0.1:58496-0, 
session=Session(Group:testGroup,/127.0.0.1), 
listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, buffer=null) 
is not authorized. Error: Consumer group 'my-group' does not exist. [2019-03-08 
01:49:55,502] WARN Ignoring unexpected runtime exception 
(org.apache.zookeeper.server.NIOServerCnxnFactory:236) 
java.nio.channels.CancelledKeyException at 
sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at 
sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
 at java.lang.Thread.run(Thread.java:748) [2019-03-08 01:50:02,720] WARN Unable 
to read additional data from client sessionid 0x1007131d81c0001, likely client 
has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2019-03-08 
01:50:03,855] ERROR [KafkaApi-0] Error when handling request: clientId=0, 
correlationId=0, api=UPDATE_METADATA, 
body=\{controller_id=0,controller_epoch=1,broker_epoch=25,topic_states=[],live_brokers=[{id=0,end_points=[{port=32870,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
 (kafka.server.KafkaApis:76) 
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=0, connectionId=127.0.0.1:3287

[jira] [Resolved] (KAFKA-7980) Flaky Test SocketServerTest#testConnectionRateLimit

2019-03-08 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-7980.

Resolution: Fixed

Resolving this as the PR build was started before the fix for this test was 
merged

> Flaky Test SocketServerTest#testConnectionRateLimit
> ---
>
> Key: KAFKA-7980
> URL: https://issues.apache.org/jira/browse/KAFKA-7980
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}java.lang.AssertionError: Connections created too quickly: 4 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.network.SocketServerTest.testConnectionRateLimit(SocketServerTest.scala:1122){quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8040) Streams needs to retry initTransactions

2019-03-08 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-8040.

Resolution: Fixed

> Streams needs to retry initTransactions
> ---
>
> Key: KAFKA-8040
> URL: https://issues.apache.org/jira/browse/KAFKA-8040
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Critical
> Fix For: 2.0.2, 2.3.0, 2.1.2, 2.2.1
>
>
> Following on KAFKA-6446, Streams needs to handle the new behavior.
> `initTxn` can throw TimeoutException now: default `MAX_BLOCK_MS_CONFIG` in 
> producer is 60 seconds, so I ([~guozhang]) think just wrapping it as 
> StreamsException should be reasonable, similar to what we do for 
> `producer#send`'s TimeoutException 
> ([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]
>  ).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8070) System test ConsumerGroupCommandTest fails intermittently with SSL

2019-03-08 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-8070.
---
Resolution: Fixed
  Reviewer: Ismael Juma

> System test ConsumerGroupCommandTest fails intermittently with SSL
> --
>
> Key: KAFKA-8070
> URL: https://issues.apache.org/jira/browse/KAFKA-8070
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.3.0, 2.2.1
>
>
> Same failure occurred in both 
> ConsumerGroupCommandTest.test_describe_consumer_group and 
> ConsumerGroupCommandTest.test_list_consumer_groups, both with SSL enabled. 
> Looks like the consumer started up successfully in both cases, but took 
> longer than the 10 second timeout.
> {noformat}
> Consumer was too slow to start
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 132, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 189, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py",
>  line 428, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/consumer_group_command_test.py",
>  line 106, in test_describe_consumer_group
> self.setup_and_verify(security_protocol, group="test-consumer-group")
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/consumer_group_command_test.py",
>  line 70, in setup_and_verify
> timeout_sec=10, backoff_sec=.2, err_msg="Consumer was too slow to start")
>   File 
> "/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/utils/util.py",
>  line 41, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
> TimeoutError: Consumer was too slow to start
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8074) Create SSL keystore/truststore only once for each node in system tests

2019-03-08 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-8074:
-

 Summary: Create SSL keystore/truststore only once for each node in 
system tests
 Key: KAFKA-8074
 URL: https://issues.apache.org/jira/browse/KAFKA-8074
 Project: Kafka
  Issue Type: Task
  Components: system tests
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 2.3.0


From logs of a failing test, it looks like we may be creating keystores twice 
for consumers. Check all services to make sure we are not unnecessarily creating 
these twice, causing delays in tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-08 Thread John Roesler
Hi all,

This proposal sounds good to me, especially since we observe that people
are already confused when they see duplicate results coming out of 1:1 joins
(which is a bug). I take this as "evidence" that we're better off
eliminating those duplicates from the start. Guozhang's proposal seems like
a lightweight solution to the problem, so FWIW, I'm in favor.

Thanks,
-John

On Fri, Mar 8, 2019 at 7:59 AM Adam Bellemare 
wrote:

> Hi Guozhang
>
> That would certainly work for eliminating those duplicate values. As it
> stands right now, this would be consistent with swallowing changes due to
> out-of-order processing with multiple threads, and seems like a very
> reasonable way forward. Thank you for the suggestion!
>
> I have been trying to think if there are any other scenarios where we can
> end up with duplicates, though I have not been able to identify any others
> at the moment. I will think on it a bit more, but if anyone else has any
> ideas, please chime in.
>
> Thanks,
> Adam
>
>
>
> On Thu, Mar 7, 2019 at 8:19 PM Guozhang Wang  wrote:
>
> > One more thought regarding *c-P2: Duplicates)*: first I want to separate
> > this issue from the more general issue that today (not only foreign-key,
> > but also co-partition primary-key) table-table joins are still not
> strictly
> > respecting the timestamp ordering since the two changelog streams may be
> > fetched and hence processed out-of-order and we do not allow a record to
> be
> > joined with the other table at any given time snapshot yet. So ideally
> when
> > there are two changelog records (k1, (f-k1, v1)), (k1, (f-k1, v2)) coming
> > at the left hand table and one record (f-k1, v3) at the right hand table,
> > depending on the processing ordering we may get:
> >
> > (k1, (f-k1, v2-v3))
> >
> > or
> >
> > (k1, (f-k1, v1-v3))
> > (k1, (f-k1, v2-v3))
> >
> > And this is not to be addressed by this KIP.
> >
> > What I would advocate is to fix the issue that is introduced in this KIP
> > alone, that is we may have
> >
> > (k1, (f-k1, v2-v3))   // this should actually be v1-v3
> > (k1, (f-k1, v2-v3))
> >
> > I admit that it does not have a correctness issue from the semantics alone,
> > comparing it with "discarding the first result", but it may be confusing
> > to users who do not expect to see the seeming
> duplicates.
> > On the other hand, I think there's a light solution to avoid it, which is
> > that we can still optimize away to not send the full payload of "v1" from
> > left hand side to right hand side, but instead of just trimming off the
> > whole bytes, we can send, e.g., an MD5 hash of the bytes (I'm using MD5
> > here just as an example, we can definitely replace it with other
> > functions), by doing which we can discard the join operation if the hash
> > value sent back from the right hand side does not match with the left
> hand
> > side any more, i.e. we will only send:
> >
> > (k1, (f-k1, v2-v3))
> >
> > to down streams once.
> >
> > WDYT?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Mar 6, 2019 at 7:58 AM Adam Bellemare 
> > wrote:
> >
> > > Ah yes, I recall it all now. That answers that question as to why I had
> > > caching disabled. I can certainly re-enable it since I believe the main
> > > concern was simply about reconciling those two iterators. A lack of
> > > knowledge there on my part.
> > >
> > >
> > > Thank you John for weighing in - we certainly both do appreciate it. I
> > > think that John hits it on the head though with his comment of "If it
> > turns
> > > out we're wrong about this, then it should be possible to fix the
> > semantics
> > > in place, without messing with the API."
> > >
> > > If anyone else would like to weigh in, your thoughts would be greatly
> > > appreciated.
> > >
> > > Thanks
> > >
> > > On Tue, Mar 5, 2019 at 6:05 PM Matthias J. Sax 
> > > wrote:
> > >
> > > > >> I don't know how to range scan over a caching store, probably one
> had
> > > > >> to open 2 iterators and merge them.
> > > >
> > > > That happens automatically. If you query a cached KTable, it ranges
> > over
> > > > the cache and the underlying RocksDB and performs the merging under
> the
> > > > hood.
> > > >
> > > > >> Other than that, I still think even the regular join is broken
> with
> > > > >> caching enabled right?
> > > >
> > > > Why? To me, if you use the word "broken", it implies conceptually
> > > > incorrect; I don't see this.
> > > >
> > > > > I once filed a ticket, because with caching
> > > > >>> enabled it would return values that haven't been published
> > downstream
> > > > yet.
> > > >
> > > > For the bug report, I found
> > > > https://issues.apache.org/jira/browse/KAFKA-6599. We still need to
> fix
> > > > this, but it is a regular bug as any other, and we should not change
> a
> > > > design because of a bug.
> > > >
> > > > That range() returns values that have not been published downstream
> if
> > > > caching is enabled is how caching works and is intended behavior. Not
> > > > sure why say it'

Re: Speeding up integration tests

2019-03-08 Thread Stanislav Kozlovski
> We internally have an improvement for half a year now which reruns the
flaky test classes at the end of the test gradle task, lets you know that
they were rerun and probably flaky. It fails the build only if the second
run of the test class was also unsuccessful. I think it works pretty good,
we mostly have green builds. If there is interest, I can try to contribute
that.

That does sound very intriguing. Does it rerun the test classes that failed
or some known, marked classes? If it is the former, I can see a lot of
value in having that automated in our PR builds. I wonder what others think
of this

On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass 
wrote:

> Hey All,
>
> Thanks for the loads of ideas.
>
> @Stanislav, @Sonke
> I probably left it out from my email but I really imagined this as a
> case-by-case basis change. If we think that it wouldn't cause problems,
> then it might be applied. That way we'd limit the blast radius somewhat.
> The 1 hour gain is really just the most optimistic scenario, I'm almost
> sure that not every test could be transformed to use a common cluster.
> We internally have an improvement for half a year now which reruns the
> flaky test classes at the end of the test gradle task, lets you know that
> they were rerun and probably flaky. It fails the build only if the second
> run of the test class was also unsuccessful. I think it works pretty good,
> we mostly have green builds. If there is interest, I can try to contribute
> that.
>
> >I am also extremely annoyed at times by the amount of coffee I have to
> drink before tests finish
> Just please don't get a heart attack :)
>
> @Ron, @Colin
> You bring up a very good point that it is easier and frees up more
> resources if we just run change specific tests and it's good to know that a
> similar solution (meaning using a shared resource for testing) have failed
> elsewhere. I second Ron on the test categorization though, although as a
> first attempt I think using a flaky retry + running only the necessary
> tests would help in both time saving and effectiveness. Also it would be
> easier to achieve.
>
> @Ismael
> Yea, it'd be interesting to profile the startup/shutdown, I've never done
> that. Perhaps I'll set some time apart for that :). It's definitely true
> though that if we see a significant delay there we wouldn't just improve
> the efficiency of the tests but also customer experience.
>
> Best,
> Viktor
>
>
>
> On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma  wrote:
>
> > It's an idea that has come up before and worth exploring eventually.
> > However, I'd first try to optimize the server startup/shutdown process.
> If
> > we measure where the time is going, maybe some opportunities will present
> > themselves.
> >
> > Ismael
> >
> > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com
> > >
> > wrote:
> >
> > > Hi Folks,
> > >
> > > I've been observing lately that unit tests usually take 2.5 hours to
> run
> > > and a very big portion of these are the core tests where a new cluster
> is
> > > spun up for every test. This takes most of the time. I ran a test
> > > (TopicCommandWithAdminClient with 38 tests inside) through the profiler
> > and
> > > it shows for instance that running the whole class itself took 10
> minutes
> > > and 37 seconds where the useful time was 5 minutes 18 seconds. That's a
> > > 100% overhead. Without profiler the whole class takes 7 minutes and 48
> > > seconds, so the useful time would be between 3-4 minutes. This is a
> > bigger
> > > test though, most of them won't take this much.
> > > There are 74 classes that implement KafkaServerTestHarness and just
> > running
> > > :core:integrationTest takes almost 2 hours.
> > >
> > > I think we could greatly speed up these integration tests by just
> > creating
> > > the cluster once per class and performing the tests in separate methods. I
> > > know that this somewhat contradicts the principle that tests
> > should
> > > be independent, but it seems like recreating clusters for each test is a very
> > > expensive operation. Also if the tests are acting on different
> resources
> > > (different topics, etc.) then it might not hurt their independence.
> There
> > > might be cases of course where this is not possible but I think there
> > could
> > > be a lot where it is.
> > >
> > > In the optimal case we could cut the testing time back by approximately
> > an
> > > hour. This would save resources and give quicker feedback for PR
> builds.
> > >
> > > What are your thoughts?
> > > Has anyone thought about this or were there any attempts made?
> > >
> > > Best,
> > > Viktor
> > >
> >
>


-- 
Best,
Stanislav
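
As a rough illustration of the "one cluster per test class" idea from Viktor's original message, a JUnit 4 layout could look like the sketch below. EmbeddedCluster is a hypothetical stand-in for whatever harness would actually be shared (something in the spirit of KafkaServerTestHarness), not a real class.

{code:java}
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SharedClusterTestSketch {

    // Hypothetical helper; see the note above.
    private static EmbeddedCluster cluster;

    @BeforeClass
    public static void startCluster() {
        // Pay the broker/ZooKeeper startup cost once for the whole class.
        cluster = new EmbeddedCluster(3);
        cluster.start();
    }

    @AfterClass
    public static void stopCluster() {
        cluster.stop();
    }

    @Test
    public void testAgainstItsOwnTopicA() {
        // Tests stay independent by acting on their own topics, groups, etc.
        cluster.createTopic("topic-a", 1, 3);
        // ... exercise the behaviour under test against "topic-a"
    }

    @Test
    public void testAgainstItsOwnTopicB() {
        cluster.createTopic("topic-b", 1, 3);
        // ... exercise the behaviour under test against "topic-b"
    }

    // Stand-in for an embedded cluster harness (not part of Kafka's public API).
    static class EmbeddedCluster {
        EmbeddedCluster(int numBrokers) { }
        void start() { }
        void stop() { }
        void createTopic(String name, int partitions, int replicationFactor) { }
    }
}
{code}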


[jira] [Reopened] (KAFKA-7980) Flaky Test SocketServerTest#testConnectionRateLimit

2019-03-08 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reopened KAFKA-7980:

  Assignee: (was: Rajini Sivaram)

Failed again in build 
[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/20134/]
{noformat}
Error Message
java.lang.AssertionError: Connections created too quickly: 4
Stacktrace
java.lang.AssertionError: Connections created too quickly: 4
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
kafka.network.SocketServerTest.testConnectionRateLimit(SocketServerTest.scala:1122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at

[jira] [Created] (KAFKA-8072) Transient failure in SslSelectorTest.testCloseOldestConnectionWithMultipleStagedReceives

2019-03-08 Thread Bill Bejeck (JIRA)
Bill Bejeck created KAFKA-8072:
--

 Summary: Transient failure in 
SslSelectorTest.testCloseOldestConnectionWithMultipleStagedReceives
 Key: KAFKA-8072
 URL: https://issues.apache.org/jira/browse/KAFKA-8072
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Affects Versions: 2.3.0
Reporter: Bill Bejeck
 Fix For: 2.3.0, 2.2.1


Failed in build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/20134/]

Stacktrace
{noformat}
Error Message
java.lang.AssertionError: Channel has bytes buffered
Stacktrace
java.lang.AssertionError: Channel has bytes buffered
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertFalse(Assert.java:65)
at 
org.apache.kafka.common.network.SelectorTest.createConnectionWithStagedReceives(SelectorTest.java:499)
at 
org.apache.kafka.common.network.SelectorTest.verifyCloseOldestConnectionWithStagedReceives(SelectorTest.java:505)
at 
org.apache.kafka.common.network.SelectorTest.testCloseOldestConnectionWithMultipleStagedReceives(SelectorTest.java:474)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionD

[jira] [Created] (KAFKA-8073) Transient failure in kafka.api.UserQuotaTest.testThrottledProducerConsumer

2019-03-08 Thread Bill Bejeck (JIRA)
Bill Bejeck created KAFKA-8073:
--

 Summary: Transient failure in 
kafka.api.UserQuotaTest.testThrottledProducerConsumer
 Key: KAFKA-8073
 URL: https://issues.apache.org/jira/browse/KAFKA-8073
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Affects Versions: 2.3.0
Reporter: Bill Bejeck
 Fix For: 2.3.0, 2.2.1


Failed in build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/20134/]

 

Stacktrace:
{noformat}
Error Message
java.lang.AssertionError: Client with id=QuotasTestProducer-1 should have been 
throttled
Stacktrace
java.lang.AssertionError: Client with id=QuotasTestProducer-1 should have been 
throttled
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
kafka.api.QuotaTestClients.verifyThrottleTimeMetric(BaseQuotaTest.scala:229)
at 
kafka.api.QuotaTestClients.verifyProduceThrottle(BaseQuotaTest.scala:215)
at 
kafka.api.BaseQuotaTest.testThrottledProducerConsumer(BaseQuotaTest.scala:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispat

[jira] [Resolved] (KAFKA-7980) Flaky Test SocketServerTest#testConnectionRateLimit

2019-03-08 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-7980.
---
Resolution: Fixed
  Reviewer: Ismael Juma

> Flaky Test SocketServerTest#testConnectionRateLimit
> ---
>
> Key: KAFKA-7980
> URL: https://issues.apache.org/jira/browse/KAFKA-7980
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets 
> for all observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}java.lang.AssertionError: Connections created too quickly: 4 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.network.SocketServerTest.testConnectionRateLimit(SocketServerTest.scala:1122){quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-08 Thread Adam Bellemare
Hi Guozhang

That would certainly work for eliminating those duplicate values. As it
stands right now, this would be consistent with swallowing changes due to
out-of-order processing with multiple threads, and seems like a very
reasonable way forward. Thank you for the suggestion!

I have been trying to think if there are any other scenarios where we can
end up with duplicates, though I have not been able to identify any others
at the moment. I will think on it a bit more, but if anyone else has any
ideas, please chime in.

Thanks,
Adam
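
[Editor's note] As a rough illustration of Guozhang's suggestion quoted below
(carrying a hash of the left-hand value across the repartition instead of the
full payload, and dropping the response when the hash no longer matches the
current value), here is a hedged sketch; the class and method names are
hypothetical, and this is not the KIP-213 implementation.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class ValueHashCheck {

    // Digest of the serialized left-hand value that travels with the
    // subscription instead of the full payload. MD5 is used purely as an
    // example, as in the discussion; any stable digest would do.
    static byte[] hash(byte[] serializedValue) {
        try {
            return MessageDigest.getInstance("MD5").digest(serializedValue);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // When the join response comes back to the left-hand side, forward it
    // only if the hash it carries still matches the current value for the
    // key. If the value has meanwhile changed (v1 -> v2), the stale response
    // is dropped, so downstream sees a single (k1, v2-v3) result.
    static boolean shouldForward(byte[] hashFromResponse, byte[] currentSerializedValue) {
        return Arrays.equals(hashFromResponse, hash(currentSerializedValue));
    }
}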



On Thu, Mar 7, 2019 at 8:19 PM Guozhang Wang  wrote:

> One more thought regarding *c-P2: Duplicates)*: first I want to separate
> this issue from the more general issue that today (not only foreign-key,
> but also co-partition primary-key) table-table joins is still not strictly
> respecting the timestamp ordering since the two changelog streams may be
> fetched and hence processed out-of-order and we do not allow a record to be
> joined with the other table at any given time snapshot yet. So ideally when
> there are two changelog records (k1, (f-k1, v1)), (k1, (f-k1, v2)) coming
> at the left hand table and one record (f-k1, v3) at the right hand table,
> depending on the processing ordering we may get:
>
> (k1, (f-k1, v2-v3))
>
> or
>
> (k1, (f-k1, v1-v3))
> (k1, (f-k1, v2-v3))
>
> And this is not to be addressed by this KIP.
>
> What I would advocate is to fix the issue that is introduced in this KIP
> alone, that is we may have
>
> (k1, (f-k1, v2-v3))   // this should actually be v1-v3
> (k1, (f-k1, v2-v3))
>
> I admit that it does not have a correctness issue from the semantics alone,
> compared with "discarding the first result", but it may be confusing to
> users who do not expect to see the seeming duplicates. On the other hand, I
> think there's a lightweight solution to avoid it: we can still optimize away
> sending the full payload of "v1" from the left-hand side to the right-hand
> side, but instead of just trimming off the whole bytes, we can send, e.g.,
> an MD5 hash of the bytes (I'm using MD5 here just as an example; we can
> definitely replace it with other functions). That way we can discard the
> join result if the hash value sent back from the right-hand side no longer
> matches the left-hand side, i.e. we will only send:
>
> (k1, (f-k1, v2-v3))
>
> downstream once.
>
> WDYT?
>
>
> Guozhang
>
>
> On Wed, Mar 6, 2019 at 7:58 AM Adam Bellemare 
> wrote:
>
> > Ah yes, I recall it all now. That answers that question as to why I had
> > caching disabled. I can certainly re-enable it since I believe the main
> > concern was simply about reconciling those two iterators. A lack of
> > knowledge there on my part.
> >
> >
> > Thank you John for weighing in - we certainly both do appreciate it. I
> > think that John hits it on the head though with his comment of "If it turns
> > out we're wrong about this, then it should be possible to fix the semantics
> > in place, without messing with the API."
> >
> > If anyone else would like to weigh in, your thoughts would be greatly
> > appreciated.
> >
> > Thanks
> >
> > On Tue, Mar 5, 2019 at 6:05 PM Matthias J. Sax 
> > wrote:
> >
> > > >> I don't know how to range scan over a caching store; probably one has
> > > >> to open 2 iterators and merge them.
> > >
> > > That happens automatically. If you query a cached KTable, it ranges
> over
> > > the cache and the underlying RocksDB and performs the merging under the
> > > hood.
> > >
> > > >> Other than that, I still think even the regular join is broken with
> > > >> caching enabled right?
> > >
> > > Why? To me, using the word "broken" implies it is conceptually
> > > incorrect; I don't see this.
> > >
> > > > I once filed a ticket, because with caching enabled it would return
> > > > values that haven't been published downstream yet.
> > >
> > > For the bug report, I found
> > > https://issues.apache.org/jira/browse/KAFKA-6599. We still need to fix
> > > this, but it is a regular bug as any other, and we should not change a
> > > design because of a bug.
> > >
> > > That range() returns values that have not been published downstream if
> > > caching is enabled is how caching works and is intended behavior. Not
> > > sure why you say it's incorrect?
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 3/5/19 1:49 AM, Jan Filipiak wrote:
> > > >
> > > >
> > > > On 04.03.2019 19:14, Matthias J. Sax wrote:
> > > >> Thanks Adam,
> > > >>
> > > >> *> Q) Range scans work with caching enabled, too. Thus, there is no
> > > >> functional/correctness requirement to disable caching. I cannot
> > > >> remember why Jan's proposal added this? It might be an
> > > >> implementation detail though (maybe just remove it from the KIP?
> > > >> -- might be misleading).
> > > >
> > > > I don't know how to range scan over a caching store; probably one has
> > > > to open 2 iterators and merge them.
> > > >
> > > > Other than that,
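
[Editor's note] Further up the quoted thread, Matthias notes that a range scan
over a cached KTable merges the cache with the underlying RocksDB store under
the hood. Below is a simplified sketch of such a merge of two key-ordered
iterators (not the actual Kafka Streams iterator); on a key collision the
cache entry shadows the store entry.

import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Simplified merge of two key-ordered iterators, e.g. a dirty cache on top of
// a persistent store. On equal keys the cache entry wins and the store entry
// is skipped.
final class MergedRangeIterator implements Iterator<Map.Entry<String, String>> {

    private final Iterator<Map.Entry<String, String>> cacheIterator;
    private final Iterator<Map.Entry<String, String>> storeIterator;
    private Map.Entry<String, String> nextCache;
    private Map.Entry<String, String> nextStore;

    MergedRangeIterator(Iterator<Map.Entry<String, String>> cacheIterator,
                        Iterator<Map.Entry<String, String>> storeIterator) {
        this.cacheIterator = cacheIterator;
        this.storeIterator = storeIterator;
        this.nextCache = advance(cacheIterator);
        this.nextStore = advance(storeIterator);
    }

    private static Map.Entry<String, String> advance(Iterator<Map.Entry<String, String>> it) {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public boolean hasNext() {
        return nextCache != null || nextStore != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (nextStore == null
                || (nextCache != null && nextCache.getKey().compareTo(nextStore.getKey()) <= 0)) {
            if (nextStore != null && nextCache.getKey().equals(nextStore.getKey())) {
                nextStore = advance(storeIterator); // same key: cache shadows store
            }
            Map.Entry<String, String> result = nextCache;
            nextCache = advance(cacheIterator);
            return result;
        } else {
            Map.Entry<String, String> result = nextStore;
            nextStore = advance(storeIterator);
            return result;
        }
    }
}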

[jira] [Created] (KAFKA-8071) Specify default partitions and replication factor for regex based topics in kafka

2019-03-08 Thread Abhi (JIRA)
Abhi created KAFKA-8071:
---

 Summary: Specify default partitions and replication factor for 
regex based topics in kafka
 Key: KAFKA-8071
 URL: https://issues.apache.org/jira/browse/KAFKA-8071
 Project: Kafka
  Issue Type: New Feature
  Components: controller
Affects Versions: 2.1.1
Reporter: Abhi


Is it possible to specify different default partitions and replication factor 
for topics matching a pattern like foo.*? If not, what is the best way to 
achieve this from the producer's point of view?

I know of the KafkaAdmin utils, but the topic creation will happen on the 
producer side, and for security reasons I don't want to give the user running 
the producer admin permissions on the metadata stored in ZooKeeper.
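
For illustration, a hedged sketch of the AdminClient route mentioned above, 
pre-creating a matching topic with an explicit partition count and replication 
factor instead of relying on the broker defaults (topic name, counts, and 
bootstrap address are placeholders, and this does not address the permission 
concern raised here):
{noformat}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateFooTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Explicit partitions and replication factor for one "foo.*"-style topic,
            // instead of the broker-wide num.partitions / default.replication.factor.
            NewTopic topic = new NewTopic("foo.orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{noformat}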



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3447

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[manikumar] KAFKA-8060: The Kafka protocol generator should allow null defaults

--
[...truncated 4.67 MB...]

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardInit 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
PASSED

> Task :streams:streams-scala:test

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegionWithNamedRepartitionTopic STARTED

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegionWithNamedRepartitionTopic PASSED

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegionJava STARTED

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegionJava PASSED

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegion STARTED

org.apache.kafka.streams.scala.StreamToTableJoinScalaIntegrationTestImplicitSerdes
 > testShouldCountClicksPerRegion PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaJoin STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaJoin PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaSimple STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaSimple PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaAggregate STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaAggregate PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaTransform STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaTransform PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsMaterialized 
STARTED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsMaterialized 
PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsJava STARTED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsJava PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWords STARTED

org.apache.kafka

[jira] [Created] (KAFKA-8070) System test ConsumerGroupCommandTest fails intermittently with SSL

2019-03-08 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-8070:
-

 Summary: System test ConsumerGroupCommandTest fails intermittently 
with SSL
 Key: KAFKA-8070
 URL: https://issues.apache.org/jira/browse/KAFKA-8070
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 2.3.0, 2.2.1


The same failure occurred in both 
ConsumerGroupCommandTest.test_describe_consumer_group and 
ConsumerGroupCommandTest.test_list_consumer_groups, both with SSL enabled. It 
looks like the consumer started up successfully in both cases, but took longer 
than the 10-second timeout.

{noformat}
Consumer was too slow to start
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
 line 132, in run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
 line 189, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py",
 line 428, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/consumer_group_command_test.py",
 line 106, in test_describe_consumer_group
self.setup_and_verify(security_protocol, group="test-consumer-group")
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/consumer_group_command_test.py",
 line 70, in setup_and_verify
timeout_sec=10, backoff_sec=.2, err_msg="Consumer was too slow to start")
  File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/utils/util.py",
 line 41, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
TimeoutError: Consumer was too slow to start

{noformat}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-436 Add a metric indicating start time

2019-03-08 Thread Dongjin Lee
+1 (non binding)

That makes 2 binding and 3 non-binding votes so far (Colin, Manikumar / Satish,
Mickael, Dongjin).

On Fri, Mar 8, 2019 at 7:44 PM Mickael Maison 
wrote:

> +1 (non binding)
> Thanks
>
> On Fri, Mar 8, 2019 at 6:39 AM Satish Duggana 
> wrote:
> >
> > Thanks for the KIP,
> > +1 (non-binding)
> >
> > ~Satish.
> >
> > On Thu, Mar 7, 2019 at 11:58 PM Manikumar 
> wrote:
> >
> > > +1 (binding).
> > >
> > > Thanks for the KIP.
> > >
> > > Thanks,
> > > Manikumar
> > >
> > >
> > > On Thu, Mar 7, 2019 at 11:52 PM Colin McCabe 
> wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > Thanks, Stanislav.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Tue, Mar 5, 2019, at 05:23, Stanislav Kozlovski wrote:
> > > > > Hey everybody,
> > > > >
> > > > > I'd like to start a vote thread about the lightweight KIP-436
> > > > > KIP: KIP-436
> > > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-436%3A+Add+a+metric+indicating+start+time
> > > > >
> > > > > JIRA: KAFKA-7992  >
> > > > > Pull Request: 6318 
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
github: github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin


Re: [VOTE] KIP-436 Add a metric indicating start time

2019-03-08 Thread Mickael Maison
+1 (non binding)
Thanks

On Fri, Mar 8, 2019 at 6:39 AM Satish Duggana  wrote:
>
> Thanks for the KIP,
> +1 (non-binding)
>
> ~Satish.
>
> On Thu, Mar 7, 2019 at 11:58 PM Manikumar  wrote:
>
> > +1 (binding).
> >
> > Thanks for the KIP.
> >
> > Thanks,
> > Manikumar
> >
> >
> > On Thu, Mar 7, 2019 at 11:52 PM Colin McCabe  wrote:
> >
> > > +1 (binding).
> > >
> > > Thanks, Stanislav.
> > >
> > > best,
> > > Colin
> > >
> > > On Tue, Mar 5, 2019, at 05:23, Stanislav Kozlovski wrote:
> > > > Hey everybody,
> > > >
> > > > I'd like to start a vote thread about the lightweight KIP-436
> > > > KIP: KIP-436
> > > > <
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-436%3A+Add+a+metric+indicating+start+time
> > > >
> > > > JIRA: KAFKA-7992 
> > > > Pull Request: 6318 
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >


[jira] [Reopened] (KAFKA-7976) Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable

2019-03-08 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reopened KAFKA-7976:
---

Test has failed again, so the fix wasn't sufficient.

> Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable
> ---
>
> Key: KAFKA-7976
> URL: https://issues.apache.org/jira/browse/KAFKA-7976
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets 
> for all observed test failures.
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/28/]
> {quote}java.lang.AssertionError: Unclean leader not elected
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:488){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3446

2019-03-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-7831; Do not modify subscription state from background thread

--
[...truncated 4.68 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopic[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopic[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = true] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = true] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldCloseProcessor[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldCloseProcessor[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFeedStoreFromGlobalKTable[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFeedStoreFromGlobalKTable[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCleanUpPersistentStateStoresOnClose[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCleanUpPersistentStateStoresOnClose[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowPatternNotValidForTopicN