[jira] [Updated] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invalid

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-6266:
---
Fix Version/s: 2.4.1
   2.5.0

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
> Fix For: 2.5.0, 2.4.1
>
>
> I upgraded Kafka from version 0.10.2.1 to 1.0.0. Since then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want them fixed so that they 
> won't repeat. Can someone please help me fix the below 
> warnings?
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invalid

2019-12-03 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987550#comment-16987550
 ] 

Jun Rao commented on KAFKA-6266:


Looking at the code where the "Resetting first dirty offset" warning is 
emitted: it seems that we only return the new cleaning offset based on the log 
start offset, without writing the new value to the cleaner offset checkpoint 
file. So, in the case where there is nothing to clean, this warning will repeat 
every time the log cleaner checks for logs to clean. To fix this, we should 
overwrite the cleaner offset checkpoint file as well.
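
For illustration, a minimal Java-style sketch of that behaviour and the proposed 
fix (the actual code is the Scala kafka.log.LogCleanerManager; names and types 
here are simplified stand-ins):

{code}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the cleaner checkpoint logic; not the real Scala code.
public class CleanerCheckpointSketch {
    // partition -> checkpointed "first dirty" offset
    private final Map<String, Long> checkpoints = new HashMap<>();

    public long firstDirtyOffset(String partition, long logStartOffset) {
        long checkpointed = checkpoints.getOrDefault(partition, logStartOffset);
        if (checkpointed < logStartOffset) {
            System.out.printf("WARN Resetting first dirty offset of %s to log start offset %d "
                    + "since the checkpointed offset %d is invalid.%n",
                    partition, logStartOffset, checkpointed);
            // Persisting the reset value is the missing step: without it, the
            // checkpoint stays stale and the warning repeats on every cleaning pass.
            checkpoints.put(partition, logStartOffset);
            return logStartOffset;
        }
        return checkpointed;
    }
}
{code}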

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from version 0.10.2.1 to 1.0.0. Since then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want them fixed so that they 
> won't repeat. Can someone please help me fix the below 
> warnings?
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9261) NPE when updating client metadata

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-9261:
---
Fix Version/s: 2.3.2
   2.4.0

> NPE when updating client metadata
> -
>
> Key: KAFKA-9261
> URL: https://issues.apache.org/jira/browse/KAFKA-9261
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.4.0, 2.3.2
>
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at 
> org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at 
> org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at 
> org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node 
> information must also be available. There are at least a couple possible 
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using leader epoch 
> information available. If stale partition metadata is detected, the client 
> ignores it and uses the last known metadata. However, it cannot detect stale 
> broker information and will always accept the latest update. This means that 
> the latest metadata may be a mix of multiple metadata responses and therefore 
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata 
> and the live broker when handling a Metadata request. This means an 
> UpdateMetadata request can arrive concurrently and break the intended 
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely 
> rare. Case 1 was only made possible with KIP-320, which added the leader 
> epoch tracking. It should also be rare, but the window for inconsistent 
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates 
> and not assume that the leader is among the live endpoints.
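
For illustration only, a minimal sketch of the kind of defensive handling 
described above (not the actual patch; the types are simplified stand-ins for 
MetadataCache/Cluster):

{code}
import java.util.Map;
import java.util.Optional;

// Simplified stand-in types; illustrates "don't assume the leader is among the
// live endpoints", nothing more.
public class DefensiveMetadataSketch {
    public static Optional<String> leaderEndpoint(Integer leaderId,
                                                  Map<Integer, String> liveBrokers) {
        if (leaderId == null)
            return Optional.empty();                      // leader unknown
        // The leader id may come from newer partition metadata than the broker
        // map (mixed responses), so its node may legitimately be absent.
        return Optional.ofNullable(liveBrokers.get(leaderId));
    }
}
{code}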



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-9258:


Assignee: Cyrus Vafadari

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Cyrus Vafadari
>Assignee: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0, 2.5.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-9258.
--
  Reviewer: Randall Hauch
Resolution: Fixed

Merged into `trunk` and backported to the `2.4` branch after this was approved 
as a blocker for AK 2.4.0.

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Cyrus Vafadari
>Assignee: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0, 2.5.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9261) NPE when updating client metadata

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987487#comment-16987487
 ] 

ASF GitHub Bot commented on KAFKA-9261:
---

ijuma commented on pull request #7772: KAFKA-9261; Client should handle 
inconsistent leader metadata
URL: https://github.com/apache/kafka/pull/7772
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NPE when updating client metadata
> -
>
> Key: KAFKA-9261
> URL: https://issues.apache.org/jira/browse/KAFKA-9261
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at 
> org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at 
> org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at 
> org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node 
> information must also be available. There are at least a couple possible 
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using leader epoch 
> information available. If stale partition metadata is detected, the client 
> ignores it and uses the last known metadata. However, it cannot detect stale 
> broker information and will always accept the latest update. This means that 
> the latest metadata may be a mix of multiple metadata responses and therefore 
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata 
> and the live broker when handling a Metadata request. This means an 
> UpdateMetadata request can arrive concurrently and break the intended 
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely 
> rare. Case 1 was only made possible with KIP-320, which added the leader 
> epoch tracking. It should also be rare, but the window for inconsistent 
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates 
> and not assume that the leader is among the live endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9088) Consolidate InternalMockProcessorContext and MockInternalProcessorContext

2019-12-03 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987478#comment-16987478
 ] 

John Roesler commented on KAFKA-9088:
-

Hey all,

I can see I've missed some discussions here. I've read over the ticket and also 
looked over the highlights of the PR.

I'm afraid I really don't think that EasyMock is going to be a good approach 
here. It's a great library for when you need to mock things that are painful to 
implement, or when you need to play through different specific scenarios in 
different tests.

But at the same time, it can be a major pain to work with. Tracing the 
execution with the debugger is virtually impossible: as soon as you 
accidentally descend into the mock, you're deep into Java black magic. Figuring 
out when to set up expectations and when to replay them is deeply confusing. 
And paradoxically, just setting up a mock to consistently behave in a "normal" 
way takes an obnoxious amount of code.

My experience says that if what you're trying to mock has a relatively simple 
interface and state space, then just go ahead and implement a "manually mocked" 
implementation. Only reach for EasyMock when this basic approach is too 
difficult for some reason.

If I understand this ticket, the motivation was _only_ that there exist two 
similar classes in the Apache codebase, not that there is anything actually 
wrong with either of them. In that case, we shouldn't switch to EasyMock. We 
should just consolidate the implementations... Or not! I didn't see any actual 
justification for _why_ it's a problem to have two similar classes, and if 
consolidating them results in a lot of difficulty, or a lot of extra code, then 
consolidating is the wrong thing to do.

Either way, just looking at the PR, it seems that this _consolidation_ strategy 
results in adding _double_ the lines of code that we're removing.

I know that this must have taken a ton of work, so I've been sitting here 
trying to think of something else to say, but it really feels like we need to 
go back to the drawing board here and ask ourselves what we're trying to 
accomplish, and at what cost.

Thanks,
John
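
As a toy illustration of the trade-off discussed above (not Streams code; the 
interface and values are made up), compare a hand-written mock with the 
equivalent EasyMock setup:

{code}
import static org.easymock.EasyMock.*;

interface Clock { long now(); }

// Hand-written mock: trivial to read, step through, and reuse across tests.
class FixedClock implements Clock {
    public long now() { return 42L; }
}

public class MockStyleSketch {
    public static void main(String[] args) {
        Clock manual = new FixedClock();

        // EasyMock equivalent: record/replay phases just to get "always return 42".
        Clock mocked = createMock(Clock.class);
        expect(mocked.now()).andReturn(42L).anyTimes();
        replay(mocked);

        System.out.println(manual.now() + " " + mocked.now());
    }
}
{code}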

> Consolidate InternalMockProcessorContext and MockInternalProcessorContext
> -
>
> Key: KAFKA-9088
> URL: https://issues.apache.org/jira/browse/KAFKA-9088
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Bruno Cadonna
>Assignee: Pierangelo Di Pilato
>Priority: Minor
>  Labels: newbie
>
> Currently, we have two mocks for the {{InternalProcessorContext}}. The goal 
> of this ticket is to merge both into one mock or replace it with an 
> {{EasyMock}} mock. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9258:
-
Fix Version/s: 2.5.0

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0, 2.5.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9258:
-
Affects Version/s: 2.5.0

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0, 2.5.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987477#comment-16987477
 ] 

ASF GitHub Bot commented on KAFKA-9258:
---

rhauch commented on pull request #7768: KAFKA-9258 Check Connect Metrics 
non-null in task stop
URL: https://github.com/apache/kafka/pull/7768
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
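
A hedged sketch of the kind of guard suggested by the PR title above ("Check 
Connect Metrics non-null in task stop"); the type is a simplified stand-in, not 
the actual Worker.ConnectorStatusMetricsGroup code:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in: a per-connector counter instead of the real metrics group.
public class ConnectorStatusMetricsSketch {
    private final Map<String, Integer> runningTasks = new ConcurrentHashMap<>();

    public void recordTaskRemoved(String connector) {
        Integer count = runningTasks.get(connector);
        if (count == null) {
            // Metrics for this connector were never registered or were already
            // removed; skip instead of dereferencing null (the reported NPE).
            return;
        }
        runningTasks.put(connector, Math.max(0, count - 1));
    }
}
{code}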



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9073) Kafka Streams State stuck in rebalancing after one of the StreamThread encounters java.lang.IllegalStateException: No current assignment for partition

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987450#comment-16987450
 ] 

ASF GitHub Bot commented on KAFKA-9073:
---

guozhangwang commented on pull request #7630: KAFKA-9073: check assignment in 
requestFailed to avoid NPE
URL: https://github.com/apache/kafka/pull/7630
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka Streams State stuck in rebalancing after one of the StreamThread 
> encounters java.lang.IllegalStateException: No current assignment for 
> partition
> --
>
> Key: KAFKA-9073
> URL: https://issues.apache.org/jira/browse/KAFKA-9073
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: amuthan Ganeshan
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: KAFKA-9073.log
>
>
> I have a Kafka stream application that stores the incoming messages into a 
> state store, and later during the punctuation period, we store them into a 
> big data persistent store after processing the messages.
> The application consumes from 120 partitions distributed across 40 instances. 
> The application has been running fine without any problem for months, but all 
> of a sudden some of the instances failed because of a stream thread exception 
> saying  
> ```java.lang.IllegalStateException: No current assignment for partition 
> --changelog-98```
>  
> And other instances are stuck in the REBALANCING state and never come out of 
> it. Here is the full stack trace; I just masked the application-specific app 
> name and store name in the stack trace due to NDA.
>  
> ```
> 2019-10-21 13:27:13,481 ERROR 
> [application.id-a2c06c51-bfc7-449a-a094-d8b770caee92-StreamThread-3] 
> [org.apache.kafka.streams.processor.internals.StreamThread] [] stream-thread 
> [application.id-a2c06c51-bfc7-449a-a094-d8b770caee92-StreamThread-3] 
> Encountered the following error during processing:
> java.lang.IllegalStateException: No current assignment for partition 
> application.id-store_name-changelog-98
>  at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:319)
>  at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.requestFailed(SubscriptionState.java:618)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher$2.onFailure(Fetcher.java:709)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFutureAdapter.onFailure(RequestFutureAdapter.java:30)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:574)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeUpdateStandbyTasks(StreamThread.java:1126)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:923)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
> ```
>  
> Now I checked the state store disk usage; it is less than 40% of the total 
> disk space available. Restarting the application solves the problem for a 
> short amount of time, but the error pops up randomly on some other 
> instances quickly.
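
For illustration, a minimal sketch of the kind of check named in PR #7630 above 
("check assignment in requestFailed"); the type is a simplified stand-in, not 
the real SubscriptionState:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the consumer's subscription state.
public class SubscriptionStateSketch {
    private final Map<String, Long> assignedPartitions = new ConcurrentHashMap<>();

    // Failure callback for an async request. By the time it runs, a rebalance may
    // already have revoked the partition, so check the assignment first instead
    // of assuming it is still present (which throws "No current assignment for
    // partition ...").
    public void requestFailed(String partition) {
        if (!assignedPartitions.containsKey(partition))
            return; // partition revoked in the meantime; ignore the stale failure
        // ... reset retry/backoff state for the still-assigned partition ...
    }
}
{code}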

[jira] [Commented] (KAFKA-9173) StreamsPartitionAssignor assigns partitions to only one worker

2019-12-03 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987420#comment-16987420
 ] 

Guozhang Wang commented on KAFKA-9173:
--

[~o.muravskiy] Note that the number of tasks is not equal to the number of 
partitions, as many partitions of different topics can map to one task. Although 
I did not see your source code, from the logs I think the 21 topics are mapped 
together into one sub-topology (if you are not familiar with the concept of a 
sub-topology, you can read about it in the web docs here: 
https://docs.confluent.io/current/streams/architecture.html), and hence you 
still only have 10 tasks instead of 210 tasks.
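
For illustration, a small sketch of that point (topic names are made up): 
feeding 21 topics with 10 partitions each into one processor produces a single 
sub-topology, and therefore 10 tasks rather than 210.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class SubTopologySketch {
    public static void main(String[] args) {
        List<String> topics = new ArrayList<>();
        for (int i = 1; i <= 21; i++)
            topics.add("input-topic-" + i);

        StreamsBuilder builder = new StreamsBuilder();
        // All 21 topics feed the same downstream processor -> one sub-topology.
        builder.stream(topics).foreach((k, v) -> { });

        Topology topology = builder.build();
        // describe() shows a single Sub-topology; with 10 partitions per topic,
        // Streams creates tasks 0_0 .. 0_9 for it (one per partition number).
        System.out.println(topology.describe());
    }
}
{code}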

> StreamsPartitionAssignor assigns partitions to only one worker
> --
>
> Key: KAFKA-9173
> URL: https://issues.apache.org/jira/browse/KAFKA-9173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Oleg Muravskiy
>Priority: Major
>  Labels: user-experience
> Attachments: StreamsPartitionAssignor.log
>
>
> I'm running a distributed KafkaStreams application on 10 worker nodes, 
> subscribed to 21 topics with 10 partitions in each. I'm only using a 
> Processor interface, and a persistent state store.
> However, only one worker gets assigned partitions, all other workers get 
> nothing. Restarting the application, or cleaning local state stores does not 
> help. StreamsPartitionAssignor migrates to other nodes, and eventually picks 
> up other node to assign partitions to, but still only one node.
> It's difficult to figure out where to look for the signs of problems, I'm 
> attaching the log messages from the StreamsPartitionAssignor. Let me know 
> what else I could provide to help resolve this.
> [^StreamsPartitionAssignor.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9173) StreamsPartitionAssignor assigns partitions to only one worker

2019-12-03 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987410#comment-16987410
 ] 

Sophie Blee-Goldman commented on KAFKA-9173:


What exactly do you mean by "subscribed to 21 topics"? For example are you 
using pattern subscription and have 21 input topics you expect to match, or are 
you using 21 different topics in the same topology? 

> StreamsPartitionAssignor assigns partitions to only one worker
> --
>
> Key: KAFKA-9173
> URL: https://issues.apache.org/jira/browse/KAFKA-9173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Oleg Muravskiy
>Priority: Major
>  Labels: user-experience
> Attachments: StreamsPartitionAssignor.log
>
>
> I'm running a distributed KafkaStreams application on 10 worker nodes, 
> subscribed to 21 topics with 10 partitions in each. I'm only using a 
> Processor interface, and a persistent state store.
> However, only one worker gets assigned partitions, all other workers get 
> nothing. Restarting the application, or cleaning local state stores does not 
> help. StreamsPartitionAssignor migrates to other nodes, and eventually picks 
> up other node to assign partitions to, but still only one node.
> It's difficult to figure out where to look for the signs of problems, I'm 
> attaching the log messages from the StreamsPartitionAssignor. Let me know 
> what else I could provide to help resolve this.
> [^StreamsPartitionAssignor.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9231) Streams Threads may die from recoverable errors with EOS enabled

2019-12-03 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987395#comment-16987395
 ] 

John Roesler edited comment on KAFKA-9231 at 12/4/19 1:14 AM:
--

Fixed in 2.4 by 
https://github.com/apache/kafka/commit/d7fe494b2a983256092bcc50eac8eab8eb8a6163

And in trunk by 
https://github.com/apache/kafka/commit/18c13d38ed7090801f125088e3eaec7e3c85c09d


was (Author: vvcephei):
Fixed in 2.4 by 
https://github.com/apache/kafka/commit/d7fe494b2a983256092bcc50eac8eab8eb8a6163

> Streams Threads may die from recoverable errors with EOS enabled
> 
>
> Key: KAFKA-9231
> URL: https://issues.apache.org/jira/browse/KAFKA-9231
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
> Fix For: 2.4.0
>
>
> While testing Streams in EOS mode under frequent and heavy network 
> partitions, I've encountered the following error, leading to thread death:
> {noformat}
> [2019-11-26 04:54:02,650] ERROR 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Failed 
> to rebalance.
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:739)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] failed 
> to suspend stream tasks
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:253)
>   at 
> org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:116)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:291)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onLeavePrepare(ConsumerCoordinator.java:707)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.unsubscribe(KafkaConsumer.java:1073)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.enforceRebalance(StreamThread.java:716)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:710)
>   ... 1 more
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task 
> [1_1] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-07
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:279)
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:175)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:581)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:535)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:660)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:628)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendRunningTasks(AssignedStreamsTasks.java:145)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendOrCloseTasks(AssignedStreamsTasks.java:128)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:246)
>   ... 7 more
> Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.
> [2019-11-26 04:54:02,650] INFO 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State 
> transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN 
> 

[jira] [Resolved] (KAFKA-9231) Streams Threads may die from recoverable errors with EOS enabled

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-9231.
-
Resolution: Fixed

Fixed in 2.4 by 
https://github.com/apache/kafka/commit/d7fe494b2a983256092bcc50eac8eab8eb8a6163

> Streams Threads may die from recoverable errors with EOS enabled
> 
>
> Key: KAFKA-9231
> URL: https://issues.apache.org/jira/browse/KAFKA-9231
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
> Fix For: 2.4.0
>
>
> While testing Streams in EOS mode under frequent and heavy network 
> partitions, I've encountered the following error, leading to thread death:
> {noformat}
> [2019-11-26 04:54:02,650] ERROR 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Failed 
> to rebalance.
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:739)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] failed 
> to suspend stream tasks
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:253)
>   at 
> org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:116)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:291)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onLeavePrepare(ConsumerCoordinator.java:707)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.unsubscribe(KafkaConsumer.java:1073)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.enforceRebalance(StreamThread.java:716)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:710)
>   ... 1 more
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task 
> [1_1] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-07
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:279)
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:175)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:581)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:535)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:660)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:628)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendRunningTasks(AssignedStreamsTasks.java:145)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendOrCloseTasks(AssignedStreamsTasks.java:128)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:246)
>   ... 7 more
> Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.
> [2019-11-26 04:54:02,650] INFO 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State 
> transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2019-11-26 04:54:02,650] INFO 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> Shutting down 

[jira] [Commented] (KAFKA-9231) Streams Threads may die from recoverable errors with EOS enabled

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987393#comment-16987393
 ] 

ASF GitHub Bot commented on KAFKA-9231:
---

vvcephei commented on pull request #7748: KAFKA-9231: Streams Threads may die 
from recoverable errors with EOS enabled
URL: https://github.com/apache/kafka/pull/7748
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Streams Threads may die from recoverable errors with EOS enabled
> 
>
> Key: KAFKA-9231
> URL: https://issues.apache.org/jira/browse/KAFKA-9231
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
> Fix For: 2.4.0
>
>
> While testing Streams in EOS mode under frequent and heavy network 
> partitions, I've encountered the following error, leading to thread death:
> {noformat}
> [2019-11-26 04:54:02,650] ERROR 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Failed 
> to rebalance.
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:739)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] failed 
> to suspend stream tasks
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:253)
>   at 
> org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:116)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:291)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onLeavePrepare(ConsumerCoordinator.java:707)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.unsubscribe(KafkaConsumer.java:1073)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.enforceRebalance(StreamThread.java:716)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:710)
>   ... 1 more
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task 
> [1_1] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-07
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:279)
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:175)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:581)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:535)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:660)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:628)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendRunningTasks(AssignedStreamsTasks.java:145)
>   at 
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendOrCloseTasks(AssignedStreamsTasks.java:128)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:246)
>   ... 7 more
> Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.
> [2019-11-26 04:54:02,650] INFO 
> [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] 
> stream-thread 
> 

[jira] [Commented] (KAFKA-8948) Flaky Test StreamsEosTest#test_rebalance_complex is broken for multiple runs

2019-12-03 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987334#comment-16987334
 ] 

Sophie Blee-Goldman commented on KAFKA-8948:


Turns out this was most likely a real issue that just happened to disappear 
again for some reason – a bug in this system test was discovered to be causing 
failures after subsequent runs on the same machine since it did not properly 
clean up after the test. That bug (and most likely also this ticket) should be 
fixed by [https://github.com/apache/kafka/pull/7693]

> Flaky Test StreamsEosTest#test_rebalance_complex is broken for multiple runs
> 
>
> Key: KAFKA-8948
> URL: https://issues.apache.org/jira/browse/KAFKA-8948
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Blocker
>  Labels: system-tests
> Attachments: KafkaService-worker5-server-start-stdout-stderr.log.tgz, 
> KafkaService-worker6-debug.tgz, 
> KafkaService-worker6-server-start-stdout-stderr.log.tgz, 
> KafkaService-worker7-debug.tgz, 
> KafkaService-worker7-server-start-stdout-stderr.log.tgz, 
> StreamsComplexEosTestDriverService-streams.log.tgz, 
> StreamsComplexEosTestDriverService-streams.stderr.tgz, 
> StreamsComplexEosTestDriverService-streams.stdout.tgz, 
> StreamsComplexEosTestJobRunningService-worker10-streams.log.tgz, 
> StreamsComplexEosTestJobRunningService-worker10-streams.stderr.tgz, 
> StreamsComplexEosTestJobRunningService-worker10-streams.stdout.tgz, 
> StreamsComplexEosTestJobRunningService-worker11-streams.log.tgz, 
> StreamsComplexEosTestJobRunningService-worker11-streams.stderr.tgz, 
> StreamsComplexEosTestJobRunningService-worker11-streams.stdout.tgz, 
> StreamsComplexEosTestJobRunningService-worker9-streams.log.tgz, 
> StreamsComplexEosTestJobRunningService-worker9-streams.stderr.tgz, 
> StreamsComplexEosTestJobRunningService-worker9-streams.stdout.tgz, 
> StreamsComplexEosTestVerifyRunnerService-streams.log.tgz, 
> StreamsComplexEosTestVerifyRunnerService-streams.stderr.tgz, 
> StreamsComplexEosTestVerifyRunnerService-streams.stdout.tgz, report.txt, 
> test_log.debug, test_log.info
>
>
> When kicking off multiple runs of the system tests with `–repeat N` the 
> StreamsEosTest#test_rebalance_complex passes the first time but fails for all 
> subsequent runs. Logs from failing run attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9265) kafka.log.Log instances are leaking on log delete

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987300#comment-16987300
 ] 

ASF GitHub Bot commented on KAFKA-9265:
---

soondenana commented on pull request #7773: KAFKA-9265: Fix kafka.log.Log 
instance leak on log deletion
URL: https://github.com/apache/kafka/pull/7773
 
 
   KAFKA-8448 fixes a problem with a similar leak. The Log objects are being
   held in the ScheduledExecutor PeriodicProducerExpirationCheck callback. The
   fix in KAFKA-8448 was to change the policy of ScheduledExecutor to
   remove the scheduled task when it gets canceled (by calling
   setRemoveOnCancelPolicy(true)).
   
   This works when a log is closed using the close() method. But when a log is
   deleted, either when the topic gets deleted or when a rebalancing
   operation moves the replica away from the broker, the delete() operation is
   invoked. Log.delete() doesn't close the pending scheduled task, and that
   leaks the Log instance.
   
   The fix is to close the scheduled task in the Log.delete() method too.
   
   Tested with and without this fix: Log instances are no longer leaked and the
   scheduled tasks are gone once the log is deleted.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> kafka.log.Log instances are leaking on log delete
> -
>
> Key: KAFKA-9265
> URL: https://issues.apache.org/jira/browse/KAFKA-9265
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vikas Singh
>Assignee: Vikas Singh
>Priority: Major
>
> KAFKA-8448 fixes a problem with a similar leak. The {{Log}} objects are being 
> held in the {{ScheduledExecutor}} {{PeriodicProducerExpirationCheck}} callback. 
> The fix in KAFKA-8448 was to change the policy of {{ScheduledExecutor}} to 
> remove the scheduled task when it gets canceled (by calling 
> {{setRemoveOnCancelPolicy(true)}}).
> This works when a log is closed using the {{close()}} method. But when a log is 
> deleted, either when the topic gets deleted or when a rebalancing operation 
> moves the replica away from the broker, the {{delete()}} operation is invoked. 
> {{Log.delete()}} doesn't close the pending scheduled task, and that leaks the 
> Log instance.
> The fix is to close the scheduled task in the {{Log.delete()}} method too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9261) NPE when updating client metadata

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987293#comment-16987293
 ] 

ASF GitHub Bot commented on KAFKA-9261:
---

hachikuji commented on pull request #7772: KAFKA-9261; Client should handle 
inconsistent leader metadata
URL: https://github.com/apache/kafka/pull/7772
 
 
   This is a reduced-scope fix. The purpose of this patch is to ensure that 
partition leader state is kept in sync with broker metadata in `MetadataCache` 
and consequently in `Cluster`. Due to the possibility of metadata event 
reordering, it was possible for this state to be inconsistent, which could lead 
to an NPE in some cases. The test case here provides a specific scenario where 
this could happen.
   
   Also see #7770 for additional detail. 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NPE when updating client metadata
> -
>
> Key: KAFKA-9261
> URL: https://issues.apache.org/jira/browse/KAFKA-9261
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at 
> org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at 
> org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at 
> org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node 
> information must also be available. There are at least a couple possible 
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using leader epoch 
> information available. If stale partition metadata is detected, the client 
> ignores it and uses the last known metadata. However, it cannot detect stale 
> broker information and will always accept the latest update. This means that 
> the latest metadata may be a mix of multiple metadata responses and therefore 
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata 
> and the live broker when handling a Metadata request. This means an 
> UpdateMetadata request can arrive concurrently and break the intended 
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely 
> rare. Case 1 was only made possible with KIP-320, which added the leader 
> epoch tracking. It should also be rare, but the window for inconsistent 
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates 
> and not assume that the leader is among the live endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7016) Reconsider the "avoid the expensive and useless stack trace for api exceptions" practice

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987289#comment-16987289
 ] 

ASF GitHub Bot commented on KAFKA-7016:
---

ableegoldman commented on pull request #7756: KAFKA-7016: Don't fill in stack 
traces
URL: https://github.com/apache/kafka/pull/7756
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reconsider the "avoid the expensive and useless stack trace for api 
> exceptions" practice
> 
>
> Key: KAFKA-7016
> URL: https://issues.apache.org/jira/browse/KAFKA-7016
> Project: Kafka
>  Issue Type: Bug
>Reporter: Martin Vysny
>Priority: Major
>
> I am trying to write a Kafka Consumer; upon running it only prints out:
> {{org.apache.kafka.common.errors.InvalidGroupIdException: The configured 
> groupId is invalid}}
> Note that the stack trace is missing, so I have no information about which 
> part of my code is bad and needs fixing; I also have no information about 
> which Kafka Client method has been called. Upon closer examination I found 
> this in ApiException:
>  
> {code}
> /* avoid the expensive and useless stack trace for api exceptions */
> @Override
> public Throwable fillInStackTrace() {
>     return this;
> }
> {code}
>  
> I think it is a bad practice to hide all useful debugging info and trade it 
> for dubious performance gains. Exceptions are for exceptional code flow, which 
> is allowed to be slow.
>  
> This applies to kafka-clients 1.1.0
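
For illustration only (a minimal standalone example, not code from the ticket): the snippet 
below shows why overriding {{fillInStackTrace()}} to return {{this}} produces exactly the 
symptom reported above, an exception that prints its message but no frames pointing back to 
the calling code.

{code}
// Minimal demonstration (not Kafka code): overriding fillInStackTrace() to
// return `this` leaves the exception with an empty stack trace, so callers
// cannot see where it was thrown from.
public class StackTraceDemo {

    static class QuietException extends RuntimeException {
        QuietException(String message) {
            super(message);
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip capturing the stack, as ApiException does
        }
    }

    public static void main(String[] args) {
        try {
            throw new QuietException("The configured groupId is invalid");
        } catch (QuietException e) {
            e.printStackTrace(); // prints the message but zero stack frames
            System.out.println("frames captured: " + e.getStackTrace().length);
        }
    }
}
{code}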



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9265) kafka.log.Log instances are leaking on log delete

2019-12-03 Thread Vikas Singh (Jira)
Vikas Singh created KAFKA-9265:
--

 Summary: kafka.log.Log instances are leaking on log delete
 Key: KAFKA-9265
 URL: https://issues.apache.org/jira/browse/KAFKA-9265
 Project: Kafka
  Issue Type: Bug
Reporter: Vikas Singh


KAFKA-8448 fixes a problem with a similar leak. The {{Log}} objects are being held 
in the {{ScheduledExecutor}} {{PeriodicProducerExpirationCheck}} callback. The fix 
in KAFKA-8448 was to change the policy of {{ScheduledExecutor}} to remove the 
scheduled task when it gets canceled (by calling 
{{setRemoveOnCancelPolicy(true)}}).

This works when a log is closed using the {{close()}} method. But when a log is 
deleted, either when the topic gets deleted or when a rebalancing operation 
moves the replica away from the broker, the {{delete()}} operation is invoked. 
{{Log.delete()}} doesn't cancel the pending scheduled task, and that leaks the 
{{Log}} instance.

The fix is to cancel the scheduled task in the {{Log.delete()}} method too.
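
For illustration of the mechanism (hypothetical class names, not the actual broker code): a 
periodic task scheduled on a {{ScheduledThreadPoolExecutor}} keeps a strong reference to its 
owner until the task is cancelled, and with {{setRemoveOnCancelPolicy(true)}} the cancel also 
removes it from the executor's queue, which is why an uncancelled task on the delete path 
pins the object.

{code}
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only (not kafka.log.Log): the scheduled lambda captures
// `this`, so the owner cannot be garbage collected until the task is cancelled
// and removed from the executor's work queue.
public class PeriodicCheckLeakSketch {

    static final ScheduledThreadPoolExecutor SCHEDULER = new ScheduledThreadPoolExecutor(1);
    static {
        // Without this, cancelled tasks linger in the queue until they would
        // have fired, still referencing their owner (the KAFKA-8448 fix).
        SCHEDULER.setRemoveOnCancelPolicy(true);
    }

    private final ScheduledFuture<?> periodicCheck;

    PeriodicCheckLeakSketch() {
        // The lambda captures `this`, so the scheduled task pins this object.
        periodicCheck = SCHEDULER.scheduleAtFixedRate(this::check, 10, 10, TimeUnit.MINUTES);
    }

    private void check() {
        // stand-in for PeriodicProducerExpirationCheck-style work
    }

    // Analogous to close(): cancel the task so the object can be collected.
    public void close() {
        periodicCheck.cancel(false);
    }

    // Analogous to delete(): if this path forgets to cancel the task, the
    // executor still holds the future (and therefore this object) -- a leak.
    public void delete() {
        periodicCheck.cancel(false); // the proposed fix: cancel here too
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicCheckLeakSketch sketch = new PeriodicCheckLeakSketch();
        sketch.delete(); // with the cancel in place, nothing is retained
        SCHEDULER.shutdown();
        SCHEDULER.awaitTermination(1, TimeUnit.SECONDS);
    }
}
{code}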



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9265) kafka.log.Log instances are leaking on log delete

2019-12-03 Thread Vikas Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Singh reassigned KAFKA-9265:
--

Assignee: Vikas Singh

> kafka.log.Log instances are leaking on log delete
> -
>
> Key: KAFKA-9265
> URL: https://issues.apache.org/jira/browse/KAFKA-9265
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vikas Singh
>Assignee: Vikas Singh
>Priority: Major
>
> KAFKA-8448 fixes a problem with a similar leak. The {{Log}} objects are being 
> held in the {{ScheduledExecutor}} {{PeriodicProducerExpirationCheck}} callback. 
> The fix in KAFKA-8448 was to change the policy of {{ScheduledExecutor}} to 
> remove the scheduled task when it gets canceled (by calling 
> {{setRemoveOnCancelPolicy(true)}}).
> This works when a log is closed using the {{close()}} method. But when a log is 
> deleted, either when the topic gets deleted or when a rebalancing operation 
> moves the replica away from the broker, the {{delete()}} operation is invoked. 
> {{Log.delete()}} doesn't cancel the pending scheduled task, and that leaks the 
> {{Log}} instance.
> The fix is to cancel the scheduled task in the {{Log.delete()}} method too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-03 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987281#comment-16987281
 ] 

John Roesler commented on KAFKA-6718:
-

Hey [~lkokhreidze]. I know Matthias pinged me a while back... I've just kept 
re-queuing the task of looking at this ticket from one day to the next. Sorry 
about that.

I actually don't think that there's anything specific to worry about with 
respect to KIP-441. In my opinion, you can just design your feature against the 
current state of Streams, and whoever loses the race would just have to deal 
with adjusting the implementation to take the other feature into account.

For what it's worth, though, I don't think there's too much semantic overlap, 
just the practical overlap that both efforts affect the same protocol and 
module. That's why I don't think we need to wait for each other _or_ try to 
unify the efforts.

I don't know if you have any specific ideas, and I haven't had a chance to look 
into the details of the old work on this ticket. (This is what I haven't been 
able to find time to do). But sharing my immediate thoughts, I've used the 
"rack awareness" feature in Elasticsearch, and felt that it was a pretty 
straightforward and reasonable way to implement it. See 
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html
 for more information.

Basically, they let you add some arbitrary "tags"/"attrs" to each instance (in 
the config), and then there's a "rack awareness" config that takes a list of 
"tags" to consider. It's a pretty simple, but also powerful, arrangement. You 
might also want to consider the "forced awareness" section, which is in 
response to a concern that also applies to us.
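
To make the Elasticsearch-style arrangement a bit more concrete, here is a purely 
hypothetical sketch of what tag-based configuration could look like for a Streams instance. 
None of these property names exist today; they are only meant to illustrate the "arbitrary 
instance tags plus an awareness config listing which tags to consider" idea.

{code}
import java.util.Properties;

// Hypothetical illustration only: these property names are not Kafka Streams
// configs; they mirror Elasticsearch's node attribute / awareness model.
public class RackAwarenessConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Arbitrary instance-level tags, set per Streams instance.
        props.put("instance.tag.rack", "rack-a");
        props.put("instance.tag.zone", "eu-west-1a");

        // Which tags the standby assignor should try to spread replicas across.
        props.put("standby.replicas.awareness", "rack,zone");

        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
{code}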

Thanks for picking it up!

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, this 
> provides fault tolerance: if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented in Kafka via 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed something similar for task assignment at the Kafka Streams 
> application layer. 
>   
>  This feature enables replica tasks to be assigned to different racks for 
> fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware.
>  # If the same rackId is given to all the nodes: the cluster will behave 
> rack-unaware.
>  # If (totalTasks <= number of racks), then the cluster will be rack aware, 
> i.e. each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks to 
> different racks; further tasks will be assigned to the least loaded node, 
> cluster wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG", which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on the same rack if possible.
>  Post that, it also helps to maintain stickiness within the rack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9184) Redundant task creation and periodic rebalances after zombie worker rejoins the group

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987279#comment-16987279
 ] 

ASF GitHub Bot commented on KAFKA-9184:
---

kkonstantine commented on pull request #7771: KAFKA-9184: Redundant task 
creation and periodic rebalances after zombie Connect worker rejoins the group
URL: https://github.com/apache/kafka/pull/7771
 
 
   Zombie workers, defined as workers that lose connectivity with the Kafka 
broker coordinator and get kicked out of the group but don't experience a JVM 
restart, have been keeping their tasks running. This side-effect is more 
disruptive with the new Incremental Cooperative rebalance protocol. When such 
workers return: 
   a) they join the group with their existing assignment, which leads to redundant 
tasks running in the Connect cluster, and
   b) they interfere with the computation of lost tasks, which before this fix 
would lead to the scheduled rebalance delay not being reset correctly back to 
0. This results in periodic rebalances. 
   
   This fix focuses on resolving the above side-effects as follows: 
   * Each worker now tracks its connectivity with the broker coordinator in a 
non-blocking manner. This allows the worker to detect that the broker coordinator 
is unreachable. The timeout is set to be equal to the heartbeat interval. If 
during this time the connection remains inactive, the worker will proactively 
stop all its connectors and tasks and will keep attempting to connect to the 
coordinator. 
   * The incremental cooperative assignor will keep the delay to a positive 
value as long as it can detect lost tasks. If the set of tasks that are 
computed as lost becomes empty, the delay will be set to zero and no additional 
rebalancing will be scheduled. 
   
   Besides the test included in this PR, the improvements are being tested with 
a framework that deploys a Connect cluster on docker images and introduces 
network partitions between all or selected workers and the Kafka brokers. 
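
   Purely as a sketch of the first bullet above (hypothetical names, not the actual Connect 
worker/herder code): the worker can compare the time since it last heard from the coordinator 
against the heartbeat interval without blocking, and stop its connectors and tasks once that 
window is exceeded while it keeps trying to rejoin.

{code}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a non-blocking coordinator liveness check; the real
// change lives in Connect's worker code, not in a class like this.
public class CoordinatorLivenessSketch {

    private final long heartbeatIntervalMs;
    private volatile long lastCoordinatorContactMs;
    private volatile boolean tasksStopped = false;

    CoordinatorLivenessSketch(long heartbeatIntervalMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.lastCoordinatorContactMs = System.currentTimeMillis();
    }

    // Called whenever any response from the broker coordinator is seen.
    void recordCoordinatorContact() {
        lastCoordinatorContactMs = System.currentTimeMillis();
        tasksStopped = false;
    }

    // Called from the worker's main loop; never blocks.
    void maybeStopWorkIfCoordinatorUnreachable() {
        long silentForMs = System.currentTimeMillis() - lastCoordinatorContactMs;
        if (!tasksStopped && silentForMs > heartbeatIntervalMs) {
            stopConnectorsAndTasks(); // avoid running redundant "zombie" tasks
            tasksStopped = true;      // keep trying to rejoin in the background
        }
    }

    private void stopConnectorsAndTasks() {
        System.out.println("coordinator unreachable, stopping local connectors and tasks");
    }

    public static void main(String[] args) throws InterruptedException {
        CoordinatorLivenessSketch sketch = new CoordinatorLivenessSketch(100);
        TimeUnit.MILLISECONDS.sleep(150);               // simulate losing the coordinator
        sketch.maybeStopWorkIfCoordinatorUnreachable(); // stops local work once
    }
}
{code}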
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Redundant task creation and periodic rebalances after zombie worker rejoins 
> the group
> -
>
> Key: KAFKA-9184
> URL: https://issues.apache.org/jira/browse/KAFKA-9184
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.3.2
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
>
> First reported here: 
> https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers
> There seems to be an issue with task reassignment when a worker rejoins after 
> an unsuccessful join request. The worker seems to be outside the group for a 
> generation but when it joins again the same task is running in more than one 
> worker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9184) Redundant task creation and periodic rebalances after zombie worker rejoins the group

2019-12-03 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-9184:
--
Summary: Redundant task creation and periodic rebalances after zombie 
worker rejoins the group  (was: Redundant task creation after worker fails to 
join a specific group generation)

> Redundant task creation and periodic rebalances after zombie worker rejoins 
> the group
> -
>
> Key: KAFKA-9184
> URL: https://issues.apache.org/jira/browse/KAFKA-9184
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.2
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 2.3.2
>
>
> First reported here: 
> https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers
> There seems to be an issue with task reassignment when a worker rejoins after 
> an unsuccessful join request. The worker seems to be outside the group for a 
> generation but when it joins again the same task is running in more than one 
> worker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9184) Redundant task creation and periodic rebalances after zombie worker rejoins the group

2019-12-03 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-9184:
--
Fix Version/s: 2.4.0

> Redundant task creation and periodic rebalances after zombie worker rejoins 
> the group
> -
>
> Key: KAFKA-9184
> URL: https://issues.apache.org/jira/browse/KAFKA-9184
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.3.2
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
>
> First reported here: 
> https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers
> There seems to be an issue with task reassignment when a worker rejoins after 
> an unsuccessful join request. The worker seems to be outside the group for a 
> generation but when it joins again the same task is running in more than one 
> worker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9184) Redundant task creation and periodic rebalances after zombie worker rejoins the group

2019-12-03 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-9184:
--
Affects Version/s: 2.4.0

> Redundant task creation and periodic rebalances after zombie worker rejoins 
> the group
> -
>
> Key: KAFKA-9184
> URL: https://issues.apache.org/jira/browse/KAFKA-9184
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.3.2
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 2.3.2
>
>
> First reported here: 
> https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers
> There seems to be an issue with task reassignment when a worker rejoins after 
> an unsuccessful join request. The worker seems to be outside the group for a 
> generation but when it joins again the same task is running in more than one 
> worker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9261) NPE when updating client metadata

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987217#comment-16987217
 ] 

ASF GitHub Bot commented on KAFKA-9261:
---

hachikuji commented on pull request #7770: KAFKA-9261; Client should handle 
unavailable leader metadata
URL: https://github.com/apache/kafka/pull/7770
 
 
   The client caches metadata fetched from Metadata requests. Previously, each 
metadata response overwrote all of the metadata from the previous one, so we 
could rely on the expectation that the broker only returned the leaderId for a 
partition if it had connection information available. This behavior changed 
with KIP-320 since having the leader epoch allows the client to filter out 
partition metadata which is known to be stale. However, because of this, we can 
no longer rely on the request-level guarantee of leader availability. There is 
no mechanism similar to the leader epoch to track the staleness of broker 
metadata, so we still overwrite all of the broker metadata from each response, 
which means that the partition metadata can get out of sync with the broker 
metadata in the client's cache. Hence it is no longer safe to validate inside 
the `Cluster` constructor that each leader has an associated `Node`.
   
   Fixing this issue was unfortunately not straightforward because the cache 
was built to maintain references to broker metadata through the `Node` object 
at the partition level. In order to keep the state consistent, each `Node` 
reference would need to be updated based on the new broker metadata. Instead of 
doing that, this patch changes the cache so that it is structured more closely 
with the Metadata response schema. Broker node information is maintained at the 
top level in a single collection and cached partition metadata only references 
the id of the broker. To accommodate this, we have removed 
`PartitionInfoAndEpoch` and we have altered 
`MetadataResponse.PartitionMetadata` to eliminate its `Node` references.
   
   Note that one of the side benefits of the refactor here is that we virtually 
eliminate one of the hotspots in Metadata request handling in 
`MetadataCache.getEndpoints` (which was renamed to `maybeFilterAliveReplicas`). 
The only reason this was expensive was that we had to build a new collection 
for the `Node` representations of each of the replica lists. This information 
was doomed to just get discarded on serialization, so the whole effort was 
wasteful. Now, we work with the lower level id lists and no copy of the 
replicas is needed (at least for all versions other than 0).
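
   Not the actual `MetadataCache` code, but a minimal sketch of the structure described 
above (hypothetical types): broker nodes live in one top-level map, partition metadata 
stores only the leader's broker id, and the node is resolved lazily, so a broker missing 
from a stale or mixed response yields an empty result instead of an NPE.

{code}
import java.util.Map;
import java.util.Optional;

// Simplified sketch (hypothetical types) of a client-side metadata cache that
// mirrors the Metadata response schema: nodes at the top level, partitions
// referencing brokers only by id.
public class ClientMetadataCacheSketch {

    record Node(int id, String host, int port) {}
    record PartitionMetadata(String topic, int partition, int leaderId) {}

    private final Map<Integer, Node> nodesById;
    private final Map<String, PartitionMetadata> partitions;

    ClientMetadataCacheSketch(Map<Integer, Node> nodesById,
                              Map<String, PartitionMetadata> partitions) {
        this.nodesById = nodesById;
        this.partitions = partitions;
    }

    // Resolve the leader lazily; return empty instead of assuming the Node exists.
    Optional<Node> leaderFor(String topicPartition) {
        PartitionMetadata pm = partitions.get(topicPartition);
        if (pm == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(nodesById.get(pm.leaderId()));
    }

    public static void main(String[] args) {
        ClientMetadataCacheSketch cache = new ClientMetadataCacheSketch(
            Map.of(1, new Node(1, "broker1", 9092)),
            Map.of("topic-0", new PartitionMetadata("topic", 0, 1),
                   "topic-1", new PartitionMetadata("topic", 1, 2))); // broker 2 unknown

        System.out.println(cache.leaderFor("topic-0")); // leader resolved
        System.out.println(cache.leaderFor("topic-1")); // Optional.empty, no NPE
    }
}
{code}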
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NPE when updating client metadata
> -
>
> Key: KAFKA-9261
> URL: https://issues.apache.org/jira/browse/KAFKA-9261
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at 
> org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at 
> org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at 
> org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at 
> 

[jira] [Commented] (KAFKA-9154) ProducerId generation should be managed by the Controller

2019-12-03 Thread Satish Duggana (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987156#comment-16987156
 ] 

Satish Duggana commented on KAFKA-9154:
---

[~cmccabe] Please let me know if I can take this up. 

> ProducerId generation should be managed by the Controller
> -
>
> Key: KAFKA-9154
> URL: https://issues.apache.org/jira/browse/KAFKA-9154
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Viktor Somogyi-Vass
>Assignee: Colin McCabe
>Priority: Major
>
> Currently producerIds are maintained in Zookeeper but in the future we'd like 
> them to be managed by the controller quorum in an internal topic.
> The reason for storing this in Zookeeper was that the id must be unique across 
> the cluster. In this task it should be refactored such that the 
> TransactionManager turns to the Controller for a ProducerId, and the Controller 
> connects to Zookeeper to acquire the ID. Since ZK is the single source of truth 
> and the PID won't be cached anywhere, it should be safe (just one extra hop 
> added).
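
To make the flow in the description concrete, a purely hypothetical sketch (none of these 
class names exist in Kafka, and the in-memory counter only stands in for the ZooKeeper-backed 
one): every allocation request is forwarded through the controller to the single source of 
truth, so ids stay unique cluster-wide without broker-side caching.

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the described flow; real allocation would go through
// the controller and ZooKeeper, not an in-memory counter.
public class ProducerIdAllocationSketch {

    // Stand-in for the ZooKeeper-backed counter that is the single source of truth.
    static class ZkCounterStub {
        private final AtomicLong nextId = new AtomicLong(0);
        long getAndIncrement() {
            return nextId.getAndIncrement();
        }
    }

    // Stand-in for the controller: every request is forwarded to the counter,
    // so ids stay unique across the cluster without any broker-side caching.
    static class ControllerStub {
        private final ZkCounterStub zk = new ZkCounterStub();
        long allocateProducerId() {
            return zk.getAndIncrement(); // the "one extra hop"
        }
    }

    public static void main(String[] args) {
        ControllerStub controller = new ControllerStub();
        // A broker's transaction manager asking for ids on behalf of producers:
        System.out.println("pid = " + controller.allocateProducerId()); // 0
        System.out.println("pid = " + controller.allocateProducerId()); // 1
    }
}
{code}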



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8937) Flaky Test SaslPlainSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-12-03 Thread Rajini Sivaram (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987067#comment-16987067
 ] 

Rajini Sivaram commented on KAFKA-8937:
---

This is a bug in the implementation and is being fixed under KAFKA-9190.

> Flaky Test 
> SaslPlainSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> --
>
> Key: KAFKA-8937
> URL: https://issues.apache.org/jira/browse/KAFKA-8937
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Sophie Blee-Goldman
>Priority: Major
>  Labels: flaky-test
>
> h3. Error Message
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records
> h3. Stacktrace
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:530)
>  at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
> at org.scalatest.Assertions$class.fail(Assertions.scala:1091) at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:830) at 
> kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:792) at 
> kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1324) at 
> kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1333) at 
> kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:529)
>  at 
> kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:368)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>  at 
> 

[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987057#comment-16987057
 ] 

ASF GitHub Bot commented on KAFKA-9203:
---

ijuma commented on pull request #7769: KAFKA-9203: Revert "MINOR: Remove 
workarounds for lz4-java bug affecting byte buffers (#6679)"
URL: https://github.com/apache/kafka/pull/7769
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> kafka-client 2.3.1 fails to consume lz4 compressed topic
> 
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka cluster on 2.1.1.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Affects Version/s: 2.4.0

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Priority: Major
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Fix Version/s: (was: 1.1.0)

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: John Roesler
>Priority: Major
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> 

[jira] [Updated] (KAFKA-9264) Reocurrence: Flaky Test DelegationTokenEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9264:

Reporter: John Roesler  (was: Matthias J. Sax)

> Reocurrence: Flaky Test 
> DelegationTokenEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> --
>
> Key: KAFKA-9264
> URL: https://issues.apache.org/jira/browse/KAFKA-9264
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Reporter: John Roesler
>Priority: Major
>  Labels: flaky-test
>
> This test has failed again for me, so apparently it's not fixed:
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/DelegationTokenEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records
> Stacktrace
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:796)
>   at 
> kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1335)
>   at kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1343)
>   at 
> kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:530)
>   at 
> kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:369)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Description: 
This test has failed for me on 
https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/

{noformat}

{noformat}

  was:
Saw this error once on Jenkins: 
https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/3025/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/

{code}
Stacktrace

java.lang.AssertionError: timed out waiting for message produce
at kafka.utils.TestUtils$.fail(TestUtils.scala:347)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:861)
at 
kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:357)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:844)
Standard Output

[2017-12-07 19:22:56,297] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:22:59,447] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:22:59,453] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:01,335] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134641238966279' does not match current 
session '99134641238966277' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:04,695] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:04,760] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:06,764] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134641586700293' does not match current 
session '99134641586700295' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:09,379] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:09,387] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:11,533] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:11,539] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:13,022] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134642031034375' does not match current 
session '99134642031034373' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:14,667] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:14,673] ERROR 

[jira] [Assigned] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler reassigned KAFKA-9263:
---

Assignee: (was: Dong Lin)

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Guozhang Wang
>Priority: Major
> Fix For: 1.1.0
>
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Reporter: John Roesler  (was: Guozhang Wang)

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: John Roesler
>Priority: Major
> Fix For: 1.1.0
>
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Component/s: (was: unit tests)
 clients

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Priority: Major
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Labels: flaky-test  (was: )

> Reocurrence: Transient failure in 
> kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
> ---
>
> Key: KAFKA-9263
> URL: https://issues.apache.org/jira/browse/KAFKA-9263
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.4.0
>Reporter: John Roesler
>Priority: Major
>  Labels: flaky-test
>
> This test has failed for me on 
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
> within timeout after replica movement. Producer future 
> Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
> 1 ms.))
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1091)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>   at 
> kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> 

[jira] [Created] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)
John Roesler created KAFKA-9263:
---

 Summary: Reocurrence: Transient failure in 
kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs
 Key: KAFKA-9263
 URL: https://issues.apache.org/jira/browse/KAFKA-9263
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Reporter: Guozhang Wang
Assignee: Dong Lin
 Fix For: 1.1.0


Saw this error once on Jenkins: 
https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/3025/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/

{code}
Stacktrace

java.lang.AssertionError: timed out waiting for message produce
at kafka.utils.TestUtils$.fail(TestUtils.scala:347)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:861)
at 
kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:357)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:844)
Standard Output

[2017-12-07 19:22:56,297] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:22:59,447] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:22:59,453] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:01,335] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134641238966279' does not match current 
session '99134641238966277' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:04,695] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:04,760] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:06,764] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134641586700293' does not match current 
session '99134641586700295' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:09,379] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:09,387] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:11,533] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:11,539] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-12-07 19:23:13,022] ERROR Error while creating ephemeral at /controller, 
node already exists and owner '99134642031034375' does not match current 
session '99134642031034373' (kafka.zk.KafkaZkClient$CheckedEphemeral:71)
[2017-12-07 19:23:14,667] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action 

[jira] [Updated] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.AdminClientIntegrationTest.testLogStartOffsetCheckpointkafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9263:

Description: 
This test has failed for me on 
https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/

{noformat}
Error Message
org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
within timeout after replica movement. Producer future 
Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
1 ms.))
Stacktrace
org.scalatest.exceptions.TestFailedException: only 0 messages are produced 
within timeout after replica movement. Producer future 
Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 
1 ms.))
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
at org.scalatest.Assertions.fail(Assertions.scala:1091)
at org.scalatest.Assertions.fail$(Assertions.scala:1087)
at org.scalatest.Assertions$.fail(Assertions.scala:1389)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
at 
kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:834)
Standard Output
[2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=0] Error for partition topic-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
fetcherId=0] Error for partition topic-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=0] Error for partition unclean-test-topic-1-1 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:35,263] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition elect-preferred-leaders-topic-1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:35,264] ERROR [ReplicaFetcher replicaId=2, leaderId=0, 
fetcherId=0] Error for partition elect-preferred-leaders-topic-1-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-12-03 04:54:35,463] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition 

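For context on what testAlterReplicaLogDirs exercises, below is a minimal sketch of the replica log-directory movement the test drives through the Admin client before producing and waiting (TestUtils.waitUntilTrue-style) for records to land after the move. This is not the test's code: the bootstrap address, topic name, broker id, and target directory are placeholders.

{code}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeReplicaLogDirsResult;
import org.apache.kafka.common.TopicPartitionReplica;

public class AlterReplicaLogDirsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Ask broker 0 to move the replica of partition 0 to another configured log dir.
            // Topic name, broker id, and target directory are illustrative only.
            TopicPartitionReplica replica = new TopicPartitionReplica("topic", 0, 0);
            Map<TopicPartitionReplica, String> assignment =
                    Collections.singletonMap(replica, "/tmp/kafka-logs-2");
            admin.alterReplicaLogDirs(assignment).all().get();

            // The integration test then keeps producing and waits until records land after
            // the movement; here we just print where the replica currently lives.
            DescribeReplicaLogDirsResult dirs =
                    admin.describeReplicaLogDirs(Collections.singleton(replica));
            System.out.println(dirs.all().get().get(replica).getCurrentReplicaLogDir());
        }
    }
}
{code}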
[jira] [Updated] (KAFKA-9264) Reocurrence: Flaky Test DelegationTokenEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9264:

Description: 
This test has failed again for me, so apparently it's not fixed:
https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/DelegationTokenEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/

{noformat}
Error Message
org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout 
instead of the expected 1 records
Stacktrace
org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout 
instead of the expected 1 records
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
at org.scalatest.Assertions.fail(Assertions.scala:1091)
at org.scalatest.Assertions.fail$(Assertions.scala:1087)
at org.scalatest.Assertions$.fail(Assertions.scala:1389)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
at kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:796)
at 
kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1335)
at kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1343)
at 
kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:530)
at 
kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:369)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
at 

[jira] [Created] (KAFKA-9264) Reocurrence: Flaky Test DelegationTokenEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-12-03 Thread John Roesler (Jira)
John Roesler created KAFKA-9264:
---

 Summary: Reocurrence: Flaky Test 
DelegationTokenEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
 Key: KAFKA-9264
 URL: https://issues.apache.org/jira/browse/KAFKA-9264
 Project: Kafka
  Issue Type: Bug
  Components: core, security, unit tests
Reporter: Matthias J. Sax


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/2645/testReport/junit/kafka.api/DelegationTokenEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/]
{quote}org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
timeout instead of the expected 1 records at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) at 
org.scalatest.Assertions.fail(Assertions.scala:1091) at 
org.scalatest.Assertions.fail$(Assertions.scala:1087) at 
org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:841) at 
kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1342) at 
kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:529)
 at 
kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:368){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
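The "Consumed 0 records before timeout instead of the expected 1 records" assertion in the stack trace above comes from a wait-for-records helper. Below is a rough sketch of that pattern; the real kafka.utils.TestUtils.pollUntilAtLeastNumRecords differs in detail, and this is only meant to show where the failure message originates.

{code}
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public final class PollUntil {
    // Poll until at least `expected` records arrive or the timeout elapses. This mirrors
    // the shape of the helper behind the failure message, not its exact implementation.
    public static <K, V> List<ConsumerRecord<K, V>> pollUntilAtLeast(
            Consumer<K, V> consumer, int expected, long timeoutMs) {
        List<ConsumerRecord<K, V>> collected = new ArrayList<>();
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline && collected.size() < expected) {
            consumer.poll(Duration.ofMillis(100)).forEach(collected::add);
        }
        if (collected.size() < expected) {
            throw new AssertionError("Consumed " + collected.size()
                    + " records before timeout instead of the expected " + expected + " records");
        }
        return collected;
    }
}
{code}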


[jira] [Reopened] (KAFKA-8937) Flaky Test SaslPlainSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-12-03 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler reopened KAFKA-8937:
-

This test has just failed again for me. Apparently, it is not fixed.

Unfortunately, Jenkins didn't capture the logs or exception, so the only thing 
I have is the console output:
{noformat}

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
timeout instead of the expected 1 records
{noformat}

I doubt this is enough to go on, though...

https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/26926/consoleText

> Flaky Test 
> SaslPlainSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> --
>
> Key: KAFKA-8937
> URL: https://issues.apache.org/jira/browse/KAFKA-8937
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Sophie Blee-Goldman
>Priority: Major
>  Labels: flaky-test
>
> h3. Error Message
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records
> h3. Stacktrace
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:530)
>  at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
> at org.scalatest.Assertions$class.fail(Assertions.scala:1091) at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:830) at 
> kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:792) at 
> kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1324) at 
> kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1333) at 
> kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:529)
>  at 
> kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:368)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at 
> 
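The test behind both KAFKA-8937 and KAFKA-9264 is named after topic Describe ACLs. For readers unfamiliar with that setup, here is a sketch of how such an ACL can be granted through the Admin client; the principal, topic name, and bootstrap address are placeholders, and the actual end-to-end authorization tests wire their ACLs up through their own harness.

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantTopicDescribe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Allow a (hypothetical) client principal to Describe a single topic.
            AclBinding binding = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "e2etopic", PatternType.LITERAL),
                    new AccessControlEntry("User:client", "*",
                            AclOperation.DESCRIBE, AclPermissionType.ALLOW));
            admin.createAcls(Collections.singleton(binding)).all().get();
        }
    }
}
{code}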

[jira] [Commented] (KAFKA-9173) StreamsPartitionAssignor assigns partitions to only one worker

2019-12-03 Thread Oleg Muravskiy (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987017#comment-16987017
 ] 

Oleg Muravskiy commented on KAFKA-9173:
---

Hi [~ableegoldman],

I think I'm missing something here. I have 210 partitions to consume from, and 
I have configured Streams to run with 200 threads.

Why do I have only 10 tasks?

 

> StreamsPartitionAssignor assigns partitions to only one worker
> --
>
> Key: KAFKA-9173
> URL: https://issues.apache.org/jira/browse/KAFKA-9173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Oleg Muravskiy
>Priority: Major
>  Labels: user-experience
> Attachments: StreamsPartitionAssignor.log
>
>
> I'm running a distributed KafkaStreams application on 10 worker nodes, 
> subscribed to 21 topics with 10 partitions each. I'm only using the 
> Processor interface and a persistent state store.
> However, only one worker gets assigned partitions; all the other workers get 
> nothing. Restarting the application or cleaning the local state stores does not 
> help. StreamsPartitionAssignor migrates to other nodes and eventually picks 
> another node to assign partitions to, but it is still only one node.
> It's difficult to figure out where to look for signs of the problem, so I'm 
> attaching the log messages from StreamsPartitionAssignor. Let me know 
> what else I could provide to help resolve this.
> [^StreamsPartitionAssignor.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
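On the 210-partitions-but-10-tasks question above: Kafka Streams creates tasks per sub-topology, with the task count equal to the highest partition count among that sub-topology's input topics. So 21 topics of 10 partitions feeding a single Processor yields 10 tasks rather than 210, and at most 10 of the 200 stream threads can get work. The sketch below, written against the classic Processor API of the affected 2.3 line with placeholder topic names, illustrates that shape; whether this also explains the single-worker assignment reported in the issue is a separate question.

{code}
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorSupplier;

public class TaskCountSketch {
    // Minimal no-op processor; the application logic is irrelevant to how tasks are counted.
    static class NoOpProcessor extends AbstractProcessor<byte[], byte[]> {
        @Override
        public void process(byte[] key, byte[] value) { /* application logic */ }
    }

    public static void main(String[] args) {
        String[] topics = new String[21];
        for (int i = 0; i < topics.length; i++) {
            topics[i] = "input-" + i; // placeholder topic names, 10 partitions each
        }

        // All 21 topics feed one processor, i.e. one sub-topology.
        Topology topology = new Topology();
        topology.addSource("source", topics);
        ProcessorSupplier<byte[], byte[]> supplier = NoOpProcessor::new;
        topology.addProcessor("proc", supplier, "source");

        // Tasks are created per sub-topology, one per partition of its co-partitioned inputs:
        // 21 topics x 10 partitions collapse into 10 tasks (task 0 reads partition 0 of
        // every topic, and so on), so at most 10 stream threads ever get work.
        System.out.println(topology.describe());
    }
}
{code}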


[jira] [Updated] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar updated KAFKA-9258:
-
Priority: Blocker  (was: Major)

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Cyrus Vafadari
>Priority: Blocker
> Fix For: 2.4.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
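The NPE above is thrown while a task is being recorded as removed. The Worker internals are not quoted in this thread, so the snippet below is only a generic illustration of the failure mode using a hypothetical per-connector counter map: if the connector's entry is already gone when recordTaskRemoved runs, an unguarded dereference throws, and a null check (or a default value) avoids it.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for per-connector task bookkeeping; the real
// Worker.ConnectorStatusMetricsGroup internals are not quoted in this thread.
public class ConnectorTaskCounters {
    private final Map<String, Integer> runningTasks = new ConcurrentHashMap<>();

    public void recordTaskAdded(String connector) {
        runningTasks.merge(connector, 1, Integer::sum);
    }

    public void recordTaskRemoved(String connector) {
        Integer current = runningTasks.get(connector);
        if (current == null) {
            // Entry already gone (e.g. connector torn down concurrently); dereferencing it
            // unguarded is the kind of thing that produces an NPE like the one above.
            return;
        }
        runningTasks.put(connector, Math.max(0, current - 1));
    }
}
{code}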


[jira] [Updated] (KAFKA-9258) Connect ConnectorStatusMetricsGroup sometimes NPE

2019-12-03 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar updated KAFKA-9258:
-
Fix Version/s: 2.4.0

> Connect ConnectorStatusMetricsGroup sometimes NPE
> -
>
> Key: KAFKA-9258
> URL: https://issues.apache.org/jira/browse/KAFKA-9258
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Cyrus Vafadari
>Priority: Major
> Fix For: 2.4.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:901)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTask(Worker.java:720)
>   at 
> org.apache.kafka.connect.runtime.Worker.awaitStopTasks(Worker.java:740)
>   at 
> org.apache.kafka.connect.runtime.Worker.stopAndAwaitTasks(Worker.java:758)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.processTaskConfigUpdatesWithIncrementalCooperative(DistributedHerder.java:575)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:396)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:282)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-9203:
--

Assignee: Ismael Juma

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> 
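To make the reported scenario concrete, here is a minimal producer/consumer pair that exercises the same path: records compressed with lz4 on the way in, then consumed back. The topic name, group id, and bootstrap address are placeholders; the reporter's broker was 2.1.1, and with the affected 2.3.x client the KafkaException quoted above surfaced from poll().

{code}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class Lz4RoundTrip {
    public static void main(String[] args) {
        String topic = "lz4-test";           // placeholder topic
        String bootstrap = "localhost:9092"; // placeholder bootstrap address

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // the compression in question
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>(topic, "payload".getBytes()));
        }

        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "lz4-test-group");
        c.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList(topic));
            // With the affected 2.3.x client this poll() is where the quoted KafkaException
            // ("Received exception when fetching the next record ...") surfaced for the reporter.
            consumer.poll(Duration.ofSeconds(5))
                    .forEach(r -> System.out.println("record at offset " + r.offset()));
        }
    }
}
{code}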

[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-9203:
---
Priority: Blocker  (was: Critical)

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> 

[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-9203:
---
Fix Version/s: 2.3.2
   2.4.0

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Priority: Critical
> Fix For: 2.4.0, 2.3.2
>
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> 

[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic

2019-12-03 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-9203:
---
Summary: kafka-client 2.3.1 fails to consume lz4 compressed topic  (was: 
kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1)

> kafka-client 2.3.1 fails to consume lz4 compressed topic
> 
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 2.4.0, 2.3.2
>
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> 

[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986929#comment-16986929
 ] 

Ismael Juma commented on KAFKA-9203:


[~dwatzke] Thanks for testing! I submitted a PR to revert the change as an 
immediate step and will try to resubmit the original change after I understand 
the underlying issue.
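
For anyone trying to reproduce this locally, a bare-bones consumer loop along the 
lines of the following should be enough (a sketch only: broker address, group id and 
deserializers are illustrative; the topic name is taken from the partition in the 
reported stack trace; with kafka-clients 2.3.0/2.3.1 the poll() call should hit the 
KafkaException quoted in this ticket, while 2.2.0 or the attached 2.3.2-SNAPSHOT jar 
does not):

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Lz4ConsumeRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative
        props.put("group.id", "lz4-repro");                  // illustrative
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Topic name taken from the partition in the reported stack trace.
            consumer.subscribe(Collections.singletonList("FILEREP_LONER_REQUEST"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    System.out.printf("offset=%d%n", record.offset());
                }
            }
        }
    }
}
{code}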

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Priority: Critical
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>    

[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986927#comment-16986927
 ] 

ASF GitHub Bot commented on KAFKA-9203:
---

ijuma commented on pull request #7769: KAFKA-9203: Revert "MINOR: Remove 
workarounds for lz4-java bug affecting byte buffers (#6679)"
URL: https://github.com/apache/kafka/pull/7769
 
 
   This reverts commit 90043d5f as it caused a regression in some cases:
   
   > Caused by: java.io.IOException: Stream frame descriptor corrupted
   > at 
org.apache.kafka.common.record.KafkaLZ4BlockInputStream.readHeader(KafkaLZ4BlockInputStream.java:132)
   > at 
org.apache.kafka.common.record.KafkaLZ4BlockInputStream.<init>(KafkaLZ4BlockInputStream.java:78)
   > at 
org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:110)
   
   I will investigate why afterwards, but I want to get the safe fix into 2.4.0.
   The reporter of KAFKA-9203 has verified that reverting this change
   makes the problem go away.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Priority: Critical
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
> of 2.2.0, I immediately started getting the following exceptions in a loop 
> when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> 

[jira] [Commented] (KAFKA-9071) transactional.id.expiration.ms config value should be implemented as a Long

2019-12-03 Thread Mario Georgiev (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986881#comment-16986881
 ] 

Mario Georgiev commented on KAFKA-9071:
---

Hello [~mjsax], who is appropriate to tag for a review? Or should I just choose 
the lead of the appropriate component?

> transactional.id.expiration.ms config value should be implemented as a Long
> ---
>
> Key: KAFKA-9071
> URL: https://issues.apache.org/jira/browse/KAFKA-9071
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 2.3.0
>Reporter: Joost van de Wijgerd
>Assignee: Mario Georgiev
>Priority: Major
>
> Currently the value of this config parameter is limited to MAX_INT, 
> effectively limiting the transactional id expiration to ~25 days. This is 
> causing some issues for us on our Acceptance environment (which is not used 
> that often / heavily), where our transactional services will start failing 
> because of this issue.
> I believe the best practice for millisecond values is to implement them as 
> a Long and not as an Integer.
> This is currently the max value: transactional.id.expiration.ms=2147483647
> while I would like to set it to: transactional.id.expiration.ms=3154000 
> (i.e. 1 year)
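
For reference, the arithmetic behind the ~25 day ceiling (a standalone sketch, not 
project code):

{code:java}
public class ExpirationMsMath {
    public static void main(String[] args) {
        // The config is currently an Integer, so its ceiling is Integer.MAX_VALUE ms.
        long maxIntMs = Integer.MAX_VALUE;                        // 2147483647 ms
        System.out.println(maxIntMs / (1000L * 60 * 60 * 24));    // prints 24 (roughly 24.8 days)

        // One year in milliseconds does not fit into an int, hence the request for a Long.
        long oneYearMs = 365L * 24 * 60 * 60 * 1000;              // 31536000000 ms
        System.out.println(oneYearMs > Integer.MAX_VALUE);        // true
    }
}
{code}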



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9262) kafka broker fails to response fetching request from consumer when quota limit is violated

2019-12-03 Thread Junqi Zhang (Jira)
Junqi Zhang created KAFKA-9262:
--

 Summary: kafka broker fails to response fetching request from 
consumer when quota limit is violated
 Key: KAFKA-9262
 URL: https://issues.apache.org/jira/browse/KAFKA-9262
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Junqi Zhang


Kafka broker fails to respond to fetch requests from the consumer when the 
quota limit is violated (consumer_byte_rate is exceeded).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9234) Consider using @Nullable and @Nonnull annotations

2019-12-03 Thread oulabas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986825#comment-16986825
 ] 

oulabas commented on KAFKA-9234:


Hello [~mjsax], 

I am a new contributor and I would like to start with this ticket. Can I go 
ahead and assign this to myself?

 

> Consider using @Nullable and @Nonnull annotations
> -
>
> Key: KAFKA-9234
> URL: https://issues.apache.org/jira/browse/KAFKA-9234
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, clients, consumer, KafkaConnect, producer , 
> streams, streams-test-utils
>Reporter: Matthias J. Sax
>Priority: Minor
>  Labels: beginner, newbie
>
> Java7 was dropped some time ago, and we might want to consider using Java8 
> `@Nullable` and `@Nonnull` annotations for all public-facing APIs instead of 
> documenting nullability in JavaDocs only.
> This ticket should be broken down into a series of smaller PRs to keep the 
> scope of each PR contained, allowing for more effective reviews.
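
As a rough illustration of what the proposal would look like on a public API (a 
hypothetical interface, assuming JSR-305-style javax.annotation annotations, e.g. 
from com.google.code.findbugs:jsr305; the ticket does not pin down which annotation 
library to use):

{code:java}
import javax.annotation.Nonnull;
import javax.annotation.Nullable;

public interface HeaderLookupExample {

    /**
     * Returns the last value recorded for the given key, or null if the key is absent.
     * With the annotations, nullability is visible to the compiler/IDE, not only in JavaDoc.
     */
    @Nullable
    byte[] lastValue(@Nonnull String key);
}
{code}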



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread David Watzke (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Watzke updated KAFKA-9203:

Description: 
I run a Kafka 2.1.1 cluster.

When I upgraded the consumer app to use kafka-client 2.3.0 (or 2.3.1) instead 
of 2.2.0, I immediately started getting the following exceptions in a loop when 
consuming a topic with LZ4-compressed messages:
{noformat}
2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
while polling and processing messages: org.apache.kafka.common.KafkaExce
ption: Received exception when fetching the next record from 
FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
consumption. 
org.apache.kafka.common.KafkaException: Received exception when fetching the 
next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
record to continue consumption. 
    at 
org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
 
    at 
org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
 
    at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
 
    at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
 
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
 
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
    at 
com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
 
    at 
com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
    at 
com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
    at 
com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
 
    at 
resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
 
    at 
scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
    at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
    at scala.util.control.Exception$Catch.either(Exception.scala:252) 
    at 
resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
    at 
resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
    at 
resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
    at 
resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
    at 
resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
 
    at 
resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
 
    at 
resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
 
    at 
resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) 
    at 
resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) 
    at 
resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
    at 
com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
    at 
com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
 
    at 
resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
 
    at 
scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
    at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
    at scala.util.control.Exception$Catch.either(Exception.scala:252) 
    at 
resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
    at 
resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
    at 
resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
    at 
resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
    at 
resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
 
    at 
resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
 
    at 
resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
 
    at 
resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) 
    at 
resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) 
    at 
resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
    at 
com.avast.filerep.saver.RequestSaver$.$anonfun$main$4(RequestSaver.scala:18) 
    at 

[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1

2019-12-03 Thread David Watzke (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986765#comment-16986765
 ] 

David Watzke commented on KAFKA-9203:
-

[~ijuma] Thank you. With the attached clients jar it works fine! So it really 
is caused by this commit.

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> ---
>
> Key: KAFKA-9203
> URL: https://issues.apache.org/jira/browse/KAFKA-9203
> Project: Kafka
>  Issue Type: Bug
>  Components: compression, consumer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: David Watzke
>Priority: Critical
> Attachments: kafka-clients-2.3.2-SNAPSHOT.jar
>
>
> I run a Kafka 2.1.1 cluster.
> When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 
> 2.2.0, I immediately started getting the following exceptions in a loop when 
> consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] 
> com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred 
> while polling and processing messages: org.apache.kafka.common.KafkaExce
> ption: Received exception when fetching the next record from 
> FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue 
> consumption. 
> org.apache.kafka.common.KafkaException: Received exception when fetching the 
> next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the 
> record to continue consumption. 
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
>  
>     at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
>  
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) 
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
>  
>     at 
> com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) 
>     at 
> resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) 
>     at 
> resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
>  
>     at 
> resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
>  
>     at 
> resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
>  
>     at 
> resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) 
>     at 
> com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
>  
>     at 
> resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
>  
>     at 
> scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) 
>     at scala.util.control.Exception$Catch.apply(Exception.scala:228) 
>     at scala.util.control.Exception$Catch.either(Exception.scala:252) 
>     at 
> resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) 
>     at 
> resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) 
>     at 
> 

[jira] [Commented] (KAFKA-9071) transactional.id.expiration.ms config value should be implemented as a Long

2019-12-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986735#comment-16986735
 ] 

Matthias J. Sax commented on KAFKA-9071:


[~despondency] We are usually somewhat lazy about updating the tickets, and you 
can do it yourself – however, there is no "review" stage defined in this Jira; 
you can hit "submit patch" to mark it as "PR available".

The best way to get attention is to tag potential reviewers (i.e., usually 
committers) on GitHub in a comment and ask for a review.

> transactional.id.expiration.ms config value should be implemented as a Long
> ---
>
> Key: KAFKA-9071
> URL: https://issues.apache.org/jira/browse/KAFKA-9071
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 2.3.0
>Reporter: Joost van de Wijgerd
>Assignee: Mario Georgiev
>Priority: Major
>
> Currently the value of this config parameter is limited to MAX_INT, 
> effectively limiting the transactional id expiration to ~25 days. This is 
> causing some issues for us on our Acceptance environment (which is not used 
> that often / heavily), where our transactional services will start failing 
> because of this issue.
> I believe the best practice for millisecond values is to implement them as 
> a Long and not as an Integer.
> This is currently the max value: transactional.id.expiration.ms=2147483647
> while I would like to set it to: transactional.id.expiration.ms=3154000 
> (i.e. 1 year)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9252) Kafka Connect fails to create connector if single-broker Kafka cluster is configured for offsets.topic.replication.factor=3

2019-12-03 Thread Robin Moffatt (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Moffatt updated KAFKA-9252:
-
Summary: Kafka Connect fails to create connector if single-broker Kafka 
cluster is configured for offsets.topic.replication.factor=3  (was: Kafka 
Connect )

> Kafka Connect fails to create connector if single-broker Kafka cluster is 
> configured for offsets.topic.replication.factor=3
> ---
>
> Key: KAFKA-9252
> URL: https://issues.apache.org/jira/browse/KAFKA-9252
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 2.3.1
>Reporter: Robin Moffatt
>Priority: Minor
>
> If I mis-configure my *single* Kafka broker with 
> `offsets.topic.replication.factor=3` (the default), Kafka Connect will start 
> up absolutely fine (Kafka Connect started in the log file, `/connectors` 
> endpoint returns HTTP 200). But if I try to create a connector, it 
> (eventually) returns
> {code:java}
> {"error_code":500,"message":"Request timed out"}{code}
> There's no error in the Kafka Connect worker log at INFO level. More details: 
> [https://rmoff.net/2019/11/29/kafka-connect-request-timed-out/]
> This could be improved. Either ensure at startup that the Kafka consumer 
> offsets topic is available and refuse to start up if it is not, or at least 
> log why the connector failed to be created.
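
To make the suggested startup check concrete, here is a rough sketch of the kind of 
preflight validation Connect could do (illustrative only, not Connect's actual code; 
class and variable names are made up, and reading broker configs this way assumes a 
broker new enough to return static configs via describeConfigs):

{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.config.ConfigResource;

public class OffsetsTopicPreflight {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative address

        try (AdminClient admin = AdminClient.create(props)) {
            Collection<Node> nodes = admin.describeCluster().nodes().get();
            Node anyBroker = nodes.iterator().next();

            ConfigResource broker =
                    new ConfigResource(ConfigResource.Type.BROKER, anyBroker.idString());
            Config cfg = admin.describeConfigs(Collections.singleton(broker))
                              .all().get().get(broker);

            ConfigEntry entry = cfg.get("offsets.topic.replication.factor");
            if (entry != null && entry.value() != null) {
                int rf = Integer.parseInt(entry.value());
                if (rf > nodes.size()) {
                    // Fail fast instead of timing out later on connector creation.
                    throw new IllegalStateException("offsets.topic.replication.factor=" + rf
                            + " cannot be satisfied by " + nodes.size() + " broker(s)");
                }
            }
        }
    }
}
{code}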



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-03 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986715#comment-16986715
 ] 

Levani Kokhreidze commented on KAFKA-6718:
--

Hi [~vvcephei] 

Would appreciate your thoughts on this ticket if there're anything specific you 
think we must take into account while implementing this new feature.

KIP-441 has a lot of work with standby tasks, wondering if we should unify this 
in the KIP-441 (pretty sure we shouldn't, thinking out loud), should we wait 
for KIP-441, or they can be done in parallel?

 

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented in Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed something similar for task assignments at the Kafka Streams 
> application layer. 
>   
>  This feature enables replica tasks to be assigned to different racks for 
> fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave 
> rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack-aware, 
> i.e. each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks; further tasks will be assigned to the least loaded node, 
> cluster wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on the same rack if possible.
>  Post that, it also helps to maintain stickiness within the rack.
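
From the application side, the proposed configuration might end up looking roughly 
like this (a sketch only: "rack.id" / RACK_ID_CONFIG is the key this ticket proposes 
and is not part of released StreamsConfig; topic names and the topology are 
placeholders):

{code:java}
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class RackAwareStandbyExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");   // placeholder topology

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rack-aware-standby-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1); // one standby per active task
        props.put("rack.id", "rack-1"); // proposed RACK_ID_CONFIG; hypothetical, not released

        new KafkaStreams(builder.build(), props).start();
    }
}
{code}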



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-03 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986715#comment-16986715
 ] 

Levani Kokhreidze edited comment on KAFKA-6718 at 12/3/19 8:36 AM:
---

Hi [~vvcephei] 

Would appreciate your thoughts on this ticket if there is anything specific you 
think we must take into account while implementing this new feature.

KIP-441 has a lot of work with standby tasks, wondering if we should unify this 
in the KIP-441 (pretty sure we shouldn't, thinking out loud), should we wait 
for KIP-441, or they can be done in parallel?

 


was (Author: lkokhreidze):
Hi [~vvcephei] 

Would appreciate your thoughts on this ticket if there're anything specific you 
think we must take into account while implementing this new feature.

KIP-441 has a lot of work with standby tasks, wondering if we should unify this 
in the KIP-441 (pretty sure we shouldn't, thinking out loud), should we wait 
for KIP-441, or they can be done in parallel?

 

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented in Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed something similar for task assignments at the Kafka Streams 
> application layer. 
>   
>  This feature enables replica tasks to be assigned to different racks for 
> fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave 
> rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack-aware, 
> i.e. each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks; further tasks will be assigned to the least loaded node, 
> cluster wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on the same rack if possible.
>  Post that, it also helps to maintain stickiness within the rack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)