[jira] [Commented] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll

2019-08-12 Thread Sophie Blee-Goldman (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905426#comment-16905426
 ] 

Sophie Blee-Goldman commented on KAFKA-8041:


h3. java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.

[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/950/testReport/junit/kafka.server/LogDirFailureTest/testIOExceptionDuringLogRoll/]

> Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
> -
>
> Key: KAFKA-8041
> URL: https://issues.apache.org/jira/browse/KAFKA-8041
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.0.1, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests]
> {quote}java.lang.AssertionError: Expected some messages
> at kafka.utils.TestUtils$.fail(TestUtils.scala:357)
> at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189)
> at 
> kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote}
> STDOUT
> {quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, 
> leaderId=0, fetcherId=0] Error for partition topic-6 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-10 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-4 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-8 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-2 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 
> in dir 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216
>  (kafka.server.LogDirFailureChannel:76)
> java.io.FileNotFoundException: 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index
>  (Not a directory)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115)
> at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184)
> at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
> at kafka.log.Log.$anonfun$roll$8(Log.scala:1520)
> at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520)
> at scala.Option.foreach(Option.scala:257)
> at kafka.log.Log.$anonfun$roll$2(Log.scala:1520)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1881)
> at kafka.log.Log.roll(Log.scala:1484)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154)
> at 
> 

[jira] [Resolved] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta resolved KAFKA-8789.

Resolution: Invalid

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 4:16 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url or something? Sorry for the noise.


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 4:16 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url or something? Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8412) Still a nullpointer exception thrown on shutdown while flushing before closing producers

2019-08-12 Thread Chris Pettitt (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905459#comment-16905459
 ] 

Chris Pettitt commented on KAFKA-8412:
--

I'm able to repro this and [~mjsax]'s solution should fix this.

One other observation from being in this code is that we essentially have task 
state spread out across classes. As a dev new to the code, I would expect most 
of the state for a task to be pushed down into StreamTask, with AssignedTasks 
keeping an index of tasks only if we need fast lookup by state, and treating 
that as an index rather than as the authoritative state. In other words, I would 
prefer close on StreamTask to do the right thing for whatever state it is in, 
instead of AssignedTasks being responsible for different types of close. This 
should also make it easier to test without involving mocks, as we do today in 
AssignedTasks.

So I see a few options:
 # Use [~mjsax]'s solution and keep state as is.
 # Push state down into StreamTask, AssignedTasks just calls close as it does 
today.
 # Do #1 in a first patch and follow up with #2 in a second patch focused on 
refactoring state.

[~mjsax] [~guozhang] thoughts?
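
For reference, a minimal, self-contained sketch of what option #1 boils down to 
(this is illustrative only, not the actual Kafka Streams patch, and the class 
name is made up): mirror in flush() the null guard that close() already has, so 
that flushing a task whose producer was already closed and nulled out does not 
throw an NPE.
{code:java}
import org.apache.kafka.clients.producer.Producer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch only; the real logic lives in RecordCollectorImpl.
public class NullSafeCollectorSketch {
    private static final Logger log = LoggerFactory.getLogger(NullSafeCollectorSketch.class);

    private Producer<byte[], byte[]> producer; // may already be null during shutdown

    public void flush() {
        log.debug("Flushing producer");
        if (producer != null) {   // same guard that close() already applies
            producer.flush();
        }
    }

    public void close() {
        log.debug("Closing producer");
        if (producer != null) {
            producer.close();
            producer = null;
        }
    }
}
{code}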

> Still a nullpointer exception thrown on shutdown while flushing before 
> closing producers
> 
>
> Key: KAFKA-8412
> URL: https://issues.apache.org/jira/browse/KAFKA-8412
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Sebastiaan
>Assignee: Chris Pettitt
>Priority: Minor
>
> I found a closed issue and replied there, but decided to open one myself 
> because although they're related, they're slightly different. The original 
> issue is at https://issues.apache.org/jira/browse/KAFKA-7678
> The fix there was to add a null check around closing a producer, 
> because in some cases the producer is already null at that point (it has been 
> closed already).
> In version 2.1.1 we are getting a very similar exception, but in the 'flush' 
> method that is called pre-close. This is in the log:
> {code:java}
> message: stream-thread 
> [webhook-poster-7034dbb0-7423-476b-98f3-d18db675d6d6-StreamThread-1] Failed 
> while closing StreamTask 1_26 due to the following error:
> logger_name: org.apache.kafka.streams.processor.internals.AssignedStreamsTasks
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:245)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:493)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:443)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:568)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:691)
>     at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.close(AssignedTasks.java:397)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:260)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1181)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:758){code}
> Followed by:
>  
> {code:java}
> message: task [1_26] Could not close task due to the following error:
> logger_name: org.apache.kafka.streams.processor.internals.StreamTask
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:245)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:493)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:443)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:568)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:691)
>     at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.close(AssignedTasks.java:397)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:260)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1181)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:758){code}
> If I look at the source code at this point, I see a nice null check in the 
> close method, but not in the flush method that is called just before that:
> {code:java}
> public void flush() {
>     this.log.debug("Flushing producer");
>     this.producer.flush();
>     this.checkForException();
> }
> public void close() {
>     this.log.debug("Closing producer");
>     if (this.producer != null) {
>  

[jira] [Commented] (KAFKA-8790) [kafka-connect] KafkaBaseLog.WorkThread not recoverable

2019-08-12 Thread Paul Whalen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905460#comment-16905460
 ] 

Paul Whalen commented on KAFKA-8790:


Looks like KAFKA-7941.  My team is running a fork of 2.0.0 with the changes 
from this PR successfully: https://github.com/apache/kafka/pull/6283.  Might 
help to comment on that PR to get it merged in.

> [kafka-connect] KafkaBaseLog.WorkThread not recoverable
> ---
>
> Key: KAFKA-8790
> URL: https://issues.apache.org/jira/browse/KAFKA-8790
> Project: Kafka
>  Issue Type: Bug
>Reporter: Qinghui Xu
>Priority: Major
>
> We have a kafka (source) connector that copies data from a source kafka 
> cluster to the target cluster. The connector is deployed to a group of 
> workers running on mesos, so the lifecycle of the workers is managed by 
> mesos. Workers should be recovered by mesos in case of failure, and then 
> source tasks will rely on kafka connect's KafkaOffsetBackingStore to recover 
> their offsets and proceed.
> Recently we witnessed an unrecoverable situation, though: a worker is not doing 
> anything after a network reset on the host where the worker is running. 
> More specifically, it seems that the kafka connect tasks on that worker stop 
> polling the source kafka cluster, because the consumers are stuck in a 
> rebalance state.
> After some digging, we found that the thread handling the source task offset 
> recovery is dead, which leaves all the rebalancing tasks stuck in the state of 
> reading back their offsets. The log we saw in our connect task:
> {code:java}
> 2019-08-12 14:29:28,089 ERROR Unexpected exception in Thread[KafkaBasedLog 
> Work Thread - kc_replicator_offsets,5,main] 
> (org.apache.kafka.connect.util.KafkaBasedLog)
> org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by 
> times in 30001ms{code}
> As far as I can see 
> ([https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L339]),
>  the thread dies in case of error while the worker stays alive, which leaves 
> the worker without the thread that recovers offsets, so all tasks on that 
> worker are not recoverable and will be stuck in case of failure.
>  
> The fix for this issue should ideally be one of the following:
>  * Make the KafkaBasedLog Work Thread recoverable from errors
>  * Or make KafkaBasedLog Work Thread death cause the worker to exit (a finally 
> clause calling System.exit), so that the worker lifecycle management (in our 
> case, mesos) will restart the worker elsewhere
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta reopened KAFKA-8789:


> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8790) [kafka-connect] KafkaBaseLog.WorkThread not recoverable

2019-08-12 Thread Qinghui Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905506#comment-16905506
 ] 

Qinghui Xu commented on KAFKA-8790:
---

[~pgwhalen]
Thanks for the hint, I'll have a look at your PR.

> [kafka-connect] KafkaBaseLog.WorkThread not recoverable
> ---
>
> Key: KAFKA-8790
> URL: https://issues.apache.org/jira/browse/KAFKA-8790
> Project: Kafka
>  Issue Type: Bug
>Reporter: Qinghui Xu
>Priority: Major
>
> We have a kafka (source) connector that copies data from a source kafka 
> cluster to the target cluster. The connector is deployed to a group of 
> workers running on mesos, so the lifecycle of the workers is managed by 
> mesos. Workers should be recovered by mesos in case of failure, and then 
> source tasks will rely on kafka connect's KafkaOffsetBackingStore to recover 
> their offsets and proceed.
> Recently we witnessed an unrecoverable situation, though: a worker is not doing 
> anything after a network reset on the host where the worker is running. 
> More specifically, it seems that the kafka connect tasks on that worker stop 
> polling the source kafka cluster, because the consumers are stuck in a 
> rebalance state.
> After some digging, we found that the thread handling the source task offset 
> recovery is dead, which leaves all the rebalancing tasks stuck in the state of 
> reading back their offsets. The log we saw in our connect task:
> {code:java}
> 2019-08-12 14:29:28,089 ERROR Unexpected exception in Thread[KafkaBasedLog 
> Work Thread - kc_replicator_offsets,5,main] 
> (org.apache.kafka.connect.util.KafkaBasedLog)
> org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by 
> times in 30001ms{code}
> As far as I can see 
> ([https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L339]),
>  the thread dies in case of error while the worker stays alive, which leaves 
> the worker without the thread that recovers offsets, so all tasks on that 
> worker are not recoverable and will be stuck in case of failure.
>  
> The fix for this issue should ideally be one of the following:
>  * Make the KafkaBasedLog Work Thread recoverable from errors
>  * Or make KafkaBasedLog Work Thread death cause the worker to exit (a finally 
> clause calling System.exit), so that the worker lifecycle management (in our 
> case, mesos) will restart the worker elsewhere
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Description: 
I have a topic with about 20,000 events in it. When I run the following tools 
command using Kafka 2.

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
  --isolation-level read_committed --skip-message-on-error \
  --timeout-ms 15000

I get 100 messages as expected.

However, when running the exact same command using Kafka 2.3.0 I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

The version of Kafka on the server is 2.3.0.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.

  was:
I have a topic with about 20,000 events in it. When I run the following tools 
command using

 

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
 --isolation-level read_committed --skip-message-on-error \
 --timeout-ms 15000

I get 100 messages 


> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-12 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905377#comment-16905377
 ] 

Richard Yu commented on KAFKA-8522:
---

[~junrao] A thought just occurred to me. Wouldn't we need to provide an upgrade 
path if we wish to deprecate the old checkpoint file system? If so, then how 
would we go about implementing it?

I'm not completely sure if we need one at the moment, but I'd think we do. 

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey area as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-7149) Reduce assignment data size to improve kafka streams scalability

2019-08-12 Thread Vinoth Chandar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905430#comment-16905430
 ] 

Vinoth Chandar commented on KAFKA-7149:
---

While looking at changes to detect topology changes, I realized this was still 
more complicated than it has to be. It turns out that just dictionary-encoding 
the topic names on the wire can provide similar gains as both approaches above. 
The PR is updated based on that. This is a lot simpler.

 
{code:java}
oldAssignmentInfoBytes : 77684 , newAssignmentInfoBytes: 33226{code}
 

P.S: My benchmark code 

 
{code:java}
public static void main(String[] args) {

// Assumption : Streams topology with 10 input topics, 4 sub topologies (2 
topics per sub topology) = ~20 topics
// High number of hosts = 500; High number of partitions = 128
final int numStandbyPerTask = 2;
final String topicPrefix = "streams_topic_name";
final int numTopicGroups = 4;
final int numHosts = 500;
final int numTopics = 20;
final int partitionPerTopic = 128;
final int numTasks = partitionPerTopic * numTopics;

List<TaskId> activeTasks = new ArrayList<>();
Map<TaskId, Set<TopicPartition>> standbyTasks = new HashMap<>();
Map<HostInfo, Set<TopicPartition>> partitionsByHost = new HashMap<>();

// add tasks across each topicGroups
for (int tg =0 ; tg < numTopicGroups; tg++) {
  for (int i=0; i < (numTasks/numTopicGroups)/numHosts; i++) {
TaskId taskId = new TaskId(tg, i);
activeTasks.add(taskId);
for (int j=0; j < numStandbyPerTask; j++) {
  standbyTasks.computeIfAbsent(taskId, k -> new HashSet<>())
 .add(new TopicPartition(topicPrefix+ tg + i + j, j));
}
  }
}

// Generate actual global assignment map
Random random = new Random(12345);
for (int h=0; h < numHosts; h++) {
  Set<TopicPartition> topicPartitions = new HashSet<>();
  for (int j=0; j < numTasks/numHosts; j++) {
int topicGroupId = random.nextInt(numTopicGroups);
int topicIndex = random.nextInt(numTopics/numTopicGroups);
String topicName = topicPrefix + topicGroupId + topicIndex;
int partition = random.nextInt(128);
topicPartitions.add(new TopicPartition(topicName, partition));
  }
  HostInfo hostInfo = new HostInfo("streams_host" + h, 123456);
  partitionsByHost.put(hostInfo, topicPartitions);
}

final AssignmentInfo oldAssignmentInfo = new AssignmentInfo(4, activeTasks, 
standbyTasks, partitionsByHost, 0);
final AssignmentInfo newAssignmentInfo = new AssignmentInfo(5, activeTasks, 
standbyTasks, partitionsByHost, 0);
System.out.format("oldAssignmentInfoBytes : %d , newAssignmentInfoBytes: %d 
\n", oldAssignmentInfo.encode().array().length, 
newAssignmentInfo.encode().array().length);
}{code}
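
To make the dictionary-encoding idea concrete, here is a toy, self-contained 
sketch (not the actual AssignmentInfo wire format): each distinct topic name is 
written once in a dictionary and every partition entry then carries only a 
4-byte index, which is where the size reduction comes from.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicDictionarySketch {
    public static void main(String[] args) {
        // e.g. 2560 partition entries spread over only 20 distinct topic names
        List<String> topicPerPartition = new ArrayList<>();
        for (int p = 0; p < 128 * 20; p++) {
            topicPerPartition.add("streams_topic_name" + (p % 20));
        }

        // Naive encoding: a length-prefixed topic string per partition entry
        int naiveBytes = 0;
        for (String topic : topicPerPartition) {
            naiveBytes += 4 + topic.getBytes(StandardCharsets.UTF_8).length;
        }

        // Dictionary encoding: the dictionary is written once, then a 4-byte index per entry
        Map<String, Integer> dict = new HashMap<>();
        int dictBytes = 4; // dictionary size field
        for (String topic : topicPerPartition) {
            if (!dict.containsKey(topic)) {
                dict.put(topic, dict.size());
                dictBytes += 4 + topic.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        int dictionaryEncodedBytes = dictBytes + 4 * topicPerPartition.size();

        System.out.println("naiveBytes: " + naiveBytes
            + ", dictionaryEncodedBytes: " + dictionaryEncodedBytes);
    }
}
{code}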

> Reduce assignment data size to improve kafka streams scalability
> 
>
> Key: KAFKA-7149
> URL: https://issues.apache.org/jira/browse/KAFKA-7149
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Ashish Surana
>Assignee: Vinoth Chandar
>Priority: Major
>
> We observed that when we have a high number of partitions, instances or 
> stream-threads, the assignment-data size grows too fast and we start getting 
> RecordTooLargeException at the kafka broker.
> A workaround for this issue is described at: 
> https://issues.apache.org/jira/browse/KAFKA-6976
> Still, it limits the scalability of kafka streams, as moving around 100MBs of 
> assignment data for each rebalance affects performance & reliability 
> (timeout exceptions start appearing) as well. This also limits kafka streams 
> scale even with a high max.message.bytes setting, as the data size increases 
> pretty quickly with the number of partitions, instances or stream-threads.
>  
> Solution:
> To address this issue in our cluster, we are sending the compressed 
> assignment-data. We saw assignment-data size reduced by 8X-10X. This improved 
> the kafka streams scalability drastically for us and we could now run it with 
> more than 8,000 partitions.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer performance regression  (was: 
kafka-avro-console-consumer works with 2.0.x, but not 2.3.x)

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8558) KIP-479 - Add Materialized Overload to KStream#Join

2019-08-12 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905380#comment-16905380
 ] 

Bill Bejeck commented on KAFKA-8558:


[~apurva]

> Bill, where are we with fixing this?

I've started a PR and plan to push it in the 8/13 - 8/14 timeframe.

>Also, if I understand correctly, as trunk stands today, if you upgrade from 
>versions < 2.3 to trunk, and if you >name your join node, you will not reuse 
>the join state store since the name will have changed?

That's partially correct. In versions 2.1 - 2.3, users can name the repartition 
topic via `Joined.name`. Right now in trunk, if users have named the 
repartition topic, the same base name is used for the join operator and state 
store (and hence the changelog topic). With this Jira we'll only re-use the base 
name for the repartition topic and join operator, and naming of the state store 
(and changelog topic) is done via the Materialized object. 

 
 

> KIP-479 - Add Materialized Overload to KStream#Join 
> 
>
> Key: KAFKA-8558
> URL: https://issues.apache.org/jira/browse/KAFKA-8558
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 2.4.0
>
>
> To prevent a topology incompatibility with the release of 2.4 and the naming 
> of Join operations we'll add an overloaded KStream#join method accepting a 
> Materialized parameter. This will allow users to explicitly name state stores 
> created by Kafka Streams in the join operation.
>  
> The overloads will apply to all flavors of KStream#join (inner, left, and 
> right). 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:03 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8789) kafka-avro-console-consumer works with confluent 5.0.3, but not 5.3.0

2019-08-12 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8789:
--

 Summary: kafka-avro-console-consumer works with confluent 5.0.3, 
but not 5.3.0
 Key: KAFKA-8789
 URL: https://issues.apache.org/jira/browse/KAFKA-8789
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.3.0
Reporter: Raman Gupta


I have a topic with about 20,000 events in it. When I run the following tools 
command using

 

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
 --isolation-level read_committed --skip-message-on-error \
 --timeout-ms 15000

I get 100 messages 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-avro-console-consumer works with 2.0.x, but not 2.3.x  (was: 
kafka-avro-console-consumer works with confluent 5.0.3, but not 5.3.0)

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using
>  
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>  --isolation-level read_committed --skip-message-on-error \
>  --timeout-ms 15000
> I get 100 messages 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-7149) Reduce assignment data size to improve kafka streams scalability

2019-08-12 Thread Vinoth Chandar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905430#comment-16905430
 ] 

Vinoth Chandar edited comment on KAFKA-7149 at 8/12/19 5:47 PM:


While looking at changes to detect topology changes, I realized this was still 
more complicated than it has to be. It turns out that just dictionary-encoding 
the topic names on the wire can provide similar gains as both approaches above. 
The PR is updated based on that. This is a lot simpler.
{code:java}
oldAssignmentInfoBytes : 77684 , newAssignmentInfoBytes: 33226{code}
P.S: My benchmark code 
{code:java}
public static void main(String[] args) {

// Assumption : Streams topology with 10 input topics, 4 sub topologies (2 
topics per sub topology) = ~20 topics
// High number of hosts = 500; High number of partitions = 128
final int numStandbyPerTask = 2;
final String topicPrefix = "streams_topic_name";
final int numTopicGroups = 4;
final int numHosts = 500;
final int numTopics = 20;
final int partitionPerTopic = 128;
final int numTasks = partitionPerTopic * numTopics;

List<TaskId> activeTasks = new ArrayList<>();
Map<TaskId, Set<TopicPartition>> standbyTasks = new HashMap<>();
Map<HostInfo, Set<TopicPartition>> partitionsByHost = new HashMap<>();

// add tasks across each topicGroups
for (int tg =0 ; tg < numTopicGroups; tg++) {
  for (int i=0; i < (numTasks/numTopicGroups)/numHosts; i++) {
TaskId taskId = new TaskId(tg, i);
activeTasks.add(taskId);
for (int j=0; j < numStandbyPerTask; j++) {
  standbyTasks.computeIfAbsent(taskId, k -> new HashSet<>())
 .add(new TopicPartition(topicPrefix+ tg + i + j, j));
}
  }
}

// Generate actual global assignment map
Random random = new Random(12345);
for (int h=0; h < numHosts; h++) {
  Set<TopicPartition> topicPartitions = new HashSet<>();
  for (int j=0; j < numTasks/numHosts; j++) {
int topicGroupId = random.nextInt(numTopicGroups);
int topicIndex = random.nextInt(numTopics/numTopicGroups);
String topicName = topicPrefix + topicGroupId + topicIndex;
int partition = random.nextInt(128);
topicPartitions.add(new TopicPartition(topicName, partition));
  }
  HostInfo hostInfo = new HostInfo("streams_host" + h, 123456);
  partitionsByHost.put(hostInfo, topicPartitions);
}

final AssignmentInfo oldAssignmentInfo = new AssignmentInfo(4, activeTasks, 
standbyTasks, partitionsByHost, 0);
final AssignmentInfo newAssignmentInfo = new AssignmentInfo(5, activeTasks, 
standbyTasks, partitionsByHost, 0);
System.out.format("oldAssignmentInfoBytes : %d , newAssignmentInfoBytes: %d 
\n", oldAssignmentInfo.encode().array().length, 
newAssignmentInfo.encode().array().length);
}{code}


was (Author: vc):
While looking at changes to detect topology changes, I realized this was still 
more complicated than it has to be. It turns out that just dictionary-encoding 
the topic names on the wire can provide similar gains as both approaches above. 
The PR is updated based on that. This is a lot simpler.

 
{code:java}
oldAssignmentInfoBytes : 77684 , newAssignmentInfoBytes: 33226{code}
 

P.S: My benchmark code 

 
{code:java}
public static void main(String[] args) {

// Assumption : Streams topology with 10 input topics, 4 sub topologies (2 
topics per sub topology) = ~20 topics
// High number of hosts = 500; High number of partitions = 128
final int numStandbyPerTask = 2;
final String topicPrefix = "streams_topic_name";
final int numTopicGroups = 4;
final int numHosts = 500;
final int numTopics = 20;
final int partitionPerTopic = 128;
final int numTasks = partitionPerTopic * numTopics;

List<TaskId> activeTasks = new ArrayList<>();
Map<TaskId, Set<TopicPartition>> standbyTasks = new HashMap<>();
Map<HostInfo, Set<TopicPartition>> partitionsByHost = new HashMap<>();

// add tasks across each topicGroups
for (int tg =0 ; tg < numTopicGroups; tg++) {
  for (int i=0; i < (numTasks/numTopicGroups)/numHosts; i++) {
TaskId taskId = new TaskId(tg, i);
activeTasks.add(taskId);
for (int j=0; j < numStandbyPerTask; j++) {
  standbyTasks.computeIfAbsent(taskId, k -> new HashSet<>())
 .add(new TopicPartition(topicPrefix+ tg + i + j, j));
}
  }
}

// Generate actual global assignment map
Random random = new Random(12345);
for (int h=0; h < numHosts; h++) {
  Set<TopicPartition> topicPartitions = new HashSet<>();
  for (int j=0; j < numTasks/numHosts; j++) {
int topicGroupId = random.nextInt(numTopicGroups);
int topicIndex = random.nextInt(numTopics/numTopicGroups);
String topicName = topicPrefix + topicGroupId + topicIndex;
int partition = random.nextInt(128);
topicPartitions.add(new TopicPartition(topicName, partition));
  }
  HostInfo hostInfo = new HostInfo("streams_host" + h, 123456);
  partitionsByHost.put(hostInfo, topicPartitions);
}

final AssignmentInfo oldAssignmentInfo = new AssignmentInfo(4, activeTasks, 
standbyTasks, partitionsByHost, 0);
final AssignmentInfo newAssignmentInfo = new AssignmentInfo(5, activeTasks, 
standbyTasks, partitionsByHost, 0);

[jira] [Created] (KAFKA-8790) [kafka-connect] KafkaBaseLog.WorkThread not recoverable

2019-08-12 Thread Qinghui Xu (JIRA)
Qinghui Xu created KAFKA-8790:
-

 Summary: [kafka-connect] KafkaBaseLog.WorkThread not recoverable
 Key: KAFKA-8790
 URL: https://issues.apache.org/jira/browse/KAFKA-8790
 Project: Kafka
  Issue Type: Bug
Reporter: Qinghui Xu


We have a kafka (source) connector that copies data from a source kafka cluster 
to the target cluster. The connector is deployed to a group of workers running 
on mesos, so the lifecycle of the workers is managed by mesos. Workers 
should be recovered by mesos in case of failure, and then source tasks will 
rely on kafka connect's KafkaOffsetBackingStore to recover their offsets and 
proceed.

Recently we witnessed an unrecoverable situation, though: a worker is not doing 
anything after a network reset on the host where the worker is running. More 
specifically, it seems that the kafka connect tasks on that worker stop 
polling the source kafka cluster, because the consumers are stuck in a rebalance 
state.

After some digging, we found that the thread handling the source task offset 
recovery is dead, which leaves all the rebalancing tasks stuck in the state of 
reading back their offsets. The log we saw in our connect task:
{code:java}
2019-08-12 14:29:28,089 ERROR Unexpected exception in Thread[KafkaBasedLog Work 
Thread - kc_replicator_offsets,5,main] 
(org.apache.kafka.connect.util.KafkaBasedLog)
org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times 
in 30001ms{code}
As far as I can see 
([https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L339]),
 the thread dies in case of error while the worker stays alive, which leaves 
the worker without the thread that recovers offsets, so all tasks on that 
worker are not recoverable and will be stuck in case of failure.

 

The fix for this issue should ideally be one of the following:
 * Make the KafkaBasedLog Work Thread recoverable from errors (a minimal sketch 
of this option follows below)
 * Or make KafkaBasedLog Work Thread death cause the worker to exit (a finally 
clause calling System.exit), so that the worker lifecycle management (in our 
case, mesos) will restart the worker elsewhere
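
As an illustration of the first option only, here is a hand-written sketch (not 
the actual KafkaBasedLog.WorkThread code): the work loop catches unexpected 
exceptions and retries after a short backoff instead of letting the thread die.
{code:java}
public class RecoverableWorkThread extends Thread {
    private volatile boolean stopping = false;

    public RecoverableWorkThread() {
        super("KafkaBasedLog Work Thread (sketch)");
    }

    public void shutdown() {
        stopping = true;
        interrupt();
    }

    @Override
    public void run() {
        while (!stopping) {
            try {
                doWork(); // stand-in for reading the backing topic / serving read requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // treat interruption as a shutdown signal
            } catch (Throwable t) {
                // Instead of letting the thread die, log and retry after a short backoff.
                System.err.println("Unexpected exception in work thread, retrying: " + t);
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    private void doWork() throws InterruptedException {
        Thread.sleep(100); // placeholder for the real poll/offer loop
    }
}
{code}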

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 5:06 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:23 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs 

[jira] [Updated] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer needs bigger timeout-ms setting in order to 
work  (was: kafka-console-consumer performance regression)

> kafka-console-consumer needs bigger timeout-ms setting in order to work
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:49 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

[jira] [Commented] (KAFKA-8601) Producer Improvement: Sticky Partitioner

2019-08-12 Thread Justine Olshan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905585#comment-16905585
 ] 

Justine Olshan commented on KAFKA-8601:
---

Just have one more PR for this (implementing the additional partitioner). Then 
we can close it.

> Producer Improvement: Sticky Partitioner
> 
>
> Key: KAFKA-8601
> URL: https://issues.apache.org/jira/browse/KAFKA-8601
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently the default partitioner uses a round-robin strategy to partition 
> non-keyed values. The idea is to implement a "sticky partitioner" that 
> chooses a partition for a topic and sends all records to that partition until 
> the batch is sent. Then a new partition is chosen. This new partitioner will 
> increase batching and decrease latency. 
> KIP link: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
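
To make the idea concrete, here is a minimal, hypothetical sketch of a 
sticky-style partitioner. It is not the KIP-480 implementation: the real 
partitioner switches partitions when the producer starts a new batch, whereas 
this sketch uses a fixed record count as a stand-in trigger, and the class and 
constant names are made up.
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Hypothetical sketch only: stick to one partition per topic for non-keyed
// records and switch after a fixed number of records (a stand-in for "the
// batch was sent"). Keyed records keep hash-based placement.
public class SimpleStickyPartitioner implements Partitioner {

    private static final int RECORDS_PER_STICK = 100; // stand-in for batch completion

    private final Map<String, Integer> stickyPartition = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        if (keyBytes != null) {
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % partitions.size();
        }
        int sent = counters.computeIfAbsent(topic, t -> new AtomicInteger()).getAndIncrement();
        if (sent % RECORDS_PER_STICK == 0) {
            // Time to "un-stick": choose a new random partition for this topic.
            stickyPartition.put(topic, ThreadLocalRandom.current().nextInt(partitions.size()));
        }
        return stickyPartition.get(topic);
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}
{code}
A custom partitioner like this would be registered through the producer's 
partitioner.class configuration.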



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-8791) RocksDBTimestampeStore should open in regular mode for new store

2019-08-12 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8791:
--

Assignee: Matthias J. Sax

> RocksDBTimestampeStore should open in regular mode for new store
> 
>
> Key: KAFKA-8791
> URL: https://issues.apache.org/jira/browse/KAFKA-8791
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> KIP-258 introduced RocksDBTimestampedStore that supports an upgrade mode to 
> migrate data in existing persistent stores to the new storage format. 
> However, RocksDBTimestampsStore incorrectly also uses upgrade mode for new 
> stores, instead of opening in regular mode.
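
For context, this is roughly how a Streams application ends up using the store 
in question; the store name and serdes below are illustrative. A timestamped 
key-value store declared like this is backed by RocksDBTimestampedStore in 
Streams 2.3+:
{code:java}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.TimestampedKeyValueStore;

public class TimestampedStoreExample {
    // Illustrative only: a persistent timestamped key-value store named "counts-store".
    // Existing non-timestamped stores get migrated via the "upgrade mode" this ticket
    // is about; brand-new stores should simply open in regular mode.
    public static StoreBuilder<TimestampedKeyValueStore<String, Long>> countsStore() {
        return Stores.timestampedKeyValueStoreBuilder(
                Stores.persistentTimestampedKeyValueStore("counts-store"),
                Serdes.String(),
                Serdes.Long());
    }
}
{code}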



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8774) Connect REST API exposes plaintext secrets in tasks endpoint if config value contains additional characters

2019-08-12 Thread Arjun Satish (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arjun Satish updated KAFKA-8774:

Summary: Connect REST API exposes plaintext secrets in tasks endpoint if 
config value contains additional characters  (was: Connect REST API exposes 
plaintext secrets in tasks endpoint)

> Connect REST API exposes plaintext secrets in tasks endpoint if config value 
> contains additional characters
> ---
>
> Key: KAFKA-8774
> URL: https://issues.apache.org/jira/browse/KAFKA-8774
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Oleksandr Diachenko
>Assignee: Oleksandr Diachenko
>Priority: Critical
>
> I have configured a Connector to use externalized secrets, and the following 
> endpoint returns secrets in the externalized form: 
> {code:java}
> curl localhost:8083/connectors/foobar|jq
> {code}
> {code:java}
> {
> "name": "foobar",
> "config": {
> "connector.class": "io.confluent.connect.s3.S3SinkConnector",
> ...
> "consumer.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
> password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
> "admin.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
> password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
> "consumer.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
> password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
> "producer.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
> password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
> "producer.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
> password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
> ...
> },
> "tasks": [
> { "connector": "foobar", "task": 0 }
> ],
> "type": "sink"
> }{code}
> But another endpoint returns secrets in plain text:
> {code:java}
> curl localhost:8083/connectors/foobar/tasks|jq
> {code}
> {code:java}
> [
>   {
> "id": {
>   "connector": "lcc-kgkpm",
>   "task": 0
> },
> "config": {
>   "connector.class": "io.confluent.connect.s3.S3SinkConnector",
>   ...
>   "errors.log.include.messages": "true",
>   "flush.size": "1000",
>   "consumer.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"OOPS\" password=\"SURPRISE\";",
>   "admin.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"OOPS\" password=\"SURPRISE\";",
>   "consumer.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"OOPS\" password=\"SURPRISE\";",
>   "producer.override.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"OOPS\" password=\"SURPRISE\";",
>   "producer.sasl.jaas.config": 
> "org.apache.kafka.common.security.plain.PlainLoginModule required 
> username=\"OOPS\" password=\"SURPRISE\";",
>   ...
> }
>   }
> ]
> {code}
>  
> EDIT: This bug only shows up if the secrets are a substring in the config 
> value. If they form the entirety of the config value, then the secrets are 
> hidden at the /tasks endpoints.
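
For reference, the externalized-secrets setup that produces config values like 
the ones above is enabled in the Connect worker configuration roughly as 
follows; the provider alias and the secrets file path are placeholders chosen 
to match the example:
{code}
# Connect worker configuration (illustrative)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# Connector configs can then reference keys from that file, e.g.
#   username="${file:/some/secret/path/secrets.properties:kafka.api.key}"
# and the worker resolves them at runtime instead of storing the raw values.
{code}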



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8791) RocksDBTimestampeStore should open in regular mode for new store

2019-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905713#comment-16905713
 ] 

ASF GitHub Bot commented on KAFKA-8791:
---

mjsax commented on pull request #7201: KAFKA-8791: RocksDBTimestampedStore 
should open in regular mode by default
URL: https://github.com/apache/kafka/pull/7201
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RocksDBTimestampeStore should open in regular mode for new store
> 
>
> Key: KAFKA-8791
> URL: https://issues.apache.org/jira/browse/KAFKA-8791
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> KIP-258 introduced RocksDBTimestampedStore that supports an upgrade mode to 
> migrate data in existing persistent stores to the new storage format. 
> However, RocksDBTimestampsStore incorrectly also uses upgrade mode for new 
> stores, instead of opening in regular mode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-12 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905747#comment-16905747
 ] 

Richard Yu commented on KAFKA-8522:
---

[~jagsancio] any thoughts?

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey area as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.
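
As a rough illustration (not from the ticket itself), a topic configured like 
the sketch below would match the conditions listed above; the broker address, 
topic name, and values are placeholders, and the commented-out segment setting 
corresponds to the workaround mentioned at the end of the description:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class LowThroughputCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            // Compacted topic with an aggressive dirty ratio, as described above.
            configs.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
            configs.put(TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.0");
            configs.put(TopicConfig.DELETE_RETENTION_MS_CONFIG, "86400000"); // 1 day
            // Possible workaround from the description: smaller segments, so the
            // oldest segment eventually stops being rewritten.
            // configs.put(TopicConfig.SEGMENT_BYTES_CONFIG, "10485760"); // 10 MB
            NewTopic topic = new NewTopic("low-throughput-compacted", 1, (short) 1)
                    .configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}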



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905792#comment-16905792
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123]: For upgrade, we can roughly do the following. If the new code only 
finds the old style checkpoint file, it uses that for the cleaning offsets. 
Once the new checkpoint files have been written, the old checkpoint file can be 
deleted. After that, the new code will be using the cleaning offsets in the new 
checkpoint file.
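
A very rough sketch of that upgrade flow, purely to illustrate the ordering 
(the new checkpoint file name and the String-based contents are hypothetical; 
the real logic lives in the log cleaner's checkpoint handling):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanerCheckpointUpgradeSketch {
    // "cleaner-offset-checkpoint" is the existing per-log-dir file;
    // the "-v2" name is a hypothetical new format.
    static Path oldFile(Path logDir) { return logDir.resolve("cleaner-offset-checkpoint"); }
    static Path newFile(Path logDir) { return logDir.resolve("cleaner-offset-checkpoint-v2"); }

    static String loadCleaningOffsets(Path logDir) throws IOException {
        if (Files.exists(newFile(logDir))) {
            // Normal path once the upgrade has happened.
            return new String(Files.readAllBytes(newFile(logDir)));
        }
        // Only the old-style file exists: use it for the cleaning offsets,
        // write them out in the new format, then drop the old file.
        String offsets = new String(Files.readAllBytes(oldFile(logDir)));
        Files.write(newFile(logDir), offsets.getBytes());
        Files.deleteIfExists(oldFile(logDir));
        return offsets;
    }
}
{code}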

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey area as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8792) Default ZK configuration to disable AdminServer

2019-08-12 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-8792:
---

 Summary: Default ZK configuration to disable AdminServer
 Key: KAFKA-8792
 URL: https://issues.apache.org/jira/browse/KAFKA-8792
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira


Kafka ships with default ZK configuration. With the upgrade to ZK 3.5, our 
defaults include running ZK's AdminServer on port 8080. This is an unfortunate 
default as it tends to cause conflicts. 

I suggest we default to disable ZK's AdminServer in the default ZK configs that 
we ship. Users who want to use AdminServer can enable it and set the port to 
something that works for them. Realistically, in most production environments, 
a different ZK server will be used anyway. So this is mostly to save new users 
who are trying Kafka on their own machine from running into accidental and 
frustrating port conflicts.
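
Concretely, this would amount to something like the following in the 
config/zookeeper.properties file that ships with Kafka (the alternative port is 
just an example):
{code}
# Disable ZooKeeper's AdminServer, which ZK 3.5 enables on port 8080 by default
admin.enableServer=false
# Or, to keep it but avoid the usual port conflict, move it instead:
# admin.serverPort=9090
{code}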



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8791) RocksDBTimestampedStore should open in regular mode for new stores

2019-08-12 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8791:
---
Summary: RocksDBTimestampedStore should open in regular mode for new stores 
 (was: RocksDBTimestampeStore should open in regular mode for new store)

> RocksDBTimestampedStore should open in regular mode for new stores
> --
>
> Key: KAFKA-8791
> URL: https://issues.apache.org/jira/browse/KAFKA-8791
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> KIP-258 introduced RocksDBTimestampedStore that supports an upgrade mode to 
> migrate data in existing persistent stores to the new storage format. 
> However, RocksDBTimestampsStore incorrectly also uses upgrade mode for new 
> stores, instead of opening in regular mode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8792) Default ZK configuration to disable AdminServer

2019-08-12 Thread Sönke Liebau (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905827#comment-16905827
 ] 

Sönke Liebau commented on KAFKA-8792:
-

I agree, port 8080 tends to be one of the first ports software binds to for 
some reason, so there is a reasonable chance that people who test locally will 
have something running on that port.

> Default ZK configuration to disable AdminServer
> ---
>
> Key: KAFKA-8792
> URL: https://issues.apache.org/jira/browse/KAFKA-8792
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Priority: Major
>
> Kafka ships with default ZK configuration. With the upgrade to ZK 3.5, our 
> defaults include running ZK's AdminServer on port 8080. This is an 
> unfortunate default as it tends to cause conflicts. 
> I suggest we default to disable ZK's AdminServer in the default ZK configs 
> that we ship. Users who want to use AdminServer can enable it and set the 
> port to something that works for them. Realistically, in most production 
> environments, a different ZK server will be used anyway. So this is mostly to 
> save new users who are trying Kafka on their own machine from running into 
> accidental and frustrating port conflicts.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8774) Connect REST API exposes plaintext secrets in tasks endpoint

2019-08-12 Thread Arjun Satish (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arjun Satish updated KAFKA-8774:

Description: 
I have configured a Connector to use externalized secrets, and the following 
endpoint returns secrets in the externalized form: 
{code:java}
curl localhost:8083/connectors/foobar|jq
{code}
{code:java}
{
"name": "foobar",
"config": {

"connector.class": "io.confluent.connect.s3.S3SinkConnector",
...
"consumer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"admin.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"consumer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"producer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"producer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
...
},
"tasks": [

{ "connector": "foobar", "task": 0 }

],
"type": "sink"
}{code}
But another endpoint returns secrets in plain text:
{code:java}
curl localhost:8083/connectors/foobar/tasks|jq
{code}
{code:java}
[
  {
"id": {
  "connector": "lcc-kgkpm",
  "task": 0
},
"config": {
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  ...
  "errors.log.include.messages": "true",
  "flush.size": "1000",
  "consumer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"OOPS\" password=\"SURPRISE\";",
  "admin.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"OOPS\" password=\"SURPRISE\";",
  "consumer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"OOPS\" password=\"SURPRISE\";",
  "producer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"OOPS\" password=\"SURPRISE\";",
  "producer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"OOPS\" password=\"SURPRISE\";",
  ...
}
  }
]
{code}
 
EDIT: This bug only shows up if the secrets are a substring in the config 
value. If they form the entirety of the config value, then the secrets are 
hidden at the /tasks endpoints.

  was:
I have configured a Connector to use externalized secrets, and the following 
endpoint returns secrets in the externalized form: 
{code:java}
curl localhost:8083/connectors/foobar|jq
{code}
{code:java}
{
"name": "foobar",
"config": {

"connector.class": "io.confluent.connect.s3.S3SinkConnector",
...
"consumer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"admin.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"consumer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"producer.override.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
"producer.sasl.jaas.config": 
"org.apache.kafka.common.security.plain.PlainLoginModule required 
username=\"${file:/some/secret/path/secrets.properties:kafka.api.key}\" 
password=\"${file:/some/secret/path/secrets.properties:kafka.api.secret}\";",
...
},
"tasks": [

{ "connector": "foobar", "task": 0 }

],
"type": "sink"
}{code}
But another endpoint returns secrets in plain text:
{code:java}
curl localhost:8083/connectors/foobar/tasks|jq
{code}
{code:java}
[
  {
    ...
{code}

[jira] [Updated] (KAFKA-8791) RocksDBTimestampedStore should open in regular mode for new stores

2019-08-12 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8791:
---
Description: 
[KIP-258|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]]
 introduced `RocksDBTimestampedStore` that supports an "upgrade mode" to 
migrate data in existing persistent stores to the new storage format. However, 
`RocksDBTimestampedStore` incorrectly also uses "upgrade mode" for new stores 
instead of opening them in "regular mode".  (was: KIP-258 introduced 
RocksDBTimestampedStore that supports an upgrade mode to migrate data in 
existing persistent stores to the new storage format. However, 
RocksDBTimestampsStore incorrectly also uses upgrade mode for new stores, 
instead of opening in regular mode.)

> RocksDBTimestampedStore should open in regular mode for new stores
> --
>
> Key: KAFKA-8791
> URL: https://issues.apache.org/jira/browse/KAFKA-8791
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> [KIP-258|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]]
>  introduced `RocksDBTimestampedStore` that supports an "upgrade mode" to 
> migrate data in existing persistent stores to the new storage format. 
> However, `RocksDBTimestampedStore` incorrectly also uses "upgrade mode" for 
> new stores instead of opening them in "regular mode".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8791) RocksDBTimestampedStore should open in regular mode for new stores

2019-08-12 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8791:
---
Description: 
[KIP-258|https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]
 introduced `RocksDBTimestampedStore` that supports an "upgrade mode" to 
migrate data in existing persistent stores to the new storage format. However, 
`RocksDBTimestampedStore` incorrectly also uses "upgrade mode" for new stores 
instead of opening them in "regular mode".  (was: 
[KIP-258|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]]
 introduced `RocksDBTimestampedStore` that supports an "upgrade mode" to 
migrate data in existing persistent stores to the new storage format. However, 
`RocksDBTimestampedStore` incorrectly also uses "upgrade mode" for new stores 
instead of opening them in "regular mode".)

> RocksDBTimestampedStore should open in regular mode for new stores
> --
>
> Key: KAFKA-8791
> URL: https://issues.apache.org/jira/browse/KAFKA-8791
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> [KIP-258|https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]
>  introduced `RocksDBTimestampedStore` that supports an "upgrade mode" to 
> migrate data in existing persistent stores to the new storage format. 
> However, `RocksDBTimestampedStore` incorrectly also uses "upgrade mode" for 
> new stores instead of opening them in "regular mode".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8792) Default ZK configuration to disable AdminServer

2019-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905828#comment-16905828
 ] 

ASF GitHub Bot commented on KAFKA-8792:
---

gwenshap commented on pull request #7203: KAFKA-8792; Default ZK configuration 
to disable AdminServer
URL: https://github.com/apache/kafka/pull/7203
 
 
   Kafka ships with default ZK configuration. With the upgrade to ZK 3.5, our 
defaults include running ZK's AdminServer on port 8080. This is an unfortunate 
default as it tends to cause conflicts.
   
   I suggest we default to disable ZK's AdminServer in the default ZK configs 
that we ship. Users who want to use AdminServer can enable it and set the port 
to something that works for them. Realistically, in most production 
environments, a different ZK server will be used anyway. So this is mostly to 
save new users who are trying Kafka on their own machine from running into 
accidental and frustrating port conflicts.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default ZK configuration to disable AdminServer
> ---
>
> Key: KAFKA-8792
> URL: https://issues.apache.org/jira/browse/KAFKA-8792
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Priority: Major
>
> Kafka ships with default ZK configuration. With the upgrade to ZK 3.5, our 
> defaults include running ZK's AdminServer on port 8080. This is an 
> unfortunate default as it tends to cause conflicts. 
> I suggest we default to disable ZK's AdminServer in the default ZK configs 
> that we ship. Users who want to use AdminServer can enable it and set the 
> port to something that works for them. Realistically, in most production 
> environments, a different ZK server will be used anyway. So this is mostly to 
> save new users who are trying Kafka on their own machine from running into 
> accidental and frustrating port conflicts.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-7499) Extend ProductionExceptionHandler to cover serialization exceptions

2019-08-12 Thread Kamal Chandraprakash (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash reassigned KAFKA-7499:
---

Assignee: (was: Kamal Chandraprakash)

> Extend ProductionExceptionHandler to cover serialization exceptions
> ---
>
> Key: KAFKA-7499
> URL: https://issues.apache.org/jira/browse/KAFKA-7499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: beginner, newbie
>
> In 
> [KIP-210|https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce],
>  an exception handler for the write path was introduced. This exception 
> handler covers exceptions that are raised in the producer callback.
> However, serialization happens within Kafka Streams itself, before the data is 
> handed to the producer, and the producer uses `byte[]/byte[]` key-value-pair 
> types.
> Thus, we might want to extend the ProductionExceptionHandler to cover 
> serialization exceptions, too, so that corrupted output messages can be 
> skipped. An example could be a "String" message that contains invalid JSON and 
> should be serialized as JSON.
> This ticket might require a KIP.
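
For context, a handler built against the current KIP-210 interface looks 
roughly like the sketch below; it is registered via the 
default.production.exception.handler Streams config and only ever sees 
exceptions from the producer callback, which is why serialization failures 
never reach it. The serialization-specific callback suggested by the ticket is 
hypothetical:
{code:java}
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class SkipRecordTooLargeHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        // Skip over oversized records, fail on anything else.
        return exception instanceof RecordTooLargeException
                ? ProductionExceptionHandlerResponse.CONTINUE
                : ProductionExceptionHandlerResponse.FAIL;
    }

    // A handleSerializationException(...) style method, as this ticket suggests,
    // would have to be added to the interface via a KIP (hypothetical today),
    // because serialization errors are thrown before this handler is invoked.

    @Override
    public void configure(Map<String, ?> configs) { }
}
{code}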



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-7499) Extend ProductionExceptionHandler to cover serialization exceptions

2019-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905830#comment-16905830
 ] 

ASF GitHub Bot commented on KAFKA-7499:
---

kamalcph commented on pull request #5969: KAFKA-7499: Extend 
ProductionExceptionHandler to cover serialization exceptions
URL: https://github.com/apache/kafka/pull/5969
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Extend ProductionExceptionHandler to cover serialization exceptions
> ---
>
> Key: KAFKA-7499
> URL: https://issues.apache.org/jira/browse/KAFKA-7499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: beginner, newbie
>
> In 
> [KIP-210|https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce],
>  an exception handler for the write path was introduced. This exception 
> handler covers exceptions that are raised in the producer callback.
> However, serialization happens within Kafka Streams itself, before the data is 
> handed to the producer, and the producer uses `byte[]/byte[]` key-value-pair 
> types.
> Thus, we might want to extend the ProductionExceptionHandler to cover 
> serialization exceptions, too, so that corrupted output messages can be 
> skipped. An example could be a "String" message that contains invalid JSON and 
> should be serialized as JSON.
> This ticket might require a KIP.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-8792) Default ZK configuration to disable AdminServer

2019-08-12 Thread Gwen Shapira (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned KAFKA-8792:
---

Assignee: Gwen Shapira

> Default ZK configuration to disable AdminServer
> ---
>
> Key: KAFKA-8792
> URL: https://issues.apache.org/jira/browse/KAFKA-8792
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>Priority: Major
>
> Kafka ships with default ZK configuration. With the upgrade to ZK 3.5, our 
> defaults include running ZK's AdminServer on port 8080. This is an 
> unfortunate default as it tends to cause conflicts. 
> I suggest we default to disable ZK's AdminServer in the default ZK configs 
> that we ship. Users who want to use AdminServer can enable it and set the 
> port to something that works for them. Realistically, in most production 
> environments, a different ZK server will be used anyway. So this is mostly to 
> save new users who are trying Kafka on their own machine from running into 
> accidental and frustrating port conflicts.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:22 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}

so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:

```
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
```

[jira] [Commented] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta commented on KAFKA-8789:


I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:

```
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps
```

so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using the Kafka 2.0 client (from Confluent 5.0.3):
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8522) Tombstones can survive forever

2019-08-12 Thread Gwen Shapira (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-8522:

Issue Type: Improvement  (was: Bug)

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey area as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-4545) tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned

2019-08-12 Thread Gwen Shapira (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-4545:

Issue Type: Improvement  (was: Bug)

> tombstone needs to be removed after delete.retention.ms has passed after it 
> has been cleaned
> 
>
> Key: KAFKA-4545
> URL: https://issues.apache.org/jira/browse/KAFKA-4545
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Jun Rao
>Assignee: Richard Yu
>Priority: Major
>  Labels: needs-kip
>
> The algorithm for removing the tombstone in a compacted log is supposed to be 
> the following.
> 1. Tombstone is never removed when it's still in the dirty portion of the log.
> 2. After the tombstone is in the cleaned portion of the log, we further delay 
> the removal of the tombstone by delete.retention.ms since the time the 
> tombstone is in the cleaned portion.
> Once the tombstone is in the cleaned portion, we know there can't be any 
> message with the same key before the tombstone. Therefore, for any consumer, 
> if it reads a non-tombstone message before the tombstone, but can read to the 
> end of the log within delete.retention.ms, it's guaranteed to see the 
> tombstone.
> However, the current implementation doesn't seem correct. We delay the 
> removal of the tombstone by delete.retention.ms since the last modified time 
> of the last cleaned segment. However, the last modified time is inherited 
> from the original segment, which could be arbitrarily old. So, the tombstone 
> may not be preserved as long as it needs to be.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 11:11 PM:
--

The problem seems to be related to the `--timeout-ms` parameter.

In every case, the total time for the command to run is pretty much the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
but newer client versions appear to need a longer timeout to work correctly. Has 
the behavior of the `--timeout-ms` parameter changed in some way?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout, so having the smallest timeout possible that will likely give me 
all the messages is desirable.


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout, so having the smallest timeout possible that will likely give me 
all the messages is desirable.

[jira] [Issue Comment Deleted] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Comment: was deleted

(was: And the same behavior for the regular console consumer:
{code:java}
confluent-5.3.0 $ time bin/kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 16:57:04,777] ERROR Error processing message, terminating consumer 
process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
1.97user 0.23system 0:31.48elapsed 7%CPU (0avgtext+0avgdata 150260maxresident)k
0inputs+0outputs (0major+34637minor)pagefaults 0swaps{code})

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 11:09 PM:
--

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout, so having the smallest timeout possible that will likely give me 
all the messages is desirable.


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 
{code}

[jira] [Commented] (KAFKA-8412) Still a nullpointer exception thrown on shutdown while flushing before closing producers

2019-08-12 Thread Chris Pettitt (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905598#comment-16905598
 ] 

Chris Pettitt commented on KAFKA-8412:
--

Having played with this a bit more, I think it's best to go for #1 for now, 
which is a point fix with a pretty small scope, i.e. low risk.

I do see value in rethinking the state handling a bit, particularly around 
ownership and transitions, perhaps as a separate ticket. As someone new to the 
code, it is difficult to work out at a glance which states and transitions are 
valid, especially because the logic is distributed across at least 
AssignedTasks, AbstractTask, and StreamTask.
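
To make that concrete, one possible shape (purely illustrative; the state 
names below are made up and are not the actual Streams task states) would be a 
single enum that owns its own transition table:
{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: one place that declares which transitions are legal.
public enum TaskState {
    CREATED, RESTORING, RUNNING, SUSPENDED, CLOSED;

    private static final Map<TaskState, Set<TaskState>> VALID = new EnumMap<>(TaskState.class);
    static {
        VALID.put(CREATED,   EnumSet.of(RESTORING, CLOSED));
        VALID.put(RESTORING, EnumSet.of(RUNNING, SUSPENDED, CLOSED));
        VALID.put(RUNNING,   EnumSet.of(SUSPENDED, CLOSED));
        VALID.put(SUSPENDED, EnumSet.of(RUNNING, CLOSED));
        VALID.put(CLOSED,    EnumSet.noneOf(TaskState.class));
    }

    public boolean isValidTransition(TaskState next) {
        return VALID.get(this).contains(next);
    }
}{code}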

> Still a nullpointer exception thrown on shutdown while flushing before 
> closing producers
> 
>
> Key: KAFKA-8412
> URL: https://issues.apache.org/jira/browse/KAFKA-8412
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Sebastiaan
>Assignee: Chris Pettitt
>Priority: Minor
>
> I found a closed issue and replied there, but decided to open a new one 
> because, although they're related, they're slightly different. The original 
> issue is at https://issues.apache.org/jira/browse/KAFKA-7678
> The fix there was to add a null check around closing the producer, because in 
> some cases the producer is already null at that point (it has already been 
> closed).
> In version 2.1.1 we are getting a very similar exception, but in the 'flush' 
> method that is called pre-close. This is in the log:
> {code:java}
> message: stream-thread 
> [webhook-poster-7034dbb0-7423-476b-98f3-d18db675d6d6-StreamThread-1] Failed 
> while closing StreamTask 1_26 due to the following error:
> logger_name: org.apache.kafka.streams.processor.internals.AssignedStreamsTasks
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:245)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:493)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:443)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:568)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:691)
>     at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.close(AssignedTasks.java:397)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:260)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1181)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:758){code}
> Followed by:
>  
> {code:java}
> message: task [1_26] Could not close task due to the following error:
> logger_name: org.apache.kafka.streams.processor.internals.StreamTask
> java.lang.NullPointerException: null
>     at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:245)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:493)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:443)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:568)
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:691)
>     at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.close(AssignedTasks.java:397)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:260)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1181)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:758){code}
> If I look at the source code at this point, I see a nice null check in the 
> close method, but not in the flush method that is called just before that:
> {code:java}
> public void flush() {
>     this.log.debug("Flushing producer");
>     this.producer.flush();
>     this.checkForException();
> }
> public void close() {
>     this.log.debug("Closing producer");
>     if (this.producer != null) {
>     this.producer.close();
>     this.producer = null;
>     }
>     this.checkForException();
> }{code}
> It seems to my (admittedly ignorant) eye that the flush method should also be 
> guarded by a null check, in the same way as has been done for close.
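> A minimal sketch of that guard (illustrative only, not an actual patch) could 
> look like:
> {code:java}
> public void flush() {
>     this.log.debug("Flushing producer");
>     // Guard against the producer having already been closed and nulled out,
>     // mirroring the existing null check in close().
>     if (this.producer != null) {
>         this.producer.flush();
>     }
>     this.checkForException();
> }{code}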



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8601) Producer Improvement: Sticky Partitioner

2019-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905571#comment-16905571
 ] 

ASF GitHub Bot commented on KAFKA-8601:
---

jolshan commented on pull request #7199: KAFKA-8601: StickyRoundRobinPartitioner
URL: https://github.com/apache/kafka/pull/7199
 
 
   Creates a partitioner that applies the sticky partitioning strategy to 
records with both null and non-null keys. Builds off of KIP-480. This 
partitioner will not be used as the default; it must be explicitly configured 
by the user. 
   
   Includes tests for both keyed and non-keyed scenarios, as well as tests for 
the case where a partition is unavailable. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Producer Improvement: Sticky Partitioner
> 
>
> Key: KAFKA-8601
> URL: https://issues.apache.org/jira/browse/KAFKA-8601
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently the default partitioner uses a round-robin strategy to partition 
> non-keyed values. The idea is to implement a "sticky partitioner" that 
> chooses a partition for a topic and sends all records to that partition until 
> the batch is sent. Then a new partition is chosen. This new partitioner will 
> increase batching and decrease latency. 
> KIP link: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
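> For illustration, a minimal sketch of the sticky idea against the public 
> Partitioner interface (a hypothetical example relying on the onNewBatch hook 
> proposed in KIP-480, not the code in this PR) might look like:
> {code:java}
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ThreadLocalRandom;
> import org.apache.kafka.clients.producer.Partitioner;
> import org.apache.kafka.common.Cluster;
> import org.apache.kafka.common.PartitionInfo;
>
> public class StickySketchPartitioner implements Partitioner {
>     // Current "sticky" partition per topic.
>     private final ConcurrentHashMap<String, Integer> current = new ConcurrentHashMap<>();
>
>     @Override
>     public int partition(String topic, Object key, byte[] keyBytes,
>                          Object value, byte[] valueBytes, Cluster cluster) {
>         // Keep sending to the same partition until a batch completes.
>         return current.computeIfAbsent(topic, t -> choose(t, cluster));
>     }
>
>     @Override
>     public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
>         // A batch for the sticky partition was filled or sent: pick a new one.
>         current.put(topic, choose(topic, cluster));
>     }
>
>     private int choose(String topic, Cluster cluster) {
>         List<PartitionInfo> parts = cluster.availablePartitionsForTopic(topic);
>         if (parts.isEmpty())
>             parts = cluster.partitionsForTopic(topic);
>         return parts.get(ThreadLocalRandom.current().nextInt(parts.size())).partition();
>     }
>
>     @Override
>     public void close() {}
>
>     @Override
>     public void configure(Map<String, ?> configs) {}
> }{code}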



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-4545) tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned

2019-08-12 Thread Gwen Shapira (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-4545:

Priority: Minor  (was: Major)

> tombstone needs to be removed after delete.retention.ms has passed after it 
> has been cleaned
> 
>
> Key: KAFKA-4545
> URL: https://issues.apache.org/jira/browse/KAFKA-4545
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Jun Rao
>Assignee: Richard Yu
>Priority: Minor
>  Labels: needs-kip
>
> The algorithm for removing a tombstone in a compacted topic is supposed to be 
> the following:
> 1. The tombstone is never removed while it is still in the dirty portion of 
> the log.
> 2. Once the tombstone is in the cleaned portion of the log, its removal is 
> further delayed by delete.retention.ms from the time it entered the cleaned 
> portion.
> Once the tombstone is in the cleaned portion, we know there can't be any 
> message with the same key before it. Therefore, any consumer that reads a 
> non-tombstone message before the tombstone, but can read to the end of the 
> log within delete.retention.ms, is guaranteed to see the tombstone.
> However, the current implementation doesn't seem correct. We delay the 
> removal of the tombstone by delete.retention.ms from the last modified time 
> of the last cleaned segment, but that last modified time is inherited from 
> the original segment, which could be arbitrarily old. As a result, the 
> tombstone may not be preserved as long as it needs to be.
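> For context, delete.retention.ms is a per-topic setting; a minimal sketch of 
> adjusting it with the AdminClient (the topic name, bootstrap server, and the 
> one-day value are placeholder assumptions) might look like:
> {code:java}
> import java.util.Arrays;
> import java.util.Collection;
> import java.util.Collections;
> import java.util.Properties;
> import org.apache.kafka.clients.admin.AdminClient;
> import org.apache.kafka.clients.admin.AdminClientConfig;
> import org.apache.kafka.clients.admin.AlterConfigOp;
> import org.apache.kafka.clients.admin.ConfigEntry;
> import org.apache.kafka.common.config.ConfigResource;
> import org.apache.kafka.common.config.TopicConfig;
>
> public class TombstoneRetention {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>         try (AdminClient admin = AdminClient.create(props)) {
>             ConfigResource topic =
>                 new ConfigResource(ConfigResource.Type.TOPIC, "my-compacted-topic"); // placeholder
>             Collection<AlterConfigOp> ops = Arrays.asList(
>                 new AlterConfigOp(new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG,
>                     TopicConfig.CLEANUP_POLICY_COMPACT), AlterConfigOp.OpType.SET),
>                 // Keep tombstones for at least one day after they reach the cleaned portion.
>                 new AlterConfigOp(new ConfigEntry(TopicConfig.DELETE_RETENTION_MS_CONFIG,
>                     "86400000"), AlterConfigOp.OpType.SET));
>             admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
>         }
>     }
> }{code}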



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905652#comment-16905652
 ] 

Raman Gupta commented on KAFKA-8789:


The same behavior occurs with the regular (non-Avro) console consumer:
{code:java}
confluent-5.3.0 $ time bin/kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 16:57:04,777] ERROR Error processing message, terminating consumer 
process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
1.97user 0.23system 0:31.48elapsed 7%CPU (0avgtext+0avgdata 150260maxresident)k
0inputs+0outputs (0major+34637minor)pagefaults 0swaps{code}

> kafka-console-consumer needs bigger timeout-ms setting in order to work
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Description: 
I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 broker. 
When I run the following tools command using the older Kafka client included in 
Confluent 5.0.3.

bin/kafka-console-consumer \ 
  --bootstrap-server $KAFKA \ 
  --topic x \ 
  --from-beginning --max-messages 1 \
 --timeout-ms 15000

I get 1 message as expected.

However, when running the exact same command using the console consumer 
included with Confluent 5.3.0, I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.

  was:
I have a topic with about 20,000 events in it. When I run the following tools 
command using Kafka 2.

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
  --isolation-level read_committed --skip-message-on-error \
  --timeout-ms 15000

I get 100 messages as expected.

However, when running the exact same command using Kafka 2.3.0 I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

The version of Kafka on the server is 2.3.0.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.


> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Issue Comment Deleted] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Comment: was deleted

(was: UPDATE: The error message may be coming from the schema registry, which 
would put it outside the purview of the Kafka project. Sorry for the noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185])

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer timeout-ms setting behaves incorrectly with 
older client  (was: kafka-console-consumer needs bigger timeout-ms setting in 
order to work)

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)