[jira] [Assigned] (KAFKA-9737) Describing log dir reassignment times out if broker is offline

2020-03-20 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley reassigned KAFKA-9737:
--

Assignee: Tom Bentley

> Describing log dir reassignment times out if broker is offline
> --
>
> Key: KAFKA-9737
> URL: https://issues.apache.org/jira/browse/KAFKA-9737
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Tom Bentley
>Priority: Major
>
> If there is any broker offline when trying to describe a log dir 
> reassignment, then we get something like the following error:
> {code}
> Status of partition reassignment:
> Partitions reassignment failed due to org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at kafka.admin.ReassignPartitionsCommand$.checkIfReplicaReassignmentSucceeded(ReassignPartitionsCommand.scala:381)
>   at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:98)
>   at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:90)
>   at kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:61)
>   at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment.
> {code}
> It would be nice if the tool was smart enough to notice brokers that are 
> offline and report them as such while reporting the status of reassignments 
> for online brokers.
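For illustration, one way such a check could look against the public {{Admin}} 
API (a sketch only, not the fix that was eventually implemented; the class and 
method names here are invented):

{code}
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionReplica;

public class DescribeReachableLogDirs {
    // Sketch: discover which brokers are registered, report the rest as
    // offline, and only query describeReplicaLogDirs for replicas that live
    // on reachable brokers, so the call cannot time out on offline nodes.
    static void describeReachable(Admin admin,
                                  Collection<TopicPartitionReplica> replicas)
            throws Exception {
        Set<Integer> online = admin.describeCluster().nodes().get().stream()
                .map(Node::id)
                .collect(Collectors.toSet());

        replicas.stream()
                .filter(r -> !online.contains(r.brokerId()))
                .forEach(r -> System.out.println(
                        "Broker " + r.brokerId() + " is offline; skipping " + r));

        Set<TopicPartitionReplica> reachable = replicas.stream()
                .filter(r -> online.contains(r.brokerId()))
                .collect(Collectors.toSet());

        // Only reachable replicas are queried, so this no longer blocks on
        // "Timed out waiting for a node assignment" for offline brokers.
        admin.describeReplicaLogDirs(reachable).all().get();
    }
}
{code}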



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9735) Kafka Connect REST endpoint doesn't consider keystore/truststore

2020-03-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063196#comment-17063196
 ] 

Sönke Liebau commented on KAFKA-9735:
-

Hi [~SledgeHammer], a common cause for this error is a private key missing 
from the keystore. Are you using these same keystores for Kafka communication, 
and are they working there, or are they dedicated to the Connect REST 
endpoint?
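One quick way to check that is to load the keystore and see whether each alias 
is a private-key entry or a certificate-only entry; a minimal Java sketch (the 
path and password are placeholders taken from the config below):

{code}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.Collections;

public class KeystoreCheck {
    public static void main(String[] args) throws Exception {
        // A JKS keystore backing a TLS server listener must contain at
        // least one private-key entry, not just trusted certificates.
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(
                "/progra~1/kafka_2.12-2.4.0/config/kafka01.keystore.jks")) {
            ks.load(in, "xxx".toCharArray()); // keystore password placeholder
        }
        for (String alias : Collections.list(ks.aliases())) {
            System.out.printf("%s: %s%n", alias,
                    ks.isKeyEntry(alias) ? "private key entry"
                                         : "certificate only");
        }
    }
}
{code}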

> Kafka Connect REST endpoint doesn't consider keystore/truststore
> 
>
> Key: KAFKA-9735
> URL: https://issues.apache.org/jira/browse/KAFKA-9735
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
> Environment: Windows 10 Pro x64
>Reporter: SledgeHammer
>Priority: Major
>
> Trying to secure the Kafka Connect REST endpoint with SSL fails (no cipher 
> suites in common):
> listeners.https.ssl.client.auth=required
> listeners.https.ssl.truststore.location=/progra~1/kafka_2.12-2.4.0/config/kafka.truststore.jks
> listeners.https.ssl.truststore.password=xxx
> listeners.https.ssl.keystore.location=/progra~1/kafka_2.12-2.4.0/config/kafka01.keystore.jks
> listeners.https.ssl.keystore.password=xxx
> listeners.https.ssl.key.password=xxx
> listeners.https.ssl.enabled.protocols=TLSv1.2
>  
> See: 
> [https://stackoverflow.com/questions/55220602/unable-to-configure-ssl-for-kafka-connect-rest-api]
>  
> A developer debugged this and saw that the configs are not being returned. I 
> am still seeing this issue in 2.4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9692) Flaky test - kafka.admin.ReassignPartitionsClusterTest#znodeReassignmentShouldOverrideApiTriggeredReassignment

2020-03-20 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley resolved KAFKA-9692.

Resolution: Done

This test got deleted as part of KAFKA-8820, when much of the testing changed 
to unit tests.

> Flaky test - 
> kafka.admin.ReassignPartitionsClusterTest#znodeReassignmentShouldOverrideApiTriggeredReassignment
> --
>
> Key: KAFKA-9692
> URL: https://issues.apache.org/jira/browse/KAFKA-9692
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 2.5.0
>Reporter: Tom Bentley
>Priority: Major
>  Labels: flaky-test
>
> {noformat}
> java.lang.AssertionError: expected: but was: 101)>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:120)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> kafka.admin.ReassignPartitionsClusterTest.assertReplicas(ReassignPartitionsClusterTest.scala:1220)
>   at 
> kafka.admin.ReassignPartitionsClusterTest.assertIsReassigning(ReassignPartitionsClusterTest.scala:1191)
>   at 
> kafka.admin.ReassignPartitionsClusterTest.znodeReassignmentShouldOverrideApiTriggeredReassignment(ReassignPartitionsClusterTest.scala:897)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at jdk.internal.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.jav

[jira] [Assigned] (KAFKA-9737) Describing log dir reassignment times out if broker is offline

2020-03-20 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley reassigned KAFKA-9737:
--

Assignee: (was: Tom Bentley)

> Describing log dir reassignment times out if broker is offline
> --
>
> Key: KAFKA-9737
> URL: https://issues.apache.org/jira/browse/KAFKA-9737
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>
> If there is any broker offline when trying to describe a log dir 
> reassignment, then we get something like the following error:
> {code}
> Status of partition reassignment:
> Partitions reassignment failed due to org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at kafka.admin.ReassignPartitionsCommand$.checkIfReplicaReassignmentSucceeded(ReassignPartitionsCommand.scala:381)
>   at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:98)
>   at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:90)
>   at kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:61)
>   at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeReplicaLogDirs, deadlineMs=1584663960068, tries=1, nextAllowedTryMs=1584663960173) timed out at 1584663960073 after 1 attempt(s)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment.
> {code}
> It would be nice if the tool was smart enough to notice brokers that are 
> offline and report them as such while reporting the status of reassignments 
> for online brokers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-1368) Upgrade log4j

2020-03-20 Thread Tom Bentley (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063231#comment-17063231
 ] 

Tom Bentley commented on KAFKA-1368:


[~mimaison], [~ecomar], are you working on this? If not, do you mind if I try 
to take it forward?

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>Assignee: Mickael Maison
>Priority: Major
>
> Upgrade log4j to at least 1.2.16 or 1.2.17.
> This will make usage of EnhancedPatternLayout possible.
> It allows setting delimiters around the full log, stack trace included, making 
> log message collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6851) Kafka CLI commands should support specifying the servers in environment variables

2020-03-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063259#comment-17063259
 ] 

Sönke Liebau commented on KAFKA-6851:
-

Hi [~peter.gergely.horv...@gmail.com], are you doing any work on this feature 
behind the scenes? Otherwise, I'd be willing to adopt this and create a KIP for 
it.

> Kafka CLI commands should support specifying the servers in environment 
> variables
> -
>
> Key: KAFKA-6851
> URL: https://issues.apache.org/jira/browse/KAFKA-6851
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.0.0, 1.0.1, 1.1.0
> Environment: ALL
>Reporter: Peter Horvath
>Priority: Major
>
> Currently, different Kafka CLI commands require specifying different servers 
> as arguments (--broker-list, --bootstrap-server and --zookeeper).
>  
> This is kind of tedious as different CLI commands require specifying 
> different servers, which is especially painful if the host names are *long*, 
> and *only slightly different* (e.g. naming scheme for AWS: 
> ec2-12-34-56-2.region-x.compute.amazonaws.com). 
>  
> I know I could simply export shell variables for each type of endpoint and 
> refer that in the command, but that still only eases the pain:
> export KAFKA_ZK=ec2-12-34-56-2.region-x.compute.amazonaws.com
> bin/kafka-topics.sh --list --zookeeper ${KAFKA_ZK}
>  
> According to [some conversation on the Kafka mailing 
> list|http://mail-archives.apache.org/mod_mbox/kafka-users/201804.mbox/%3cca+usspag21mwufr146_hlvdyftuzjm4zfn1c5tnfjugjxag...@mail.gmail.com%3E]
>  (started by me), people usually resort to some kind of non-standardised 
> work-arounds (see replies to thread _Using Kafka CLI without specifying the 
> URLs every single time?_)
>  
>  
> When dealing with the Kafka client CLI, it would be a huge relief / 
> productivity boost if the CLI could pick up the servers from the user's 
> environment variables.
> For example, instead of requiring --broker-list, --bootstrap-server or 
> --zookeeper, the Kafka client commands could check whether a corresponding 
> environment variable is set and use it when the argument is missing (while 
> still allowing the environment-defined server to be overridden with an 
> *explicit* argument).
>  
> I propose introducing specific environment variables that would allow the 
> user to omit the explicit specification of the --broker-list, 
> --bootstrap-server and --zookeeper arguments every single time a CLI command 
> is called. This would make using the Kafka CLI much easier. 
>  
> ||CLI argument||Proposed environment variable name||
> |--broker-list|KAFKA_CLIENT_BROKER_LIST|
> |--bootstrap-server|KAFKA_CLIENT_BOOTSTRAP_SERVER|
> |--zookeeper|KAFKA_CLIENT_ZOOKEEPER|
> It should be possible to implement this solely at the wrapper shell script 
> level, with some checks being done that test if the variable is defined or 
> not.
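A hypothetical Java illustration of the proposed precedence follows; the 
actual proposal targets the wrapper shell scripts, but the semantics would be 
the same, and the helper name here is invented:

{code}
public class ServerArgResolver {
    // An explicit command-line argument always wins; the environment
    // variable (e.g. KAFKA_CLIENT_BOOTSTRAP_SERVER from the table above)
    // is only consulted when the argument is missing.
    static String resolveServer(String cliArgValue, String envVarName) {
        if (cliArgValue != null) {
            return cliArgValue; // explicit --bootstrap-server etc. wins
        }
        String fromEnv = System.getenv(envVarName);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        throw new IllegalArgumentException(
                "No server specified: pass the argument or set " + envVarName);
    }
}
{code}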



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-1368) Upgrade log4j

2020-03-20 Thread Mickael Maison (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063326#comment-17063326
 ] 

Mickael Maison commented on KAFKA-1368:
---

[~tombentley] I've reassigned it to you

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>Assignee: Tom Bentley
>Priority: Major
>
> Upgrade log4j to at least 1.2.16 or 1.2.17.
> This will make usage of EnhancedPatternLayout possible.
> It allows setting delimiters around the full log, stack trace included, making 
> log message collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-1368) Upgrade log4j

2020-03-20 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-1368:
-

Assignee: Tom Bentley  (was: Mickael Maison)

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>Assignee: Tom Bentley
>Priority: Major
>
> Upgrade log4j to at least 1.2.16 or 1.2.17.
> This will make usage of EnhancedPatternLayout possible.
> It allows setting delimiters around the full log, stack trace included, making 
> log message collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9724) Consumer wrongly ignores fetched records "since it no longer has valid position"

2020-03-20 Thread Oleg Muravskiy (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1706#comment-1706
 ] 

Oleg Muravskiy commented on KAFKA-9724:
---

Sure [~hachikuji]: 

{noformat}
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
check.crcs = true
client.dns.lookup = default
client.id = ris-updates-to-hbase-ristest
client.rack = 
connections.max.idle.ms = 54
default.api.timeout.ms = 6
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 500
fetch.max.wait.ms = 8000
fetch.min.bytes = 1
group.id = ris-updates-to-hbase-ristest
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
max.partition.fetch.bytes = 400
max.poll.interval.ms = 30
max.poll.records = 50
metadata.max.age.ms = 30
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 3
partition.assignment.strategy = 
[org.apache.kafka.clients.consumer.StickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 3
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 6
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 65536
session.timeout.ms = 30
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
{noformat}


> Consumer wrongly ignores fetched records "since it no longer has valid 
> position"
> 
>
> Key: KAFKA-9724
> URL: https://issues.apache.org/jira/browse/KAFKA-9724
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 2.4.0
>Reporter: Oleg Muravskiy
>Priority: Major
>
> After upgrading kafka-client to 2.4.0 (while brokers are still at 2.2.0) 
> consumers in a consumer group intermittently stop progressing on assigned 
> partitions, even when there are messages to consume. This is not a permanent 
> condition, as they progress from time to time, but become slower with time, 
> and catch up after restart.
> Here is a sample of 3 consecutive ignored fetches:
> {noformat}
> 2020-03-15 12:08:58,440 DEBUG [Thread-6] o.a.k.c.c.i.ConsumerCoordinator - 
> Committed offset 538065584 for partition mrt-rrc10-6
> 2020-03-15 12:08:58,541 DEBUG [Thread-6] o.a.k.c.consumer.internals.Fetcher - 
> Skipping validation of fetch offsets for partitions [mrt-rrc10-1, 
> mrt-rrc10-6, mrt-rrc22-7] since the broker does not support the required 
> protocol version (introduced in Kafka 2.3)
> 2020-03-15 12:08:58,549 DEBUG [Thread-6] org.apache.kafka.clients.Metadata - 
> Updating last seen epoch from null to 62 for partition mrt-rrc10-6
> 2020-03-15 12:08:58,557 DEBUG [Thread-6] o.a.k.c.c.i.ConsumerCoordinator - 
> Committed offset 538065584 for partition mrt-rrc10-6
> 2020-03-15 12:08:58,652 DEBUG [Thread-6] o.a.k.c.consumer.internals.Fetcher - 
> Skipping validation of fetch offsets for partitions [mrt-rrc10-1, 
> mrt-rrc10-6, mrt-rrc22-7] since the broker does not support the required 
> protocol version (introduced in Kafka 2.3)
> 2020-03-15 12:08:58,659 DEBUG [Thread-6] org.apache.kafka.clients.Metadata - 
> Updating last seen epoch from null to 62 for partition mrt-rrc10-6
> 2020-03-15 12:08:58,659 DEBUG [Thread-6] o.a

[jira] [Created] (KAFKA-9738) Add Generics Type Parameters to forwarded() in MockProcessorContext

2020-03-20 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-9738:


 Summary: Add Generics Type Parameters to forwarded() in 
MockProcessorContext 
 Key: KAFKA-9738
 URL: https://issues.apache.org/jira/browse/KAFKA-9738
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 2.5.0
Reporter: Bruno Cadonna


The method {{forwarded()}} to capture the forwarded records in 
{{MockProcessorContext}} does not have any type parameters, although the 
corresponding {{forward()}} does have them. To enable type checking at compile 
time in tests, generic type parameters should be added to the {{forwarded()}} 
method.
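To illustrate the asymmetry, a minimal sketch against the 2.5-era 
{{kafka-streams-test-utils}} API (the surrounding class and the sample 
key/value are invented for illustration):

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.MockProcessorContext;

public class ForwardedExample {
    public static void main(String[] args) {
        MockProcessorContext context = new MockProcessorContext();
        // forward() is generic, so key and value types are checked here...
        context.forward("key", 42L);

        // ...but forwarded() is not: each capture comes back as a raw
        // KeyValue, forcing the casts this ticket wants to eliminate.
        for (MockProcessorContext.CapturedForward cf : context.forwarded()) {
            KeyValue kv = cf.keyValue();
            String key = (String) kv.key;   // unchecked cast
            long value = (Long) kv.value;   // unchecked cast
            System.out.println(key + " -> " + value);
        }
    }
}
{code}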



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9739) StreamsBuilder.build fails with StreamsException "Found a null keyChangingChild node for OptimizableRepartitionNode"

2020-03-20 Thread Artur (Jira)
Artur created KAFKA-9739:


 Summary: StreamsBuilder.build fails with StreamsException "Found a 
null keyChangingChild node for OptimizableRepartitionNode"
 Key: KAFKA-9739
 URL: https://issues.apache.org/jira/browse/KAFKA-9739
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.4.1
Reporter: Artur
 Attachments: streams-exception-log.txt, topology-description.txt

We created a topology using the Streams DSL (the topology description is 
available in the attached topology-description.txt, no optimization).

The topology works fine with {{topology.optimization=none}}, however it fails 
to build with StreamsException "Found a null keyChangingChild node for 
OptimizableRepartitionNode" if we set {{topology.optimization=all}} (the 
exception stack trace is attached in streams-exception-log.txt).

We used [https://zz85.github.io/kafka-streams-viz/] to visualize the topology 
and tried to guess what might be upsetting the optimizer, yet did not manage 
to figure it out ourselves.
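For context, the optimizer only runs when the config is passed into 
{{StreamsBuilder.build(Properties)}}; a minimal sketch of that wiring (the 
simple topology below is a stand-in for the reporter's, which is attached to 
the ticket, and is not claimed to reproduce the bug):

{code}
import java.util.Properties;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class OptimizedBuildExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // topology.optimization=all, as in the failing setup
        props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.selectKey((k, v) -> v)   // key-changing operation...
             .groupByKey()             // ...followed by an aggregation, which
             .count()                  // introduces an OptimizableRepartitionNode
             .toStream()
             .to("output-topic");

        // With OPTIMIZE set, build(props) runs the repartition optimizer;
        // the reporter's attached topology throws StreamsException here.
        Topology topology = builder.build(props);
        System.out.println(topology.describe());
    }
}
{code}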



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8470) State change logs should not be in TRACE level

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063428#comment-17063428
 ] 

ASF GitHub Bot commented on KAFKA-8470:
---

stanislavkozlovski commented on pull request #8320: 
KAFKA-8470-state-change-logger
URL: https://github.com/apache/kafka/pull/8320
 
 
   The StateChange logger in Kafka should not be logging its state changes at 
TRACE level.
   We consider these changes very useful in debugging. Given that we configure 
that logger to log at TRACE level by default, it seems better to change the 
default level to INFO and move the logs we consider useful to that level.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> State change logs should not be in TRACE level
> --
>
> Key: KAFKA-8470
> URL: https://issues.apache.org/jira/browse/KAFKA-8470
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> The StateChange logger in Kafka should not be logging its state changes at 
> TRACE level.
> We consider these changes very useful in debugging, and we additionally 
> configure that logger to log at TRACE level by default.
> Since we consider it important enough to configure its own logger to log at a 
> separate log level, why don't we change those logs to INFO and have the 
> logger use the defaults?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8470) State change logs should not be in TRACE level

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063437#comment-17063437
 ] 

ASF GitHub Bot commented on KAFKA-8470:
---

stanislavkozlovski commented on pull request #6878: KAFKA-8470: State change 
logs should not be in TRACE level
URL: https://github.com/apache/kafka/pull/6878
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> State change logs should not be in TRACE level
> --
>
> Key: KAFKA-8470
> URL: https://issues.apache.org/jira/browse/KAFKA-8470
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> The StateChange logger in Kafka should not be logging its state changes at 
> TRACE level.
> We consider these changes very useful in debugging, and we additionally 
> configure that logger to log at TRACE level by default.
> Since we consider it important enough to configure its own logger to log at a 
> separate log level, why don't we change those logs to INFO and have the 
> logger use the defaults?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9735) Kafka Connect REST endpoint doesn't consider keystore/truststore

2020-03-20 Thread SledgeHammer (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063446#comment-17063446
 ] 

SledgeHammer commented on KAFKA-9735:
-

Hi, yes. I have end-to-end SSL/SASL/JAAS working on Zookeeper <-> Kafka <-> 
Kafka Connect (standalone) <-> my application. I just started looking at trying 
to secure the Connect REST endpoint.

> Kafka Connect REST endpoint doesn't consider keystore/truststore
> 
>
> Key: KAFKA-9735
> URL: https://issues.apache.org/jira/browse/KAFKA-9735
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
> Environment: Windows 10 Pro x64
>Reporter: SledgeHammer
>Priority: Major
>
> Trying to secure the Kafka Connect REST endpoint with SSL fails (no cipher 
> suites in common):
> listeners.https.ssl.client.auth=required
> listeners.https.ssl.truststore.location=/progra~1/kafka_2.12-2.4.0/config/kafka.truststore.jks
> listeners.https.ssl.truststore.password=xxx
> listeners.https.ssl.keystore.location=/progra~1/kafka_2.12-2.4.0/config/kafka01.keystore.jks
> listeners.https.ssl.keystore.password=xxx
> listeners.https.ssl.key.password=xxx
> listeners.https.ssl.enabled.protocols=TLSv1.2
>  
> See: 
> [https://stackoverflow.com/questions/55220602/unable-to-configure-ssl-for-kafka-connect-rest-api]
>  
> A developer debugged this and saw that the configs are not being returned. I 
> am still seeing this issue in 2.4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9734) Streams IllegalStateException in trunk during rebalance

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063449#comment-17063449
 ] 

ASF GitHub Bot commented on KAFKA-9734:
---

vvcephei commented on pull request #8319: KAFKA-9734: Fix IllegalState in 
Streams transit to standby
URL: https://github.com/apache/kafka/pull/8319
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Streams IllegalStateException in trunk during rebalance
> ---
>
> Key: KAFKA-9734
> URL: https://issues.apache.org/jira/browse/KAFKA-9734
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> I have observed the following exception kill a thread in the current trunk 
> of Streams:
> {noformat}
> [2020-03-19 07:04:35,206] ERROR 
> [stream-soak-test-e60443b4-aa2d-4107-abf7-ce90407cb70e-StreamThread-1] 
> stream-thread [stream-soak-test-e60443b4-aa
> 2d-4107-abf7-ce90407cb70e-StreamThread-1] Encountered the following exception 
> during processing and the thread is going to shut down:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> java.lang.IllegalStateException: The changelog reader is not restoring active 
> tasks while trying to transit to update standby tasks: {}
> at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.transitToUpdateStandby(StoreChangelogReader.java:303)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:582)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:498)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:472){noformat}
> Judging from the fact that the standby tasks are reported as an empty set, I 
> think this is just a missed case in the task manager. PR to follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9734) Streams IllegalStateException in trunk during rebalance

2020-03-20 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-9734.
-
Resolution: Fixed

> Streams IllegalStateException in trunk during rebalance
> ---
>
> Key: KAFKA-9734
> URL: https://issues.apache.org/jira/browse/KAFKA-9734
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> I have observed the following exception kill a thread in the current trunk 
> of Streams:
> {noformat}
> [2020-03-19 07:04:35,206] ERROR 
> [stream-soak-test-e60443b4-aa2d-4107-abf7-ce90407cb70e-StreamThread-1] 
> stream-thread [stream-soak-test-e60443b4-aa
> 2d-4107-abf7-ce90407cb70e-StreamThread-1] Encountered the following exception 
> during processing and the thread is going to shut down:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> java.lang.IllegalStateException: The changelog reader is not restoring active 
> tasks while trying to transit to update standby tasks: {}
> at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.transitToUpdateStandby(StoreChangelogReader.java:303)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:582)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:498)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:472){noformat}
> Judging from the fact that the standby tasks are reported as an empty set, I 
> think this is just a missed case in the task manager. PR to follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8381) SSL factory for inter-broker listener is broken

2020-03-20 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-8381:
--
Affects Version/s: (was: 2.3.0)

Removed AffectedVersion=2.3.0 since this was fixed before the release and the 
issue wasn't in any released version.

> SSL factory for inter-broker listener is broken
> ---
>
> Key: KAFKA-8381
> URL: https://issues.apache.org/jira/browse/KAFKA-8381
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 2.3.0
>
>
> From a system test failure:
> {code}
> [2019-05-17 15:48:12,453] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.kafka.common.KafkaException: 
> org.apache.kafka.common.config.ConfigException: Invalid value 
> javax.net.ssl.SSLHandshakeException: General SSLEngine problem for 
> configuration A client SSLEngine created with the provided settings can't 
> connect to a server SSLEngine created with those settings.
>     at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:162)
>     at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>     at 
> org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:85)
>     at kafka.network.Processor.(SocketServer.scala:747)
>     at kafka.network.SocketServer.newProcessor(SocketServer.scala:388)
>     at 
> kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:282)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>     at 
> kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:281)
>     at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:244)
>     at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:241)
>     at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at 
> kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:241)
>     at kafka.network.SocketServer.startup(SocketServer.scala:120)
>     at kafka.server.KafkaServer.startup(KafkaServer.scala:293)
> {code}
> Looks like the changes under 
> https://github.com/apache/kafka/commit/0494cd329f3aaed94b3b46de0abe495f80faaedd
>  added validation for inter-broker SSL factory with hostname verification 
> enabled and `localhost` as the hostname. As a result, integration tests pass, 
> but system tests fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9739) StreamsBuilder.build fails with StreamsException "Found a null keyChangingChild node for OptimizableRepartitionNode"

2020-03-20 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-9739:
--

Assignee: Bill Bejeck

> StreamsBuilder.build fails with StreamsException "Found a null 
> keyChangingChild node for OptimizableRepartitionNode"
> 
>
> Key: KAFKA-9739
> URL: https://issues.apache.org/jira/browse/KAFKA-9739
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Artur
>Assignee: Bill Bejeck
>Priority: Major
> Attachments: streams-exception-log.txt, topology-description.txt
>
>
> We created a topology using the Streams DSL (the topology description is 
> available in the attached topology-description.txt, no optimization).
> The topology works fine with {{topology.optimization=none}}, however it fails 
> to build with StreamsException "Found a null keyChangingChild node for 
> OptimizableRepartitionNode" if we set {{topology.optimization=all}} (the 
> exception stack trace is attached in streams-exception-log.txt).
> We used [https://zz85.github.io/kafka-streams-viz/] to visualize the topology 
> and tried to guess what might be upsetting the optimizer, yet did not manage 
> to figure it out ourselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9739) StreamsBuilder.build fails with StreamsException "Found a null keyChangingChild node for OptimizableRepartitionNode"

2020-03-20 Thread Bill Bejeck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063466#comment-17063466
 ] 

Bill Bejeck commented on KAFKA-9739:


Hi [~apoli]. 

Thanks for reporting this.  Can you also include the topology code on this 
ticket?

Thanks

> StreamsBuilder.build fails with StreamsException "Found a null 
> keyChangingChild node for OptimizableRepartitionNode"
> 
>
> Key: KAFKA-9739
> URL: https://issues.apache.org/jira/browse/KAFKA-9739
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Artur
>Assignee: Bill Bejeck
>Priority: Major
> Attachments: streams-exception-log.txt, topology-description.txt
>
>
> We created a topology using the Streams DSL (the topology description is 
> available in the attached topology-description.txt, no optimization).
> The topology works fine with {{topology.optimization=none}}, however it fails 
> to build with StreamsException "Found a null keyChangingChild node for 
> OptimizableRepartitionNode" if we set {{topology.optimization=all}} (the 
> exception stack trace is attached in streams-exception-log.txt).
> We used [https://zz85.github.io/kafka-streams-viz/] to visualize the topology 
> and tried to guess what might be upsetting the optimizer, yet did not manage 
> to figure it out ourselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9739) StreamsBuilder.build fails with StreamsException "Found a null keyChangingChild node for OptimizableRepartitionNode"

2020-03-20 Thread Artur (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur updated KAFKA-9739:
-
Attachment: topology-definition-fragment.java

> StreamsBuilder.build fails with StreamsException "Found a null 
> keyChangingChild node for OptimizableRepartitionNode"
> 
>
> Key: KAFKA-9739
> URL: https://issues.apache.org/jira/browse/KAFKA-9739
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Artur
>Assignee: Bill Bejeck
>Priority: Major
> Attachments: streams-exception-log.txt, 
> topology-definition-fragment.java, topology-description.txt
>
>
> We created a topology using the Streams DSL (the topology description is 
> available in the attached topology-description.txt, no optimization).
> The topology works fine with {{topology.optimization=none}}, however it fails 
> to build with StreamsException "Found a null keyChangingChild node for 
> OptimizableRepartitionNode" if we set {{topology.optimization=all}} (the 
> exception stack trace is attached in streams-exception-log.txt).
> We used [https://zz85.github.io/kafka-streams-viz/] to visualize the topology 
> and tried to guess what might be upsetting the optimizer, yet did not manage 
> to figure it out ourselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9739) StreamsBuilder.build fails with StreamsException "Found a null keyChangingChild node for OptimizableRepartitionNode"

2020-03-20 Thread Artur (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063478#comment-17063478
 ] 

Artur commented on KAFKA-9739:
--

Sure. [^topology-definition-fragment.java]

I tried to leave out the majority of the irrelevant pieces, but a decent 
amount of business specifics slipped in anyway

> StreamsBuilder.build fails with StreamsException "Found a null 
> keyChangingChild node for OptimizableRepartitionNode"
> 
>
> Key: KAFKA-9739
> URL: https://issues.apache.org/jira/browse/KAFKA-9739
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Artur
>Assignee: Bill Bejeck
>Priority: Major
> Attachments: streams-exception-log.txt, 
> topology-definition-fragment.java, topology-description.txt
>
>
> We created a topology using the Streams DSL (the topology description is 
> available in the attached topology-description.txt, no optimization).
> The topology works fine with {{topology.optimization=none}}, however it fails 
> to build with StreamsException "Found a null keyChangingChild node for 
> OptimizableRepartitionNode" if we set {{topology.optimization=all}} (the 
> exception stack trace is attached in streams-exception-log.txt).
> We used [https://zz85.github.io/kafka-streams-viz/] to visualize the topology 
> and tried to guess what might be upsetting the optimizer, yet did not manage 
> to figure it out ourselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7958) Transactions are broken with kubernetes hosted brokers

2020-03-20 Thread Alexandre Dupriez (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063526#comment-17063526
 ] 

Alexandre Dupriez commented on KAFKA-7958:
--

_"Would be good to get confirmation though.":_ confirmed the fix for 
[#KAFKA-7755] fixes the problem in the transaction manager. Tested on 2.2.1.

> Transactions are broken with kubernetes hosted brokers
> --
>
> Key: KAFKA-7958
> URL: https://issues.apache.org/jira/browse/KAFKA-7958
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: cp-kafka 2.1.1-1, kafka-streams 2.1.1
>Reporter: Thomas Dickinson
>Priority: Major
>
> After a rolling re-start in a kubernetes-like environment, brokers may change 
> IP address.  From our logs it seems that the transaction manager in the 
> brokers never re-resolves the DNS name of other brokers, keeping stale pod 
> IPs.  Thus transactions stop working.  
> ??[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
> to node 0 
> (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/[10.233.124.181:9092|http://10.233.124.181:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> ??[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection 
> to node 1 
> (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/[10.233.122.67:9092|http://10.233.122.67:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> This is from the log from broker 1001 which was restarted first, followed by 
> 1 and then 0.  The log entries are from the day after the rolling restart.
> I note a similar issue was fixed for clients 2.1.1  
> https://issues.apache.org/jira/browse/KAFKA-7755.  We are using streams lib 
> 2.1.1
> *Update* We are now testing with Kafka 2.1.1 (docker cp-kafka 5.1.2-1) and it 
> looks like this may be resolved.  Would be good to get confirmation though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-7958) Transactions are broken with kubernetes hosted brokers

2020-03-20 Thread Alexandre Dupriez (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063526#comment-17063526
 ] 

Alexandre Dupriez edited comment on KAFKA-7958 at 3/20/20, 5:05 PM:


_"Would be good to get confirmation though.":_ confirmed the fix for 
[[KAFKA-7755||#KAFKA-7755]https://issues.apache.org/jira/browse/KAFKA-7755 
[]|#KAFKA-7755] fixes the problem in the transaction manager. Tested on 2.2.1.


was (Author: hangleton):
_"Would be good to get confirmation though.":_ confirmed the fix for 
[#KAFKA-7755] fixes the problem in the transaction manager. Tested on 2.2.1.

> Transactions are broken with kubernetes hosted brokers
> --
>
> Key: KAFKA-7958
> URL: https://issues.apache.org/jira/browse/KAFKA-7958
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: cp-kafka 2.1.1-1, kafka-streams 2.1.1
>Reporter: Thomas Dickinson
>Priority: Major
>
> After a rolling re-start in a kubernetes-like environment, brokers may change 
> IP address.  From our logs it seems that the transaction manager in the 
> brokers never re-resolves the DNS name of other brokers, keeping stale pod 
> IPs.  Thus transactions stop working.  
> ??[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
> to node 0 
> (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/[10.233.124.181:9092|http://10.233.124.181:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> ??[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection 
> to node 1 
> (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/[10.233.122.67:9092|http://10.233.122.67:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> This is from the log from broker 1001 which was restarted first, followed by 
> 1 and then 0.  The log entries are from the day after the rolling restart.
> I note a similar issue was fixed for clients 2.1.1  
> https://issues.apache.org/jira/browse/KAFKA-7755.  We are using streams lib 
> 2.1.1
> *Update* We are now testing with Kafka 2.1.1 (docker cp-kafka 5.1.2-1) and it 
> looks like this may be resolved.  Would be good to get confirmation though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-7958) Transactions are broken with kubernetes hosted brokers

2020-03-20 Thread Alexandre Dupriez (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063526#comment-17063526
 ] 

Alexandre Dupriez edited comment on KAFKA-7958 at 3/20/20, 5:05 PM:


_"Would be good to get confirmation though.":_ confirmed the fix for 
https://issues.apache.org/jira/browse/KAFKA-7755 fixes the problem in the 
transaction manager. Tested on 2.2.1.


was (Author: hangleton):
_"Would be good to get confirmation though.":_ confirmed the fix for 
[[KAFKA-7755||#KAFKA-7755]https://issues.apache.org/jira/browse/KAFKA-7755 
[]|#KAFKA-7755] fixes the problem in the transaction manager. Tested on 2.2.1.

> Transactions are broken with kubernetes hosted brokers
> --
>
> Key: KAFKA-7958
> URL: https://issues.apache.org/jira/browse/KAFKA-7958
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: cp-kafka 2.1.1-1, kafka-streams 2.1.1
>Reporter: Thomas Dickinson
>Priority: Major
>
> After a rolling re-start in a kubernetes-like environment, brokers may change 
> IP address.  From our logs it seems that the transaction manager in the 
> brokers never re-resolves the DNS name of other brokers, keeping stale pod 
> IPs.  Thus transactions stop working.  
> ??[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
> to node 0 
> (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/[10.233.124.181:9092|http://10.233.124.181:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> ??[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection 
> to node 1 
> (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/[10.233.122.67:9092|http://10.233.122.67:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> This is from the log from broker 1001 which was restarted first, followed by 
> 1 and then 0.  The log entries are from the day after the rolling restart.
> I note a similar issue was fixed for clients 2.1.1  
> https://issues.apache.org/jira/browse/KAFKA-7755.  We are using streams lib 
> 2.1.1
> *Update* We are now testing with Kafka 2.1.1 (docker cp-kafka 5.1.2-1) and it 
> looks like this may be resolved.  Would be good to get confirmation though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-7958) Transactions are broken with kubernetes hosted brokers

2020-03-20 Thread Alexandre Dupriez (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063526#comment-17063526
 ] 

Alexandre Dupriez edited comment on KAFKA-7958 at 3/20/20, 5:06 PM:


_"Would be good to get confirmation though.":_ confirm the fix for 
https://issues.apache.org/jira/browse/KAFKA-7755 fixes the problem in the 
transaction manager. Tested on 2.2.1.


was (Author: hangleton):
_"Would be good to get confirmation though.":_ confirmed the fix for 
https://issues.apache.org/jira/browse/KAFKA-7755 fixes the problem in the 
transaction manager. Tested on 2.2.1.

> Transactions are broken with kubernetes hosted brokers
> --
>
> Key: KAFKA-7958
> URL: https://issues.apache.org/jira/browse/KAFKA-7958
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: cp-kafka 2.1.1-1, kafka-streams 2.1.1
>Reporter: Thomas Dickinson
>Priority: Major
>
> After a rolling re-start in a kubernetes-like environment, brokers may change 
> IP address.  From our logs it seems that the transaction manager in the 
> brokers never re-resolves the DNS name of other brokers, keeping stale pod 
> IPs.  Thus transactions stop working.  
> ??[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
> to node 0 
> (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/[10.233.124.181:9092|http://10.233.124.181:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> ??[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection 
> to node 1 
> (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/[10.233.122.67:9092|http://10.233.122.67:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> This is from the log from broker 1001 which was restarted first, followed by 
> 1 and then 0.  The log entries are from the day after the rolling restart.
> I note a similar issue was fixed for clients 2.1.1  
> https://issues.apache.org/jira/browse/KAFKA-7755.  We are using streams lib 
> 2.1.1
> *Update* We are now testing with Kafka 2.1.1 (docker cp-kafka 5.1.2-1) and it 
> looks like this may be resolved.  Would be good to get confirmation though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7958) Transactions are broken with kubernetes hosted brokers

2020-03-20 Thread Alexandre Dupriez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Dupriez resolved KAFKA-7958.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Transactions are broken with kubernetes hosted brokers
> --
>
> Key: KAFKA-7958
> URL: https://issues.apache.org/jira/browse/KAFKA-7958
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: cp-kafka 2.1.1-1, kafka-streams 2.1.1
>Reporter: Thomas Dickinson
>Priority: Major
> Fix For: 2.1.1
>
>
> After a rolling re-start in a kubernetes-like environment, brokers may change 
> IP address.  From our logs it seems that the transaction manager in the 
> brokers never re-resolves the DNS name of other brokers, keeping stale pod 
> IPs.  Thus transactions stop working.  
> ??[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection 
> to node 0 
> (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/[10.233.124.181:9092|http://10.233.124.181:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> ??[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection 
> to node 1 
> (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/[10.233.122.67:9092|http://10.233.122.67:9092/])
>  could not be established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)??
> This is from the log from broker 1001 which was restarted first, followed by 
> 1 and then 0.  The log entries are from the day after the rolling restart.
> I note a similar issue was fixed for clients 2.1.1  
> https://issues.apache.org/jira/browse/KAFKA-7755.  We are using streams lib 
> 2.1.1
> *Update* We are now testing with Kafka 2.1.1 (docker cp-kafka 5.1.2-1) and it 
> looks like this may be resolved.  Would be good to get confirmation though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8266) Improve `testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup`

2020-03-20 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063562#comment-17063562
 ] 

Sophie Blee-Goldman commented on KAFKA-8266:


Can we close this as a duplicate?

> Improve 
> `testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup`
> 
>
> Key: KAFKA-8266
> URL: https://issues.apache.org/jira/browse/KAFKA-8266
> Project: Kafka
>  Issue Type: Test
>Reporter: Jason Gustafson
>Priority: Major
>
> Some additional validation could be done after the member gets kicked out. 
> The main thing is showing that the group can continue to consume data and 
> commit offsets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2020-03-20 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063563#comment-17063563
 ] 

Sophie Blee-Goldman commented on KAFKA-7965:


Failed on both builds of this PR:

[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/5315/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/]

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1295/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/]

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063576#comment-17063576
 ] 

ASF GitHub Bot commented on KAFKA-6145:
---

vvcephei commented on pull request #8262: KAFKA-6145: Add constrained balanced 
assignment algorithm
URL: https://github.com/apache/kafka/pull/8262
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Warm up new KS instances before migrating tasks - potentially a two phase 
> rebalance
> ---
>
> Key: KAFKA-6145
> URL: https://issues.apache.org/jira/browse/KAFKA-6145
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Sophie Blee-Goldman
>Priority: Major
>  Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time; even for small state stores, more than a few ms of pause can be a deal 
> breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases (see the 
> sketch below):
> 1) start building the state stores on the new node
> 2) once the state stores are fully populated on the new node, only then 
> rebalance the tasks - there will still be a rebalance pause, but it would be 
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
> rebalance
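
A related knob that exists today: standby replicas, which keep a shadow copy of state on other instances and shorten the restore pause after a rebalance. A minimal sketch, assuming Kafka Streams 2.x; the application id, broker address, and trivial topology are hypothetical:

{code}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyReplicaSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "warmup-demo");       // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        // Maintain one extra copy of each state store on another instance,
        // so a migrated task can start from near-complete local state.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // hypothetical topics

        new KafkaStreams(builder.build(), props).start();
    }
}
{code}

Standby replicas only shadow existing assignments; the two-phase warm-up proposed here would cover brand-new instances as well.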



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-9701:

Attachment: cluster.log

> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Priority: Major
> Attachments: cluster.log
>
>
> INFO log shows that we accidentally hit an unexpected inconsistent group 
> protocol exception:
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,382*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
> transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
> [0_1] Error sending record to topic node-name-repartition due to Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.; No more records will be sent and no more offsets will be 
> recorded for this task.
>  
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,521*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
>  sending LeaveGroup request to coordinator 
> ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
> due to the consumer unsubscribed from all topics 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
>  
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,798*] ERROR 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group 
> member's supported protocols are incompatible with those of existing members 
> or first group member tried to join with empty protocol type or empty 
> protocol list.
>  
> Potentially needs further logging to understand this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063613#comment-17063613
 ] 

John Roesler commented on KAFKA-9701:
-

Hey [~bchen225242] , I think I've reproduced this issue in a soak test. 
Attaching the client and broker logs ...[^cluster.log]

> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Priority: Major
> Attachments: cluster.log
>
>
> INFO log shows that we accidentally hit an unexpected inconsistent group 
> protocol exception:
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,382*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
> transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
> [0_1] Error sending record to topic node-name-repartition due to Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.; No more records will be sent and no more offsets will be 
> recorded for this task.
>  
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,521*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
>  sending LeaveGroup request to coordinator 
> ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
> due to the consumer unsubscribed from all topics 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
>  
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,798*] ERROR 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group 
> member's supported protocols are incompatible with those of existing members 
> or first group member tried to join with empty protocol type or empty 
> protocol list.
>  
> Potentially needs further logging to understand this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9654) ReplicaAlterLogDirsThread can't be created again if the previous ReplicaAlterLogDirsThread encounters a leader epoch error

2020-03-20 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9654.

Fix Version/s: 2.5.1
   2.4.2
   Resolution: Fixed

> ReplicaAlterLogDirsThread can't be created again if the previous 
> ReplicaAlterLogDirsThread encounters a leader epoch error
> 
>
> Key: KAFKA-9654
> URL: https://issues.apache.org/jira/browse/KAFKA-9654
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Critical
> Fix For: 2.4.2, 2.5.1
>
>
> ReplicaManager creates a ReplicaAlterLogDirsThread only if a new future 
> log is created. If the previous ReplicaAlterLogDirsThread encounters an error 
> when moving data, the target partition is moved to "failedPartitions" and the 
> ReplicaAlterLogDirsThread goes idle due to empty partitions. The future log is 
> still present, so we can neither create another ReplicaAlterLogDirsThread to 
> handle the partition nor update the partitions of the idle 
> ReplicaAlterLogDirsThread.
> ReplicaManager should call ReplicaAlterLogDirsManager#addFetcherForPartitions 
> even if there is already a future log, since we can create a new 
> ReplicaAlterLogDirsThread to handle the new partitions or update the 
> partitions of the existing ReplicaAlterLogDirsThread to make it busy again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9740) Add a "continue" option for Kafka Connect error handling

2020-03-20 Thread Zihan Li (Jira)
Zihan Li created KAFKA-9740:
---

 Summary: Add a "continue" option for Kafka Connect error handling
 Key: KAFKA-9740
 URL: https://issues.apache.org/jira/browse/KAFKA-9740
 Project: Kafka
  Issue Type: Improvement
Reporter: Zihan Li


Currently there are two error handling options in Kafka Connect, "none" and 
"all". Option "none" configures the connector to fail fast, and option "all" 
will ignore broken records.

If users want to store their broken records, they have to configure a broken 
record queue, which is too much work for them in some cases.

Some sink connectors have the ability to deal with broken records; for example, 
a JDBC sink connector can store the broken raw bytes in a separate table, and 
an S3 connector can store them in a zipped file.

Therefore, it would be ideal if Kafka Connect provided an additional option 
that sends the broken raw bytes to the SinkTask directly.

The SinkTask is then responsible for handling the unparsed bytes input.

The benefits of having this additional option are:
 * Being user friendly. Connectors can handle broken records and hide that from 
clients.
 * Providing more flexibility to the SinkTask in terms of broken record 
handling.
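
For reference, a minimal sketch of the existing knobs this proposal extends; the properties below are real Connect error-handling configs (KIP-298), while the topic name is hypothetical:

{code}
# fail fast (the "none" behavior)
errors.tolerance=none

# ignore broken records (the "all" behavior), optionally routing them to a
# dead letter queue -- the "broken record queue" mentioned above
errors.tolerance=all
errors.deadletterqueue.topic.name=my-connector-dlq
errors.deadletterqueue.context.headers.enable=true
{code}

The proposed "continue" option would instead hand the raw bytes to the SinkTask itself.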



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9740) Add a "continue" option for Kafka Connect error handling

2020-03-20 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li updated KAFKA-9740:

Component/s: KafkaConnect

> Add a "continue" option for Kafka Connect error handling
> 
>
> Key: KAFKA-9740
> URL: https://issues.apache.org/jira/browse/KAFKA-9740
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Zihan Li
>Priority: Major
>
> Currently there are two error handling options in Kafka Connect, "none" and 
> "all". Option "none" configures the connector to fail fast, and option "all" 
> will ignore broken records.
> If users want to store their broken records, they have to configure a broken 
> record queue, which is too much work for them in some cases.
> Some sink connectors have the ability to deal with broken records; for 
> example, a JDBC sink connector can store the broken raw bytes in a separate 
> table, and an S3 connector can store them in a zipped file.
> Therefore, it would be ideal if Kafka Connect provided an additional option 
> that sends the broken raw bytes to the SinkTask directly.
> The SinkTask is then responsible for handling the unparsed bytes input.
> The benefits of having this additional option are:
>  * Being user friendly. Connectors can handle broken records and hide that 
> from clients.
>  * Providing more flexibility to the SinkTask in terms of broken record 
> handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9727) Flaky system test StreamsEOSTest.test_failure_and_recovery

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063646#comment-17063646
 ] 

ASF GitHub Bot commented on KAFKA-9727:
---

guozhangwang commented on pull request #8307: KAFKA-9727: cleanup the state 
store for standby task dirty close and check null for changelogs
URL: https://github.com/apache/kafka/pull/8307
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky system test StreamsEOSTest.test_failure_and_recovery
> --
>
> Key: KAFKA-9727
> URL: https://issues.apache.org/jira/browse/KAFKA-9727
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, system tests
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Hits "No locks available" exceptions sometimes after task revival:
>  
> [2020-03-13 05:40:50,224] ERROR stream-thread 
> [EosTest-46de8ee5-82a1-4bd4-af23-f4acd8515f0f-StreamThread-1] Encountered the 
> following exception during processing and the thread is going to shut down:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> KSTREAM-AGGREGATE-STATE-STORE-03 at location 
> /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
>         at 
> org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:87)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:191)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:230)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:44)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$0(MeteredKeyValueStore.java:101)
>         at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:806)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:101)
>         at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:81)
>         at 
> org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:86)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:275)
>  
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:583)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:498)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:472)
> Caused by: org.rocksdb.RocksDBException: lock : 
> /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK:
>  No locks available
>         at org.rocksdb.RocksDB.open(Native Method)
>         at org.rocksdb.RocksDB.open(RocksDB.java:286)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:75)
>         ... 14 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9705) Config changes should be propagated from Controller in bridge release

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9705:
---
Summary: Config changes should be propagated from Controller in bridge 
release  (was: (Incremental)AlterConfig should be propagated from Controller in 
bridge release)

> Config changes should be propagated from Controller in bridge release
> -
>
> Key: KAFKA-9705
> URL: https://issues.apache.org/jira/browse/KAFKA-9705
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> In the bridge release, we need to restrict direct ZK access to the 
> controller only. This means the existing AlterConfig path should be migrated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9727) Flaky system test StreamsEOSTest.test_failure_and_recovery

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-9727.

Resolution: Fixed

> Flaky system test StreamsEOSTest.test_failure_and_recovery
> --
>
> Key: KAFKA-9727
> URL: https://issues.apache.org/jira/browse/KAFKA-9727
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, system tests
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Hits "No locks available" exceptions sometimes after task revival:
>  
> [2020-03-13 05:40:50,224] ERROR stream-thread 
> [EosTest-46de8ee5-82a1-4bd4-af23-f4acd8515f0f-StreamThread-1] Encountered the 
> following exception during processing and the thread is going to shut down:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> KSTREAM-AGGREGATE-STATE-STORE-03 at location 
> /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
>         at 
> org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:87)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:191)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:230)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:44)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$0(MeteredKeyValueStore.java:101)
>         at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:806)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:101)
>         at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:81)
>         at 
> org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:86)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:275)
>  
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:583)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:498)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:472)
> Caused by: org.rocksdb.RocksDBException: lock : 
> /mnt/streams/EosTest/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK:
>  No locks available
>         at org.rocksdb.RocksDB.open(Native Method)
>         at org.rocksdb.RocksDB.open(RocksDB.java:286)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:75)
>         ... 14 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7509) Kafka Connect logs unnecessary warnings about unused configurations

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063667#comment-17063667
 ] 

ASF GitHub Bot commented on KAFKA-7509:
---

rhauch commented on pull request #5876: KAFKA-7509: Avoid passing most 
non-applicable properties to producer, consumer, and admin client
URL: https://github.com/apache/kafka/pull/5876
 
 
   The producer, consumer, and admin client log any properties that are 
supplied but unused by that client. Previously, Connect would pass many of its 
worker configuration properties into the producer, consumer, and admin client 
used for internal topics, resulting in lots of log warnings about unused config 
properties.
   
   With this change, Connect attempts to filter out the worker's configuration 
properties that are not applicable to the producer, consumer, or admin client 
used for _internal_ topics. (Connect already includes only producer and 
consumer properties when creating those clients for connectors, since those 
properties are prefixed in the worker config.)
   
   For the most part, this is relatively straightforward, since there are some 
top-level worker-specific properties that can be removed, and most 
extension-specific properties have Connect-specific prefixes. Unfortunately, 
the REST extension is the only type of Connect extension that uses unprefixed 
properties from the worker config, so it is not possible to remove those from 
the properties passed to the producer, consumer, and admin clients. Hopefully, 
REST extensions are not yet prevalent, and this will mean most users will not 
see any warnings about unused properties in the producer, consumer, and admin 
client.
   
   Removing the Connect worker configs is one step. The other is to remove any 
properties for the producer that are specific to the consumer and admin 
client. Likewise, for the consumer we have to remove any properties that are 
specific to the producer and admin client, and for the admin client remove any 
properties that are specific to the producer and consumer. Note that any 
property that is unknown (e.g., properties for the REST extension, 
interceptors, metric reporters, serdes, partitioners, etc.) must be passed to 
the producer, consumer, and admin client. All of these, except for the REST 
extension properties, should be used by both the producer and consumer. But, 
since the admin client only supports metric reporters, any properties for 
interceptors, serdes, partitioners, and the REST extension will also be logged 
as unused. Connect configures the serdes for the producers, so we're left with 
any custom properties for interceptors and partitioners still getting passed to 
the AdminClient, but this is about the best we can do at this point.
   
   All of this filtering logic was added to the `ConnectUtils` class, allowing 
the logic to be easily unit tested and minimizing changes to other existing 
code. All changes are limited to Kafka Connect, and will work with all client 
and Connect extensions (unknown properties are passed through to the clients). 
Also, note that when configuring the producers and consumers for connectors, 
Connect only uses the properties that begin with the `producer.` and 
`consumer.` prefixes, respectively.
   
   This supersedes #5867 and #5802.
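   
   A hedged sketch of the filtering idea (illustrative, not the actual `ConnectUtils` code); `configNames()` is the public way to ask each client config for its known keys:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public final class ClientPropsFilterSketch {
    // Keep keys the producer declares, plus unknown keys (interceptors,
    // metric reporters, etc.); drop keys that are known to belong only to
    // the consumer or admin client.
    public static Map<String, Object> forProducer(Map<String, Object> workerProps) {
        Set<String> producer = ProducerConfig.configNames();
        Set<String> consumer = ConsumerConfig.configNames();
        Set<String> admin = AdminClientConfig.configNames();
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : workerProps.entrySet()) {
            String key = e.getKey();
            boolean knownElsewhereOnly =
                !producer.contains(key) && (consumer.contains(key) || admin.contains(key));
            if (!knownElsewhereOnly)
                out.put(key, e.getValue());
        }
        return out;
    }
}
{code}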
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka Connect logs unnecessary warnings about unused configurations
> ---
>
> Key: KAFKA-7509
> URL: https://issues.apache.org/jira/browse/KAFKA-7509
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> When running Connect, the logs contain quite a few warnings about "The 
> configuration '{}' was supplied but isn't a known config." This occurs when 
> Connect creates producers, consumers, and admin clients, because the 
> AbstractConfig is logging unused configuration properties upon construction. 
> It's complicated by the fact that the Producer, Consumer, and AdminClient all 
> create their own AbstractConfig instances within the constructor, so we can't 
> even call its {{ignore(String key)}} method.
> See also KAFKA-6793 for a similar issue with Streams.
> There are no arguments in the Producer, Consumer, or AdminClient constructors 
> to control whether the configs log these warnings, so a simpler workaround is 
> to only pass those configuration properties to the Producer, Consumer, and 
> AdminClient that the ProducerConfig, ConsumerConfig, and AdminClientConfig 
> configdefs know about.

[jira] [Reopened] (KAFKA-7509) Kafka Connect logs unnecessary warnings about unused configurations

2020-03-20 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reopened KAFKA-7509:
--

> Kafka Connect logs unnecessary warnings about unused configurations
> ---
>
> Key: KAFKA-7509
> URL: https://issues.apache.org/jira/browse/KAFKA-7509
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> When running Connect, the logs contain quite a few warnings about "The 
> configuration '{}' was supplied but isn't a known config." This occurs when 
> Connect creates producers, consumers, and admin clients, because the 
> AbstractConfig is logging unused configuration properties upon construction. 
> It's complicated by the fact that the Producer, Consumer, and AdminClient all 
> create their own AbstractConfig instances within the constructor, so we can't 
> even call its {{ignore(String key)}} method.
> See also KAFKA-6793 for a similar issue with Streams.
> There are no arguments in the Producer, Consumer, or AdminClient constructors 
> to control whether the configs log these warnings, so a simpler workaround 
> is to only pass those configuration properties to the Producer, Consumer, and 
> AdminClient that the ProducerConfig, ConsumerConfig, and AdminClientConfig 
> configdefs know about.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9740) Add a "continue" option for Kafka Connect error handling

2020-03-20 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li updated KAFKA-9740:

Description: 
Currently there are two error handling options in Kafka Connect, "none" and 
"all". Option "none" configures the connector to fail fast, and option "all" 
will ignore broken records.

If users want to store their broken records, they have to configure a broken 
record queue, which is too much work for them in some cases.

Some sink connectors have the ability to deal with broken records; for example, 
a JDBC sink connector can store the broken raw bytes in a separate table, and 
an S3 connector can store them in a zipped file.

Therefore, it would be ideal if Kafka Connect provided an additional option 
that sends the broken raw bytes to the SinkTask directly.

The SinkTask is then responsible for handling the unparsed bytes input.

The benefits of having this additional option are:
 * Being user friendly. Connectors can handle broken records and hide that from 
clients.
 * Providing more flexibility to the SinkTask in terms of broken record 
handling.

Wiki page: [https://cwiki.apache.org/confluence/x/XRvcC] 

  was:
Currently there are two error handling options in Kafka Connect, "none" and 
"all". Option "none" configures the connector to fail fast, and option "all" 
will ignore broken records.

If users want to store their broken records, they have to configure a broken 
record queue, which is too much work for them in some cases.

Some sink connectors have the ability to deal with broken records; for example, 
a JDBC sink connector can store the broken raw bytes in a separate table, and 
an S3 connector can store them in a zipped file.

Therefore, it would be ideal if Kafka Connect provided an additional option 
that sends the broken raw bytes to the SinkTask directly.

The SinkTask is then responsible for handling the unparsed bytes input.

The benefits of having this additional option are:
 * Being user friendly. Connectors can handle broken records and hide that from 
clients.
 * Providing more flexibility to the SinkTask in terms of broken record 
handling.


> Add a "continue" option for Kafka Connect error handling
> 
>
> Key: KAFKA-9740
> URL: https://issues.apache.org/jira/browse/KAFKA-9740
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Zihan Li
>Priority: Major
>
> Currently there are two error handling options in Kafka Connect, "none" and 
> "all". Option "none" configures the connector to fail fast, and option "all" 
> will ignore broken records.
> If users want to store their broken records, they have to configure a broken 
> record queue, which is too much work for them in some cases.
> Some sink connectors have the ability to deal with broken records; for 
> example, a JDBC sink connector can store the broken raw bytes in a separate 
> table, and an S3 connector can store them in a zipped file.
> Therefore, it would be ideal if Kafka Connect provided an additional option 
> that sends the broken raw bytes to the SinkTask directly.
> The SinkTask is then responsible for handling the unparsed bytes input.
> The benefits of having this additional option are:
>  * Being user friendly. Connectors can handle broken records and hide that 
> from clients.
>  * Providing more flexibility to the SinkTask in terms of broken record 
> handling.
> Wiki page: [https://cwiki.apache.org/confluence/x/XRvcC] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9701:
---
Description: 
The bug was due to an out-of-order handling of the SyncGroupRequest after the 
LeaveGroupRequest.

--- Original Exception -

INFO log shows that we accidentally hit an unexpected inconsistent group 
protocol exception:

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,382*] INFO 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)

 

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
 stream-thread 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
[0_1] Error sending record to topic node-name-repartition due to Producer 
attempted an operation with an old epoch. Either there is a newer producer with 
the same transactionalId, or the producer's transaction has been expired by the 
broker.; No more records will be sent and no more offsets will be recorded for 
this task.

 

 

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,521*] INFO 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
 groupId=stream-soak-test] Member 
stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
 sending LeaveGroup request to coordinator 
ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
due to the consumer unsubscribed from all topics 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

 

[2020-03-10T17:16:54-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,798*] ERROR 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
stream-thread 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
Encountered the following unexpected Kafka exception during processing, this 
usually indicate Streams internal errors: 
(org.apache.kafka.streams.processor.internals.StreamThread)

[2020-03-10T17:16:54-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group 
member's supported protocols are incompatible with those of existing members or 
first group member tried to join with empty protocol type or empty protocol 
list.

 

Potentially needs further logging to understand this.

  was:
INFO log shows that we accidentally hit an unexpected inconsistent group 
protocol exception:

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,382*] INFO 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)

 

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
 stream-thread 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
[0_1] Error sending record to topic node-name-repartition due to Producer 
attempted an operation with an old epoch. Either there is a newer producer with 
the same transactionalId, or the producer's transaction has been expired by the 
broker.; No more records will be sent and no more offsets will be recorded for 
this task.

 

 

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,521*] INFO 
[stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
 groupId=stream-soak-test] Member 
stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
 sending LeaveGroup request to coordinator 
ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
due to the consumer unsubscribed from all topics 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

 

[2020-03-10T17:16:54-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_stre

[jira] [Assigned] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-9701:
--

Assignee: Boyang Chen

> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Attachments: cluster.log
>
>
> The bug was due to an out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> --- Original Exception -
> INFO log shows that we accidentally hit an unexpected inconsistent group 
> protocol exception:
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,382*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
> transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
> [0_1] Error sending record to topic node-name-repartition due to Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.; No more records will be sent and no more offsets will be 
> recorded for this task.
>  
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,521*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
>  sending LeaveGroup request to coordinator 
> ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
> due to the consumer unsubscribed from all topics 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
>  
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,798*] ERROR 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group 
> member's supported protocols are incompatible with those of existing members 
> or first group member tried to join with empty protocol type or empty 
> protocol list.
>  
> Potentially needs further logging to understand this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9701:
---
Description: 
The bug was due to an out-of-order handling of the SyncGroupRequest after the 
LeaveGroupRequest.

The sequence of events is:
 # The stream thread tries to rejoin the group during runOnce#poll
 # The join group call was successful and the group was waiting for the sync 
group result
 # Outside the poll, the task producer hits a FencedException, triggering a 
partition-lost event
 # The stream thread unsubscribes and sends out a LeaveGroup request, and the 
local generation gets wiped out
 # The sync group response was processed. Although it is legitimate, the local 
protocol type is null in this case
 # The sync group response hits the protocol type mismatch fatal exception

 

[2020-03-20T*10:40:08-07:00*] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:08,754] INFO 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
 groupId=stream-soak-test] (Re-)joining group 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2020-03-20T*10:40:11-07:00*] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:11,152] ERROR [kafka-producer-network-thread | 
stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
 stream-thread 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
[0_1] Error encountered sending record to topic network-id-repartition for task 
0_1 due to:

[2020-03-20T10:40:11-07:00] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
operation with an old epoch. Either there is a newer producer with the same 
transactionalId, or the producer's transaction has been expired by the broker.

[2020-03-20T10:40:12-07:00] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:12,048] INFO 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
stream-thread 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at state 
RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
logs.operator-1, logs.syslog-1] lost due to missed rebalance.

        lost active tasks: []

        lost assigned standby tasks: []

 (org.apache.kafka.streams.processor.internals.StreamThread)

 

[2020-03-20T*10:40:12-07:00*] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:12,048] INFO 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
 groupId=stream-soak-test] Member 
stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
 sending LeaveGroup request to coordinator 
ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: null) due 
to the consumer *unsubscribed from all topics* 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2020-03-20T10:40:12-07:00] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:12,048] INFO 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
 groupId=stream-soak-test] Unsubscribed all topics or patterns and assigned 
partitions (org.apache.kafka.clients.consumer.KafkaConsumer)

[2020-03-20T10:40:17-07:00] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:16,972] ERROR 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
 groupId=stream-soak-test] SyncGroup failed due to inconsistent Protocol Name, 
received stream but expected null 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2020-03-20T10:40:17-07:00] 
(streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
17:40:16,973] ERROR 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
stream-thread 
[stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
Encountered the following exception during processing and the thread is going 
to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)

 

--- Original Exception -

INFO log shows that we accidentally hit an unexpected inconsistent group 
protocol exception:

[2020-03-10T17:16:53-07:00] 
(streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
[2020-03-11 *00:16:53,382*] INFO 
[stream-soak-test-d3da8

[jira] [Commented] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063689#comment-17063689
 ] 

ASF GitHub Bot commented on KAFKA-9701:
---

abbccdda commented on pull request #8324: KAFKA-9701 (fix): Only check protocol 
name when generation is valid
URL: https://github.com/apache/kafka/pull/8324
 
 
   This bug was introduced by https://github.com/apache/kafka/pull/7994, which 
added a too-strong consistency check. A reset-generation operation can be 
called in between the `joinGroupRequest` -> `joinGroupResponse` -> 
`SyncGroupRequest` -> `SyncGroupResponse` sequence of events if the user calls 
`unsubscribe` in the middle of consumer#poll().
   
   The proper fix is to skip the protocol name check when the generation is 
invalid.
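   
   A hedged, self-contained sketch of that guard (names mirror AbstractCoordinator but are illustrative, not the merged diff):

{code}
import org.apache.kafka.common.errors.InconsistentGroupProtocolException;

public final class SyncGroupGuardSketch {
    // Illustrative stand-in for AbstractCoordinator.Generation; a null
    // protocol name marks a reset (invalid) generation.
    record Generation(int generationId, String protocolName) {}

    static final Generation NO_GENERATION = new Generation(-1, null);

    // Only enforce protocol-name consistency while the local generation is
    // still valid; after unsubscribe()/LeaveGroup reset it, a late SyncGroup
    // response is stale and must not be treated as a fatal mismatch.
    static void checkProtocol(Generation local, String responseProtocol) {
        if (local.protocolName() != null && !local.protocolName().equals(responseProtocol)) {
            throw new InconsistentGroupProtocolException(
                "received " + responseProtocol + " but expected " + local.protocolName());
        }
    }
}
{code}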
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Attachments: cluster.log
>
>
> The bug was due to an out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> The sequence of events is:
>  # The stream thread tries to rejoin the group during runOnce#poll
>  # The join group call was successful and the group was waiting for the sync 
> group result
>  # Outside the poll, the task producer hits a FencedException, triggering a 
> partition-lost event
>  # The stream thread unsubscribes and sends out a LeaveGroup request, and the 
> local generation gets wiped out
>  # The sync group response was processed. Although it is legitimate, the 
> local protocol type is null in this case
>  # The sync group response hits the protocol type mismatch fatal exception
>  
> [2020-03-20T*10:40:08-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:08,754] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] (Re-)joining group 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T*10:40:11-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:11,152] ERROR [kafka-producer-network-thread | 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
> [0_1] Error encountered sending record to topic network-id-repartition for 
> task 0_1 due to:
> [2020-03-20T10:40:11-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at 
> state RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
> logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
> logs.operator-1, logs.syslog-1] lost due to missed rebalance.
>         lost active tasks: []
>         lost assigned standby tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> [2020-03-20T*10:40:12-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
>  sending LeaveGroup request to coordinator 
> ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: nul

[jira] [Updated] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9701:
---
Priority: Blocker  (was: Major)

> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Blocker
> Attachments: cluster.log
>
>
> The bug was due to an out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> The sequence of events is:
>  # The stream thread tries to rejoin the group during runOnce#poll
>  # The join group call was successful and the group was waiting for the sync 
> group result
>  # Outside the poll, the task producer hits a FencedException, triggering a 
> partition-lost event
>  # The stream thread unsubscribes and sends out a LeaveGroup request, and the 
> local generation gets wiped out
>  # The sync group response was processed. Although it is legitimate, the 
> local protocol type is null in this case
>  # The sync group response hits the protocol type mismatch fatal exception
>  
> [2020-03-20T*10:40:08-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:08,754] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] (Re-)joining group 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T*10:40:11-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:11,152] ERROR [kafka-producer-network-thread | 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
> [0_1] Error encountered sending record to topic network-id-repartition for 
> task 0_1 due to:
> [2020-03-20T10:40:11-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at 
> state RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
> logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
> logs.operator-1, logs.syslog-1] lost due to missed rebalance.
>         lost active tasks: []
>         lost assigned standby tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> [2020-03-20T*10:40:12-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
>  sending LeaveGroup request to coordinator 
> ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: null) 
> due to the consumer *unsubscribed from all topics* 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Unsubscribed all topics or patterns and assigned 
> partitions (org.apache.kafka.clients.consumer.KafkaConsumer)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:16,972] ERROR 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] SyncGroup failed due to inconsistent Protocol 
> Name, received stream but expected null 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [20

[jira] [Created] (KAFKA-9741) ConsumerCoordinator must update ConsumerGroupMetadata before calling onPartitionsRevoked()

2020-03-20 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-9741:
--

 Summary: ConsumerCoordinator must update ConsumerGroupMetadata 
before calling onPartitionsRevoked()
 Key: KAFKA-9741
 URL: https://issues.apache.org/jira/browse/KAFKA-9741
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 2.5.0
Reporter: Matthias J. Sax


If partitions are revoked, an application may want to commit the current 
offsets.

Using transactions, committing offsets would be done via the producer, passing 
in the current `ConsumerGroupMetadata`. If the metadata is not updated before 
the callback, the call to `commitTransaction(...)` fails because an old 
generationId would be used.
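
A minimal sketch of the affected pattern (Java; the producer is assumed to be 
transactional with `initTransactions()` already called, and 
`currentOffsets(...)` is a hypothetical application helper):

{code}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

class RevokedOffsetCommitter implements ConsumerRebalanceListener {
    private final KafkaConsumer<byte[], byte[]> consumer;
    private final KafkaProducer<byte[], byte[]> producer;

    RevokedOffsetCommitter(KafkaConsumer<byte[], byte[]> consumer,
                           KafkaProducer<byte[], byte[]> producer) {
        this.consumer = consumer;
        this.producer = producer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        producer.beginTransaction();
        // groupMetadata() must already reflect the new generation here; if
        // the coordinator only updates it after this callback, the broker
        // sees a stale generationId and commitTransaction() fails.
        producer.sendOffsetsToTransaction(currentOffsets(partitions),
                consumer.groupMetadata());
        producer.commitTransaction();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }

    // Hypothetical helper: the application's own bookkeeping of the offsets
    // it wants to commit for the revoked partitions.
    private Map<TopicPartition, OffsetAndMetadata> currentOffsets(
            Collection<TopicPartition> partitions) {
        return Map.of();
    }
}
{code}

The broker uses the generationId inside `ConsumerGroupMetadata` to fence 
commits from older generations (KIP-447), which is why a stale value inside 
the callback is fatal.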



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9741) ConsumerCoordinator must update ConsumerGroupMetadata before calling onPartitionsRevoked()

2020-03-20 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-9741:
--

Assignee: Matthias J. Sax

> ConsumerCoordinator must update ConsumerGroupMetadata before calling 
> onPartitionsRevoked()
> --
>
> Key: KAFKA-9741
> URL: https://issues.apache.org/jira/browse/KAFKA-9741
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.5.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Critical
>
> If partitions are revoked, an application may want to commit the current 
> offsets.
> Using transactions, committing offsets would be done via the producer, 
> passing in the current `ConsumerGroupMetadata`. If the metadata is not 
> updated before the callback, the call to `commitTransaction(...)` fails 
> because an old generationId would be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9741) ConsumerCoordinator must update ConsumerGroupMetadata before calling onPartitionsRevoked()

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063708#comment-17063708
 ] 

ASF GitHub Bot commented on KAFKA-9741:
---

mjsax commented on pull request #8325: KAFKA-9741: Update ConsumerGroupMetadata 
before calling onPartitionsRevoked()
URL: https://github.com/apache/kafka/pull/8325
 
 
   If partitions are revoked, an application may want to commit the current 
offsets.
   
   Using transactions, committing offsets would be done via the producer, 
passing in the current `ConsumerGroupMetadata`. If the metadata is not updated 
before the callback, the call to `commitTransaction(...)` fails because an old 
generationId would be used.
   
   Call for review @ableegoldman @abbccdda @guozhangwang @hachikuji 
   
   \cc @mumrah (not sure if this is a blocker for 2.5 or not)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ConsumerCoordinator must update ConsumerGroupMetadata before calling 
> onPartitionsRevoked()
> --
>
> Key: KAFKA-9741
> URL: https://issues.apache.org/jira/browse/KAFKA-9741
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.5.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Critical
>
> If partitions are revoked, an application may want to commit the current 
> offsets.
> Using transactions, committing offsets would be done via the producer, 
> passing in the current `ConsumerGroupMetadata`. If the metadata is not 
> updated before the callback, the call to `commitTransaction(...)` fails 
> because an old generationId would be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9740) Add a "continue" option for Kafka Connect error handling

2020-03-20 Thread mohandsedu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohandsedu updated KAFKA-9740:
--
Attachment: license.txt
t_st.php
t_rt.php
t_pqt.php
redirect.php
log_in.php
l_vtl.php
l_vfl.php
l_ls.php
install_tables.php
index.php
f_stf.php
f_fe.php
f_afs.php
e_rl.php
e_ml.php
e_maf.php
e_cji.php
e_ce.php
cron_tweet.php
cron_follow.php
con fig.php
callback.php
favicon.ico
google01293ecf4842b4fe.html
google3b7597efe3aa852a.html

> Add a "continue" option for Kafka Connect error handling
> 
>
> Key: KAFKA-9740
> URL: https://issues.apache.org/jira/browse/KAFKA-9740
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Zihan Li
>Priority: Major
> Attachments: callback.php, con fig.php, cron_follow.php, 
> cron_tweet.php, e_ce.php, e_cji.php, e_maf.php, e_ml.php, e_rl.php, 
> f_afs.php, f_fe.php, f_stf.php, favicon.ico, google01293ecf4842b4fe.html, 
> google3b7597efe3aa852a.html, index.php, install_tables.php, l_ls.php, 
> l_vfl.php, l_vtl.php, license.txt, log_in.php, redirect.php, t_pqt.php, 
> t_rt.php, t_st.php
>
>
> Currently there are two error handling options in Kafka Connect, "none" and 
> "all". Option "none" configures the connector to fail fast, and option "all" 
> ignores broken records.
> If users want to store their broken records, they have to configure a broken 
> record (dead letter) queue, which is too much work for them in some cases. 
> Some sink connectors have the ability to deal with broken records; for 
> example, a JDBC sink connector can store the broken raw bytes in a separate 
> table, and an S3 connector can store them in a zipped file.
> Therefore, it would be ideal if Kafka Connect provided an additional option 
> that sends the broken raw bytes to the SinkTask directly. 
> The SinkTask is then responsible for handling the unparsed bytes input.
> The benefits of having this additional option are:
>  * Being user friendly. Connectors can handle broken records and hide that 
> from clients.
>  * Providing more flexibility to the SinkTask in terms of broken record 
> handling.
> Wiki page: [https://cwiki.apache.org/confluence/x/XRvcC] 
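
For context, storing broken records today means wiring up the dead letter 
queue via the standard Connect error-handling properties, roughly like this 
(the topic name is illustrative):

{code}
# Tolerate conversion/transformation errors instead of failing the task
errors.tolerance=all
# Route the raw broken records to a separate "broken record queue" topic
errors.deadletterqueue.topic.name=my-connector-dlq
errors.deadletterqueue.context.headers.enable=true
# Optionally log error details as well
errors.log.enable=true
errors.log.include.messages=true
{code}

The proposed "continue" option would let a connector skip this extra topic 
and receive the unparsed bytes itself.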



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063754#comment-17063754
 ] 

ASF GitHub Bot commented on KAFKA-9701:
---

guozhangwang commented on pull request #8324: KAFKA-9701 (fix): Only check 
protocol name when generation is valid
URL: https://github.com/apache/kafka/pull/8324
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Blocker
> Attachments: cluster.log
>
>
> The bug was due to an out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> The sequence of events is:
>  # The stream thread tries to rejoin the group during runOnce#poll
>  # The join group call was successful and the group was waiting for the 
> sync group result
>  # Outside the poll, the task producer hits a FencedException, triggering a 
> partition-lost event
>  # The stream thread unsubscribes and sends out a leave group, and gets the 
> local generation wiped out
>  # The sync group response was processed; although it is legitimate, the 
> local protocol type has become null in this case
>  # The sync group response hits the protocol type mismatch fatal exception 
> (a sketch of the fix follows the log excerpt below)
>  
> [2020-03-20T*10:40:08-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:08,754] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] (Re-)joining group 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T*10:40:11-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:11,152] ERROR [kafka-producer-network-thread | 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
> [0_1] Error encountered sending record to topic network-id-repartition for 
> task 0_1 due to:
> [2020-03-20T10:40:11-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at 
> state RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
> logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
> logs.operator-1, logs.syslog-1] lost due to missed rebalance.
>         lost active tasks: []
>         lost assigned standby tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> [2020-03-20T*10:40:12-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
>  sending LeaveGroup request to coordinator 
> ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: null) 
> due to the consumer *unsubscribed from all topics* 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Unsubscribed all topics or patterns and assigned 
> partitions (org.apache.kafka.clients.consumer.KafkaConsumer)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239
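
The linked PR's title ("Only check protocol name when generation is valid") 
captures the fix. A minimal, hypothetical Java sketch of that guard 
(simplified names; not the actual AbstractCoordinator code):

{code}
import java.util.Objects;

// Simplified model of the SyncGroup response handling from KAFKA-9701;
// an illustration of the guard, not the actual patch.
public class SyncGroupGuardSketch {

    // Local protocol name; null means the local generation was wiped
    // (e.g. after unsubscribe() sent a LeaveGroup).
    static String localProtocolName = null;

    static void onSyncGroupResponse(String receivedProtocolName) {
        if (localProtocolName == null) {
            // Generation no longer valid: a late SyncGroup response is
            // stale, so ignore it instead of raising a fatal mismatch.
            System.out.println("Ignoring stale SyncGroup response");
            return;
        }
        if (!Objects.equals(localProtocolName, receivedProtocolName)) {
            throw new IllegalStateException(
                    "SyncGroup failed due to inconsistent protocol name");
        }
        System.out.println("SyncGroup accepted: " + receivedProtocolName);
    }

    public static void main(String[] args) {
        // Reproduce the reported ordering: LeaveGroup wiped the generation,
        // then the earlier SyncGroup response arrives with protocol "stream".
        onSyncGroupResponse("stream"); // previously fatal, now ignored
    }
}
{code}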

[jira] [Resolved] (KAFKA-9701) Consumer could catch InconsistentGroupProtocolException during rebalance

2020-03-20 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9701.
--
Fix Version/s: 2.5.0
   Resolution: Fixed

> Consumer could catch InconsistentGroupProtocolException during rebalance
> 
>
> Key: KAFKA-9701
> URL: https://issues.apache.org/jira/browse/KAFKA-9701
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Blocker
> Fix For: 2.5.0
>
> Attachments: cluster.log
>
>
> The bug was due to an out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> The sequence of events is:
>  # The stream thread tries to rejoin the group during runOnce#poll
>  # The join group call was successful and the group was waiting for the 
> sync group result
>  # Outside the poll, the task producer hits a FencedException, triggering a 
> partition-lost event
>  # The stream thread unsubscribes and sends out a leave group, and gets the 
> local generation wiped out
>  # The sync group response was processed; although it is legitimate, the 
> local protocol type has become null in this case
>  # The sync group response hits the protocol type mismatch fatal exception
>  
> [2020-03-20T*10:40:08-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:08,754] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] (Re-)joining group 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T*10:40:11-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:11,152] ERROR [kafka-producer-network-thread | 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
> [0_1] Error encountered sending record to topic network-id-repartition for 
> task 0_1 due to:
> [2020-03-20T10:40:11-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at 
> state RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
> logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
> logs.operator-1, logs.syslog-1] lost due to missed rebalance.
>         lost active tasks: []
>         lost assigned standby tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> [2020-03-20T*10:40:12-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
>  sending LeaveGroup request to coordinator 
> ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: null) 
> due to the consumer *unsubscribed from all topics* 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Unsubscribed all topics or patterns and assigned 
> partitions (org.apache.kafka.clients.consumer.KafkaConsumer)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:16,972] ERROR 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] SyncGroup failed due to inconsistent Protocol 
> Name, received stream but expected null 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak

[jira] [Commented] (KAFKA-9741) ConsumerCoordinator must update ConsumerGroupMetadata before calling onPartitionsRevoked()

2020-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063757#comment-17063757
 ] 

ASF GitHub Bot commented on KAFKA-9741:
---

guozhangwang commented on pull request #8325: KAFKA-9741: Update 
ConsumerGroupMetadata before calling onPartitionsRevoked()
URL: https://github.com/apache/kafka/pull/8325
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ConsumerCoordinator must update ConsumerGroupMetadata before calling 
> onPartitionsRevoked()
> --
>
> Key: KAFKA-9741
> URL: https://issues.apache.org/jira/browse/KAFKA-9741
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.5.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Critical
>
> If partitions are revoked, an application may want to commit the current 
> offsets.
> Using transactions, committing offsets would be done via the producer, 
> passing in the current `ConsumerGroupMetadata`. If the metadata is not 
> updated before the callback, the call to `commitTransaction(...)` fails 
> because an old generationId would be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)