[jira] [Commented] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'

2024-07-17 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866646#comment-17866646
 ] 

kaushik srinivas commented on KAFKA-17101:
--

It is MM2. 

We do not have any other services editing the configuration, and this is not 
easily reproducible either.

> Mirror maker internal topics cleanup policy changes to 'delete' from 
> 'compact' 
> ---
>
> Key: KAFKA-17101
> URL: https://issues.apache.org/jira/browse/KAFKA-17101
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.1, 3.5.1, 3.6.1
>Reporter: kaushik srinivas
>Priority: Major
>
> Scenario/Setup details
> Kafka cluster 1: 3 replicas
> Kafka cluster 2: 3 replicas
> MM1 moving data from cluster 1 to cluster 2
> MM2 moving data from cluster 2 to cluster 1
> Sometimes, after a reboot of Kafka cluster 1 and the MM1 instance, we observe 
> MM failing to come up with the below exception:
> {code:java}
> {"message":"DistributedHerder-connect-1-1 - 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker 
> clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work 
> thread, exiting: "}}
> org.apache.kafka.common.config.ConfigException: Topic 
> 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property 
> is required to have 'cleanup.policy=compact' to guarantee consistency and 
> durability of source connector offsets, but found the topic currently has 
> 'cleanup.policy=delete'. Continuing would likely result in eventually losing 
> source connector offsets and problems restarting this Connect cluster in the 
> future. Change the 'offset.storage.topic' property in the Connect worker 
> configurations to use a topic with 'cleanup.policy=compact'. {code}
> Once the topic is altered with a cleanup policy of compact, MM works just fine.
> This is happening on our setups sporadically and across a variety of scenarios. 
> We have not been successful in identifying the exact reproduction steps so far.
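
The fix mentioned above is an alteration of the offsets topic's cleanup policy. 
A minimal sketch of that alteration with the stock kafka-configs.sh tool (topic 
name from the error above, bootstrap address from the configuration posted in 
this thread; adjust both to your environment):

{code}
# Inspect the current cleanup policy of the MM2 offsets topic
bin/kafka-configs.sh --bootstrap-server product-kafka-headless:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal --describe

# Restore the compacted policy that the Connect/MM2 worker requires
bin/kafka-configs.sh --bootstrap-server product-kafka-headless:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal \
  --alter --add-config cleanup.policy=compact
{code}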



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3

2024-07-11 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-17079:
-
Priority: Major  (was: Blocker)

> scoverage plugin not found in maven repo, version 1.9.3
> ---
>
> Key: KAFKA-17079
> URL: https://issues.apache.org/jira/browse/KAFKA-17079
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> Team, 
> While creating coverage reports for the Kafka core module, the below issue is 
> seen. The branch used is 3.6.
> * Exception is:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':core:compileScoverageScala'.
>         at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at 
> org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: 
> org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException:
>  Could not resolve all files for configuration ':core:scoverage'.
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>         at 
> 
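
To narrow down which artifact the failing ':core:scoverage' configuration cannot 
resolve and from which repositories, a minimal debugging sketch using standard 
Gradle commands (nothing Kafka-specific is assumed beyond the failing task named 
in the trace):

{code}
# Dump what the scoverage configuration tries to resolve and from where
./gradlew :core:dependencies --configuration scoverage

# Re-run the failing task with verbose resolution output
./gradlew :core:compileScoverageScala --info --refresh-dependencies
{code}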

[jira] [Commented] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'

2024-07-11 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864935#comment-17864935
 ] 

kaushik srinivas commented on KAFKA-17101:
--

[~gharris1727] Below is the configuration. 

site1.ssl.keystore.filename=/etc/mirror-maker/secrets/ssl/site1/server.jks
site1.ssl.truststore.filename=/etc/mirror-maker/secrets/ssl/site1/trustchain.jks
site2.security.protocol=SSL
site1->site2.heartbeats.topic.replication.factor=3
site1.ssl.enabled.protocols=TLSv1.2,TLSv1.3
syslog=false
site2.ssl.keystore.filename=/etc/mirror-maker/secrets/ssl/site2/server.jks
site2.consumer.ssl.cipher.suites=
site1->site2.sync.topic.configs.interval.seconds=300
site1->site2.topics=.*_ALARM$,.*_INTERNAL_Intent_Changes$,product_INTERNAL_HAM_UPDATE$
site1->site2.replication.factor=3
site1->site2.emit.checkpoints.enabled=true
site2.ssl.key.password=productkeystore
tasks.max=1
site1.ssl.keystore.password=productkeystore
site1.ssl.truststore.location=/etc/kafka/shared/site1_truststore
site1.status.storage.replication.factor=3
site1.ssl.truststore.password=productkeystore
site1->site2.sync.topic.acls.enabled=false
site1->site2.refresh.topics.interval.seconds=300
site2.ssl.truststore.location=/etc/kafka/shared/site2_truststore
site2.ssl.truststore.filename=/etc/mirror-maker/secrets/ssl/site2/trustchain.jks
site1.ssl.protocol=TLSv1.2
site1->site2.replication.policy.class=RetainTopicNameReplicationPolicy
site1->site2.groups=.*
site1.security.protocol=SSL
site1->site2.groups.blacklist=console-consumer-.*, connect-.*, __.*
site1.config.storage.replication.factor=3
site1->site2.emit.hearbeats.enabled=true
site2.ssl.truststore.password=productkeystore
site1->site2.offset-syncs.topic.replication.factor=3
site1.ssl.keystore.location=/etc/kafka/shared/site1_keystore
site1.offset.storage.replication.factor=3
clusters=site1,site2
site1.bootstrap.servers=product-kafka-headless:9092
site2.ssl.protocol=TLSv1.2
site1->site2.refresh.groups.interval.seconds=300
site2.ssl.enabled=true
site2.ssl.enabled.protocols=TLSv1.2,TLSv1.3
site2.ssl.endpoint.identification.algorithm=
site1.ssl.cipher.suites=
site1->site2.checkpoints.topic.replication.factor=3
site1.producer.ssl.cipher.suites=
site2.ssl.keystore.location=/etc/kafka/shared/site2_keystore
site2.config.storage.replication.factor=3
site2.status.storage.replication.factor=3
site1.ssl.enabled=true
site1->site2.enabled=true
site1.ssl.endpoint.identification.algorithm=
site1.admin.ssl.cipher.suites=
site2.producer.ssl.cipher.suites=
site2.ssl.keystore.password=productkeystore
site2.ssl.cipher.suites=
site2.offset.storage.replication.factor=3
site1.ssl.key.password=productkeystore
site1.consumer.ssl.cipher.suites=
site2.bootstrap.servers=product-kafka-headless:9097
site1->site2.sync.topic.configs.enabled=true
site2.admin.ssl.cipher.suites=

> Mirror maker internal topics cleanup policy changes to 'delete' from 
> 'compact' 
> ---
>
> Key: KAFKA-17101
> URL: https://issues.apache.org/jira/browse/KAFKA-17101
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.1, 3.5.1, 3.6.1
>Reporter: kaushik srinivas
>Priority: Major
>
> Scenario/Setup details
> Kafka cluster 1: 3 replicas
> Kafka cluster 2: 3 replicas
> MM1 moving data from cluster 1 to cluster 2
> MM2 moving data from cluster 2 to cluster 1
> Sometimes, after a reboot of Kafka cluster 1 and the MM1 instance, we observe 
> MM failing to come up with the below exception:
> {code:java}
> {"message":"DistributedHerder-connect-1-1 - 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker 
> clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work 
> thread, exiting: "}}
> org.apache.kafka.common.config.ConfigException: Topic 
> 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property 
> is required to have 'cleanup.policy=compact' to guarantee consistency and 
> durability of source connector offsets, but found the topic currently has 
> 'cleanup.policy=delete'. Continuing would likely result in eventually losing 
> source connector offsets and problems restarting this Connect cluster in the 
> future. Change the 'offset.storage.topic' property in the Connect worker 
> configurations to use a topic with 'cleanup.policy=compact'. {code}
> Once the topic is altered with a cleanup policy of compact, MM works just fine.
> This is happening on our setups sporadically and across a variety of scenarios. 
> We have not been successful in identifying the exact reproduction steps so far.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'

2024-07-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-17101:


 Summary: Mirror maker internal topics cleanup policy changes to 
'delete' from 'compact' 
 Key: KAFKA-17101
 URL: https://issues.apache.org/jira/browse/KAFKA-17101
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.1, 3.5.1, 3.4.1
Reporter: kaushik srinivas


Scenario/Setup details

Kafka cluster 1: 3 replicas

Kafka cluster 2: 3 replicas

MM1 moving data from cluster 1 to cluster 2

MM2 moving data from cluster 2 to cluster 1

Sometimes, after a reboot of Kafka cluster 1 and the MM1 instance, we observe MM 
failing to come up with the below exception:
{code:java}
{"message":"DistributedHerder-connect-1-1 - 
org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker 
clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work 
thread, exiting: "}}
org.apache.kafka.common.config.ConfigException: Topic 
'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property 
is required to have 'cleanup.policy=compact' to guarantee consistency and 
durability of source connector offsets, but found the topic currently has 
'cleanup.policy=delete'. Continuing would likely result in eventually losing 
source connector offsets and problems restarting this Connect cluster in the 
future. Change the 'offset.storage.topic' property in the Connect worker 
configurations to use a topic with 'cleanup.policy=compact'. {code}
Once the topic is altered with a cleanup policy of compact, MM works just fine.

This is happening on our setups sporadically and across a variety of scenarios. 
We have not been successful in identifying the exact reproduction steps so far.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3

2024-07-04 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-17079:
-
Issue Type: Bug  (was: Improvement)

> scoverage plugin not found in maven repo, version 1.9.3
> ---
>
> Key: KAFKA-17079
> URL: https://issues.apache.org/jira/browse/KAFKA-17079
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> Team, 
> While creating coverage reports for the Kafka core module, the below issue is 
> seen. The branch used is 3.6.
> * Exception is:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':core:compileScoverageScala'.
>         at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at 
> org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: 
> org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException:
>  Could not resolve all files for configuration ':core:scoverage'.
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>         at 
> 

[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3

2024-07-04 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-17079:
-
Priority: Blocker  (was: Major)

> scoverage plugin not found in maven repo, version 1.9.3
> ---
>
> Key: KAFKA-17079
> URL: https://issues.apache.org/jira/browse/KAFKA-17079
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Blocker
>
> Team, 
> While creating coverage reports for the Kafka core module, the below issue is 
> seen. The branch used is 3.6.
> * Exception is:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
> ':core:compileScoverageScala'.
>         at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>         at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>         at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>         at 
> org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>         at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>         at 
> org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at 
> org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: 
> org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException:
>  Could not resolve all files for configuration ':core:scoverage'.
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>         at 
> org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>         at 
> org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>         at 
> org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>         at 
> 

[jira] [Created] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3

2024-07-04 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-17079:


 Summary: scoverage plugin not found in maven repo, version 1.9.3
 Key: KAFKA-17079
 URL: https://issues.apache.org/jira/browse/KAFKA-17079
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


Team, 

While creating coverage reports for the Kafka core module, the below issue is 
seen. The branch used is 3.6.

* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task 
':core:compileScoverageScala'.
        at 
org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
        at 
org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
        at 
org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
        at 
org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
        at 
org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
        at 
org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
        at 
org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
        at 
org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
        at 
org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
        at 
org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
        at 
org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
        at 
org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
        at 
org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
        at 
org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
        at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at 
org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: 
org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException:
 Could not resolve all files for configuration ':core:scoverage'.
        at 
org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
        at 
org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
        at 
org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
        at 
org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
        at 
org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
        at 
org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
        at 
org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
        at 
org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
        at 
org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
        at 
org.gradle.api.internal.file.UnionFileCollection.visitChildren(UnionFileCollection.java:81)
        at 
org.gradle.api.internal.file.CompositeFileCollection.visitContents(CompositeFileCollection.java:133)
        at 
org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
        at 

[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Bug  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried the same in a Kafka cluster that was migrated from ZooKeeper to 
> KRaft mode, and we observe that the rollback is possible by deleting the 
> "/controller" node in ZooKeeper before the rollback from KRaft mode to 
> ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK after the 
> migration is finalized is still possible via an offline method. Are there any 
> known steps to be performed as part of this offline rollback?
> From our experience, we currently know of the step "delete the /controller 
> node in ZooKeeper to force the ZooKeeper-based brokers to elect a new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?
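
For the offline rollback step described above, a minimal sketch of forcing a new 
ZooKeeper-based controller election with the stock zookeeper-shell.sh tool 
(ZooKeeper host and port are illustrative):

{code}
# After rolling the brokers back to ZooKeeper mode, delete the stale
# controller znode so that a ZooKeeper-based broker is elected controller.
bin/zookeeper-shell.sh zookeeper-host:2181 delete /controller
{code}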



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Wish  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Wish
>Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried the same in a Kafka cluster that was migrated from ZooKeeper to 
> KRaft mode, and we observe that the rollback is possible by deleting the 
> "/controller" node in ZooKeeper before the rollback from KRaft mode to 
> ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK after the 
> migration is finalized is still possible via an offline method. Are there any 
> known steps to be performed as part of this offline rollback?
> From our experience, we currently know of the step "delete the /controller 
> node in ZooKeeper to force the ZooKeeper-based brokers to elect a new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Improvement  (was: Wish)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Improvement
>Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried the same in a Kafka cluster that was migrated from ZooKeeper to 
> KRaft mode, and we observe that the rollback is possible by deleting the 
> "/controller" node in ZooKeeper before the rollback from KRaft mode to 
> ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK after the 
> migration is finalized is still possible via an offline method. Are there any 
> known steps to be performed as part of this offline rollback?
> From our experience, we currently know of the step "delete the /controller 
> node in ZooKeeper to force the ZooKeeper-based brokers to elect a new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-16370:


 Summary: offline rollback procedure from kraft mode to zookeeper 
mode.
 Key: KAFKA-16370
 URL: https://issues.apache.org/jira/browse/KAFKA-16370
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


From the KIP, 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]

 
h2. Finalizing the Migration

Once the cluster has been fully upgraded to KRaft mode, the controller will 
still be running in migration mode and making dual writes to KRaft and ZK. 
Since the data in ZK is still consistent with that of the KRaft metadata log, 
it is still possible to revert back to ZK.

*_The time that the cluster is running all KRaft brokers/controllers, but still 
running in migration mode, is effectively unbounded._*

Once the operator has decided to commit to KRaft mode, the final step is to 
restart the controller quorum and take it out of migration mode by setting 
_zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active 
controller will only finalize the migration once it detects that all members of 
the quorum have signaled that they are finalizing the migration (again, using 
the tagged field in ApiVersionsResponse). Once the controller leaves migration 
mode, it will write a ZkMigrationStateRecord to the log and no longer perform 
writes to ZK. It will also disable its special handling of ZK RPCs.

*At this point, the cluster is fully migrated and is running in KRaft mode. A 
rollback to ZK is still possible after finalizing the migration, but it must be 
done offline and it will cause metadata loss (which can also cause partition 
data loss).*
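
A minimal sketch of the finalization step described above, as a 
controller.properties fragment for each KRaft controller (node id and quorum 
voters are illustrative; the migration flag is the only point here):

{code}
process.roles=controller
node.id=3000
controller.quorum.voters=3000@controller-0:9093
# Take the controller out of migration mode (or remove the property entirely),
# then restart the quorum as described in the KIP text above.
zookeeper.metadata.migration.enable=false
{code}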

 

We tried the same in a Kafka cluster that was migrated from ZooKeeper to KRaft 
mode, and we observe that the rollback is possible by deleting the "/controller" 
node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.

The above snippet indicates that a rollback from KRaft to ZK after the migration 
is finalized is still possible via an offline method. Are there any known steps 
to be performed as part of this offline rollback?

From our experience, we currently know of the step "delete the /controller node 
in ZooKeeper to force the ZooKeeper-based brokers to elect a new controller 
after the rollback is done". Are there any additional steps/actions apart from 
this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.

2024-03-11 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-16360:


 Summary: Release plan of 3.x kafka releases.
 Key: KAFKA-16360
 URL: https://issues.apache.org/jira/browse/KAFKA-16360
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


KIP 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline]
 mentions:
h2. Kafka 3.7
 * January 2024
 * Final release with ZK mode

But we see in Jira that some tickets are marked for the 3.8 release. Will Apache 
continue to make 3.x releases supporting both ZooKeeper and KRaft, independent 
of the pure-KRaft 4.x releases?

If yes, how many more releases can be expected on the 3.x release line?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-10-17 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Priority: Major  (was: Critical)

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Major
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?
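
The statements under discussion concern the inter.broker.protocol.version 
setting used during rolling upgrades. A minimal sketch of the pinning step from 
the upgrade guide, as a server.properties fragment (version numbers are 
illustrative):

{code}
# Keep the inter-broker protocol pinned to the previous release's version
# until every broker runs the new code; only then raise it and restart the
# brokers one by one.
inter.broker.protocol.version=3.4
log.message.format.version=3.4
{code}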



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-10-17 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Issue Type: Improvement  (was: Bug)

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-5892) Connector property override does not work unless setting ALL converter properties

2023-09-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765979#comment-17765979
 ] 

kaushik srinivas commented on KAFKA-5892:
-

[~rhauch] [~jsahu], is this still open to be modified?

 

> Connector property override does not work unless setting ALL converter 
> properties
> -
>
> Key: KAFKA-5892
> URL: https://issues.apache.org/jira/browse/KAFKA-5892
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Yeva Byzek
>Assignee: Jitendra Sahu
>Priority: Minor
>  Labels: newbie
>
> A single connector setting override {{value.converter.schemas.enable=false}} 
> only takes effect if ALL of the converter properties are overridden in the 
> connector.
> At minimum, we should give user warning or error that this is will be ignored.
> We should also consider changing the behavior to allow the single property 
> override even if all the converter properties are not specified, but this 
> requires discussion to evaluate the impact of this change.
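
A minimal sketch of the workaround implied above, as a standalone connector 
properties file that overrides the converter class together with its 
schemas.enable flag (connector name, class, topic, and file path are 
illustrative):

{code}
name=example-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
topics=example-topic
file=/tmp/example-sink.out
# Override the whole value converter, not just the schemas.enable flag,
# so that the per-connector setting takes effect.
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
{code}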



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-08-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755207#comment-17755207
 ] 

kaushik srinivas commented on KAFKA-15223:
--

Any updates on this ?

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-08-03 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Issue Type: Bug  (was: Improvement)

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-08-03 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750590#comment-17750590
 ] 

kaushik srinivas commented on KAFKA-15223:
--

[~ijuma] 

Can you please help us answer this query?

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-07-25 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747266#comment-17747266
 ] 

kaushik srinivas commented on KAFKA-15223:
--

Any updates on this ?

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the below statements from the 
> above-linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across many prior Kafka releases and are 
> confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-07-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745419#comment-17745419
 ] 

kaushik srinivas commented on KAFKA-15223:
--

[~ijuma], can you please help us with this?

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to below statements from the above sectioned 
> link of apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across a lot of prior releases of kafka and 
> is confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-07-20 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Affects Version/s: 3.4.1
   3.5.0
   3.3.2
   3.3.1
   3.4.0

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to below statements from the above sectioned 
> link of apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across a lot of prior releases of kafka and 
> is confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-07-20 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Summary: Need more clarity in documentation for upgrade/downgrade 
procedures and limitations across releases.  (was: Need clarity in 
documentation for upgrade/downgrade across releases.)

> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to below statements from the above sectioned 
> link of apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> "Restart the brokers one by one for the new protocol version to take effect. 
> {*}Once the brokers begin using the latest protocol version, it will no 
> longer be possible to downgrade the cluster to an older version.{*}"
>  
> These two statements are repeated across a lot of prior release of kafka and 
> is confusing.
> Below are the questions:
>  # Is downgrade not at all possible to *"any"* older version of kafka once 
> the inter.broker.protocol.version is updated to latest version *OR* 
> downgrades are not possible only to versions *"<2.1"* ?
>  # Suppose one takes an approach similar to upgrade even for the downgrade 
> path. i.e. downgrade the inter.broker.protocol.version first to the previous 
> version, next downgrade the software/code of kafka to previous release 
> revision. Does downgrade work with this approach ?
> Can these two questions be documented if the results are already known ?
> Maybe a downgrade guide can be created too similar to the existing upgrade 
> guide ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.

2023-07-20 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-15223:
-
Description: 
Referring to the upgrade documentation for apache kafka.

[https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]

There is confusion with respect to the statements below, taken from the above-linked 
section of the Apache docs.

"If you are upgrading from a version prior to 2.1.x, please see the note below 
about the change to the schema used to store consumer offsets. *Once you have 
changed the inter.broker.protocol.version to the latest version, it will not be 
possible to downgrade to a version prior to 2.1."*

The above statement says that a downgrade would not be possible to versions prior 
to "2.1" once the inter.broker.protocol.version has been upgraded to the latest 
version.

But, there is another statement made in the documentation in *point 4* as below

"Restart the brokers one by one for the new protocol version to take effect. 
{*}Once the brokers begin using the latest protocol version, it will no longer 
be possible to downgrade the cluster to an older version.{*}"

 

These two statements are repeated across many prior releases of Kafka and are 
confusing.

Below are the questions:
 # Is a downgrade not possible at all to *"any"* older version of Kafka once the 
inter.broker.protocol.version is updated to the latest version, *OR* are 
downgrades not possible only to versions *"<2.1"* ?
 # Suppose one mirrors the upgrade procedure for the downgrade path, i.e. first 
downgrade the inter.broker.protocol.version to the previous version, then 
downgrade the Kafka software/code to the previous release revision. Does a 
downgrade work with this approach ?

Can these two questions be documented if the results are already known?

Maybe a downgrade guide can also be created, similar to the existing upgrade 
guide?

  was:
Referring to the upgrade documentation for apache kafka.

[https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]

There is confusion with respect to below statements from the above sectioned 
link of apache docs.

"If you are upgrading from a version prior to 2.1.x, please see the note below 
about the change to the schema used to store consumer offsets. *Once you have 
changed the inter.broker.protocol.version to the latest version, it will not be 
possible to downgrade to a version prior to 2.1."*

The above statement mentions that the downgrade would not be possible to 
version prior to "2.1" in case of "upgrading the inter.broker.protocol.version 
to the latest version".

But, there is another statement made in the documentation in *point 4* as below

"Restart the brokers one by one for the new protocol version to take effect. 
{*}Once the brokers begin using the latest protocol version, it will no longer 
be possible to downgrade the cluster to an older version.{*}"

 

These two statements are repeated across a lot of prior release of kafka and is 
confusing.

Below are the questions:
 # Is downgrade not at all possible to *"any"* older version of kafka once the 
inter.broker.protocol.version is updated to latest version *OR* downgrades are 
not possible only to versions *"<2.1"* ?
 # Suppose one takes an approach similar to upgrade even for the downgrade 
path. i.e. downgrade the inter.broker.protocol.version first to the previous 
version, next downgrade the software/code of kafka to previous release 
revision. Does downgrade work with this approach ?

Can these two questions be documented if the results are already known ?

Maybe a downgrade guide can be created too similar to the existing upgrade 
guide ?


> Need more clarity in documentation for upgrade/downgrade procedures and 
> limitations across releases.
> 
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
>  Issue Type: Improvement
>Reporter: kaushik srinivas
>Priority: Critical
>
> Referring to the upgrade documentation for apache kafka.
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to below statements from the above sectioned 
> link of apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note 
> below about the change to the schema used to store consumer offsets. *Once 
> you have changed the inter.broker.protocol.version to the latest version, it 
> will not be possible to downgrade to a version prior to 2.1."*
> The above statement mentions that the downgrade would not be possible to 
> version prior to "2.1" in case of "upgrading the 
> inter.broker.protocol.version to the latest version".
> But, there is another statement made in the documentation in *point 4* as 
> below
> 

[jira] [Created] (KAFKA-15223) Need clarity in documentation for upgrade/downgrade across releases.

2023-07-20 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-15223:


 Summary: Need clarity in documentation for upgrade/downgrade 
across releases.
 Key: KAFKA-15223
 URL: https://issues.apache.org/jira/browse/KAFKA-15223
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


Referring to the upgrade documentation for apache kafka.

[https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]

There is confusion with respect to the statements below, taken from the above-linked 
section of the Apache docs.

"If you are upgrading from a version prior to 2.1.x, please see the note below 
about the change to the schema used to store consumer offsets. *Once you have 
changed the inter.broker.protocol.version to the latest version, it will not be 
possible to downgrade to a version prior to 2.1."*

The above statement says that a downgrade would not be possible to versions prior 
to "2.1" once the inter.broker.protocol.version has been upgraded to the latest 
version.

But, there is another statement made in the documentation in *point 4* as below

"Restart the brokers one by one for the new protocol version to take effect. 
{*}Once the brokers begin using the latest protocol version, it will no longer 
be possible to downgrade the cluster to an older version.{*}"

 

These two statements are repeated across many prior releases of Kafka and are 
confusing.

Below are the questions:
 # Is a downgrade not possible at all to *"any"* older version of Kafka once the 
inter.broker.protocol.version is updated to the latest version, *OR* are 
downgrades not possible only to versions *"<2.1"* ?
 # Suppose one mirrors the upgrade procedure for the downgrade path, i.e. first 
downgrade the inter.broker.protocol.version to the previous version, then 
downgrade the Kafka software/code to the previous release revision. Does a 
downgrade work with this approach ?

Can these two questions be documented if the results are already known?

Maybe a downgrade guide can also be created, similar to the existing upgrade 
guide?
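
To make question 2 concrete, below is a minimal sketch of the downgrade sequence we have in mind. This is purely illustrative, the target version is an assumption, and we do not know whether this path is actually supported; that is exactly what we are asking.

{code:java}
# Hypothetical two-step downgrade sketch (illustration of question 2 only).
# Step 1: on every broker, pin the protocol back to the previous release in
# server.properties, then perform a rolling restart.
#   inter.broker.protocol.version=3.3        # example value, assumption
# Step 2: only after all brokers are running with the older protocol version,
# roll the brokers again onto the previous release's binaries.
{code}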



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-9542) ZSTD Compression Not Working

2022-02-01 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485114#comment-17485114
 ] 

kaushik srinivas commented on KAFKA-9542:
-

[~lucasbradstreet] 

We see this issue even with the latest Kafka, and it also fails when /tmp sits on a 
readOnlyRootFileSystem. Do you see any possible issues in relocating this tmp dir 
to a path other than the system /tmp?
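
For context, below is a minimal sketch of what we mean by relocating the tmp dir, assuming the zstd-jni native library is extracted under java.io.tmpdir; the path /opt/kafka/tmp is only an example.

{code:java}
# Sketch only: point the JVM temp dir at a writable location instead of the
# system /tmp (path is illustrative; KAFKA_OPTS is picked up by kafka-run-class.sh).
mkdir -p /opt/kafka/tmp
export KAFKA_OPTS="-Djava.io.tmpdir=/opt/kafka/tmp"
bin/kafka-server-start.sh config/server.properties
{code}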

> ZSTD Compression Not Working
> 
>
> Key: KAFKA-9542
> URL: https://issues.apache.org/jira/browse/KAFKA-9542
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 2.3.0
> Environment: Linux, CentOS
>Reporter: Prashant
>Priority: Critical
>
> I enabled zstd compression at producer by adding  "compression.type=zstd" in 
> producer config. When try to run it, producer fails with 
> "org.apache.kafka.common.errors.UnknownServerException: The server 
> experienced an unexpected error when processing the request"
> In Broker Logs, I could find following exception:
>  
> [2020-02-12 11:48:04,623] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition load_logPlPts-6 (kafka.server.ReplicaManager)
> org.apache.kafka.common.KafkaException: java.lang.NoClassDefFoundError: Could 
> not initialize class 
> org.apache.kafka.common.record.CompressionType$ZstdConstructors
>        at 
> org.apache.kafka.common.record.CompressionType$5.wrapForInput(CompressionType.java:133)
>        at 
> org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:257)
>        at 
> org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:324)
>        at 
> scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:54)
>        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>        at 
> kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(LogValidator.scala:269)
>        at 
> kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(LogValidator.scala:261)
>        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>        at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261)
>        at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:72)
>        at kafka.log.Log$$anonfun$append$2.liftedTree1$1(Log.scala:869)
>        at kafka.log.Log$$anonfun$append$2.apply(Log.scala:868)
>        at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850)
>        at kafka.log.Log.maybeHandleIOException(Log.scala:2065)
>        at kafka.log.Log.append(Log.scala:850)
>        at kafka.log.Log.appendAsLeader(Log.scala:819)
>        at kafka.cluster.Partition$$anonfun$14.apply(Partition.scala:771)
>        at kafka.cluster.Partition$$anonfun$14.apply(Partition.scala:759)
>        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) 
>  
> This is fresh broker installed on "CentOS Linux" v7. This doesn't seem to be 
> a classpath issue as same package is working on MacOS. 
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500

2022-01-06 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470385#comment-17470385
 ] 

kaushik srinivas edited comment on KAFKA-13578 at 1/7/22, 7:25 AM:
---

[~ijuma] [~cmccabe] 

Can you provide some insights on this, as we are considering this upgrade in 
Kafka?


was (Author: kaushik srinivas):
[~ijuma] 

Can you provide some insights on this as we are considering this upgrade in 
kafka.

> Need clarity on the bridge release version of kafka without zookeeper in the 
> eco system - KIP500
> 
>
> Key: KAFKA-13578
> URL: https://issues.apache.org/jira/browse/KAFKA-13578
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Critical
>
> We see from the KIP-500
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum]
>  
> Below statement needs some clarity,
> *We will be able to upgrade from any version of Kafka to this bridge release, 
> and from the bridge release to a post-ZK release.  When upgrading from an 
> earlier release to a post-ZK release, the upgrade must be done in two steps: 
> first, you must upgrade to the bridge release, and then you must upgrade to 
> the post-ZK release.*
>  
> What apache kafka version is referred to as bridge release in this context ? 
> can apache kafka 3.0.0 be considered bridge release even though it is not 
> production ready to run without zookeeper ?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500

2022-01-06 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470385#comment-17470385
 ] 

kaushik srinivas commented on KAFKA-13578:
--

[~ijuma] 

Can you provide some insights on this, as we are considering this upgrade in 
Kafka?

> Need clarity on the bridge release version of kafka without zookeeper in the 
> eco system - KIP500
> 
>
> Key: KAFKA-13578
> URL: https://issues.apache.org/jira/browse/KAFKA-13578
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Critical
>
> We see from the KIP-500
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum]
>  
> Below statement needs some clarity,
> *We will be able to upgrade from any version of Kafka to this bridge release, 
> and from the bridge release to a post-ZK release.  When upgrading from an 
> earlier release to a post-ZK release, the upgrade must be done in two steps: 
> first, you must upgrade to the bridge release, and then you must upgrade to 
> the post-ZK release.*
>  
> What apache kafka version is referred to as bridge release in this context ? 
> can apache kafka 3.0.0 be considered bridge release even though it is not 
> production ready to run without zookeeper ?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500

2022-01-06 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13578:


 Summary: Need clarity on the bridge release version of kafka 
without zookeeper in the eco system - KIP500
 Key: KAFKA-13578
 URL: https://issues.apache.org/jira/browse/KAFKA-13578
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We see from the KIP-500

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum]

 

The statement below needs some clarity:

*We will be able to upgrade from any version of Kafka to this bridge release, 
and from the bridge release to a post-ZK release.  When upgrading from an 
earlier release to a post-ZK release, the upgrade must be done in two steps: 
first, you must upgrade to the bridge release, and then you must upgrade to the 
post-ZK release.*

 

Which Apache Kafka version is referred to as the bridge release in this context? 
Can Apache Kafka 3.0.0 be considered the bridge release even though it is not 
production-ready to run without ZooKeeper?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-09 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas resolved KAFKA-13177.
--
Resolution: Not A Bug

> partition failures and fewer shrink but a lot of isr expansions with 
> increased num.replica.fetchers in kafka brokers
> 
>
> Key: KAFKA-13177
> URL: https://issues.apache.org/jira/browse/KAFKA-13177
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Assignee: kaushik srinivas
>Priority: Major
>
> Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s)
> topics : 15, partitions each : 15 replication factor 3, min.insync.replicas  
> : 2
> producers running with acks : all
> Initially the num.replica.fetchers was set to 1 (default) and we observed 
> very frequent ISR shrinks and expansions. So the setups were tuned with a 
> higher value of 4. 
> Once after this change was done, we see below behavior and warning msgs in 
> broker logs
>  # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
> partitions, but around 700 ISR expansions corresponding to almost all 
> partitions in the cluster(approx 50 to 60 partitions).
>  # we see frequent warn msg of partitions being marked as failure in the same 
> time span. Below is the trace --> {"type":"log", "host":"ww", 
> "level":"WARN", "neid":"kafka-ww", "system":"kafka", 
> "time":"2021-08-03T20:09:15.340", "timezone":"UTC", 
> "log":{"message":"ReplicaFetcherThread-2-1003 - 
> kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, 
> leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}*
>  
> We see the above behavior continuously after increasing the 
> num.replica.fetchers to 4 from 1. We did increase this to improve the 
> replication performance and hence reduce the ISR shrinks.
> But we see this strange behavior after the change. What would the above trace 
> indicate and is marking partitions as failed just a WARN msgs and handled by 
> kafka or is it something to worry about ? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-09 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395873#comment-17395873
 ] 

kaushik srinivas commented on KAFKA-13177:
--

This is expected behavior. It happens when the leader epoch changes, either due to 
brokers or controllers bouncing; replica fetchers are removed and re-added whenever 
the epoch changes.

These logs vanished once the brokers were up and running, and they are expected 
whenever brokers restart or a reassignment happens.

> partition failures and fewer shrink but a lot of isr expansions with 
> increased num.replica.fetchers in kafka brokers
> 
>
> Key: KAFKA-13177
> URL: https://issues.apache.org/jira/browse/KAFKA-13177
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Assignee: kaushik srinivas
>Priority: Major
>
> Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s)
> topics : 15, partitions each : 15 replication factor 3, min.insync.replicas  
> : 2
> producers running with acks : all
> Initially the num.replica.fetchers was set to 1 (default) and we observed 
> very frequent ISR shrinks and expansions. So the setups were tuned with a 
> higher value of 4. 
> Once after this change was done, we see below behavior and warning msgs in 
> broker logs
>  # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
> partitions, but around 700 ISR expansions corresponding to almost all 
> partitions in the cluster(approx 50 to 60 partitions).
>  # we see frequent warn msg of partitions being marked as failure in the same 
> time span. Below is the trace --> {"type":"log", "host":"ww", 
> "level":"WARN", "neid":"kafka-ww", "system":"kafka", 
> "time":"2021-08-03T20:09:15.340", "timezone":"UTC", 
> "log":{"message":"ReplicaFetcherThread-2-1003 - 
> kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, 
> leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}*
>  
> We see the above behavior continuously after increasing the 
> num.replica.fetchers to 4 from 1. We did increase this to improve the 
> replication performance and hence reduce the ISR shrinks.
> But we see this strange behavior after the change. What would the above trace 
> indicate and is marking partitions as failed just a WARN msgs and handled by 
> kafka or is it something to worry about ? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-09 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas reassigned KAFKA-13177:


Assignee: kaushik srinivas

> partition failures and fewer shrink but a lot of isr expansions with 
> increased num.replica.fetchers in kafka brokers
> 
>
> Key: KAFKA-13177
> URL: https://issues.apache.org/jira/browse/KAFKA-13177
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Assignee: kaushik srinivas
>Priority: Major
>
> Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s)
> topics : 15, partitions each : 15 replication factor 3, min.insync.replicas  
> : 2
> producers running with acks : all
> Initially the num.replica.fetchers was set to 1 (default) and we observed 
> very frequent ISR shrinks and expansions. So the setups were tuned with a 
> higher value of 4. 
> Once after this change was done, we see below behavior and warning msgs in 
> broker logs
>  # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
> partitions, but around 700 ISR expansions corresponding to almost all 
> partitions in the cluster(approx 50 to 60 partitions).
>  # we see frequent warn msg of partitions being marked as failure in the same 
> time span. Below is the trace --> {"type":"log", "host":"ww", 
> "level":"WARN", "neid":"kafka-ww", "system":"kafka", 
> "time":"2021-08-03T20:09:15.340", "timezone":"UTC", 
> "log":{"message":"ReplicaFetcherThread-2-1003 - 
> kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, 
> leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}*
>  
> We see the above behavior continuously after increasing the 
> num.replica.fetchers to 4 from 1. We did increase this to improve the 
> replication performance and hence reduce the ISR shrinks.
> But we see this strange behavior after the change. What would the above trace 
> indicate and is marking partitions as failed just a WARN msgs and handled by 
> kafka or is it something to worry about ? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13178) frequent network_exception trace at kafka producer.

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13178:


 Summary: frequent network_exception trace at kafka producer.
 Key: KAFKA-13178
 URL: https://issues.apache.org/jira/browse/KAFKA-13178
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Running a 3-node Kafka cluster (4 CPU cores and 4Gi of memory on k8s).

topics: 15, partitions each: 15, replication factor: 3, min.insync.replicas: 2

producer is running with acks : "all"

We see frequent failures in the Kafka producer, with the trace below:

{"host":"ww","level":"WARN","log":{"classname":"org.apache.kafka.clients.producer.internals.Sender:595","message":"[Producer
 clientId=producer-1] Got error produce response with correlation id 2646 on 
topic-partition *-0, retrying (2 attempts left). Error: 
NETWORK_EXCEPTION","stacktrace":"","threadname":"kafka-producer-network-thread 
| 
producer-1"},"time":"2021-08-*04T02:22:20.529Z","timezone":"UTC","type":"log","system":"w","systemid":"3"}

 

What could be the possible reasons for the above trace?
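
For reference, below is a sketch of the producer settings we are reviewing in connection with these retried sends; the values are examples only, not our production configuration.

{code:java}
# Illustrative producer settings related to retried sends (values are examples).
acks=all
retries=5
request.timeout.ms=30000      # time a single request may wait for a broker response
delivery.timeout.ms=120000    # overall upper bound for a send, including retries
{code}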

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13177:


 Summary: partition failures and fewer shrink but a lot of isr 
expansions with increased num.replica.fetchers in kafka brokers
 Key: KAFKA-13177
 URL: https://issues.apache.org/jira/browse/KAFKA-13177
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Installed a 3-node Kafka broker cluster (4 CPU cores and 4Gi memory on k8s).

topics: 15, partitions each: 15, replication factor: 3, min.insync.replicas: 2

producers running with acks : all

Initially num.replica.fetchers was set to 1 (the default) and we observed very 
frequent ISR shrinks and expansions, so the setups were tuned to a higher value 
of 4.

After this change was made, we see the following behavior and warning messages in 
the broker logs:
 # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
partitions, but around 700 ISR expansions corresponding to almost all 
partitions in the cluster(approx 50 to 60 partitions).
 # We see frequent WARN messages of partitions being marked as failed in the same 
time span. Below is the trace --> {"type":"log", "host":"ww", 
"level":"WARN", "neid":"kafka-ww", "system":"kafka", 
"time":"2021-08-03T20:09:15.340", "timezone":"UTC", 
"log":{"message":"ReplicaFetcherThread-2-1003 - 
kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, 
leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}*

 

We see the above behavior continuously after increasing num.replica.fetchers 
from 1 to 4. We increased it to improve replication performance and hence reduce 
the ISR shrinks.

But we see this strange behavior after the change. What would the above trace 
indicate, and is marking partitions as failed just a WARN message handled by 
Kafka, or is it something to worry about?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13176) frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions.

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13176:


 Summary: frequent ISR shrinks and expansion with default 
num.replica.fetchers (1) under very low throughput conditions.
 Key: KAFKA-13176
 URL: https://issues.apache.org/jira/browse/KAFKA-13176
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Running a 3-node Kafka cluster (Kafka 2.3.x) with 4 CPU cores and 4Gi of memory 
in a k8s environment.

num.replica.fetchers is configured to 1 (default value).

There are around 15 topics in the cluster and all of them receive a very low 
rate of records/sec (less than 100 per second in most cases).

All the topics have more than 10 partitions and a replication factor of 3. 
min.insync.replicas is set to 2, and producers run with acks set to 'all'.

We constantly observe ISR shrinks and expansions for almost every topic 
partition; the shrinks and expansions are usually separated by around 6 to 8 
seconds.

During these shrinks and expansions we see a lot of request timeouts on the 
Kafka producer side for these topics.

Are there any known configuration items we can use to overcome this?

We are confused by continuous ISR shrinks and expansions on such 
low-throughput topics.
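
For reference, a sketch of the broker-side setting we are considering experimenting with; this is only an assumption about a possible mitigation, not a confirmed fix, and the values are examples.

{code:java}
# server.properties sketch (example values only).
# A follower is dropped from the ISR if it has not caught up within this window;
# raising it may reduce shrink/expand churn at the cost of slower failure detection.
replica.lag.time.max.ms=30000
num.replica.fetchers=1
{code}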



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.

2021-05-28 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352986#comment-17352986
 ] 

kaushik srinivas commented on KAFKA-12858:
--

[~ijuma]

Can you provide your input on this requirement?

> dynamically update the ssl certificates of kafka connect worker without 
> restarting connect process.
> ---
>
> Key: KAFKA-12858
> URL: https://issues.apache.org/jira/browse/KAFKA-12858
> Project: Kafka
>  Issue Type: Improvement
>Reporter: kaushik srinivas
>Priority: Major
>
> Hi,
>  
> We are trying to update the ssl certificates of kafka connect worker which is 
> due for expiry. Is there any way to dynamically update the ssl certificate of 
> connet worker as it is possible in kafka using kafka-configs.sh script ?
> If not, what is the recommended way to update the ssl certificates of kafka 
> connect worker without disrupting the existing traffic ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.

2021-05-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12858:


 Summary: dynamically update the ssl certificates of kafka connect 
worker without restarting connect process.
 Key: KAFKA-12858
 URL: https://issues.apache.org/jira/browse/KAFKA-12858
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


Hi,

 

We are trying to update the SSL certificates of a Kafka Connect worker that are 
due to expire. Is there any way to dynamically update the SSL certificate of the 
Connect worker, as is possible in Kafka using the kafka-configs.sh script?

If not, what is the recommended way to update the SSL certificates of a Kafka 
Connect worker without disrupting the existing traffic?
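
For reference, below is a sketch of the broker-side analogue we are referring to; the listener name, broker id and keystore path are illustrative.

{code:java}
# Broker-side analogue (illustrative values): brokers can pick up a refreshed
# keystore for a listener at runtime via kafka-configs.sh.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 1001 --alter \
  --add-config 'listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
# The question is whether an equivalent mechanism exists for the Connect worker itself.
{code}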



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12855) Update ssl certificates of kafka connect worker runtime without restarting the worker process.

2021-05-27 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12855:


 Summary: Update ssl certificates of kafka connect worker runtime 
without restarting the worker process.
 Key: KAFKA-12855
 URL: https://issues.apache.org/jira/browse/KAFKA-12855
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: kaushik srinivas


Is there a way to update the SSL certificates of a Kafka Connect worker 
dynamically, similar to the kafka-configs script for Kafka? Or is the only way to 
update the certificates to restart the worker processes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-24 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350264#comment-17350264
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi [~cricket007]

To add more on this: the test works fine when the certificate duration/expiry is 
extended and the same sequence of steps is performed to regenerate the Kafka 
broker keystore and update it using the kafka-configs.sh script.

The script fails only when the keystore password is changed.

-Kaushik.

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", "host":"kf-2-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
> "time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
> Failed authentication with /127.0.0.1 (SSL handshake failed)"}}
>  How can anyone configure ssl certs for the kafka-configs script and succeed 
> with the ssl handshake in this case ? 
> Note : 
> We are trying with a single listener i.e SSL: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-19 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347483#comment-17347483
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi [~cricket007]

Did you get a chance to reproduce this issue ?

Need your support.

Thanks

Kaushik.

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", "host":"kf-2-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
> "time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
> Failed authentication with /127.0.0.1 (SSL handshake failed)"}}
>  How can anyone configure ssl certs for the kafka-configs script and succeed 
> with the ssl handshake in this case ? 
> Note : 
> We are trying with a single listener i.e SSL: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345886#comment-17345886
 ] 

kaushik srinivas edited comment on KAFKA-12534 at 5/17/21, 5:53 AM:


Hi [~cricket007] ,

We have tried the exact steps and captured the commands and logs in detail. The 
scenario of changing the keystore password still does not work.

Sequence of steps to reproduce:
 # Install the Kafka broker by generating a CA, truststore and keystore (password 
for both stores: 123456).
 # Regenerate the keystore with a new password (1234567), using the same 
previously generated CA and truststore from step 1.
 # Issue the dynamic reconfig command after replacing the keystore file in the 
specified location.
 # dynamic config command issued: 
{code:java}
./kafka-configs --bootstrap-server kafkabroker:9092 --command-config 
ssl.properties --entity-type brokers --entity-name 1001 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
{code}
Note: listener name is ssl and is in the format specified in 
[https://docs.confluent.io/platform/current/kafka/dynamic-config.html#updating-ssl-keystore-of-an-existing-listener]

 # command fails with following trace 
{code:java}
Error while executing config command with args '--bootstrap-server 
kafkabroker:9092 --command-config ssl.properties --entity-type brokers 
--entity-name 1001 --alter --add-config 
listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidRequestException: Invalid config value 
for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338)
at 
kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid 
config value for resource ConfigResource(type=BROKER, name='1001'): Invalid 
value org.apache.kafka.common.config.ConfigException: Validation of dynamic 
config update of SSLFactory failed: org.apache.kafka.common.KafkaException: 
Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration

{code}
Kafka broker logs the below output 

 
{code:java}
{ "timezone":"UTC", "log":{"message":"data-plane-kafka-request-handler-5 - 
kafka.server.AdminManager - [Admin Manager on Broker 1001]: Invalid config 
value for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration"}} {code}
 

As per the docs, the CA is not supposed to be changed, and we have maintained 
that: the CA and truststore are unchanged. Another observation: when, for 
example, the country name used in cert generation is changed and the certificate 
is regenerated, the dynamic config command works fine and we can see the SSL 
certs being reloaded in the Kafka broker logs.

But when the keystore password is changed, it has never worked for us, even 
after many retries. Can you please help reproduce this issue and provide some 
detailed steps, if possible, for the case where the keystore's password is 
changed?
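
As an aside, and purely as a sketch of an alternative we may try next: changing the password of the existing keystore in place with keytool instead of regenerating the store. The paths, alias and passwords below are illustrative.

{code:java}
# Change the store password of the existing keystore in place (illustrative values).
keytool -storepasswd -keystore /etc/kafka/secrets/ssl/keyStore \
        -storepass 123456 -new 1234567
# If the private key entry has its own password, change it as well (alias is an example).
keytool -keypasswd -keystore /etc/kafka/secrets/ssl/keyStore -alias broker \
        -storepass 1234567 -keypass 123456 -new 1234567
{code}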


was (Author: kaushik srinivas):
Hi,

We have tried the exact steps. Captured the commands and logs in detail. The 
scenario to change the keystore password does not work still. 

sequence of steps to reproduce
 # install kafka broker by generating a CA, truststore and keystore. (password 
for stores: 123456)
 # re generate the keystore with a new password (1234567). Use the same old 
generated CA and trust store from step1.
 # issue the dynamic reconfig command 

[jira] [Comment Edited] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345886#comment-17345886
 ] 

kaushik srinivas edited comment on KAFKA-12534 at 5/17/21, 5:52 AM:


Hi,

We have tried the exact steps. Captured the commands and logs in detail. The 
scenario to change the keystore password does not work still. 

sequence of steps to reproduce
 # install kafka broker by generating a CA, truststore and keystore. (password 
for stores: 123456)
 # re generate the keystore with a new password (1234567). Use the same old 
generated CA and trust store from step1.
 # issue the dynamic reconfig command after replacing the keystore file in the 
specified location.
 # dynamic config command issued: 
{code:java}
./kafka-configs --bootstrap-server kafkabroker:9092 --command-config 
ssl.properties --entity-type brokers --entity-name 1001 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
{code}
Note: listener name is ssl and is in the format specified in 
[https://docs.confluent.io/platform/current/kafka/dynamic-config.html#updating-ssl-keystore-of-an-existing-listener]

 # command fails with following trace 
{code:java}
Error while executing config command with args '--bootstrap-server 
kafkabroker:9092 --command-config ssl.properties --entity-type brokers 
--entity-name 1001 --alter --add-config 
listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidRequestException: Invalid config value 
for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338)
at 
kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid 
config value for resource ConfigResource(type=BROKER, name='1001'): Invalid 
value org.apache.kafka.common.config.ConfigException: Validation of dynamic 
config update of SSLFactory failed: org.apache.kafka.common.KafkaException: 
Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration

{code}
Kafka broker logs the below output 

 
{code:java}
{ "timezone":"UTC", "log":{"message":"data-plane-kafka-request-handler-5 - 
kafka.server.AdminManager - [Admin Manager on Broker 1001]: Invalid config 
value for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration"}} {code}

 

As per docs, the CA is not supposed to be changed and we have maintained that 
and the CA and trust stores are not changed. Also another observation is that, 
when for example the country name in the cert generation is changed and the 
certificate is regenerated, the dynamic config command works fine and we could 
see the ssl certs being reloaded in the kafka broker logs.

But when the keystore password is changed, things have never worked for us even 
after so many attempts of retries. Can you please help in reproducing this 
issue and provide some detailed steps if possible for the case where the 
keystore's password is being changed ? It has clearly never worked for us, even 
after many attempts.


was (Author: kaushik srinivas):
Hi,

We have tried the exact steps. Captured the commands and logs in detail. The 
scenario to change the keystore password does not work still. 

sequence of steps to reproduce
 # install kafka broker by generating a CA, truststore and keystore. (password 
for stores: 123456)
 # re generate the keystore with a new password (1234567). Use the same old 
generated CA and trust store from step1.
 # issue the dynamic reconfig command after replacing 

[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345886#comment-17345886
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi,

We have tried the exact steps. Captured the commands and logs in detail. The 
scenario to change the keystore password does not work still. 

sequence of steps to reproduce
 # install kafka broker by generating a CA, truststore and keystore. (password 
for stores: 123456)
 # re generate the keystore with a new password (1234567). Use the same old 
generated CA and trust store from step1.
 # issue the dynamic reconfig command after replacing the keystore file in the 
specified location.
 # dynamic config command issued: 
{code:java}
./kafka-configs --bootstrap-server kafkabroker:9092 --command-config 
ssl.properties --entity-type brokers --entity-name 1001 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
{code}
Note: listener name is ssl and is in the format specified in 
[https://docs.confluent.io/platform/current/kafka/dynamic-config.html#updating-ssl-keystore-of-an-existing-listener]
 # command fails with following trace 
{code:java}
Error while executing config command with args '--bootstrap-server 
kafkabroker:9092 --command-config ssl.properties --entity-type brokers 
--entity-name 1001 --alter --add-config 
listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore'
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidRequestException: Invalid config value 
for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338)
at 
kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid 
config value for resource ConfigResource(type=BROKER, name='1001'): Invalid 
value org.apache.kafka.common.config.ConfigException: Validation of dynamic 
config update of SSLFactory failed: org.apache.kafka.common.KafkaException: 
Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration

{code}
Kafka broker logs the below output 
{code:java}
{ "timezone":"UTC", "log":{"message":"data-plane-kafka-request-handler-5 - 
kafka.server.AdminManager - [Admin Manager on Broker 1001]: Invalid config 
value for resource ConfigResource(type=BROKER, name='1001'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration"}}
{code}

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", "host":"kf-2-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
> "time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
> Failed authentication with /127.0.0.1 (SSL handshake failed)"}}
>  How can anyone configure ssl certs for the kafka-configs script and 

[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-05-12 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343135#comment-17343135
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi [~cricket007]

We have tried this out. The file permissions are OK and the double slash has also been 
corrected. Still, we face the same issue.

What do you suggest?

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", "host":"kf-2-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
> "time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
> Failed authentication with /127.0.0.1 (SSL handshake failed)"}}
>  How can anyone configure ssl certs for the kafka-configs script and succeed 
> with the ssl handshake in this case ? 
> Note : 
> We are trying with a single listener i.e SSL: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-04-14 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320966#comment-17320966
 ] 

kaushik srinivas edited comment on KAFKA-12534 at 4/14/21, 12:42 PM:
-

Hi [~cricket007],

We tried to change the keystore password and key password for one of the Kafka 
brokers.

Below is the command used:

./kafka-configs --bootstrap-server xx:9092 --command-config ssl.xt 
--entity-type brokers --entity-name 1007 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567'

 

Contents of the command config file ssl.xt:

[root@vm-10-75-112-163 bin]# cat ssl.xt
 ssl.key.password=123456
 ssl.keystore.location=/securityFiles/ssl/kafka.client.keystore.jks
 ssl.keystore.password=123456
 ssl.truststore.location=/securityFiles/ssl/kafka.client.truststore.jks
 ssl.truststore.password=123456
 security.protocol=SSL

 

Note: we have two keystores, one for the Kafka broker and one for the admin client. 
The password for the admin client keystore file is 123456, and this is what is 
configured in the command config file.

But we see the below output when we run this command:
{code:java}
hreads appropriately using -XX:ParallelGCThreads=N
[2021-04-14 12:24:39,705] WARN The configuration 'ssl.truststore.location' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,708] WARN The configuration 'ssl.keystore.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.key.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.keystore.location' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.truststore.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
Error while executing config command with args '--bootstrap-server x:9092 
--command-config ssl.xt --entity-type brokers --entity-name 1007 --alter 
--add-config 
listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567'
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidRequestException: Invalid config value 
for resource ConfigResource(type=BROKER, name='1007'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338)
at 
kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid 
config value for resource ConfigResource(type=BROKER, name='1007'): Invalid 
value org.apache.kafka.common.config.ConfigException: Validation of dynamic 
config update of SSLFactory failed: org.apache.kafka.common.KafkaException: 
Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
{code}
It says: WARN The configuration 'ssl.truststore.location' was supplied but isn't 
a known config. (org.apache.kafka.clients.admin.AdminClientConfig)

Also, the new keystore is encrypted with the new password, and still we observe 
that the validation has failed.

Note: the server.properties file is not updated with the latest password. It 
is still referring to the old keystore and key passwords.
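One check we can run (a sketch only, not taken from the Kafka docs): since the validation error is the broker failing to load the keystore, verify on the broker host that the keystore file the broker is configured with actually opens with the new password. The path and password below are the ones from this comment.
{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;

public class VerifyKeystorePassword {
  public static void main(String[] args) throws Exception {
    // Run on the broker host; path and password are the ones from the report above.
    String path = "/etc/kafka/secrets/ssl/keyStore";
    char[] password = "1234567".toCharArray();

    KeyStore ks = KeyStore.getInstance("JKS");
    try (FileInputStream in = new FileInputStream(path)) {
      ks.load(in, password); // throws an exception if the file or the password is wrong
    }
    System.out.println("Keystore opened successfully, entries: " + ks.size());
  }
}
{code}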

Can you help us with this?

-kaushik


was (Author: kaushik srinivas):
Hi [~cricket007] ,

We tried to change the keystore password and key pass for one of the kafka 
broker. 

below is the command used,

./kafka-configs --bootstrap-server xx:9092 --command-config ssl.xt 
--entity-type brokers --entity-name 1007 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567'

 

contents of command config file ssl.xt

[root@vm-10-75-112-163 bin]# cat ssl.xt
ssl.key.password=123456
ssl.keystore.location=/securityFiles/ssl/kafka.client.keystore.jks

[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-04-14 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320966#comment-17320966
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi [~cricket007] ,

We tried to change the keystore password and key pass for one of the kafka 
broker. 

below is the command used,

./kafka-configs --bootstrap-server xx:9092 --command-config ssl.xt 
--entity-type brokers --entity-name 1007 --alter --add-config 
'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567'

 

contents of command config file ssl.xt

[root@vm-10-75-112-163 bin]# cat ssl.xt
ssl.key.password=123456
ssl.keystore.location=/securityFiles/ssl/kafka.client.keystore.jks
ssl.keystore.password=123456
ssl.truststore.location=/securityFiles/ssl/kafka.client.truststore.jks
ssl.truststore.password=123456
security.protocol=SSL

 

note: We have keystores created one for kafka broker and one for admin client. 
the password for admin client keystore file is 123456. And this is what is 
configured in the command config file.

But we see below output when we run this command
{code:java}
hreads appropriately using -XX:ParallelGCThreads=N
[2021-04-14 12:24:39,705] WARN The configuration 'ssl.truststore.location' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,708] WARN The configuration 'ssl.keystore.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.key.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.keystore.location' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
[2021-04-14 12:24:39,710] WARN The configuration 'ssl.truststore.password' was 
supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig)
Error while executing config command with args '--bootstrap-server x:9092 
--command-config ssl.xt --entity-type brokers --entity-name 1007 --alter 
--add-config 
listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567'
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidRequestException: Invalid config value 
for resource ConfigResource(type=BROKER, name='1007'): Invalid value 
org.apache.kafka.common.config.ConfigException: Validation of dynamic config 
update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to 
load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338)
at 
kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid 
config value for resource ConfigResource(type=BROKER, name='1007'): Invalid 
value org.apache.kafka.common.config.ConfigException: Validation of dynamic 
config update of SSLFactory failed: org.apache.kafka.common.KafkaException: 
Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for 
configuration Invalid dynamic configuration
{code}
It says WARN The configuration 'ssl.truststore.location' was supplied but isn't 
a known config. (org.apache.kafka.clients.admin.AdminClientConfig)

Also the new keystore is encrypted with the new password and still we observe 
that the validation has failed.

Can you help us in this.

-kaushik

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", 

[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-03-28 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310435#comment-17310435
 ] 

kaushik srinivas commented on KAFKA-12534:
--

Hi [~ijuma]

Need your help here.

> kafka-configs does not work with ssl enabled kafka broker.
> --
>
> Key: KAFKA-12534
> URL: https://issues.apache.org/jira/browse/KAFKA-12534
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> We are trying to change the trust store password on the fly using the 
> kafka-configs script for a ssl enabled kafka broker.
> Below is the command used:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'
> But we see below error in the broker logs when the command is run.
> {"type":"log", "host":"kf-2-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
> "time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
> Failed authentication with /127.0.0.1 (SSL handshake failed)"}}
>  How can anyone configure ssl certs for the kafka-configs script and succeed 
> with the ssl handshake in this case ? 
> Note : 
> We are trying with a single listener i.e SSL: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-28 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310434#comment-17310434
 ] 

kaushik srinivas commented on KAFKA-12530:
--

Any inputs on this, please?

> kafka-configs.sh does not work while changing the sasl jaas configurations.
> ---
>
> Key: KAFKA-12530
> URL: https://issues.apache.org/jira/browse/KAFKA-12530
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> We are trying to use kafka-configs script to modify the sasl jaas 
> configurations, but unable to do so.
> Command used:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
> org.apache.kafka.common.security.plain.PlainLoginModule required \n 
> username=\"test\" \n password=\"test\"; \n };'
> error:
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val".
> command 2:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config 
> 'sasl.jaas.config=[username=test,password=test]'
> output:
> command does not return , but kafka broker logs below error:
> DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
> "time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
>  - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - 
> Set SASL server state to FAILED during authentication"}}
>  {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
> "time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
> "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
> Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
> METADATA during SASL handshake.)"}}
> We have below issues:
>  1. If one installs kafka broker with SASL mechanism and wants to change the 
> SASL jaas config via kafka-configs scripts, how is it supposed to be done ? 
> Is one supposed to provide kafka-configs script credentials to get 
> authenticated with kafka broker ?
>  does kafka-configs needs client credentials to do the same ? 
>  2. Can anyone point us to example commands of kafka-configs to alter the 
> sasl.jaas.config property of kafka broker. We do not see any documentation or 
> examples for the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12534:


 Summary: kafka-configs does not work with ssl enabled kafka broker.
 Key: KAFKA-12534
 URL: https://issues.apache.org/jira/browse/KAFKA-12534
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.6.1
Reporter: kaushik srinivas


We are trying to change the trust store password on the fly using the 
kafka-configs script for a ssl enabled kafka broker.

Below is the command used:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'

But we see below error in the broker logs when the command is run.

{"type":"log", "host":"kf-2-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
"time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 
- org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
Failed authentication with /127.0.0.1 (SSL handshake failed)"}}

 How can anyone configure ssl certs for the kafka-configs script and succeed 
with the ssl handshake in this case ? 

Note : 

We are trying with a single listener i.e SSL: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-12530:
-
Description: 
We are trying to use kafka-configs script to modify the sasl jaas 
configurations, but unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

command does not return , but kafka broker logs below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
 {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have below issues:
 1. If one installs kafka broker with SASL mechanism and wants to change the 
SASL jaas config via kafka-configs scripts, how is it supposed to be done ? Is 
one supposed to provide kafka-configs script credentials to get authenticated 
with kafka broker ?
 does kafka-configs needs client credentials to do the same ? 
 2. Can anyone point us to example commands of kafka-configs to alter the 
sasl.jaas.config property of kafka broker. We do not see any documentation or 
examples for the same.

  was:
We are trying to use kafka-configs script to modify the sasl jaas 
configurations, but unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

command does not return , but kafka broker logs below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have below issues:
1. If one installs kafka broker with SASL mechanism and wants to change the 
SASL jaas config via kafka-configs scripts, how is it supposed to be done ?
 does kafka-configs needs client credentials to do the same ? 
2. Can anyone point us to example commands of kafka-configs to alter the 
sasl.jaas.config property of kafka broker. We do not see any documentation or 
examples for the same.


> kafka-configs.sh does not work while changing the sasl jaas configurations.
> ---
>
> Key: KAFKA-12530
> URL: https://issues.apache.org/jira/browse/KAFKA-12530
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> We are trying to use kafka-configs script to modify the sasl jaas 
> configurations, but unable to do so.
> Command used:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
> 

[jira] [Commented] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306892#comment-17306892
 ] 

kaushik srinivas commented on KAFKA-12530:
--

Hi [~enether]

Can you provide us with some inputs on this?

> kafka-configs.sh does not work while changing the sasl jaas configurations.
> ---
>
> Key: KAFKA-12530
> URL: https://issues.apache.org/jira/browse/KAFKA-12530
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> We are trying to use kafka-configs script to modify the sasl jaas 
> configurations, but unable to do so.
> Command used:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
> org.apache.kafka.common.security.plain.PlainLoginModule required \n 
> username=\"test\" \n password=\"test\"; \n };'
> error:
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val".
> command 2:
> kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config 
> 'sasl.jaas.config=[username=test,password=test]'
> output:
> command does not return , but kafka broker logs below error:
> DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
> "time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
>  - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - 
> Set SASL server state to FAILED during authentication"}}
> {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
> "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
> "time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
> "log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
>  - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
> Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
> METADATA during SASL handshake.)"}}
> We have below issues:
> 1. If one installs kafka broker with SASL mechanism and wants to change the 
> SASL jaas config via kafka-configs scripts, how is it supposed to be done ?
>  does kafka-configs needs client credentials to do the same ? 
> 2. Can anyone point us to example commands of kafka-configs to alter the 
> sasl.jaas.config property of kafka broker. We do not see any documentation or 
> examples for the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12530:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12530
 URL: https://issues.apache.org/jira/browse/KAFKA-12530
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We are trying to use kafka-configs script to modify the sasl jaas 
configurations, but unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

command does not return , but kafka broker logs below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have below issues:
1. If one installs kafka broker with SASL mechanism and wants to change the 
SASL jaas config via kafka-configs scripts, how is it supposed to be done ?
 does kafka-configs needs client credentials to do the same ? 
2. Can anyone point us to example commands of kafka-configs to alter the 
sasl.jaas.config property of kafka broker. We do not see any documentation or 
examples for the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12529) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12529:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12529
 URL: https://issues.apache.org/jira/browse/KAFKA-12529
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We are trying to use kafka-configs script to modify the sasl jaas 
configurations, but unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

command does not return , but kafka broker logs below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have below issues:
1. If one installs kafka broker with SASL mechanism and wants to change the 
SASL jaas config via kafka-configs scripts, how is it supposed to be done ?
 does kafka-configs needs client credentials to do the same ? 
2. Can anyone point us to example commands of kafka-configs to alter the 
sasl.jaas.config property of kafka broker. We do not see any documentation or 
examples for the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12528) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12528:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12528
 URL: https://issues.apache.org/jira/browse/KAFKA-12528
 Project: Kafka
  Issue Type: Bug
  Components: admin, core
Reporter: kaushik srinivas


We are trying to modify the SASL JAAS configuration for the Kafka broker at runtime 
using the dynamic config update functionality of the kafka-configs.sh script, but we 
are unable to get it working.

Below is our command:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

 

The command exits with the following error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

 

We also tried the below format:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

The command does not return, but the Kafka broker log prints the below error 
messages.

org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

 

1. If one has SASL enabled with a single listener, how are we supposed to change the 
SASL credentials using this command?

2. Can anyone point us to some example commands for modifying the SASL JAAS 
configuration?
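One approach we are evaluating (a sketch only, not a confirmed fix): the dynamic per-listener broker config key has the form listener.name.<listener>.<mechanism>.sasl.jaas.config, and its value is expected to be a single login-module line rather than a full KafkaServer { ... } section. Setting it through the Java AdminClient API also sidesteps the "key=val" parsing limitation of kafka-configs.sh. The listener name (sasl_plaintext), mechanism (plain), broker id, and credentials below are placeholders, and the admin client itself still needs valid SASL credentials to authenticate with the broker.
{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class UpdateSaslJaasConfig {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // The admin client must authenticate as well; these credentials are placeholders.
    props.put("security.protocol", "SASL_PLAINTEXT");
    props.put("sasl.mechanism", "PLAIN");
    props.put("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"admin\" password=\"admin-secret\";");

    // The value is a single login-module line, not a KafkaServer { ... } section.
    String newJaas = "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"test\" password=\"test\";";

    try (Admin admin = Admin.create(props)) {
      ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "59");
      Collection<AlterConfigOp> ops = Collections.singletonList(
          new AlterConfigOp(
              new ConfigEntry("listener.name.sasl_plaintext.plain.sasl.jaas.config", newJaas),
              AlterConfigOp.OpType.SET));
      admin.incrementalAlterConfigs(Collections.singletonMap(broker, ops)).all().get();
    }
  }
}
{code}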



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2021-03-23 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306800#comment-17306800
 ] 

kaushik srinivas edited comment on KAFKA-8010 at 3/23/21, 6:52 AM:
---

Hi [~enether]

Can you please share an example command with square brackets for sasl.jaas.config 
usage?

Also, can you point us to the documentation updates? We still have this issue.

 

Also, if the Kafka broker is already SASL-enabled with, let us assume, the PLAIN 
mechanism, how can we update sasl.jaas.config without passing any credentials to 
the kafka-configs script as a client of the Kafka broker?

 


was (Author: kaushik srinivas):
Hi [~enether]

Can you please share an example command with square brackets for sasl jaas 
config usage ?

Also can you point out to documentation updates. we still have this issue.

 

> kafka-configs.sh does not allow setting config with an equal in the value
> -
>
> Key: KAFKA-8010
> URL: https://issues.apache.org/jira/browse/KAFKA-8010
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Mickael Maison
>Priority: Major
> Attachments: image-2019-03-05-19-45-47-168.png
>
>
> The sasl.jaas.config typically includes equals in its value. Unfortunately 
> the kafka-configs tool does not parse such values correctly and hits an error:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
> org.apache.kafka.common.security.plain.PlainLoginModule required\n  
> username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
> org.apache.zookeeper.server.auth.DigestLoginModule required\n  
> username=\"myuser2\"\n  password=\"mypassword2\;\n};"
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2021-03-23 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306800#comment-17306800
 ] 

kaushik srinivas commented on KAFKA-8010:
-

Hi [~enether]

Can you please share an example command with square brackets for sasl jaas 
config usage ?

Also can you point out to documentation updates. we still have this issue.

 

> kafka-configs.sh does not allow setting config with an equal in the value
> -
>
> Key: KAFKA-8010
> URL: https://issues.apache.org/jira/browse/KAFKA-8010
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Mickael Maison
>Priority: Major
> Attachments: image-2019-03-05-19-45-47-168.png
>
>
> The sasl.jaas.config typically includes equals in its value. Unfortunately 
> the kafka-configs tool does not parse such values correctly and hits an error:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
> org.apache.kafka.common.security.plain.PlainLoginModule required\n  
> username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
> org.apache.zookeeper.server.auth.DigestLoginModule required\n  
> username=\"myuser2\"\n  password=\"mypassword2\;\n};"
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-03-02 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293740#comment-17293740
 ] 

kaushik srinivas commented on KAFKA-12164:
--

 Hi [~kkonstantine]

Can you provide some insights into this issue?

Thanks,

Kaushik.

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> the same.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-01-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269087#comment-17269087
 ] 

kaushik srinivas commented on KAFKA-12164:
--

Adding a few major concerns with regard to the feedback about re-creating the 
corrupt directories upon restart.

The function syncWithHive() (DataWriter.java) is called at every restart/boot-up of 
the connector. This is the function which does an initial audit of all the 
partition directories and tries to sync the HDFS folders with the Hive partitions 
before proceeding further to consume records from Kafka.

Below is the snippet for the same.

 
{code:java}
List<String> partitions = hiveMetaStore.listPartitions(hiveDatabase,
    topicTableMap.get(topic), (short) -1);
FileStatus[] statuses = FileUtils.getDirectories(storage, new Path(topicDir));
for (FileStatus status : statuses) {
  String location = status.getPath().toString();
  if (!partitions.contains(location)) {
    String partitionValue = getPartitionValue(location);
    hiveMetaStore.addPartition(hiveDatabase, topicTableMap.get(topic), partitionValue);
  }
}
{code}
Now going one step further inside, into the function getDirectories -> getDirectoriesImpl 
(from FileUtils): here, a path is returned as a partition path if
a. the path is a directory, and
b. the path does not contain nested directories (checked by verifying that the number of 
non-directory files equals the total number of (directory + non-directory) files in the path).

If the above conditions are met, then the path is added as a partition path.

So consider the erroneous case where the actual path is supposed to look like
/test1=0/test2=0/xxx.parquet
but instead, due to a crash, looks like below:
/test1=0/

In this case /test1=0 satisfies the above conditions and hence is returned as a new 
partition path to be added to Hive. This update to Hive fails because the partition 
Hive expects is /test1=0/test2=0 and not /test1=0/.

 

So this would mean that once there is a corrupt partition directory in HDFS, the 
syncWithHive() call will keep throwing Hive exceptions at every restart of the 
connector until the directory is corrected in HDFS. This means that the stage of 
consuming the old (failed-to-commit) records again (even assuming they are still 
present in Kafka after the restart) would never be reached; the connector remains in 
a crashed state forever and requires manual clean-up intervention.
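To make the failure mode concrete, below is a rough paraphrase (our reading of the behaviour described above, not the connector's actual source) of the leaf-directory check: a path counts as a partition directory when it is a directory containing no sub-directories, so a partially created /test1=0/ qualifies even though the partition Hive expects is /test1=0/test2=0/.
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeafDirectoryCheck {
  // Paraphrase of the check described above; not the connector's actual implementation.
  static boolean looksLikePartitionDirectory(FileSystem fs, Path path) throws IOException {
    if (!fs.getFileStatus(path).isDirectory()) {
      return false;
    }
    for (FileStatus child : fs.listStatus(path)) {
      if (child.isDirectory()) {
        return false; // contains nested directories, so not a leaf partition path
      }
    }
    // A leaf directory, even an empty one left behind by a partial mkdirs(),
    // is treated as a partition path and reported to Hive, where it then fails.
    return true;
  }
}
{code}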

-kaushik

 

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> the same.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-01-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269064#comment-17269064
 ] 

kaushik srinivas edited comment on KAFKA-12164 at 1/21/21, 6:31 AM:


Regarding the below comment:

"Now, by reading quickly the javadoc for FileSystem#mkdirs in HDFS I understand 
that a nested directory can be constructed from a given path, even if some of 
the parents exist. Which makes me think that a restart would allow the creation 
of the full path as long as the partitioner is deterministic with respect to 
the record (i.e. for a given record it always creates the same path). Upon a 
restart the export of a record will be retried, so the partitioner could 
possibly reattempt the full creation of the path. But that's an early 
assumption from my side. "

 

Let's suppose the partitioner derives the directory /test1/test2 for a given record, 
and the connector crashes during directory creation so that HDFS ends up with only 
the /test1/ folder.

But there is no guarantee that the Kafka record which was not fully committed by the 
connector would still exist in Kafka even after restart/recovery of the connector. 
So there could be scenarios where these partial directories remain partial forever 
and require additional manual intervention to clean them up or complete their 
creation in HDFS, without which we see issues in Hive table partitions if enabled.
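To illustrate the retry assumption under discussion (a sketch based on our reading of the HDFS javadoc, with a placeholder namenode URI and topic directory): FileSystem#mkdirs creates any missing components of the path and succeeds if the directory already exists, so replaying the same record through a deterministic partitioner would complete a partial /test1=0/ into /test1=0/test2=0/. The concern above is precisely that the uncommitted record may no longer be available in Kafka to drive that retry.
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryPartitionDir {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder namenode address and topic directory.
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
      Path partition = new Path("/topics/mytopic/test1=0/test2=0");
      // mkdirs() creates all missing parents; re-running it after a partial
      // /test1=0/ was left behind completes the nested path.
      boolean ok = fs.mkdirs(partition);
      System.out.println("partition directory present: " + ok);
    }
  }
}
{code}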


was (Author: kaushik srinivas):
For the below comments

"Now, by reading quickly the javadoc for FileSystem#mkdirs in HDFS I understand 
that a nested directory can be constructed from a given path, even if some of 
the parents exist. Which makes me think that a restart would allow the creation 
of the full path as long as the partitioner is deterministic with respect to 
the record (i.e. for a given record it always creates the same path). Upon a 
restart the export of a record will be retried, so the partitioner could 
possibly reattempt the full creation of the path. But that's an early 
assumption from my side. "

 

Lets suppose partitioner derives the directory /test1/test2 for a given record. 
And the connector crashed during the directory creation and hdfs ends up with 
only /test1/ folder.

But there is no guarantee that the kafka record which was not committed fully 
by the connector would exist in kafka even after restart/recovery of the 
connector. So there could be scenarios where these partial directories remain 
partial forever and requires an additional manual intervention to clean it up 
or complete its creation in hdfs, without which we see issues in hive table 
partitions if enabled.

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> 

[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-01-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269064#comment-17269064
 ] 

kaushik srinivas commented on KAFKA-12164:
--

For the below comments

"Now, by reading quickly the javadoc for FileSystem#mkdirs in HDFS I understand 
that a nested directory can be constructed from a given path, even if some of 
the parents exist. Which makes me think that a restart would allow the creation 
of the full path as long as the partitioner is deterministic with respect to 
the record (i.e. for a given record it always creates the same path). Upon a 
restart the export of a record will be retried, so the partitioner could 
possibly reattempt the full creation of the path. But that's an early 
assumption from my side. "

 

Lets suppose partitioner derives the directory /test1/test2 for a given record. 
And the connector crashed during the directory creation and hdfs ends up with 
only /test1/ folder.

But there is no guarantee that the kafka record which was not committed fully 
by the connector would exist in kafka even after restart/recovery of the 
connector. So there could be scenarios where these partial directories remain 
partial forever and requires an additional manual intervention to clean it up 
or complete its creation in hdfs, without which we see issues in hive table 
partitions if enabled.

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> the same.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-01-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269061#comment-17269061
 ] 

kaushik srinivas commented on KAFKA-12164:
--

Hi [~kkonstantine]

We have created a ticket on the HDFS sink connector side as well.

Below is the ticket:

[https://github.com/confluentinc/kafka-connect-hdfs/issues/538]

We have also captured a more detailed analysis over there, but there are no 
responses in that forum either.

-Kaushik

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> the same.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

2021-01-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268532#comment-17268532
 ] 

kaushik srinivas commented on KAFKA-12164:
--

[~ijuma]

This is an issue with a lot of impact. Can you provide some suggestions?

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of 
> events.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector creates folders in hdfs such as /test1=1/test2=2/ based on a 
> custom partitioner. Here test1 and test2 are two separate nested directories 
> derived from multiple fields in the record by the custom partitioner.
>  # The kafka connect hdfs connector uses the below call to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> The important thing to note: if mkdirs() is a non-atomic operation (i.e., it 
> can be partially executed if interrupted), then suppose the first directory, 
> test1, is created and the connect worker pod is restarted just before test2 
> is created in hdfs. The hdfs file system is then left with partially created 
> partition directories for the restart time frame.
> So we might end up with conditions in hdfs such as:
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> Here the second partition has a missing directory. If hive integration is 
> enabled, hive metastore exceptions will occur, since a partition expected by 
> the hive table is missing for some partitions in hdfs.
> *This can occur for any connector with an ongoing non-atomic operation when a 
> restart is triggered on the kafka connect worker pod. This can leave 
> partially completed state in the system and may prevent the connector from 
> continuing its operation.*
> *This is a very critical issue and needs some attention on ways of handling 
> it.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in the hdfs file system.

2021-01-11 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262536#comment-17262536
 ] 

kaushik srinivas commented on KAFKA-12164:
--

[~ijuma]

Need your inputs.

> Issue when kafka connect worker pod restarts during creation of nested 
> partition directories in the hdfs file system.
> 
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: kaushik srinivas
>Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of 
> events.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector creates folders in hdfs such as /test1=1/test2=2/ based on a 
> custom partitioner. Here test1 and test2 are two separate nested directories 
> derived from multiple fields in the record by the custom partitioner.
>  # The kafka connect hdfs connector uses the below call to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> The important thing to note: if mkdirs() is a non-atomic operation (i.e., it 
> can be partially executed if interrupted), then suppose the first directory, 
> test1, is created and the connect worker pod is restarted just before test2 
> is created in hdfs. The hdfs file system is then left with partially created 
> partition directories for the restart time frame.
> So we might end up with conditions in hdfs such as:
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> Here the second partition has a missing directory. If hive integration is 
> enabled, hive metastore exceptions will occur, since a partition expected by 
> the hive table is missing for some partitions in hdfs.
> *This can occur for any connector with an ongoing non-atomic operation when a 
> restart is triggered on the kafka connect worker pod. This can leave 
> partially completed state in the system and may prevent the connector from 
> continuing its operation.*
> *This is a very critical issue and needs some attention on ways of handling 
> it.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in the hdfs file system.

2021-01-07 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12164:


 Summary: Issue when kafka connect worker pod restarts during 
creation of nested partition directories in the hdfs file system.
 Key: KAFKA-12164
 URL: https://issues.apache.org/jira/browse/KAFKA-12164
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: kaushik srinivas


In our production labs, an issue is observed. Below is the sequence of events.
 # hdfs connector is added to the connect worker.
 # hdfs connector creates folders in hdfs such as /test1=1/test2=2/ based on a 
custom partitioner. Here test1 and test2 are two separate nested directories 
derived from multiple fields in the record by the custom partitioner.
 # The kafka connect hdfs connector uses the below call to create the 
directories in the hdfs file system.
fs.mkdirs(new Path(filename));
ref: 
[https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]

The important thing to note: if mkdirs() is a non-atomic operation (i.e., it 
can be partially executed if interrupted), then suppose the first directory, 
test1, is created and the connect worker pod is restarted just before test2 is 
created in hdfs. The hdfs file system is then left with partially created 
partition directories for the restart time frame.

So we might end up with conditions in hdfs such as:
/test1=0/test2=0/
/test1=1/
/test1=2/test2=2
/test1=3/test2=3

Here the second partition has a missing directory. If hive integration is 
enabled, hive metastore exceptions will occur, since a partition expected by 
the hive table is missing for some partitions in hdfs.

*This can occur for any connector with an ongoing non-atomic operation when a 
restart is triggered on the kafka connect worker pod. This can leave partially 
completed state in the system and may prevent the connector from continuing 
its operation.*

*This is a very critical issue and needs some attention on ways of handling 
it.*
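
As a minimal sketch only (not the connector's actual code; the class and path 
below are hypothetical), one mitigation is to treat directory creation as 
idempotent: call mkdirs() on the full nested path and verify it afterwards, so 
a path partially created before a restart is simply re-created and re-checked 
on the next attempt.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class PartitionDirs {

    /**
     * Create the full nested partition path and verify it afterwards.
     * mkdirs() behaves like "mkdir -p": re-running it after an interrupted
     * attempt completes any missing levels, and the explicit exists() check
     * surfaces a partially created path instead of silently continuing.
     */
    public static void ensurePartitionPath(FileSystem fs, String dir) throws IOException {
        Path path = new Path(dir);
        fs.mkdirs(path);
        if (!fs.exists(path)) {
            throw new IOException("Partition path " + dir + " is missing after mkdirs()");
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical nested partition directory, matching the example above.
        ensurePartitionPath(fs, "/topics/test/test1=1/test2=2");
    }
}
{code}
Since mkdirs() behaves like mkdir -p, running such a check on every connector 
(re)start would complete any partially created partition path before data or 
hive metadata is written.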



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-20 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160930#comment-17160930
 ] 

kaushik srinivas commented on KAFKA-10278:
--

Hi [~bbyrne],

Does this mean that in 2.5, even though the dynamic properties are pushed via a 
static file, they still get displayed when we run kafka-configs with the --all 
option?

Thanks,

Kaushik
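
As a hedged illustration (not part of this ticket), broker configuration, 
including statically configured and default values together with their 
sources, can also be inspected programmatically via the AdminClient; the 
bootstrap address and broker id below are placeholders.
{code:java}
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class DescribeBrokerConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Broker id "1" is a placeholder; use the id of the broker to inspect.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            // Each entry reports its value and its source (static file, dynamic, default).
            config.entries().forEach(e ->
                System.out.println(e.name() + " = " + e.value() + " (" + e.source() + ")"));
        }
    }
}
{code}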

> kafka-configs does not show the current properties of running kafka broker 
> upon describe.
> -
>
> Key: KAFKA-10278
> URL: https://issues.apache.org/jira/browse/KAFKA-10278
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>  Labels: kafka-configs.sh
>
> kafka-configs.sh does not list the properties 
> (read-only/per-broker/cluster-wide) with which the kafka broker is currently 
> running.
> The command returns nothing.
> Only those properties added or updated via kafka-configs.sh are listed by the 
> describe command.
> bash-4.2$ env -i  bin/kafka-configs.sh --bootstrap-server 
> kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
> --entity-default --describe
> Default config for brokers in the cluster are:
>   log.cleaner.threads=2 sensitive=false 
> synonyms=\{DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-16 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159129#comment-17159129
 ] 

kaushik srinivas commented on KAFKA-10278:
--

Hi [~rajinisiva...@gmail.com] 

[~rsivaram], [~ijuma]

Is this a known issue already ?

 

> kafka-configs does not show the current properties of running kafka broker 
> upon describe.
> -
>
> Key: KAFKA-10278
> URL: https://issues.apache.org/jira/browse/KAFKA-10278
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>  Labels: kafka-configs.sh
>
> kafka-configs.sh does not list the properties 
> (read-only/per-broker/cluster-wide) with which the kafka broker is currently 
> running.
> The command returns nothing.
> Only those properties added or updated via kafka-configs.sh are listed by the 
> describe command.
> bash-4.2$ env -i  bin/kafka-configs.sh --bootstrap-server 
> kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
> --entity-default --describe
> Default config for brokers in the cluster are:
>   log.cleaner.threads=2 sensitive=false 
> synonyms=\{DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-16 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-10278:


 Summary: kafka-configs does not show the current properties of 
running kafka broker upon describe.
 Key: KAFKA-10278
 URL: https://issues.apache.org/jira/browse/KAFKA-10278
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: kaushik srinivas


kafka-configs.sh does not list the properties 
(read-only/per-broker/cluster-wide) with which the kafka broker is currently 
running.

The command returns nothing.

Only those properties added or updated via kafka-configs.sh are listed by the 
describe command.

bash-4.2$ env -i  bin/kafka-configs.sh --bootstrap-server 
kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
--entity-default --describe
Default config for brokers in the cluster are:
  log.cleaner.threads=2 sensitive=false 
synonyms=\{DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.

2020-05-04 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098850#comment-17098850
 ] 

kaushik srinivas commented on KAFKA-9933:
-

Hi

[~ijuma]

We see that when SASL (GSSAPI) and SSL are both turned ON, only the kerberos 
principal name is validated for the ACLs. Even when the ssl certificate based 
principal name was not granted ACLs, clients work fine as long as ACLs are 
created for the proper kerberos based name.

Is this behavior expected, i.e. is SASL given precedence for the ACL check when 
both SSL and SASL are enabled?

thanks a lot in advance

kaushik

> Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
> 
>
> Key: KAFKA-9933
> URL: https://issues.apache.org/jira/browse/KAFKA-9933
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Hello,
> The documentation on the usage of the authorizer does not describe the 
> principal being used when the protocol for the listener is chosen as SASL + 
> SSL (SASL_SSL).
> Suppose kerberos and ssl are enabled together, will the authorization be 
> based on the kerberos principal names or on the ssl certificate DN names?
> There is no document covering this part of the use case.
> This needs an information and documentation update.
> Thanks,
> Kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.

2020-05-04 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098824#comment-17098824
 ] 

kaushik srinivas commented on KAFKA-9933:
-

Hi 

[~ijuma]

Need your inputs. We enable both SASL and SSL, with both the GSSAPI and PLAIN 
mechanisms, in varied deployments.

Thanks,

kaushik

 

> Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
> 
>
> Key: KAFKA-9933
> URL: https://issues.apache.org/jira/browse/KAFKA-9933
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Hello,
> The documentation on the usage of the authorizer does not describe the 
> principal being used when the protocol for the listener is chosen as SASL + 
> SSL (SASL_SSL).
> Suppose kerberos and ssl are enabled together, will the authorization be 
> based on the kerberos principal names or on the ssl certificate DN names?
> There is no document covering this part of the use case.
> This needs an information and documentation update.
> Thanks,
> Kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.

2020-05-04 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-9933:

Issue Type: Bug  (was: Improvement)

> Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
> 
>
> Key: KAFKA-9933
> URL: https://issues.apache.org/jira/browse/KAFKA-9933
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Hello,
> The documentation on the usage of the authorizer does not describe the 
> principal being used when the protocol for the listener is chosen as SASL + 
> SSL (SASL_SSL).
> Suppose kerberos and ssl are enabled together, will the authorization be 
> based on the kerberos principal names or on the ssl certificate DN names?
> There is no document covering this part of the use case.
> This needs an information and documentation update.
> Thanks,
> Kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT

2020-05-04 Thread kaushik srinivas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098822#comment-17098822
 ] 

kaushik srinivas commented on KAFKA-9934:
-

Hi

[~ijuma]

Need your inputs on this case.

Thanks,

kaushik

> Information and doc update needed for support of AclAuthorizer when protocol 
> is PLAINTEXT
> -
>
> Key: KAFKA-9934
> URL: https://issues.apache.org/jira/browse/KAFKA-9934
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Need information on the case where the protocol is PLAINTEXT for listeners in 
> kafka.
> Does authorization apply when the protocol is PLAINTEXT?
> If so, what would be used as the principal name for the authorization acl 
> validations?
> There is no doc which describes this case.
> Need an info and doc update for the same.
> Thanks,
> kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT

2020-05-04 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-9934:

Issue Type: Bug  (was: Improvement)

> Information and doc update needed for support of AclAuthorizer when protocol 
> is PLAINTEXT
> -
>
> Key: KAFKA-9934
> URL: https://issues.apache.org/jira/browse/KAFKA-9934
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>
> Need information on the case where the protocol is PLAINTEXT for listeners in 
> kafka.
> Does authorization apply when the protocol is PLAINTEXT?
> If so, what would be used as the principal name for the authorization acl 
> validations?
> There is no doc which describes this case.
> Need an info and doc update for the same.
> Thanks,
> kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT

2020-04-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-9934:
---

 Summary: Information and doc update needed for support of 
AclAuthorizer when protocol is PLAINTEXT
 Key: KAFKA-9934
 URL: https://issues.apache.org/jira/browse/KAFKA-9934
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: kaushik srinivas


Need information on the case where the protocol is PLAINTEXT for listeners in 
kafka.

Does authorization apply when the protocol is PLAINTEXT?

If so, what would be used as the principal name for the authorization acl 
validations?

There is no doc which describes this case.

Need an info and doc update for the same.

Thanks,

kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.

2020-04-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-9933:
---

 Summary: Need doc update on the AclAuthorizer when SASL_SSL is the 
protocol used.
 Key: KAFKA-9933
 URL: https://issues.apache.org/jira/browse/KAFKA-9933
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: kaushik srinivas


Hello,

The documentation on the usage of the authorizer does not describe the 
principal being used when the protocol for the listener is chosen as SASL + SSL 
(SASL_SSL).

Suppose kerberos and ssl are enabled together, will the authorization be based 
on the kerberos principal names or on the ssl certificate DN names?

There is no document covering this part of the use case.

This needs an information and documentation update.

Thanks,

Kaushik.
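
For context, on a SASL_SSL listener the principal used for ACL checks is 
normally derived from the SASL (e.g. kerberos) identity rather than the SSL 
certificate DN, unless a custom principal builder is configured. As a hedged 
sketch only (the principal, topic and bootstrap address below are 
placeholders), an ACL for such a principal can be created via the AdminClient:
{code:java}
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.Collections;
import java.util.Properties;

public class CreateAclSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder kerberos-style principal: the SASL identity, not the SSL DN.
            AccessControlEntry entry = new AccessControlEntry(
                    "User:kafkaclient@EXAMPLE.COM", "*",
                    AclOperation.READ, AclPermissionType.ALLOW);
            ResourcePattern topic = new ResourcePattern(
                    ResourceType.TOPIC, "test-topic", PatternType.LITERAL);
            admin.createAcls(Collections.singleton(new AclBinding(topic, entry))).all().get();
        }
    }
}
{code}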



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8622) Snappy Compression Not Working

2020-04-11 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas resolved KAFKA-8622.
-
Resolution: Resolved

> Snappy Compression Not Working
> --
>
> Key: KAFKA-8622
> URL: https://issues.apache.org/jira/browse/KAFKA-8622
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Kunal Verma
>Assignee: kaushik srinivas
>Priority: Major
>
> I am trying to produce a message on the broker with compression enabled as 
> snappy.
> Environment :
> Brokers[Kafka-cluster] are hosted on Centos 7
> I have downloaded the latest version (2.3.0 & 2.2.1) tar, extracted it and 
> moved it to /opt/kafka-
> I have executed the broker with the standard configuration.
> In my producer service (written in java), I have enabled snappy compression.
> props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>  
> So while sending records to the broker, I am getting the following errors:
> org.apache.kafka.common.errors.UnknownServerException: The server experienced 
> an unexpected error when processing the request
>  
> While investigating further at the broker end, I got the following error in 
> the log
>  
> logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> failed to map segment from shared object: Operation not permitted
> --
>  
> [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition test-bulk-1 (kafka.server.ReplicaManager)
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435)
> at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168)
> at 
> org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327)
> at 
> scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73)
> at kafka.log.Log.liftedTree1$1(Log.scala:881)
> at kafka.log.Log.$anonfun$append$2(Log.scala:868)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2065)
> at kafka.log.Log.append(Log.scala:850)
> at kafka.log.Log.appendAsLeader(Log.scala:819)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
> at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> at scala.collection.TraversableLike.map(TraversableLike.scala:237)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
> at 

[jira] [Commented] (KAFKA-8622) Snappy Compression Not Working

2019-07-02 Thread kaushik srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876874#comment-16876874
 ] 

kaushik srinivas commented on KAFKA-8622:
-

This can be due to a permission issue with the snappy native library (the 
"failed to map segment from shared object" error typically means the temp 
directory does not allow the native library to be loaded from it).

Try creating a folder with write permissions and add the below flag in the 
kafka-run-class.sh script to point to the newly created directory:

-Dorg.xerial.snappy.tempdir=/home/cloud-user/GENERIC_FRAMEWORK/kaushik/confluent-5.1.2/tmp

 

It should then work fine. Let us know the test results.
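
For a client-side java application (the broker itself needs the JVM flag 
above), the same snappy-java property can, as a hedged alternative sketch, be 
set programmatically before the first snappy-compressed batch is produced; the 
directory, topic and bootstrap address below are placeholders.
{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SnappyTempDirSketch {
    public static void main(String[] args) {
        // Must be set before snappy-java extracts its native library,
        // i.e. before the first snappy-compressed batch is built.
        System.setProperty("org.xerial.snappy.tempdir", "/opt/kafka/snappy-tmp"); // placeholder path

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "hello")); // placeholder topic
            producer.flush();
        }
    }
}
{code}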

> Snappy Compression Not Working
> --
>
> Key: KAFKA-8622
> URL: https://issues.apache.org/jira/browse/KAFKA-8622
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Kunal Verma
>Assignee: kaushik srinivas
>Priority: Major
>
> I am trying to produce a message on the broker with compression enabled as 
> snappy.
> Environment :
> Brokers[Kafka-cluster] are hosted on Centos 7
> I have downloaded the latest version (2.3.0 & 2.2.1) tar, extracted it and 
> moved it to /opt/kafka-
> I have executed the broker with the standard configuration.
> In my producer service (written in java), I have enabled snappy compression.
> props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>  
> So while sending records to the broker, I am getting the following errors:
> org.apache.kafka.common.errors.UnknownServerException: The server experienced 
> an unexpected error when processing the request
>  
> While investigating further at the broker end, I got the following error in 
> the log
>  
> logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> failed to map segment from shared object: Operation not permitted
> --
>  
> [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition test-bulk-1 (kafka.server.ReplicaManager)
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435)
> at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168)
> at 
> org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327)
> at 
> scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73)
> at kafka.log.Log.liftedTree1$1(Log.scala:881)
> at kafka.log.Log.$anonfun$append$2(Log.scala:868)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2065)
> at kafka.log.Log.append(Log.scala:850)
> at kafka.log.Log.appendAsLeader(Log.scala:819)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
> at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> at 

[jira] [Assigned] (KAFKA-8622) Snappy Compression Not Working

2019-07-02 Thread kaushik srinivas (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas reassigned KAFKA-8622:
---

Assignee: kaushik srinivas

> Snappy Compression Not Working
> --
>
> Key: KAFKA-8622
> URL: https://issues.apache.org/jira/browse/KAFKA-8622
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Kunal Verma
>Assignee: kaushik srinivas
>Priority: Major
>
> I am trying to produce a message on the broker with compression enabled as 
> snappy.
> Environment :
> Brokers[Kafka-cluster] are hosted on Centos 7
> I have downloaded the latest version (2.3.0 & 2.2.1) tar, extracted it and 
> moved it to /opt/kafka-
> I have executed the broker with the standard configuration.
> In my producer service (written in java), I have enabled snappy compression.
> props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>  
> So while sending records to the broker, I am getting the following errors:
> org.apache.kafka.common.errors.UnknownServerException: The server experienced 
> an unexpected error when processing the request
>  
> While investigating further at the broker end, I got the following error in 
> the log
>  
> logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> failed to map segment from shared object: Operation not permitted
> --
>  
> [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition test-bulk-1 (kafka.server.ReplicaManager)
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435)
> at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168)
> at 
> org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327)
> at 
> scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73)
> at kafka.log.Log.liftedTree1$1(Log.scala:881)
> at kafka.log.Log.$anonfun$append$2(Log.scala:868)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2065)
> at kafka.log.Log.append(Log.scala:850)
> at kafka.log.Log.appendAsLeader(Log.scala:819)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
> at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> at scala.collection.TraversableLike.map(TraversableLike.scala:237)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
> at 

[jira] [Comment Edited] (KAFKA-8485) Kafka connect worker does not respond/function when kafka broker goes down.

2019-06-07 Thread kaushik srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858366#comment-16858366
 ] 

kaushik srinivas edited comment on KAFKA-8485 at 6/7/19 7:47 AM:
-

Hi [~kkonstantine]

I see weird behavior across different runs.

One of the suggestions from the kafka dev group pointed to this bug,

[https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941]

So we have patched our connect with the pull request opened for this issue. But 
this patch did not help; we still see stability issues with connect when the 
kafka broker goes down.

 

Now we have seen the below stack trace a couple of times: after a restart of 
the kafka brokers, GET requests work fine, but when we try to add or delete a 
connector, we sometimes see this trace
{code:java}
org.apache.kafka.connect.errors.ConnectException: Error writing connector 
configuration to Kafka
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future
at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331)
... 10 more
{code}
By the time POST /connectors or DELETE /connectors are run, the kafka brokers 
are completely up and running again. The leader assignment of the kafka connect 
internal topics was also verified.

Not much activity is happening in the connect logs, even in debug mode, apart 
from the above trace.

The issue is very consistent; even without data activity, once the kafka 
brokers are restarted (2 out of 3 brokers) we see this behavior.


was (Author: kaushik srinivas):
Hi [~kkonstantine]

I see weird behavior across different runs.

One of the suggestions from the kafka dev group pointed to this bug,

[https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941]

So we have patched our connect with the pull request opened for this issue. But 
this patch did not help; we still see stability issues with connect when the 
kafka broker goes down.

 

Now we have seen the below stack trace a couple of times: after a restart of 
the kafka brokers, GET requests work fine, but when we try to add or delete a 
connector, we sometimes see this trace
{code:java}
org.apache.kafka.connect.errors.ConnectException: Error writing connector 
configuration to Kafka
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future
at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331)
... 10 more
{code}
 

Not much activity is happening in the connect logs, even in debug mode, apart 
from the above trace.

Issue is very consistent, even without data activity once kafka brokers are 

[jira] [Updated] (KAFKA-8485) Kafka connect worker does not respond/function when kafka broker goes down.

2019-06-07 Thread kaushik srinivas (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-8485:

Summary: Kafka connect worker does not respond/function when kafka broker 
goes down.  (was: Kafka connect worker does not respond when kafka broker goes 
down with data streaming in progress)

> Kafka connect worker does not respond/function when kafka broker goes down.
> ---
>
> Key: KAFKA-8485
> URL: https://issues.apache.org/jira/browse/KAFKA-8485
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.1
>Reporter: kaushik srinivas
>Priority: Blocker
>  Labels: performance
>
> Below is the scenario:
> 3 kafka brokers are up and running.
> A Kafka connect worker is installed and an hdfs sink connector is added.
> Data streaming has started, with data being flushed out of kafka into hdfs.
> The topic is created with 3 partitions, with one leader on each of the three 
> brokers.
> Now, 2 kafka brokers are restarted and a partition rebalance happens.
> We then observe that kafka connect does not respond. The REST API keeps 
> timing out, and nothing useful is logged in the connect logs either.
> The only way out of this situation currently is to restart the kafka connect 
> worker, after which things get back to normal.
>  
> The same scenario, when tried without data streaming in progress, works fine; 
> the REST API does not get into the timing-out state. 
> Making this issue a blocker because of the impact of a kafka broker restart.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress

2019-06-07 Thread kaushik srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858367#comment-16858367
 ] 

kaushik srinivas commented on KAFKA-8485:
-

I have a query in this scenario: KafkaBasedLog.java uses a producer to update 
the connect config topic within kafka, and this producer object is created at 
runtime bootup.

If the leader broker of the config topic goes down after everything has been 
working fine, would it mean that the metadata held by this producer is no 
longer valid and we hit issues with this producer?

> Kafka connect worker does not respond when kafka broker goes down with data 
> streaming in progress
> -
>
> Key: KAFKA-8485
> URL: https://issues.apache.org/jira/browse/KAFKA-8485
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.1
>Reporter: kaushik srinivas
>Priority: Blocker
>  Labels: performance
>
> Below is the scenario:
> 3 kafka brokers are up and running.
> A Kafka connect worker is installed and an hdfs sink connector is added.
> Data streaming has started, with data being flushed out of kafka into hdfs.
> The topic is created with 3 partitions, with one leader on each of the three 
> brokers.
> Now, 2 kafka brokers are restarted and a partition rebalance happens.
> We then observe that kafka connect does not respond. The REST API keeps 
> timing out, and nothing useful is logged in the connect logs either.
> The only way out of this situation currently is to restart the kafka connect 
> worker, after which things get back to normal.
>  
> The same scenario, when tried without data streaming in progress, works fine; 
> the REST API does not get into the timing-out state. 
> Making this issue a blocker because of the impact of a kafka broker restart.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress

2019-06-07 Thread kaushik srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858366#comment-16858366
 ] 

kaushik srinivas commented on KAFKA-8485:
-

Hi [~kkonstantine]

I see weird behavior across different runs.

One of the suggestions from the kafka dev group pointed to this bug,

[https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941]

So we have patched our connect with the pull request opened for this issue. But 
this patch did not help; we still see stability issues with connect when the 
kafka broker goes down.

 

Now we have seen the below stack trace a couple of times: after a restart of 
the kafka brokers, GET requests work fine, but when we try to add or delete a 
connector, we sometimes see this trace
{code:java}
org.apache.kafka.connect.errors.ConnectException: Error writing connector 
configuration to Kafka
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future
at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73)
at 
org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331)
... 10 more
{code}
 

Not much activity is happening in the connect logs, even in debug mode, apart 
from the above trace.

The issue is very consistent; even without data activity, once the kafka 
brokers are restarted (2 out of 3 brokers) we see this behavior.

> Kafka connect worker does not respond when kafka broker goes down with data 
> streaming in progress
> -
>
> Key: KAFKA-8485
> URL: https://issues.apache.org/jira/browse/KAFKA-8485
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.1
>Reporter: kaushik srinivas
>Priority: Blocker
>  Labels: performance
>
> Below is the scenario:
> 3 kafka brokers are up and running.
> A Kafka connect worker is installed and an hdfs sink connector is added.
> Data streaming has started, with data being flushed out of kafka into hdfs.
> The topic is created with 3 partitions, with one leader on each of the three 
> brokers.
> Now, 2 kafka brokers are restarted and a partition rebalance happens.
> We then observe that kafka connect does not respond. The REST API keeps 
> timing out, and nothing useful is logged in the connect logs either.
> The only way out of this situation currently is to restart the kafka connect 
> worker, after which things get back to normal.
>  
> The same scenario, when tried without data streaming in progress, works fine; 
> the REST API does not get into the timing-out state. 
> Making this issue a blocker because of the impact of a kafka broker restart.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress

2019-06-05 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-8485:
---

 Summary: Kafka connect worker does not respond when kafka broker 
goes down with data streaming in progress
 Key: KAFKA-8485
 URL: https://issues.apache.org/jira/browse/KAFKA-8485
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.2.1
Reporter: kaushik srinivas


Below is the scenario:

3 kafka brokers are up and running.

A Kafka connect worker is installed and an hdfs sink connector is added.

Data streaming has started, with data being flushed out of kafka into hdfs.

The topic is created with 3 partitions, with one leader on each of the three brokers.

Now, 2 kafka brokers are restarted and a partition rebalance happens.

We then observe that kafka connect does not respond. The REST API keeps timing out. 

Nothing useful is logged in the connect logs either.

The only way out of this situation currently is to restart the kafka connect 
worker, after which things get back to normal.

 

The same scenario, when tried without data streaming in progress, works fine; 
the REST API does not get into the timing-out state. 

Making this issue a blocker because of the impact of a kafka broker restart.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect

2019-05-02 Thread kaushik srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831549#comment-16831549
 ] 

kaushik srinivas commented on KAFKA-8314:
-

We have added a check in 
connect/api/src/main/java/org/apache/kafka/connect/data/SchemaProjector.java, 
as below, to exclude non-mandatory schema attributes in our local build of 
kafka.
{code:java}
private static void checkMaybeCompatible(Schema source, Schema target) {
    if (source.type() != target.type() && !isPromotable(source.type(), target.type())) {
        throw new SchemaProjectorException("Schema type mismatch. source type: "
                + source.type() + " and target type: " + target.type());
    } else if (!Objects.equals(source.name(), target.name())) {
        throw new SchemaProjectorException("Schema name mismatch. source name: "
                + source.name() + " and target name: " + target.name());
    } else if (!Objects.equals(source.parameters(), target.parameters())) {

        // Create a copy of the original source and target schema parameter maps.
        Map<String, String> sourceParameters = new HashMap<>(source.parameters());
        Map<String, String> targetParameters = new HashMap<>(target.parameters());

        // Exclude doc, aliases and namespace fields from the schema compatibility check.
        sourceParameters.remove("connect.record.doc");
        sourceParameters.remove("connect.record.aliases");
        sourceParameters.remove("connect.record.namespace");
        targetParameters.remove("connect.record.doc");
        targetParameters.remove("connect.record.aliases");
        targetParameters.remove("connect.record.namespace");

        // Throw SchemaProjectorException if the comparison fails for the remaining attributes.
        if (!Objects.equals(sourceParameters, targetParameters)) {
            throw new SchemaProjectorException("Schema parameters not equal. source parameters: "
                    + source.parameters() + " and target parameters: " + target.parameters());
        }
    }
}
{code}
 

> Managing the doc field in case of schema projection - kafka connect
> ---
>
> Key: KAFKA-8314
> URL: https://issues.apache.org/jira/browse/KAFKA-8314
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Priority: Major
>
> A doc field change in the schema, while writing to hdfs using the hdfs sink 
> connector via the connect framework, causes failures in schema projection.
>  
> java.lang.RuntimeException: 
> org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters 
> not equal. source parameters: \{connect.record.doc=xxx} and target 
> parameters: \{connect.record.doc=yyy} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect

2019-05-02 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-8314:
---

 Summary: Managing the doc field in case of schema projection - 
kafka connect
 Key: KAFKA-8314
 URL: https://issues.apache.org/jira/browse/KAFKA-8314
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


A doc field change in the schema, while writing to hdfs using the hdfs sink 
connector via the connect framework, causes failures in schema projection.

 

java.lang.RuntimeException: 
org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters not 
equal. source parameters: \{connect.record.doc=xxx} and target parameters: 
\{connect.record.doc=yyy} 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7667) Need synchronous records send support for kafka performance producer java application.

2018-11-22 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7667:
---

 Summary: Need synchronous records send support for kafka 
performance producer java application.
 Key: KAFKA-7667
 URL: https://issues.apache.org/jira/browse/KAFKA-7667
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.0.0
Reporter: kaushik srinivas
Assignee: kaushik srinivas


Why synchronous send support for the performance producer?

The ProducerPerformance java application is used for load testing kafka brokers.
Load testing involves replicating very high throughput record streams flowing 
into the kafka brokers, and many producers in the field use a synchronous way 
of sending data, i.e. blocking until the message has been written completely on 
all of the min.insync.replicas brokers.

Asynchronous sends only satisfy the first requirement of loading the kafka 
brokers.

This feature would help in performance tuning the kafka brokers when producers 
are deployed with "acks": all, "min.insync.replicas" equal to the replication 
factor, and a synchronous way of sending. 
Throughput degradation happens with synchronous producers, and this would help 
in tuning resources for replication in the kafka brokers. 
Benchmarks could also be made from the kafka producer's perspective with a 
synchronous way of sending records, to tune the kafka producer's resources 
appropriately.
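
For reference, a minimal hedged sketch of what a synchronous send looks like 
with the plain producer API: blocking on the Future returned by send() so each 
record is fully acknowledged (acks=all) before the next one is sent. The topic 
name and bootstrap address are placeholders.
{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SyncSendSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                ProducerRecord<String, String> record = new ProducerRecord<>("perf-test", "msg-" + i);
                // Blocking on get() makes the send synchronous: the call returns only
                // after the record is acknowledged per the acks setting.
                RecordMetadata meta = producer.send(record).get();
                System.out.println("offset=" + meta.offset() + " partition=" + meta.partition());
            }
        }
    }
}
{code}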



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7659) dummy test

2018-11-20 Thread kaushik srinivas (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas reassigned KAFKA-7659:
---

Assignee: kaushik srinivas

> dummy test
> --
>
> Key: KAFKA-7659
> URL: https://issues.apache.org/jira/browse/KAFKA-7659
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: kaushik srinivas
>Assignee: kaushik srinivas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7659) dummy test

2018-11-20 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7659:
---

 Summary: dummy test
 Key: KAFKA-7659
 URL: https://issues.apache.org/jira/browse/KAFKA-7659
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: kaushik srinivas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7171) KafkaPerformanceProducer crashes with same transaction id.

2018-07-17 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7171:
---

 Summary: KafkaPerformanceProducer crashes with same transaction id.
 Key: KAFKA-7171
 URL: https://issues.apache.org/jira/browse/KAFKA-7171
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.1
Reporter: kaushik srinivas


Running the org.apache.kafka.tools.ProducerPerformance code to performance test 
the kafka cluster. As a trial, the cluster has only one broker and zookeeper, 
with 12GB of heap space.

Running 6 producers on 3 machines with the same transaction id (2 producers on 
each node).

Below are the settings of each producer,

kafka-run-class org.apache.kafka.tools.ProducerPerformance --print-metrics 
--topic perf1 --num-records 9223372036854 --throughput 25  --record-size 
200 --producer-props bootstrap.servers=localhost:9092 buffer.memory=524288000 
batch.size=524288

 

For 2 hours all producers run fine; then suddenly the throughput of all 
producers increases 3 times and 4 producers on 2 nodes crash with the below 
exceptions,

[2018-07-16 14:00:18,744] ERROR Error executing user-provided callback on 
message for topic-partition perf1-6: 
(org.apache.kafka.clients.producer.internals.RecordBatch)
java.lang.ClassCastException: 
org.apache.kafka.clients.producer.internals.RecordAccumulator$RecordAppendResult
 cannot be cast to org.apache.kafka.clients.producer.internals.RecordBatch$Thunk
 at 
org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:99)
 at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:312)
 at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:272)
 at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57)
 at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:358)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
 at java.lang.Thread.run(Thread.java:748)

 

The first machine (2 producers) runs fine.

Need some pointers on this issue. 

Queries:

Why does the throughput increase 3 times after 2 hours?

Why are the other producers crashing?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6961) UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics.

2018-05-28 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6961:
---

 Summary: UnknownTopicOrPartitionException & 
NotLeaderForPartitionException upon replication of topics.
 Key: KAFKA-6961
 URL: https://issues.apache.org/jira/browse/KAFKA-6961
 Project: Kafka
  Issue Type: Bug
 Environment: kubernetes cluster kafka.
Reporter: kaushik srinivas
 Attachments: k8s_NotLeaderForPartition.txt, k8s_replication_errors.txt

Running kafka & zookeeper in a kubernetes cluster.

No of brokers : 3

No of partitions per topic : 3

Creating a topic with 3 partitions, and it looks like all the partitions are up.

Below is the snapshot to confirm the same,

Topic:applestore  PartitionCount:3  ReplicationFactor:3  Configs:
 Topic: applestore  Partition: 0  Leader: 1001  Replicas: 1001,1003,1002  Isr: 1001,1003,1002
 Topic: applestore  Partition: 1  Leader: 1002  Replicas: 1002,1001,1003  Isr: 1002,1001,1003
 Topic: applestore  Partition: 2  Leader: 1003  Replicas: 1003,1002,1001  Isr: 1003,1002,1001
 
But as soon as the topics are created, the below stack traces appear in the 
brokers:
 
error 1: 
[2018-05-28 08:00:31,875] ERROR [ReplicaFetcher replicaId=1001, leaderId=1003, 
fetcherId=7] Error for partition applestore-2 to broker 
1003:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
 
error 2 :
[2018-05-28 00:43:20,993] ERROR [ReplicaFetcher replicaId=1003, leaderId=1001, 
fetcherId=0] Error for partition apple-6 to broker 
1001:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
 
When we try producing records to each specific partition, it works fine, and the 
log sizes across the replicated brokers appear to be equal, which suggests 
replication is happening fine.
Attaching the two stack trace files.
 
Why are these stack traces appearing? Can we ignore them if they are just noise?
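
To rule out stale metadata, a small sketch using the AdminClient API (an assumption on our side; it needs a newer kafka-clients jar than the brokers may be running) that re-checks leaders and ISR for the topic after creation:

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the topic and print leader/ISR per partition.
            TopicDescription desc = admin.describeTopics(Collections.singleton("applestore"))
                    .all().get().get("applestore");
            desc.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=%s isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}
{code}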
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6600) Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer.

2018-02-27 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6600:
---

 Summary: Kafka Bytes Out lags behind Kafka Bytes In on all brokers 
when topics replicated with 3 and flume kafka consumer.
 Key: KAFKA-6600
 URL: https://issues.apache.org/jira/browse/KAFKA-6600
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.1
Reporter: kaushik srinivas


Below is the setup detail,

Kafka with 3 brokers (each broker with 10 cores and 32 GB memory (12 GB heap)).

Created topic with 120 partitions and replication factor 3.

Throughput per broker is ~40k msgs/sec and bytes in is ~8 MB/sec.

Flume kafka source is used as the consumer.

Observations:

When the replication factor is 1, bytes out and bytes in stop at exactly the same 
timestamp (i.e. when the producer to Kafka is stopped).

But when the replication factor is increased to 3, a time lag is observed in bytes 
out compared to bytes in; the Flume Kafka source is pulling data slowly, even 
though Flume is configured with very high memory and CPU settings.

 

Tried increasing num.replica.fetchers from the default value of 1 to 10, 20, 50, etc. 
and replica.fetch.max.bytes from the default 1 MB to 10 MB and 20 MB, but no 
improvement in the lag was observed.

Under-replicated partitions are observed to be zero using the replica manager 
metrics in JMX.

Kafka brokers were monitored for CPU and memory; CPU usage peaks at about 3% of 
total cores and memory used is around 4 GB (32 GB configured).

The Flume Kafka source has overridden Kafka consumer properties: 
max.partition.fetch.bytes (default 1 MB) and fetch.max.bytes (default 52 MB) are 
both raised to 10 MB, and the Flume Kafka source batch size is kept at the default value of 1000.
 agent.sources..kafka.consumer.fetch.max.bytes = 10485760
 agent.sources..kafka.consumer.max.partition.fetch.bytes = 10485760
 agent.sources..batchSize = 1000
 

What more tuning is needed to reduce the lag between bytes in and bytes out at the 
Kafka brokers with replication factor 3, or is there some configuration we have 
missed?
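
To isolate whether the lag comes from the Flume source or the brokers, a standalone consumer sketch with the same fetch-related overrides could be run against the topic (a rough sketch; the group id and topic name are placeholders, and it assumes a recent kafka-clients jar):

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchTuningConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "lag-check");                 // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Same overrides applied to the Flume Kafka source above (10 MB each).
        props.put("fetch.max.bytes", "10485760");
        props.put("max.partition.fetch.bytes", "10485760");
        props.put("max.poll.records", "1000");              // mirrors the Flume batch size

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("replicated-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                System.out.println("fetched " + records.count() + " records");
            }
        }
    }
}
{code}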

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6356) UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1.

2017-12-13 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6356:
---

 Summary: UnknownTopicOrPartitionException & 
NotLeaderForPartitionException and log deletion happening with retention bytes 
kept at -1.
 Key: KAFKA-6356
 URL: https://issues.apache.org/jira/browse/KAFKA-6356
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.1.0
 Environment: Cent OS 7.2,
HDD : 2Tb,
CPUs: 56 cores,
RAM : 256GB
Reporter: kaushik srinivas
 Attachments: configs.txt, stderr_b0, stderr_b1, stderr_b2, stdout_b0, 
stdout_b1, stdout_b2, topic_description, topic_offsets

Facing issues with a Kafka topic having multiple partitions and a replication factor of 3.

Config used :
No of partitions : 20
replication factor : 3
No of brokers : 3
Memory for broker : 32GB
Heap for broker : 12GB

A producer is run to produce data to the 20 partitions of a single topic.
However, we observed that for partitions whose leader is one particular 
broker (broker-1), the offsets are never incremented, and we also see 0 MB log 
files on that broker's disk.

Seeing the errors below in the brokers:

error 1:
[2017-12-13 07:11:11,191] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[test2,5] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

error 2:
[2017-12-11 12:19:41,599] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[test1,13] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)

Attaching,
1. error and std out files of all the brokers.
2. kafka config used.
3. offsets and topic description.

Retention bytes was set to -1 and the retention period to 96 hours, but we are 
still observing some of the log files being deleted at the broker,

from logs :
[2017-12-11 12:20:20,586] INFO Deleting index 
/var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12/.timeindex
 (kafka.log.TimeIndex)
[2017-12-11 12:20:20,587] INFO Deleted log for partition [test1,12] in 
/var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12.
 (kafka.log.LogManager)

We expect the logs to never be deleted when retention bytes is set to -1.
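
As far as we understand, retention.bytes = -1 only disables size-based retention; time-based retention (retention.ms / log.retention.hours) still applies. A rough sketch for double-checking the effective per-topic retention settings, assuming a kafka-clients version new enough to provide the AdminClient describeConfigs API (which the 0.10.1.0 deployment here may not support broker-side):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionCheckSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test1"); // topic from the report
        try (AdminClient admin = AdminClient.create(props)) {
            Config config = admin.describeConfigs(Collections.singleton(topic))
                    .all().get().get(topic);
            // Size-based retention (-1 disables it) and time-based retention both matter.
            System.out.println("retention.bytes = " + config.get("retention.bytes").value());
            System.out.println("retention.ms    = " + config.get("retention.ms").value());
            System.out.println("cleanup.policy  = " + config.get("cleanup.policy").value());
        }
    }
}
{code}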






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-20 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260268#comment-16260268
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Yes.
Could the OutOfMemoryError stack trace that is thrown be made to state the root 
cause more clearly,
i.e. the map count being exceeded in this case?

-Kaushik

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: config.json, kafkaServer-gc-agent06.7z, 
> kafkaServer-gc.log, kafkaServer-gc_agent03.log, kafkaServer-gc_agent04.log, 
> kafka_config.txt, map_counts_agent06, stderr_broker1.txt, stderr_broker2.txt, 
> stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at 

[jira] [Updated] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-08 Thread kaushik srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-6165:

Attachment: config.json
kafkaServer-gc-agent06.7z
map_counts_agent06

Map count monitored on one broker for 10 hrs, the GC log of one broker, and the 
latest Kafka config used.

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: config.json, kafkaServer-gc-agent06.7z, 
> kafkaServer-gc.log, kafkaServer-gc_agent03.log, kafkaServer-gc_agent04.log, 
> kafka_config.txt, map_counts_agent06, stderr_broker1.txt, stderr_broker2.txt, 
> stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at 

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-08 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245321#comment-16245321
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Hi Huxihx,

Thanks for the feedback.

With
"log_segment_bytes": 3
"log_retention_check_interval_ms": 180
"log_retention_bytes": "75"
"log_retention_hours": 48
we have now observed for 20 hours and the brokers have not crashed so far.
The max map count doesn't seem to go beyond ~15000 now.
We will monitor for some more time and share if there are failures.

In general we have the questions and observations below, and need your 
recommendations for tuning our setups.

1. log.segment.bytes and log_retention_bytes apply across all topics and 
partitions.
In our case, a few topics have very high throughput and a few very low throughput.
What is the recommended way to set these two parameters given such a wide 
range of data rates across topics?
2. log_retention_check_interval_ms is now set to 30 minutes.
Would there be any extra memory overhead on the brokers if this value is very 
low (highly frequent)?
Since topics have different data ingestion rates, the log file generation rate is 
not the same across partitions.
What is the best approach to decide on this setting?
3. We observed the map counts of the Kafka process:
the max value on one of the brokers is ~15000 and it drops down to ~8000 over a 
span of 1.5 hours.
Is this OK from the GC point of view, or can any further optimisation be done here?
Attaching the GC log file for one of the brokers (see the sketch after question 5 for how we reason about the expected map count).
4. Since it appears to be the log.segment.bytes config change that helped, is it OK to reduce 
the Java heap space back to 8 GB (where we started from)?
What is the recommendation for the Java heap space config in cases like ours?
5. Also, from a user's point of view, is it possible to have clearer error 
stack traces that help identify an OutOfMemoryError caused by the map limit being 
exceeded?
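
For point 3, our rough mental model of how the segment size drives the map count (a back-of-the-envelope sketch with purely hypothetical numbers, since each live segment keeps its offset and time indexes memory-mapped):

{code:java}
public class MapCountEstimateSketch {
    public static void main(String[] args) {
        // All values below are hypothetical, for illustration only.
        long retentionBytesPerPartition = 5_000_000_000L; // 5 GB retained per partition
        long segmentBytes = 300_000_000L;                 // 300 MB segment size
        int partitionsPerBroker = 200;                    // partitions hosted by one broker
        int mapsPerSegment = 2;                           // offset index + time index are mmapped

        long segmentsPerPartition = Math.max(1, retentionBytesPerPartition / segmentBytes);
        long estimatedMaps = segmentsPerPartition * partitionsPerBroker * mapsPerSegment;
        System.out.println("estimated mmap count ~ " + estimatedMaps);
        // Smaller segments or more partitions push this estimate towards vm.max_map_count.
    }
}
{code}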

Attaching files,

Kafka latest config : [^config.json]
Map Counts monitored on one of the broker for 10 - 15 hrs: [^map_counts_agent06]
GC log file on one of the broker : [^kafkaServer-gc-agent06.7z]

Thanks in advance,
Kaushik



> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: config.json, kafkaServer-gc-agent06.7z, 
> kafkaServer-gc.log, kafkaServer-gc_agent03.log, kafkaServer-gc_agent04.log, 
> kafka_config.txt, map_counts_agent06, stderr_broker1.txt, stderr_broker2.txt, 
> stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-07 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241745#comment-16241745
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Hi huxihx,

Thanks for the feedback.

We also observed these issues initially when the heap size was 8 GB. Do you think 
8 GB is also a high value for our profiles?
Is there any recommendation on how much to increase "vm.max_map_count"?

We did not observe this issue when the throughput to Kafka was lower,
i.e. (messages/sec across all topics & partitions: 250k,
bytes in/sec across all topics & partitions: approx 50 MB/sec).

We started observing it with profiles like
(messages/sec across all topics & partitions: 600k,
bytes in/sec across all topics & partitions: approx 120 MB/sec).

Also find the GC logs of all three brokers attached:
broker 1: [^kafkaServer-gc.log]
broker 2: [^kafkaServer-gc_agent03.log]
broker 3: [^kafkaServer-gc_agent04.log]
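
For monitoring how close the broker gets to vm.max_map_count, roughly the following Linux-specific sketch could be used (a hypothetical helper, not part of Kafka; it needs permission to read the broker's /proc entries and takes the broker PID as an argument):

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class MapCountMonitorSketch {
    public static void main(String[] args) throws Exception {
        String pid = args[0];  // broker process id, e.g. found via jps
        long limit = Long.parseLong(
                Files.readAllLines(Paths.get("/proc/sys/vm/max_map_count")).get(0).trim());
        while (true) {
            long used;
            // Each line in /proc/<pid>/maps is one memory mapping of the process.
            try (Stream<String> lines = Files.lines(Paths.get("/proc/" + pid + "/maps"))) {
                used = lines.count();
            }
            System.out.printf("mappings used=%d limit=%d (%.1f%%)%n",
                    used, limit, 100.0 * used / limit);
            Thread.sleep(60_000);  // sample once per minute
        }
    }
}
{code}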

Thanks & Regards,
-kaushik



> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: kafkaServer-gc.log, kafkaServer-gc_agent03.log, 
> kafkaServer-gc_agent04.log, kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> 

[jira] [Updated] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-07 Thread kaushik srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-6165:

Attachment: kafkaServer-gc.log
kafkaServer-gc_agent03.log
kafkaServer-gc_agent04.log

GC files of the 3 Kafka brokers from when the OutOfMemoryError was occurring with 
high-throughput data coming into Kafka.

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: kafkaServer-gc.log, kafkaServer-gc_agent03.log, 
> kafkaServer-gc_agent04.log, kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at 

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-04 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239413#comment-16239413
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Observed the number of open file descriptors over a period of time on one of the 
brokers:
cat /proc/sys/fs/file-nr
47488   0   39340038

Did not observe it exceeding the limit.

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
> Attachments: kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
>   at kafka.log.Log.roll(Log.scala:771)
>   at kafka.log.Log.maybeRoll(Log.scala:742)
> 

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-03 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237555#comment-16237555
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Tried with 12 GB of heap space.
Observed the Kafka brokers crashing again with:

[2017-11-03 08:02:12,424] FATAL [ReplicaFetcherThread-0-0], Disk error while 
replicating data for store_sales-15 (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 
'store_sales-15'
at kafka.log.Log.append(Log.scala:349)
at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:130)
at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:159)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:141)
at scala.Option.foreach(Option.scala:257)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:141)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:138)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:138)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:136)
at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
at kafka.log.TimeIndex.<init>(TimeIndex.scala:55)
at kafka.log.LogSegment.<init>(LogSegment.scala:68)
at kafka.log.Log.roll(Log.scala:776)
at kafka.log.Log.maybeRoll(Log.scala:742)
at kafka.log.Log.append(Log.scala:405)
... 16 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)

Observing 300k messages/sec on each broker (3 brokers) at the time of broker 
crash.

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
>Priority: Major
> Attachments: kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at 

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-03 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237358#comment-16237358
 ] 

kaushik srinivas commented on KAFKA-6165:
-

Sure, we will try reducing the heap size to 12 GB.
Initially the config was an 8 GB heap, but we then observed OutOfMemory issues 
more frequently.
It was actually consuming around 10 GB of heap, which is why the heap was 
increased to 16 GB.

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
>Priority: Major
> Attachments: kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
>   at 

[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-03 Thread kaushik srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237189#comment-16237189
 ] 

kaushik srinivas commented on KAFKA-6165:
-

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

> Kafka Brokers goes down with outOfMemoryError.
> --
>
> Key: KAFKA-6165
> URL: https://issues.apache.org/jira/browse/KAFKA-6165
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
> Environment: DCOS cluster with 4 agent nodes and 3 masters.
> agent machine config :
> RAM : 384 GB
> DISK : 4TB
>Reporter: kaushik srinivas
>Priority: Major
> Attachments: kafka_config.txt, stderr_broker1.txt, 
> stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt
>
>
> Performance testing kafka with end to end pipe lines of,
> Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
> Kafka Data Producer -> kafka -> flume -> hdfs -- stream2
> stream1 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> stream2 kafka configs :
> No of topics : 10
> No of partitions : 20 for all the topics
> Some important Kafka Configuration :
> "BROKER_MEM": "32768"(32GB)
> "BROKER_JAVA_HEAP": "16384"(16GB)
> "BROKER_COUNT": "3"
> "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
> "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
> "KAFKA_NUM_PARTITIONS": "20"
> "BROKER_DISK_SIZE": "5000" (5GB)
> "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
> "KAFKA_LOG_RETENTION_BYTES": "50"(5GB)
> Data Producer to kafka Throughput:
> message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
> topics/partitions.
> message size : approx 300 to 400 bytes.
> Issues observed with this configs:
> Issue 1:
> stack trace:
> [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
> unrecoverable I/O error while handling produce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> 'store_sales-16'
>   at kafka.log.Log.append(Log.scala:349)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
>   at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
>   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
>   at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
>   at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
>   at 
> kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at 
> kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
>   at kafka.log.Log.roll(Log.scala:771)
>   at kafka.log.Log.maybeRoll(Log.scala:742)
>   

[jira] [Created] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-03 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6165:
---

 Summary: Kafka Brokers goes down with outOfMemoryError.
 Key: KAFKA-6165
 URL: https://issues.apache.org/jira/browse/KAFKA-6165
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
 Environment: DCOS cluster with 4 agent nodes and 3 masters.

agent machine config :
RAM : 384 GB
DISK : 4TB


Reporter: kaushik srinivas
Priority: Major
 Attachments: kafka_config.txt, stderr_broker1.txt, stderr_broker2.txt, 
stdout_broker1.txt, stdout_broker2.txt

Performance testing kafka with end to end pipe lines of,
Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
Kafka Data Producer -> kafka -> flume -> hdfs -- stream2

stream1 kafka configs :
No of topics : 10
No of partitions : 20 for all the topics

stream2 kafka configs :
No of topics : 10
No of partitions : 20 for all the topics

Some important Kafka Configuration :
"BROKER_MEM": "32768"(32GB)
"BROKER_JAVA_HEAP": "16384"(16GB)
"BROKER_COUNT": "3"
"KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
"KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
"KAFKA_NUM_PARTITIONS": "20"
"BROKER_DISK_SIZE": "5000" (5GB)
"KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
"KAFKA_LOG_RETENTION_BYTES": "50"(5GB)

Data Producer to kafka Throughput:

message rate : 5 lakhs messages/sec approx across all the 3 brokers and 
topics/partitions.
message size : approx 300 to 400 bytes.

Issues observed with this configs:

Issue 1:

stack trace:

[2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
unrecoverable I/O error while handling produce request:  
(kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log 
'store_sales-16'
at kafka.log.Log.append(Log.scala:349)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at 
kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
at 
kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
at kafka.log.Log.roll(Log.scala:771)
at kafka.log.Log.maybeRoll(Log.scala:742)
at kafka.log.Log.append(Log.scala:405)
... 22 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
... 34 more


Issue 2 :

stack trace :

[2017-11-02 23:55:49,602] FATAL [ReplicaFetcherThread-0-0], Disk error while 
replicating data for catalog_sales-3 
