[jira] [Commented] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
[ https://issues.apache.org/jira/browse/KAFKA-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866646#comment-17866646 ]

kaushik srinivas commented on KAFKA-17101:
------------------------------------------

It is MM2. We do not have any other services editing the configuration, and this is not easily reproducible either.

> Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-17101
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17101
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.4.1, 3.5.1, 3.6.1
>            Reporter: kaushik srinivas
>            Priority: Major
>
> Scenario/Setup details:
> Kafka cluster 1: 3 replicas
> Kafka cluster 2: 3 replicas
> MM1 moving data from cluster 1 to cluster 2
> MM2 moving data from cluster 2 to cluster 1
> Sometimes, after a reboot of kafka cluster 1 and the MM1 instance, we observe MM failing to come up with the below exception:
> {code:java}
> {"message":"DistributedHerder-connect-1-1 - org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work thread, exiting: "}
> org.apache.kafka.common.config.ConfigException: Topic 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
> {code}
> Once the topic is altered to 'cleanup.policy=compact', MM works just fine. This happens sporadically on our setups and across a variety of scenarios. We have not been successful in identifying exact reproduction steps so far.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
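The workaround mentioned in the report (altering the offsets topic back to a compacted cleanup policy) can be sketched with the stock kafka-configs.sh tool. The bootstrap address and topic name below are taken from this report, so adjust them for the cluster at hand; these commands need a running broker, so this is a sketch rather than a tested snippet:

```shell
# Inspect the current cleanup.policy on the MM2 offsets topic
# (bootstrap address taken from the reporter's config; adjust as needed).
bin/kafka-configs.sh --bootstrap-server product-kafka-headless:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal \
  --describe

# Restore the compacted policy so the Connect worker can start again.
bin/kafka-configs.sh --bootstrap-server product-kafka-headless:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal \
  --alter --add-config cleanup.policy=compact
```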
[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3
[ https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-17079:
-------------------------------------
    Priority: Major  (was: Blocker)

> scoverage plugin not found in maven repo, version 1.9.3
> -------------------------------------------------------
>
>          Key: KAFKA-17079
>          URL: https://issues.apache.org/jira/browse/KAFKA-17079
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: kaushik srinivas
>     Priority: Major
>
> Team,
> While creating coverage reports for the kafka core module, the below issue is seen. The branch used is 3.6.
> Exception:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':core:compileScoverageScala'.
>     at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>     at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>     at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>     at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException: Could not resolve all files for configuration ':core:scoverage'.
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>     at
[jira] [Commented] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
[ https://issues.apache.org/jira/browse/KAFKA-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864935#comment-17864935 ]

kaushik srinivas commented on KAFKA-17101:
------------------------------------------

[~gharris1727] Below is the configuration.

site1.ssl.keystore.filename=/etc/mirror-maker/secrets/ssl/site1/server.jks
site1.ssl.truststore.filename=/etc/mirror-maker/secrets/ssl/site1/trustchain.jks
site2.security.protocol=SSL
site1->site2.heartbeats.topic.replication.factor=3
site1.ssl.enabled.protocols=TLSv1.2,TLSv1.3
syslog=false
site2.ssl.keystore.filename=/etc/mirror-maker/secrets/ssl/site2/server.jks
site2.consumer.ssl.cipher.suites=
site1->site2.sync.topic.configs.interval.seconds=300
site1->site2.topics=.*_ALARM$,.*_INTERNAL_Intent_Changes$,product_INTERNAL_HAM_UPDATE$
site1->site2.replication.factor=3
site1->site2.emit.checkpoints.enabled=true
site2.ssl.key.password=productkeystore
tasks.max=1
site1.ssl.keystore.password=productkeystore
site1.ssl.truststore.location=/etc/kafka/shared/site1_truststore
site1.status.storage.replication.factor=3
site1.ssl.truststore.password=productkeystore
site1->site2.sync.topic.acls.enabled=false
site1->site2.refresh.topics.interval.seconds=300
site2.ssl.truststore.location=/etc/kafka/shared/site2_truststore
site2.ssl.truststore.filename=/etc/mirror-maker/secrets/ssl/site2/trustchain.jks
site1.ssl.protocol=TLSv1.2
site1->site2.replication.policy.class=RetainTopicNameReplicationPolicy
site1->site2.groups=.*
site1.security.protocol=SSL
site1->site2.groups.blacklist=console-consumer-.*, connect-.*, __.*
site1.config.storage.replication.factor=3
site1->site2.emit.hearbeats.enabled=true
site2.ssl.truststore.password=productkeystore
site1->site2.offset-syncs.topic.replication.factor=3
site1.ssl.keystore.location=/etc/kafka/shared/site1_keystore
site1.offset.storage.replication.factor=3
clusters=site1,site2
site1.bootstrap.servers=product-kafka-headless:9092
site2.ssl.protocol=TLSv1.2
site1->site2.refresh.groups.interval.seconds=300
site2.ssl.enabled=true
site2.ssl.enabled.protocols=TLSv1.2,TLSv1.3
site2.ssl.endpoint.identification.algorithm=
site1.ssl.cipher.suites=
site1->site2.checkpoints.topic.replication.factor=3
site1.producer.ssl.cipher.suites=
site2.ssl.keystore.location=/etc/kafka/shared/site2_keystore
site2.config.storage.replication.factor=3
site2.status.storage.replication.factor=3
site1.ssl.enabled=true
site1->site2.enabled=true
site1.ssl.endpoint.identification.algorithm=
site1.admin.ssl.cipher.suites=
site2.producer.ssl.cipher.suites=
site2.ssl.keystore.password=productkeystore
site2.ssl.cipher.suites=
site2.offset.storage.replication.factor=3
site1.ssl.key.password=productkeystore
site1.consumer.ssl.cipher.suites=
site2.bootstrap.servers=product-kafka-headless:9097
site1->site2.sync.topic.configs.enabled=true
site2.admin.ssl.cipher.suites=

> Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-17101
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17101
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.4.1, 3.5.1, 3.6.1
>            Reporter: kaushik srinivas
>            Priority: Major
>
> Scenario/Setup details:
> Kafka cluster 1: 3 replicas
> Kafka cluster 2: 3 replicas
> MM1 moving data from cluster 1 to cluster 2
> MM2 moving data from cluster 2 to cluster 1
> Sometimes, after a reboot of kafka cluster 1 and the MM1 instance, we observe MM failing to come up with the below exception:
> {code:java}
> {"message":"DistributedHerder-connect-1-1 - org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work thread, exiting: "}
> org.apache.kafka.common.config.ConfigException: Topic 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
> {code}
> Once the topic is altered to 'cleanup.policy=compact', MM works just fine. This happens sporadically on our setups and across a variety of scenarios. We have not been successful in identifying exact reproduction steps so far.
[jira] [Created] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
kaushik srinivas created KAFKA-17101:
-------------------------------------

             Summary: Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
                 Key: KAFKA-17101
                 URL: https://issues.apache.org/jira/browse/KAFKA-17101
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.6.1, 3.5.1, 3.4.1
            Reporter: kaushik srinivas

Scenario/Setup details:
Kafka cluster 1: 3 replicas
Kafka cluster 2: 3 replicas
MM1 moving data from cluster 1 to cluster 2
MM2 moving data from cluster 2 to cluster 1

Sometimes, after a reboot of kafka cluster 1 and the MM1 instance, we observe MM failing to come up with the below exception:
{code:java}
{"message":"DistributedHerder-connect-1-1 - org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work thread, exiting: "}
org.apache.kafka.common.config.ConfigException: Topic 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
{code}
Once the topic is altered to 'cleanup.policy=compact', MM works just fine. This happens sporadically on our setups and across a variety of scenarios. We have not been successful in identifying exact reproduction steps so far.
[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3
[ https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-17079:
-------------------------------------
    Issue Type: Bug  (was: Improvement)

> scoverage plugin not found in maven repo, version 1.9.3
> -------------------------------------------------------
>
>          Key: KAFKA-17079
>          URL: https://issues.apache.org/jira/browse/KAFKA-17079
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: kaushik srinivas
>     Priority: Major
>
> Team,
> While creating coverage reports for the kafka core module, the below issue is seen. The branch used is 3.6.
> Exception:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':core:compileScoverageScala'.
>     at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>     at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>     at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>     at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException: Could not resolve all files for configuration ':core:scoverage'.
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>     at
[jira] [Updated] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3
[ https://issues.apache.org/jira/browse/KAFKA-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-17079:
-------------------------------------
    Priority: Blocker  (was: Major)

> scoverage plugin not found in maven repo, version 1.9.3
> -------------------------------------------------------
>
>          Key: KAFKA-17079
>          URL: https://issues.apache.org/jira/browse/KAFKA-17079
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: kaushik srinivas
>     Priority: Blocker
>
> Team,
> While creating coverage reports for the kafka core module, the below issue is seen. The branch used is 3.6.
> Exception:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':core:compileScoverageScala'.
>     at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
>     at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
>     at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
>     at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
>     at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
>     at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
>     at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
>     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>     at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
> Caused by: org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException: Could not resolve all files for configuration ':core:scoverage'.
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
>     at org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
>     at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
>     at org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
>     at
[jira] [Created] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3
kaushik srinivas created KAFKA-17079:
-------------------------------------

             Summary: scoverage plugin not found in maven repo, version 1.9.3
                 Key: KAFKA-17079
                 URL: https://issues.apache.org/jira/browse/KAFKA-17079
             Project: Kafka
          Issue Type: Improvement
            Reporter: kaushik srinivas

Team,
While creating coverage reports for the kafka core module, the below issue is seen. The branch used is 3.6.
Exception:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':core:compileScoverageScala'.
    at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38)
    at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
    at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
    at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
    at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
    at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
    at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53)
    at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73)
    at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
    at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42)
    at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337)
    at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324)
    at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317)
    at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303)
    at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463)
    at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380)
    at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
    at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException: Could not resolve all files for configuration ':core:scoverage'.
    at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769)
    at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176)
    at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496)
    at org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30)
    at org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74)
    at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
    at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574)
    at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
    at org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133)
    at org.gradle.api.internal.file.UnionFileCollection.visitChildren(UnionFileCollection.java:81)
    at org.gradle.api.internal.file.CompositeFileCollection.visitContents(CompositeFileCollection.java:133)
    at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366)
    at
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-16370:
-------------------------------------
    Issue Type: Bug  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -------------------------------------------------------------
>
>          Key: KAFKA-16370
>          URL: https://issues.apache.org/jira/browse/KAFKA-16370
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: kaushik srinivas
>     Priority: Major
>
> From the KIP,
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]
>
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationStateRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).*
>
> We tried the same in a kafka cluster that was migrated from zookeeper into kraft mode. We observe that the rollback is possible by deleting the "/controller" node in zookeeper before the rollback from kraft mode to zookeeper is done.
> The snippet above indicates that a rollback from kraft to zk after the migration is finalized is still possible via an offline method. Are there any already-known steps for this offline rollback method? From our experience, we currently know of one step: deleting the /controller node in zookeeper to force the zookeeper-based brokers to elect a new controller after the rollback is done. Are there any additional steps/actions apart from this?
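The single known rollback step described above (deleting the /controller znode so the ZooKeeper-based brokers elect a new controller) can be sketched with Kafka's bundled ZooKeeper client. The ensemble address is a placeholder, and this reflects the reporter's experiment rather than a documented rollback procedure:

```shell
# Force a new ZooKeeper-based controller election after rolling back:
# delete the /controller znode so a ZK-mode broker can claim leadership.
# (The ZooKeeper address is a placeholder; use your ensemble's.)
bin/zookeeper-shell.sh zk-host:2181 delete /controller

# Optionally confirm the znode is gone before restarting the brokers
# in ZooKeeper mode ("stat" reports "Node does not exist" once deleted).
bin/zookeeper-shell.sh zk-host:2181 stat /controller
```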
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-16370:
-------------------------------------
    Issue Type: Wish  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -------------------------------------------------------------
>
>          Key: KAFKA-16370
>          URL: https://issues.apache.org/jira/browse/KAFKA-16370
>      Project: Kafka
>   Issue Type: Wish
>     Reporter: kaushik srinivas
>     Priority: Major
>
> From the KIP,
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]
>
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationStateRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).*
>
> We tried the same in a kafka cluster that was migrated from zookeeper into kraft mode. We observe that the rollback is possible by deleting the "/controller" node in zookeeper before the rollback from kraft mode to zookeeper is done.
> The snippet above indicates that a rollback from kraft to zk after the migration is finalized is still possible via an offline method. Are there any already-known steps for this offline rollback method? From our experience, we currently know of one step: deleting the /controller node in zookeeper to force the zookeeper-based brokers to elect a new controller after the rollback is done. Are there any additional steps/actions apart from this?
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-16370: Issue Type: Improvement (was: Wish)
[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
kaushik srinivas created KAFKA-16370: Summary: offline rollback procedure from kraft mode to zookeeper mode. Key: KAFKA-16370 URL: https://issues.apache.org/jira/browse/KAFKA-16370 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas
[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.
kaushik srinivas created KAFKA-16360: Summary: Release plan of 3.x kafka releases. Key: KAFKA-16360 URL: https://issues.apache.org/jira/browse/KAFKA-16360 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas

The KIP [https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline] mentions:
h2. Kafka 3.7
* January 2024
* Final release with ZK mode

But we see in Jira that some tickets are marked for the 3.8 release. Will Apache continue to make 3.x releases supporting both ZooKeeper and KRaft, independent of the pure-KRaft 4.x releases? If yes, how many more releases can be expected on the 3.x release line?
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Priority: Major (was: Critical)

> Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
>
> Key: KAFKA-15223
> URL: https://issues.apache.org/jira/browse/KAFKA-15223
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 3.4.0, 3.3.1, 3.3.2, 3.5.0, 3.4.1
> Reporter: kaushik srinivas
> Priority: Major
>
> Referring to the upgrade documentation for Apache Kafka:
> [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]
> There is confusion with respect to the statements below from the linked section of the Apache docs.
> "If you are upgrading from a version prior to 2.1.x, please see the note below about the change to the schema used to store consumer offsets. *Once you have changed the inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a version prior to 2.1."*
> The statement above says a downgrade is not possible to versions prior to "2.1" once inter.broker.protocol.version has been upgraded to the latest version.
> But there is another statement in *point 4* of the documentation:
> "Restart the brokers one by one for the new protocol version to take effect. {*}Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version.{*}"
> These two statements are repeated across many prior Kafka releases and are confusing.
> The questions:
> # Is a downgrade impossible to *any* older version of Kafka once inter.broker.protocol.version is updated to the latest version, *or* are downgrades impossible only to versions *<2.1*?
> # Suppose one mirrors the upgrade procedure for the downgrade path, i.e. first downgrade inter.broker.protocol.version to the previous version, then downgrade the Kafka software/code to the previous release revision. Does a downgrade work with this approach?
> Can these two questions be documented if the answers are already known? Perhaps a downgrade guide could be created, similar to the existing upgrade guide.
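The two-step downgrade path proposed in question 2 would amount to a server.properties change plus rolling restarts. The version pair below is purely illustrative, and whether Kafka actually supports this reversal is precisely what the ticket asks, so this is a sketch of the proposal rather than a documented procedure.

```properties
# HYPOTHETICAL rolling-downgrade sketch (the ticket asks whether this is supported).
# Step 1: on each broker, pin the protocol back to the prior release's version
# (3.4 -> 3.3 is an illustrative assumption), then rolling-restart the cluster.
inter.broker.protocol.version=3.3
log.message.format.version=3.3

# Step 2: only after ALL brokers are running the older protocol version,
# replace the Kafka binaries with the previous release and rolling-restart again.
```

This mirrors the documented upgrade procedure in reverse: the protocol version moves first, the software second, so at no point do brokers speak a protocol newer than the software they are about to run.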
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Issue Type: Improvement (was: Bug)
[jira] [Commented] (KAFKA-5892) Connector property override does not work unless setting ALL converter properties
[ https://issues.apache.org/jira/browse/KAFKA-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765979#comment-17765979 ] kaushik srinivas commented on KAFKA-5892: [~rhauch] [~jsahu], is this still open to be worked on?

> Connector property override does not work unless setting ALL converter properties
>
> Key: KAFKA-5892
> URL: https://issues.apache.org/jira/browse/KAFKA-5892
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Reporter: Yeva Byzek
> Assignee: Jitendra Sahu
> Priority: Minor
> Labels: newbie
>
> A single connector setting override {{value.converter.schemas.enable=false}} only takes effect if ALL of the converter properties are overridden in the connector.
> At minimum, we should give the user a warning or error that this will be ignored.
> We should also consider changing the behavior to allow a single property override even if all the converter properties are not specified, but this requires discussion to evaluate the impact of this change.
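Concretely, the behavior described in the issue means a connector that wants {{value.converter.schemas.enable=false}} must also redeclare {{value.converter}} itself, since the nested property is only picked up alongside a connector-level converter override. A minimal sketch of such a connector config, where the connector name, class, topic, and file path are placeholders:

```json
{
  "name": "example-sink",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "topics": "example-topic",
    "file": "/tmp/sink.txt",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```

Without the {{value.converter}} line, the {{schemas.enable}} override is silently ignored and the worker's default converter settings apply, which is the surprise this ticket asks to warn about.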
[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755207#comment-17755207 ] kaushik srinivas commented on KAFKA-15223: Any updates on this?
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Issue Type: Bug (was: Improvement)
[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750590#comment-17750590 ] kaushik srinivas commented on KAFKA-15223: [~ijuma] Can you please support us in answering this query?
[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747266#comment-17747266 ] kaushik srinivas commented on KAFKA-15223: Any updates on this?
[jira] [Commented] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745419#comment-17745419 ] kaushik srinivas commented on KAFKA-15223: [~ijuma], can you please help us with this?
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Affects Version/s: 3.4.1, 3.5.0, 3.3.2, 3.3.1, 3.4.0
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Summary: Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases. (was: Need clarity in documentation for upgrade/downgrade across releases.)
[jira] [Updated] (KAFKA-15223) Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
[ https://issues.apache.org/jira/browse/KAFKA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-15223: Description edited.
[jira] [Created] (KAFKA-15223) Need clarity in documentation for upgrade/downgrade across releases.
kaushik srinivas created KAFKA-15223: Summary: Need clarity in documentation for upgrade/downgrade across releases. Key: KAFKA-15223 URL: https://issues.apache.org/jira/browse/KAFKA-15223 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas Referring to the upgrade documentation for apache kafka. [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0] There is confusion with respect to the below statements from the linked section of the apache docs. "If you are upgrading from a version prior to 2.1.x, please see the note below about the change to the schema used to store consumer offsets. *Once you have changed the inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a version prior to 2.1."* The above statement says that a downgrade would not be possible to versions prior to "2.1" after "upgrading the inter.broker.protocol.version to the latest version". But there is another statement made in the documentation in *point 4* as below: "Restart the brokers one by one for the new protocol version to take effect. {*}Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version.{*}" These two statements are repeated across many prior releases of kafka and are confusing. Below are the questions: # Is downgrade not possible at all to *"any"* older version of kafka once the inter.broker.protocol.version is updated to the latest version, *OR* are downgrades not possible only to versions *"<2.1"* ? # Suppose one takes an approach similar to the upgrade even for the downgrade path, i.e. first downgrade the inter.broker.protocol.version to the previous version, then downgrade the software/code of kafka to the previous release revision. Does downgrade work with this approach ? Can these two questions be documented if the results are already known ? Maybe a downgrade guide can be created too, similar to the existing upgrade guide ? 
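For reference, the hypothetical two-step downgrade asked about in question 2 could be sketched as a config change rolled out (with a rolling restart) before the binaries are swapped. This only illustrates the property mechanics; whether Kafka actually supports this downgrade is exactly the open question of the ticket, and the "3.3" version value below is a placeholder, not taken from the docs.

```shell
# Step 1 of the hypothetical downgrade path: pin the protocol version back
# to the previous release on every broker, roll the brokers, and only then
# (step 2) downgrade the broker binaries. Version numbers are placeholders.
cat > server-downgrade-step1.properties <<'EOF'
inter.broker.protocol.version=3.3
log.message.format.version=3.3
EOF
grep 'inter.broker' server-downgrade-step1.properties
```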
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-9542) ZSTD Compression Not Working
[ https://issues.apache.org/jira/browse/KAFKA-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485114#comment-17485114 ] kaushik srinivas commented on KAFKA-9542: - [~lucasbradstreet] We see this issue even with latest kafka and also fails with readOnlyRootFileSystem /tmp path. Do you see any possible issues in relocating this tmpdir to anything apart from system /tmp path ? > ZSTD Compression Not Working > > > Key: KAFKA-9542 > URL: https://issues.apache.org/jira/browse/KAFKA-9542 > Project: Kafka > Issue Type: Bug > Components: compression >Affects Versions: 2.3.0 > Environment: Linux, CentOS >Reporter: Prashant >Priority: Critical > > I enabled zstd compression at producer by adding "compression.type=zstd" in > producer config. When try to run it, producer fails with > "org.apache.kafka.common.errors.UnknownServerException: The server > experienced an unexpected error when processing the request" > In Broker Logs, I could find following exception: > > [2020-02-12 11:48:04,623] ERROR [ReplicaManager broker=1] Error processing > append operation on partition load_logPlPts-6 (kafka.server.ReplicaManager) > org.apache.kafka.common.KafkaException: java.lang.NoClassDefFoundError: Could > not initialize class > org.apache.kafka.common.record.CompressionType$ZstdConstructors > at > org.apache.kafka.common.record.CompressionType$5.wrapForInput(CompressionType.java:133) > at > org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:257) > at > org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:324) > at > scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:54) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(LogValidator.scala:269) > at > 
kafka.log.LogValidator$$anonfun$validateMessagesAndAssignOffsetsCompressed$1.apply(LogValidator.scala:261) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:72) > at kafka.log.Log$$anonfun$append$2.liftedTree1$1(Log.scala:869) > at kafka.log.Log$$anonfun$append$2.apply(Log.scala:868) > at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850) > at kafka.log.Log.maybeHandleIOException(Log.scala:2065) > at kafka.log.Log.append(Log.scala:850) > at kafka.log.Log.appendAsLeader(Log.scala:819) > at kafka.cluster.Partition$$anonfun$14.apply(Partition.scala:771) > at kafka.cluster.Partition$$anonfun$14.apply(Partition.scala:759) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) > > This is fresh broker installed on "CentOS Linux" v7. This doesn't seem to be > a classpath issue as same package is working on MacOS. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
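Regarding the readOnlyRootFilesystem question above: zstd-jni extracts its native library into java.io.tmpdir the first time the class loads, so a non-writable /tmp can surface as exactly this kind of NoClassDefFoundError. A minimal sketch of relocating the JVM temp dir before broker startup (paths are illustrative, not from this ticket):

```shell
# Point the broker JVM at a writable temp dir so zstd-jni can unpack its
# native library; on k8s this would typically be an emptyDir volume rather
# than the read-only root filesystem.
ZSTD_TMP="$PWD/kafka-jvm-tmp"
mkdir -p "$ZSTD_TMP"
export KAFKA_OPTS="-Djava.io.tmpdir=${ZSTD_TMP}"
echo "$KAFKA_OPTS" | tee kafka-opts.txt
```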
[jira] [Comment Edited] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500
[ https://issues.apache.org/jira/browse/KAFKA-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470385#comment-17470385 ] kaushik srinivas edited comment on KAFKA-13578 at 1/7/22, 7:25 AM: --- [~ijuma] [~cmccabe] Can you provide some insights on this as we are considering this upgrade in kafka. was (Author: kaushik srinivas): [~ijuma] Can you provide some insights on this as we are considering this upgrade in kafka. > Need clarity on the bridge release version of kafka without zookeeper in the > eco system - KIP500 > > > Key: KAFKA-13578 > URL: https://issues.apache.org/jira/browse/KAFKA-13578 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Critical > > We see from the KIP-500 > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] > > Below statement needs some clarity, > *We will be able to upgrade from any version of Kafka to this bridge release, > and from the bridge release to a post-ZK release. When upgrading from an > earlier release to a post-ZK release, the upgrade must be done in two steps: > first, you must upgrade to the bridge release, and then you must upgrade to > the post-ZK release.* > > What apache kafka version is referred to as bridge release in this context ? > can apache kafka 3.0.0 be considered bridge release even though it is not > production ready to run without zookeeper ? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500
[ https://issues.apache.org/jira/browse/KAFKA-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470385#comment-17470385 ] kaushik srinivas commented on KAFKA-13578: -- [~ijuma] Can you provide some insights on this as we are considering this upgrade in kafka. > Need clarity on the bridge release version of kafka without zookeeper in the > eco system - KIP500 > > > Key: KAFKA-13578 > URL: https://issues.apache.org/jira/browse/KAFKA-13578 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Critical > > We see from the KIP-500 > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] > > Below statement needs some clarity, > *We will be able to upgrade from any version of Kafka to this bridge release, > and from the bridge release to a post-ZK release. When upgrading from an > earlier release to a post-ZK release, the upgrade must be done in two steps: > first, you must upgrade to the bridge release, and then you must upgrade to > the post-ZK release.* > > What apache kafka version is referred to as bridge release in this context ? > can apache kafka 3.0.0 be considered bridge release even though it is not > production ready to run without zookeeper ? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500
kaushik srinivas created KAFKA-13578: Summary: Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500 Key: KAFKA-13578 URL: https://issues.apache.org/jira/browse/KAFKA-13578 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas We see from the KIP-500 [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] Below statement needs some clarity, *We will be able to upgrade from any version of Kafka to this bridge release, and from the bridge release to a post-ZK release. When upgrading from an earlier release to a post-ZK release, the upgrade must be done in two steps: first, you must upgrade to the bridge release, and then you must upgrade to the post-ZK release.* What apache kafka version is referred to as bridge release in this context ? can apache kafka 3.0.0 be considered bridge release even though it is not production ready to run without zookeeper ? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas resolved KAFKA-13177. -- Resolution: Not A Bug > partition failures and fewer shrink but a lot of isr expansions with > increased num.replica.fetchers in kafka brokers > > > Key: KAFKA-13177 > URL: https://issues.apache.org/jira/browse/KAFKA-13177 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Assignee: kaushik srinivas >Priority: Major > > Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s) > topics : 15, partitions each : 15 replication factor 3, min.insync.replicas > : 2 > producers running with acks : all > Initially the num.replica.fetchers was set to 1 (default) and we observed > very frequent ISR shrinks and expansions. So the setups were tuned with a > higher value of 4. > Once after this change was done, we see below behavior and warning msgs in > broker logs > # Over a period of 2 days, there are around 10 shrinks corresponding to 10 > partitions, but around 700 ISR expansions corresponding to almost all > partitions in the cluster(approx 50 to 60 partitions). > # we see frequent warn msg of partitions being marked as failure in the same > time span. Below is the trace --> {"type":"log", "host":"ww", > "level":"WARN", "neid":"kafka-ww", "system":"kafka", > "time":"2021-08-03T20:09:15.340", "timezone":"UTC", > "log":{"message":"ReplicaFetcherThread-2-1003 - > kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, > leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* > > We see the above behavior continuously after increasing the > num.replica.fetchers to 4 from 1. We did increase this to improve the > replication performance and hence reduce the ISR shrinks. > But we see this strange behavior after the change. 
What would the above trace > indicate and is marking partitions as failed just a WARN msgs and handled by > kafka or is it something to worry about ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395873#comment-17395873 ] kaushik srinivas commented on KAFKA-13177: -- This is expected behavior. It happens when the leader epoch changes, either due to bouncing of the brokers or the controllers; replica fetchers are removed and re-added whenever these epochs change. These logs vanished once the brokers were back up and running, and they are expected whenever the brokers restart or a reassignment happens. > partition failures and fewer shrink but a lot of isr expansions with > increased num.replica.fetchers in kafka brokers > > > Key: KAFKA-13177 > URL: https://issues.apache.org/jira/browse/KAFKA-13177 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Assignee: kaushik srinivas >Priority: Major > > Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s) > topics : 15, partitions each : 15 replication factor 3, min.insync.replicas > : 2 > producers running with acks : all > Initially the num.replica.fetchers was set to 1 (default) and we observed > very frequent ISR shrinks and expansions. So the setups were tuned with a > higher value of 4. > Once after this change was done, we see below behavior and warning msgs in > broker logs > # Over a period of 2 days, there are around 10 shrinks corresponding to 10 > partitions, but around 700 ISR expansions corresponding to almost all > partitions in the cluster(approx 50 to 60 partitions). > # we see frequent warn msg of partitions being marked as failure in the same > time span. 
Below is the trace --> {"type":"log", "host":"ww", > "level":"WARN", "neid":"kafka-ww", "system":"kafka", > "time":"2021-08-03T20:09:15.340", "timezone":"UTC", > "log":{"message":"ReplicaFetcherThread-2-1003 - > kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, > leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* > > We see the above behavior continuously after increasing the > num.replica.fetchers to 4 from 1. We did increase this to improve the > replication performance and hence reduce the ISR shrinks. > But we see this strange behavior after the change. What would the above trace > indicate and is marking partitions as failed just a WARN msgs and handled by > kafka or is it something to worry about ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas reassigned KAFKA-13177: Assignee: kaushik srinivas > partition failures and fewer shrink but a lot of isr expansions with > increased num.replica.fetchers in kafka brokers > > > Key: KAFKA-13177 > URL: https://issues.apache.org/jira/browse/KAFKA-13177 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Assignee: kaushik srinivas >Priority: Major > > Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s) > topics : 15, partitions each : 15 replication factor 3, min.insync.replicas > : 2 > producers running with acks : all > Initially the num.replica.fetchers was set to 1 (default) and we observed > very frequent ISR shrinks and expansions. So the setups were tuned with a > higher value of 4. > Once after this change was done, we see below behavior and warning msgs in > broker logs > # Over a period of 2 days, there are around 10 shrinks corresponding to 10 > partitions, but around 700 ISR expansions corresponding to almost all > partitions in the cluster(approx 50 to 60 partitions). > # we see frequent warn msg of partitions being marked as failure in the same > time span. Below is the trace --> {"type":"log", "host":"ww", > "level":"WARN", "neid":"kafka-ww", "system":"kafka", > "time":"2021-08-03T20:09:15.340", "timezone":"UTC", > "log":{"message":"ReplicaFetcherThread-2-1003 - > kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, > leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* > > We see the above behavior continuously after increasing the > num.replica.fetchers to 4 from 1. We did increase this to improve the > replication performance and hence reduce the ISR shrinks. > But we see this strange behavior after the change. 
What would the above trace > indicate and is marking partitions as failed just a WARN msgs and handled by > kafka or is it something to worry about ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13178) frequent network_exception trace at kafka producer.
kaushik srinivas created KAFKA-13178: Summary: frequent network_exception trace at kafka producer. Key: KAFKA-13178 URL: https://issues.apache.org/jira/browse/KAFKA-13178 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Running a 3 node kafka cluster (4 cores of cpu and 4Gi of memory in k8s). topics : 15, partitions each : 15 , replication factor : 3. min.insync.replicas : 2 producer is running with acks : "all" We see frequent failures in the kafka producer with the below trace {"host":"ww","level":"WARN","log":{"classname":"org.apache.kafka.clients.producer.internals.Sender:595","message":"[Producer clientId=producer-1] Got error produce response with correlation id 2646 on topic-partition *-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION","stacktrace":"","threadname":"kafka-producer-network-thread | producer-1"},"time":"2021-08-*04T02:22:20.529Z","timezone":"UTC","type":"log","system":"w","systemid":"3"} What could be the possible reasons for the above trace ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
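Transient NETWORK_EXCEPTION responses with acks=all are usually retriable broker disconnects (leader movement, rolling restarts). A hedged sketch of producer settings that ride out such blips; every value below is illustrative and not taken from this ticket:

```shell
# Illustrative producer settings for tolerating transient broker
# disconnects; tune the timeouts against your own latency budget.
cat > producer-retry.properties <<'EOF'
acks=all
enable.idempotence=true
retries=2147483647
delivery.timeout.ms=120000
request.timeout.ms=30000
EOF
grep 'delivery.timeout.ms' producer-retry.properties
```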
[jira] [Created] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
kaushik srinivas created KAFKA-13177: Summary: partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers Key: KAFKA-13177 URL: https://issues.apache.org/jira/browse/KAFKA-13177 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s) topics : 15, partitions each : 15 replication factor 3, min.insync.replicas : 2 producers running with acks : all Initially the num.replica.fetchers was set to 1 (default) and we observed very frequent ISR shrinks and expansions. So the setups were tuned with a higher value of 4. Once after this change was done, we see below behavior and warning msgs in broker logs # Over a period of 2 days, there are around 10 shrinks corresponding to 10 partitions, but around 700 ISR expansions corresponding to almost all partitions in the cluster(approx 50 to 60 partitions). # we see frequent warn msg of partitions being marked as failure in the same time span. Below is the trace --> {"type":"log", "host":"ww", "level":"WARN", "neid":"kafka-ww", "system":"kafka", "time":"2021-08-03T20:09:15.340", "timezone":"UTC", "log":{"message":"ReplicaFetcherThread-2-1003 - kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* We see the above behavior continuously after increasing the num.replica.fetchers to 4 from 1. We did increase this to improve the replication performance and hence reduce the ISR shrinks. But we see this strange behavior after the change. What would the above trace indicate and is marking partitions as failed just a WARN msgs and handled by kafka or is it something to worry about ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13176) frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions.
kaushik srinivas created KAFKA-13176: Summary: frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions. Key: KAFKA-13176 URL: https://issues.apache.org/jira/browse/KAFKA-13176 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Running a 3 node kafka cluster (2.3.x kafka) with 4 cores of cpu and 4Gi of memory on a k8s environment. num.replica.fetchers is configured to 1 (default value). There are around 15 topics in the cluster and all of them receive a very low rate of records/sec (less than 100 per second in most cases). All the topics have more than 10 partitions and 3 replicas each. min.insync.replicas is set to 2. And producers are run with acks level set to 'all'. We constantly observe ISR shrinks and expansions for almost every topic partition. Shrinks and expansions are usually separated by around 6 to 8 seconds. During these shrinks and expands we see a lot of request timeouts on the kafka producer side for these topics. Are there any known configuration items we can use to overcome this ? Confused about continuous ISR shrinks and expands on very low throughput topics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
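ISR shrink/expand churn at very low throughput is often a follower-lag-timer artifact rather than real replication lag. A sketch of the broker-side knob most commonly examined for this; the value and the quoted default are illustrative and should be re-checked against the 2.3.x docs:

```shell
# replica.lag.time.max.ms: how long a follower may go without catching up
# to the leader's log end offset before it is dropped from the ISR
# (default was 10000 ms around 2.3.x); raising it can damp shrink/expand
# churn on quiet topics at the cost of slower removal of dead replicas.
cat > broker-isr-tuning.properties <<'EOF'
replica.lag.time.max.ms=30000
num.replica.fetchers=1
EOF
grep 'replica.lag.time.max.ms' broker-isr-tuning.properties
```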
[jira] [Commented] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.
[ https://issues.apache.org/jira/browse/KAFKA-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352986#comment-17352986 ] kaushik srinivas commented on KAFKA-12858: -- [~ijuma] Can you provide your inputs on this requirement? > dynamically update the ssl certificates of kafka connect worker without > restarting connect process. > --- > > Key: KAFKA-12858 > URL: https://issues.apache.org/jira/browse/KAFKA-12858 > Project: Kafka > Issue Type: Improvement >Reporter: kaushik srinivas >Priority: Major > > Hi, > > We are trying to update the ssl certificates of a kafka connect worker which > are due for expiry. Is there any way to dynamically update the ssl certificate of > the connect worker, as is possible in kafka using the kafka-configs.sh script ? > If not, what is the recommended way to update the ssl certificates of the kafka > connect worker without disrupting the existing traffic ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.
kaushik srinivas created KAFKA-12858: Summary: dynamically update the ssl certificates of kafka connect worker without restarting connect process. Key: KAFKA-12858 URL: https://issues.apache.org/jira/browse/KAFKA-12858 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas Hi, We are trying to update the ssl certificates of a kafka connect worker which are due for expiry. Is there any way to dynamically update the ssl certificate of the connect worker, as is possible in kafka using the kafka-configs.sh script ? If not, what is the recommended way to update the ssl certificates of the kafka connect worker without disrupting the existing traffic ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12855) Update ssl certificates of kafka connect worker runtime without restarting the worker process.
kaushik srinivas created KAFKA-12855: Summary: Update ssl certificates of kafka connect worker runtime without restarting the worker process. Key: KAFKA-12855 URL: https://issues.apache.org/jira/browse/KAFKA-12855 Project: Kafka Issue Type: Improvement Components: KafkaConnect Reporter: kaushik srinivas Is there a possibility to update the ssl certificates of the kafka connect worker dynamically, something similar to the kafka-configs script for kafka ? Or is restarting the worker processes the only way to update the certificates ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350264#comment-17350264 ] kaushik srinivas commented on KAFKA-12534: -- Hi [~cricket007] To update more on this, the test works fine when the certificate duration/expiry is extended and the same sequence of steps were performed to regenerate the keystore of the kafka broker and updated using kafka-configs.sh script. The script fails only when the keystore password is being changed. -Kaushik. > kafka-configs does not work with ssl enabled kafka broker. > -- > > Key: KAFKA-12534 > URL: https://issues.apache.org/jira/browse/KAFKA-12534 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: kaushik srinivas >Priority: Critical > > We are trying to change the trust store password on the fly using the > kafka-configs script for a ssl enabled kafka broker. > Below is the command used: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' > But we see below error in the broker logs when the command is run. > {"type":"log", "host":"kf-2-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", > "time":"2021-03-23T12:14:40.055", "timezone":"UTC", > "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] > Failed authentication with /127.0.0.1 (SSL handshake failed)"}} > How can anyone configure ssl certs for the kafka-configs script and succeed > with the ssl handshake in this case ? > Note : > We are trying with a single listener i.e SSL: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347483#comment-17347483 ] kaushik srinivas commented on KAFKA-12534: -- Hi [~cricket007] Did you get a chance to reproduce this issue ? Need your support. Thanks Kaushik. > kafka-configs does not work with ssl enabled kafka broker. > -- > > Key: KAFKA-12534 > URL: https://issues.apache.org/jira/browse/KAFKA-12534 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: kaushik srinivas >Priority: Critical > > We are trying to change the trust store password on the fly using the > kafka-configs script for a ssl enabled kafka broker. > Below is the command used: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' > But we see below error in the broker logs when the command is run. > {"type":"log", "host":"kf-2-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", > "time":"2021-03-23T12:14:40.055", "timezone":"UTC", > "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] > Failed authentication with /127.0.0.1 (SSL handshake failed)"}} > How can anyone configure ssl certs for the kafka-configs script and succeed > with the ssl handshake in this case ? > Note : > We are trying with a single listener i.e SSL: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345886#comment-17345886 ] kaushik srinivas edited comment on KAFKA-12534 at 5/17/21, 5:53 AM: Hi [~cricket007] , We have tried the exact steps. Captured the commands and logs in detail. The scenario to change the keystore password does not work still. sequence of steps to reproduce # install kafka broker by generating a CA, truststore and keystore. (password for stores: 123456) # re generate the keystore with a new password (1234567). Use the same old generated CA and trust store from step1. # issue the dynamic reconfig command after replacing the keystore file in the specified location. # dynamic config command issued: {code:java} ./kafka-configs --bootstrap-server kafkabroker:9092 --command-config ssl.properties --entity-type brokers --entity-name 1001 --alter --add-config 'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore' {code} Note: listener name is ssl and is in the format specified in [https://docs.confluent.io/platform/current/kafka/dynamic-config.html#updating-ssl-keystore-of-an-existing-listener] # command fails with following trace {code:java} Error while executing config command with args '--bootstrap-server kafkabroker:9092 --command-config ssl.properties --entity-type brokers --entity-name 1001 --alter --add-config listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore' java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='1001'): Invalid value org.apache.kafka.common.config.ConfigException: Validation of dynamic config update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for configuration Invalid 
dynamic configuration at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272) at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338) at kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308) at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85) at kafka.admin.ConfigCommand.main(ConfigCommand.scala) Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='1001'): Invalid value org.apache.kafka.common.config.ConfigException: Validation of dynamic config update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for configuration Invalid dynamic configuration {code} Kafka broker logs the below output {code:java} { "timezone":"UTC", "log":{"message":"data-plane-kafka-request-handler-5 - kafka.server.AdminManager - [Admin Manager on Broker 1001]: Invalid config value for resource ConfigResource(type=BROKER, name='1001'): Invalid value org.apache.kafka.common.config.ConfigException: Validation of dynamic config update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for configuration Invalid dynamic configuration"}} {code} As per docs, the CA is not supposed to be changed and we have maintained that and the CA and trust stores are not changed. Also another observation is that, when for example the country name in the cert generation is changed and the certificate is regenerated, the dynamic config command works fine and we could see the ssl certs being reloaded in the kafka broker logs. 
But when the keystore password is changed, it has never worked for us, no matter how many times we retry. Can you please help us reproduce this issue and, if possible, provide detailed steps for the case where the keystore's password is changed?
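One thing that may be worth ruling out in step 2 above: instead of regenerating the keystore with a new password, the password on the existing keystore can be changed in place with keytool, which guarantees the certificate, key, and CA chain stay byte-for-byte identical while only the passwords differ. A rough sketch only; the path, the `broker` alias, and both passwords are placeholders, not values taken from this report:

```shell
# Sketch only -- path, alias, and passwords are placeholders.
KS=/etc/kafka/secrets/ssl/keyStore

# Change the store password in place (old: 123456, new: 1234567):
keytool -storepasswd -keystore "$KS" -storepass 123456 -new 1234567

# JKS private keys carry their own password; change it to match:
keytool -keypasswd -alias broker -keystore "$KS" \
        -storepass 1234567 -keypass 123456 -new 1234567

# Verify the store opens with the new password:
keytool -list -keystore "$KS" -storepass 1234567
```

If the same `kafka-configs --alter` call still fails against a keystore whose contents are unchanged, that would point away from a certificate mismatch and toward the password/location handling itself.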
[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343135#comment-17343135 ] kaushik srinivas commented on KAFKA-12534: -- Hi [~cricket007], We have tried this out. File permissions are OK and the double slash has also been corrected, but we still face the same issue. What do you suggest? > kafka-configs does not work with ssl enabled kafka broker. > -- > > Key: KAFKA-12534 > URL: https://issues.apache.org/jira/browse/KAFKA-12534 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: kaushik srinivas >Priority: Critical > > We are trying to change the trust store password on the fly using the > kafka-configs script for a ssl enabled kafka broker. > Below is the command used: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' > But we see below error in the broker logs when the command is run. > {"type":"log", "host":"kf-2-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", > "time":"2021-03-23T12:14:40.055", "timezone":"UTC", > "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] > Failed authentication with /127.0.0.1 (SSL handshake failed)"}} > How can anyone configure ssl certs for the kafka-configs script and succeed > with the ssl handshake in this case ? > Note : > We are trying with a single listener i.e SSL: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320966#comment-17320966 ] kaushik srinivas edited comment on KAFKA-12534 at 4/14/21, 12:42 PM: - Hi [~cricket007] , We tried to change the keystore password and key pass for one of the kafka broker. below is the command used, ./kafka-configs --bootstrap-server xx:9092 --command-config ssl.xt --entity-type brokers --entity-name 1007 --alter --add-config 'listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567' contents of command config file ssl.xt [root@vm-10-75-112-163 bin]# cat ssl.xt ssl.key.password=123456 ssl.keystore.location=/securityFiles/ssl/kafka.client.keystore.jks ssl.keystore.password=123456 ssl.truststore.location=/securityFiles/ssl/kafka.client.truststore.jks ssl.truststore.password=123456 security.protocol=SSL note: We have keystores created one for kafka broker and one for admin client. the password for admin client keystore file is 123456. And this is what is configured in the command config file. But we see below output when we run this command {code:java} hreads appropriately using -XX:ParallelGCThreads=N [2021-04-14 12:24:39,705] WARN The configuration 'ssl.truststore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig) [2021-04-14 12:24:39,708] WARN The configuration 'ssl.keystore.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig) [2021-04-14 12:24:39,710] WARN The configuration 'ssl.key.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig) [2021-04-14 12:24:39,710] WARN The configuration 'ssl.keystore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig) [2021-04-14 12:24:39,710] WARN The configuration 'ssl.truststore.password' was supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig) Error while executing config command with args '--bootstrap-server x:9092 --command-config ssl.xt --entity-type brokers --entity-name 1007 --alter --add-config listener.name.ssl.ssl.keystore.password=1234567,listener.name.ssl.ssl.key.password=1234567' java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='1007'): Invalid value org.apache.kafka.common.config.ConfigException: Validation of dynamic config update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for configuration Invalid dynamic configuration at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272) at kafka.admin.ConfigCommand$.alterBrokerConfig(ConfigCommand.scala:338) at kafka.admin.ConfigCommand$.processBrokerConfig(ConfigCommand.scala:308) at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:85) at kafka.admin.ConfigCommand.main(ConfigCommand.scala) Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='1007'): Invalid value org.apache.kafka.common.config.ConfigException: Validation of dynamic config update of SSLFactory failed: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /etc/kafka/secrets//ssl/keyStore of type JKS for configuration Invalid dynamic configuration {code} It says WARN The configuration 'ssl.truststore.location' was supplied but isn't a known config. 
(org.apache.kafka.clients.admin.AdminClientConfig) Also, the new keystore is encrypted with the new password, yet we observe that the validation still fails. Note: the server.properties file is not updated with the latest password; it still refers to the old keystore and key passwords. Can you help us with this? -kaushik
[jira] [Commented] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
[ https://issues.apache.org/jira/browse/KAFKA-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310435#comment-17310435 ] kaushik srinivas commented on KAFKA-12534: -- Hi [~ijuma] Need your help here. > kafka-configs does not work with ssl enabled kafka broker. > -- > > Key: KAFKA-12534 > URL: https://issues.apache.org/jira/browse/KAFKA-12534 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: kaushik srinivas >Priority: Critical > > We are trying to change the trust store password on the fly using the > kafka-configs script for a ssl enabled kafka broker. > Below is the command used: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' > But we see below error in the broker logs when the command is run. > {"type":"log", "host":"kf-2-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", > "time":"2021-03-23T12:14:40.055", "timezone":"UTC", > "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] > Failed authentication with /127.0.0.1 (SSL handshake failed)"}} > How can anyone configure ssl certs for the kafka-configs script and succeed > with the ssl handshake in this case ? > Note : > We are trying with a single listener i.e SSL: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.
[ https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310434#comment-17310434 ] kaushik srinivas commented on KAFKA-12530: -- Any inputs for this please ? > kafka-configs.sh does not work while changing the sasl jaas configurations. > --- > > Key: KAFKA-12530 > URL: https://issues.apache.org/jira/browse/KAFKA-12530 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Major > > We are trying to use kafka-configs script to modify the sasl jaas > configurations, but unable to do so. > Command used: > ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n > org.apache.kafka.common.security.plain.PlainLoginModule required \n > username=\"test\" \n password=\"test\"; \n };' > error: > requirement failed: Invalid entity config: all configs to be added must be in > the format "key=val". > command 2: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 59 --alter --add-config > 'sasl.jaas.config=[username=test,password=test]' > output: > command does not return , but kafka broker logs below error: > DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", > "time":"2021-03-23T08:29:00.946", "timezone":"UTC", > "log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 > - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - > Set SASL server state to FAILED during authentication"}} > {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", > "time":"2021-03-23T08:29:00.946", "timezone":"UTC", > "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] > Failed authentication with /127.0.0.1 
(Unexpected Kafka request of type > METADATA during SASL handshake.)"}} > We have below issues: > 1. If one installs kafka broker with SASL mechanism and wants to change the > SASL jaas config via kafka-configs scripts, how is it supposed to be done ? > Is one supposed to provide kafka-configs script credentials to get > authenticated with kafka broker ? > does kafka-configs needs client credentials to do the same ? > 2. Can anyone point us to example commands of kafka-configs to alter the > sasl.jaas.config property of kafka broker. We do not see any documentation or > examples for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
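On the first error above ('all configs to be added must be in the format "key=val"'): kafka-configs treats the --add-config argument as a comma-separated list of key=val pairs, so any value that itself contains a top-level comma leaves a fragment without '=' and is rejected; list values must be wrapped in square brackets, e.g. cleanup.policy=[compact,delete], and a sasl.jaas.config value with embedded spaces, '=', and ';' needs similarly careful quoting. The toy check below only illustrates that splitting rule (it does not honor the bracket syntax) and is NOT Kafka's actual ConfigCommand parser:

```shell
# Toy illustration, not Kafka's parser: split an --add-config string on
# commas and verify every piece contains '='. Exit status 0 = accepted.
check_add_config() {
  echo "$1" | awk -v RS=',' '!/=/ { bad=1 } END { exit bad }'
}

# Accepted: every comma-separated piece is key=val.
check_add_config 'retention.ms=100,segment.bytes=1048576' && echo "accepted"

# Rejected: the unbracketed comma makes "delete" a second, '='-less entry.
check_add_config 'cleanup.policy=compact,delete' || echo "rejected"
```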
[jira] [Created] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
kaushik srinivas created KAFKA-12534: Summary: kafka-configs does not work with ssl enabled kafka broker. Key: KAFKA-12534 URL: https://issues.apache.org/jira/browse/KAFKA-12534 Project: Kafka Issue Type: Bug Affects Versions: 2.6.1 Reporter: kaushik srinivas We are trying to change the trust store password on the fly using the kafka-configs script for a ssl enabled kafka broker. Below is the command used: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' But we see below error in the broker logs when the command is run. {"type":"log", "host":"kf-2-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", "time":"2021-03-23T12:14:40.055", "timezone":"UTC", "log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] Failed authentication with /127.0.0.1 (SSL handshake failed)"}} How can anyone configure ssl certs for the kafka-configs script and succeed with the ssl handshake in this case ? Note : We are trying with a single listener i.e SSL: -- This message was sent by Atlassian Jira (v8.3.4#803005)
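On the question in the description ("How can anyone configure ssl certs for the kafka-configs script"): the tool is itself an AdminClient, so when the only listener is SSL it needs its own client-side SSL settings supplied through --command-config, as the later comments do. A minimal sketch; the host, port, paths, and passwords are placeholders, and the keystore lines are needed only when the broker enforces mutual TLS:

```shell
# Placeholder paths/passwords -- adjust to the actual environment.
cat > client-ssl.properties <<'EOF'
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/ssl/trustStore
ssl.truststore.password=123456
# Needed only if the broker requires client (mutual) TLS authentication:
ssl.keystore.location=/etc/kafka/secrets/ssl/keyStore
ssl.keystore.password=123456
ssl.key.password=123456
EOF

kafka-configs.sh --bootstrap-server kafkabroker:9092 \
  --command-config client-ssl.properties \
  --entity-type brokers --entity-name 1001 --describe
```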
[jira] [Updated] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.
[ https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-12530: - Description: We are trying to use kafka-configs script to modify the sasl jaas configurations, but unable to do so. Command used: ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"test\" \n password=\"test\"; \n };' error: requirement failed: Invalid entity config: all configs to be added must be in the format "key=val". command 2: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=[username=test,password=test]' output: command does not return , but kafka broker logs below error: DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}} {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}} We have below issues: 1. If one installs kafka broker with SASL mechanism and wants to change the SASL jaas config via kafka-configs scripts, how is it supposed to be done ? Is one supposed to provide kafka-configs script credentials to get authenticated with kafka broker ? 
does kafka-configs need client credentials to do the same? 2. Can anyone point us to example commands of kafka-configs to alter the sasl.jaas.config property of a kafka broker? We do not see any documentation or examples for the same. > kafka-configs.sh does not work while changing the sasl jaas configurations. > --- > > Key: KAFKA-12530 > URL: https://issues.apache.org/jira/browse/KAFKA-12530 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Major >
[jira] [Commented] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.
[ https://issues.apache.org/jira/browse/KAFKA-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306892#comment-17306892 ] kaushik srinivas commented on KAFKA-12530: -- Hi [~enether] Can you provide us some input on this? > kafka-configs.sh does not work while changing the sasl jaas configurations. > --- > > Key: KAFKA-12530 > URL: https://issues.apache.org/jira/browse/KAFKA-12530 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Major > > We are trying to use the kafka-configs script to modify the sasl jaas > configuration, but are unable to do so. > Command used: > ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer {\n > org.apache.kafka.common.security.plain.PlainLoginModule required \n > username=\"test\" \n password=\"test\"; \n };' > error: > requirement failed: Invalid entity config: all configs to be added must be in > the format "key=val". > command 2: > kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 59 --alter --add-config > 'sasl.jaas.config=[username=test,password=test]' > output: > the command does not return, but the kafka broker logs the error below: > DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", > "time":"2021-03-23T08:29:00.946", "timezone":"UTC", > "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 > - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - > Set SASL server state to FAILED during authentication"}} > {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", > "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", > "time":"2021-03-23T08:29:00.946", "timezone":"UTC", > "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 > - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] > Failed authentication with /127.0.0.1 (Unexpected Kafka request of type > METADATA during SASL handshake.)"}} > We have the below issues: > 1. If one installs a kafka broker with a SASL mechanism and wants to change the > SASL jaas config via the kafka-configs script, how is it supposed to be done? > Does kafka-configs need client credentials to do the same? > 2. Can anyone point us to example commands of kafka-configs to alter the > sasl.jaas.config property of a kafka broker? We do not see any documentation or > examples for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
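One likely cause of the "key=val" error above (an assumption, not confirmed in this thread): the value being passed includes the JAAS-file wrapper (KafkaServer { ... }) plus literal \n sequences, whereas sasl.jaas.config expects a single-line value of the form "loginModuleClass controlFlag option=value ...;", and dynamic per-broker updates use a listener/mechanism-prefixed key. A minimal sketch, with hypothetical listener name, broker id, and credentials:

```shell
# The JAAS value is one line, ending in ';', with no KafkaServer { } wrapper.
JAAS='org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test";'
# For a dynamic per-broker update, the key is prefixed with listener and mechanism.
ARG="listener.name.sasl_plaintext.plain.sasl.jaas.config=${JAAS}"
printf '%s\n' "$ARG"
# Against a live cluster (not run here), the argument would be passed unchanged:
# ./kafka-configs.sh --bootstrap-server localhost:9092 --command-config admin.properties \
#   --entity-type brokers --entity-name 59 --alter --add-config "$ARG"
```

Single quotes keep the embedded double quotes and the trailing semicolon intact, so the shell hands the whole key=val pair to the tool unmodified.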
[jira] [Created] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12530: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12530 URL: https://issues.apache.org/jira/browse/KAFKA-12530 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12529) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12529: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12529 URL: https://issues.apache.org/jira/browse/KAFKA-12529 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12528) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12528: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12528 URL: https://issues.apache.org/jira/browse/KAFKA-12528 Project: Kafka Issue Type: Bug Components: admin, core Reporter: kaushik srinivas We are trying to modify the sasl jaas configuration for the kafka broker at runtime via the dynamic config update functionality of the kafka-configs.sh script, but we are unable to get it working. Below is our command: ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer {\n org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"test\" \n password=\"test\"; \n };' The command exits with the error: requirement failed: Invalid entity config: all configs to be added must be in the format "key=val". We also tried the below format: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=[username=test,password=test]' The command does not return, but the kafka broker prints the below error messages: org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}} {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}} 1. If one has SASL enabled with a single listener, how are we supposed to change the sasl credentials using this command? 2. Can anyone point us to some example commands for modifying the sasl jaas configuration?
-- This message was sent by Atlassian Jira (v8.3.4#803005)
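On the recurring question of whether kafka-configs needs client credentials: the tool connects via --bootstrap-server like any other client, so against a SASL-enabled listener it must authenticate itself. A hedged sketch of the client properties file passed via --command-config (the admin username and password are hypothetical and must already be accepted by the broker):

```shell
# Hypothetical credentials for an admin user that the broker already accepts.
ADMIN_PROPS=$(mktemp)
cat > "$ADMIN_PROPS" <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin-secret";
EOF
# The tool then authenticates like any other client (requires a live broker):
# ./kafka-configs.sh --bootstrap-server localhost:9092 --command-config "$ADMIN_PROPS" \
#   --entity-type brokers --entity-name 59 --describe
```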
[jira] [Comment Edited] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value
[ https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306800#comment-17306800 ] kaushik srinivas edited comment on KAFKA-8010 at 3/23/21, 6:52 AM: --- Hi [~enether] Can you please share an example command using square brackets for sasl jaas config? Also, can you point us to the documentation updates? We still have this issue. Also, if the kafka broker is already sasl enabled with, let us assume, the PLAIN mechanism, how can we update the sasl jaas config without passing any credentials to the kafka-configs script as a client of the kafka broker? was (Author: kaushik srinivas): Hi [~enether] Can you please share an example command with square brackets for sasl jaas config usage ? Also can you point out to documentation updates. we still have this issue. > kafka-configs.sh does not allow setting config with an equal in the value > - > > Key: KAFKA-8010 > URL: https://issues.apache.org/jira/browse/KAFKA-8010 > Project: Kafka > Issue Type: Bug > Components: tools >Reporter: Mickael Maison >Priority: Major > Attachments: image-2019-03-05-19-45-47-168.png > > > The sasl.jaas.config typically includes equals signs in its value. Unfortunately > the kafka-configs tool does not parse such values correctly and hits an error: > ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers > --entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer {\n > org.apache.kafka.common.security.plain.PlainLoginModule required\n > username=\"myuser\"\n password=\"mypassword\";\n};\nClient {\n > org.apache.zookeeper.server.auth.DigestLoginModule required\n > username=\"myuser2\"\n password=\"mypassword2\;\n};" > requirement failed: Invalid entity config: all configs to be added must be in > the format "key=val" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value
[ https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306800#comment-17306800 ] kaushik srinivas commented on KAFKA-8010: - Hi [~enether] Can you please share an example command with square brackets for sasl jaas config usage? Also, can you point us to the documentation updates? We still have this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
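Regarding the square-bracket syntax asked about above: in kafka-configs, brackets group a value that itself contains commas so the parser does not split it into separate key=val pairs. A small illustration using the documented max.connections.per.ip.overrides config (the broker id in the commented invocation is hypothetical):

```shell
# Brackets keep the comma-separated value together as a single key=val pair.
CFG='max.connections.per.ip.overrides=[hostName:100,127.0.0.1:200]'
printf '%s\n' "$CFG"
# ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
#   --entity-name 59 --alter --add-config "$CFG"
```

Note this addresses commas inside a value; it does not by itself solve the equals-sign problem this ticket is about.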
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293740#comment-17293740 ] kaushik srinivas commented on KAFKA-12164: -- Hi [~kkonstantine] Can you provide some insights into this issue? Thanks, Kaushik. > Issue when kafka connect worker pod restarts, during creation of nested > partition directories in hdfs file system. > > > Key: KAFKA-12164 > URL: https://issues.apache.org/jira/browse/KAFKA-12164 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: kaushik srinivas >Priority: Critical > > In our production labs, an issue is observed. Below is the sequence of the > same. > # The hdfs connector is added to the connect worker. > # The hdfs connector creates folders in hdfs, e.g. /test1=1/test2=2/, based on > the custom partitioner. Here test1 and test2 are two separate nested > directories derived from multiple fields in the record using a custom > partitioner. > # The kafka connect hdfs connector uses the below call to create the > directories in the hdfs file system: > fs.mkdirs(new Path(filename)); > ref: > [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java] > The important thing to note is that mkdirs() is a non-atomic operation > (i.e. it can result in partial execution if interrupted). > Suppose the first directory, i.e. test1, is created, and just before the creation > of test2 in hdfs there is a restart of the connect worker pod. > Then the hdfs file system will be left with partial folders created for > partitions during the restart time frame. > So we might have conditions in hdfs as below: > /test1=0/test2=0/ > /test1=1/ > /test1=2/test2=2 > /test1=3/test2=3 > The second partition has a missing directory in it. And if hive > integration is enabled, hive metastore exceptions will occur, since a > partition expected by the hive table is missing for a few partitions in hdfs. > *This can occur for any connector with an ongoing non-atomic operation when a > restart is triggered on the kafka connect worker pod. This will result in some > partially completed states in the system and may cause issues for the > connector to continue its operation*. > *This is a very critical issue and needs some attention on ways of handling > the same.* -- This message was sent by Atlassian Jira (v8.3.4#803005)
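The non-atomic mkdirs scenario described above can be simulated locally. Since mkdir -p is idempotent and tolerates existing parents, a deterministic partitioner that re-derives the same full path on retry would complete a partially created tree; this is a local-filesystem stand-in for the HDFS behaviour, with illustrative path names:

```shell
# Stand-in for HDFS: a crash left only the outer directory behind.
BASE=$(mktemp -d)
mkdir -p "$BASE/test1=1"          # partial state after the crash
# A deterministic partitioner re-derives the same full path on retry, and
# mkdir -p tolerates the already-existing parent, completing the tree.
mkdir -p "$BASE/test1=1/test2=2"
ls "$BASE/test1=1"
```

The retry only helps if the failed record is re-consumed after restart, which is exactly the guarantee questioned later in this thread.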
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269087#comment-17269087 ] kaushik srinivas commented on KAFKA-12164: -- Adding a few major concerns regarding the feedback about re-creating the corrupt directories upon restart. The function syncWithHive() (DataWriter.java) is called at every restart/boot-up of the connector. This is the function that does an initial audit of all the partition directories and tries to sync the hdfs folders with the hive partitions before proceeding to consume records from kafka. Below is the snippet for the same. {code:java} List<String> partitions = hiveMetaStore.listPartitions(hiveDatabase, topicTableMap.get(topic), (short) -1); FileStatus[] statuses = FileUtils.getDirectories(storage, new Path(topicDir)); for (FileStatus status : statuses) { String location = status.getPath().toString(); if (!partitions.contains(location)) { String partitionValue = getPartitionValue(location); hiveMetaStore.addPartition(hiveDatabase, topicTableMap.get(topic), partitionValue); } } {code} Going one step deeper into getDirectories > getDirectoriesImpl (from FileUtils): a path is returned as a partition path if (a) the path is a directory, and (b) the path does not contain nested directories (checked by verifying that the number of non-directory files equals the total number of files in the path). If both conditions are met, the path is added as a partition path. So in the erroneous case, where the actual path is supposed to look like /test1=0/test2=0/xxx.parquet but due to a crash instead looks like /test1=0/, the path /test1=0 satisfies the above conditions and is hence returned as a new partition path to be registered with hive. Registering this with hive fails, because the partition hive expects is /test1=0/test2=0 and not /test1=0/. This means that once there is a corrupt partition directory in hdfs, the syncWithHive() call will keep throwing hive exceptions at every restart of the connector until the directory is corrected in hdfs. Consequently, the stage of consuming the old (failed-to-commit) records again (even assuming they are still present in kafka after the restart) is never reached, and the connector remains in a crashed state forever, requiring a manual clean-up intervention. -kaushik -- This message was sent by Atlassian Jira (v8.3.4#803005)
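The audit described above — spotting partition directories left without their nested child — can be sketched on a local filesystem as a stand-in for HDFS. Directory names follow the /test1=N/test2=N example from this thread; a partial partition shows up as a directory with neither data files nor children:

```shell
# Recreate the example layout: one complete partition, one partial, one complete.
BASE=$(mktemp -d)
mkdir -p "$BASE/test1=0/test2=0" "$BASE/test1=1" "$BASE/test1=2/test2=2"
touch "$BASE/test1=0/test2=0/a.parquet" "$BASE/test1=2/test2=2/b.parquet"
# A partial partition is a directory with neither data files nor children.
find "$BASE" -mindepth 1 -type d -empty
```

Here the find prints only the corrupt /test1=1 path, which is the candidate for manual clean-up before the connector is restarted.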
[jira] [Comment Edited] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269064#comment-17269064 ] kaushik srinivas edited comment on KAFKA-12164 at 1/21/21, 6:31 AM: For the below comments: "Now, by reading quickly the javadoc for FileSystem#mkdirs in HDFS I understand that a nested directory can be constructed from a given path, even if some of the parents exist. Which makes me think that a restart would allow the creation of the full path as long as the partitioner is deterministic with respect to the record (i.e. for a given record it always creates the same path). Upon a restart the export of a record will be retried, so the partitioner could possibly reattempt the full creation of the path. But that's an early assumption from my side." Let's suppose the partitioner derives the directory /test1/test2 for a given record, and the connector crashes during the directory creation, so hdfs ends up with only the /test1/ folder. There is no guarantee that the kafka record which was not committed fully by the connector would still exist in kafka after a restart/recovery of the connector. So there could be scenarios where these partial directories remain partial forever and require an additional manual intervention to clean them up or complete their creation in hdfs, without which we see issues in hive table partitions if enabled.
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269064#comment-17269064 ] kaushik srinivas commented on KAFKA-12164: -- For the below comments: "Now, by reading quickly the javadoc for FileSystem#mkdirs in HDFS I understand that a nested directory can be constructed from a given path, even if some of the parents exist. Which makes me think that a restart would allow the creation of the full path as long as the partitioner is deterministic with respect to the record (i.e. for a given record it always creates the same path). Upon a restart the export of a record will be retried, so the partitioner could possibly reattempt the full creation of the path. But that's an early assumption from my side." Let's suppose the partitioner derives the directory /test1/test2 for a given record, and the connector crashes during the directory creation, so hdfs ends up with only the /test1/ folder. There is no guarantee that the kafka record which was not committed fully by the connector would still exist in kafka after a restart/recovery of the connector. So there could be scenarios where these partial directories remain partial forever and require an additional manual intervention to clean them up or complete their creation in hdfs, without which we see issues in hive table partitions if enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269061#comment-17269061 ] kaushik srinivas commented on KAFKA-12164: -- Hi [~kkonstantine] We have also created a ticket on the hdfs sink connector side. Below is the ticket: [https://github.com/confluentinc/kafka-connect-hdfs/issues/538] We have captured a more detailed analysis there, but there are no responses in that forum either. -Kaushik -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268532#comment-17268532 ] kaushik srinivas commented on KAFKA-12164: -- [~ijuma] This is an issue with a lot of impact. Can you provide some suggestions?
[jira] [Commented] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in hdfs file system.
[ https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262536#comment-17262536 ] kaushik srinivas commented on KAFKA-12164: -- [~ijuma] Need your inputs.
[jira] [Created] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in hdfs file system.
kaushik srinivas created KAFKA-12164: Summary: Issue when kafka connect worker pod restarts during creation of nested partition directories in hdfs file system. Key: KAFKA-12164 URL: https://issues.apache.org/jira/browse/KAFKA-12164 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: kaushik srinivas
[jira] [Commented] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.
[ https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160930#comment-17160930 ] kaushik srinivas commented on KAFKA-10278: -- Hi [~bbyrne], Does this mean that in 2.5, properties supplied via the static file also get displayed when we run kafka-configs with the --all option? Thanks, Kaushik
> kafka-configs does not show the current properties of running kafka broker upon describe.
>
> Key: KAFKA-10278
> URL: https://issues.apache.org/jira/browse/KAFKA-10278
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: kaushik srinivas
> Priority: Critical
> Labels: kafka-configs.sh
>
> kafka-configs.sh does not list the properties (read-only/per-broker/cluster-wide) with which the kafka broker is currently running; the command returns nothing. Only those properties added or updated via kafka-configs.sh are listed by the describe command.
> bash-4.2$ env -i bin/kafka-configs.sh --bootstrap-server kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers --entity-default --describe
> Default config for brokers in the cluster are:
> log.cleaner.threads=2 sensitive=false synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
[jira] [Commented] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.
[ https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159129#comment-17159129 ] kaushik srinivas commented on KAFKA-10278: -- Hi [~rajinisiva...@gmail.com] [~rsivaram], [~ijuma] Is this a known issue already ?
[jira] [Created] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.
kaushik srinivas created KAFKA-10278: Summary: kafka-configs does not show the current properties of running kafka broker upon describe. Key: KAFKA-10278 URL: https://issues.apache.org/jira/browse/KAFKA-10278 Project: Kafka Issue Type: Bug Affects Versions: 2.4.1 Reporter: kaushik srinivas
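For reference, the behaviour discussed in the comments changed in Kafka 2.5: kafka-configs.sh gained an --all flag for --describe that prints every effective broker config, including statically configured values and defaults, not just the dynamically set ones. A sketch, where the bootstrap address and broker id are placeholders:

```shell
# Requires Kafka 2.5+; localhost:9092 and broker id 1 are placeholders.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 1 \
  --describe --all
```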
[jira] [Commented] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
[ https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098850#comment-17098850 ] kaushik srinivas commented on KAFKA-9933: - Hi [~ijuma] We see that when SASL (GSSAPI) and SSL are both turned on, only the kerberos principal name is validated for the ACLs. Even when the ssl-certificate-based principal name was not given ACLs, clients work fine as long as ACLs are created for the kerberos-based name. Is this behavior expected, i.e. is SASL given precedence for ACL checks when both SSL and SASL are enabled? Thanks a lot in advance, kaushik
> Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
>
> Key: KAFKA-9933
> URL: https://issues.apache.org/jira/browse/KAFKA-9933
> Project: Kafka
> Issue Type: Bug
> Components: security
> Affects Versions: 2.4.1
> Reporter: kaushik srinivas
> Priority: Critical
>
> Hello,
> The documentation on the usage of the authorizer does not state which principal is used when the listener protocol is SASL + SSL (SASL_SSL).
> If kerberos and ssl are enabled together, will authorization be based on the kerberos principal names or on the ssl certificate DN names?
> No document covers this part of the use case; it needs an information and documentation update.
> Thanks,
> Kaushik.
[jira] [Commented] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
[ https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098824#comment-17098824 ] kaushik srinivas commented on KAFKA-9933: - Hi [~ijuma] Need your inputs. We enable both SASL and SSL, with both the GSSAPI and PLAIN mechanisms, in varied deployments. Thanks, kaushik
[jira] [Updated] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
[ https://issues.apache.org/jira/browse/KAFKA-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-9933: Issue Type: Bug (was: Improvement)
[jira] [Commented] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT
[ https://issues.apache.org/jira/browse/KAFKA-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098822#comment-17098822 ] kaushik srinivas commented on KAFKA-9934: - Hi [~ijuma] Need your inputs on this case. Thanks, kaushik
> Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT
>
> Key: KAFKA-9934
> URL: https://issues.apache.org/jira/browse/KAFKA-9934
> Project: Kafka
> Issue Type: Bug
> Components: security
> Affects Versions: 2.4.1
> Reporter: kaushik srinivas
> Priority: Critical
>
> Need information on the case where the listener protocol in kafka is PLAINTEXT.
> Does authorization apply when the protocol is PLAINTEXT? If so, what is used as the principal name for the authorization ACL validations?
> There is no doc describing this case; it needs an info and doc update.
> Thanks,
> kaushik.
[jira] [Updated] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT
[ https://issues.apache.org/jira/browse/KAFKA-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-9934: Issue Type: Bug (was: Improvement)
[jira] [Created] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT
kaushik srinivas created KAFKA-9934: --- Summary: Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT Key: KAFKA-9934 URL: https://issues.apache.org/jira/browse/KAFKA-9934 Project: Kafka Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: kaushik srinivas
[jira] [Created] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
kaushik srinivas created KAFKA-9933: --- Summary: Need doc update on the AclAuthorizer when SASL_SSL is the protocol used. Key: KAFKA-9933 URL: https://issues.apache.org/jira/browse/KAFKA-9933 Project: Kafka Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: kaushik srinivas
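On the question this ticket raises: with a SASL_SSL listener, Kafka's default principal builder derives the KafkaPrincipal from the SASL layer (for GSSAPI, the Kerberos name), not from the TLS certificate DN, which is consistent with the behaviour reported in the comments. A sketch of granting ACLs against such a principal follows; the principal, broker address, topic, and properties file names are all placeholders:

```shell
# "User:appclient" stands in for the Kerberos-derived principal;
# broker1:9093, admin-sasl-ssl.properties, and test-topic are examples.
bin/kafka-acls.sh --bootstrap-server broker1:9093 \
  --command-config admin-sasl-ssl.properties \
  --add --allow-principal "User:appclient" \
  --operation Read --operation Write \
  --topic test-topic
```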
[jira] [Resolved] (KAFKA-8622) Snappy Compression Not Working
[ https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas resolved KAFKA-8622. - Resolution: Resolved
> Snappy Compression Not Working
>
> Key: KAFKA-8622
> URL: https://issues.apache.org/jira/browse/KAFKA-8622
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Affects Versions: 2.3.0, 2.2.1
> Reporter: Kunal Verma
> Assignee: kaushik srinivas
> Priority: Major
>
> I am trying to produce a message on the broker with compression enabled as snappy.
> Environment: Brokers [Kafka cluster] are hosted on CentOS 7.
> I have downloaded the latest version (2.3.0 & 2.2.1) tar, extracted it, and moved it to /opt/kafka-
> I have started the broker with the standard configuration.
> In my producer service (written in java), I have enabled snappy compression:
> props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
> While sending records to the broker, I am getting the following error:
> org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request
> While investigating further at the broker end, I found the following error in the log:
> logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
> --
> [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing append operation on partition test-bulk-1 (kafka.server.ReplicaManager)
> java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
> at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435)
> at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168)
> at
org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293) > at > org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538) > at > org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327) > at > scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73) > at kafka.log.Log.liftedTree1$1(Log.scala:881) > at kafka.log.Log.$anonfun$append$2(Log.scala:868) > at kafka.log.Log.maybeHandleIOException(Log.scala:2065) > at kafka.log.Log.append(Log.scala:850) > at kafka.log.Log.appendAsLeader(Log.scala:819) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) > at 
kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) > at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at
[jira] [Commented] (KAFKA-8622) Snappy Compression Not Working
[ https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876874#comment-16876874 ] kaushik srinivas commented on KAFKA-8622: - This can be due to a permission issue for the snappy native library. Try creating a folder with write permissions and add the flag below in the kafka-run-class.sh script to point to the newly created directory: -Dorg.xerial.snappy.tempdir=/home/cloud-user/GENERIC_FRAMEWORK/kaushik/confluent-5.1.2/tmp It should then work fine. Let us know the test results.
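The workaround suggested in the comment above can be expressed as a startup-time setting. The underlying cause is typically that /tmp is mounted noexec, so snappy-java cannot map its native library from there. A sketch, where the directory path is an example, not a required location:

```shell
# Example path; any directory the broker user can write to and execute
# from works. Point snappy-java's temp dir there via KAFKA_OPTS.
mkdir -p /opt/kafka/snappy-tmp
export KAFKA_OPTS="$KAFKA_OPTS -Dorg.xerial.snappy.tempdir=/opt/kafka/snappy-tmp"
bin/kafka-server-start.sh config/server.properties
```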
[jira] [Assigned] (KAFKA-8622) Snappy Compression Not Working
[ https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas reassigned KAFKA-8622: --- Assignee: kaushik srinivas > Snappy Compression Not Working > -- > > Key: KAFKA-8622 > URL: https://issues.apache.org/jira/browse/KAFKA-8622 > Project: Kafka > Issue Type: Bug > Components: compression >Affects Versions: 2.3.0, 2.2.1 >Reporter: Kunal Verma >Assignee: kaushik srinivas >Priority: Major > > I am trying to produce a message on the broker with compression enabled as > snappy. > Environment : > Brokers[Kafka-cluster] are hosted on Centos 7 > I have download the latest version (2.3.0 & 2.2.1) tar, extract it and moved > to /opt/kafka- > I have executed the broker with standard configuration. > In my producer service(written in java), I have enabled snappy compression. > props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); > > so while sending record on broker, I am getting following errors: > org.apache.kafka.common.errors.UnknownServerException: The server experienced > an unexpected error when processing the request > > While investing further at broker end I got following error in log > > logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: > /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: > /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: > failed to map segment from shared object: Operation not permitted > -- > > [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing > append operation on partition test-bulk-1 (kafka.server.ReplicaManager) > java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy > at > org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435) > at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466) > at java.io.DataInputStream.readByte(DataInputStream.java:265) > at 
org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168) > at > org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293) > at > org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538) > at > org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327) > at > scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73) > at kafka.log.Log.liftedTree1$1(Log.scala:881) > at kafka.log.Log.$anonfun$append$2(Log.scala:868) > at kafka.log.Log.maybeHandleIOException(Log.scala:2065) > at kafka.log.Log.append(Log.scala:850) > at kafka.log.Log.appendAsLeader(Log.scala:819) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772) > at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) > at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) > at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at
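For what it's worth, the UnsatisfiedLinkError above ("failed to map segment from shared object: Operation not permitted") is the classic symptom of /tmp being mounted noexec on hardened hosts, which prevents snappy-java from executing the native library it extracts there. A possible workaround sketch, assuming an exec-permitted scratch directory is available (the path below is only an example, not a recommendation):

```shell
# /tmp mounted noexec blocks snappy-java's extracted libsnappyjava.so.
# Point the JVM at an exec-permitted scratch directory instead.
SNAPPY_TMP="$HOME/kafka-snappy-tmp"
mkdir -p "$SNAPPY_TMP"
# org.xerial.snappy.tempdir controls where snappy-java extracts its native lib;
# java.io.tmpdir is set too as a belt-and-braces measure.
export KAFKA_OPTS="-Dorg.xerial.snappy.tempdir=$SNAPPY_TMP -Djava.io.tmpdir=$SNAPPY_TMP"
```

The broker has to be restarted with KAFKA_OPTS set for this to take effect; alternatively, remounting /tmp with exec permission achieves the same end.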
[jira] [Comment Edited] (KAFKA-8485) Kafka connect worker does not respond/function when kafka broker goes down.
[ https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858366#comment-16858366 ] kaushik srinivas edited comment on KAFKA-8485 at 6/7/19 7:47 AM: - Hi [~kkonstantine] I see weird behavior across different runs. One of the suggestions from the kafka dev group pointed to this bug: [https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941] So we patched our Connect with the pull request opened for that issue. But the patch did not help; we still see stability issues with Connect when the kafka broker goes down. Now we have seen the below stack trace a couple of times, wherein after a restart of the kafka brokers, GET requests work fine, but adding or deleting a connector sometimes produces this trace: {code:java} org.apache.kafka.connect.errors.ConnectException: Error writing connector configuration to Kafka at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331) ... 10 more {code} By the time the POST or DELETE connector requests are run, the kafka brokers are completely back up and running. We also verified the leader assignment of the kafka connect internal topics. Not much activity shows up in the connect logs even in debug mode, apart from the above trace. The issue is very consistent: even without data activity, once the kafka brokers are restarted (2 out of 3 brokers), we see this behavior. was (Author: kaushik srinivas): Hi [~kkonstantine] I see weird behavior across different runs. One of the suggestion from the kafka dev group pointed out to this bug, [https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941] So we have done a patch to our connect with the pull request being opened for this issue. But no help from this patch, we still see stability issues with connect when kafka broker goes down. 
Now we have seen below stack trace couple of times, where in after restart of kafka brokers, GET requests would work fine but when tried to add a connector or delete a connector, sometimes we have seen this trace {code:java} org.apache.kafka.connect.errors.ConnectException: Error writing connector configuration to Kafka at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331) ... 10 more {code} Not much activity is happening in the connect logs even in debug mode except from seeing the above trace. Issue is very consistent, even without data activity once kafka brokers are
[jira] [Updated] (KAFKA-8485) Kafka connect worker does not respond/function when kafka broker goes down.
[ https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-8485: Summary: Kafka connect worker does not respond/function when kafka broker goes down. (was: Kafka connect worker does not respond when kafka broker goes down with data streaming in progress) > Kafka connect worker does not respond/function when kafka broker goes down. > --- > > Key: KAFKA-8485 > URL: https://issues.apache.org/jira/browse/KAFKA-8485 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.2.1 >Reporter: kaushik srinivas >Priority: Blocker > Labels: performance > > Below is the scenario > 3 kafka brokers are up and running. > Kafka connect worker is installed and a hdfs sink connector is added. > Data streaming started, data being flushed out of kafka into hdfs. > Topic is created with 3 partitons, one leader on all the three brokers. > Now, 2 kafka brokers are restarted. Partition re balance happens. > Now we observe, kafka connect does not respond. REST API keeps timing out. > Nothing useful is being logged at the connect logs as well. > Only way to get out of this situation currently is to restart the kafka > connect worker and things gets normal. > > The same scenario when tried without data being in progress, works fine. > Meaning REST API does not get into timing out state. > making this issue a blocker, because of the impact due to kafka broker > restart. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress
[ https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858367#comment-16858367 ] kaushik srinivas commented on KAFKA-8485: - I have a query on this scenario: KafkaBasedLog.java uses a producer to update the connect config topic within kafka, and this producer object is created at runtime bootup. If the leader broker of the config topic goes down after everything has been working fine, would the metadata held by this producer become invalid, causing us to hit issues with this producer? > Kafka connect worker does not respond when kafka broker goes down with data > streaming in progress > - > > Key: KAFKA-8485 > URL: https://issues.apache.org/jira/browse/KAFKA-8485 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.2.1 >Reporter: kaushik srinivas >Priority: Blocker > Labels: performance > > Below is the scenario > 3 kafka brokers are up and running. > Kafka connect worker is installed and a hdfs sink connector is added. > Data streaming started, data being flushed out of kafka into hdfs. > Topic is created with 3 partitons, one leader on all the three brokers. > Now, 2 kafka brokers are restarted. Partition re balance happens. > Now we observe, kafka connect does not respond. REST API keeps timing out. > Nothing useful is being logged at the connect logs as well. > Only way to get out of this situation currently is to restart the kafka > connect worker and things gets normal. > > The same scenario when tried without data being in progress, works fine. > Meaning REST API does not get into timing out state. > making this issue a blocker, because of the impact due to kafka broker > restart. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
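On the metadata question: the Java producer refreshes its cluster metadata by itself, both on leader-change errors and at a periodic interval, so the object created at bootup does not hold leader metadata forever. The knobs that bound how long a config write can block on stale metadata are ordinary producer configs, which a Connect worker can pass through with the `producer.` prefix. A minimal sketch of such worker-level overrides (the values are examples, not Connect's defaults):

```java
import java.util.Properties;

public class ConnectProducerTuning {

    // Sketch of "producer."-prefixed overrides in a Connect worker config.
    // These bound how long the internal KafkaBasedLog producer can sit on
    // stale metadata or block inside send(); values are illustrative only.
    public static Properties workerOverrides() {
        Properties p = new Properties();
        p.setProperty("producer.metadata.max.age.ms", "30000"); // force a metadata refresh at least every 30s
        p.setProperty("producer.max.block.ms", "60000");        // cap how long send() waits for metadata/buffer
        p.setProperty("producer.reconnect.backoff.ms", "500");  // retry broker connections quickly after restarts
        return p;
    }

    public static void main(String[] args) {
        System.out.println(workerOverrides().getProperty("producer.max.block.ms"));
    }
}
```
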
[jira] [Commented] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress
[ https://issues.apache.org/jira/browse/KAFKA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858366#comment-16858366 ] kaushik srinivas commented on KAFKA-8485: - Hi [~kkonstantine] I see weird behavior across different runs. One of the suggestions from the kafka dev group pointed to this bug: [https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-7941] So we patched our Connect with the pull request opened for that issue. But the patch did not help; we still see stability issues with Connect when the kafka broker goes down. Now we have seen the below stack trace a couple of times, wherein after a restart of the kafka brokers, GET requests work fine, but adding or deleting a connector sometimes produces this trace: {code:java} org.apache.kafka.connect.errors.ConnectException: Error writing connector configuration to Kafka at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:334) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.putConnectorConfig(KafkaConfigBackingStore.java:303) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:555) at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:535) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:271) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:220) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.TimeoutException: Timed out waiting for future at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:73) at org.apache.kafka.connect.storage.KafkaConfigBackingStore.updateConnectorConfig(KafkaConfigBackingStore.java:331) ... 10 more {code} Not much activity shows up in the connect logs even in debug mode, apart from the above trace. The issue is very consistent: even without data activity, once the kafka brokers are restarted (2 out of 3 brokers), we see this behavior. > Kafka connect worker does not respond when kafka broker goes down with data > streaming in progress > - > > Key: KAFKA-8485 > URL: https://issues.apache.org/jira/browse/KAFKA-8485 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.2.1 >Reporter: kaushik srinivas >Priority: Blocker > Labels: performance > > Below is the scenario > 3 kafka brokers are up and running. > Kafka connect worker is installed and a hdfs sink connector is added. > Data streaming started, data being flushed out of kafka into hdfs. > Topic is created with 3 partitons, one leader on all the three brokers. > Now, 2 kafka brokers are restarted. Partition re balance happens. > Now we observe, kafka connect does not respond. REST API keeps timing out. > Nothing useful is being logged at the connect logs as well. > Only way to get out of this situation currently is to restart the kafka > connect worker and things gets normal. > > The same scenario when tried without data being in progress, works fine. > Meaning REST API does not get into timing out state. > making this issue a blocker, because of the impact due to kafka broker > restart. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress
kaushik srinivas created KAFKA-8485: --- Summary: Kafka connect worker does not respond when kafka broker goes down with data streaming in progress Key: KAFKA-8485 URL: https://issues.apache.org/jira/browse/KAFKA-8485 Project: Kafka Issue Type: Bug Components: KafkaConnect Affects Versions: 2.2.1 Reporter: kaushik srinivas Below is the scenario: 3 kafka brokers are up and running. A kafka connect worker is installed and an hdfs sink connector is added. Data streaming is started, with data being flushed out of kafka into hdfs. The topic is created with 3 partitions, with a leader on each of the three brokers. Now, 2 kafka brokers are restarted, and partition rebalance happens. We then observe that kafka connect does not respond; the REST API keeps timing out. Nothing useful is logged in the connect logs either. The only way out of this situation currently is to restart the kafka connect worker, after which things return to normal. The same scenario, when tried without data flowing, works fine, i.e. the REST API does not get into the timing-out state. Making this issue a blocker because of the impact of a kafka broker restart. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect
[ https://issues.apache.org/jira/browse/KAFKA-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831549#comment-16831549 ] kaushik srinivas commented on KAFKA-8314: - We have added a check in connect/api/src/main/java/org/apache/kafka/connect/data/SchemaProjector.java, as below, to exclude non-mandatory schema metadata in our local build of kafka. {code:java}
private static void checkMaybeCompatible(Schema source, Schema target) {
    if (source.type() != target.type() && !isPromotable(source.type(), target.type())) {
        throw new SchemaProjectorException("Schema type mismatch. source type: " + source.type() + " and target type: " + target.type());
    } else if (!Objects.equals(source.name(), target.name())) {
        throw new SchemaProjectorException("Schema name mismatch. source name: " + source.name() + " and target name: " + target.name());
    } else if (!Objects.equals(source.parameters(), target.parameters())) {
        // Create copies of the source and target schema parameter maps, guarding against null parameters.
        Map<String, String> sourceParameters = new HashMap<>(source.parameters() == null ? Collections.<String, String>emptyMap() : source.parameters());
        Map<String, String> targetParameters = new HashMap<>(target.parameters() == null ? Collections.<String, String>emptyMap() : target.parameters());
        // Exclude doc, aliases and namespace fields from the schema compatibility check.
        sourceParameters.remove("connect.record.doc");
        sourceParameters.remove("connect.record.aliases");
        sourceParameters.remove("connect.record.namespace");
        targetParameters.remove("connect.record.doc");
        targetParameters.remove("connect.record.aliases");
        targetParameters.remove("connect.record.namespace");
        // Throw SchemaProjectorException if the comparison fails for the remaining parameters.
        if (!Objects.equals(sourceParameters, targetParameters)) {
            throw new SchemaProjectorException("Schema parameters not equal. source parameters: " + source.parameters() + " and target parameters: " + target.parameters());
        }
    }
}
{code} > Managing the doc field in case of schema projection - kafka connect > --- > > Key: KAFKA-8314 > URL: https://issues.apache.org/jira/browse/KAFKA-8314 > Project: Kafka > Issue Type: Bug >Reporter: kaushik srinivas >Priority: Major > > Doc field change in the schema while writing to hdfs using hdfs sink > connector via connect framework would cause failures in schema projection. > > java.lang.RuntimeException: > org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters > not equal. source parameters: \{connect.record.doc=xxx} and target > parameters: \{connect.record.doc=yyy} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
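The exclusion logic in the patch above can be exercised standalone; a minimal sketch follows, where the class and method names are mine (not Kafka's) and only the three `connect.record.*` keys come from the patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class ParamCompatibility {

    // Metadata-only keys (as in the patch above) that should not affect
    // schema projection compatibility.
    private static final Set<String> IGNORED = new HashSet<>(Arrays.asList(
            "connect.record.doc", "connect.record.aliases", "connect.record.namespace"));

    // True when the two parameter maps agree on everything except the
    // ignored metadata keys. Null maps are treated as empty.
    public static boolean parametersCompatible(Map<String, String> source, Map<String, String> target) {
        Map<String, String> s = new HashMap<>(source == null ? Collections.<String, String>emptyMap() : source);
        Map<String, String> t = new HashMap<>(target == null ? Collections.<String, String>emptyMap() : target);
        s.keySet().removeAll(IGNORED);
        t.keySet().removeAll(IGNORED);
        return Objects.equals(s, t);
    }

    public static void main(String[] args) {
        // The failing case from the issue: only the doc differs, so the
        // schemas should now be considered compatible.
        System.out.println(parametersCompatible(
                Collections.singletonMap("connect.record.doc", "xxx"),
                Collections.singletonMap("connect.record.doc", "yyy"))); // true
    }
}
```
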
[jira] [Created] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect
kaushik srinivas created KAFKA-8314: --- Summary: Managing the doc field in case of schema projection - kafka connect Key: KAFKA-8314 URL: https://issues.apache.org/jira/browse/KAFKA-8314 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas A doc field change in the schema, while writing to hdfs using the hdfs sink connector via the connect framework, causes failures in schema projection: java.lang.RuntimeException: org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters not equal. source parameters: \{connect.record.doc=xxx} and target parameters: \{connect.record.doc=yyy} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7667) Need synchronous records send support for kafka performance producer java application.
kaushik srinivas created KAFKA-7667: --- Summary: Need synchronous records send support for kafka performance producer java application. Key: KAFKA-7667 URL: https://issues.apache.org/jira/browse/KAFKA-7667 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.0.0 Reporter: kaushik srinivas Assignee: kaushik srinivas Why synchronous send support for the performance producer? The ProducerPerformance java application is used for load-testing kafka brokers. Load testing involves replicating very high-throughput record flows into kafka brokers, and many producers in the field use the synchronous way of sending data, i.e. blocking until the message has been written completely to all of the min.insync.replicas brokers. Asynchronous sends satisfy only the first requirement of loading the kafka brokers. Synchronous send support would help in performance-tuning the kafka brokers when producers are deployed with "acks": all, "min.insync.replicas" equal to the replication factor, and synchronous sending. Throughput degradation happens with synchronous producers, and this would help in tuning resources for replication in the kafka brokers. Also, benchmarks could be made from the kafka producer's perspective with synchronous sends, to tune the producer's resources appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
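The synchronous pattern described above is just blocking on the future returned by each send; with the real client it would be `producer.send(record).get()`. A self-contained sketch of that pattern, using a plain executor-backed future as a stand-in for KafkaProducer (names and the fake "ack" are mine, not Kafka's):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncSendSketch {

    // Stand-in for KafkaProducer.send(record), which returns a Future of the
    // broker-acknowledged metadata. Here the "ack" just echoes the sequence number.
    static Future<Long> asyncSend(ExecutorService io, long seq) {
        return io.submit(() -> seq);
    }

    // Synchronous variant: block on each future before issuing the next send,
    // so at most one record is in flight and every send waits for its full ack.
    public static long sendAllSync(int count) throws Exception {
        ExecutorService io = Executors.newSingleThreadExecutor();
        long lastAck = -1;
        for (long seq = 0; seq < count; seq++) {
            lastAck = asyncSend(io, seq).get(); // .get() is what makes the send synchronous
        }
        io.shutdown();
        return lastAck;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAllSync(5)); // 5 sends, last ack is 4
    }
}
```

In ProducerPerformance itself the change would amount to optionally calling `.get()` on the future that `send()` already returns, trading throughput for per-record durability confirmation.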
[jira] [Assigned] (KAFKA-7659) dummy test
[ https://issues.apache.org/jira/browse/KAFKA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas reassigned KAFKA-7659: --- Assignee: kaushik srinivas > dummy test > -- > > Key: KAFKA-7659 > URL: https://issues.apache.org/jira/browse/KAFKA-7659 > Project: Kafka > Issue Type: Improvement > Components: tools >Reporter: kaushik srinivas >Assignee: kaushik srinivas >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7659) dummy test
kaushik srinivas created KAFKA-7659: --- Summary: dummy test Key: KAFKA-7659 URL: https://issues.apache.org/jira/browse/KAFKA-7659 Project: Kafka Issue Type: Improvement Components: tools Reporter: kaushik srinivas -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7171) KafkaPerformanceProducer crashes with same transaction id.
kaushik srinivas created KAFKA-7171: --- Summary: KafkaPerformanceProducer crashes with same transaction id. Key: KAFKA-7171 URL: https://issues.apache.org/jira/browse/KAFKA-7171 Project: Kafka Issue Type: Bug Affects Versions: 1.0.1 Reporter: kaushik srinivas Running the org.apache.kafka.tools.ProducerPerformance code to performance-test the kafka cluster. As a trial, the cluster has only one broker and zookeeper, with 12GB of heap space. Running 6 producers on 3 machines with the same transaction id (2 producers on each node). Below are the settings for each producer: kafka-run-class org.apache.kafka.tools.ProducerPerformance --print-metrics --topic perf1 --num-records 9223372036854 --throughput 25 --record-size 200 --producer-props bootstrap.servers=localhost:9092 buffer.memory=524288000 batch.size=524288 For 2 hours all producers run fine; then suddenly the throughput of all producers increases 3 times, and 4 producers on 2 nodes crash with the below exceptions: [2018-07-16 14:00:18,744] ERROR Error executing user-provided callback on message for topic-partition perf1-6: (org.apache.kafka.clients.producer.internals.RecordBatch) java.lang.ClassCastException: org.apache.kafka.clients.producer.internals.RecordAccumulator$RecordAppendResult cannot be cast to org.apache.kafka.clients.producer.internals.RecordBatch$Thunk at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:99) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:312) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:272) at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57) at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:358) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) at 
java.lang.Thread.run(Thread.java:748) The first machine (2 producers) runs fine. Need some pointers on this issue. Queries: why does the throughput increase 3 times after 2 hours? Why are the other producers crashing? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6961) UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics.
kaushik srinivas created KAFKA-6961: --- Summary: UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics. Key: KAFKA-6961 URL: https://issues.apache.org/jira/browse/KAFKA-6961 Project: Kafka Issue Type: Bug Environment: kubernetes cluster kafka. Reporter: kaushik srinivas Attachments: k8s_NotLeaderForPartition.txt, k8s_replication_errors.txt Running kafka & zookeeper in a kubernetes cluster. No of brokers : 3 No of partitions per topic : 3 Creating a topic with 3 partitions, it looks like all the partitions are up. Below is a snapshot confirming the same: Topic:applestore PartitionCount:3 ReplicationFactor:3 Configs: Topic: applestore Partition: 0 Leader: 1001 Replicas: 1001,1003,1002 Isr: 1001,1003,1002 Topic: applestore Partition: 1 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1002,1001,1003 Topic: applestore Partition: 2 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003,1002,1001 But in the brokers, as soon as the topics are created, the below stack traces appear: error 1: [2018-05-28 08:00:31,875] ERROR [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=7] Error for partition applestore-2 to broker 1003:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread) error 2 : [2018-05-28 00:43:20,993] ERROR [ReplicaFetcher replicaId=1003, leaderId=1001, fetcherId=0] Error for partition apple-6 to broker 1001:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread) When we try producing records to each specific partition, it works fine, and the log size across the replicated brokers appears to be equal, which means replication is happening fine. Attaching the two stack trace files. Why are these stack traces appearing? Can we ignore them if they are just noise? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6600) Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer.
kaushik srinivas created KAFKA-6600: --- Summary: Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer. Key: KAFKA-6600 URL: https://issues.apache.org/jira/browse/KAFKA-6600 Project: Kafka Issue Type: Bug Affects Versions: 0.10.0.1 Reporter: kaushik srinivas Below are the setup details: Kafka with 3 brokers (each broker with 10 cores and 32GB mem (12GB heap)). Created a topic with 120 partitions and replication factor 3. Throughput per broker is ~40k msgs/sec and bytes in ~8mb/sec. The flume kafka source is used as the consumer. Observations: When the replication factor is kept at 1, the bytes out and bytes in stop exactly at the same timestamp (i.e. when the producer to kafka is stopped). But when the replication factor is increased to 3, a time lag is observed in bytes out compared to bytes in. The flume kafka source is pulling data slowly, even though flume is configured with very high memory and cpu configurations. Tried increasing num.replica.fetchers from the default value 1 to 10, 20, 50 etc. and replica.fetch.max.bytes from the default 1MB to 10MB, 20MB, but no improvement was observed in the lag. Under-replicated partitions is observed to be zero using the replica manager metrics in jmx. The kafka brokers were monitored for cpu and memory; cpu is used at 3% of total cores max and memory at 4GB (32GB configured). The flume kafka source has overridden kafka consumer properties: max.partition.fetch.bytes is kept at the default 1MB and fetch.max.bytes at the default 52MB. The flume kafka source batch size is kept at the default value 1000. agent.sources..kafka.consumer.fetch.max.bytes = 10485760 agent.sources..kafka.consumer.max.partition.fetch.bytes = 10485760 agent.sources..batchSize = 1000 What more tuning is needed to reduce the lag between bytes in and bytes out at the kafka brokers with replication factor 3, or is there any configuration missed out? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
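For reference, the overrides discussed above written out in flume properties form. `agent1` and `kafkaSrc` are placeholder names (the real agent/source names are elided in the report), and the values are knobs to experiment with rather than recommendations:

```properties
# Placeholder agent/source names; values are illustrative tuning knobs only.
agent1.sources.kafkaSrc.batchSize = 10000
agent1.sources.kafkaSrc.kafka.consumer.max.partition.fetch.bytes = 10485760
agent1.sources.kafkaSrc.kafka.consumer.fetch.max.bytes = 52428800
agent1.sources.kafkaSrc.kafka.consumer.fetch.min.bytes = 1048576
```

Raising the flume-side batch size together with the consumer fetch sizes lets each poll drain more data per round trip, which is usually where a slow-pulling consumer gains the most.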
[jira] [Created] (KAFKA-6356) UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1.
kaushik srinivas created KAFKA-6356: --- Summary: UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1. Key: KAFKA-6356 URL: https://issues.apache.org/jira/browse/KAFKA-6356 Project: Kafka Issue Type: Bug Affects Versions: 0.10.1.0 Environment: Cent OS 7.2, HDD : 2Tb, CPUs: 56 cores, RAM : 256GB Reporter: kaushik srinivas Attachments: configs.txt, stderr_b0, stderr_b1, stderr_b2, stdout_b0, stdout_b1, stdout_b2, topic_description, topic_offsets Facing issues in a kafka topic with partitions and a replication factor of 3. Config used : No of partitions : 20 replication factor : 3 No of brokers : 3 Memory for broker : 32GB Heap for broker : 12GB A producer is run to produce data to the 20 partitions of a single topic. But we observed that for partitions whose leader is one particular broker (broker-1), the offsets are never incremented, and we also see 0MB log files on the broker disk. Seeing the below errors in the brokers: error 1: [2017-12-13 07:11:11,191] ERROR [ReplicaFetcherThread-0-2], Error for partition [test2,5] to broker 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread) error 2: [2017-12-11 12:19:41,599] ERROR [ReplicaFetcherThread-0-2], Error for partition [test1,13] to broker 2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread) Attaching, 1. error and std out files of all the brokers. 2. kafka config used. 3. offsets and topic description. Retention bytes was kept at -1 and the retention period at 96 hours. 
But we still observe some of the log files being deleted at the broker; from the logs: [2017-12-11 12:20:20,586] INFO Deleting index /var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12/.timeindex (kafka.log.TimeIndex) [2017-12-11 12:20:20,587] INFO Deleted log for partition [test1,12] in /var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12. (kafka.log.LogManager) We expect the logs to never be deleted if retention bytes is set to -1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
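One hedged observation on the expectation above: size-based and time-based retention are independent triggers in kafka, so retention bytes = -1 disables only the size check, while the 96-hour retention period still deletes older segments on its own. If the intent is truly "never delete", both knobs have to be disabled; a sketch of the broker-level settings (-1 means unlimited for both):

```properties
# Disable both retention triggers (broker-level config names).
log.retention.bytes=-1   # no size-based segment deletion
log.retention.ms=-1      # no time-based segment deletion either
```

The same pair exists as per-topic overrides (retention.bytes / retention.ms), which may be easier to verify against the attached topic description.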
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260268#comment-16260268 ] kaushik srinivas commented on KAFKA-6165: - Yes. Can the OutOfMemoryError stack trace that is thrown be made to give more clarity about the root cause, i.e. the map count being exceeded in this case? -Kaushik > Kafka Brokers goes down with outOfMemoryError. > -- > > Key: KAFKA-6165 > URL: https://issues.apache.org/jira/browse/KAFKA-6165 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 > Environment: DCOS cluster with 4 agent nodes and 3 masters. > agent machine config : > RAM : 384 GB > DISK : 4TB >Reporter: kaushik srinivas > Attachments: config.json, kafkaServer-gc-agent06.7z, > kafkaServer-gc.log, kafkaServer-gc_agent03.log, kafkaServer-gc_agent04.log, > kafka_config.txt, map_counts_agent06, stderr_broker1.txt, stderr_broker2.txt, > stdout_broker1.txt, stdout_broker2.txt > > > Performance testing kafka with end to end pipe lines of, > Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1 > Kafka Data Producer -> kafka -> flume -> hdfs -- stream2 > stream1 kafka configs : > No of topics : 10 > No of partitions : 20 for all the topics > stream2 kafka configs : > No of topics : 10 > No of partitions : 20 for all the topics > Some important Kafka Configuration : > "BROKER_MEM": "32768"(32GB) > "BROKER_JAVA_HEAP": "16384"(16GB) > "BROKER_COUNT": "3" > "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB) > "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB) > "KAFKA_NUM_PARTITIONS": "20" > "BROKER_DISK_SIZE": "5000" (5GB) > "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB) > "KAFKA_LOG_RETENTION_BYTES": "50"(5GB) > Data Producer to kafka Throughput: > message rate : 5 lakhs messages/sec approx across all the 3 brokers and > topics/partitions. > message size : approx 300 to 400 bytes. 
> Issues observed with this configs: > Issue 1: > stack trace: > [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to > unrecoverable I/O error while handling produce request: > (kafka.server.ReplicaManager) > kafka.common.KafkaStorageException: I/O exception in append to log > 'store_sales-16' > at kafka.log.Log.append(Log.scala:349) > at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443) > at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240) > at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429) > at > kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407) > at > kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393) > at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330) > at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425) > at kafka.server.KafkaApis.handle(KafkaApis.scala:78) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.io.IOException: Map failed > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) > at > kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116) > at > kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106) > at > kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160) > at > kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160) > at > kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at
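The `Caused by: java.io.IOException: Map failed` at `sun.nio.ch.FileChannelImpl.map` points at the per-process memory-mapping limit (Linux `vm.max_map_count`, default 65530) rather than the Java heap, which matches the map-count diagnosis discussed in this thread. A back-of-envelope sketch of the expected mapping count, under the assumption (for illustration, not a measured figure) that each retained log segment keeps two index files (`.index` and `.timeindex`) memory-mapped, using the nominal values annotated in the description (2 streams x 10 topics = 20 topics, 20 partitions each, 5 GB retention, 50 MB segments) and ignoring replication:

```python
def estimated_map_count(topics, partitions_per_topic,
                        retention_bytes, segment_bytes,
                        mmaps_per_segment=2):
    """Rough upper bound on the index mmaps a broker holds for its logs.

    Assumes every retained segment keeps its .index and .timeindex
    memory-mapped; replica logs and OS/library mappings are ignored.
    """
    segments_per_partition = max(1, retention_bytes // segment_bytes)
    return topics * partitions_per_topic * segments_per_partition * mmaps_per_segment

# Nominal values as annotated in the issue description:
print(estimated_map_count(20, 20,
                          retention_bytes=5 * 1024**3,   # 5 GB retention
                          segment_bytes=50 * 1024**2))   # 50 MB segments
# -> 81600, already above the Linux default vm.max_map_count of 65530
```

Under these assumptions the broker would need roughly 81,600 mappings, so either raising `vm.max_map_count` or raising `log.segment.bytes` (fewer, larger segments, as tried later in this thread) would relieve the pressure.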
[jira] [Updated] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-6165:
------------------------------------
    Attachment: config.json
                kafkaServer-gc-agent06.7z
                map_counts_agent06

Map count monitored on one broker for 10 hrs, the GC log of a broker, and the latest Kafka config used.
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245321#comment-16245321 ]

kaushik srinivas commented on KAFKA-6165:
------------------------------------------

Hi Huxihx,

Thanks for the feedback. With,
"log_segment_bytes": 3
"log_retention_check_interval_ms": 180
"log_retention_bytes": "75"
"log_retention_hours": 48
we have now observed for 20 hours and the brokers have not crashed so far. The max map count does not seem to go beyond ~15000 now. We will monitor for some more time and share if there are failures.

In general we have the questions and observations below, and need your recommendations for tuning our setups:

1. log.segment.bytes and log.retention.bytes apply across all topics and partitions. In our case, a few topics have very high throughput and a few very low throughput. What is the recommended way to set these two parameters given the wide range of data rates across topics?

2. log.retention.check.interval.ms is now set to 30 minutes. Would there be any extra memory overhead on the brokers if this value were very low (highly frequent)? Since topics have different data ingestion rates, the log file generation rate is not the same across partitions. What is the best approach to decide this setting?

3. Observing the map counts of the Kafka process, we see that the max value on one of the brokers is ~15000 and drops to ~8000 over a span of 1.5 hours. Is this OK from the GC point of view, or can anything more be optimised here? Attaching the GC log file for one of the brokers.

4. Since it appears to be the log.segment.bytes config change that helped, is it OK to reduce the Java heap space back to 8 GB (that is where we started from)? What is the recommendation for the Java heap space config in cases like ours?

5. Also, from a user's point of view, is it possible to have clearer error stack traces that help to identify an OutOfMemoryError caused by the map limit being exceeded?
Attaching files:
Latest Kafka config: [^config.json]
Map counts monitored on one of the brokers for 10-15 hrs: [^map_counts_agent06]
GC log file from one of the brokers: [^kafkaServer-gc-agent06.7z]

Thanks in advance,
Kaushik
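The map-count monitoring described above can be scripted directly against procfs. A minimal Linux-only sketch (on Linux, `/proc/<pid>/maps` has one line per memory mapping, and `/proc/sys/vm/max_map_count` holds the per-process limit; the attachment presumably came from similar sampling):

```python
import os

def current_map_count(pid):
    """Number of memory mappings held by a process (one line per mapping)."""
    with open(f"/proc/{pid}/maps") as f:
        return sum(1 for _ in f)

def max_map_count():
    """System-wide per-process mapping limit (sysctl vm.max_map_count)."""
    with open("/proc/sys/vm/max_map_count") as f:
        return int(f.read())

if __name__ == "__main__":
    pid = os.getpid()  # substitute the broker's PID here
    print(f"{current_map_count(pid)} of {max_map_count()} mappings in use")
```

Sampling this for the broker PID on a cron or loop gives the same time series as the attached map-count data, and makes it easy to alert well before the limit is hit.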
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241745#comment-16241745 ]

kaushik srinivas commented on KAFKA-6165:
------------------------------------------

Hi huxihx,

Thanks for the feedback. We observed these issues even when the heap size was initially 8 GB. Do you think 8 GB is also a high value for our profile? Any recommended value to increase "vm.max_map_count" to?

We did not observe this issue when the throughput to Kafka was lower (messages/sec across all topics & partitions: 250k; bytes in/sec across all topics & partitions: approx 50 MB/sec). We started observing it with profiles like 600k messages/sec and approx 120 MB/sec across all topics & partitions.

Also find the GC logs of all three brokers attached:
broker 1: [^kafkaServer-gc.log]
broker 2: [^kafkaServer-gc_agent03.log]
broker 3: [^kafkaServer-gc_agent04.log]

Thanks & Regards,
-kaushik
[jira] [Updated] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kaushik srinivas updated KAFKA-6165:
------------------------------------
    Attachment: kafkaServer-gc.log
                kafkaServer-gc_agent03.log
                kafkaServer-gc_agent04.log

GC log files of the 3 Kafka brokers from when the OutOfMemoryError was occurring with high-throughput data coming into Kafka.
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239413#comment-16239413 ]

kaushik srinivas commented on KAFKA-6165:
------------------------------------------

Observed the number of open file descriptors over a period of time on one of the brokers:

cat /proc/sys/fs/file-nr
47488 0 39340038

Did not observe it exceeding the limit.
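The file-handle check above can be automated the same way as the map-count sampling. A small sketch (assumes the Linux `/proc/sys/fs/file-nr` format shown above: allocated handles, allocated-but-unused handles, and the system-wide maximum):

```python
def file_handle_usage(path="/proc/sys/fs/file-nr"):
    """Return (allocated, maximum) system file handles on Linux.

    /proc/sys/fs/file-nr holds three whitespace-separated counters,
    e.g. "47488 0 39340038" as in the output quoted above.
    """
    with open(path) as f:
        allocated, unused, maximum = (int(x) for x in f.read().split())
    return allocated, maximum

allocated, maximum = file_handle_usage()
print(f"{allocated} / {maximum} system file handles in use")
```

As the quoted numbers show, file descriptors were nowhere near the limit here, which is consistent with the failure being the mapping limit rather than handle exhaustion.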
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237555#comment-16237555 ]

kaushik srinivas commented on KAFKA-6165:
------------------------------------------

Tried with 12 GB of heap space. Observed the Kafka brokers crashing again with:

[2017-11-03 08:02:12,424] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data for store_sales-15 (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 'store_sales-15'
    at kafka.log.Log.append(Log.scala:349)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:130)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:159)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:141)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:141)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:138)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:138)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:136)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
    at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
    at kafka.log.TimeIndex.<init>(TimeIndex.scala:55)
    at kafka.log.LogSegment.<init>(LogSegment.scala:68)
    at kafka.log.Log.roll(Log.scala:776)
    at kafka.log.Log.maybeRoll(Log.scala:742)
    at kafka.log.Log.append(Log.scala:405)
    ... 16 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)

Observing 300k messages/sec on each broker (3 brokers) at the time of the broker crash.
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237358#comment-16237358 ]

kaushik srinivas commented on KAFKA-6165:
------------------------------------------

Sure, will try reducing the heap size to 12 GB. The initial config was an 8 GB heap, but then we observed OutOfMemory issues more frequently. It was actually consuming around 10 GB of heap, which is why the heap was increased to 16 GB.
[jira] [Commented] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
[ https://issues.apache.org/jira/browse/KAFKA-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237189#comment-16237189 ]

kaushik srinivas commented on KAFKA-6165:
-

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
[jira] [Created] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
kaushik srinivas created KAFKA-6165:
---

Summary: Kafka Brokers goes down with outOfMemoryError.
Key: KAFKA-6165
URL: https://issues.apache.org/jira/browse/KAFKA-6165
Project: Kafka
Issue Type: Bug
Affects Versions: 0.11.0.0
Environment: DCOS cluster with 4 agent nodes and 3 masters.
agent machine config:
RAM: 384 GB
DISK: 4 TB
Reporter: kaushik srinivas
Priority: Major
Attachments: kafka_config.txt, stderr_broker1.txt, stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt

Performance testing Kafka with end-to-end pipelines of:
Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
Kafka Data Producer -> kafka -> flume -> hdfs -- stream2

stream1 kafka configs:
No. of topics: 10
No. of partitions: 20 for all the topics

stream2 kafka configs:
No. of topics: 10
No. of partitions: 20 for all the topics

Some important Kafka configuration:
"BROKER_MEM": "32768" (32 GB)
"BROKER_JAVA_HEAP": "16384" (16 GB)
"BROKER_COUNT": "3"
"KAFKA_MESSAGE_MAX_BYTES": "112" (1 MB)
"KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576" (1 MB)
"KAFKA_NUM_PARTITIONS": "20"
"BROKER_DISK_SIZE": "5000" (5 GB)
"KAFKA_LOG_SEGMENT_BYTES": "5000" (50 MB)
"KAFKA_LOG_RETENTION_BYTES": "50" (5 GB)

Data Producer to Kafka throughput:
message rate: approx 500,000 (5 lakh) messages/sec across all 3 brokers and topics/partitions.
message size: approx 300 to 400 bytes.
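Note that several of the raw values quoted above disagree with their parenthetical annotations (for example, a log segment size of 5000 bytes is 5 KB, not 50 MB). A small illustrative sketch of that arithmetic, assuming the raw strings are what the brokers actually ran with (names mirror the report's env vars; this is not the reporter's actual config file):

```python
# Raw values as listed in the report, versus what the annotations claim.
raw = {
    "KAFKA_LOG_SEGMENT_BYTES": 5000,           # annotated "(50MB)"
    "KAFKA_LOG_RETENTION_BYTES": 50,           # annotated "(5GB)"
    "KAFKA_MESSAGE_MAX_BYTES": 112,            # annotated "(1MB)"
    "KAFKA_REPLICA_FETCH_MAX_BYTES": 1048576,  # annotated "(1MB)"
}
annotated = {
    "KAFKA_LOG_SEGMENT_BYTES": 50 * 1024**2,
    "KAFKA_LOG_RETENTION_BYTES": 5 * 1024**3,
    "KAFKA_MESSAGE_MAX_BYTES": 1 * 1024**2,
    "KAFKA_REPLICA_FETCH_MAX_BYTES": 1 * 1024**2,
}
for key in raw:
    status = "consistent" if raw[key] == annotated[key] else "MISMATCH"
    print(f"{key}: raw={raw[key]}, annotation implies {annotated[key]} -> {status}")
```

Only KAFKA_REPLICA_FETCH_MAX_BYTES matches its annotation; the segment, retention, and max-message sizes are all orders of magnitude smaller than the annotations suggest, which matters for the segment-count arithmetic below the stack traces.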
Issues observed with this config:

Issue 1:
stack trace:
[2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log 'store_sales-16'
    at kafka.log.Log.append(Log.scala:349)
    at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
    at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
    at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
    at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
    at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
    at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
    at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
    at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
    at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
    at kafka.log.Log.roll(Log.scala:771)
    at kafka.log.Log.maybeRoll(Log.scala:742)
    at kafka.log.Log.append(Log.scala:405)
    ... 22 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
    ... 34 more

Issue 2:
stack trace:
[2017-11-02 23:55:49,602] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data for catalog_sales-3
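For context on Issue 1: an OutOfMemoryError reading "Map failed" from FileChannelImpl.map typically means the process ran out of memory-mapping slots (on Linux, capped by vm.max_map_count, default 65530), not Java heap, and each Kafka log segment memory-maps its index files. A rough back-of-the-envelope estimate under the reported raw config (all figures are assumptions for illustration, not measurements):

```python
# Estimate how many mmap'd index files the reported settings imply.
SEGMENT_BYTES = 5000           # raw KAFKA_LOG_SEGMENT_BYTES (5 KB per segment)
RETENTION_BYTES = 5 * 1024**3  # assuming the "(5GB)" annotation was the intent
MAPS_PER_SEGMENT = 2           # offset index + time index, each memory-mapped
DEFAULT_MAX_MAP_COUNT = 65530  # Linux vm.max_map_count default

segments_per_partition = RETENTION_BYTES // SEGMENT_BYTES
maps_per_partition = segments_per_partition * MAPS_PER_SEGMENT

print(segments_per_partition)  # over a million tiny segments per partition
print(maps_per_partition > DEFAULT_MAX_MAP_COUNT)  # True
```

If the 5 KB segment size is real, a single partition alone would exhaust the default map limit many times over, which would explain "Map failed" regardless of how large the heap is; raising the heap would not help, while fixing log.segment.bytes (or raising vm.max_map_count) would.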