[jira] [Created] (KAFKA-17101) Mirror maker internal topics cleanup policy changes to 'delete' from 'compact'
kaushik srinivas created KAFKA-17101: Summary: Mirror maker internal topics cleanup policy changes to 'delete' from 'compact' Key: KAFKA-17101 URL: https://issues.apache.org/jira/browse/KAFKA-17101 Project: Kafka Issue Type: Bug Affects Versions: 3.6.1, 3.5.1, 3.4.1 Reporter: kaushik srinivas Scenario/setup details: Kafka cluster 1: 3 replicas, Kafka cluster 2: 3 replicas, MM1 moving data from cluster 1 to cluster 2, MM2 moving data from cluster 2 to cluster 1. Sometimes, after a reboot of Kafka cluster 1 and the MM1 instance, we observe MM failing to come up with the exception below:
{code:java}
{"message":"DistributedHerder-connect-1-1 - org.apache.kafka.connect.runtime.distributed.DistributedHerder - [Worker clientId=connect-1, groupId=site1-mm2] Uncaught exception in herder work thread, exiting: "}}
org.apache.kafka.common.config.ConfigException: Topic 'mm2-offsets.site1.internal' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
{code}
Once the topic is altered to a cleanup policy of 'compact', MM works just fine. This happens sporadically on our setups and across a variety of scenarios. We have not yet been able to identify exact reproduction steps. -- This message was sent by Atlassian Jira (v8.20.10#820010)
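The alter that recovers the setup is along these lines; a minimal sketch, assuming the broker address is a placeholder and the topic name is the one from the error above:
{code:bash}
# Set the cleanup policy of the MM2 offsets topic back to 'compact'
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal \
  --alter --add-config cleanup.policy=compact

# Verify the change took effect
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name mm2-offsets.site1.internal --describe
{code}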
[jira] [Created] (KAFKA-17079) scoverage plugin not found in maven repo, version 1.9.3
kaushik srinivas created KAFKA-17079: Summary: scoverage plugin not found in maven repo, version 1.9.3 Key: KAFKA-17079 URL: https://issues.apache.org/jira/browse/KAFKA-17079 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas Team, Creating coverage reports for kafka core module. Below issue is seen. Branch used is 3.6 * Exception is: org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':core:compileScoverageScala'. at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:38) at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77) at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55) at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52) at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:204) at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:199) at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66) at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59) at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157) at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59) at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:53) at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:73) at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52) at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:42) at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:337) at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:324) at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:317) at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:303) at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:463) at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:380) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47) Caused by: org.gradle.api.internal.artifacts.ivyservice.DefaultLenientConfiguration$ArtifactResolveException: Could not resolve all files for configuration ':core:scoverage'. 
at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.mapFailure(DefaultConfiguration.java:1769) at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.access$3400(DefaultConfiguration.java:176) at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration$DefaultResolutionHost.mapFailure(DefaultConfiguration.java:2496) at org.gradle.api.internal.artifacts.configurations.ResolutionHost.rethrowFailure(ResolutionHost.java:30) at org.gradle.api.internal.artifacts.configurations.ResolutionBackedFileCollection.visitContents(ResolutionBackedFileCollection.java:74) at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366) at org.gradle.api.internal.artifacts.configurations.DefaultConfiguration.visitContents(DefaultConfiguration.java:574) at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366) at org.gradle.api.internal.file.CompositeFileCollection.lambda$visitContents$0(CompositeFileCollection.java:133) at org.gradle.api.internal.file.UnionFileCollection.visitChildren(UnionFileCollection.java:81) at org.gradle.api.internal.file.CompositeFileCollection.visitContents(CompositeFileCollection.java:133) at org.gradle.api.internal.file.AbstractFileCollection.visitStructure(AbstractFileCollection.java:366) at org.gradle.api.internal.file.CompositeF
[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
kaushik srinivas created KAFKA-16370: Summary: offline rollback procedure from kraft mode to zookeeper mode. Key: KAFKA-16370 URL: https://issues.apache.org/jira/browse/KAFKA-16370 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas From the KIP [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]: Finalizing the Migration: Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK. *_The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded._* Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationStateRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs. *At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).* We are trying out the same in a Kafka cluster which was migrated from ZooKeeper to KRaft mode. We observe that the rollback is possible by deleting the "/controller" node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done. The above snippet indicates that a rollback from KRaft to ZK after the migration is finalized is still possible using an offline method. Are there any already-known steps to be followed as part of this offline rollback method? From our experience, we currently know of the step "delete the /controller node in ZooKeeper to force a ZooKeeper-based broker to be elected as the new controller after the rollback is done". Are there any additional steps/actions apart from this? -- This message was sent by Atlassian Jira (v8.20.10#820010)
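For reference, the deletion step mentioned above can be performed with the ZooKeeper shell that ships with Kafka; a minimal sketch, assuming the ZooKeeper address is a placeholder and the KRaft controllers are already stopped:
{code:bash}
# Check whether a controller is currently registered in ZooKeeper
bin/zookeeper-shell.sh zk-host:2181 get /controller

# Delete the /controller znode so that a ZooKeeper-based broker can win the
# controller election once the brokers are restarted in ZooKeeper mode
bin/zookeeper-shell.sh zk-host:2181 delete /controller
{code}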
[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.
kaushik srinivas created KAFKA-16360: Summary: Release plan of 3.x kafka releases. Key: KAFKA-16360 URL: https://issues.apache.org/jira/browse/KAFKA-16360 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas The KIP [https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline] mentions: Kafka 3.7 - January 2024 - Final release with ZK mode. But we see in Jira that some tickets are marked for the 3.8 release. Does Apache plan to continue making 3.x releases that support both ZooKeeper and KRaft, independent of the pure-KRaft 4.x releases? If yes, how many more releases can be expected on the 3.x release line? -- This message was sent by Atlassian Jira (v8.20.10#820010)
RE: Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
Hi Team, Any help on this query? Thanks. From: Kaushik Srinivas (Nokia) Sent: Tuesday, August 22, 2023 10:27 AM To: dev@kafka.apache.org Subject: Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases. Hi Team, Referring to the upgrade documentation for Apache Kafka: https://kafka.apache.org/34/documentation.html#upgrade_3_4_0 There is confusion with respect to the below statements from the linked section of the Apache docs. "If you are upgrading from a version prior to 2.1.x, please see the note below about the change to the schema used to store consumer offsets. Once you have changed the inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a version prior to 2.1." The above statement says that a downgrade to a version prior to 2.1 is not possible once inter.broker.protocol.version has been upgraded to the latest version. But there is another statement in the documentation, in point 4: "Restart the brokers one by one for the new protocol version to take effect. Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version." These two statements are repeated across many prior Kafka releases and are confusing. Below are the questions: 1. Is a downgrade impossible to "any" older version of Kafka once inter.broker.protocol.version is updated to the latest version, OR are downgrades impossible only to versions "<2.1"? 2. Suppose one takes an approach similar to the upgrade for the downgrade path, i.e. first downgrade inter.broker.protocol.version to the previous version, then downgrade the Kafka software/code to the previous release revision. Does a downgrade work with this approach? Can these two questions be documented if the answers are already known? Regards, Kaushik.
Need more clarity in documentation for upgrade/downgrade procedures and limitations across releases.
Hi Team, Referring to the upgrade documentation for Apache Kafka: https://kafka.apache.org/34/documentation.html#upgrade_3_4_0 There is confusion with respect to the below statements from the linked section of the Apache docs. "If you are upgrading from a version prior to 2.1.x, please see the note below about the change to the schema used to store consumer offsets. Once you have changed the inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a version prior to 2.1." The above statement says that a downgrade to a version prior to 2.1 is not possible once inter.broker.protocol.version has been upgraded to the latest version. But there is another statement in the documentation, in point 4: "Restart the brokers one by one for the new protocol version to take effect. Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version." These two statements are repeated across many prior Kafka releases and are confusing. Below are the questions: 1. Is a downgrade impossible to "any" older version of Kafka once inter.broker.protocol.version is updated to the latest version, OR are downgrades impossible only to versions "<2.1"? 2. Suppose one takes an approach similar to the upgrade for the downgrade path, i.e. first downgrade inter.broker.protocol.version to the previous version, then downgrade the Kafka software/code to the previous release revision. Does a downgrade work with this approach? Can these two questions be documented if the answers are already known? Regards, Kaushik.
[jira] [Created] (KAFKA-15223) Need clarity in documentation for upgrade/downgrade across releases.
kaushik srinivas created KAFKA-15223: Summary: Need clarity in documentation for upgrade/downgrade across releases. Key: KAFKA-15223 URL: https://issues.apache.org/jira/browse/KAFKA-15223 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas Referring to the upgrade documentation for Apache Kafka: [https://kafka.apache.org/34/documentation.html#upgrade_3_4_0] There is confusion with respect to the below statements from the linked section of the Apache docs. "If you are upgrading from a version prior to 2.1.x, please see the note below about the change to the schema used to store consumer offsets. *Once you have changed the inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a version prior to 2.1."* The above statement says that a downgrade to a version prior to 2.1 is not possible once inter.broker.protocol.version has been upgraded to the latest version. But there is another statement in the documentation, in *point 4*: "Restart the brokers one by one for the new protocol version to take effect. *Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version.*" These two statements are repeated across many prior Kafka releases and are confusing. Below are the questions: # Is a downgrade impossible to *"any"* older version of Kafka once the inter.broker.protocol.version is updated to the latest version, *OR* are downgrades impossible only to versions *"<2.1"*? # Suppose one takes an approach similar to the upgrade for the downgrade path, i.e. first downgrade inter.broker.protocol.version to the previous version, then downgrade the Kafka software/code to the previous release revision. Does a downgrade work with this approach? Can these two questions be documented if the answers are already known? Maybe a downgrade guide can be created too, similar to the existing upgrade guide? -- This message was sent by Atlassian Jira (v8.20.10#820010)
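For context, the version pinning that both statements refer to lives in the broker configuration; a minimal sketch of the relevant server.properties entries during a rolling upgrade, with illustrative version values only:
{code:bash}
# server.properties (per broker), step 1 of a rolling upgrade:
# pin the inter-broker protocol to the currently deployed version
inter.broker.protocol.version=3.3
log.message.format.version=3.3

# step 2, after all brokers run the new binaries:
# bump the protocol version and restart brokers one by one
inter.broker.protocol.version=3.4
{code}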
[jira] [Created] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the ecosystem - KIP500
kaushik srinivas created KAFKA-13578: Summary: Need clarity on the bridge release version of kafka without zookeeper in the ecosystem - KIP500 Key: KAFKA-13578 URL: https://issues.apache.org/jira/browse/KAFKA-13578 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas We see from KIP-500 [https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] that the below statement needs some clarity: *We will be able to upgrade from any version of Kafka to this bridge release, and from the bridge release to a post-ZK release. When upgrading from an earlier release to a post-ZK release, the upgrade must be done in two steps: first, you must upgrade to the bridge release, and then you must upgrade to the post-ZK release.* Which Apache Kafka version is referred to as the bridge release in this context? Can Apache Kafka 3.0.0 be considered the bridge release even though it is not production-ready to run without ZooKeeper? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas resolved KAFKA-13177. -- Resolution: Not A Bug > partition failures and fewer shrink but a lot of isr expansions with > increased num.replica.fetchers in kafka brokers > > > Key: KAFKA-13177 > URL: https://issues.apache.org/jira/browse/KAFKA-13177 > Project: Kafka > Issue Type: Bug > Reporter: kaushik srinivas >Assignee: kaushik srinivas >Priority: Major > > Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s) > topics : 15, partitions each : 15 replication factor 3, min.insync.replicas > : 2 > producers running with acks : all > Initially the num.replica.fetchers was set to 1 (default) and we observed > very frequent ISR shrinks and expansions. So the setups were tuned with a > higher value of 4. > Once after this change was done, we see below behavior and warning msgs in > broker logs > # Over a period of 2 days, there are around 10 shrinks corresponding to 10 > partitions, but around 700 ISR expansions corresponding to almost all > partitions in the cluster(approx 50 to 60 partitions). > # we see frequent warn msg of partitions being marked as failure in the same > time span. Below is the trace --> {"type":"log", "host":"ww", > "level":"WARN", "neid":"kafka-ww", "system":"kafka", > "time":"2021-08-03T20:09:15.340", "timezone":"UTC", > "log":{"message":"ReplicaFetcherThread-2-1003 - > kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, > leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* > > We see the above behavior continuously after increasing the > num.replica.fetchers to 4 from 1. We did increase this to improve the > replication performance and hence reduce the ISR shrinks. > But we see this strange behavior after the change. What would the above trace > indicate and is marking partitions as failed just a WARN msgs and handled by > kafka or is it something to worry about ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13178) frequent network_exception trace at kafka producer.
kaushik srinivas created KAFKA-13178: Summary: frequent network_exception trace at kafka producer. Key: KAFKA-13178 URL: https://issues.apache.org/jira/browse/KAFKA-13178 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Running a 3-node kafka cluster (4 cores and 4Gi memory on k8s). topics : 15, partitions each : 15, replication factor : 3, min.insync.replicas : 2. The producer is running with acks : "all". We see frequent failures at the kafka producer with the below trace: {"host":"ww","level":"WARN","log":{"classname":"org.apache.kafka.clients.producer.internals.Sender:595","message":"[Producer clientId=producer-1] Got error produce response with correlation id 2646 on topic-partition *-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION","stacktrace":"","threadname":"kafka-producer-network-thread | producer-1"},"time":"2021-08-04T02:22:20.529Z","timezone":"UTC","type":"log","system":"w","systemid":"3"} What could be the possible reasons for the above trace? -- This message was sent by Atlassian Jira (v8.3.4#803005)
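For reference, these are the producer-side retry and timeout properties that seem most relevant when NETWORK_EXCEPTION retries show up; the values below are illustrative only, not the deployed settings:
{code:bash}
# producer.properties (illustrative values)
acks=all
retries=2147483647
retry.backoff.ms=100
request.timeout.ms=30000
delivery.timeout.ms=120000
{code}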
[jira] [Created] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers
kaushik srinivas created KAFKA-13177: Summary: partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers Key: KAFKA-13177 URL: https://issues.apache.org/jira/browse/KAFKA-13177 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Installing a 3-node kafka broker cluster (4 core cpu and 4Gi memory on k8s). topics : 15, partitions each : 15, replication factor 3, min.insync.replicas : 2. Producers are running with acks : all. Initially num.replica.fetchers was set to 1 (default) and we observed very frequent ISR shrinks and expansions, so the setups were tuned to a higher value of 4. After this change was made, we see the below behavior and warning messages in the broker logs: # Over a period of 2 days, there are around 10 shrinks corresponding to 10 partitions, but around 700 ISR expansions corresponding to almost all partitions in the cluster (approx 50 to 60 partitions). # We see frequent warning messages of partitions being marked as failed in the same time span. Below is the trace --> {"type":"log", "host":"ww", "level":"WARN", "neid":"kafka-ww", "system":"kafka", "time":"2021-08-03T20:09:15.340", "timezone":"UTC", "log":{"message":"ReplicaFetcherThread-2-1003 - kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}* We see this behavior continuously after increasing num.replica.fetchers from 1 to 4. We made the increase to improve replication performance and hence reduce the ISR shrinks, but we see this strange behavior after the change. What does the above trace indicate, and is marking partitions as failed just a WARN message handled by kafka, or is it something to worry about? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13176) frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions.
kaushik srinivas created KAFKA-13176: Summary: frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions. Key: KAFKA-13176 URL: https://issues.apache.org/jira/browse/KAFKA-13176 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas Running a 3-node kafka cluster (2.3.x kafka) with 4 cores of cpu and 4Gi of memory in a k8s environment. num.replica.fetchers is configured to 1 (default value). There are around 15 topics in the cluster and all of them receive a very low rate of records/sec (less than 100 per second in most cases). All the topics have more than 10 partitions and 3 replicas each. min.insync.replicas is set to 2, and producers are run with the acks level set to 'all'. We constantly observe ISR shrinks and expansions for almost every topic partition, usually separated by around 6 to 8 seconds. During these shrinks and expansions we see a lot of request timeouts at the kafka producer side for these topics. Are there any known configuration items we can use to overcome this? We are confused by continuous ISR shrinks and expansions on topics with very low throughput. -- This message was sent by Atlassian Jira (v8.3.4#803005)
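For reference, the broker-side knobs that appear relevant while investigating the shrinks; the values are illustrative only and would need to be validated against the 2.3.x defaults:
{code:bash}
# server.properties (illustrative values)
# how long a follower may lag before being dropped from the ISR
replica.lag.time.max.ms=30000
# number of fetcher threads replicating from each source broker
num.replica.fetchers=1
# broker-to-ZooKeeper session timeout; session expiries can also cause ISR churn
zookeeper.session.timeout.ms=18000
{code}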
[jira] [Created] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.
kaushik srinivas created KAFKA-12858: Summary: dynamically update the ssl certificates of kafka connect worker without restarting connect process. Key: KAFKA-12858 URL: https://issues.apache.org/jira/browse/KAFKA-12858 Project: Kafka Issue Type: Improvement Reporter: kaushik srinivas Hi, We are trying to update the ssl certificates of a kafka connect worker which are due for expiry. Is there any way to dynamically update the ssl certificates of the connect worker, as is possible for kafka brokers using the kafka-configs.sh script? If not, what is the recommended way to update the ssl certificates of a kafka connect worker without disrupting the existing traffic? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12855) Update ssl certificates of kafka connect worker runtime without restarting the worker process.
kaushik srinivas created KAFKA-12855: Summary: Update ssl certificates of kafka connect worker runtime without restarting the worker process. Key: KAFKA-12855 URL: https://issues.apache.org/jira/browse/KAFKA-12855 Project: Kafka Issue Type: Improvement Components: KafkaConnect Reporter: kaushik srinivas Is there a way to update the ssl certificates of a kafka connect worker dynamically, similar to the kafka-configs script for kafka? Or is the only way to update the certificates to restart the worker processes? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.
kaushik srinivas created KAFKA-12534: Summary: kafka-configs does not work with ssl enabled kafka broker. Key: KAFKA-12534 URL: https://issues.apache.org/jira/browse/KAFKA-12534 Project: Kafka Issue Type: Bug Affects Versions: 2.6.1 Reporter: kaushik srinivas We are trying to change the trust store password on the fly using the kafka-configs script on an ssl-enabled kafka broker. Below is the command used: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx' But we see the below error in the broker logs when the command is run: {"type":"log", "host":"kf-2-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", "time":"2021-03-23T12:14:40.055", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] Failed authentication with /127.0.0.1 (SSL handshake failed)"}} How can one configure ssl certs for the kafka-configs script so that the ssl handshake succeeds in this case? Note: we are using a single listener, i.e. SSL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
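A sketch of what we are attempting, assuming the tool's --command-config option is the right mechanism for passing client-side SSL settings (paths and passwords are placeholders):
{code:bash}
# client-ssl.properties (admin client settings for the tool):
#   security.protocol=SSL
#   ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
#   ssl.truststore.password=xxx

kafka-configs.sh --bootstrap-server localhost:9092 \
  --command-config client-ssl.properties \
  --entity-type brokers --entity-name 1001 \
  --alter --add-config 'ssl.truststore.password=xxx'
{code}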
[jira] [Created] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12530: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12530 URL: https://issues.apache.org/jira/browse/KAFKA-12530 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas We are trying to use the kafka-configs script to modify the sasl jaas configurations, but are unable to do so. Command used: ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"test\" \n password=\"test\"; \n };' error: requirement failed: Invalid entity config: all configs to be added must be in the format "key=val". Command 2: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=[username=test,password=test]' Output: the command does not return, but the kafka broker logs the below errors: DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}} {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}} We have the below questions: 1. If one installs a kafka broker with a SASL mechanism and wants to change the SASL jaas config via the kafka-configs script, how is it supposed to be done? Does kafka-configs need client credentials to do the same? 2. Can anyone point us to example commands for kafka-configs to alter the sasl.jaas.config property of a kafka broker? We do not see any documentation or examples for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
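A sketch of the listener-scoped form being experimented with, assuming the per-listener, per-mechanism key is the right target for dynamic JAAS updates and that the tool needs its own SASL client credentials via --command-config (listener name, mechanism, and credentials are placeholders):
{code:bash}
# admin-sasl.properties (client settings for the tool):
#   security.protocol=SASL_PLAINTEXT
#   sasl.mechanism=PLAIN
#   sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin-secret";

kafka-configs.sh --bootstrap-server localhost:9092 \
  --command-config admin-sasl.properties \
  --entity-type brokers --entity-name 59 --alter \
  --add-config 'listener.name.sasl_plaintext.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test";'
{code}
Note that the dynamic value is only the login module entry plus its options, not a full "KafkaServer { ... }" JAAS section.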
[jira] [Created] (KAFKA-12529) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12529: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12529 URL: https://issues.apache.org/jira/browse/KAFKA-12529 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas We are trying to use the kafka-configs script to modify the sasl jaas configurations, but are unable to do so. Command used: ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"test\" \n password=\"test\"; \n };' error: requirement failed: Invalid entity config: all configs to be added must be in the format "key=val". Command 2: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=[username=test,password=test]' Output: the command does not return, but the kafka broker logs the below errors: DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}} {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}} We have the below questions: 1. If one installs a kafka broker with a SASL mechanism and wants to change the SASL jaas config via the kafka-configs script, how is it supposed to be done? Does kafka-configs need client credentials to do the same? 2. Can anyone point us to example commands for kafka-configs to alter the sasl.jaas.config property of a kafka broker? We do not see any documentation or examples for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12528) kafka-configs.sh does not work while changing the sasl jaas configurations.
kaushik srinivas created KAFKA-12528: Summary: kafka-configs.sh does not work while changing the sasl jaas configurations. Key: KAFKA-12528 URL: https://issues.apache.org/jira/browse/KAFKA-12528 Project: Kafka Issue Type: Bug Components: admin, core Reporter: kaushik srinivas We are trying to modify the sasl jaas configurations for the kafka broker at runtime using the dynamic config update functionality of the kafka-configs.sh script, but we are unable to get it working. Below is our command: ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"test\" \n password=\"test\"; \n };' The command exits with the error: requirement failed: Invalid entity config: all configs to be added must be in the format "key=val". We also tried the below format: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 59 --alter --add-config 'sasl.jaas.config=[username=test,password=test]' The command does not return, but the kafka broker logs print the below error messages: org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}} {"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}} 1. If one has SASL enabled with a single listener, how is one supposed to change the sasl credentials using this command? 2. Can anyone point us to some example commands for modifying the sasl jaas configurations? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12164) Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system.
kaushik srinivas created KAFKA-12164: Summary: Issue when kafka connect worker pod restarts, during creation of nested partition directories in hdfs file system. Key: KAFKA-12164 URL: https://issues.apache.org/jira/browse/KAFKA-12164 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: kaushik srinivas In our production labs an issue is observed; below is the sequence of events. # The hdfs connector is added to the connect worker. # The hdfs connector creates folders in hdfs, e.g. /test1=1/test2=2/, based on a custom partitioner. Here test1 and test2 are two separate nested directories derived from multiple fields in the record using the custom partitioner. # The kafka connect hdfs connector uses the call fs.mkdirs(new Path(filename)) to create the directories in the hdfs file system. ref: [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java] The important thing to note is that if mkdirs() is a non-atomic operation (i.e. it can result in partial execution if interrupted), then suppose the first directory, test1, is created and, just before the creation of test2 in hdfs, the connect worker pod is restarted. The hdfs file system will then be left with partially created partition folders from the restart time frame. So we might end up with conditions in hdfs such as: /test1=0/test2=0/ /test1=1/ /test1=2/test2=2 /test1=3/test2=3 The second partition has a missing directory in it. And if hive integration is enabled, hive metastore exceptions will occur, since a partition expected by the hive table is missing for a few partitions in hdfs. *This can occur for any connector with an ongoing non-atomic operation when a restart is triggered on the kafka connect worker pod. It will result in partially completed states in the system and may prevent the connector from continuing its operation.* *This is a very critical issue and needs some attention on ways of handling it.* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.
kaushik srinivas created KAFKA-10278: Summary: kafka-configs does not show the current properties of running kafka broker upon describe. Key: KAFKA-10278 URL: https://issues.apache.org/jira/browse/KAFKA-10278 Project: Kafka Issue Type: Bug Affects Versions: 2.4.1 Reporter: kaushik srinivas kafka-configs.sh does not list the properties (read-only/per-broker/cluster-wide) with which the kafka broker is currently running; the command returns nothing. Only those properties added or updated via kafka-configs.sh are listed by the describe command. bash-4.2$ env -i bin/kafka-configs.sh --bootstrap-server kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers --entity-default --describe Default config for brokers in the cluster are: log.cleaner.threads=2 sensitive=false synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT
kaushik srinivas created KAFKA-9934: --- Summary: Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT Key: KAFKA-9934 URL: https://issues.apache.org/jira/browse/KAFKA-9934 Project: Kafka Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: kaushik srinivas We need information on the case where the protocol is PLAINTEXT for listeners in kafka. Does authorization apply when the protocol is PLAINTEXT? If so, what is used as the principal name for the authorization ACL validations? There is no doc which describes this case; info and a doc update are needed for the same. Thanks, kaushik. -- This message was sent by Atlassian Jira (v8.3.4#803005)
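For illustration, the kind of ACL command one would expect to run if unauthenticated PLAINTEXT clients are mapped to the ANONYMOUS principal; that mapping is the assumption here, and is exactly what the documentation should confirm:
{code:bash}
# Grant read/write on topic 'test' to unauthenticated (PLAINTEXT) clients
kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:ANONYMOUS \
  --operation Read --operation Write --topic test
{code}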
[jira] [Created] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.
kaushik srinivas created KAFKA-9933: --- Summary: Need doc update on the AclAuthorizer when SASL_SSL is the protocol used. Key: KAFKA-9933 URL: https://issues.apache.org/jira/browse/KAFKA-9933 Project: Kafka Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: kaushik srinivas Hello, The documentation on the usage of the authorizer does not describe which principal is used when the protocol for the listener is SASL + SSL (SASL_SSL). Suppose kerberos and ssl are enabled together: will the authorization be based on the kerberos principal names or on the ssl certificate DN names? There is no document covering this part of the use case. This needs information and a documentation update. Thanks, Kaushik. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-8622) Snappy Compression Not Working
[ https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas resolved KAFKA-8622. - Resolution: Resolved > Snappy Compression Not Working > -- > > Key: KAFKA-8622 > URL: https://issues.apache.org/jira/browse/KAFKA-8622 > Project: Kafka > Issue Type: Bug > Components: compression >Affects Versions: 2.3.0, 2.2.1 >Reporter: Kunal Verma > Assignee: kaushik srinivas >Priority: Major > > I am trying to produce a message on the broker with compression enabled as > snappy. > Environment : > Brokers[Kafka-cluster] are hosted on Centos 7 > I have download the latest version (2.3.0 & 2.2.1) tar, extract it and moved > to /opt/kafka- > I have executed the broker with standard configuration. > In my producer service(written in java), I have enabled snappy compression. > props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); > > so while sending record on broker, I am getting following errors: > org.apache.kafka.common.errors.UnknownServerException: The server experienced > an unexpected error when processing the request > > While investing further at broker end I got following error in log > > logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: > /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: > /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: > failed to map segment from shared object: Operation not permitted > -- > > [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing > append operation on partition test-bulk-1 (kafka.server.ReplicaManager) > java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy > at > org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435) > at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466) > at java.io.DataInputStream.readByte(DataInputStream.java:265) > at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168) > at > org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293) > at > org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569) > at > org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538) > at > org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327) > at > scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269) > at > kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261) > at > 
kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73) > at kafka.log.Log.liftedTree1$1(Log.scala:881) > at kafka.log.Log.$anonfun$append$2(Log.scala:868) > at kafka.log.Log.maybeHandleIOException(Log.scala:2065) > at kafka.log.Log.append(Log.scala:850) > at kafka.log.Log.appendAsLeader(Log.scala:819) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) > at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) > at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:2
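A sketch of the workaround commonly used for this class of error, assuming the root cause is /tmp being mounted with the noexec option on the broker host so the bundled libsnappyjava.so cannot be loaded (paths are placeholders):
{code:bash}
# Point snappy-java at a temp directory that allows executing shared objects,
# then restart the broker
mkdir -p /opt/kafka/tmp
export KAFKA_OPTS="-Dorg.xerial.snappy.tempdir=/opt/kafka/tmp -Djava.io.tmpdir=/opt/kafka/tmp"
bin/kafka-server-start.sh config/server.properties
{code}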
[jira] [Created] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress
kaushik srinivas created KAFKA-8485: --- Summary: Kafka connect worker does not respond when kafka broker goes down with data streaming in progress Key: KAFKA-8485 URL: https://issues.apache.org/jira/browse/KAFKA-8485 Project: Kafka Issue Type: Bug Components: KafkaConnect Affects Versions: 2.2.1 Reporter: kaushik srinivas Below is the scenario: 3 kafka brokers are up and running. A kafka connect worker is installed and a hdfs sink connector is added. Data streaming is started, and data is being flushed out of kafka into hdfs. The topic is created with 3 partitions, with a leader on each of the three brokers. Now 2 kafka brokers are restarted and partition rebalance happens. We then observe that kafka connect does not respond; the REST API keeps timing out. Nothing useful is logged in the connect logs either. The only way out of this situation currently is to restart the kafka connect worker, after which things get back to normal. The same scenario, when tried without data streaming in progress, works fine, meaning the REST API does not get into the timing-out state. Marking this issue a blocker because of the impact of kafka broker restarts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect
kaushik srinivas created KAFKA-8314: --- Summary: Managing the doc field in case of schema projection - kafka connect Key: KAFKA-8314 URL: https://issues.apache.org/jira/browse/KAFKA-8314 Project: Kafka Issue Type: Bug Reporter: kaushik srinivas A doc field change in the schema while writing to hdfs using the hdfs sink connector via the connect framework causes failures in schema projection: java.lang.RuntimeException: org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters not equal. source parameters: {connect.record.doc=xxx} and target parameters: {connect.record.doc=yyy} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7667) Need synchronous records send support for kafka performance producer java application.
kaushik srinivas created KAFKA-7667: --- Summary: Need synchronous records send support for kafka performance producer java application. Key: KAFKA-7667 URL: https://issues.apache.org/jira/browse/KAFKA-7667 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.0.0 Reporter: kaushik srinivas Assignee: kaushik srinivas Why synchronous send support for the performance producer? The ProducerPerformance java application is used for load testing kafka brokers. Load testing involves replicating very high-throughput record streams flowing into kafka brokers, and many producers in the field use a synchronous way of sending data, i.e. blocking until the message has been written completely on all of the min.insync.replicas number of brokers. Asynchronous sends satisfy only the first requirement of loading the kafka brokers. This feature would help in performance-tuning the kafka brokers when producers are deployed with "acks": all, "min.insync.replicas" equal to the replication factor, and a synchronous way of sending. Throughput degradation happens with synchronous producers, and this would help in tuning resources for replication in the kafka brokers. Benchmarks could also be made from the kafka producer perspective with a synchronous way of sending records, to tune the kafka producer's resources appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Need to subscribe to mail list and get access to contribute to jira tickets
Hi, I need a subscription to the kafka mailing list, and also need to be able to assign jira tickets to myself. I have worked on a few pull requests and need to submit the code. I need support in getting the required permissions to assign the kafka jira tickets to myself. Thanks & Regards, kaushik
[jira] [Created] (KAFKA-7659) dummy test
kaushik srinivas created KAFKA-7659: --- Summary: dummy test Key: KAFKA-7659 URL: https://issues.apache.org/jira/browse/KAFKA-7659 Project: Kafka Issue Type: Improvement Components: tools Reporter: kaushik srinivas -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7171) KafkaPerformanceProducer crashes with same transaction id.
kaushik srinivas created KAFKA-7171: --- Summary: KafkaPerformanceProducer crashes with same transaction id. Key: KAFKA-7171 URL: https://issues.apache.org/jira/browse/KAFKA-7171 Project: Kafka Issue Type: Bug Affects Versions: 1.0.1 Reporter: kaushik srinivas Running the org.apache.kafka.tools.ProducerPerformance code to performance-test the kafka cluster. As a trial, the cluster has only one broker and zookeeper, with 12GB of heap space. Running 6 producers on 3 machines with the same transaction id (2 producers on each node). Below are the settings of each producer: kafka-run-class org.apache.kafka.tools.ProducerPerformance --print-metrics --topic perf1 --num-records 9223372036854 --throughput 25 --record-size 200 --producer-props bootstrap.servers=localhost:9092 buffer.memory=524288000 batch.size=524288 For 2 hours all producers run fine; then suddenly the throughput of all producers increases 3 times, and 4 producers on 2 nodes crash with the below exceptions: [2018-07-16 14:00:18,744] ERROR Error executing user-provided callback on message for topic-partition perf1-6: (org.apache.kafka.clients.producer.internals.RecordBatch) java.lang.ClassCastException: org.apache.kafka.clients.producer.internals.RecordAccumulator$RecordAppendResult cannot be cast to org.apache.kafka.clients.producer.internals.RecordBatch$Thunk at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:99) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:312) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:272) at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57) at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:358) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) at java.lang.Thread.run(Thread.java:748) The first machine (2 producers) runs fine. We need some pointers on this issue. Queries: Why does the throughput increase 3 times after 2 hours? Why do the other producers crash? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6961) UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics.
kaushik srinivas created KAFKA-6961: --- Summary: UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics. Key: KAFKA-6961 URL: https://issues.apache.org/jira/browse/KAFKA-6961 Project: Kafka Issue Type: Bug Environment: kubernetes cluster kafka. Reporter: kaushik srinivas Attachments: k8s_NotLeaderForPartition.txt, k8s_replication_errors.txt Running kafka & zookeeper in a kubernetes cluster. No of brokers: 3. No of partitions per topic: 3. Created a topic with 3 partitions, and it looks like all the partitions are up. Below is the snapshot to confirm the same: Topic:applestore PartitionCount:3 ReplicationFactor:3 Configs: Topic: applestore Partition: 0 Leader: 1001 Replicas: 1001,1003,1002 Isr: 1001,1003,1002 Topic: applestore Partition: 1 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1002,1001,1003 Topic: applestore Partition: 2 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003,1002,1001 But as soon as the topics are created, the below stack traces appear in the brokers: error 1: [2018-05-28 08:00:31,875] ERROR [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=7] Error for partition applestore-2 to broker 1003:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread) error 2: [2018-05-28 00:43:20,993] ERROR [ReplicaFetcher replicaId=1003, leaderId=1001, fetcherId=0] Error for partition apple-6 to broker 1001:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread) When we try producing records to each specific partition, it works fine, and the log size across the replicated brokers appears to be equal, which means replication is happening fine. Attaching the two stack trace files. Why are these stack traces appearing? Can we ignore them if they are only benign messages? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6600) Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer.
kaushik srinivas created KAFKA-6600: --- Summary: Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer. Key: KAFKA-6600 URL: https://issues.apache.org/jira/browse/KAFKA-6600 Project: Kafka Issue Type: Bug Affects Versions: 0.10.0.1 Reporter: kaushik srinivas Below are the setup details: Kafka with 3 brokers (each broker with 10 cores and 32GB mem (12GB heap)). Created a topic with 120 partitions and replication factor 3. Throughput per broker is ~40k msgs/sec and bytes in is ~8MB/sec. The flume kafka source is used as the consumer. Observations: When the replication factor is kept at 1, bytes out and bytes in stop at exactly the same timestamp (i.e. when the producer to kafka is stopped). But when the replication factor is increased to 3, there is a time lag observed in bytes out compared to bytes in; the flume kafka source is pulling data slowly, even though flume is configured with very high memory and cpu. Tried increasing num.replica.fetchers from the default value 1 to 10, 20, 50 etc. and replica.fetch.max.bytes from the default 1MB to 10MB and 20MB, but no improvement is observed in terms of the lag. Under-replicated partitions are observed to be zero using the replica manager metrics in JMX. The kafka brokers were monitored for cpu and memory: cpu is used at 3% of total cores max and memory at 4GB (32GB configured). The flume kafka source has overridden kafka consumer properties: max.partition.fetch.bytes is kept at the default 1MB and fetch.max.bytes at the default 52MB. The flume kafka source batch size is kept at the default value 1000. agent.sources..kafka.consumer.fetch.max.bytes = 10485760 agent.sources..kafka.consumer.max.partition.fetch.bytes = 10485760 agent.sources..batchSize = 1000 What more tuning is needed to reduce the lag between bytes in and bytes out at the kafka brokers with replication factor 3, or is there any configuration we have missed? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6356) UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1.
kaushik srinivas created KAFKA-6356: --- Summary: UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1. Key: KAFKA-6356 URL: https://issues.apache.org/jira/browse/KAFKA-6356 Project: Kafka Issue Type: Bug Affects Versions: 0.10.1.0 Environment: Cent OS 7.2, HDD : 2Tb, CPUs: 56 cores, RAM : 256GB Reporter: kaushik srinivas Attachments: configs.txt, stderr_b0, stderr_b1, stderr_b2, stdout_b0, stdout_b1, stdout_b2, topic_description, topic_offsets Facing issues with a kafka topic with partitions and a replication factor of 3. Config used: No of partitions: 20, replication factor: 3, No of brokers: 3, Memory for broker: 32GB, Heap for broker: 12GB. A producer is run to produce data to the 20 partitions of a single topic, but we observed that for partitions whose leader is one particular broker (broker-1), the offsets are never incremented, and we also see log files with 0MB size on the broker disk. We see the below errors in the brokers: error 1: 2017-12-13 07:11:11,191] ERROR [ReplicaFetcherThread-0-2], Error for partition [test2,5] to broker 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread) error 2: [2017-12-11 12:19:41,599] ERROR [ReplicaFetcherThread-0-2], Error for partition [test1,13] to broker 2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread) Attaching: 1. error and std out files of all the brokers. 2. the kafka config used. 3. offsets and topic description. Retention bytes was kept at -1 and the retention period at 96 hours, but we still observe some of the log files being deleted at the broker. From the logs: [2017-12-11 12:20:20,586] INFO Deleting index /var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12/.timeindex (kafka.log.TimeIndex) [2017-12-11 12:20:20,587] INFO Deleted log for partition [test1,12] in /var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12. (kafka.log.LogManager) We expect the logs to never be deleted if retention bytes is set to -1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
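For reference, the per-topic retention settings can be inspected and pinned along these lines; a sketch with placeholder addresses, where 345600000 ms corresponds to the 96-hour retention mentioned above:
{code:bash}
# Describe the topic and its per-topic overrides
kafka-topics.sh --zookeeper localhost:2181 --describe --topic test1

# Pin size-based retention to unlimited and time-based retention to 96 hours
kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test1 \
  --alter --add-config retention.bytes=-1,retention.ms=345600000
{code}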
[jira] [Created] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.
kaushik srinivas created KAFKA-6165: --- Summary: Kafka Brokers goes down with outOfMemoryError. Key: KAFKA-6165 URL: https://issues.apache.org/jira/browse/KAFKA-6165 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Environment: DCOS cluster with 4 agent nodes and 3 masters. agent machine config : RAM : 384 GB DISK : 4TB Reporter: kaushik srinivas Priority: Major Attachments: kafka_config.txt, stderr_broker1.txt, stderr_broker2.txt, stdout_broker1.txt, stdout_broker2.txt Performance testing kafka with end to end pipe lines of, Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1 Kafka Data Producer -> kafka -> flume -> hdfs -- stream2 stream1 kafka configs : No of topics : 10 No of partitions : 20 for all the topics stream2 kafka configs : No of topics : 10 No of partitions : 20 for all the topics Some important Kafka Configuration : "BROKER_MEM": "32768"(32GB) "BROKER_JAVA_HEAP": "16384"(16GB) "BROKER_COUNT": "3" "KAFKA_MESSAGE_MAX_BYTES": "112"(1MB) "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB) "KAFKA_NUM_PARTITIONS": "20" "BROKER_DISK_SIZE": "5000" (5GB) "KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB) "KAFKA_LOG_RETENTION_BYTES": "50"(5GB) Data Producer to kafka Throughput: message rate : 5 lakhs messages/sec approx across all the 3 brokers and topics/partitions. message size : approx 300 to 400 bytes. Issues observed with this configs: Issue 1: stack trace: [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager) kafka.common.KafkaStorageException: I/O exception in append to log 'store_sales-16' at kafka.log.Log.append(Log.scala:349) at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443) at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240) at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429) at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407) at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393) at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330) at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425) at kafka.server.KafkaApis.handle(KafkaApis.scala:78) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116) at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) at 
kafka.log.AbstractIndex.resize(AbstractIndex.scala:106) at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160) at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160) at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159) at kafka.log.Log.roll(Log.scala:771) at kafka.log.Log.maybeRoll(Log.scala:742) at kafka.log.Log.append(Log.scala:405) ... 22 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:93
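A sketch of the OS-level check commonly relevant for "java.lang.OutOfMemoryError: Map failed", assuming the number of memory-mapped index and segment files is hitting the kernel limit given the small segment size configured above (values and the broker PID are placeholders):
{code:bash}
# Check the current limit on memory-mapped areas per process
sysctl vm.max_map_count

# How many mappings the broker JVM currently holds
wc -l /proc/<broker-pid>/maps

# Raise the limit if the broker is close to it
sysctl -w vm.max_map_count=262144
{code}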