[jira] [Created] (KAFKA-16931) A transient REST failure to forward fenceZombie request leaves Connect Task in FAILED state
Edoardo Comar created KAFKA-16931: - Summary: A transient REST failure to forward fenceZombie request leaves Connect Task in FAILED state Key: KAFKA-16931 URL: https://issues.apache.org/jira/browse/KAFKA-16931 Project: Kafka Issue Type: Bug Components: connect Reporter: Edoardo Comar When Kafka Connect runs in exactly_once mode, a task restart will fence possible zombie tasks. This is achieved by forwarding the request to the leader worker using the REST protocol. At scale, in distributed mode, an HTTPS request may occasionally fail because of a networking glitch, reconfiguration, etc. Currently there is no attempt to retry the REST request; the task is left in a FAILED state and requires an external restart (via the REST API). Would this issue require a small KIP to introduce configuration entries to limit the number of retries, backoff times, etc.? -- This message was sent by Atlassian Jira (v8.20.10#820010)
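As a rough illustration of the bounded retry the reporter asks about, the sketch below wraps the forwarded request in a retry loop with exponential backoff. forwardFenceRequest(), MAX_RETRIES and INITIAL_BACKOFF_MS are hypothetical stand-ins (the values would presumably become worker configs via a KIP); this is not the actual Connect herder code.
{code:java}
import java.io.IOException;

public class FenceRetrySketch {
    // Illustrative values; a real change would likely expose these as worker configs.
    private static final int MAX_RETRIES = 5;
    private static final long INITIAL_BACKOFF_MS = 500;

    // Hypothetical stand-in for forwarding the zombie-fencing request to the leader over REST.
    static void forwardFenceRequest() throws IOException {
        throw new IOException("transient network glitch");
    }

    public static void main(String[] args) throws InterruptedException {
        long backoff = INITIAL_BACKOFF_MS;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                forwardFenceRequest();
                return; // fencing forwarded successfully
            } catch (IOException e) {
                if (attempt == MAX_RETRIES) {
                    // only now give up and let the task transition to FAILED
                    throw new RuntimeException("fencing failed after " + MAX_RETRIES + " attempts", e);
                }
                Thread.sleep(backoff);
                backoff *= 2; // exponential backoff between attempts
            }
        }
    }
}
{code}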
[jira] [Resolved] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced
[ https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-14657. --- Resolution: Duplicate duplicate of https://issues.apache.org/jira/browse/KAFKA-16570 > Admin.fenceProducers fails when Producer has ongoing transaction - but > Producer gets fenced > --- > > Key: KAFKA-14657 > URL: https://issues.apache.org/jira/browse/KAFKA-14657 > Project: Kafka > Issue Type: Bug > Components: admin >Reporter: Edoardo Comar >Assignee: Edoardo Comar >Priority: Major > Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java > > > Admin.fenceProducers() > fails with a ConcurrentTransactionsException if invoked when a Producer has a > transaction ongoing. > However, further attempts by that producer to produce fail with > InvalidProducerEpochException and the producer is not re-usable, > cannot abort/commit as it is fenced. > An InvalidProducerEpochException is also logged as error on the broker > [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing > append operation on partition topic-0 (kafka.server.ReplicaManager) > org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of > producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last > seen epoch > > Conversely, if Admin.fenceProducers() > is invoked while there is no open transaction, the call succeeds and further > attempts by that producer to produce fail with ProducerFenced. > see attached snippets > As the caller of Admin.fenceProducers() is likely unaware of the producers > state, the call should succeed regardless -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16488) fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication
[ https://issues.apache.org/jira/browse/KAFKA-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-16488. --- Resolution: Fixed > fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication > > > Key: KAFKA-16488 > URL: https://issues.apache.org/jira/browse/KAFKA-16488 > Project: Kafka > Issue Type: Test >Reporter: Chia-Ping Tsai >Priority: Major > > org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not > reset connector offsets. Error response: {"error_code":500,"message":"Failed > to perform zombie fencing for source connector prior to modifying offsets"} > at > app//org.apache.kafka.connect.util.clusters.EmbeddedConnect.resetConnectorOffsets(EmbeddedConnect.java:646) > at > app//org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.resetConnectorOffsets(EmbeddedConnectCluster.java:48) > at > app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.resetAllMirrorMakerConnectorOffsets(MirrorConnectorsIntegrationBaseTest.java:1063) > at > app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest.testReplication(MirrorConnectorsIntegrationExactlyOnceTest.java:90) > at > java.base@21.0.2/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base@21.0.2/java.lang.reflect.Method.invoke(Method.java:580) > at > app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) > at > app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) > at > 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) > at > app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) > at > app//org.junit.platform.engine.support.hi
[jira] [Reopened] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation
[ https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reopened KAFKA-15905: --- reopening for backporting to 3.7.1, to be confirmed > Restarts of MirrorCheckpointTask should not permanently interrupt offset > translation > > > Key: KAFKA-15905 > URL: https://issues.apache.org/jira/browse/KAFKA-15905 > Project: Kafka > Issue Type: Improvement > Components: mirrormaker > Affects Versions: 3.6.0 > Reporter: Greg Harris > Assignee: Edoardo Comar > Priority: Major > Fix For: 3.7.1, 3.8 > > > Executive summary: When the MirrorCheckpointTask restarts, it loses the state > of checkpointsPerConsumerGroup, which limits offset translation to records > mirrored after the latest restart. > For example, if 1000 records are mirrored and the OffsetSyncs are read by > MirrorCheckpointTask, the emitted checkpoints are cached, and translation can > happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more > records are mirrored, translation can happen at the ~1500th record, but no > longer at the ~500th record. > Context: > Before KAFKA-13659, MM2 made translation decisions based on the > incompletely-initialized OffsetSyncStore, and the checkpoint could appear to > go backwards temporarily during restarts. To fix this, we forced the > OffsetSyncStore to initialize completely before translation could take place, > ensuring that the latest OffsetSync had been read, and thus providing the > most accurate translation. > Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. > Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to > allow for translation of earlier offsets. This came with the caveat that the > cache's sparseness allowed translations to go backwards permanently. To > prevent this behavior, a cache of the latest Checkpoints was kept in the > MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset > translation remained restricted to the fully-initialized OffsetSyncStore. > Effectively, the MirrorCheckpointTask ensures that it translates based on an > OffsetSync emitted during its lifetime, to ensure that no previous > MirrorCheckpointTask emitted a later sync. If we can read the checkpoints > emitted by previous generations of MirrorCheckpointTask, we can still ensure > that checkpoints are monotonic, while allowing translation further back in > history. -- This message was sent by Atlassian Jira (v8.20.10#820010)
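A rough sketch of the recovery idea described above (re-seeding the checkpoint cache from checkpoints emitted by previous task generations), assuming the checkpoint topic name source.checkpoints.internal and an illustrative bootstrap address. The serialized checkpoints are kept opaque here; the real fix lives inside MirrorCheckpointTask and would deserialize them with MM2's internal Checkpoint class.
{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.time.Duration;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CheckpointReloadSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9192");   // target cluster (assumed address)
        props.put("group.id", "checkpoint-reload-sketch");
        props.put("auto.offset.reset", "earliest");

        // latest serialized checkpoint per (group, topic-partition) key, kept opaque here
        Map<String, byte[]> latestCheckpointByKey = new HashMap<>();

        try (KafkaConsumer<byte[], byte[]> consumer =
                     new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.subscribe(Collections.singletonList("source.checkpoints.internal")); // assumed topic name
            // a single bounded poll is enough for a sketch; real code would read to the end of the topic
            for (ConsumerRecord<byte[], byte[]> rec : consumer.poll(Duration.ofSeconds(5))) {
                // the checkpoint topic is keyed by (consumer group, topic-partition), so keeping the last
                // value per key yields the latest checkpoint emitted by any previous task generation
                latestCheckpointByKey.put(Base64.getEncoder().encodeToString(rec.key()), rec.value());
            }
        }
        System.out.println("recovered " + latestCheckpointByKey.size() + " checkpoint entries");
    }
}
{code}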
[jira] [Created] (KAFKA-16622) MirrorMaker2 first Checkpoint not emitted until consumer group fully catches up once
Edoardo Comar created KAFKA-16622: - Summary: MirrorMaker2 first Checkpoint not emitted until consumer group fully catches up once Key: KAFKA-16622 URL: https://issues.apache.org/jira/browse/KAFKA-16622 Project: Kafka Issue Type: Bug Components: mirrormaker Affects Versions: 3.6.2, 3.7.0, 3.8.0 Reporter: Edoardo Comar Attachments: edo-connect-mirror-maker-sourcetarget.properties We observed an excessively delayed emission of the MM2 Checkpoint record. It only gets created when the source consumer reaches the end of a topic. This does not seem reasonable. In a very simple setup: tested with a standalone single-process MirrorMaker2 mirroring between two single-node Kafka clusters (MirrorMaker config attached) with quick refresh intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10): create a single topic in the source cluster; produce data to it (e.g. 1 records); start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec between polls, which commits after each poll; watch the Checkpoint topic in the target cluster: bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \ --topic source.checkpoints.internal \ --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \ --from-beginning -> no record appears in the checkpoint topic until the consumer reaches the end of the topic (i.e. its consumer group lag gets down to 0). -- This message was sent by Atlassian Jira (v8.20.10#820010)
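A sketch of the slow consumer described in the reproduction steps, pacing itself at 50 records per poll with a 1-second pause and a commit after each poll; the topic name, group id, serializers and bootstrap address are illustrative assumptions.
{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SlowConsumerSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // source cluster (assumed address)
        props.put("group.id", "edogroup");                   // assumed group name
        props.put("enable.auto.commit", "false");
        props.put("auto.offset.reset", "earliest");
        props.put("max.poll.records", "50");                 // fetch at most 50 records per poll

        try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
            consumer.subscribe(Collections.singletonList("source-topic")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                System.out.println("consumed " + records.count() + " records");
                consumer.commitSync();        // commit after each poll
                Thread.sleep(1000);           // pause 1 second between polls to stay behind the producer
            }
        }
    }
}
{code}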
[jira] [Created] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use
Edoardo Comar created KAFKA-16369: - Summary: Broker may not shut down when SocketServer fails to bind as Address already in use Key: KAFKA-16369 URL: https://issues.apache.org/jira/browse/KAFKA-16369 Project: Kafka Issue Type: Bug Reporter: Edoardo Comar When in ZooKeeper mode, if a port the broker should listen on is already bound, the KafkaException: "Socket server failed to bind to localhost:9092: Address already in use." is thrown but the broker continues to start up. It correctly shuts down when in KRaft mode. Easy to reproduce in ZooKeeper mode with the server config set to listen on localhost only: listeners=PLAINTEXT://localhost:9092 -- This message was sent by Atlassian Jira (v8.20.10#820010)
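Not Kafka code, just a minimal illustration of the underlying condition: binding a listener port that is already in use raises a BindException, which per the report is logged in ZooKeeper mode while the broker carries on, whereas in KRaft mode startup aborts.
{code:java}
import java.net.BindException;
import java.net.ServerSocket;

public class PortInUseSketch {
    public static void main(String[] args) throws Exception {
        try (ServerSocket first = new ServerSocket(9092)) {
            try (ServerSocket second = new ServerSocket(9092)) {
                System.out.println("unexpected: second bind succeeded");
            } catch (BindException e) {
                // this is the condition SocketServer hits; the broker should shut down rather than carry on
                System.out.println("Address already in use: " + e.getMessage());
            }
        }
    }
}
{code}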
[jira] [Resolved] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1
[ https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-15144. --- Resolution: Not A Bug Closing as not a bug. The "problem" arose because, without any config changes, updating MM2 from 3.3.2 to a later release considerably changed the observable content of the Checkpoint topic. In 3.3.2, even without new records in the OffsetSync topic, the Checkpoint records were advancing often (and even contained many duplicates). Now gaps of up to offset.lag.max must be expected, and more reprocessing of records downstream may occur. > MM2 Checkpoint downstreamOffset stuck to 1 > -- > > Key: KAFKA-15144 > URL: https://issues.apache.org/jira/browse/KAFKA-15144 > Project: Kafka > Issue Type: Bug > Components: mirrormaker > Reporter: Edoardo Comar > Assignee: Edoardo Comar > Priority: Major > Attachments: edo-connect-mirror-maker-sourcetarget.properties > > > Steps to reproduce: > 1. Start the source cluster > 2. Start the target cluster > 3. Start connect-mirror-maker.sh using a config like the attached > 4. Create a topic in the source cluster > 5. Produce a few messages > 6. Consume them all with autocommit enabled > 7. Then dump the Checkpoint topic content, e.g. > {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic source.checkpoints.internal --from-beginning --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}} > Checkpoint{consumerGroupId=edogroup, topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, *downstreamOffset=1*, metadata=} > > The downstreamOffset remains at 1, while, in a fresh cluster pair like this, with the source topic created while MM2 is running, > I'd expect the downstreamOffset to match the upstreamOffset. > Note that dumping the offset sync topic shows matching initial offsets: > {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic mm2-offset-syncs.source.internal --from-beginning --formatter org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}} > OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, downstreamOffset=0} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
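Besides dumping the checkpoint topic with the console consumer, the translated (downstream) offsets can be inspected programmatically. A sketch, assuming the connect-mirror-client jar is on the classpath and using illustrative cluster alias, group name and bootstrap address:
{code:java}
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class TranslateOffsetsSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9192"); // target cluster (assumed address)

        // "source" is the remote cluster alias, "edogroup" the consumer group from the reproduction
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(props, "source", "edogroup", Duration.ofSeconds(30));

        translated.forEach((tp, offset) ->
                System.out.println(tp + " -> downstream offset " + offset.offset()));
    }
}
{code}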
[jira] [Created] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1
Edoardo Comar created KAFKA-15144: - Summary: Checkpoint downstreamOffset stuck to 1 Key: KAFKA-15144 URL: https://issues.apache.org/jira/browse/KAFKA-15144 Project: Kafka Issue Type: Bug Components: mirrormaker Reporter: Edoardo Comar Attachments: edo-connect-mirror-maker-sourcetarget.properties Steps to reproduce: start the source cluster; start the target cluster; start connect-mirror-maker.sh using a config like the attached; create a topic in the source cluster; produce a few messages; consume them all with autocommit enabled; then dump the Checkpoint topic content, e.g. % bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic source.checkpoints.internal --from-beginning --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter Checkpoint{consumerGroupId=edogroup, topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, downstreamOffset=1, metadata=} The downstreamOffset remains at 1, while, in a fresh cluster pair like this, with the source topic created while MM2 is running, I'd expect the downstreamOffset to match the upstreamOffset. Dumping the offset sync topic % bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic mm2-offset-syncs.source.internal --from-beginning --formatter org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter shows matching initial offsets OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, downstreamOffset=0} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15133) RequestMetrics MessageConversionsTimeMs count is ticked even when no conversion occurs
Edoardo Comar created KAFKA-15133: - Summary: RequestMetrics MessageConversionsTimeMs count is ticked even when no conversion occurs Key: KAFKA-15133 URL: https://issues.apache.org/jira/browse/KAFKA-15133 Project: Kafka Issue Type: Bug Components: core Affects Versions: 3.4.1, 3.5.0 Reporter: Edoardo Comar Assignee: Edoardo Comar The Histogram RequestChannel.messageConversionsTimeHist is ticked even when a Produce/Fetch request incurred no conversion, because a new entry is added to the histogram distribution with a 0 ms value. It is confusing to compare the Histogram kafka.network RequestMetrics MessageConversionsTimeMs with the Meter kafka.server BrokerTopicMetrics ProduceMessageConversionsPerSec, because for the latter the metric is ticked only if a conversion actually occurred. -- This message was sent by Atlassian Jira (v8.20.10#820010)
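A sketch of the behavioural difference being reported, using hypothetical Histogram and Meter types rather than the broker's actual Yammer metrics: guarding the histogram update so it only records when a conversion really happened would make it directly comparable with ProduceMessageConversionsPerSec.
{code:java}
public class ConversionMetricSketch {

    // Hypothetical stand-ins for the broker's histogram/meter types.
    interface Histogram { void update(long value); }
    interface Meter { void mark(long count); }

    static void recordConversionMetrics(long conversionTimeMs,
                                        long convertedMessageCount,
                                        Histogram messageConversionsTimeHist,
                                        Meter produceMessageConversionsRate) {
        // Reported behaviour: the histogram currently gets a 0 ms sample even when nothing was converted.
        // Guarding both metrics the same way keeps them consistent:
        if (convertedMessageCount > 0) {
            messageConversionsTimeHist.update(conversionTimeMs);
            produceMessageConversionsRate.mark(convertedMessageCount);
        }
    }

    public static void main(String[] args) {
        recordConversionMetrics(0, 0,
                v -> System.out.println("hist += " + v),
                c -> System.out.println("meter += " + c));
        System.out.println("no samples recorded when no conversion occurred");
    }
}
{code}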
[jira] [Created] (KAFKA-14996) CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH
Edoardo Comar created KAFKA-14996: - Summary: CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH Key: KAFKA-14996 URL: https://issues.apache.org/jira/browse/KAFKA-14996 Project: Kafka Issue Type: Bug Components: controller Reporter: Edoardo Comar If an attempt is made to create a topic with num partitions >= QuorumController.MAX_RECORDS_PER_BATCH (1) the client receives an UnknownServerException - it should instead receive a more meaningful error. The controller logs: [2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed with unknown server exception IllegalStateException at epoch 2 in 21956 us. Renouncing leadership and reverting to the last committed offset 174. (org.apache.kafka.controller.QuorumController) java.lang.IllegalStateException: Attempted to atomically commit 10001 records, but maxRecordsPerBatch is 1 at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812) at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719) at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) at java.base/java.lang.Thread.run(Thread.java:829) -- This message was sent by Atlassian Jira (v8.20.10#820010)
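A sketch of the client call that triggers the error, using the public Admin API; the topic name, partition count and bootstrap address are illustrative, and the point is only that the client observes an UnknownServerException rather than a specific error code.
{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateManyPartitionsSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            // 10001 partitions exceeds the controller's max records per batch in the reported scenario
            NewTopic topic = new NewTopic("many-partitions", 10001, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        } catch (ExecutionException e) {
            // observed: org.apache.kafka.common.errors.UnknownServerException
            System.out.println("createTopics failed: " + e.getCause());
        }
    }
}
{code}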
[jira] [Created] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced
Edoardo Comar created KAFKA-14657: - Summary: Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced Key: KAFKA-14657 URL: https://issues.apache.org/jira/browse/KAFKA-14657 Project: Kafka Issue Type: Bug Components: admin Reporter: Edoardo Comar Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java Admin.fenceProducers() fails with a ConcurrentTransactionsException if invoked when a Producer has a transaction ongoing. However, further attempts by that producer to produce fail with InvalidProducerEpochException and the producer is not re-usable: it cannot abort/commit as it is fenced. Conversely, if Admin.fenceProducers() is invoked while there is no open transaction, the call succeeds and further attempts by that producer to produce fail with ProducerFenced. See attached snippets. As the caller of Admin.fenceProducers() is likely unaware of the producer's state, the call should succeed regardless of the state of the producer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
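A condensed sketch of the scenario the attached FenceProducerDuringTx.java presumably exercises (the attachment itself is not reproduced here): fence a transactional producer while it has an open transaction and observe the resulting exceptions. Addresses, topic and transactional id are illustrative.
{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Collections;
import java.util.Properties;

public class FenceDuringTxSketch {
    public static void main(String[] args) throws Exception {
        Properties pprops = new Properties();
        pprops.put("bootstrap.servers", "localhost:9092");   // assumed address
        pprops.put("transactional.id", "my-tx-id");          // assumed transactional id

        Properties aprops = new Properties();
        aprops.put("bootstrap.servers", "localhost:9092");

        try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(pprops, new StringSerializer(), new StringSerializer());
             Admin admin = Admin.create(aprops)) {

            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic", "k", "v"));

            // Fencing while the transaction is open: reported to fail with ConcurrentTransactionsException,
            // yet the producer ends up fenced and can neither commit nor abort afterwards.
            admin.fenceProducers(Collections.singleton("my-tx-id")).all().get();

            producer.commitTransaction(); // reported to fail once the producer has been fenced
        }
    }
}
{code}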
[jira] [Resolved] (KAFKA-7666) KIP-391: Allow Producing with Offsets for Cluster Replication
[ https://issues.apache.org/jira/browse/KAFKA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-7666. -- Resolution: Won't Fix KIP has been retired > KIP-391: Allow Producing with Offsets for Cluster Replication > - > > Key: KAFKA-7666 > URL: https://issues.apache.org/jira/browse/KAFKA-7666 > Project: Kafka > Issue Type: New Feature >Reporter: Edoardo Comar >Assignee: Edoardo Comar >Priority: Major > > Implementing KIP-391 > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information for aliveNodes
Edoardo Comar created KAFKA-14571: - Summary: ZkMetadataCache.getClusterMetadata is missing rack information for aliveNodes Key: KAFKA-14571 URL: https://issues.apache.org/jira/browse/KAFKA-14571 Project: Kafka Issue Type: Bug Components: core Affects Versions: 3.3 Reporter: Edoardo Comar Assignee: Edoardo Comar ZkMetadataCache.getClusterMetadata returns a Cluster object where the aliveNodes are missing their rack info. When ZkMetadataCache updates the metadataSnapshot, it includes the rack in `aliveBrokers` but not in `aliveNodes`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
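For reference, org.apache.kafka.common.Node carries an optional rack. The sketch below only illustrates the difference between building the aliveNodes entries with and without it; it is not the ZkMetadataCache code, and the host/rack values are made up.
{code:java}
import org.apache.kafka.common.Node;

public class NodeRackSketch {
    public static void main(String[] args) {
        // What getClusterMetadata effectively returns today: no rack on the alive nodes
        Node withoutRack = new Node(1, "broker-1.example", 9092);
        // What it should return, matching the rack kept for aliveBrokers
        Node withRack = new Node(1, "broker-1.example", 9092, "rack-a");

        System.out.println("withoutRack.rack() = " + withoutRack.rack()); // null
        System.out.println("withRack.rack()    = " + withRack.rack());    // rack-a
    }
}
{code}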
[jira] [Created] (KAFKA-10220) NPE when describing resources
Edoardo Comar created KAFKA-10220: - Summary: NPE when describing resources Key: KAFKA-10220 URL: https://issues.apache.org/jira/browse/KAFKA-10220 Project: Kafka Issue Type: Bug Components: core Reporter: Edoardo Comar In the current trunk code, describing a topic can fail with an NPE in the broker on the line {{resource.configurationKeys.asScala.forall(_.contains(configName))}} (configurationKeys is null?) [2020-06-30 11:10:39,464] ERROR [Admin Manager on Broker 0]: Error processing describe configs request for resource DescribeConfigsResource(resourceType=2, resourceName='topic1', configurationKeys=null) (kafka.server.AdminManager) java.lang.NullPointerException at kafka.server.AdminManager.$anonfun$describeConfigs$3(AdminManager.scala:395) at kafka.server.AdminManager.$anonfun$describeConfigs$3$adapted(AdminManager.scala:393) at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:248) at scala.collection.Iterator.foreach(Iterator.scala:929) at scala.collection.Iterator.foreach$(Iterator.scala:929) at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) at scala.collection.IterableLike.foreach(IterableLike.scala:71) at scala.collection.IterableLike.foreach$(IterableLike.scala:70) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:247) at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:245) at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:104) at scala.collection.TraversableLike.filter(TraversableLike.scala:259) at scala.collection.TraversableLike.filter$(TraversableLike.scala:259) at scala.collection.AbstractTraversable.filter(Traversable.scala:104) at kafka.server.AdminManager.createResponseConfig$1(AdminManager.scala:393) at kafka.server.AdminManager.$anonfun$describeConfigs$1(AdminManager.scala:412) at scala.collection.immutable.List.map(List.scala:283) at kafka.server.AdminManager.describeConfigs(AdminManager.scala:386) at kafka.server.KafkaApis.handleDescribeConfigsRequest(KafkaApis.scala:2595) at kafka.server.KafkaApis.handle(KafkaApis.scala:165) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
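A sketch of the client call that exercises the failing path: a plain describeConfigs sends a DescribeConfigsResource with configurationKeys=null (i.e. "all keys"), which the quoted broker code then dereferences. Topic name and address are illustrative.
{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class DescribeConfigsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic1");
            // No explicit set of config keys is passed, so the request carries configurationKeys=null
            Map<ConfigResource, Config> configs =
                    admin.describeConfigs(Collections.singleton(topic)).all().get();
            configs.get(topic).entries().forEach(e -> System.out.println(e.name() + " = " + e.value()));
        }
    }
}
{code}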
[jira] [Resolved] (KAFKA-8564) NullPointerException when loading logs at startup
[ https://issues.apache.org/jira/browse/KAFKA-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-8564. -- Resolution: Fixed Fix Version/s: 2.2.2 2.1.2 2.3.0 > NullPointerException when loading logs at startup > - > > Key: KAFKA-8564 > URL: https://issues.apache.org/jira/browse/KAFKA-8564 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.3.0, 2.2.1 >Reporter: Mickael Maison >Assignee: Edoardo Comar >Priority: Blocker > Fix For: 2.3.0, 2.1.2, 2.2.2 > > > If brokers restart when topics are being deleted, it's possible to end up > with a partition folder with the deleted suffix but without any log segments: > {quote}ls -la > ./kafka-logs/3part3rep5-1.f2ce83b86df9416abe50d2e2299009c2-delete/ > total 8 > drwxr-xr-x@ 4 mickael staff 128 6 Jun 14:35 . > drwxr-xr-x@ 61 mickael staff 1952 6 Jun 14:35 .. > -rw-r--r--@ 1 mickael staff 10 6 Jun 14:32 23261863.snapshot > -rw-r--r--@ 1 mickael staff 0 6 Jun 14:35 leader-epoch-checkpoint > {quote} > From 2.2.1, brokers fail to start when loading such folders: > {quote}[2019-06-19 09:40:48,123] ERROR There was an error in one of the > threads during logs loading: java.lang.NullPointerException > (kafka.log.LogManager) > [2019-06-19 09:40:48,126] ERROR [KafkaServer id=1] Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > java.lang.NullPointerException > at kafka.log.Log.activeSegment(Log.scala:1896) > at kafka.log.Log.(Log.scala:295) > at kafka.log.Log$.apply(Log.scala:2186) > at kafka.log.LogManager.loadLog(LogManager.scala:275) > at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:345) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {quote} > With 2.2.0, upon loading such folders, brokers create a new empty log segment > and load that successfully. > The change of behaviour was introduced in > [https://github.com/apache/kafka/commit/f000dab5442ce49c4852823c257b4fb0cdfe15aa] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7861) AlterConfig may change the source of another config entry if it matches the broker default
Edoardo Comar created KAFKA-7861: Summary: AlterConfig may change the source of another config entry if it matches the broker default Key: KAFKA-7861 URL: https://issues.apache.org/jira/browse/KAFKA-7861 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.1.0 Reporter: Edoardo Comar Attachments: AlterConfigJira.java * Create a topic with an explicit config entry that matches the broker's default (e.g. "cleanup.policy"="delete"). * Describe config - the entry will be of type DYNAMIC_TOPIC_CONFIG. * Alter some other config for the topic (e.g. `segment.bytes`). * Describe config again - the previously DYNAMIC_TOPIC_CONFIG entry ("cleanup.policy") will now be described as DEFAULT_CONFIG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
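A sketch of the four steps using the public Admin API (shown with incrementalAlterConfigs; the affected 2.1.0 release would have used the older alterConfigs call, and the attached AlterConfigJira.java is not reproduced here). Topic name, address and values are illustrative.
{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class AlterConfigSourceSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            // 1. create a topic whose explicit cleanup.policy matches the broker default
            NewTopic topic = new NewTopic("alterconfig-test", 1, (short) 1)
                    .configs(Map.of("cleanup.policy", "delete"));
            admin.createTopics(Collections.singleton(topic)).all().get();

            ConfigResource res = new ConfigResource(ConfigResource.Type.TOPIC, "alterconfig-test");
            printSource(admin, res); // 2. expected: DYNAMIC_TOPIC_CONFIG

            // 3. alter an unrelated config entry
            AlterConfigOp op = new AlterConfigOp(new ConfigEntry("segment.bytes", "104857600"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(res, Collections.singleton(op))).all().get();

            printSource(admin, res); // 4. reported: cleanup.policy now shows as DEFAULT_CONFIG
        }
    }

    static void printSource(Admin admin, ConfigResource res) throws Exception {
        Config config = admin.describeConfigs(Collections.singleton(res)).all().get().get(res);
        ConfigEntry entry = config.get("cleanup.policy");
        System.out.println("cleanup.policy source = " + entry.source());
    }
}
{code}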
[jira] [Resolved] (KAFKA-7761) CLONE - Add broker configuration to set minimum value for segment.bytes and segment.ms
[ https://issues.apache.org/jira/browse/KAFKA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-7761. -- Resolution: Duplicate If there is a valid reason why this is not a duplicate, feel free to reopen. > CLONE - Add broker configuration to set minimum value for segment.bytes and > segment.ms > -- > > Key: KAFKA-7761 > URL: https://issues.apache.org/jira/browse/KAFKA-7761 > Project: Kafka > Issue Type: Improvement > Reporter: Chinmay Patil > Priority: Major > Labels: kip, newbie > > If someone sets segment.bytes or segment.ms at topic level to a very small > value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a > very high number of segment files. This can bring down the whole broker due > to hitting the maximum number of open files (for logs) or maximum number of mmap-ed files > (for indexes). > To prevent that from happening, I would like to suggest adding two new items > to the broker configuration: > * min.topic.segment.bytes, defaults to 1048576: The minimum value for > segment.bytes. When someone sets topic configuration segment.bytes to a value > lower than this, Kafka throws an error INVALID VALUE. > * min.topic.segment.ms, defaults to 360: The minimum value for > segment.ms. When someone sets topic configuration segment.ms to a value lower > than this, Kafka throws an error INVALID VALUE. > Thanks > Badai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7720) kafka-configs script should also describe default broker entries
Edoardo Comar created KAFKA-7720: Summary: kafka-configs script should also describe default broker entries Key: KAFKA-7720 URL: https://issues.apache.org/jira/browse/KAFKA-7720 Project: Kafka Issue Type: Improvement Components: tools Reporter: Edoardo Comar Running the configs tool to describe the broker configs only appears to print dynamically added entries. Running {{bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-default --entity-type brokers}} on a broker without previously added configs ({{--alter --add-config 'key=value'}}) will otherwise print an empty list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7666) KIP-391: Allow Producing with Offsets for Cluster Replication
Edoardo Comar created KAFKA-7666: Summary: KIP-391: Allow Producing with Offsets for Cluster Replication Key: KAFKA-7666 URL: https://issues.apache.org/jira/browse/KAFKA-7666 Project: Kafka Issue Type: New Feature Reporter: Edoardo Comar Implementing KIP-391 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6994) KafkaConsumer.poll throwing AuthorizationException timeout-dependent
Edoardo Comar created KAFKA-6994: Summary: KafkaConsumer.poll throwing AuthorizationException timeout-dependent Key: KAFKA-6994 URL: https://issues.apache.org/jira/browse/KAFKA-6994 Project: Kafka Issue Type: Bug Reporter: Edoardo Comar With auto-topic creation enabled, when attempting to consume from a non-existent topic, the {{AuthorizationException}} may or may not be thrown from {{poll(timeout)}} depending on the {{timeout}} value. The issue can be recreated by modifying a test in {{AuthorizerIntegrationTest}} as below (see comment) to *not* add the needed ACL and therefore expecting the test to fail. While the first {{poll}} call will always throw with a short timeout, the second {{poll}} will not throw with the short timeout.
{code:java}
  @Test
  def testCreatePermissionOnClusterToReadFromNonExistentTopic() {
    testCreatePermissionNeededToReadFromNonExistentTopic("newTopic",
      Set(new Acl(userPrincipal, Allow, Acl.WildCardHost, Create)),
      Cluster)
  }

  private def testCreatePermissionNeededToReadFromNonExistentTopic(newTopic: String, acls: Set[Acl], resType: ResourceType) {
    val topicPartition = new TopicPartition(newTopic, 0)
    val newTopicResource = new Resource(Topic, newTopic)
    addAndVerifyAcls(Set(new Acl(userPrincipal, Allow, Acl.WildCardHost, Read)), newTopicResource)
    addAndVerifyAcls(groupReadAcl(groupResource), groupResource)

    this.consumers.head.assign(List(topicPartition).asJava)
    try {
      this.consumers.head.poll(Duration.ofMillis(50L));
      Assert.fail("should have thrown Authorization Exception")
    } catch {
      case e: TopicAuthorizationException =>
        assertEquals(Collections.singleton(newTopic), e.unauthorizedTopics())
    }

    //val resource = if (resType == Topic) newTopicResource else Resource.ClusterResource
    // addAndVerifyAcls(acls, resource)

    // need to use a larger timeout in this subsequent poll else it may not cause topic auto-creation
    // this can be verified by commenting the above addAndVerifyAcls line and expecting this test to fail
    this.consumers.head.poll(Duration.ofMillis(50L));
  }
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6863) Kafka clients should try to use multiple DNS resolved IP addresses if the first one fails
Edoardo Comar created KAFKA-6863: Summary: Kafka clients should try to use multiple DNS resolved IP addresses if the first one fails Key: KAFKA-6863 URL: https://issues.apache.org/jira/browse/KAFKA-6863 Project: Kafka Issue Type: Improvement Components: clients Affects Versions: 1.1.0, 1.0.0 Reporter: Edoardo Comar Assignee: Edoardo Comar Fix For: 2.0.0 Currently Kafka clients resolve a symbolic hostname using {{new InetSocketAddress(String hostname, int port)}} which only picks one IP address even if the DNS has multiple records for the hostname, as it calls {{InetAddress.getAllByName(host)[0]}} For some environments where the hostnames are mapped by the DNS to multiple IPs, e.g. in clouds where the IPs point to the external load balancers, it would be preferable that the client, on failing to connect to one of the IPs, would try the other ones before giving up the connection. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
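A minimal illustration of the improvement being proposed, resolving all DNS records and trying each in turn instead of InetAddress.getAllByName(host)[0]; this is plain java.net code with an assumed hostname, not the client's actual connection logic.
{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class MultiIpConnectSketch {
    static Socket connect(String host, int port, int timeoutMs) throws Exception {
        InetAddress[] addresses = InetAddress.getAllByName(host); // all DNS records, not just [0]
        Exception last = null;
        for (InetAddress address : addresses) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(address, port), timeoutMs);
                return socket; // first address that accepts the connection wins
            } catch (Exception e) {
                socket.close();
                last = e; // remember the failure and fall through to the next resolved IP
            }
        }
        throw last != null ? last : new UnknownHostException(host);
    }

    public static void main(String[] args) throws Exception {
        try (Socket s = connect("kafka.example.com", 9092, 5000)) { // assumed hostname
            System.out.println("connected to " + s.getRemoteSocketAddress());
        }
    }
}
{code}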
[jira] [Resolved] (KAFKA-5497) KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy
[ https://issues.apache.org/jira/browse/KAFKA-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-5497. -- Resolution: Won't Fix superseded by https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces > KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy > - > > Key: KAFKA-5497 > URL: https://issues.apache.org/jira/browse/KAFKA-5497 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar >Priority: Major > > JIRA for the implementation of KIP-170 > https://cwiki.apache.org/confluence/display/KAFKA/KIP-170%3A+Enhanced+TopicCreatePolicy+and+introduction+of+TopicDeletePolicy -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6726) KIP-277 - Fine Grained ACL for CreateTopics A
Edoardo Comar created KAFKA-6726: Summary: KIP-277 - Fine Grained ACL for CreateTopics A Key: KAFKA-6726 URL: https://issues.apache.org/jira/browse/KAFKA-6726 Project: Kafka Issue Type: Improvement Components: core, tools Reporter: Edoardo Comar Assignee: Edoardo Comar issue to track implementation of https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6516) KafkaProducer retries indefinitely to authenticate on SaslAuthenticationException
[ https://issues.apache.org/jira/browse/KAFKA-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-6516. -- Resolution: Won't Fix Thanks, [~rsivaram] - so my expectation was invalid then. > KafkaProducer retries indefinitely to authenticate on > SaslAuthenticationException > - > > Key: KAFKA-6516 > URL: https://issues.apache.org/jira/browse/KAFKA-6516 > Project: Kafka > Issue Type: Bug > Components: clients > Affects Versions: 1.0.0 > Reporter: Edoardo Comar > Priority: Major > > Even after https://issues.apache.org/jira/browse/KAFKA-5854 > the producer's (background) polling thread keeps retrying to authenticate > if the future returned by KafkaProducer.send is not resolved. > The current test > {{org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testProducerWithInvalidCredentials()}} > passes because it relies on the future being resolved. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6516) KafkaProducer retries indefinitely to authenticate on SaslAuthenticationException
Edoardo Comar created KAFKA-6516: Summary: KafkaProducer retries indefinitely to authenticate on SaslAuthenticationException Key: KAFKA-6516 URL: https://issues.apache.org/jira/browse/KAFKA-6516 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 1.0.0 Reporter: Edoardo Comar Even after https://issues.apache.org/jira/browse/KAFKA-5854 the producer's (background) polling thread keeps retrying to authenticate if the future returned by KafkaProducer.send is not resolved. The current test {{org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testProducerWithInvalidCredentials()}} passes because it relies on the future being resolved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
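A sketch of the distinction the report makes: unless the future returned by send() is resolved (e.g. with get()), the authentication failure stays in the background and the producer keeps retrying. The SASL settings, credentials and address are illustrative.
{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class InvalidCredentialsSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093");          // assumed SASL listener
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"wrong\" password=\"wrong\";");

        try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
            // Without resolving this future the caller never sees the SaslAuthenticationException,
            // while the producer's network thread keeps trying to authenticate in the background.
            try {
                producer.send(new ProducerRecord<>("topic", "value")).get();
            } catch (ExecutionException e) {
                System.out.println("surfaced only via get(): " + e.getCause());
            }
        }
    }
}
{code}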
[jira] [Resolved] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)
[ https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-4206. -- Resolution: Won't Do > Improve handling of invalid credentials to mitigate DOS issue (especially on > SSL listeners) > --- > > Key: KAFKA-4206 > URL: https://issues.apache.org/jira/browse/KAFKA-4206 > Project: Kafka > Issue Type: Improvement > Components: network, security >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current handling of invalid credentials (ie wrong user/password) is to > let the {{SaslException}} thrown from an implementation of > {{javax.security.sasl.SaslServer.evaluateResponse()}} > bubble up the call stack until it gets caught in > {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} > where the {{KafkaChannel}} gets closed - which will cause the client that > made the request to be disconnected. > This will happen however after the server has used considerable resources, > especially for the SSL handshake which appears to be computationally > expensive in Java. > We have observed that if just a few clients keep repeating requests with the > wrong credentials, it is quite easy to get all the network processing threads > in the Kafka server busy doing SSL handshakes. > This makes a Kafka cluster to easily suffer from a Denial Of Service - also > non intentional - attack. > It can be non intentional, i.e. also caused by friendly clients, for example > because a Kafka Java client Producer supplied with the wrong credentials will > not throw an exception on publishing, so it may keep attempting to connect > without the caller realising. > An easy fix which we have implemented and will supply a PR for is to *delay* > considerably closing the {{KafkaChannel}} in the {{Selector}}, but obviously > without blocking the processing thread. > This has been tested to be very effective in reducing the cpu usage spikes > caused by non malicious clients using invalid SASL PLAIN credentials over SSL. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
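The mitigation described above ("delay closing the KafkaChannel without blocking the processing thread") is, in pattern terms, a deferred close. A generic sketch of that pattern using a ScheduledExecutorService; this is not the actual Selector change from the PR.
{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedCloseSketch {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    // Instead of closing the failed channel immediately (letting the client reconnect and
    // force another expensive SSL handshake right away), schedule the close for later.
    static void closeAfterAuthFailure(Closeable channel, long delayMs) {
        SCHEDULER.schedule(() -> {
            try {
                channel.close();
            } catch (IOException e) {
                // best effort; the channel is being discarded anyway
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        closeAfterAuthFailure(() -> System.out.println("channel closed"), 500);
        Thread.sleep(1000);
        SCHEDULER.shutdown();
    }
}
{code}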
[jira] [Created] (KAFKA-5497) KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy
Edoardo Comar created KAFKA-5497: Summary: KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy Key: KAFKA-5497 URL: https://issues.apache.org/jira/browse/KAFKA-5497 Project: Kafka Issue Type: Improvement Reporter: Edoardo Comar Assignee: Edoardo Comar JIRA for the implementation of KIP-170 https://cwiki.apache.org/confluence/display/KAFKA/KIP-170%3A+Enhanced+TopicCreatePolicy+and+introduction+of+TopicDeletePolicy -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
[ https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reassigned KAFKA-5418: Assignee: Edoardo Comar > ZkUtils.getAllPartitions() may fail if a topic is marked for deletion > - > > Key: KAFKA-5418 > URL: https://issues.apache.org/jira/browse/KAFKA-5418 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.9.0.1, 0.10.2.1 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck > in the 'marked for deletion' state > so it was a child of {{/brokers/topics}} > but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions}} > did not exist, throws a ZkNoNodeException while iterating: > {noformat} > rg.I0Itec.zkclient.exception.ZkNoNodeException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /brokers/topics/xyzblahfoo/partitions > at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) > at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) > at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) > at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) > at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) > at > kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) > at > kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) > ... > at java.lang.Thread.run(Thread.java:809) > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions > at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) > at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114) > at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678) > at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675) > at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
[ https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5418: - Description: Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in the 'marked for deletion' state so it was a child of {{/brokers/topics}} but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions}} did not exist, throws a ZkNoNodeException while iterating: {noformat} rg.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) ... at java.lang.Thread.run(Thread.java:809) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985) ... {noformat} was: Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in the 'marked for deletion' state so it was a child of {{/brokers/topics}} but it had no children, i.e. 
the path {{ /brokers/topics/thistopic/partitions }} did not exist, throws a ZkNoNodeException while iterating: {noformat} rg.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) ... at java.lang.Thread.run(Thread.java:809) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnection
[jira] [Created] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
Edoardo Comar created KAFKA-5418: Summary: ZkUtils.getAllPartitions() may fail if a topic is marked for deletion Key: KAFKA-5418 URL: https://issues.apache.org/jira/browse/KAFKA-5418 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.10.2.1, 0.9.0.1 Reporter: Edoardo Comar Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in the 'marked for deletion' state so it was a child of {{/brokers/topics}} but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions }} did not exist throws a ZkNoNodeException while iterating: {noformat} rg.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) ... at java.lang.Thread.run(Thread.java:809) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
[ https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5418: - Description: Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in the 'marked for deletion' state so it was a child of {{/brokers/topics}} but it had no children, i.e. the path {{ /brokers/topics/thistopic/partitions }} did not exist, throws a ZkNoNodeException while iterating: {noformat} rg.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) ... at java.lang.Thread.run(Thread.java:809) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678) at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985) ... {noformat} was: Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in the 'marked for deletion' state so it was a child of {{/brokers/topics}} but it had no children, i.e. 
the path {{/brokers/topics/thistopic/partitions }} did not exist throws a ZkNoNodeException while iterating: {noformat} rg.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671) at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817) at kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816) ... at java.lang.Thread.run(Thread.java:809) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnectio
[jira] [Commented] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log
[ https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029248#comment-16029248 ] Edoardo Comar commented on KAFKA-5290: -- https://github.com/apache/kafka/pull/3035 > docs need clarification on meaning of 'committed' to the log > > > Key: KAFKA-5290 > URL: https://issues.apache.org/jira/browse/KAFKA-5290 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The docs around > http://kafka.apache.org/documentation/#semantics > http://kafka.apache.org/documentation/#replication > say > ??A message is considered "committed" when all in sync replicas for that > partition have applied it to their log. Only committed messages are ever > given out to the consumer?? > I've always found that in need of clarification - as the producer acks > setting is crucial in determining what committed means. > Based on conversations with [~rsivaram], [~apurva], [~vahid] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log
[ https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5290: - Description: The docs around http://kafka.apache.org/documentation/#semantics http://kafka.apache.org/documentation/#replication say ??A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer?? I've always found that in need of clarification - as the producer acks setting is crucial in determining what committed means. Based on conversations with [~rsivaram], [~apurva], [~vahid] was: The docs around http://kafka.apache.org/documentation/#semantics http://kafka.apache.org/documentation/#replication say ??A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer?? I've always found that in need of clarification - as the producer acks setting is crucial in determining what committed means. Based on conversations with [~rsivaram][~apurva][~vahid] > docs need clarification on meaning of 'committed' to the log > > > Key: KAFKA-5290 > URL: https://issues.apache.org/jira/browse/KAFKA-5290 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The docs around > http://kafka.apache.org/documentation/#semantics > http://kafka.apache.org/documentation/#replication > say > ??A message is considered "committed" when all in sync replicas for that > partition have applied it to their log. Only committed messages are ever > given out to the consumer?? > I've always found that in need of clarification - as the producer acks > setting is crucial in determining what committed means. > Based on conversations with [~rsivaram], [~apurva], [~vahid] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log
[ https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5290: - Description: The docs around http://kafka.apache.org/documentation/#semantics http://kafka.apache.org/documentation/#replication say ??A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer?? I've always found that in need of clarification - as the producer acks setting is crucial in determining what committed means. Based on conversations with [~rsivaram][~apurva][~vahid] was: The docs around http://kafka.apache.org/documentation/#semantics http://kafka.apache.org/documentation/#replication say ??A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer. ?? I've always found that in need of clarification - as the producer acks setting is crucial in determining what committed means. Based on conversations with [~rsivaram][~apurva][~vahid] > docs need clarification on meaning of 'committed' to the log > > > Key: KAFKA-5290 > URL: https://issues.apache.org/jira/browse/KAFKA-5290 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The docs around > http://kafka.apache.org/documentation/#semantics > http://kafka.apache.org/documentation/#replication > say > ??A message is considered "committed" when all in sync replicas for that > partition have applied it to their log. Only committed messages are ever > given out to the consumer?? > I've always found that in need of clarification - as the producer acks > setting is crucial in determining what committed means. > Based on conversations with [~rsivaram][~apurva][~vahid] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log
Edoardo Comar created KAFKA-5290: Summary: docs need clarification on meaning of 'committed' to the log Key: KAFKA-5290 URL: https://issues.apache.org/jira/browse/KAFKA-5290 Project: Kafka Issue Type: Bug Components: documentation Reporter: Edoardo Comar Assignee: Edoardo Comar The docs around http://kafka.apache.org/documentation/#semantics http://kafka.apache.org/documentation/#replication say ??A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer. ?? I've always found that in need of clarification - as the producer acks setting is crucial in determining what committed means. Based on conversations with [~rsivaram][~apurva][~vahid] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
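For illustration only, a minimal producer sketch showing how the acks setting changes what an acknowledgement means (the bootstrap address and topic name are placeholders, not taken from the JIRA):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        // acks=all: the leader acknowledges only once all in-sync replicas have the record,
        // i.e. once the record is "committed" in the sense used by the replication docs.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    // With acks=all this callback reports success only after the committed write.
                    System.out.printf("acked at partition %d offset %d%n", metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
{code}
With {{acks=1}} the same callback would fire as soon as the leader has appended the record, i.e. before it is "committed" in the sense quoted above; with {{acks=0}} the producer does not wait at all.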
[jira] [Updated] (KAFKA-5200) If a replicated topic is deleted with one broker down, it can't be recreated
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5200: - Summary: If a replicated topic is deleted with one broker down, it can't be recreated (was: Deleting topic when one broker is down will prevent topic to be re-creatable) > If a replicated topic is deleted with one broker down, it can't be recreated > > > Key: KAFKA-5200 > URL: https://issues.apache.org/jira/browse/KAFKA-5200 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Edoardo Comar > > In a cluster with 5 broker, replication factor=3, min in sync=2, > one broker went down > A user's app remained of course unaware of that and deleted a topic that > (unknowingly) had a replica on the dead broker. > The topic went in 'pending delete' mode > The user then tried to recreate the topic - which failed, so his app was left > stuck - no working topic and no ability to create one. > The reassignment tool fails to move the replica out of the dead broker - > specifically because the broker with the partition replica to move is dead :-) > Incidentally the confluent-rebalancer docs say > http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > > Supports moving partitions away from dead brokers > It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010608#comment-16010608 ] Edoardo Comar commented on KAFKA-5200: -- I am actually hoping for someone familiar with the non-open source confluent-rebalancer tool to comment and then try to implement an open source solution to this issue. It may possibly require a KIP (e.g adding options to kafka.admin.ReassignPartitionsCommand ) > Deleting topic when one broker is down will prevent topic to be re-creatable > > > Key: KAFKA-5200 > URL: https://issues.apache.org/jira/browse/KAFKA-5200 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Edoardo Comar > > In a cluster with 5 broker, replication factor=3, min in sync=2, > one broker went down > A user's app remained of course unaware of that and deleted a topic that > (unknowingly) had a replica on the dead broker. > The topic went in 'pending delete' mode > The user then tried to recreate the topic - which failed, so his app was left > stuck - no working topic and no ability to create one. > The reassignment tool fails to move the replica out of the dead broker - > specifically because the broker with the partition replica to move is dead :-) > Incidentally the confluent-rebalancer docs say > http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > > Supports moving partitions away from dead brokers > It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010608#comment-16010608 ] Edoardo Comar edited comment on KAFKA-5200 at 5/15/17 2:29 PM: --- I am actually hoping for someone familiar with the non-open source confluent-rebalancer tool to comment. Then we'd try and implement an open source solution to this issue. It may possibly require a KIP (e.g adding options to kafka.admin.ReassignPartitionsCommand ) was (Author: ecomar): I am actually hoping for someone familiar with the non-open source confluent-rebalancer tool to comment and then try to implement an open source solution to this issue. It may possibly require a KIP (e.g adding options to kafka.admin.ReassignPartitionsCommand ) > Deleting topic when one broker is down will prevent topic to be re-creatable > > > Key: KAFKA-5200 > URL: https://issues.apache.org/jira/browse/KAFKA-5200 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Edoardo Comar > > In a cluster with 5 broker, replication factor=3, min in sync=2, > one broker went down > A user's app remained of course unaware of that and deleted a topic that > (unknowingly) had a replica on the dead broker. > The topic went in 'pending delete' mode > The user then tried to recreate the topic - which failed, so his app was left > stuck - no working topic and no ability to create one. > The reassignment tool fails to move the replica out of the dead broker - > specifically because the broker with the partition replica to move is dead :-) > Incidentally the confluent-rebalancer docs say > http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > > Supports moving partitions away from dead brokers > It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010601#comment-16010601 ] Edoardo Comar commented on KAFKA-5200: -- Thanks [~huxi_2b] unfortunately such steps would imply significant downtime which is not acceptable to us. We actually tested a much less intrusive way to handle this occurrence, i.e. delete the zookeeper info about the topic while the cluster is still running (minus the dead broker of course) and then force *only the controller broker* to restart. Even if this is less intrusive, it still means that for a short-ish time two brokers are down. With replication-factor 3 and min.insync.2 this implies an outage for some clients which remains unacceptable. > Deleting topic when one broker is down will prevent topic to be re-creatable > > > Key: KAFKA-5200 > URL: https://issues.apache.org/jira/browse/KAFKA-5200 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Edoardo Comar > > In a cluster with 5 broker, replication factor=3, min in sync=2, > one broker went down > A user's app remained of course unaware of that and deleted a topic that > (unknowingly) had a replica on the dead broker. > The topic went in 'pending delete' mode > The user then tried to recreate the topic - which failed, so his app was left > stuck - no working topic and no ability to create one. > The reassignment tool fails to move the replica out of the dead broker - > specifically because the broker with the partition replica to move is dead :-) > Incidentally the confluent-rebalancer docs say > http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > > Supports moving partitions away from dead brokers > It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics (KIP-136)
[ https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008499#comment-16008499 ] Edoardo Comar commented on KAFKA-4982: -- Thanks [~ijuma] following the discussion in the PR thread, I've updated the implementation to only use the listener tag and I've updated the KIP to match the implementation and describe the compatibility choice of not tagging the yammer metric. > Add listener tag to socket-server-metrics.connection-... metrics (KIP-136) > -- > > Key: KAFKA-4982 > URL: https://issues.apache.org/jira/browse/KAFKA-4982 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar > Fix For: 0.11.0.0 > > > Metrics in socket-server-metrics like connection-count connection-close-rate > etc are tagged with networkProcessor: > where the id of a network processor is just a numeric integer. > If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id > just keeps incrementing and when looking at the metrics it is hard to match > the metric tag to a listener. > You need to know the number of network threads and the order in which the > listeners are declared in the brokers' server.properties. > We should add a tag showing the listener label, that would also make it much > easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007893#comment-16007893 ] Edoardo Comar commented on KAFKA-5200: -- [~huxi_2b] if we could restart the broker that's down there would be no actual problem to be solved. I'd like to find a way - tooling option is ok - to allow the deletion to progress. And handle the eventual restart of the missing broker. > Deleting topic when one broker is down will prevent topic to be re-creatable > > > Key: KAFKA-5200 > URL: https://issues.apache.org/jira/browse/KAFKA-5200 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Edoardo Comar > > In a cluster with 5 broker, replication factor=3, min in sync=2, > one broker went down > A user's app remained of course unaware of that and deleted a topic that > (unknowingly) had a replica on the dead broker. > The topic went in 'pending delete' mode > The user then tried to recreate the topic - which failed, so his app was left > stuck - no working topic and no ability to create one. > The reassignment tool fails to move the replica out of the dead broker - > specifically because the broker with the partition replica to move is dead :-) > Incidentally the confluent-rebalancer docs say > http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > > Supports moving partitions away from dead brokers > It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics
[ https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004675#comment-16004675 ] Edoardo Comar commented on KAFKA-4982: -- The implementation took the slight deviation from the KIP screenshots of adding the {{listener}} tag only if it differs from {{protocol}}. We should have taken this decision while discussing the KIP. Opinions? > Add listener tag to socket-server-metrics.connection-... metrics > -- > > Key: KAFKA-4982 > URL: https://issues.apache.org/jira/browse/KAFKA-4982 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar > Fix For: 0.11.0.0 > > > Metrics in socket-server-metrics like connection-count connection-close-rate > etc are tagged with networkProcessor: > where the id of a network processor is just a numeric integer. > If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id > just keeps incrementing and when looking at the metrics it is hard to match > the metric tag to a listener. > You need to know the number of network threads and the order in which the > listeners are declared in the brokers' server.properties. > We should add a tag showing the listener label, that would also make it much > easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys
[ https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002885#comment-16002885 ] Edoardo Comar commented on KAFKA-5201: -- Thanks [~mihbor] Sure, there is no concept of a legitimate vs an abusive client! Quotas affect throughput rather than storage, so I can't see them being directly applicable. Thinking again, though, a practical approach could be to have the delete cleanup policy based on size only, not on time. Then, if the topic ACL limits it to trusted producers, only an application error would cause the deletion; topics not protected by an ACL could not be trusted to have the latest value for a given key, but that would be acceptable. I'm closing this JIRA for now. > Compacted topic could be misused to fill up a disk but deletion policy can't > retain legitimate keys > > > Key: KAFKA-5201 > URL: https://issues.apache.org/jira/browse/KAFKA-5201 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar > > Misuse of a topic with cleanup policy = compact > could lead to a disk being filled if a misbehaving producer keeps producing > messages with unique keys. > The mixed cleanup policy compact,delete could be adopted, but would not > guarantee that the latest "legitimate" keys will be kept. > It would be desirable to have a cleanup policy that attempts to preserve > messages with 'legitimate' keys > This issue needs a KIP but I have no proposed solution yet at the time of > writing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
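A rough sketch of the size-only approach discussed in the comment above, using the Java Admin client (which post-dates this JIRA; at the time the equivalent configs would be set with the command-line tools). The topic name, partition/replica counts and the size cap are illustrative:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactDeleteTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address

        Map<String, String> configs = new HashMap<>();
        configs.put("cleanup.policy", "compact,delete"); // compaction plus the delete policy
        configs.put("retention.bytes", String.valueOf(1024L * 1024L * 1024L)); // size cap per partition (illustrative)
        configs.put("retention.ms", "-1"); // disable the time-based trigger so only size applies

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("compacted-topic", 3, (short) 3).configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}
Setting {{retention.ms}} to -1 removes the time-based trigger, so with {{cleanup.policy=compact,delete}} only {{retention.bytes}} bounds the partition size, which is the behaviour described in the comment.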
[jira] [Resolved] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys
[ https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-5201. -- Resolution: Not A Problem > Compacted topic could be misused to fill up a disk but deletion policy can't > retain legitimate keys > > > Key: KAFKA-5201 > URL: https://issues.apache.org/jira/browse/KAFKA-5201 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar > > Misuse of a topic with cleanup policy = compact > could lead to a disk being filled if a misbehaving producer keeps producing > messages with unique keys. > The mixed cleanup policy compact,delete could be adopted, but would not > guarantee that the latest "legitimate" keys will be kept. > It would be desirable to have a cleanup policy that attempts to preserve > messages with 'legitimate' keys > This issue needs a KIP but I have no proposed solution yet at the time of > writing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys
[ https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-5201: - Summary: Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys (was: Compacted topic could be misused up to fill a disk; ) > Compacted topic could be misused to fill up a disk but deletion policy can't > retain legitimate keys > > > Key: KAFKA-5201 > URL: https://issues.apache.org/jira/browse/KAFKA-5201 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar > > Misuse of a topic with cleanup policy = compact > could lead to a disk being filled if a misbehaving producer keeps producing > messages with unique keys. > The mixed cleanup policy compact,delete could be adopted, but would not > guarantee that the latest "legitimate" keys will be kept. > It would be desirable to have a cleanup policy that attempts to preserve > messages with 'legitimate' keys > This issue needs a KIP but I have no proposed solution yet at the time of > writing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5201) Compacted topic could be misused up to fill a disk;
Edoardo Comar created KAFKA-5201: Summary: Compacted topic could be misused up to fill a disk; Key: KAFKA-5201 URL: https://issues.apache.org/jira/browse/KAFKA-5201 Project: Kafka Issue Type: Improvement Reporter: Edoardo Comar Misuse of a topic with cleanup policy = compact could lead to a disk being filled if a misbehaving producer keeps producing messages with unique keys. The mixed cleanup policy compact,delete could be adopted, but would not guarantee that the latest "legitimate" keys will be kept. It would be desirable to have a cleanup policy that attempts to preserve messages with 'legitimate' keys This issue needs a KIP but I have no proposed solution yet at the time of writing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
Edoardo Comar created KAFKA-5200: Summary: Deleting topic when one broker is down will prevent topic to be re-creatable Key: KAFKA-5200 URL: https://issues.apache.org/jira/browse/KAFKA-5200 Project: Kafka Issue Type: Improvement Components: core Reporter: Edoardo Comar In a cluster with 5 broker, replication factor=3, min in sync=2, one broker went down A user's app remained of course unaware of that and deleted a topic that (unknowingly) had a replica on the dead broker. The topic went in 'pending delete' mode The user then tried to recreate the topic - which failed, so his app was left stuck - no working topic and no ability to create one. The reassignment tool fails to move the replica out of the dead broker - specifically because the broker with the partition replica to move is dead :-) Incidentally the confluent-rebalancer docs say http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster > Supports moving partitions away from dead brokers It'd be nice to similarly improve the opensource reassignment tool -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4795) Confusion around topic deletion
[ https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538 ] Edoardo Comar edited comment on KAFKA-4795 at 4/20/17 9:15 PM: --- [~vahid] I may still see topics ??marked for deletion?? when using Kafka 0.10.2.x however deletion is asynchronous and usually, with topic deletion enabled on the broker, running the list command after a {{--delete --topic}} will not show anything (i.e topic is gone) was (Author: ecomar): [~vahid] I may still see topics ??marked for deletion?? when using Kafka 0.10.2.x however deletion is asynchronous and usually, with topic deletion enabled on the broker, running the {{ --list }} command after a {{--delete --topic}} will not show anything (i.e topic is gone) > Confusion around topic deletion > --- > > Key: KAFKA-4795 > URL: https://issues.apache.org/jira/browse/KAFKA-4795 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian > > The topic deletion works like this in 0.10.2.0: > # {{bin/zookeeper-server-start.sh config/zookeeper.properties}} > # {{bin/kafka-server-start.sh config/server.properties}} (uses default > {{server.properties}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test > --replication-factor 1 --partitions 1}} (creates the topic {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > Previously, the last command above returned {{test - marked for deletion}}, > which matched the output statement of the {{--delete}} topic command. > Continuing with the above scenario, > # stop the broker > # add the broker config {{delete.topic.enable=true}} in the config file > # {{bin/kafka-server-start.sh config/server.properties}} (this does not > remove the topic {{test}}, as if the topic was never marked for deletion). > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no > topics). > It seems that the "marked for deletion" state for topics no longer exists. > I opened this JIRA so I can get a confirmation on the expected topic deletion > behavior, because in any case, I think the user experience could be improved > (either there is a bug in the code, or the command's output statement is > misleading). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4795) Confusion around topic deletion
[ https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538 ] Edoardo Comar edited comment on KAFKA-4795 at 4/20/17 9:13 PM: --- [~vahid] I may still see topics ??marked for deletion?? when using Kafka 0.10.2.x however deletion is asynchronous and usually, with topic deletion enabled on the broker, running the {{ --list }} command after a {{--delete --topic}} will not show anything (i.e topic is gone) was (Author: ecomar): [~vahid] I may still see topics ??marked for deletion?? when using Kafka 0.10.2.x however deletion is asynchronous and usually - with topic deletion enabled on the broker - running the {{--list}} command after a {{--delete --topic}} will not show anything (i.e topic is gone) > Confusion around topic deletion > --- > > Key: KAFKA-4795 > URL: https://issues.apache.org/jira/browse/KAFKA-4795 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian > > The topic deletion works like this in 0.10.2.0: > # {{bin/zookeeper-server-start.sh config/zookeeper.properties}} > # {{bin/kafka-server-start.sh config/server.properties}} (uses default > {{server.properties}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test > --replication-factor 1 --partitions 1}} (creates the topic {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > Previously, the last command above returned {{test - marked for deletion}}, > which matched the output statement of the {{--delete}} topic command. > Continuing with the above scenario, > # stop the broker > # add the broker config {{delete.topic.enable=true}} in the config file > # {{bin/kafka-server-start.sh config/server.properties}} (this does not > remove the topic {{test}}, as if the topic was never marked for deletion). > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no > topics). > It seems that the "marked for deletion" state for topics no longer exists. > I opened this JIRA so I can get a confirmation on the expected topic deletion > behavior, because in any case, I think the user experience could be improved > (either there is a bug in the code, or the command's output statement is > misleading). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4795) Confusion around topic deletion
[ https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538 ] Edoardo Comar commented on KAFKA-4795: -- [~vahid] I may still see topics ??marked for deletion?? when using Kafka 0.10.2.x however deletion is asynchronous and usually - with topic deletion enabled on the broker - running the {{--list}} command after a {{--delete --topic}} will not show anything (i.e topic is gone) > Confusion around topic deletion > --- > > Key: KAFKA-4795 > URL: https://issues.apache.org/jira/browse/KAFKA-4795 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian > > The topic deletion works like this in 0.10.2.0: > # {{bin/zookeeper-server-start.sh config/zookeeper.properties}} > # {{bin/kafka-server-start.sh config/server.properties}} (uses default > {{server.properties}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test > --replication-factor 1 --partitions 1}} (creates the topic {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}}) > Previously, the last command above returned {{test - marked for deletion}}, > which matched the output statement of the {{--delete}} topic command. > Continuing with the above scenario, > # stop the broker > # add the broker config {{delete.topic.enable=true}} in the config file > # {{bin/kafka-server-start.sh config/server.properties}} (this does not > remove the topic {{test}}, as if the topic was never marked for deletion). > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} > (reports {{Topic test is marked for deletion. Note: This will have no impact > if delete.topic.enable is not set to true.}}) > # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no > topics). > It seems that the "marked for deletion" state for topics no longer exists. > I opened this JIRA so I can get a confirmation on the expected topic deletion > behavior, because in any case, I think the user experience could be improved > (either there is a bug in the code, or the command's output statement is > misleading). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5081) two versions of jackson-annotations-xxx.jar in distribution tgz
Edoardo Comar created KAFKA-5081: Summary: two versions of jackson-annotations-xxx.jar in distribution tgz Key: KAFKA-5081 URL: https://issues.apache.org/jira/browse/KAFKA-5081 Project: Kafka Issue Type: Bug Components: build Reporter: Edoardo Comar Priority: Minor git clone https://github.com/apache/kafka.git cd kafka gradle ./gradlew releaseTarGz then kafka/core/build/distributions/kafka-...-SNAPSHOT.tgz contains in the libs directory two versions of this jar jackson-annotations-2.8.0.jar jackson-annotations-2.8.5.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.
[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967735#comment-15967735 ] Edoardo Comar commented on KAFKA-2729: -- FWIW - we saw the same message {{ Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition) }} when redeploying kafka 0.10.0.1 in a cluster where we had previously run 0.10.2.0, after having wiped kafka's storage but kept zookeeper (the version bundled with kafka 0.10.2) and its storage. For us the cluster eventually recovered. HTH. > Cached zkVersion not equal to that in zookeeper, broker not recovering. > --- > > Key: KAFKA-2729 > URL: https://issues.apache.org/jira/browse/KAFKA-2729 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.2.1 >Reporter: Danil Serdyuchenko > > After a small network wobble where zookeeper nodes couldn't reach each other, > we started seeing a large number of undereplicated partitions. The zookeeper > cluster recovered, however we continued to see a large number of > undereplicated partitions. Two brokers in the kafka cluster were showing this > in the logs: > {code} > [2015-10-27 11:36:00,888] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for > partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 > (kafka.cluster.Partition) > [2015-10-27 11:36:00,891] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] > not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition) > {code} > For all of the topics on the effected brokers. Both brokers only recovered > after a restart. Our own investigation yielded nothing, I was hoping you > could shed some light on this issue. Possibly if it's related to: > https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using > 0.8.2.1. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics
[ https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962765#comment-15962765 ] Edoardo Comar commented on KAFKA-4982: -- KIP is https://cwiki.apache.org/confluence/display/KAFKA/KIP-136%3A+Add+Listener+name+and+Security+Protocol+name+to+SelectorMetrics+tags > Add listener tag to socket-server-metrics.connection-... metrics > -- > > Key: KAFKA-4982 > URL: https://issues.apache.org/jira/browse/KAFKA-4982 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > Metrics in socket-server-metrics like connection-count connection-close-rate > etc are tagged with networkProcessor: > where the id of a network processor is just a numeric integer. > If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id > just keeps incrementing and when looking at the metrics it is hard to match > the metric tag to a listener. > You need to know the number of network threads and the order in which the > listeners are declared in the brokers' server.properties. > We should add a tag showing the listener label, that would also make it much > easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics
[ https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949229#comment-15949229 ] Edoardo Comar commented on KAFKA-4981: -- Hi [~ijuma] for the 3rd time today :-) ... does this need a kip too :-) :-) > Add connection-accept-rate and connection-prepare-rate metrics > --- > > Key: KAFKA-4981 > URL: https://issues.apache.org/jira/browse/KAFKA-4981 > Project: Kafka > Issue Type: Improvement > Components: network >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current set of socket-server-metrics include > connection-close-rate and connection-creation-rate. > On the server-side we'd find useful to have rates for > connections accepted and connections 'prepared' (when the channel is ready) > to see how many clients do not go through handshake or authentication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics
[ https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949223#comment-15949223 ] Edoardo Comar commented on KAFKA-4981: -- Thanks, I know the block is skipped for PLAINTEXT. Actually we do not care that this rate is always 0 for PLAINTEXT - calling it authenticated would work for us. From an ops perspective we wanted to check how many requests go wasted, either because of failed TLS handshakes or failed authentications. In fact this metric would be more useful if we add tags for the listener the network thread belongs to; see https://issues.apache.org/jira/browse/KAFKA-4982 > Add connection-accept-rate and connection-prepare-rate metrics > --- > > Key: KAFKA-4981 > URL: https://issues.apache.org/jira/browse/KAFKA-4981 > Project: Kafka > Issue Type: Improvement > Components: network >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current set of socket-server-metrics include > connection-close-rate and connection-creation-rate. > On the server-side we'd find useful to have rates for > connections accepted and connections 'prepared' (when the channel is ready) > to see how many clients do not go through handshake or authentication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics
[ https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949221#comment-15949221 ] Edoardo Comar commented on KAFKA-4981: -- as per comments on the pull request, the 'prepared' metric may be better called 'authenticated' and will have a 0 flat rate for PLAINTEXT listeners. [~rsivaram] wrote: Prepared doesn't convey much meaning in terms of an externally visible metric. I imagine you chose it rather than authenticated since you intended it to work for PLAINTEXT. But PLAINTEXT doesn't go through this if-block since channel.ready() returns true. > Add connection-accept-rate and connection-prepare-rate metrics > --- > > Key: KAFKA-4981 > URL: https://issues.apache.org/jira/browse/KAFKA-4981 > Project: Kafka > Issue Type: Improvement > Components: network >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current set of socket-server-metrics include > connection-close-rate and connection-creation-rate. > On the server-side we'd find useful to have rates for > connections accepted and connections 'prepared' (when the channel is ready) > to see how many clients do not go through handshake or authentication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics
[ https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949036#comment-15949036 ] Edoardo Comar commented on KAFKA-4982: -- [~ijuma] does this need a kip ? > Add listener tag to socket-server-metrics.connection-... metrics > -- > > Key: KAFKA-4982 > URL: https://issues.apache.org/jira/browse/KAFKA-4982 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > Metrics in socket-server-metrics like connection-count connection-close-rate > etc are tagged with networkProcessor: > where the id of a network processor is just a numeric integer. > If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id > just keeps incrementing and when looking at the metrics it is hard to match > the metric tag to a listener. > You need to know the number of network threads and the order in which the > listeners are declared in the brokers' server.properties. > We should add a tag showing the listener label, that would also make it much > easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics
[ https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reassigned KAFKA-4982: Assignee: Edoardo Comar > Add listener tag to socket-server-metrics.connection-... metrics > -- > > Key: KAFKA-4982 > URL: https://issues.apache.org/jira/browse/KAFKA-4982 > Project: Kafka > Issue Type: Improvement >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > Metrics in socket-server-metrics like connection-count connection-close-rate > etc are tagged with networkProcessor: > where the id of a network processor is just a numeric integer. > If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id > just keeps incrementing and when looking at the metrics it is hard to match > the metric tag to a listener. > You need to know the number of network threads and the order in which the > listeners are declared in the brokers' server.properties. > We should add a tag showing the listener label, that would also make it much > easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics
Edoardo Comar created KAFKA-4982: Summary: Add listener tag to socket-server-metrics.connection-... metrics Key: KAFKA-4982 URL: https://issues.apache.org/jira/browse/KAFKA-4982 Project: Kafka Issue Type: Improvement Reporter: Edoardo Comar Metrics in socket-server-metrics like connection-count connection-close-rate etc are tagged with networkProcessor: where the id of a network processor is just a numeric integer. If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id just keeps incrementing and when looking at the metrics it is hard to match the metric tag to a listener. You need to know the number of network threads and the order in which the listeners are declared in the brokers' server.properties. We should add a tag showing the listener label, that would also make it much easier to group the metrics in a tool like grafana -- This message was sent by Atlassian JIRA (v6.3.15#6346)
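To show why the numeric id alone is awkward to work with, a small JMX sketch that lists the socket-server-metrics MBeans a broker exposes (the JMX port and the exact ObjectName pattern are assumptions, not taken from the JIRA):
{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListSocketServerMetrics {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint - assumes the broker was started with JMX enabled on port 9999.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed ObjectName pattern for the selector metrics discussed in this JIRA.
            Set<ObjectName> names = mbs.queryNames(new ObjectName("kafka.server:type=socket-server-metrics,*"), null);
            for (ObjectName name : names) {
                // Without a listener tag the name only carries networkProcessor=<id>, so mapping
                // an entry back to a listener requires knowing num.network.threads and the order
                // of the listeners in server.properties.
                System.out.println(name.getCanonicalName());
            }
        }
    }
}
{code}
With a {{listener}} tag in the ObjectName, the same query could be grouped per listener directly, e.g. in a tool like grafana.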
[jira] [Created] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics
Edoardo Comar created KAFKA-4981: Summary: Add connection-accept-rate and connection-prepare-rate metrics Key: KAFKA-4981 URL: https://issues.apache.org/jira/browse/KAFKA-4981 Project: Kafka Issue Type: Improvement Components: network Reporter: Edoardo Comar Assignee: Edoardo Comar The current set of socket-server-metrics include connection-close-rate and connection-creation-rate. On the server-side we'd find useful to have rates for connections accepted and connections 'prepared' (when the channel is ready) to see how many clients do not go through handshake or authentication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901349#comment-15901349 ] Edoardo Comar commented on KAFKA-4669: -- thanks [~ijuma] yes we're planning to move to 0.10.2 very soon > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901075#comment-15901075 ] Edoardo Comar commented on KAFKA-4669: -- The NPE moved from broker1 to another broker when the former was restarted. This suggests to me it's caused by a 'special' client. > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899847#comment-15899847 ] Edoardo Comar commented on KAFKA-4669: -- [~ijuma] It's the same NPE as https://issues.apache.org/jira/browse/KAFKA-3689?focusedCommentId=15383936&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15383936 could it be caused by an exotic client ? > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899827#comment-15899827 ] Edoardo Comar commented on KAFKA-4669: -- We have found a strong correlation between the clients getting {code} Uncaught error in kafka producer I/O thread: java.lang.IllegalStateException: Correlation id for response (703766) does not match request (703764) {code} and an NPE in one of our 10.0.1 brokers {code} [2017-03-06 17:46:29,827] ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.NullPointerException at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:486) at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:483) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.network.Processor.processCompletedReceives(SocketServer.scala:483) at kafka.network.Processor.run(SocketServer.scala:413) at java.lang.Thread.run(Thread.java:809) {code} that suggest that somehow {code} private def processCompletedReceives() { selector.completedReceives.asScala.foreach { receive => try { val channel = selector.channel(receive.source) val session = RequestChannel.Session(new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channel.principal.getName), //NPE if channel is null ... {code} [~ijuma] the only clients that are getting the occasional IllegalStateException are the ones producing to a partition that has as leader a broker where that NPE is appearing in our logs. > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. 
First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897746#comment-15897746 ] Edoardo Comar commented on KAFKA-4669: -- We've seen this same exception using 10.0.1 clients against 10.0.1 brokers - without doing a {{KafkaProducer.flush()}} We've seen it happening during {{KafkaConsumer.poll()}} {{KafkaConsumer.commitSync()}} and in the network I/O of a KafkaProducer that is just doing sends.eg. {code} java.lang.IllegalStateException: Correlation id for response (4564) does not match request (4562) at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) at java.lang.Thread.run(Unknown Source) {code} {code} java.lang.IllegalStateException: Correlation id for response (742) does not match request (174) at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1027) {code} Usually the response's correlation id is off by just 1 or 2 but we've also seen it off by a few hundreds. When this happens, all subsequent responses are also shifted: {code} java.lang.IllegalStateException: Correlation id for response (743) does not match request (742) java.lang.IllegalStateException: Correlation id for response (744) does not match request (743) java.lang.IllegalStateException: Correlation id for response (745) does not match request (744) {code} > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. 
First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897747#comment-15897747 ] Edoardo Comar commented on KAFKA-4669: -- It's easy to discard and recreate the consumer instance to recover however we can't do that with the producer as it occurs in the Sender thread. > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.15#6346)
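For illustration, a minimal client-side guard against the hang described above: since {{KafkaProducer.flush()}} takes no timeout, the call can be run on a helper thread and the wait bounded. The broker address, topic and timeout are placeholders, and this is a defensive sketch for callers, not a fix for the missing try/catch in NetworkClient.
{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BoundedFlush {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative endpoint
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("example-topic", "key", "value"));

        // flush() blocks on a latch that only the Sender thread counts down;
        // if the Sender has died (as in this issue), bound the wait instead of blocking forever.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> flush = executor.submit(producer::flush);
        try {
            flush.get(30, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.err.println("flush() did not complete in time - discarding the producer");
            producer.close(0, TimeUnit.MILLISECONDS);
        } finally {
            executor.shutdownNow();
        }
    }
}
{code}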
[jira] [Commented] (KAFKA-1368) Upgrade log4j
[ https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892271#comment-15892271 ] Edoardo Comar commented on KAFKA-1368: -- Yes [~ijuma] we will pick this one up. I work together with [~mimaison] who has self-assigned this one. We will work on this, just not yet next week. What are the compatibility implications? You mean that users need to switch the jars on which the kafka-client.jar depends ? > Upgrade log4j > - > > Key: KAFKA-1368 > URL: https://issues.apache.org/jira/browse/KAFKA-1368 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Vladislav Pernin >Assignee: Mickael Maison > > Upgrade log4j to at least 1.2.16 or 1.2.17. > Usage of EnhancedPatternLayout will be possible. > It allows setting delimiters around the full log, stacktrace included, making > log messages collection easier with tools like Logstash. > Example : <[%d{}]...[%t] %m%throwable>%n > <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch > request for partition [X,6] offset 700 from consumer with correlation id > 0 (kafka.server.KafkaApis) > kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only > have log segments in the range 16021 to 16021. > at kafka.log.Log.read(Log.scala:429) > ... > at java.lang.Thread.run(Thread.java:744)> -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure
[ https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826797#comment-15826797 ] Edoardo Comar commented on KAFKA-4617: -- [~dhwanikatagade] thanks for the PR ! > gradle-generated core eclipse project has incorrect source folder structure > --- > > Key: KAFKA-4617 > URL: https://issues.apache.org/jira/browse/KAFKA-4617 > Project: Kafka > Issue Type: Bug > Components: build >Reporter: Edoardo Comar >Assignee: Dhwani Katagade >Priority: Minor > Labels: build > > The gradle-generated Eclipse Scala project for Kafka core has a > classpath defined as : > {code:xml} > > > > {code} > because of how the source files are for tests are structured, code navigation > / running unit tests fails. The correct structure should be instead : > {code:xml} > >path="src/test/scala"/> > > > > > {code} > Moreover, the classpath included as libraries core/build/test and > core/build/resources > which should not be there as the eclipse classes are not generated under build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure
[ https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824788#comment-15824788 ] Edoardo Comar commented on KAFKA-4617: -- [~dhwanikatagade] thanks for the patch, The source folders in core are now correct. I see that you have made the eclipse compiled classes location match the ones used by gradle on the CLI. However two generated Eclipse projects (core and streams) still have a mistake. Their generated classpaths still include as library entries the output directories of other projects they depend on. They should not. please see my comment in the PR > gradle-generated core eclipse project has incorrect source folder structure > --- > > Key: KAFKA-4617 > URL: https://issues.apache.org/jira/browse/KAFKA-4617 > Project: Kafka > Issue Type: Bug > Components: build >Reporter: Edoardo Comar >Assignee: Dhwani Katagade >Priority: Minor > Labels: build > > The gradle-generated Eclipse Scala project for Kafka core has a > classpath defined as : > {code:xml} > > > > {code} > because of how the source files are for tests are structured, code navigation > / running unit tests fails. The correct structure should be instead : > {code:xml} > >path="src/test/scala"/> > > > > > {code} > Moreover, the classpath included as libraries core/build/test and > core/build/resources > which should not be there as the eclipse classes are not generated under build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure
[ https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-4617: - Description: The gradle-generated Eclipse Scala project for Kafka core has a classpath defined as : {code:xml} {code} because of how the source files are for tests are structured, code navigation / running unit tests fails. The correct structure should be instead : {code:xml} {code} Moreover, the classpath included as libraries core/build/test and core/build/resources which should not be there as the eclipse classes are not generated under build was: The gradle-generated Eclipse Scala project for Kafka core has a classpath defined as : {code:xml} {code} because of how the source files are for tests are structured, code navigation / running unit tests fails. The correct structure should be instead : {code:xml} {code} > gradle-generated core eclipse project has incorrect source folder structure > --- > > Key: KAFKA-4617 > URL: https://issues.apache.org/jira/browse/KAFKA-4617 > Project: Kafka > Issue Type: Bug > Components: build >Reporter: Edoardo Comar >Priority: Minor > > The gradle-generated Eclipse Scala project for Kafka core has a > classpath defined as : > {code:xml} > > > > {code} > because of how the source files are for tests are structured, code navigation > / running unit tests fails. The correct structure should be instead : > {code:xml} > >path="src/test/scala"/> > > > > > {code} > Moreover, the classpath included as libraries core/build/test and > core/build/resources > which should not be there as the eclipse classes are not generated under build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure
[ https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818047#comment-15818047 ] Edoardo Comar commented on KAFKA-4617: -- Hi [~dhwanikatagade], thanks I saw that thread and that's what prompted me to open this defect. I am actually not bothered by the different output folder, that has not been annoying to me. I know that the eclipse compile output is different from the output generated running gradle on the CLI and am happy with that. > gradle-generated core eclipse project has incorrect source folder structure > --- > > Key: KAFKA-4617 > URL: https://issues.apache.org/jira/browse/KAFKA-4617 > Project: Kafka > Issue Type: Bug > Components: build >Reporter: Edoardo Comar >Priority: Minor > > The gradle-generated Eclipse Scala project for Kafka core has a > classpath defined as : > {code:xml} > > > > {code} > because of how the source files are for tests are structured, code navigation > / running unit tests fails. The correct structure should be instead : > {code:xml} > >path="src/test/scala"/> > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure
Edoardo Comar created KAFKA-4617: Summary: gradle-generated core eclipse project has incorrect source folder structure Key: KAFKA-4617 URL: https://issues.apache.org/jira/browse/KAFKA-4617 Project: Kafka Issue Type: Bug Components: build Reporter: Edoardo Comar Priority: Minor The gradle-generated Eclipse Scala project for Kafka core has a classpath defined as : {code:xml} {code} because of how the source files are for tests are structured, code navigation / running unit tests fails. The correct structure should be instead : {code:xml} {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
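The {code:xml} blocks in the KAFKA-4617 entries above no longer carry their XML payload. For illustration only, a sketch of the kind of Eclipse {{.classpath}} source entries the description argues for; the exact entries in the original report are not preserved here, so the paths below are assumptions based on the core module layout and the surviving {{path="src/test/scala"}} fragment.
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
    <!-- separate source entries so test code is navigable and runnable in Eclipse -->
    <classpathentry kind="src" path="src/main/scala"/>
    <classpathentry kind="src" path="src/test/scala"/>
    <!-- compiled classes go to an Eclipse-specific folder, not under build/ -->
    <classpathentry kind="output" path="bin"/>
</classpath>
{code}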
[jira] [Commented] (KAFKA-4611) Support custom authentication mechanism
[ https://issues.apache.org/jira/browse/KAFKA-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814508#comment-15814508 ] Edoardo Comar commented on KAFKA-4611: -- Hi, I am not sure what you are asking is not covered already : https://issues.apache.org/jira/browse/KAFKA-4292 proposes custom callback handlers (therefore you control how to check username and pwd). https://issues.apache.org/jira/browse/KAFKA-4259 added the functionality of defining jaas configurations in the client config file. https://issues.apache.org/jira/browse/KAFKA-4180 will add the functionality of allowing different jaas config in a single client process. Please note that Kerberos login uses JAAS too. > Support custom authentication mechanism > --- > > Key: KAFKA-4611 > URL: https://issues.apache.org/jira/browse/KAFKA-4611 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 0.10.0.0 >Reporter: mahendiran chandrasekar > > Currently there are two login mechanisms supported by kafka client. > 1) Default Login / Abstract Login which uses JAAS authentication > 2) Kerberos Login > Supporting user defined login mechanism's would be nice. > This could be achieved by removing the limitation from > [here](https://github.com/apache/kafka/blob/0.10.0/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L44) > ... Instead get custom login module implemented by user from the configs, > gives users the option to implement custom login mechanism. > I am running into an issue in setting JAAS authentication system property on > all executors of my spark cluster. Having custom mechanism to authorize kafka > would be a good improvement for me -- This message was sent by Atlassian JIRA (v6.3.4#6332)
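As a concrete illustration of the KAFKA-4259 route mentioned above, SASL credentials can be supplied per client through the {{sasl.jaas.config}} property in the client configuration, instead of a JVM-wide JAAS file; the username and password below are placeholders.
{code}
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="alice" \
  password="alice-secret";
{code}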
[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812411#comment-15812411 ] Edoardo Comar commented on KAFKA-4441: -- Please ignore the previous comment - the {{UnderReplicatedPartitions}} did not need to check for deletion, it was triggered during topic creation. New PR coming soon > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874 ] Edoardo Comar edited comment on KAFKA-4441 at 1/6/17 4:16 PM: -- For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check like {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}} The current startup ordering inside {{KafkaServer}} has the {{ReplicaManager}} start before the {{KafkaController}}. Could the order be reversed ? Else the {{ReplicaManager}} could be assigned a {{DeletionChecker}} function after the {{KafkaController}} has started. This would be minimally disruptive to the current code. [~ijuma] [~junrao] any preferences ? was (Author: ecomar): For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}} We will follow up with another PR > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
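For illustration, a conceptual sketch (in plain Java rather than the broker's Scala) of the check being discussed: the under-replicated count skips topics that a pluggable "deletion checker" reports as queued for deletion, and the checker can be wired in after the controller has started. The names and shapes here are assumptions, not the actual {{ReplicaManager}} code.
{code:java}
import java.util.Map;
import java.util.function.Predicate;

public class UnderReplicatedGaugeSketch {
    // Defaults to "nothing is being deleted" until the controller wires in a real checker.
    private volatile Predicate<String> topicQueuedForDeletion = topic -> false;

    public void setDeletionChecker(Predicate<String> checker) {
        this.topicQueuedForDeletion = checker;
    }

    /** @param underReplicatedByTopic topic name -> number of under-replicated partitions */
    public int underReplicatedPartitionCount(Map<String, Integer> underReplicatedByTopic) {
        return underReplicatedByTopic.entrySet().stream()
                .filter(e -> !topicQueuedForDeletion.test(e.getKey())) // ignore topics being deleted
                .mapToInt(Map.Entry::getValue)
                .sum();
    }
}
{code}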
[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874 ] Edoardo Comar commented on KAFKA-4441: -- For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}} We will follow up with another PR > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4605) MetricsRegistry during KafkaServerTestHarness polluted by side effects
Edoardo Comar created KAFKA-4605: Summary: MetricsRegistry during KafkaServerTestHarness polluted by side effects Key: KAFKA-4605 URL: https://issues.apache.org/jira/browse/KAFKA-4605 Project: Kafka Issue Type: Bug Components: unit tests Reporter: Edoardo Comar Priority: Minor Components like KafkaController, ReplicaManager etc. that extend KafkaMetricsGroup *apparently* create new metrics, but the 'newGauge' method actually invokes 'MetricsRegistry.getOrAdd', so with multiple servers in the same JVM only the first server actually creates metrics. The side effects are: 1) a test cannot fully check the metric - only the metric instance created by the first class that registered it; 2) after a tearDown, the registry is still full of metrics and a subsequent test will not instantiate new metrics. We've been bitten by the issue in https://github.com/apache/kafka/pull/2325 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
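A small standalone illustration of the registration behaviour described above, assuming the Yammer metrics-core 2.x API that Kafka uses: registering a second gauge under the same scope and name returns the first one, which is why only the first server in a shared JVM effectively creates the metric.
{code:java}
import com.yammer.metrics.core.Gauge;
import com.yammer.metrics.core.MetricsRegistry;

public class GetOrAddDemo {
    public static void main(String[] args) {
        MetricsRegistry registry = new MetricsRegistry();

        Gauge<Integer> first = registry.newGauge(GetOrAddDemo.class, "UnderReplicatedPartitions",
                new Gauge<Integer>() {
                    @Override
                    public Integer value() { return 1; }
                });

        // newGauge delegates to getOrAdd, so this does NOT replace the first registration.
        Gauge<Integer> second = registry.newGauge(GetOrAddDemo.class, "UnderReplicatedPartitions",
                new Gauge<Integer>() {
                    @Override
                    public Integer value() { return 2; }
                });

        System.out.println(first == second); // true: same instance
        System.out.println(second.value());  // 1: the second gauge's value() is never used
    }
}
{code}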
[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804732#comment-15804732 ] Edoardo Comar commented on KAFKA-4441: -- Raising the severity because we've seen these spurious metric values many times in the systems we monitor, to the point that the metric was no longer trusted unless the values persisted for some time. > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-4441: - Priority: Major (was: Minor) > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1368) Upgrade log4j
[ https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801283#comment-15801283 ] Edoardo Comar commented on KAFKA-1368: -- Moving to Log4j v2 would allow dynamic reconfiguration of the logging levels *without restarting the brokers* and without any API changes. This could be done by a user switching to the XML format (rather than properties) and adding a directive like {code:xml} {code} as described here http://logging.apache.org/log4j/2.x/manual/configuration.html#AutomaticReconfiguration > Upgrade log4j > - > > Key: KAFKA-1368 > URL: https://issues.apache.org/jira/browse/KAFKA-1368 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Vladislav Pernin > > Upgrade log4j to at least 1.2.16 or 1.2.17. > Usage of EnhancedPatternLayout will be possible. > It allows setting delimiters around the full log, stacktrace included, making > log messages collection easier with tools like Logstash. > Example : <[%d{}]...[%t] %m%throwable>%n > <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch > request for partition [X,6] offset 700 from consumer with correlation id > 0 (kafka.server.KafkaApis) > kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only > have log segments in the range 16021 to 16021. > at kafka.log.Log.read(Log.scala:429) > ... > at java.lang.Thread.run(Thread.java:744)> -- This message was sent by Atlassian JIRA (v6.3.4#6332)
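As a hedged example of such a directive, a minimal Log4j 2 XML configuration enabling automatic reconfiguration via the {{monitorInterval}} attribute (as described in the linked manual section) could look like the following; the appender and layout are illustrative.
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<!-- monitorInterval makes Log4j 2 re-read this file every 30 seconds -->
<Configuration status="warn" monitorInterval="30">
  <Appenders>
    <Console name="stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="[%d] %p %m (%c)%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="stdout"/>
    </Root>
  </Loggers>
</Configuration>
{code}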
[jira] [Commented] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers
[ https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780212#comment-15780212 ] Edoardo Comar commented on KAFKA-4180: -- Now that https://issues.apache.org/jira/browse/KAFKA-4259 has been resolved and merged in trunk, this can be merged too > Shared authentication with multiple active Kafka producers/consumers > > > Key: KAFKA-4180 > URL: https://issues.apache.org/jira/browse/KAFKA-4180 > Project: Kafka > Issue Type: Bug > Components: producer , security >Affects Versions: 0.10.0.1 >Reporter: Guillaume Grossetie >Assignee: Mickael Maison > Labels: authentication, jaas, loginmodule, plain, producer, > sasl, user > > I'm using Kafka 0.10.0.1 with an SASL authentication on the client: > {code:title=kafka_client_jaas.conf|borderStyle=solid} > KafkaClient { > org.apache.kafka.common.security.plain.PlainLoginModule required > username="guillaume" > password="secret"; > }; > {code} > When using multiple Kafka producers the authentification is shared [1]. In > other words it's not currently possible to have multiple Kafka producers in a > JVM process. > Am I missing something ? How can I have multiple active Kafka producers with > different credentials ? > My use case is that I have an application that send messages to multiples > clusters (one cluster for logs, one cluster for metrics, one cluster for > business data). > [1] > https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers
[ https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748758#comment-15748758 ] Edoardo Comar commented on KAFKA-4180: -- I have updated the PR based on the discussion in the mailing list - namely the key used to cache `LoginManager` instances. I have retired KIP-83, the PR that closes this issue is mostly based on KIP-85 and associated https://issues.apache.org/jira/browse/KAFKA-4259 > Shared authentication with multiple active Kafka producers/consumers > > > Key: KAFKA-4180 > URL: https://issues.apache.org/jira/browse/KAFKA-4180 > Project: Kafka > Issue Type: Bug > Components: producer , security >Affects Versions: 0.10.0.1 >Reporter: Guillaume Grossetie >Assignee: Mickael Maison > Labels: authentication, jaas, loginmodule, plain, producer, > sasl, user > > I'm using Kafka 0.10.0.1 with an SASL authentication on the client: > {code:title=kafka_client_jaas.conf|borderStyle=solid} > KafkaClient { > org.apache.kafka.common.security.plain.PlainLoginModule required > username="guillaume" > password="secret"; > }; > {code} > When using multiple Kafka producers the authentification is shared [1]. In > other words it's not currently possible to have multiple Kafka producers in a > JVM process. > Am I missing something ? How can I have multiple active Kafka producers with > different credentials ? > My use case is that I have an application that send messages to multiples > clusters (one cluster for logs, one cluster for metrics, one cluster for > business data). > [1] > https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
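To make the end goal concrete, a sketch of what KIP-85 / KAFKA-4259 enables once this change lands: two producers in the same JVM, each with its own {{sasl.jaas.config}} and therefore its own credentials. The endpoints, usernames and passwords are placeholders.
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class TwoCredentialProducers {
    private static Properties producerProps(String bootstrap, String jaasConfig) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // Per-client JAAS configuration instead of a single JVM-wide java.security.auth.login.config
        props.put("sasl.jaas.config", jaasConfig);
        return props;
    }

    public static void main(String[] args) {
        KafkaProducer<String, String> logsProducer = new KafkaProducer<>(producerProps(
                "logs-cluster:9093",
                "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"logs-user\" password=\"logs-secret\";"));
        KafkaProducer<String, String> metricsProducer = new KafkaProducer<>(producerProps(
                "metrics-cluster:9093",
                "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"metrics-user\" password=\"metrics-secret\";"));
        // ... send records with each producer ...
        logsProducer.close();
        metricsProducer.close();
    }
}
{code}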
[jira] [Created] (KAFKA-4531) Rationalise client configuration validation
Edoardo Comar created KAFKA-4531: Summary: Rationalise client configuration validation Key: KAFKA-4531 URL: https://issues.apache.org/jira/browse/KAFKA-4531 Project: Kafka Issue Type: Improvement Components: clients Reporter: Edoardo Comar The broker-side configuration has a {{validateValues()}} method that could be introduced also in the client-side {{ProducerConfig}} and {{ConsumerConfig}} classes. The rationale is to centralise constraints between values, like e.g. this one currently in the {{KafkaConsumer}} constructor: {code} if (this.requestTimeoutMs <= sessionTimeOutMs || this.requestTimeoutMs <= fetchMaxWaitMs) throw new ConfigException(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG + " should be greater than " + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG + " and " + ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG); {code} or custom validation of the provided values, e.g. this one in the {{KafkaProducer}} : {code} private static int parseAcks(String acksString) { try { return acksString.trim().equalsIgnoreCase("all") ? -1 : Integer.parseInt(acksString.trim()); } catch (NumberFormatException e) { throw new ConfigException("Invalid configuration value for 'acks': " + acksString); } } {code} also some new KIPs, e.g. KIP-81 propose constraints among different values, so it would be good not to scatter them around. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
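As a sketch of the kind of centralisation being proposed, the scattered checks quoted above could move into one client-side validation helper; the class name and the direct use of {{ConfigException}} here are illustrative rather than a concrete design.
{code:java}
import java.util.Map;
import org.apache.kafka.common.config.ConfigException;

public class ClientConfigValidator {

    // Cross-property constraint currently enforced in the KafkaConsumer constructor.
    public static void validateConsumerValues(Map<String, Integer> cfg) {
        int requestTimeoutMs = cfg.get("request.timeout.ms");
        int sessionTimeoutMs = cfg.get("session.timeout.ms");
        int fetchMaxWaitMs = cfg.get("fetch.max.wait.ms");
        if (requestTimeoutMs <= sessionTimeoutMs || requestTimeoutMs <= fetchMaxWaitMs)
            throw new ConfigException("request.timeout.ms should be greater than "
                    + "session.timeout.ms and fetch.max.wait.ms");
    }

    // Custom value parsing/validation currently private to KafkaProducer.
    public static int parseAcks(String acksString) {
        try {
            return acksString.trim().equalsIgnoreCase("all") ? -1 : Integer.parseInt(acksString.trim());
        } catch (NumberFormatException e) {
            throw new ConfigException("Invalid configuration value for 'acks': " + acksString);
        }
    }
}
{code}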
[jira] [Assigned] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion
[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reassigned KAFKA-4441: Assignee: Edoardo Comar > Kafka Monitoring is incorrect during rapid topic creation and deletion > -- > > Key: KAFKA-4441 > URL: https://issues.apache.org/jira/browse/KAFKA-4441 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Assignee: Edoardo Comar >Priority: Minor > > Kafka reports several metrics off the state of partitions: > UnderReplicatedPartitions > PreferredReplicaImbalanceCount > OfflinePartitionsCount > All of these metrics trigger when rapidly creating and deleting topics in a > tight loop, although the actual causes of the metrics firing are from topics > that are undergoing creation/deletion, and the cluster is otherwise stable. > Looking through the source code, topic deletion goes through an asynchronous > state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35. > However, the metrics do not know about the progress of this state machine: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 > > I believe the fix to this is relatively simple - we need to make the metrics > know that a topic is currently undergoing deletion or creation, and only > include topics that are "stable" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4185) Abstract out password verifier in SaslServer as an injectable dependency
[ https://issues.apache.org/jira/browse/KAFKA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571363#comment-15571363 ] Edoardo Comar commented on KAFKA-4185: -- Hi [~piyushvijay] I see that a KIP has been proposed as a wider solution (works for other mechanisms too) https://cwiki.apache.org/confluence/display/KAFKA/KIP-86%3A+Configurable+SASL+callback+handlers > Abstract out password verifier in SaslServer as an injectable dependency > > > Key: KAFKA-4185 > URL: https://issues.apache.org/jira/browse/KAFKA-4185 > Project: Kafka > Issue Type: Improvement > Components: security >Affects Versions: 0.10.0.1 >Reporter: Piyush Vijay > Fix For: 0.10.0.2 > > > Kafka comes with a default SASL/PLAIN implementation which assumes that > username and password are present in a JAAS > config file. People often want to use some other way to provide username and > password to SaslServer. Their best bet, > currently, is to have their own implementation of SaslServer (which would be, > in most cases, a copied version of PlainSaslServer > minus the logic where password verification happens). This is not ideal. > We believe that there exists a better way to structure the current > PlainSaslServer implementation which makes it very > easy for people to plug-in their custom password verifier without having to > rewrite SaslServer or copy any code. > The idea is to have an injectable dependency interface PasswordVerifier which > can be re-implemented based on the > requirements. There would be no need to re-implement or extend > PlainSaslServer class. > Note that this is commonly asked feature and there have been some attempts in > the past to solve this problem: > https://github.com/apache/kafka/pull/1350 > https://github.com/apache/kafka/pull/1770 > https://issues.apache.org/jira/browse/KAFKA-2629 > https://issues.apache.org/jira/browse/KAFKA-3679 > We believe that this proposed solution does not have the demerits because of > previous proposals were rejected. > I would be happy to discuss more. > Please find the link to the PR in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3302) Pass kerberos keytab and principal as part of client config
[ https://issues.apache.org/jira/browse/KAFKA-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551821#comment-15551821 ] Edoardo Comar commented on KAFKA-3302: -- looks like [~rsivaram]'s https://issues.apache.org/jira/browse/KAFKA-4259 includes also this issue > Pass kerberos keytab and principal as part of client config > > > Key: KAFKA-3302 > URL: https://issues.apache.org/jira/browse/KAFKA-3302 > Project: Kafka > Issue Type: Improvement > Components: security >Reporter: Sriharsha Chintalapani >Assignee: Sriharsha Chintalapani > Fix For: 0.10.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3396) Unauthorized topics are returned to the user
[ https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529377#comment-15529377 ] Edoardo Comar commented on KAFKA-3396: -- new PR is https://github.com/apache/kafka/pull/1908 > Unauthorized topics are returned to the user > > > Key: KAFKA-3396 > URL: https://issues.apache.org/jira/browse/KAFKA-3396 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.9.0.0, 0.10.0.0 >Reporter: Grant Henke >Assignee: Edoardo Comar >Priority: Critical > Fix For: 0.10.1.0 > > > Kafka's clients and protocol exposes unauthorized topics to the end user. > This is often considered a security hole. To some, the topic name is > considered sensitive information. Those that do not consider the name > sensitive, still consider it more information that allows a user to try and > circumvent security. Instead, if a user does not have access to the topic, > the servers should act as if the topic does not exist. > To solve this some of the changes could include: > - The broker should not return a TOPIC_AUTHORIZATION(29) error for > requests (metadata, produce, fetch, etc) that include a topic that the user > does not have DESCRIBE access to. > - A user should not receive a TopicAuthorizationException when they do > not have DESCRIBE access to a topic or the cluster. > - The client should not maintain and expose a list of unauthorized > topics in org.apache.kafka.common.Cluster. > Other changes may be required that are not listed here. Further analysis is > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
[ https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reassigned KAFKA-3899: Assignee: Edoardo Comar > Consumer.poll() stuck in loop if wrong credentials are supplied > --- > > Key: KAFKA-3899 > URL: https://issues.apache.org/jira/browse/KAFKA-3899 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.0, 0.10.0.0 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > With the broker configured to use SASL PLAIN , > if the client is supplying wrong credentials, > a consumer calling poll() > is stuck forever and only inspection of DEBUG-level logging can tell what is > wrong. > [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 > disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) > at org.apache.kafka.common.network.Selector.poll(Selector.java:283) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4185) Abstract out password verifier in SaslServer as an injectable dependency
[ https://issues.apache.org/jira/browse/KAFKA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526623#comment-15526623 ] Edoardo Comar commented on KAFKA-4185: -- I think this is a duplicate of https://issues.apache.org/jira/browse/KAFKA-3679 which I decided to close as won't fix as the procedure is well documented in http://kafka.apache.org/0100/documentation.html#security_sasl_plain_brokerconfig > Abstract out password verifier in SaslServer as an injectable dependency > > > Key: KAFKA-4185 > URL: https://issues.apache.org/jira/browse/KAFKA-4185 > Project: Kafka > Issue Type: Improvement > Components: security >Affects Versions: 0.10.0.1 >Reporter: Piyush Vijay > Fix For: 0.10.0.2 > > > Kafka comes with a default SASL/PLAIN implementation which assumes that > username and password are present in a JAAS > config file. People often want to use some other way to provide username and > password to SaslServer. Their best bet, > currently, is to have their own implementation of SaslServer (which would be, > in most cases, a copied version of PlainSaslServer > minus the logic where password verification happens). This is not ideal. > We believe that there exists a better way to structure the current > PlainSaslServer implementation which makes it very > easy for people to plug-in their custom password verifier without having to > rewrite SaslServer or copy any code. > The idea is to have an injectable dependency interface PasswordVerifier which > can be re-implemented based on the > requirements. There would be no need to re-implement or extend > PlainSaslServer class. > Note that this is commonly asked feature and there have been some attempts in > the past to solve this problem: > https://github.com/apache/kafka/pull/1350 > https://github.com/apache/kafka/pull/1770 > https://issues.apache.org/jira/browse/KAFKA-2629 > https://issues.apache.org/jira/browse/KAFKA-3679 > We believe that this proposed solution does not have the demerits because of > previous proposals were rejected. > I would be happy to discuss more. > Please find the link to the PR in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4180) Shared authentification with multiple actives Kafka producers/consumers
[ https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526574#comment-15526574 ] Edoardo Comar commented on KAFKA-4180: -- Hi [~sriharsha] thanks for pointing this out. I'd say this one didn't look a duplicate of your JIRA, possibly because yours only contains a title with no description/details. Also I was going to target KIP-83 to SASL plain, will open a thread on the mailing list for discussion, as suggested by the template, rather than on the wiki > Shared authentification with multiple actives Kafka producers/consumers > --- > > Key: KAFKA-4180 > URL: https://issues.apache.org/jira/browse/KAFKA-4180 > Project: Kafka > Issue Type: Bug > Components: producer , security >Affects Versions: 0.10.0.1 >Reporter: Guillaume Grossetie >Assignee: Mickael Maison > Labels: authentication, jaas, loginmodule, plain, producer, > sasl, user > > I'm using Kafka 0.10.0.1 with an SASL authentication on the client: > {code:title=kafka_client_jaas.conf|borderStyle=solid} > KafkaClient { > org.apache.kafka.common.security.plain.PlainLoginModule required > username="guillaume" > password="secret"; > }; > {code} > When using multiple Kafka producers the authentification is shared [1]. In > other words it's not currently possible to have multiple Kafka producers in a > JVM process. > Am I missing something ? How can I have multiple active Kafka producers with > different credentials ? > My use case is that I have an application that send messages to multiples > clusters (one cluster for logs, one cluster for metrics, one cluster for > business data). > [1] > https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-3679) Allow reuse of implementation of RFC 4616 in PlainSaslServer
[ https://issues.apache.org/jira/browse/KAFKA-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar resolved KAFKA-3679. -- Resolution: Won't Fix The documentation http://kafka.apache.org/0100/documentation.html#security_sasl_plain_brokerconfig suggests plugging in a security provider and implementing your own SaslServer by copying the provided Kafka PlainSaslServer, so closing as won't fix > Allow reuse of implementation of RFC 4616 in PlainSaslServer > - > > Key: KAFKA-3679 > URL: https://issues.apache.org/jira/browse/KAFKA-3679 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > Using SASL PLAIN in production may require a different username/password > checking than what is currently in the codebase, based on data contained in > the server jaas.conf. > To do so, a deployment needs to extend the SaslPlainServer as described here > http://kafka.apache.org/0100/documentation.html#security_sasl_plain_production > However the evaluate(bytes) method still needs to implement RFC4616, so it is > useful to separate the password checking from the reading of the data from > the wire. > A simple extract method into an overridable method should suffice -- This message was sent by Atlassian JIRA (v6.3.4#6332)
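For reference, a hedged sketch of the provider-registration pattern the linked documentation describes; the class names are illustrative, and the actual {{SaslServer}} (whose {{evaluateResponse()}} would implement RFC 4616 and the custom password check) is left out.
{code:java}
import java.security.Provider;
import java.security.Security;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;
import javax.security.sasl.SaslServerFactory;

public class CustomPlainSaslProvider extends Provider {

    public CustomPlainSaslProvider() {
        super("Custom SASL/PLAIN Provider", 1.0, "SASL/PLAIN server with a pluggable password check");
        // Map the PLAIN mechanism to our factory so the broker can discover it.
        put("SaslServerFactory.PLAIN", CustomPlainSaslServerFactory.class.getName());
    }

    public static void initialize() {
        Security.addProvider(new CustomPlainSaslProvider());
    }

    public static class CustomPlainSaslServerFactory implements SaslServerFactory {
        @Override
        public SaslServer createSaslServer(String mechanism, String protocol, String serverName,
                                           Map<String, ?> props, CallbackHandler cbh) throws SaslException {
            // A real implementation would return a SaslServer whose evaluateResponse()
            // parses the RFC 4616 message and verifies the password against an external store.
            throw new SaslException("custom PLAIN SaslServer not shown in this sketch");
        }

        @Override
        public String[] getMechanismNames(Map<String, ?> props) {
            return new String[]{"PLAIN"};
        }
    }
}
{code}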
[jira] [Updated] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)
[ https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-4206: - Description: The current handling of invalid credentials (ie wrong user/password) is to let the {{SaslException}} thrown from an implementation of {{javax.security.sasl.SaslServer.evaluateResponse()}} bubble up the call stack until it gets caught in {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} where the {{KafkaChannel}} gets closed - which will cause the client that made the request to be disconnected. This will happen however after the server has used considerable resources, especially for the SSL handshake which appears to be computationally expensive in Java. We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes. This makes a Kafka cluster to easily suffer from a Denial Of Service - also non intentional - attack. It can be non intentional, i.e. also caused by friendly clients, for example because a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising. An easy fix which we have implemented and will supply a PR for is to *delay* considerably closing the {{KafkaChannel}} in the {{Selector}}, but obviously without blocking the processing thread. This has been tested to be very effective in reducing the cpu usage spikes caused by non malicious clients using invalid SASL PLAIN credentials over SSL. was: The current handling of invalid credentials (ie wrong user/password) is to let the {{SaslException}} thrown from an implementation of {{javax.security.sasl.SaslServer.evaluateResponse()}} bubble up the call stack until it gets caught in {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} where the `KafkaChannel` gets closed - which will cause the client that made the request to be disconnected. This will happen however after the server has used considerable resources, especially for the SSL handshake which appears to be computationally expensive in Java. We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes. This makes a Kafka cluster to easily suffer from a Denial Of Service - also non intentional - attack. It can be non intentional, i.e. also caused by friendly clients, for example because a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising. An easy fix which we have implemented and will supply a PR for is to *delay* considerably closing the `KafkaChannel` in the `Selector`, but obviously without blocking the processing thread. This has be tested to be very effective in reducing the cpu usage spikes caused by non malicious ssl clients using invalid credentials. 
> Improve handling of invalid credentials to mitigate DOS issue (especially on > SSL listeners) > --- > > Key: KAFKA-4206 > URL: https://issues.apache.org/jira/browse/KAFKA-4206 > Project: Kafka > Issue Type: Improvement > Components: network, security >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current handling of invalid credentials (ie wrong user/password) is to > let the {{SaslException}} thrown from an implementation of > {{javax.security.sasl.SaslServer.evaluateResponse()}} > bubble up the call stack until it gets caught in > {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} > where the {{KafkaChannel}} gets closed - which will cause the client that > made the request to be disconnected. > This will happen however after the server has used considerable resources, > especially for the SSL handshake which appears to be computationally > expensive in Java. > We have observed that if just a few clients keep repeating requests with the > wrong credentials, it is quite easy to get all the network processing threads > in the Kafka server busy doing SSL handshakes. > This makes a Kafka cluster to easily suffer from a Denial Of Service - also > non intentional - attack. > It can be non intentional, i.e. also caused by friendly clients, for example > because a Kafka Java client Producer supplied with the wrong credentials will > not throw an exception on publishing, so it may keep attempting to connect > without the caller realising. > An easy fix which we have implemented and will supply a PR for is to *delay* > considerably closing the {{KafkaChannel}} in the {{Selector}}, but obviously > without blocking the processing thread. > This has been tested to be very effective in reducing the cpu usage spikes > caused by non malicious clients using invalid SASL PLAIN credentials over SSL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)
[ https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-4206: - Description: The current handling of invalid credentials (ie wrong user/password) is to let the {{SaslException}} thrown from an implementation of {{javax.security.sasl.SaslServer.evaluateResponse()}} bubble up the call stack until it gets caught in {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} where the `KafkaChannel` gets closed - which will cause the client that made the request to be disconnected. This will happen however after the server has used considerable resources, especially for the SSL handshake which appears to be computationally expensive in Java. We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes. This makes a Kafka cluster to easily suffer from a Denial Of Service - also non intentional - attack. It can be non intentional, i.e. also caused by friendly clients, for example because a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising. An easy fix which we have implemented and will supply a PR for is to *delay* considerably closing the `KafkaChannel` in the `Selector`, but obviously without blocking the processing thread. This has be tested to be very effective in reducing the cpu usage spikes caused by non malicious ssl clients using invalid credentials. was: The current handling of invalid credentials (ie wrong user/password) is to let the `SaslException` thrown from an implementation of `javax.security.sasl.SaslServer.evaluateResponse()` bubble up the call stack until it gets caught in `org.apache.kafka.common.network.Selector.pollSelectionKeys()` where the `KafkaChannel` gets closed - which will cause the client that made the request to be disconnected. This will happen however after the server has used considerable resources, especially for the SSL handshake which appears to be computationally expensive in Java. We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes. This makes a Kafka cluster to easily suffer from a Denial Of Service - also non intentional - attack. It can be non intentional, i.e. also caused by friendly clients, for example because a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising. An easy fix which we have implemented and will supply a PR for is to *delay* considerably closing the `KafkaChannel` in the `Selector`, but obviously without blocking the processing thread. This has be tested to be very effective in reducing the cpu usage spikes caused by non malicious ssl clients using invalid credentials. 
> Improve handling of invalid credentials to mitigate DOS issue (especially on > SSL listeners) > --- > > Key: KAFKA-4206 > URL: https://issues.apache.org/jira/browse/KAFKA-4206 > Project: Kafka > Issue Type: Improvement > Components: network, security >Affects Versions: 0.10.0.0, 0.10.0.1 >Reporter: Edoardo Comar >Assignee: Edoardo Comar > > The current handling of invalid credentials (ie wrong user/password) is to > let the {{SaslException}} thrown from an implementation of > {{javax.security.sasl.SaslServer.evaluateResponse()}} > bubble up the call stack until it gets caught in > {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}} > where the `KafkaChannel` gets closed - which will cause the client that made > the request to be disconnected. > This will happen however after the server has used considerable resources, > especially for the SSL handshake which appears to be computationally > expensive in Java. > We have observed that if just a few clients keep repeating requests with the > wrong credentials, it is quite easy to get all the network processing threads > in the Kafka server busy doing SSL handshakes. > This makes a Kafka cluster to easily suffer from a Denial Of Service - also > non intentional - attack. > It can be non intentional, i.e. also caused by friendly clients, for example > because a Kafka Java client Producer supplied with the wrong credentials will > not throw an exception on publishing, so it may keep attempting to connect > without the caller realising. > An easy fix which we have implemented and will supply a PR for is to *delay* > considerably closing the `KafkaChannel` in the `Selector`, but obviously > without blocking the processing thread. > This has been tested to be very effective in reducing the cpu usage spikes > caused by non malicious ssl clients using invalid credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)
Edoardo Comar created KAFKA-4206: Summary: Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners) Key: KAFKA-4206 URL: https://issues.apache.org/jira/browse/KAFKA-4206 Project: Kafka Issue Type: Improvement Components: network, security Affects Versions: 0.10.0.1, 0.10.0.0 Reporter: Edoardo Comar Assignee: Edoardo Comar The current handling of invalid credentials (ie wrong user/password) is to let the `SaslException` thrown from an implementation of `javax.security.sasl.SaslServer.evaluateResponse()` bubble up the call stack until it gets caught in `org.apache.kafka.common.network.Selector.pollSelectionKeys()` where the `KafkaChannel` gets closed - which will cause the client that made the request to be disconnected. This will happen however after the server has used considerable resources, especially for the SSL handshake which appears to be computationally expensive in Java. We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes. This makes a Kafka cluster to easily suffer from a Denial Of Service - also non intentional - attack. It can be non intentional, i.e. also caused by friendly clients, for example because a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising. An easy fix which we have implemented and will supply a PR for is to *delay* considerably closing the `KafkaChannel` in the `Selector`, but obviously without blocking the processing thread. This has be tested to be very effective in reducing the cpu usage spikes caused by non malicious ssl clients using invalid credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
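A conceptual sketch of the mitigation idea only (the real change lives in the broker's `Selector`, and the class and parameter names below are invented for illustration): on a failed authentication the channel close is scheduled after a delay instead of happening immediately, so repeated bad credentials cannot drive back-to-back SSL handshakes, and the network thread is never blocked.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedCloseSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long failedAuthenticationDelayMs;

    public DelayedCloseSketch(long failedAuthenticationDelayMs) {
        this.failedAuthenticationDelayMs = failedAuthenticationDelayMs;
    }

    public void closeAfterFailedAuthentication(AutoCloseable channel) {
        // The caller (e.g. the poll loop) returns immediately; the close happens later.
        scheduler.schedule(() -> {
            try {
                channel.close();
            } catch (Exception e) {
                // the channel is being discarded anyway
            }
        }, failedAuthenticationDelayMs, TimeUnit.MILLISECONDS);
    }
}
{code}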
[jira] [Commented] (KAFKA-3827) log.message.format.version should default to inter.broker.protocol.version
[ https://issues.apache.org/jira/browse/KAFKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370562#comment-15370562 ] Edoardo Comar commented on KAFKA-3827: -- Hi [~junrao] [~ijuma] [~ewencp] given that, because of this issue, the upgrade docs at https://kafka.apache.org/documentation.html#upgrade cannot be relied on (one MUST also set log.message.format.version=0.9 if they set inter.broker.protocol.version=0.9), two questions: 1) if one sets the two properties and does a rolling 0.9->0.10 upgrade, is the second step of the upgrade to remove both settings (i.e. have them both =0.10) in one go, or should the settings be removed one at a time, restarting each time? 2) what is the actual disadvantage in NOT setting inter.broker.protocol.version=0.9 (and log.message.format.version) and doing the 0.9->0.10 rolling upgrade anyway? I would guess that the brokers still running on 0.9 won't understand replication from brokers already on 0.10? But if a short outage is tolerated and all the brokers are quickly brought back to 0.10 this would be ok, wouldn't it? thanks! > log.message.format.version should default to inter.broker.protocol.version > -- > > Key: KAFKA-3827 > URL: https://issues.apache.org/jira/browse/KAFKA-3827 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.10.0.0 >Reporter: Jun Rao >Assignee: Manasvi Gupta > Labels: newbie > > Currently, if one sets inter.broker.protocol.version to 0.9.0 and restarts > the broker, one will get the following exception since > log.message.format.version defaults to 0.10.0. It will be more intuitive if > log.message.format.version defaults to the value of > inter.broker.protocol.version. > java.lang.IllegalArgumentException: requirement failed: > log.message.format.version 0.10.0-IV1 cannot be used when > inter.broker.protocol.version is set to 0.9.0.1 > at scala.Predef$.require(Predef.scala:233) > at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:1023) > at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:994) > at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:743) > at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:740) > at > kafka.server.KafkaServerStartable$.fromProps(KafkaServerStartable.scala:28) > at kafka.Kafka$.main(Kafka.scala:58) > at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
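For context, a hedged sketch of the per-broker server.properties changes this rolling-upgrade discussion revolves around (the exact version strings are placeholders and depend on the releases actually deployed; whether step 2 can change both values in one rolling restart or must change them one at a time is precisely what question 1 above asks):

# step 1: broker binaries upgraded to 0.10.x, both overrides pinned to the old version
inter.broker.protocol.version=0.9.0.1
log.message.format.version=0.9.0.1

# step 2: once every broker runs 0.10.x, raise (or remove) the overrides and roll again
inter.broker.protocol.version=0.10.0
log.message.format.version=0.10.0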
[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
[ https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348181#comment-15348181 ] Edoardo Comar commented on KAFKA-3899: -- As in the case of issue https://issues.apache.org/jira/browse/KAFKA-3727, the options include: a TimeoutException propagating the timeout passed to poll(timeout); another exception thrown out of poll(); or an exception passed to a listener of the consumer, like the onCompletion of the Callback available to a Producer > Consumer.poll() stuck in loop if wrong credentials are supplied > --- > > Key: KAFKA-3899 > URL: https://issues.apache.org/jira/browse/KAFKA-3899 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.0, 0.10.0.0 >Reporter: Edoardo Comar > > With the broker configured to use SASL PLAIN , > if the client is supplying wrong credentials, > a consumer calling poll() > is stuck forever and only inspection of DEBUG-level logging can tell what is > wrong. > [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 > disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) > at org.apache.kafka.common.network.Selector.poll(Selector.java:283) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
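Until one of the options above is implemented, a caller can at least bound the wait from the outside. A minimal sketch (broker address, topic name and the 30-second watchdog timeout are arbitrary placeholders; the SASL/SSL settings and credentials are assumed to be configured elsewhere, e.g. via the JAAS file in 0.10.0) that uses KafkaConsumer.wakeup() from a second thread so that a poll() stuck on a failing authentication eventually returns with a WakeupException:

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class BoundedPollExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093"); // assumed SASL listener
        props.put("group.id", "bounded-poll-example");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // security.protocol / SASL settings omitted; see the broker's listener configuration

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try {
            consumer.subscribe(Collections.singletonList("test-topic"));
            // If poll() has not returned within 30s (e.g. stuck on failed authentication),
            // wakeup() forces it to abort with a WakeupException.
            watchdog.schedule(consumer::wakeup, 30, TimeUnit.SECONDS);
            ConsumerRecords<String, String> records = consumer.poll(1000);
            System.out.println("polled " + records.count() + " records");
        } catch (WakeupException e) {
            System.err.println("poll() did not complete in time - check broker connectivity/credentials");
        } finally {
            watchdog.shutdownNow();
            consumer.close();
        }
    }
}

This is only a client-side guard; it does not tell the caller *why* poll() was stuck, which is the point of the options listed in the comment.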
[jira] [Updated] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
[ https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar updated KAFKA-3899: - Affects Version/s: 0.10.1.0 > Consumer.poll() stuck in loop if wrong credentials are supplied > --- > > Key: KAFKA-3899 > URL: https://issues.apache.org/jira/browse/KAFKA-3899 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.0, 0.10.0.0 >Reporter: Edoardo Comar > > With the broker configured to use SASL PLAIN , > if the client is supplying wrong credentials, > a consumer calling poll() > is stuck forever and only inspection of DEBUG-level logging can tell what is > wrong. > [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 > disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) > at org.apache.kafka.common.network.Selector.poll(Selector.java:283) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
[ https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348172#comment-15348172 ] Edoardo Comar commented on KAFKA-3899: -- The Producer is instead more protected against these issues. The completion callback will be called with metadata=null exception=org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 1000 ms. and correspondingly the TimeoutException will be thrown by the Future returned by send > Consumer.poll() stuck in loop if wrong credentials are supplied > --- > > Key: KAFKA-3899 > URL: https://issues.apache.org/jira/browse/KAFKA-3899 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Edoardo Comar > > With the broker configured to use SASL PLAIN , > if the client is supplying wrong credentials, > a consumer calling poll() > is stuck forever and only inspection of DEBUG-level logging can tell what is > wrong. > [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 > disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) > at org.apache.kafka.common.network.Selector.poll(Selector.java:283) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
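A small example of the behaviour described in the comment above: with wrong credentials the producer's send Callback is invoked with a null RecordMetadata and a TimeoutException ("Failed to update metadata after ... ms"), and the same TimeoutException surfaces as the cause of the ExecutionException when get() is called on the Future returned by send(). The broker address, topic name and max.block.ms value are placeholders; the SASL/SSL settings carrying the (wrong) credentials are assumed to be configured elsewhere and are omitted.

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;

public class ProducerAuthFailureExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093"); // assumed SASL listener
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("max.block.ms", "1000"); // bound the metadata wait
        // security.protocol / SASL settings with the *wrong* credentials omitted for brevity

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future = producer.send(
                    new ProducerRecord<>("test-topic", "key", "value"),
                    (metadata, exception) -> {
                        // With bad credentials: metadata == null, exception is a TimeoutException
                        if (exception != null)
                            System.err.println("send failed: " + exception);
                    });
            try {
                future.get();
            } catch (ExecutionException e) {
                // The same TimeoutException is the cause of the ExecutionException
                if (e.getCause() instanceof TimeoutException)
                    System.err.println("metadata update timed out: " + e.getCause().getMessage());
            }
        }
    }
}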
[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
[ https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348164#comment-15348164 ] Edoardo Comar commented on KAFKA-3899: -- A less important use case that shows the same behavior is: broker configured to use SASL, consumer is not; again an infinite loop occurs, with only debug-level tracing showing the problem. IMHO this is a less important use case, as configuration should be sorted out at test time, but the main use case of no feedback given to the client on wrong credentials is a serious issue. This is the stack trace printed in debug for a misconfigured client: java.io.EOFException at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323) at org.apache.kafka.common.network.Selector.poll(Selector.java:283) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > Consumer.poll() stuck in loop if wrong credentials are supplied > --- > > Key: KAFKA-3899 > URL: https://issues.apache.org/jira/browse/KAFKA-3899 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Edoardo Comar > > With the broker configured to use SASL PLAIN , > if the client is supplying wrong credentials, > a consumer calling poll() > is stuck forever and only inspection of DEBUG-level logging can tell what is > wrong. 
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 > disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) > at org.apache.kafka.common.network.Selector.poll(Selector.java:283) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied
Edoardo Comar created KAFKA-3899: Summary: Consumer.poll() stuck in loop if wrong credentials are supplied Key: KAFKA-3899 URL: https://issues.apache.org/jira/browse/KAFKA-3899 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.10.0.0 Reporter: Edoardo Comar With the broker configured to use SASL PLAIN , if the client is supplying wrong credentials, a consumer calling poll() is stuck forever and only inspection of DEBUG-level logging can tell what is wrong. [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 disconnected (org.apache.kafka.common.network.Selector) java.io.EOFException at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239) at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182) at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) at org.apache.kafka.common.network.Selector.poll(Selector.java:283) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) -- This message was sent by Atlassian JIRA (v6.3.4#6332)