[jira] [Created] (KAFKA-16931) A transient REST failure to forward fenceZombie request leaves Connect Task in FAILED state

2024-06-11 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-16931:
-

 Summary: A transient REST failure to forward fenceZombie request 
leaves Connect Task in FAILED state
 Key: KAFKA-16931
 URL: https://issues.apache.org/jira/browse/KAFKA-16931
 Project: Kafka
  Issue Type: Bug
  Components: connect
Reporter: Edoardo Comar


When Kafka Connect runs in exactly_once mode, a task restart will fence 
possible zombie tasks.

This is achieved by forwarding the request to the leader worker over the REST 
protocol.

At scale, in distributed mode, an HTTPS request may occasionally fail because 
of a networking glitch, a reconfiguration, etc.

Currently there is no attempt to retry the REST request: the task is left in a 
FAILED state and requires an external restart (via the REST API).

Would this issue require a small KIP to introduce configuration entries to 
limit the number of retries, backoff times, etc.?
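
For illustration only, a minimal sketch (not Connect's actual forwarding code; 
the retry and backoff knobs are hypothetical and would be the subject of such a 
KIP) of the kind of bounded retry that could wrap the forwarded fence-zombies 
REST call:

{code:java}
import java.util.concurrent.Callable;

/**
 * Sketch of a bounded retry around a forwarded REST call.
 * The maxRetries / backoffMs values would come from hypothetical worker configs.
 */
public final class ForwardRequestRetrier {

    public static <T> T callWithRetries(Callable<T> restCall,
                                        int maxRetries,
                                        long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return restCall.call();
            } catch (Exception e) {                       // e.g. a transient networking failure
                last = e;
                Thread.sleep(backoffMs * (attempt + 1));  // simple linear backoff between attempts
            }
        }
        throw last;                                       // only fail the task after exhausting retries
    }
}
{code}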

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2024-06-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-14657.
---
Resolution: Duplicate

duplicate of https://issues.apache.org/jira/browse/KAFKA-16570

> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException and the producer is not re-usable, 
> cannot abort/commit as it is fenced.
> An InvalidProducerEpochException is also logged as error on the broker
> [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition topic-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of 
> producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last 
> seen epoch
>  
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> see attached snippets
> As the caller of Admin.fenceProducers() is likely unaware of the producers 
> state, the call should succeed regardless



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16488) fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication

2024-06-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-16488.
---
Resolution: Fixed

> fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication
> 
>
> Key: KAFKA-16488
> URL: https://issues.apache.org/jira/browse/KAFKA-16488
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Priority: Major
>
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
> reset connector offsets. Error response: {"error_code":500,"message":"Failed 
> to perform zombie fencing for source connector prior to modifying offsets"}
>   at 
> app//org.apache.kafka.connect.util.clusters.EmbeddedConnect.resetConnectorOffsets(EmbeddedConnect.java:646)
>   at 
> app//org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.resetConnectorOffsets(EmbeddedConnectCluster.java:48)
>   at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.resetAllMirrorMakerConnectorOffsets(MirrorConnectorsIntegrationBaseTest.java:1063)
>   at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest.testReplication(MirrorConnectorsIntegrationExactlyOnceTest.java:90)
>   at 
> java.base@21.0.2/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>   at java.base@21.0.2/java.lang.reflect.Method.invoke(Method.java:580)
>   at 
> app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
>   at 
> app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
>   at 
> app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
>   at 
> app//org.junit.platform.engine.support.hi

[jira] [Reopened] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-23 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reopened KAFKA-15905:
---

Reopening - backporting to 3.7.1 to be confirmed.

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.
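
For illustration, a minimal sketch (simplified names only, not MirrorMaker's 
actual classes) of the monotonicity rule described above: a checkpoint is only 
emitted if it does not move the downstream offset backwards for its key, 
whether the previous checkpoint was produced in this task's lifetime or read 
back from the checkpoints topic after a restart.

{code:java}
import java.util.HashMap;
import java.util.Map;

final class MonotonicCheckpoints {
    private final Map<String, Long> lastDownstreamOffset = new HashMap<>();

    /** Returns true if a checkpoint with this downstream offset may be emitted for the key. */
    synchronized boolean shouldEmit(String groupTopicPartition, long downstreamOffset) {
        Long previous = lastDownstreamOffset.get(groupTopicPartition);
        if (previous != null && downstreamOffset < previous) {
            return false;                               // would go backwards: drop it
        }
        lastDownstreamOffset.put(groupTopicPartition, downstreamOffset);
        return true;
    }
}
{code}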



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16622) MirrorMaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-25 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-16622:
-

 Summary: MirrorMaker2 first Checkpoint not emitted until consumer 
group fully catches up once
 Key: KAFKA-16622
 URL: https://issues.apache.org/jira/browse/KAFKA-16622
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 3.6.2, 3.7.0, 3.8.0
Reporter: Edoardo Comar
 Attachments: edo-connect-mirror-maker-sourcetarget.properties

We observed an excessively delayed emission of the MM2 Checkpoint record.
It only gets created when the source consumer reaches the end of a topic. This 
does not seem reasonable.

In a very simple setup:

Tested with a standalone single-process MirrorMaker2 mirroring between two 
single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10)

create a single topic in the source cluster
produce data to it (e.g. 1 records)
start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec between 
polls, committing after each poll (a sketch follows below)

watch the Checkpoint topic in the target cluster

bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
  --topic source.checkpoints.internal \
  --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
  --from-beginning

-> no record appears in the checkpoint topic until the consumer reaches the end 
of the topic (i.e. its consumer group lag gets down to 0).
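
For reference, a minimal sketch of such a slow consumer (bootstrap address, 
topic and group name are assumptions):

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SlowConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // source cluster
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "edogroup");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);                // 50 records per poll
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("source-topic"));
            while (true) {
                consumer.poll(Duration.ofSeconds(1));
                consumer.commitSync();                                        // commit after each poll
                Thread.sleep(1000);                                           // pause 1 sec between polls
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}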







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-16369:
-

 Summary: Broker may not shut down when SocketServer fails to bind 
as Address already in use
 Key: KAFKA-16369
 URL: https://issues.apache.org/jira/browse/KAFKA-16369
 Project: Kafka
  Issue Type: Bug
Reporter: Edoardo Comar


When in ZooKeeper mode, if a port the broker should listen on is already bound,

the KafkaException: Socket server failed to bind to localhost:9092: Address 
already in use.

is thrown but the broker continues its startup.

It correctly shuts down when in KRaft mode.

Easy to reproduce in ZooKeeper mode with server.config set to listen on 
localhost only:
listeners=PLAINTEXT://localhost:9092
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-15144.
---
Resolution: Not A Bug

Closing as not a bug.

The "problem" arose because, without any config changes, updating MM2 from 
3.3.2 to a later release considerably changed the observable content of the 
Checkpoint topic.

In 3.3.2, even without new records in the OffsetSync topic, the Checkpoint 
records were advancing often (and even contained many duplicates).

Now gaps of up to offset.lag.max must be expected, and more reprocessing of 
records downstream may occur.

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> *downstreamOffset=1*, metadata=}
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair with the 
> source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-15144:
-

 Summary: Checkpoint downstreamOffset stuck to 1
 Key: KAFKA-15144
 URL: https://issues.apache.org/jira/browse/KAFKA-15144
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Reporter: Edoardo Comar
 Attachments: edo-connect-mirror-maker-sourcetarget.properties

Steps to reproduce :

Start source cluster

Start target cluster

start connect-mirror-maker.sh using a config like the attached

 

create topic in source cluster

produce a few messages

consume them all with autocommit enabled

 

then dumping the Checkpoint topic content e.g.

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}


the downstreamOffset remains at 1, while, in a fresh cluster pair with the 
source topic created while MM2 is running, I'd expect the downstreamOffset to 
match the upstreamOffset.

 

dumping the offset sync topic

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter

shows matching initial offsets

OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15133) RequestMetrics MessageConversionsTimeMs count is ticked even when no conversion occurs

2023-06-28 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-15133:
-

 Summary: RequestMetrics MessageConversionsTimeMs count is ticked 
even when no conversion occurs
 Key: KAFKA-15133
 URL: https://issues.apache.org/jira/browse/KAFKA-15133
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.4.1, 3.5.0
Reporter: Edoardo Comar
Assignee: Edoardo Comar


The Histogram
{{RequestChannel.messageConversionsTimeHist}}
is ticked even when a Produce/Fetch request incurred no conversion,
because a new entry is added to the histogram distribution with a 0ms value.

It's confusing to compare the Histogram
kafka.network RequestMetrics MessageConversionsTimeMs
with the Meter
kafka.server BrokerTopicMetrics ProduceMessageConversionsPerSec
because for the latter, the metric is ticked only if a conversion actually 
occurred.
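
A self-contained illustration (not the broker's actual metrics code) of the 
consistency being asked for: the time histogram should only receive a sample 
when a conversion really happened, mirroring the per-second meter.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

class ConversionMetricsSketch {
    private final LongAdder conversionsCount = new LongAdder();     // stands in for the ...PerSec meter
    private final List<Long> conversionTimesMs = new ArrayList<>(); // stands in for the time histogram

    void onRequestCompleted(boolean conversionOccurred, long conversionTimeMs) {
        if (conversionOccurred) {
            conversionsCount.increment();            // meter: ticked only on real conversions
            conversionTimesMs.add(conversionTimeMs); // histogram: should be guarded the same way
        }
        // the reported behaviour: the real histogram is updated unconditionally,
        // so every non-converting request adds a misleading 0 ms sample
    }
}
{code}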



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14996) CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-12 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14996:
-

 Summary: CreateTopic fails with UnknownServerException if num 
partitions >= QuorumController.MAX_RECORDS_PER_BATCH 
 Key: KAFKA-14996
 URL: https://issues.apache.org/jira/browse/KAFKA-14996
 Project: Kafka
  Issue Type: Bug
  Components: controller
Reporter: Edoardo Comar


If an attempt is made to create a topic with

num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)

the client receives an UnknownServerException - it should rather receive a 
more descriptive error.

The controller logs

{noformat}
[2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed with unknown server exception IllegalStateException at epoch 2 in 21956 us.  Renouncing leadership and reverting to the last committed offset 174. (org.apache.kafka.controller.QuorumController)
java.lang.IllegalStateException: Attempted to atomically commit 10001 records, but maxRecordsPerBatch is 1
    at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)
    at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
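
A minimal reproduction sketch (broker address and topic name are assumptions): 
creating a topic whose partition count exceeds the controller's batch limit 
surfaces the UnknownServerException on the client.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateManyPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 10001 partitions, above the controller's max records per batch
            NewTopic topic = new NewTopic("many-partitions", 10001, (short) 1);
            // .get() surfaces the UnknownServerException wrapped in an ExecutionException
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}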



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-27 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14657:
-

 Summary: Admin.fenceProducers fails when Producer has ongoing 
transaction - but Producer gets fenced
 Key: KAFKA-14657
 URL: https://issues.apache.org/jira/browse/KAFKA-14657
 Project: Kafka
  Issue Type: Bug
  Components: admin
Reporter: Edoardo Comar
 Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java

{{Admin.fenceProducers()}} fails with a ConcurrentTransactionsException if 
invoked when a Producer has an ongoing transaction.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable: it cannot 
abort/commit as it is fenced.

Conversely, if {{Admin.fenceProducers()}} is invoked while there is no open 
transaction, the call succeeds and further attempts by that producer to 
produce fail with ProducerFenced.

See the attached snippets.

As the caller of {{Admin.fenceProducers()}} is likely unaware of the 
producer's state, the call should succeed regardless.
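
The attached snippets are not included in this archive; below is a minimal 
sketch of the "fence during an open transaction" case (broker address, topic 
and transactional.id are assumptions).

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FenceDuringTx {
    public static void main(String[] args) throws Exception {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-tx-id");
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        Properties ap = new Properties();
        ap.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
             Admin admin = Admin.create(ap)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic", "k", "v")).get();

            // fencing while the transaction is still open fails with
            // ConcurrentTransactionsException, yet the producer ends up fenced anyway
            admin.fenceProducers(Collections.singleton("my-tx-id")).all().get();

            producer.commitTransaction(); // never reached in the failing case
        }
    }
}
{code}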



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-7666) KIP-391: Allow Producing with Offsets for Cluster Replication

2023-01-27 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-7666.
--
Resolution: Won't Fix

KIP has been retired

> KIP-391: Allow Producing with Offsets for Cluster Replication
> -
>
> Key: KAFKA-7666
> URL: https://issues.apache.org/jira/browse/KAFKA-7666
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
>
> Implementing KIP-391
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information for aliveNodes

2023-01-04 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14571:
-

 Summary: ZkMetadataCache.getClusterMetadata is missing rack 
information for aliveNodes
 Key: KAFKA-14571
 URL: https://issues.apache.org/jira/browse/KAFKA-14571
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.3
Reporter: Edoardo Comar
Assignee: Edoardo Comar


ZkMetadataCache.getClusterMetadata returns a Cluster object where the 
aliveNodes are missing their rack info.

When ZkMetadataCache updates the metadataSnapshot, it includes the rack in 
`aliveBrokers` but not in `aliveNodes`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-10220) NPE when describing resources

2020-06-30 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-10220:
-

 Summary: NPE when describing resources
 Key: KAFKA-10220
 URL: https://issues.apache.org/jira/browse/KAFKA-10220
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Edoardo Comar


In the current trunk code, describing a topic can fail with an NPE in the 
broker, on the line

{{resource.configurationKeys.asScala.forall(_.contains(configName))}}

(configurationKeys is null?)

{noformat}
[2020-06-30 11:10:39,464] ERROR [Admin Manager on Broker 0]: Error processing describe configs request for resource DescribeConfigsResource(resourceType=2, resourceName='topic1', configurationKeys=null) (kafka.server.AdminManager)
java.lang.NullPointerException
    at kafka.server.AdminManager.$anonfun$describeConfigs$3(AdminManager.scala:395)
    at kafka.server.AdminManager.$anonfun$describeConfigs$3$adapted(AdminManager.scala:393)
    at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:248)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
    at scala.collection.IterableLike.foreach(IterableLike.scala:71)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:245)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:104)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:259)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at kafka.server.AdminManager.createResponseConfig$1(AdminManager.scala:393)
    at kafka.server.AdminManager.$anonfun$describeConfigs$1(AdminManager.scala:412)
    at scala.collection.immutable.List.map(List.scala:283)
    at kafka.server.AdminManager.describeConfigs(AdminManager.scala:386)
    at kafka.server.KafkaApis.handleDescribeConfigsRequest(KafkaApis.scala:2595)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:165)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
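
For context, a minimal sketch (broker address and topic name are assumptions) 
of a client-side describeConfigs call; in the failing case the broker received 
the request with configurationKeys=null, as the log above shows, and the 
unguarded filter threw.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeTopicConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic1");
            // all().get() surfaces the broker-side error as an ExecutionException
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            config.entries().forEach(e -> System.out.println(e.name() + " = " + e.value()));
        }
    }
}
{code}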



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8564) NullPointerException when loading logs at startup

2019-06-20 Thread Edoardo Comar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-8564.
--
   Resolution: Fixed
Fix Version/s: 2.2.2
   2.1.2
   2.3.0

> NullPointerException when loading logs at startup
> -
>
> Key: KAFKA-8564
> URL: https://issues.apache.org/jira/browse/KAFKA-8564
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Mickael Maison
>Assignee: Edoardo Comar
>Priority: Blocker
> Fix For: 2.3.0, 2.1.2, 2.2.2
>
>
> If brokers restart when topics are being deleted, it's possible to end up 
> with a partition folder with the deleted suffix but without any log segments:
> {quote}ls -la 
> ./kafka-logs/3part3rep5-1.f2ce83b86df9416abe50d2e2299009c2-delete/
> total 8
> drwxr-xr-x@  4 mickael  staff   128  6 Jun 14:35 .
> drwxr-xr-x@ 61 mickael  staff  1952  6 Jun 14:35 ..
> -rw-r--r--@  1 mickael  staff    10  6 Jun 14:32 23261863.snapshot
> -rw-r--r--@  1 mickael  staff 0  6 Jun 14:35 leader-epoch-checkpoint
> {quote}
> From 2.2.1, brokers fail to start when loading such folders:
> {quote}[2019-06-19 09:40:48,123] ERROR There was an error in one of the 
> threads during logs loading: java.lang.NullPointerException 
> (kafka.log.LogManager)
>  [2019-06-19 09:40:48,126] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
>  java.lang.NullPointerException
>  at kafka.log.Log.activeSegment(Log.scala:1896)
>  at kafka.log.Log.<init>(Log.scala:295)
>  at kafka.log.Log$.apply(Log.scala:2186)
>  at kafka.log.LogManager.loadLog(LogManager.scala:275)
>  at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:345)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {quote}
> With 2.2.0, upon loading such folders, brokers create a new empty log segment 
> and load that successfully.
> The change of behaviour was introduced in 
> [https://github.com/apache/kafka/commit/f000dab5442ce49c4852823c257b4fb0cdfe15aa]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7861) AlterConfig may change the source of another config entry if it matches the broker default

2019-01-23 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-7861:


 Summary: AlterConfig may change the source of another config entry 
if it matches the broker default
 Key: KAFKA-7861
 URL: https://issues.apache.org/jira/browse/KAFKA-7861
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.0
Reporter: Edoardo Comar
 Attachments: AlterConfigJira.java

* Create a topic with an explicit config entry that matches the broker's 
default (e.g. "cleanup.policy"="delete");
* Describe config - the entry will be of type DYNAMIC_TOPIC_CONFIG.
* Alter some other config for the topic (e.g. `segment.bytes`)
* Describe config again - the previously DYNAMIC_TOPIC_CONFIG entry 
("cleanup.policy") will now be described as DEFAULT_CONFIG (see the sketch 
below).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7761) CLONE - Add broker configuration to set minimum value for segment.bytes and segment.ms

2018-12-24 Thread Edoardo Comar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-7761.
--
Resolution: Duplicate

if there is a valid reason for a duplicate, feel free to reopen

> CLONE - Add broker configuration to set minimum value for segment.bytes and 
> segment.ms
> --
>
> Key: KAFKA-7761
> URL: https://issues.apache.org/jira/browse/KAFKA-7761
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chinmay Patil
>Priority: Major
>  Labels: kip, newbie
>
> If someone set segment.bytes or segment.ms at topic level to a very small 
> value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a 
> very high number of segment files. This can bring down the whole broker due 
> to hitting the maximum number of open files (for logs) or the maximum number 
> of mmap-ed files (for indexes).
> To prevent that from happening, I would like to suggest adding two new items 
> to the broker configuration:
>  * min.topic.segment.bytes, defaults to 1048576: The minimum value for 
> segment.bytes. When someone sets topic configuration segment.bytes to a value 
> lower than this, Kafka throws an error INVALID VALUE.
>  * min.topic.segment.ms, defaults to 360: The minimum value for 
> segment.ms. When someone sets topic configuration segment.ms to a value lower 
> than this, Kafka throws an error INVALID VALUE.
> Thanks
> Badai



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7720) kafka-configs script should also describe default broker entries

2018-12-11 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-7720:


 Summary: kafka-configs script should also describe default broker 
entries
 Key: KAFKA-7720
 URL: https://issues.apache.org/jira/browse/KAFKA-7720
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Edoardo Comar


Running the configs tool to describe the broker configs only appears to print 
dynamically added entries.
Running
{{bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe 
--entity-default --entity-type brokers}}
on a broker without any previously added configs
({{--alter --add-config 'key=value'}})
will just print an empty list.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7666) KIP-391: Allow Producing with Offsets for Cluster Replication

2018-11-21 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-7666:


 Summary: KIP-391: Allow Producing with Offsets for Cluster 
Replication
 Key: KAFKA-7666
 URL: https://issues.apache.org/jira/browse/KAFKA-7666
 Project: Kafka
  Issue Type: New Feature
Reporter: Edoardo Comar


Implementing KIP-391




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6994) KafkaConsumer.poll throwing AuthorizationException timeout-dependent

2018-06-05 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-6994:


 Summary: KafkaConsumer.poll throwing AuthorizationException 
timeout-dependent
 Key: KAFKA-6994
 URL: https://issues.apache.org/jira/browse/KAFKA-6994
 Project: Kafka
  Issue Type: Bug
Reporter: Edoardo Comar


With auto-topic creation enabled, when attempting to consume from a 
non-existent topic, the {{AuthorizationException}} may or may not be thrown 
from {{poll(timeout)}} depending on the {{timeout}} value.

The issue can be recreated by modifying a test in {{AuthorizerIntegrationTest}} 
as below (see comment) to *not* add the needed ACL, therefore expecting the 
test to fail.
While the first {{poll}} call will always throw with a short timeout, the 
second {{poll}} will not throw with a short timeout.

{code:java}
  @Test
  def testCreatePermissionOnClusterToReadFromNonExistentTopic() {
    testCreatePermissionNeededToReadFromNonExistentTopic("newTopic",
      Set(new Acl(userPrincipal, Allow, Acl.WildCardHost, Create)),
      Cluster)
  }

  private def testCreatePermissionNeededToReadFromNonExistentTopic(newTopic: String, acls: Set[Acl], resType: ResourceType) {
    val topicPartition = new TopicPartition(newTopic, 0)
    val newTopicResource = new Resource(Topic, newTopic)
    addAndVerifyAcls(Set(new Acl(userPrincipal, Allow, Acl.WildCardHost, Read)), newTopicResource)
    addAndVerifyAcls(groupReadAcl(groupResource), groupResource)
    this.consumers.head.assign(List(topicPartition).asJava)
    try {
      this.consumers.head.poll(Duration.ofMillis(50L));
      Assert.fail("should have thrown Authorization Exception")
    } catch {
      case e: TopicAuthorizationException =>
        assertEquals(Collections.singleton(newTopic), e.unauthorizedTopics())
    }

    // val resource = if (resType == Topic) newTopicResource else Resource.ClusterResource
    // addAndVerifyAcls(acls, resource)

    // need to use a larger timeout in this subsequent poll else it may not cause topic auto-creation
    // this can be verified by commenting the above addAndVerifyAcls line and expecting this test to fail
    this.consumers.head.poll(Duration.ofMillis(50L));
  }
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6863) Kafka clients should try to use multiple DNS resolved IP addresses if the first one fails

2018-05-04 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-6863:


 Summary: Kafka clients should try to use multiple DNS resolved IP 
addresses if the first one fails
 Key: KAFKA-6863
 URL: https://issues.apache.org/jira/browse/KAFKA-6863
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 1.1.0, 1.0.0
Reporter: Edoardo Comar
Assignee: Edoardo Comar
 Fix For: 2.0.0


Currently Kafka clients resolve a symbolic hostname using
  {{new InetSocketAddress(String hostname, int port)}}

which only picks one IP address even if the DNS has multiple records for the 
hostname, as it calls
 {{InetAddress.getAllByName(host)[0]}}

For some environments where hostnames are mapped by the DNS to multiple IPs, 
e.g. in clouds where the IPs point to external load balancers, it would be 
preferable that the client, on failing to connect to one of the IPs, tried the 
other ones before giving up on the connection.
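
A sketch of the idea (not the client's actual connection code): resolve every 
address for the hostname and fall back to the next one when a connect attempt 
fails, instead of only ever using getAllByName(host)[0].

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MultiIpConnect {
    static Socket connectToAny(String host, int port, int timeoutMs) throws Exception {
        Exception last = null;
        for (InetAddress address : InetAddress.getAllByName(host)) {
            try {
                Socket socket = new Socket();
                socket.connect(new InetSocketAddress(address, port), timeoutMs);
                return socket;                       // first address that accepts the connection
            } catch (Exception e) {
                last = e;                            // remember the failure, try the next resolved IP
            }
        }
        throw last;                                  // all resolved IPs failed
    }
}
{code}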

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5497) KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy

2018-04-26 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-5497.
--
Resolution: Won't Fix

superseded by 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces

> KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy
> -
>
> Key: KAFKA-5497
> URL: https://issues.apache.org/jira/browse/KAFKA-5497
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
>
> JIRA  for the implementation of KIP-170
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-170%3A+Enhanced+TopicCreatePolicy+and+introduction+of+TopicDeletePolicy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6726) KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-6726:


 Summary: KIP-277 - Fine Grained ACL for CreateTopics API
 Key: KAFKA-6726
 URL: https://issues.apache.org/jira/browse/KAFKA-6726
 Project: Kafka
  Issue Type: Improvement
  Components: core, tools
Reporter: Edoardo Comar
Assignee: Edoardo Comar


issue to track implementation of 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6516) KafkaProducer retries indefinitely to authenticate on SaslAuthenticationException

2018-02-01 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-6516.
--
Resolution: Won't Fix

Thanks [~rsivaram], so my expectation was invalid then.

> KafkaProducer retries indefinitely to authenticate on 
> SaslAuthenticationException
> -
>
> Key: KAFKA-6516
> URL: https://issues.apache.org/jira/browse/KAFKA-6516
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.0.0
>Reporter: Edoardo Comar
>Priority: Major
>
> Even after https://issues.apache.org/jira/browse/KAFKA-5854
> the producer's (background) polling thread keeps retrying to authenticate
> if the future returned by KafkaProducer.send is not resolved.
> The current test
> {{org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testProducerWithInvalidCredentials()}}
> passes because it relies on the future being resolved.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6516) KafkaProducer retries indefinitely to authenticate on SaslAuthenticationException

2018-02-01 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-6516:


 Summary: KafkaProducer retries indefinitely to authenticate on 
SaslAuthenticationException
 Key: KAFKA-6516
 URL: https://issues.apache.org/jira/browse/KAFKA-6516
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 1.0.0
Reporter: Edoardo Comar


Even after https://issues.apache.org/jira/browse/KAFKA-5854

the producer's (background) polling thread keeps retrying to authenticate
if the future returned by KafkaProducer.send is not resolved.

The current test

{{org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testProducerWithInvalidCredentials()}}

passes because it relies on the future being resolved.
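
A sketch of the scenario (broker address, SASL settings and topic are 
assumptions): with invalid credentials, the authentication failure only reaches 
the caller when the Future returned by send() is resolved.

{code:java}
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class UnresolvedSendFuture {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // ... SASL settings with deliberately wrong credentials ...

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future = producer.send(new ProducerRecord<>("topic", "value"));
            // without future.get(), no authentication exception reaches the caller
            // and the producer's background thread keeps re-attempting authentication
            // future.get();   // resolving the future is what surfaces the failure
        }
    }
}
{code}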

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)

2017-09-29 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-4206.
--
Resolution: Won't Do

> Improve handling of invalid credentials to mitigate DOS issue (especially on 
> SSL listeners)
> ---
>
> Key: KAFKA-4206
> URL: https://issues.apache.org/jira/browse/KAFKA-4206
> Project: Kafka
>  Issue Type: Improvement
>  Components: network, security
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current handling of invalid credentials (ie wrong user/password) is to 
> let the {{SaslException}} thrown from an implementation of 
> {{javax.security.sasl.SaslServer.evaluateResponse()}}
> bubble up the call stack until it gets caught in 
> {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
> where the {{KafkaChannel}} gets closed - which will cause the client that 
> made the request to be disconnected.
> This will happen however after the server has used considerable resources, 
> especially for the SSL handshake which appears to be computationally 
> expensive in Java.
> We have observed that if just a few clients keep repeating requests with the 
> wrong credentials, it is quite easy to get all the network processing threads 
> in the Kafka server busy doing SSL handshakes.
> This makes it quite easy for a Kafka cluster to suffer a Denial Of Service 
> attack - possibly a non-intentional one.
> It can be non intentional, i.e. also caused by friendly clients, for example 
> because a Kafka Java client Producer supplied with the wrong credentials will 
> not throw an exception on publishing, so it may keep attempting to connect 
> without the caller realising.
> An easy fix which we have implemented and will supply a PR for is to *delay* 
> considerably closing the {{KafkaChannel}} in the {{Selector}}, but obviously 
> without blocking the processing thread.
> This has been tested to be very effective in reducing the cpu usage spikes 
> caused by non malicious clients using invalid SASL PLAIN credentials over SSL.
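
For illustration only (not the actual Selector/KafkaChannel code), a sketch of 
the delayed-close idea described above: when authentication fails, schedule the 
channel close after a delay instead of closing immediately, without blocking 
the processing thread.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DelayedAuthFailureClose {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void onAuthenticationFailure(AutoCloseable channel, long delayMs) {
        scheduler.schedule(() -> {
            try {
                channel.close();          // the client only sees the disconnect after the delay
            } catch (Exception ignored) {
                // best-effort close
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }
}
{code}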



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5497) KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy

2017-06-22 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5497:


 Summary: KIP-170: Enhanced CreateTopicPolicy and DeleteTopicPolicy
 Key: KAFKA-5497
 URL: https://issues.apache.org/jira/browse/KAFKA-5497
 Project: Kafka
  Issue Type: Improvement
Reporter: Edoardo Comar
Assignee: Edoardo Comar


JIRA  for the implementation of KIP-170

https://cwiki.apache.org/confluence/display/KAFKA/KIP-170%3A+Enhanced+TopicCreatePolicy+and+introduction+of+TopicDeletePolicy



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion

2017-06-12 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-5418:


Assignee: Edoardo Comar

> ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
> -
>
> Key: KAFKA-5418
> URL: https://issues.apache.org/jira/browse/KAFKA-5418
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1, 0.10.2.1
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck 
> in the 'marked for deletion' state 
> so it was a child of {{/brokers/topics}}
> but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions}}
> did not exist, throws a ZkNoNodeException while iterating:
> {noformat}
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /brokers/topics/xyzblahfoo/partitions
>   at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
>   at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
> ...
>   at java.lang.Thread.run(Thread.java:809)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>   at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
>   ... 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion

2017-06-09 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5418:
-
Description: 
Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in 
the 'marked for deletion' state 
so it was a child of {{/brokers/topics}}
but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions}}
did not exist, throws a ZkNoNodeException while iterating:
{noformat}
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/xyzblahfoo/partitions
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
...
at java.lang.Thread.run(Thread.java:809)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
... 
{noformat}

  was:
Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in 
the 'marked for deletion' state 
so it was a child of {{/brokers/topics}}
but it had no children, i.e. the path {{ /brokers/topics/thistopic/partitions }}
did not exist, throws a ZkNoNodeException while iterating:
{noformat}
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/xyzblahfoo/partitions
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
...
at java.lang.Thread.run(Thread.java:809)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection

[jira] [Created] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion

2017-06-09 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5418:


 Summary: ZkUtils.getAllPartitions() may fail if a topic is marked 
for deletion
 Key: KAFKA-5418
 URL: https://issues.apache.org/jira/browse/KAFKA-5418
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.1, 0.9.0.1
Reporter: Edoardo Comar


Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in 
the 'marked for deletion' state 
so it was a child of {{/brokers/topics}}
but it had no children, i.e. the path
{{/brokers/topics/thistopic/partitions }}
did not exist

throws a ZkNoNodeException while iterating:
{noformat}
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/xyzblahfoo/partitions
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
...
at java.lang.Thread.run(Thread.java:809)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
... 
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion

2017-06-09 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5418:
-
Description: 
Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in 
the 'marked for deletion' state 
so it was a child of {{/brokers/topics}}
but it had no children, i.e. the path {{ /brokers/topics/thistopic/partitions }}
did not exist, throws a ZkNoNodeException while iterating:
{noformat}
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/xyzblahfoo/partitions
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
...
at java.lang.Thread.run(Thread.java:809)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
... 
{noformat}

  was:
Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck in 
the 'marked for deletion' state 
so it was a child of {{/brokers/topics}}
but it had no children, i.e. the path
{{/brokers/topics/thistopic/partitions }}
did not exist

throws a ZkNoNodeException while iterating:
{noformat}
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/topics/xyzblahfoo/partitions
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
at 
kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
...
at java.lang.Thread.run(Thread.java:809)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
...
{noformat}
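
For illustration only, here is a minimal Java sketch of a defensive version of the 
iteration above, using the same org.I0Itec ZkClient API that appears in the stack 
trace (the real ZkUtils code is Scala; the connect string and timeouts here are 
hypothetical):
{code}
import java.util.ArrayList;
import java.util.List;

import org.I0Itec.zkclient.ZkClient;

public class SafeGetAllPartitions {
    public static void main(String[] args) {
        // hypothetical connect string and timeouts
        ZkClient zkClient = new ZkClient("localhost:2181", 10000, 10000);
        List<String> partitions = new ArrayList<>();
        for (String topic : zkClient.getChildren("/brokers/topics")) {
            String partitionsPath = "/brokers/topics/" + topic + "/partitions";
            // A topic stuck in 'marked for deletion' may have no partitions znode:
            // skip it instead of letting getChildren() throw ZkNoNodeException.
            if (!zkClient.exists(partitionsPath)) {
                continue;
            }
            for (String partition : zkClient.getChildren(partitionsPath)) {
                partitions.add(topic + "-" + partition);
            }
        }
        System.out.println(partitions);
        zkClient.close();
    }
}
{code}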

[jira] [Commented] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log

2017-05-30 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029248#comment-16029248
 ] 

Edoardo Comar commented on KAFKA-5290:
--

https://github.com/apache/kafka/pull/3035

> docs need clarification on meaning of 'committed' to the log
> 
>
> Key: KAFKA-5290
> URL: https://issues.apache.org/jira/browse/KAFKA-5290
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The docs around
> http://kafka.apache.org/documentation/#semantics
> http://kafka.apache.org/documentation/#replication
> say
> ??A message is considered "committed" when all in sync replicas for that 
> partition have applied it to their log. Only committed messages are ever 
> given out to the consumer??
> I've always found that in need of clarification - as the producer acks 
> setting is crucial in determining what committed means.
> Based on conversations with [~rsivaram], [~apurva], [~vahid]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log

2017-05-19 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5290:
-
Description: 
The docs around
http://kafka.apache.org/documentation/#semantics
http://kafka.apache.org/documentation/#replication
say

??A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer??

I've always found that in need of clarification - as the producer acks setting 
is crucial in determining what committed means.

Based on conversations with [~rsivaram], [~apurva], [~vahid]
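
To make the dependence on acks concrete, here is a minimal sketch (hypothetical 
bootstrap address and topic name) of a Java producer configured so that a send is 
only acknowledged once the record has been committed by all in-sync replicas; with 
acks=0 or acks=1 the producer can consider a send successful before that point:
{code}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CommittedAcksExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the leader only acknowledges once the full set of in-sync
        // replicas has the record, i.e. once it is "committed" in the sense
        // discussed above. Combined with min.insync.replicas on the topic,
        // this bounds how many replicas must have applied the record.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        }
    }
}
{code}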

  was:
The docs around
http://kafka.apache.org/documentation/#semantics
http://kafka.apache.org/documentation/#replication
say

??A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer??

I've always found that in need of clarification - as the producer acks setting 
is crucial in determining what committed means.

Based on conversations with [~rsivaram][~apurva][~vahid]


> docs need clarification on meaning of 'committed' to the log
> 
>
> Key: KAFKA-5290
> URL: https://issues.apache.org/jira/browse/KAFKA-5290
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The docs around
> http://kafka.apache.org/documentation/#semantics
> http://kafka.apache.org/documentation/#replication
> say
> ??A message is considered "committed" when all in sync replicas for that 
> partition have applied it to their log. Only committed messages are ever 
> given out to the consumer??
> I've always found that in need of clarification - as the producer acks 
> setting is crucial in determining what committed means.
> Based on conversations with [~rsivaram], [~apurva], [~vahid]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log

2017-05-19 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5290:
-
Description: 
The docs around
http://kafka.apache.org/documentation/#semantics
http://kafka.apache.org/documentation/#replication
say

??A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer??

I've always found that in need of clarification - as the producer acks setting 
is crucial in determining what committed means.

Based on conversations with [~rsivaram][~apurva][~vahid]

  was:
The docs around
http://kafka.apache.org/documentation/#semantics
http://kafka.apache.org/documentation/#replication
say
??A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer. ??

I've always found that in need of clarification - as the producer acks setting 
is crucial in determining what committed means.

Based on conversations with [~rsivaram][~apurva][~vahid]


> docs need clarification on meaning of 'committed' to the log
> 
>
> Key: KAFKA-5290
> URL: https://issues.apache.org/jira/browse/KAFKA-5290
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The docs around
> http://kafka.apache.org/documentation/#semantics
> http://kafka.apache.org/documentation/#replication
> say
> ??A message is considered "committed" when all in sync replicas for that 
> partition have applied it to their log. Only committed messages are ever 
> given out to the consumer??
> I've always found that in need of clarification - as the producer acks 
> setting is crucial in determining what committed means.
> Based on conversations with [~rsivaram][~apurva][~vahid]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5290) docs need clarification on meaning of 'committed' to the log

2017-05-19 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5290:


 Summary: docs need clarification on meaning of 'committed' to the 
log
 Key: KAFKA-5290
 URL: https://issues.apache.org/jira/browse/KAFKA-5290
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Reporter: Edoardo Comar
Assignee: Edoardo Comar


The docs around
http://kafka.apache.org/documentation/#semantics
http://kafka.apache.org/documentation/#replication
say
??A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer. ??

I've always found that in need of clarification - as the producer acks setting 
is crucial in determining what committed means.

Based on conversations with [~rsivaram][~apurva][~vahid]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5200) If a replicated topic is deleted with one broker down, it can't be recreated

2017-05-15 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5200:
-
Summary: If a replicated topic is deleted with one broker down, it can't be 
recreated  (was: Deleting topic when one broker is down will prevent topic to 
be re-creatable)

> If a replicated topic is deleted with one broker down, it can't be recreated
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-15 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010608#comment-16010608
 ] 

Edoardo Comar commented on KAFKA-5200:
--

I am actually hoping for someone familiar with the non-open source 
confluent-rebalancer tool to comment
and then try to implement an open source solution to this issue. 
It may possibly require a KIP (e.g adding options to 
kafka.admin.ReassignPartitionsCommand )

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-15 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010608#comment-16010608
 ] 

Edoardo Comar edited comment on KAFKA-5200 at 5/15/17 2:29 PM:
---

I am actually hoping for someone familiar with the non-open source 
confluent-rebalancer tool to comment.
Then we'd try and implement an open source solution to this issue. 
It may possibly require a KIP (e.g adding options to 
kafka.admin.ReassignPartitionsCommand )


was (Author: ecomar):
I am actually hoping for someone familiar with the non-open source 
confluent-rebalancer tool to comment
and then try to implement an open source solution to this issue. 
It may possibly require a KIP (e.g adding options to 
kafka.admin.ReassignPartitionsCommand )

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-15 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010601#comment-16010601
 ] 

Edoardo Comar commented on KAFKA-5200:
--

Thanks [~huxi_2b] unfortunately such steps would imply significant downtime 
which is not acceptable to us.

We actually tested a much less intrusive way to handle this occurrence, 
i.e. delete the zookeeper info about the topic while the cluster is still 
running (minus the dead broker of course)
and then force *only the controller broker* to restart.

Even if this is less intrusive, it still means that for a short-ish time two 
brokers are down.
With replication-factor 3 and min.insync.replicas=2 this implies an outage for some 
clients,
which remains unacceptable.
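
For reference, a rough Java sketch of the manual ZooKeeper cleanup described above, 
using the I0Itec ZkClient and the standard Kafka znode paths (topic name and connect 
string are hypothetical; this is a last-resort manual procedure, not a supported 
tool, and the controller broker still has to be bounced afterwards):
{code}
import org.I0Itec.zkclient.ZkClient;

public class ForceRemoveTopicZnodes {
    public static void main(String[] args) {
        String topic = "stuck-topic"; // hypothetical
        ZkClient zkClient = new ZkClient("localhost:2181", 10000, 10000);
        // Remove the pending-delete marker and the topic metadata so the topic
        // name can be reused; the controller broker must then be restarted so
        // that it drops its cached state for the topic.
        zkClient.deleteRecursive("/admin/delete_topics/" + topic);
        zkClient.deleteRecursive("/brokers/topics/" + topic);
        zkClient.deleteRecursive("/config/topics/" + topic);
        zkClient.close();
    }
}
{code}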



> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics (KIP-136)

2017-05-12 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008499#comment-16008499
 ] 

Edoardo Comar commented on KAFKA-4982:
--

Thanks [~ijuma] following the discussion in the PR thread, I've updated the 
implementation to only use the listener tag
and I've updated the KIP to match the implementation and describe the 
compatibility choice of not tagging the yammer metric.



> Add listener tag to socket-server-metrics.connection-... metrics (KIP-136)
> --
>
> Key: KAFKA-4982
> URL: https://issues.apache.org/jira/browse/KAFKA-4982
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
> Fix For: 0.11.0.0
>
>
> Metrics in socket-server-metrics like connection-count connection-close-rate 
> etc are tagged with networkProcessor:
> where the id of a network processor is just a numeric integer.
> If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id 
> just keeps incrementing and when looking at the metrics it is hard to match 
> the metric tag to a listener. 
> You need to know the number of network threads and the order in which the 
> listeners are declared in the brokers' server.properties.
> We should add a tag showing the listener label, that would also make it much 
> easier to group the metrics in a tool like grafana



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-12 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007893#comment-16007893
 ] 

Edoardo Comar commented on KAFKA-5200:
--

[~huxi_2b] if we could restart the broker that's down there would be no actual 
problem to be solved.

I'd like to find a way - tooling option is ok - to allow the deletion to 
progress. And handle the eventual restart of the missing broker.


> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics

2017-05-10 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004675#comment-16004675
 ] 

Edoardo Comar commented on KAFKA-4982:
--

The implementation deviated slightly from the KIP screenshots by adding the 
{{listener}} tag only if it differs from {{protocol}}.
We should have taken this decision while discussing the KIP. Opinions?



> Add listener tag to socket-server-metrics.connection-... metrics  
> --
>
> Key: KAFKA-4982
> URL: https://issues.apache.org/jira/browse/KAFKA-4982
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
> Fix For: 0.11.0.0
>
>
> Metrics in socket-server-metrics like connection-count connection-close-rate 
> etc are tagged with networkProcessor:
> where the id of a network processor is just a numeric integer.
> If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id 
> just keeps incrementing and when looking at the metrics it is hard to match 
> the metric tag to a listener. 
> You need to know the number of network threads and the order in which the 
> listeners are declared in the brokers' server.properties.
> We should add a tag showing the listener label, that would also make it much 
> easier to group the metrics in a tool like grafana



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys

2017-05-09 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002885#comment-16002885
 ] 

Edoardo Comar commented on KAFKA-5201:
--

Thanks [~mihbor] 
Sure, there is no concept of a legitimate vs an abusive client!
Quotas affect throughput rather than storage, so I can't see them being 
directly applicable.

Thinking again, though, a practical approach could be to base the delete cleanup 
policy on size only, not on time;
then, if the topic ACL limits it to trusted producers,
only an application error would cause the deletion.

Topics not protected by an ACL could not be trusted to have the latest value for a 
given key,
but that would be acceptable.

I'm closing this JIRA for now.
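
A sketch of the size-only approach described above, creating a topic whose delete 
policy is driven purely by retention.bytes (topic name, sizes and the use of the 
Java AdminClient - available in later client versions - are illustrative; on older 
brokers the same configs can be set with the topic tools instead):
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class SizeBoundedCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical

        Map<String, String> configs = new HashMap<>();
        // Compact, but also delete old segments once a partition grows beyond
        // a size bound; time-based deletion is disabled.
        configs.put("cleanup.policy", "compact,delete");
        configs.put("retention.bytes", String.valueOf(1024L * 1024L * 1024L)); // 1 GiB, illustrative
        configs.put("retention.ms", "-1"); // no time-based deletion

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("protected-compacted-topic", 3, (short) 3)
                    .configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}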

> Compacted topic could be misused to fill up a disk but deletion policy can't 
> retain legitimate keys 
> 
>
> Key: KAFKA-5201
> URL: https://issues.apache.org/jira/browse/KAFKA-5201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>
> Misuse of a topic with cleanup policy = compact
> could lead to a disk being filled if a misbehaving producer keeps producing 
> messages with unique keys.
> The mixed cleanup policy compact,delete could be adopted, but would not 
> guarantee that the latest "legitimate" keys will be kept.
> It would be desirable to have a cleanup policy that attempts to preserve 
> messages with 'legitimate' keys
> This issue needs a KIP but I have no proposed solution yet at the time of 
> writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys

2017-05-09 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-5201.
--
Resolution: Not A Problem

> Compacted topic could be misused to fill up a disk but deletion policy can't 
> retain legitimate keys 
> 
>
> Key: KAFKA-5201
> URL: https://issues.apache.org/jira/browse/KAFKA-5201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>
> Misuse of a topic with cleanup policy = compact
> could lead to a disk being filled if a misbehaving producer keeps producing 
> messages with unique keys.
> The mixed cleanup policy compact,delete could be adopted, but would not 
> guarantee that the latest "legitimate" keys will be kept.
> It would be desirable to have a cleanup policy that attempts to preserve 
> messages with 'legitimate' keys
> This issue needs a KIP but I have no proposed solution yet at the time of 
> writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5201) Compacted topic could be misused to fill up a disk but deletion policy can't retain legitimate keys

2017-05-08 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-5201:
-
Summary: Compacted topic could be misused to fill up a disk but deletion 
policy can't retain legitimate keys   (was: Compacted topic could be misused up 
to fill a disk; )

> Compacted topic could be misused to fill up a disk but deletion policy can't 
> retain legitimate keys 
> 
>
> Key: KAFKA-5201
> URL: https://issues.apache.org/jira/browse/KAFKA-5201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>
> Misuse of a topic with cleanup policy = compact
> could lead to a disk being filled if a misbehaving producer keeps producing 
> messages with unique keys.
> The mixed cleanup policy compact,delete could be adopted, but would not 
> guarantee that the latest "legitimate" keys will be kept.
> It would be desirable to have a cleanup policy that attempts to preserve 
> messages with 'legitimate' keys
> This issue needs a KIP but I have no proposed solution yet at the time of 
> writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5201) Compacted topic could be misused up to fill a disk;

2017-05-08 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5201:


 Summary: Compacted topic could be misused up to fill a disk; 
 Key: KAFKA-5201
 URL: https://issues.apache.org/jira/browse/KAFKA-5201
 Project: Kafka
  Issue Type: Improvement
Reporter: Edoardo Comar


Misuse of a topic with cleanup policy = compact
could lead to a disk being filled if a misbehaving producer keeps producing 
messages with unique keys.

The mixed cleanup policy compact,delete could be adopted, but would not 
guarantee that the latest "legitimate" keys will be kept.

It would be desirable to have a cleanup policy that attempts to preserve 
messages with 'legitimate' keys

This issue needs a KIP but I have no proposed solution yet at the time of 
writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-08 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5200:


 Summary: Deleting topic when one broker is down will prevent topic 
to be re-creatable
 Key: KAFKA-5200
 URL: https://issues.apache.org/jira/browse/KAFKA-5200
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Edoardo Comar


In a cluster with 5 brokers, replication factor=3, min.insync.replicas=2,
one broker went down.
A user's app remained of course unaware of that and deleted a topic that 
(unknowingly) had a replica on the dead broker.
The topic went into 'pending delete' mode.
The user then tried to recreate the topic - which failed, so his app was left 
stuck - no working topic and no ability to create one.

The reassignment tool fails to move the replica out of the dead broker - 
specifically because the broker with the partition replica to move is dead :-)

Incidentally the confluent-rebalancer docs say
http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> Supports moving partitions away from dead brokers

It'd be nice to similarly improve the open-source reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4795) Confusion around topic deletion

2017-04-20 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538
 ] 

Edoardo Comar edited comment on KAFKA-4795 at 4/20/17 9:15 PM:
---

[~vahid] I may still see topics ??marked for deletion?? when using Kafka 
0.10.2.x 

however deletion is asynchronous and usually, with topic deletion enabled on 
the broker, running the list command after a {{--delete --topic}} will not show 
anything (i.e topic is gone)


was (Author: ecomar):
[~vahid] I may still see topics ??marked for deletion?? when using Kafka 
0.10.2.x 

however deletion is asynchronous and usually, with topic deletion enabled on 
the broker, running the {{ --list }} command after a {{--delete --topic}} will 
not show anything (i.e topic is gone)

> Confusion around topic deletion
> ---
>
> Key: KAFKA-4795
> URL: https://issues.apache.org/jira/browse/KAFKA-4795
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The topic deletion works like this in 0.10.2.0:
> # {{bin/zookeeper-server-start.sh config/zookeeper.properties}}
> # {{bin/kafka-server-start.sh config/server.properties}} (uses default 
> {{server.properties}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test 
> --replication-factor 1 --partitions 1}} (creates the topic {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> Previously, the last command above returned {{test - marked for deletion}}, 
> which matched the output statement of the {{--delete}} topic command.
> Continuing with the above scenario,
> # stop the broker
> # add the broker config {{delete.topic.enable=true}} in the config file
> # {{bin/kafka-server-start.sh config/server.properties}} (this does not 
> remove the topic {{test}}, as if the topic was never marked for deletion).
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no 
> topics).
> It seems that the "marked for deletion" state for topics no longer exists.
> I opened this JIRA so I can get a confirmation on the expected topic deletion 
> behavior, because in any case, I think the user experience could be improved 
> (either there is a bug in the code, or the command's output statement is 
> misleading).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4795) Confusion around topic deletion

2017-04-20 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538
 ] 

Edoardo Comar edited comment on KAFKA-4795 at 4/20/17 9:13 PM:
---

[~vahid] I may still see topics ??marked for deletion?? when using Kafka 
0.10.2.x 

however deletion is asynchronous and usually, with topic deletion enabled on 
the broker, running the {{ --list }} command after a {{--delete --topic}} will 
not show anything (i.e topic is gone)


was (Author: ecomar):
[~vahid] I may still see topics ??marked for deletion?? when using Kafka 
0.10.2.x 

however deletion is asynchronous and usually - with topic deletion enabled on 
the broker - running the {{--list}} command after a {{--delete --topic}} will 
not show anything (i.e topic is gone)

> Confusion around topic deletion
> ---
>
> Key: KAFKA-4795
> URL: https://issues.apache.org/jira/browse/KAFKA-4795
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The topic deletion works like this in 0.10.2.0:
> # {{bin/zookeeper-server-start.sh config/zookeeper.properties}}
> # {{bin/kafka-server-start.sh config/server.properties}} (uses default 
> {{server.properties}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test 
> --replication-factor 1 --partitions 1}} (creates the topic {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> Previously, the last command above returned {{test - marked for deletion}}, 
> which matched the output statement of the {{--delete}} topic command.
> Continuing with the above scenario,
> # stop the broker
> # add the broker config {{delete.topic.enable=true}} in the config file
> # {{bin/kafka-server-start.sh config/server.properties}} (this does not 
> remove the topic {{test}}, as if the topic was never marked for deletion).
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no 
> topics).
> It seems that the "marked for deletion" state for topics no longer exists.
> I opened this JIRA so I can get a confirmation on the expected topic deletion 
> behavior, because in any case, I think the user experience could be improved 
> (either there is a bug in the code, or the command's output statement is 
> misleading).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4795) Confusion around topic deletion

2017-04-20 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977538#comment-15977538
 ] 

Edoardo Comar commented on KAFKA-4795:
--

[~vahid] I may still see topics ??marked for deletion?? when using Kafka 
0.10.2.x 

however deletion is asynchronous and usually - with topic deletion enabled on 
the broker - running the {{--list}} command after a {{--delete --topic}} will 
not show anything (i.e topic is gone)

> Confusion around topic deletion
> ---
>
> Key: KAFKA-4795
> URL: https://issues.apache.org/jira/browse/KAFKA-4795
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The topic deletion works like this in 0.10.2.0:
> # {{bin/zookeeper-server-start.sh config/zookeeper.properties}}
> # {{bin/kafka-server-start.sh config/server.properties}} (uses default 
> {{server.properties}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test 
> --replication-factor 1 --partitions 1}} (creates the topic {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns {{test}})
> Previously, the last command above returned {{test - marked for deletion}}, 
> which matched the output statement of the {{--delete}} topic command.
> Continuing with the above scenario,
> # stop the broker
> # add the broker config {{delete.topic.enable=true}} in the config file
> # {{bin/kafka-server-start.sh config/server.properties}} (this does not 
> remove the topic {{test}}, as if the topic was never marked for deletion).
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test}} 
> (reports {{Topic test is marked for deletion. Note: This will have no impact 
> if delete.topic.enable is not set to true.}})
> # {{bin/kafka-topics.sh --zookeeper localhost:2181 --list}} (returns no 
> topics).
> It seems that the "marked for deletion" state for topics no longer exists.
> I opened this JIRA so I can get a confirmation on the expected topic deletion 
> behavior, because in any case, I think the user experience could be improved 
> (either there is a bug in the code, or the command's output statement is 
> misleading).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5081) two versions of jackson-annotations-xxx.jar in distribution tgz

2017-04-18 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-5081:


 Summary: two versions of jackson-annotations-xxx.jar in 
distribution tgz
 Key: KAFKA-5081
 URL: https://issues.apache.org/jira/browse/KAFKA-5081
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Edoardo Comar
Priority: Minor


git clone https://github.com/apache/kafka.git
cd kafka
gradle

./gradlew releaseTarGz

then kafka/core/build/distributions/kafka-...-SNAPSHOT.tgz contains
in the libs directory two versions of this jar
jackson-annotations-2.8.0.jar
jackson-annotations-2.8.5.jar



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-04-13 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967735#comment-15967735
 ] 

Edoardo Comar commented on KAFKA-2729:
--

FWIW - we saw the same message 
{{  Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition) }}

when redeploying Kafka 0.10.0.1 in a cluster after we had run 0.10.2.0,
having wiped Kafka's storage but kept the ZooKeeper version bundled with Kafka 
0.10.2 and its storage.

For us eventually the cluster recovered.
HTH.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics

2017-04-10 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962765#comment-15962765
 ] 

Edoardo Comar commented on KAFKA-4982:
--

KIP is
https://cwiki.apache.org/confluence/display/KAFKA/KIP-136%3A+Add+Listener+name+and+Security+Protocol+name+to+SelectorMetrics+tags

> Add listener tag to socket-server-metrics.connection-... metrics  
> --
>
> Key: KAFKA-4982
> URL: https://issues.apache.org/jira/browse/KAFKA-4982
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> Metrics in socket-server-metrics like connection-count connection-close-rate 
> etc are tagged with networkProcessor:
> where the id of a network processor is just a numeric integer.
> If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id 
> just keeps incrementing and when looking at the metrics it is hard to match 
> the metric tag to a listener. 
> You need to know the number of network threads and the order in which the 
> listeners are declared in the brokers' server.properties.
> We should add a tag showing the listener label, that would also make it much 
> easier to group the metrics in a tool like grafana



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics

2017-03-30 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949229#comment-15949229
 ] 

Edoardo Comar commented on KAFKA-4981:
--

Hi [~ijuma] for the 3rd time today :-) ... does this need a kip too :-) :-) 

> Add connection-accept-rate and connection-prepare-rate  metrics
> ---
>
> Key: KAFKA-4981
> URL: https://issues.apache.org/jira/browse/KAFKA-4981
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current set of socket-server-metrics include 
> connection-close-rate and connection-creation-rate.
> On the server-side we'd find useful to have rates for 
> connections accepted and connections 'prepared' (when the channel is ready) 
> to see how many clients do not go through handshake or authentication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics

2017-03-30 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949223#comment-15949223
 ] 

Edoardo Comar commented on KAFKA-4981:
--

Thanks  I know the block is skipped for PLAINTEXT.

Actually we do not care that this rate is always 0 for PLAINTEXT - calling it 
authenticated would work for us.

From an ops perspective, we wanted to check how many connections go to waste, either 
because of failed TLS handshakes or failed authentications.

In fact this metric would be more useful if we added tags for the listener the 
network thread belongs to 

see https://issues.apache.org/jira/browse/KAFKA-4982



> Add connection-accept-rate and connection-prepare-rate  metrics
> ---
>
> Key: KAFKA-4981
> URL: https://issues.apache.org/jira/browse/KAFKA-4981
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current set of socket-server-metrics include 
> connection-close-rate and connection-creation-rate.
> On the server-side we'd find useful to have rates for 
> connections accepted and connections 'prepared' (when the channel is ready) 
> to see how many clients do not go through handshake or authentication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics

2017-03-30 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949221#comment-15949221
 ] 

Edoardo Comar commented on KAFKA-4981:
--

as per comments on the pull request, the 'prepared' metric may be better called 
'authenticated' and will have a 0 flat rate for PLAINTEXT listeners.

[~rsivaram] wrote:
Prepared doesn't convey much meaning in terms of an externally visible metric. 
I imagine you chose it rather than authenticated since you intended it to work 
for PLAINTEXT. But PLAINTEXT doesn't go through this if-block since 
channel.ready() returns true.


> Add connection-accept-rate and connection-prepare-rate  metrics
> ---
>
> Key: KAFKA-4981
> URL: https://issues.apache.org/jira/browse/KAFKA-4981
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current set of socket-server-metrics include 
> connection-close-rate and connection-creation-rate.
> On the server-side we'd find useful to have rates for 
> connections accepted and connections 'prepared' (when the channel is ready) 
> to see how many clients do not go through handshake or authentication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics

2017-03-30 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949036#comment-15949036
 ] 

Edoardo Comar commented on KAFKA-4982:
--

[~ijuma] does this need a KIP?

> Add listener tag to socket-server-metrics.connection-... metrics  
> --
>
> Key: KAFKA-4982
> URL: https://issues.apache.org/jira/browse/KAFKA-4982
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> Metrics in socket-server-metrics like connection-count connection-close-rate 
> etc are tagged with networkProcessor:
> where the id of a network processor is just a numeric integer.
> If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id 
> just keeps incrementing and when looking at the metrics it is hard to match 
> the metric tag to a listener. 
> You need to know the number of network threads and the order in which the 
> listeners are declared in the brokers' server.properties.
> We should add a tag showing the listener label, that would also make it much 
> easier to group the metrics in a tool like grafana



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics

2017-03-30 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-4982:


Assignee: Edoardo Comar

> Add listener tag to socket-server-metrics.connection-... metrics  
> --
>
> Key: KAFKA-4982
> URL: https://issues.apache.org/jira/browse/KAFKA-4982
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> Metrics in socket-server-metrics like connection-count connection-close-rate 
> etc are tagged with networkProcessor:
> where the id of a network processor is just a numeric integer.
> If you have more than one listener (eg PLAINTEXT, SASL_SSL, etc.), the id 
> just keeps incrementing and when looking at the metrics it is hard to match 
> the metric tag to a listener. 
> You need to know the number of network threads and the order in which the 
> listeners are declared in the brokers' server.properties.
> We should add a tag showing the listener label, that would also make it much 
> easier to group the metrics in a tool like grafana



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4982) Add listener tag to socket-server-metrics.connection-... metrics

2017-03-30 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4982:


 Summary: Add listener tag to socket-server-metrics.connection-... 
metrics  
 Key: KAFKA-4982
 URL: https://issues.apache.org/jira/browse/KAFKA-4982
 Project: Kafka
  Issue Type: Improvement
Reporter: Edoardo Comar


Metrics in socket-server-metrics like connection-count, connection-close-rate, 
etc. are tagged with networkProcessor:<n>,
where the id of a network processor is just a numeric integer.

If you have more than one listener (e.g. PLAINTEXT, SASL_SSL, etc.), the id just 
keeps incrementing and, when looking at the metrics, it is hard to match the 
metric tag to a listener. 

You need to know the number of network threads and the order in which the 
listeners are declared in the brokers' server.properties.

We should add a tag showing the listener label, which would also make it much 
easier to group the metrics in a tool like grafana
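
For illustration, a small Java sketch that lists these MBeans over JMX so the 
networkProcessor-only tagging is visible; it assumes the default JmxReporter naming 
(domain kafka.server, type socket-server-metrics) and that it runs inside, or is 
attached to, the broker JVM:
{code}
import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ListSocketServerMetrics {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed object-name pattern for the socket-server-metrics group.
        ObjectName pattern = new ObjectName("kafka.server:type=socket-server-metrics,*");
        for (ObjectName name : server.queryNames(pattern, null)) {
            // Today the only distinguishing tag is networkProcessor=<n>, which
            // is why mapping a metric back to a listener is hard.
            System.out.println(name);
        }
    }
}
{code}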








--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4981) Add connection-accept-rate and connection-prepare-rate metrics

2017-03-30 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4981:


 Summary: Add connection-accept-rate and connection-prepare-rate  
metrics
 Key: KAFKA-4981
 URL: https://issues.apache.org/jira/browse/KAFKA-4981
 Project: Kafka
  Issue Type: Improvement
  Components: network
Reporter: Edoardo Comar
Assignee: Edoardo Comar


The current set of socket-server-metrics includes 
connection-close-rate and connection-creation-rate.

On the server side we'd find it useful to have rates for 
connections accepted and connections 'prepared' (when the channel is ready), 
to see how many clients do not get through handshake or authentication.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-08 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901349#comment-15901349
 ] 

Edoardo Comar commented on KAFKA-4669:
--

thanks [~ijuma] yes we're planning to move to 0.10.2 very soon

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-08 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901075#comment-15901075
 ] 

Edoardo Comar commented on KAFKA-4669:
--

The NPE moved from broker1 to another broker when the former was restarted. 
This suggests to me that it's caused by a 'special' client.

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-07 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899847#comment-15899847
 ] 

Edoardo Comar commented on KAFKA-4669:
--

[~ijuma] It's the same NPE as

https://issues.apache.org/jira/browse/KAFKA-3689?focusedCommentId=15383936&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15383936

could it be caused by an exotic client?

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-07 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899827#comment-15899827
 ] 

Edoardo Comar commented on KAFKA-4669:
--

We have found a strong correlation between the clients getting 

{code}
Uncaught error in kafka producer I/O thread: 
java.lang.IllegalStateException: Correlation id for response (703766) does not 
match request (703764)
{code}

and an NPE in one of our 10.0.1 brokers
{code}
[2017-03-06 17:46:29,827] ERROR Processor got uncaught exception. 
(kafka.network.Processor)
java.lang.NullPointerException
at 
kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:486)
at 
kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:483)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.processCompletedReceives(SocketServer.scala:483)
at kafka.network.Processor.run(SocketServer.scala:413)
at java.lang.Thread.run(Thread.java:809)
{code}

which suggests that somehow 
{code}
  private def processCompletedReceives() {
    selector.completedReceives.asScala.foreach { receive =>
      try {
        val channel = selector.channel(receive.source)
        val session = RequestChannel.Session(
          new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channel.principal.getName), // NPE if channel is null
        ...
{code}

[~ijuma] the only clients that are getting the occasional IllegalStateException 
are the ones producing to a partition whose leader is a broker where that NPE 
appears in our logs.

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897746#comment-15897746
 ] 

Edoardo Comar commented on KAFKA-4669:
--

We've seen this same exception using 0.10.0.1 clients against 0.10.0.1 brokers - 
without doing a {{KafkaProducer.flush()}}.

We've seen it happening during
{{KafkaConsumer.poll()}},
{{KafkaConsumer.commitSync()}},
and
in the network I/O of a KafkaProducer that is just doing sends, e.g.:

{code}
java.lang.IllegalStateException: Correlation id for response (4564)
does not match request (4562)
at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Unknown Source)
{code}

{code}
java.lang.IllegalStateException: Correlation id for response (742) does not 
match request (174)
at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
at 
org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1027)
{code}



Usually the response's correlation id is off by just 1 or 2, but we've also seen 
it off by a few hundred.

When this happens, all subsequent responses are also shifted:
{code}
java.lang.IllegalStateException: Correlation id for response (743)
does not match request (742)
java.lang.IllegalStateException: Correlation id for response (744)
does not match request (743)
java.lang.IllegalStateException: Correlation id for response (745)
does not match request (744)
{code}



> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flu

[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-03-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897747#comment-15897747
 ] 

Edoardo Comar commented on KAFKA-4669:
--

It's easy to discard and recreate the consumer instance to recover; however, we
can't do that with the producer, as the exception occurs in the Sender thread.
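
For illustration, a rough sketch of the consumer-side recovery (assuming the
IllegalStateException surfaces out of poll()/commitSync() as in the stack traces
above; topic, group and bootstrap values are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RecreateConsumerOnCorrelationError {

    private static KafkaConsumer<String, String> newConsumer(Properties props) {
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));
        return consumer;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "recovery-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = newConsumer(props);
        while (true) {
            try {
                ConsumerRecords<String, String> records = consumer.poll(100L);
                records.forEach(r -> System.out.println(r.value()));
                consumer.commitSync();
            } catch (IllegalStateException e) {
                // e.g. "Correlation id for response (...) does not match request (...)":
                // the instance is wedged, so discard it and start a fresh one.
                // Offsets were last committed before the failure, so some records may be re-delivered.
                try {
                    consumer.close();
                } catch (Exception ignored) {
                    // best effort
                }
                consumer = newConsumer(props);
            }
        }
    }
}
{code}

With the producer, by contrast, the error only shows up in the Sender thread's log
and {{flush()}} simply hangs, so there is no equivalent recovery point in application code.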

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1368) Upgrade log4j

2017-03-02 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892271#comment-15892271
 ] 

Edoardo Comar commented on KAFKA-1368:
--

Yes [~ijuma], we will pick this one up. I work together with [~mimaison], who has 
self-assigned it.

We will work on this, though not as soon as next week.

What are the compatibility implications? Do you mean that users would need to 
switch the jars that kafka-client.jar depends on?

> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>Assignee: Mickael Maison
>
> Upgrade log4j to at least 1.2.16 ou 1.2.17.
> Usage of EnhancedPatternLayout will be possible.
> It allows to set delimiters around the full log, stacktrace included, making 
> log messages collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure

2017-01-17 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826797#comment-15826797
 ] 

Edoardo Comar commented on KAFKA-4617:
--

[~dhwanikatagade] thanks for the PR !

> gradle-generated core eclipse project has incorrect source folder structure
> ---
>
> Key: KAFKA-4617
> URL: https://issues.apache.org/jira/browse/KAFKA-4617
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Edoardo Comar
>Assignee: Dhwani Katagade
>Priority: Minor
>  Labels: build
>
> The gradle-generated Eclipse Scala project for Kafka core has a 
> classpath defined as :
> {code:xml}
>   
>   
>   
> {code}
> because of how the source files are for tests are structured, code navigation 
> / running unit tests fails. The correct structure should be instead :
> {code:xml}
>   
>path="src/test/scala"/>
>   
>   
>   
>   
> {code}
> Moreover, the classpath included as libraries core/build/test and 
> core/build/resources
> which should not be there as the eclipse classes are not generated under build



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure

2017-01-16 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824788#comment-15824788
 ] 

Edoardo Comar commented on KAFKA-4617:
--

[~dhwanikatagade] thanks for the patch.

The source folders in core are now correct.
I see that you have made the Eclipse compiled-classes location match the one 
used by Gradle on the CLI.
However, two generated Eclipse projects (core and streams) still have a mistake.

Their generated classpaths still include, as library entries, the output 
directories of other projects they depend on. They should not.

Please see my comment in the PR.


> gradle-generated core eclipse project has incorrect source folder structure
> ---
>
> Key: KAFKA-4617
> URL: https://issues.apache.org/jira/browse/KAFKA-4617
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Edoardo Comar
>Assignee: Dhwani Katagade
>Priority: Minor
>  Labels: build
>
> The gradle-generated Eclipse Scala project for Kafka core has a 
> classpath defined as :
> {code:xml}
>   
>   
>   
> {code}
> because of how the source files are for tests are structured, code navigation 
> / running unit tests fails. The correct structure should be instead :
> {code:xml}
>   
>path="src/test/scala"/>
>   
>   
>   
>   
> {code}
> Moreover, the classpath included as libraries core/build/test and 
> core/build/resources
> which should not be there as the eclipse classes are not generated under build



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure

2017-01-12 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-4617:
-
Description: 
The gradle-generated Eclipse Scala project for Kafka core has a 
classpath defined as :
{code:xml}



{code}
because of how the source files for tests are structured, code navigation / 
running unit tests fails. The correct structure should instead be:
{code:xml}






{code}

Moreover, the classpath includes core/build/test and core/build/resources as 
libraries; these should not be there, as the Eclipse classes are not generated under build


  was:
The gradle-generated Eclipse Scala project for Kafka core has a 
classpath defined as :
{code:xml}



{code}
because of how the source files are for tests are structured, code navigation / 
running unit tests fails. The correct structure should be instead :
{code:xml}






{code}


> gradle-generated core eclipse project has incorrect source folder structure
> ---
>
> Key: KAFKA-4617
> URL: https://issues.apache.org/jira/browse/KAFKA-4617
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Edoardo Comar
>Priority: Minor
>
> The gradle-generated Eclipse Scala project for Kafka core has a 
> classpath defined as :
> {code:xml}
>   
>   
>   
> {code}
> because of how the source files are for tests are structured, code navigation 
> / running unit tests fails. The correct structure should be instead :
> {code:xml}
>   
>path="src/test/scala"/>
>   
>   
>   
>   
> {code}
> Moreover, the classpath included as libraries core/build/test and 
> core/build/resources
> which should not be there as the eclipse classes are not generated under build



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure

2017-01-11 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818047#comment-15818047
 ] 

Edoardo Comar commented on KAFKA-4617:
--

Hi [~dhwanikatagade], thanks - I saw that thread and that's what prompted me to 
open this defect. I am actually not bothered by the different output folder; it 
has not been annoying to me. I know that the Eclipse compile output is different 
from the output generated by running Gradle on the CLI, and I am happy with that.

> gradle-generated core eclipse project has incorrect source folder structure
> ---
>
> Key: KAFKA-4617
> URL: https://issues.apache.org/jira/browse/KAFKA-4617
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Edoardo Comar
>Priority: Minor
>
> The gradle-generated Eclipse Scala project for Kafka core has a 
> classpath defined as :
> {code:xml}
>   
>   
>   
> {code}
> because of how the source files are for tests are structured, code navigation 
> / running unit tests fails. The correct structure should be instead :
> {code:xml}
>   
>path="src/test/scala"/>
>   
>   
>   
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4617) gradle-generated core eclipse project has incorrect source folder structure

2017-01-11 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4617:


 Summary: gradle-generated core eclipse project has incorrect 
source folder structure
 Key: KAFKA-4617
 URL: https://issues.apache.org/jira/browse/KAFKA-4617
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Edoardo Comar
Priority: Minor


The gradle-generated Eclipse Scala project for Kafka core has a 
classpath defined as :
{code:xml}



{code}
because of how the source files for tests are structured, code navigation / 
running unit tests fails. The correct structure should instead be:
{code:xml}






{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4611) Support custom authentication mechanism

2017-01-10 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814508#comment-15814508
 ] 

Edoardo Comar commented on KAFKA-4611:
--

Hi, I am not sure that what you are asking for isn't already covered:

https://issues.apache.org/jira/browse/KAFKA-4292 proposes custom callback 
handlers (therefore you control how to check username and pwd).

https://issues.apache.org/jira/browse/KAFKA-4259 added the functionality of 
defining jaas configurations in the client config file.

https://issues.apache.org/jira/browse/KAFKA-4180 will add the functionality of 
allowing different jaas config in a single client process.

Please note that Kerberos login uses JAAS too.
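
For example, once the client-side {{sasl.jaas.config}} property from KAFKA-4259 / KIP-85
is available (0.10.2.0 onwards), credentials can be supplied per client instance without
a JVM-wide system property - a sketch, with placeholder broker address and credentials:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PerClientJaasConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // JAAS configuration embedded in the client config instead of a separate jaas.conf file
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"alice\" password=\"alice-secret\";");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "hello"));
        }
    }
}
{code}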



> Support custom authentication mechanism
> ---
>
> Key: KAFKA-4611
> URL: https://issues.apache.org/jira/browse/KAFKA-4611
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: mahendiran chandrasekar
>
> Currently there are two login mechanisms supported by kafka client.
> 1) Default Login / Abstract Login which uses JAAS authentication
> 2) Kerberos Login
> Supporting user defined login mechanism's would be nice. 
> This could be achieved by removing the limitation from 
> [here](https://github.com/apache/kafka/blob/0.10.0/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L44)
>  ... Instead get custom login module implemented by user from the configs, 
> gives users the option to implement custom login mechanism. 
> I am running into an issue in setting JAAS authentication system property on 
> all executors of my spark cluster. Having custom mechanism to authorize kafka 
> would be a good improvement for me



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2017-01-09 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812411#comment-15812411
 ] 

Edoardo Comar commented on KAFKA-4441:
--

Please ignore the previous comment - the {{UnderReplicatedPartitions}} metric did 
not need to check for deletion; it was being triggered during topic creation. New 
PR coming soon

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2017-01-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874
 ] 

Edoardo Comar edited comment on KAFKA-4441 at 1/6/17 4:16 PM:
--

For the {{UnderReplicatedPartitions}} metric, the Gauge defined inside 
{{ReplicaManager}} needs to be able to make a check like 
{{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}.

The current startup ordering inside {{KafkaServer}} has the {{ReplicaManager}} 
start before the {{KafkaController}}. 
Could the order be reversed?
Otherwise, the {{ReplicaManager}} could be assigned a {{DeletionChecker}} function 
after the {{KafkaController}} has started. This would be minimally disruptive 
to the current code.

[~ijuma] [~junrao] any preferences?
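
To make the idea concrete, an illustrative sketch (in Java, not the actual Scala broker
code; the class, the partition-state holder and the injected checker are made-up names):

{code:java}
import java.util.Collection;
import java.util.function.Predicate;

// Sketch of a deletion-aware gauge: the value skips partitions of topics that the
// controller has queued up for deletion, so the metric is not inflated while topics
// are being created or deleted. The Predicate plays the role of the "DeletionChecker"
// that could be handed to ReplicaManager after the controller has started.
public class UnderReplicatedPartitionsGauge {

    /** Minimal stand-in for the broker's per-partition replica state. */
    public static final class PartitionState {
        final String topic;
        final int replicaCount;
        final int inSyncReplicaCount;

        public PartitionState(String topic, int replicaCount, int inSyncReplicaCount) {
            this.topic = topic;
            this.replicaCount = replicaCount;
            this.inSyncReplicaCount = inSyncReplicaCount;
        }
    }

    private final Collection<PartitionState> partitions;
    private final Predicate<String> isTopicQueuedUpForDeletion;

    public UnderReplicatedPartitionsGauge(Collection<PartitionState> partitions,
                                          Predicate<String> isTopicQueuedUpForDeletion) {
        this.partitions = partitions;
        this.isTopicQueuedUpForDeletion = isTopicQueuedUpForDeletion;
    }

    /** The value the metric would report. */
    public int value() {
        int underReplicated = 0;
        for (PartitionState p : partitions) {
            if (isTopicQueuedUpForDeletion.test(p.topic))
                continue; // ignore partitions of topics that are going away anyway
            if (p.inSyncReplicaCount < p.replicaCount)
                underReplicated++;
        }
        return underReplicated;
    }
}
{code}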


was (Author: ecomar):
For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside 
{{ReplicaManager}} needs to be able to make a check 
{{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}} 
We will follow up with another PR

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2017-01-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874
 ] 

Edoardo Comar commented on KAFKA-4441:
--

For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside 
{{ReplicaManager}} needs to be able to make a check 
{{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}} 
We will follow up with another PR

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4605) MetricsRegistry during KafkaServerTestHarness polluted by side effects

2017-01-06 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4605:


 Summary: MetricsRegistry during KafkaServerTestHarness polluted by 
side effects
 Key: KAFKA-4605
 URL: https://issues.apache.org/jira/browse/KAFKA-4605
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Reporter: Edoardo Comar
Priority: Minor


Components like KafkaController, ReplicaManager etc. that extend KafkaMetricsGroup 
*apparently* create new metrics, but the 'newGauge' 
method actually invokes 'MetricsRegistry.getOrAdd',

so with multiple servers in the same JVM only the first server actually creates 
the metrics.

The side effects are:
1) a test cannot fully check the metric - only the metric instance created by 
the first class that registered it

2) after a tearDown, the registry is still full of metrics and a subsequent test 
will not instantiate new metrics

We've been bitten by the issue in 
https://github.com/apache/kafka/pull/2325
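
A possible mitigation on the test side (a sketch only, assuming the yammer metrics 2.x
API that Kafka ships with; the helper name is made up): wipe the shared default registry
in tearDown so the next test re-registers its metrics.

{code:java}
import java.util.ArrayList;

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.MetricName;
import com.yammer.metrics.core.MetricsRegistry;

public class YammerMetricsTestUtil {

    /** Remove every metric from the JVM-wide default registry, e.g. from a test's tearDown. */
    public static void clearDefaultRegistry() {
        MetricsRegistry registry = Metrics.defaultRegistry();
        // copy the names first so we are not iterating the registry's map while removing
        for (MetricName name : new ArrayList<>(registry.allMetrics().keySet())) {
            registry.removeMetric(name);
        }
    }
}
{code}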





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2017-01-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804732#comment-15804732
 ] 

Edoardo Comar commented on KAFKA-4441:
--

Raising the severity because we've seen these spurious metric values many times 
in the systems we monitor, to the point that the metric was no longer trusted 
unless the values persisted for some time.


> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2017-01-06 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-4441:
-
Priority: Major  (was: Minor)

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1368) Upgrade log4j

2017-01-05 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801283#comment-15801283
 ] 

Edoardo Comar commented on KAFKA-1368:
--

Moving to Log4j v2 would allow dynamic reconfiguration of the logging levels 
*without restarting the brokers* and without any API changes.

This could be done by a user by switching to the XML configuration format, rather 
than properties, and adding a directive like
{code:xml}
<Configuration monitorInterval="30">
{code}
as described here
http://logging.apache.org/log4j/2.x/manual/configuration.html#AutomaticReconfiguration



> Upgrade log4j
> -
>
> Key: KAFKA-1368
> URL: https://issues.apache.org/jira/browse/KAFKA-1368
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Vladislav Pernin
>
> Upgrade log4j to at least 1.2.16 ou 1.2.17.
> Usage of EnhancedPatternLayout will be possible.
> It allows to set delimiters around the full log, stacktrace included, making 
> log messages collection easier with tools like Logstash.
> Example : <[%d{}]...[%t] %m%throwable>%n
> <[2014-04-08 11:07:20,360] ERROR [KafkaApi-1] Error when processing fetch 
> request for partition [X,6] offset 700 from consumer with correlation id 
> 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 700 but we only 
> have log segments in the range 16021 to 16021.
> at kafka.log.Log.read(Log.scala:429)
> ...
> at java.lang.Thread.run(Thread.java:744)>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers

2016-12-27 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780212#comment-15780212
 ] 

Edoardo Comar commented on KAFKA-4180:
--

Now that https://issues.apache.org/jira/browse/KAFKA-4259 has been resolved and 
merged in trunk,
this can be merged too

> Shared authentication with multiple active Kafka producers/consumers
> 
>
> Key: KAFKA-4180
> URL: https://issues.apache.org/jira/browse/KAFKA-4180
> Project: Kafka
>  Issue Type: Bug
>  Components: producer , security
>Affects Versions: 0.10.0.1
>Reporter: Guillaume Grossetie
>Assignee: Mickael Maison
>  Labels: authentication, jaas, loginmodule, plain, producer, 
> sasl, user
>
> I'm using Kafka 0.10.0.1 with an SASL authentication on the client:
> {code:title=kafka_client_jaas.conf|borderStyle=solid}
> KafkaClient {
> org.apache.kafka.common.security.plain.PlainLoginModule required
> username="guillaume"
> password="secret";
> };
> {code}
> When using multiple Kafka producers the authentification is shared [1]. In 
> other words it's not currently possible to have multiple Kafka producers in a 
> JVM process.
> Am I missing something ? How can I have multiple active Kafka producers with 
> different credentials ?
> My use case is that I have an application that send messages to multiples 
> clusters (one cluster for logs, one cluster for metrics, one cluster for 
> business data).
> [1] 
> https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers

2016-12-14 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748758#comment-15748758
 ] 

Edoardo Comar commented on KAFKA-4180:
--

I have updated the PR based on the discussion in the mailing list - namely the 
key used to cache `LoginManager` instances.
I have retired KIP-83; the PR that closes this issue is mostly based on KIP-85 
and the associated https://issues.apache.org/jira/browse/KAFKA-4259

> Shared authentication with multiple active Kafka producers/consumers
> 
>
> Key: KAFKA-4180
> URL: https://issues.apache.org/jira/browse/KAFKA-4180
> Project: Kafka
>  Issue Type: Bug
>  Components: producer , security
>Affects Versions: 0.10.0.1
>Reporter: Guillaume Grossetie
>Assignee: Mickael Maison
>  Labels: authentication, jaas, loginmodule, plain, producer, 
> sasl, user
>
> I'm using Kafka 0.10.0.1 with an SASL authentication on the client:
> {code:title=kafka_client_jaas.conf|borderStyle=solid}
> KafkaClient {
> org.apache.kafka.common.security.plain.PlainLoginModule required
> username="guillaume"
> password="secret";
> };
> {code}
> When using multiple Kafka producers the authentification is shared [1]. In 
> other words it's not currently possible to have multiple Kafka producers in a 
> JVM process.
> Am I missing something ? How can I have multiple active Kafka producers with 
> different credentials ?
> My use case is that I have an application that send messages to multiples 
> clusters (one cluster for logs, one cluster for metrics, one cluster for 
> business data).
> [1] 
> https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4531) Rationalise client configuration validation

2016-12-13 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4531:


 Summary: Rationalise client configuration validation 
 Key: KAFKA-4531
 URL: https://issues.apache.org/jira/browse/KAFKA-4531
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Edoardo Comar


The broker-side configuration has a {{validateValues()}} method that could also be 
introduced in the client-side {{ProducerConfig}} and {{ConsumerConfig}} classes.

The rationale is to centralise constraints between values, like e.g. this one 
currently in the {{KafkaConsumer}} constructor:
{code}
if (this.requestTimeoutMs <= sessionTimeOutMs || this.requestTimeoutMs <= fetchMaxWaitMs)
    throw new ConfigException(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG + " should be greater than "
            + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG + " and " + ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG);
{code}

or custom validation of the provided values, e.g. this one in the {{KafkaProducer}}:
{code}
private static int parseAcks(String acksString) {
    try {
        return acksString.trim().equalsIgnoreCase("all") ? -1 : Integer.parseInt(acksString.trim());
    } catch (NumberFormatException e) {
        throw new ConfigException("Invalid configuration value for 'acks': " + acksString);
    }
}
{code}

Also, some new KIPs (e.g. KIP-81) propose constraints among different values, 
so it would be good not to scatter them around.
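
As a rough illustration of what the centralised checks could look like on the client
side (sketch only; the class and method names here are made up, and a real
{{validateValues()}} would live inside {{ConsumerConfig}}/{{ProducerConfig}} and read
the parsed values directly):

{code:java}
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.config.ConfigException;

public final class ClientConfigValidation {

    private ClientConfigValidation() {}

    /** Cross-field constraint currently enforced in the KafkaConsumer constructor. */
    public static void validateConsumerTimeouts(int requestTimeoutMs, int sessionTimeoutMs, int fetchMaxWaitMs) {
        if (requestTimeoutMs <= sessionTimeoutMs || requestTimeoutMs <= fetchMaxWaitMs)
            throw new ConfigException(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG
                    + " should be greater than " + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG
                    + " and " + ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG);
    }

    /** Custom value parsing/validation currently done in the KafkaProducer. */
    public static int parseAcks(String acksString) {
        try {
            return acksString.trim().equalsIgnoreCase("all") ? -1 : Integer.parseInt(acksString.trim());
        } catch (NumberFormatException e) {
            throw new ConfigException("Invalid configuration value for 'acks': " + acksString);
        }
    }
}
{code}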




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2016-12-07 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-4441:


Assignee: Edoardo Comar

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>Priority: Minor
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a 
> tight loop, although the actual causes of the metrics firing are from topics 
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> know that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4185) Abstract out password verifier in SaslServer as an injectable dependency

2016-10-13 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571363#comment-15571363
 ] 

Edoardo Comar commented on KAFKA-4185:
--

Hi [~piyushvijay] I see that a KIP has been proposed as a wider solution (works 
for other mechanisms too)
https://cwiki.apache.org/confluence/display/KAFKA/KIP-86%3A+Configurable+SASL+callback+handlers

> Abstract out password verifier in SaslServer as an injectable dependency
> 
>
> Key: KAFKA-4185
> URL: https://issues.apache.org/jira/browse/KAFKA-4185
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Piyush Vijay
> Fix For: 0.10.0.2
>
>
> Kafka comes with a default SASL/PLAIN implementation which assumes that 
> username and password are present in a JAAS
> config file. People often want to use some other way to provide username and 
> password to SaslServer. Their best bet,
> currently, is to have their own implementation of SaslServer (which would be, 
> in most cases, a copied version of PlainSaslServer
> minus the logic where password verification happens). This is not ideal.
> We believe that there exists a better way to structure the current 
> PlainSaslServer implementation which makes it very
> easy for people to plug-in their custom password verifier without having to 
> rewrite SaslServer or copy any code.
> The idea is to have an injectable dependency interface PasswordVerifier which 
> can be re-implemented based on the
> requirements. There would be no need to re-implement or extend 
> PlainSaslServer class.
> Note that this is commonly asked feature and there have been some attempts in 
> the past to solve this problem:
> https://github.com/apache/kafka/pull/1350
> https://github.com/apache/kafka/pull/1770
> https://issues.apache.org/jira/browse/KAFKA-2629
> https://issues.apache.org/jira/browse/KAFKA-3679
> We believe that this proposed solution does not have the demerits because of 
> previous proposals were rejected.
> I would be happy to discuss more.
> Please find the link to the PR in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3302) Pass kerberos keytab and principal as part of client config

2016-10-06 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551821#comment-15551821
 ] 

Edoardo Comar commented on KAFKA-3302:
--

It looks like [~rsivaram]'s https://issues.apache.org/jira/browse/KAFKA-4259 
also covers this issue

> Pass kerberos keytab and principal as part of client config 
> 
>
> Key: KAFKA-3302
> URL: https://issues.apache.org/jira/browse/KAFKA-3302
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Reporter: Sriharsha Chintalapani
>Assignee: Sriharsha Chintalapani
> Fix For: 0.10.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3396) Unauthorized topics are returned to the user

2016-09-28 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529377#comment-15529377
 ] 

Edoardo Comar commented on KAFKA-3396:
--

The new PR is https://github.com/apache/kafka/pull/1908

> Unauthorized topics are returned to the user
> 
>
> Key: KAFKA-3396
> URL: https://issues.apache.org/jira/browse/KAFKA-3396
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Grant Henke
>Assignee: Edoardo Comar
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> Kafka's clients and protocol exposes unauthorized topics to the end user. 
> This is often considered a security hole. To some, the topic name is 
> considered sensitive information. Those that do not consider the name 
> sensitive, still consider it more information that allows a user to try and 
> circumvent security.  Instead, if a user does not have access to the topic, 
> the servers should act as if the topic does not exist. 
> To solve this some of the changes could include:
>   - The broker should not return a TOPIC_AUTHORIZATION(29) error for 
> requests (metadata, produce, fetch, etc) that include a topic that the user 
> does not have DESCRIBE access to.
>   - A user should not receive a TopicAuthorizationException when they do 
> not have DESCRIBE access to a topic or the cluster.
>  - The client should not maintain and expose a list of unauthorized 
> topics in org.apache.kafka.common.Cluster. 
> Other changes may be required that are not listed here. Further analysis is 
> needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-09-27 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-3899:


Assignee: Edoardo Comar

> Consumer.poll() stuck in loop if wrong credentials are supplied
> ---
>
> Key: KAFKA-3899
> URL: https://issues.apache.org/jira/browse/KAFKA-3899
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0, 0.10.0.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> With the broker configured to use SASL PLAIN ,
> if the client is supplying wrong credentials, 
> a consumer calling poll()
> is stuck forever and only inspection of DEBUG-level logging can tell what is 
> wrong.
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
> disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4185) Abstract out password verifier in SaslServer as an injectable dependency

2016-09-27 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526623#comment-15526623
 ] 

Edoardo Comar commented on KAFKA-4185:
--

I think this is a duplicate of https://issues.apache.org/jira/browse/KAFKA-3679, 
which I decided to close as won't fix, 
as the procedure is well documented in 
http://kafka.apache.org/0100/documentation.html#security_sasl_plain_brokerconfig


> Abstract out password verifier in SaslServer as an injectable dependency
> 
>
> Key: KAFKA-4185
> URL: https://issues.apache.org/jira/browse/KAFKA-4185
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Piyush Vijay
> Fix For: 0.10.0.2
>
>
> Kafka comes with a default SASL/PLAIN implementation which assumes that 
> username and password are present in a JAAS
> config file. People often want to use some other way to provide username and 
> password to SaslServer. Their best bet,
> currently, is to have their own implementation of SaslServer (which would be, 
> in most cases, a copied version of PlainSaslServer
> minus the logic where password verification happens). This is not ideal.
> We believe that there exists a better way to structure the current 
> PlainSaslServer implementation which makes it very
> easy for people to plug-in their custom password verifier without having to 
> rewrite SaslServer or copy any code.
> The idea is to have an injectable dependency interface PasswordVerifier which 
> can be re-implemented based on the
> requirements. There would be no need to re-implement or extend 
> PlainSaslServer class.
> Note that this is commonly asked feature and there have been some attempts in 
> the past to solve this problem:
> https://github.com/apache/kafka/pull/1350
> https://github.com/apache/kafka/pull/1770
> https://issues.apache.org/jira/browse/KAFKA-2629
> https://issues.apache.org/jira/browse/KAFKA-3679
> We believe that this proposed solution does not have the demerits because of 
> previous proposals were rejected.
> I would be happy to discuss more.
> Please find the link to the PR in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4180) Shared authentification with multiple actives Kafka producers/consumers

2016-09-27 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526574#comment-15526574
 ] 

Edoardo Comar commented on KAFKA-4180:
--

Hi [~sriharsha] thanks for pointing this out.
I'd say this one didn't look like a duplicate of your JIRA, possibly because yours 
only contains a title with no description/details.

Also, I was going to target KIP-83 at SASL PLAIN; I will open a thread on the 
mailing list for discussion, as suggested by the template, rather than on the 
wiki

> Shared authentification with multiple actives Kafka producers/consumers
> ---
>
> Key: KAFKA-4180
> URL: https://issues.apache.org/jira/browse/KAFKA-4180
> Project: Kafka
>  Issue Type: Bug
>  Components: producer , security
>Affects Versions: 0.10.0.1
>Reporter: Guillaume Grossetie
>Assignee: Mickael Maison
>  Labels: authentication, jaas, loginmodule, plain, producer, 
> sasl, user
>
> I'm using Kafka 0.10.0.1 with an SASL authentication on the client:
> {code:title=kafka_client_jaas.conf|borderStyle=solid}
> KafkaClient {
> org.apache.kafka.common.security.plain.PlainLoginModule required
> username="guillaume"
> password="secret";
> };
> {code}
> When using multiple Kafka producers the authentification is shared [1]. In 
> other words it's not currently possible to have multiple Kafka producers in a 
> JVM process.
> Am I missing something ? How can I have multiple active Kafka producers with 
> different credentials ?
> My use case is that I have an application that send messages to multiples 
> clusters (one cluster for logs, one cluster for metrics, one cluster for 
> business data).
> [1] 
> https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3679) Allow reuse of implementation of RFC 4616 in PlainSaslServer

2016-09-27 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-3679.
--
Resolution: Won't Fix

The documentation
http://kafka.apache.org/0100/documentation.html#security_sasl_plain_brokerconfig
suggests plugging in a security provider and implementing your own SaslServer
by copy-and-pasting the provided Kafka PlainSaslServer,

so closing as won't fix.

> Allow reuse of implementation of RFC 4616 in PlainSaslServer 
> -
>
> Key: KAFKA-3679
> URL: https://issues.apache.org/jira/browse/KAFKA-3679
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> Using SASL PLAIN in production may require a different username/password 
> checking than what is currently in the codebase, based on data contained in 
> the server jaas.conf.
> To do so, a deployment needs to extend the SaslPlainServer as described here
> http://kafka.apache.org/0100/documentation.html#security_sasl_plain_production
> However the evaluate(byes) method still needs to impleemnt RFC4616, so it is 
> useful to separate the password checking from the reading of the data from 
> the wire. 
> A simple extract method into an overridable methos should suffice



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)

2016-09-22 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-4206:
-
Description: 
The current handling of invalid credentials (ie wrong user/password) is to let 
the {{SaslException}} thrown from an implementation of 
{{javax.security.sasl.SaslServer.evaluateResponse()}}
bubble up the call stack until it gets caught in 
{{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
where the {{KafkaChannel}} gets closed - which will cause the client that made 
the request to be disconnected.

This will happen however after the server has used considerable resources, 
especially for the SSL handshake which appears to be computationally expensive 
in Java.

We have observed that if just a few clients keep repeating requests with the 
wrong credentials, it is quite easy to get all the network processing threads 
in the Kafka server busy doing SSL handshakes.

This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
possibly a non-intentional one.
It can be non-intentional, i.e. also caused by friendly clients, for example 
because a Kafka Java client Producer supplied with the wrong credentials will 
not throw an exception on publishing, so it may keep attempting to connect 
without the caller realising.

An easy fix, which we have implemented and will supply a PR for, is to 
considerably *delay* closing the {{KafkaChannel}} in the {{Selector}}, but 
obviously without blocking the processing thread.

This has been tested to be very effective in reducing the CPU usage spikes 
caused by non-malicious clients using invalid SASL PLAIN credentials over SSL.
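
For illustration only (not the actual PR), a minimal Java sketch of the delayed-close
idea, with made-up names:

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// On an authentication failure the channel is parked with a deadline instead of being
// closed immediately; the processing loop closes expired entries on each poll() iteration,
// so the network thread never blocks and a client with bad credentials cannot immediately
// reconnect and trigger yet another expensive SSL handshake.
public class DelayedCloseQueue<C extends Closeable> {

    private static final class Entry<T> {
        final T channel;
        final long closeAtMs;

        Entry(T channel, long closeAtMs) {
            this.channel = channel;
            this.closeAtMs = closeAtMs;
        }
    }

    private final Deque<Entry<C>> pending = new ArrayDeque<>();
    private final long delayMs;

    public DelayedCloseQueue(long delayMs) {
        this.delayMs = delayMs;
    }

    /** Called where the SaslException for invalid credentials is caught. */
    public void scheduleClose(C channel) {
        pending.addLast(new Entry<>(channel, System.currentTimeMillis() + delayMs));
    }

    /** Called once per processing loop iteration; closes only the entries whose delay has elapsed. */
    public void closeExpired() {
        long now = System.currentTimeMillis();
        while (!pending.isEmpty() && pending.peekFirst().closeAtMs <= now) {
            try {
                pending.pollFirst().channel.close();
            } catch (IOException e) {
                // best effort - the peer may already be gone
            }
        }
    }
}
{code}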


  was:
The current handling of invalid credentials (ie wrong user/password) is to let 
the {{SaslException}} thrown from an implementation of 
{{javax.security.sasl.SaslServer.evaluateResponse()}}
bubble up the call stack until it gets caught in 
{{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
where the `KafkaChannel` gets closed - which will cause the client that made 
the request to be disconnected.

This will happen however after the server has used considerable resources, 
especially for the SSL handshake which appears to be computationally expensive 
in Java.

We have observed that if just a few clients keep repeating requests with the 
wrong credentials, it is quite easy to get all the network processing threads 
in the Kafka server busy doing SSL handshakes.

This makes a Kafka cluster to easily suffer from a Denial Of Service - also non 
intentional  - attack. 
It can be non intentional, i.e. also caused by friendly clients, for example 
because a Kafka Java client Producer supplied with the wrong credentials will 
not throw an exception on publishing, so it may keep attempting to connect 
without the caller realising.

An easy fix which we have implemented and will supply a PR for is to *delay* 
considerably closing the `KafkaChannel` in the `Selector`, but obviously 
without blocking the processing thread.

This has been tested to be very effective in reducing the cpu usage spikes caused 
by non malicious ssl clients using invalid credentials.



> Improve handling of invalid credentials to mitigate DOS issue (especially on 
> SSL listeners)
> ---
>
> Key: KAFKA-4206
> URL: https://issues.apache.org/jira/browse/KAFKA-4206
> Project: Kafka
>  Issue Type: Improvement
>  Components: network, security
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current handling of invalid credentials (ie wrong user/password) is to 
> let the {{SaslException}} thrown from an implementation of 
> {{javax.security.sasl.SaslServer.evaluateResponse()}}
> bubble up the call stack until it gets caught in 
> {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
> where the {{KafkaChannel}} gets closed - which will cause the client that 
> made the request to be disconnected.
> This will happen however after the server has used considerable resources, 
> especially for the SSL handshake which appears to be computationally 
> expensive in Java.
> We have observed that if just a few clients keep repeating requests with the 
> wrong credentials, it is quite easy to get all the network processing threads 
> in the Kafka server busy doing SSL handshakes.
> This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
> possibly an unintentional one. 
> It can be unintentional, i.e. caused by friendly clients: for example, a Kafka 
> Java client Producer supplied with the wrong credentials will not throw an 
> exception on publishing, so it may keep attempting to connect without the 
> caller realising.
> An easy fix which we have implemented and will supply a PR fo

[jira] [Updated] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)

2016-09-22 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-4206:
-
Description: 
The current handling of invalid credentials (ie wrong user/password) is to let 
the {{SaslException}} thrown from an implementation of 
{{javax.security.sasl.SaslServer.evaluateResponse()}}
bubble up the call stack until it gets caught in 
{{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
where the `KafkaChannel` gets closed - which will cause the client that made 
the request to be disconnected.

This will happen however after the server has used considerable resources, 
especially for the SSL handshake which appears to be computationally expensive 
in Java.

We have observed that if just a few clients keep repeating requests with the 
wrong credentials, it is quite easy to get all the network processing threads 
in the Kafka server busy doing SSL handshakes.

This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
possibly an unintentional one. 
It can be unintentional, i.e. caused by friendly clients: for example, a Kafka 
Java client Producer supplied with the wrong credentials will not throw an 
exception on publishing, so it may keep attempting to connect without the 
caller realising.

An easy fix which we have implemented and will supply a PR for is to *delay* 
considerably closing the `KafkaChannel` in the `Selector`, but obviously 
without blocking the processing thread.

This has been tested to be very effective in reducing the cpu usage spikes caused 
by non malicious ssl clients using invalid credentials.


  was:
The current handling of invalid credentials (ie wrong user/password) is to let 
the `SaslException` thrown from an implementation of 
`javax.security.sasl.SaslServer.evaluateResponse()`
bubble up the call stack until it gets caught in 
`org.apache.kafka.common.network.Selector.pollSelectionKeys()`
where the `KafkaChannel` gets closed - which will cause the client that made 
the request to be disconnected.

This will happen however after the server has used considerable resources, 
especially for the SSL handshake which appears to be computationally expensive 
in Java.

We have observed that if just a few clients keep repeating requests with the 
wrong credentials, it is quite easy to get all the network processing threads 
in the Kafka server busy doing SSL handshakes.

This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
possibly an unintentional one. 
It can be unintentional, i.e. caused by friendly clients: for example, a Kafka 
Java client Producer supplied with the wrong credentials will not throw an 
exception on publishing, so it may keep attempting to connect without the 
caller realising.

An easy fix which we have implemented and will supply a PR for is to *delay* 
considerably closing the `KafkaChannel` in the `Selector`, but obviously 
without blocking the processing thread.

This has been tested to be very effective in reducing the cpu usage spikes caused 
by non malicious ssl clients using invalid credentials.



> Improve handling of invalid credentials to mitigate DOS issue (especially on 
> SSL listeners)
> ---
>
> Key: KAFKA-4206
> URL: https://issues.apache.org/jira/browse/KAFKA-4206
> Project: Kafka
>  Issue Type: Improvement
>  Components: network, security
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>
> The current handling of invalid credentials (ie wrong user/password) is to 
> let the {{SaslException}} thrown from an implementation of 
> {{javax.security.sasl.SaslServer.evaluateResponse()}}
> bubble up the call stack until it gets caught in 
> {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}
> where the `KafkaChannel` gets closed - which will cause the client that made 
> the request to be disconnected.
> This will happen however after the server has used considerable resources, 
> especially for the SSL handshake which appears to be computationally 
> expensive in Java.
> We have observed that if just a few clients keep repeating requests with the 
> wrong credentials, it is quite easy to get all the network processing threads 
> in the Kafka server busy doing SSL handshakes.
> This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
> possibly an unintentional one. 
> It can be unintentional, i.e. caused by friendly clients: for example, a Kafka 
> Java client Producer supplied with the wrong credentials will not throw an 
> exception on publishing, so it may keep attempting to connect without the 
> caller realising.
> An easy fix which we have implemented and will supply a PR for is to *delay* 
> considerably 

[jira] [Created] (KAFKA-4206) Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)

2016-09-22 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-4206:


 Summary: Improve handling of invalid credentials to mitigate DOS 
issue (especially on SSL listeners)
 Key: KAFKA-4206
 URL: https://issues.apache.org/jira/browse/KAFKA-4206
 Project: Kafka
  Issue Type: Improvement
  Components: network, security
Affects Versions: 0.10.0.1, 0.10.0.0
Reporter: Edoardo Comar
Assignee: Edoardo Comar


The current handling of invalid credentials (ie wrong user/password) is to let 
the `SaslException` thrown from an implementation of 
`javax.security.sasl.SaslServer.evaluateResponse()`
bubble up the call stack until it gets caught in 
`org.apache.kafka.common.network.Selector.pollSelectionKeys()`
where the `KafkaChannel` gets closed - which will cause the client that made 
the request to be disconnected.

This will happen however after the server has used considerable resources, 
especially for the SSL handshake which appears to be computationally expensive 
in Java.

We have observed that if just a few clients keep repeating requests with the 
wrong credentials, it is quite easy to get all the network processing threads 
in the Kafka server busy doing SSL handshakes.

This makes it easy for a Kafka cluster to suffer a Denial of Service attack - 
possibly an unintentional one. 
It can be unintentional, i.e. caused by friendly clients: for example, a Kafka 
Java client Producer supplied with the wrong credentials will not throw an 
exception on publishing, so it may keep attempting to connect without the 
caller realising.

An easy fix which we have implemented and will supply a PR for is to *delay* 
considerably closing the `KafkaChannel` in the `Selector`, but obviously 
without blocking the processing thread.

This has been tested to be very effective in reducing the cpu usage spikes caused 
by non malicious ssl clients using invalid credentials.
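
A minimal sketch of the delayed-close idea, under the assumption that channels 
which failed authentication are parked in a delay queue and closed later from 
the poll loop; the helper names below are hypothetical and this is not the 
actual Kafka patch:

{code:java}
// Illustrative sketch of delaying channel close after a failed SASL
// authentication, without blocking the network processing thread.
// Names (DelayedChannelCloser, DelayedClose) are hypothetical.
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class DelayedChannelCloser {

    private static final long CLOSE_DELAY_MS = 5_000L;
    private final DelayQueue<DelayedClose> pendingCloses = new DelayQueue<>();

    // Called from the selector when evaluateResponse() threw a SaslException:
    // instead of closing immediately, park the channel for CLOSE_DELAY_MS.
    public void scheduleClose(AutoCloseable channel) {
        pendingCloses.add(new DelayedClose(channel, System.currentTimeMillis() + CLOSE_DELAY_MS));
    }

    // Called on every poll() iteration; closes only channels whose delay has
    // expired, so the processing thread never blocks waiting for the delay.
    public void maybeCloseExpired() throws Exception {
        DelayedClose expired;
        while ((expired = pendingCloses.poll()) != null)
            expired.channel.close();
    }

    private static final class DelayedClose implements Delayed {
        final AutoCloseable channel;
        final long deadlineMs;

        DelayedClose(AutoCloseable channel, long deadlineMs) {
            this.channel = channel;
            this.deadlineMs = deadlineMs;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
        }
    }
}
{code}

The delay is enforced by the queue rather than by sleeping on the network 
thread, so connections from well-behaved clients are not affected.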




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3827) log.message.format.version should default to inter.broker.protocol.version

2016-07-11 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370562#comment-15370562
 ] 

Edoardo Comar commented on KAFKA-3827:
--

Hi [~junrao] [~ijuma] [~ewencp]
given that, because of this issue, the upgrade docs at 
https://kafka.apache.org/documentation.html#upgrade cannot be relied on, 
one MUST also set log.message.format.version=0.9 when setting 
inter.broker.protocol.version=0.9

two questions:
1) if one sets the two properties and does a rolling 0.9->0.10 upgrade, 
is the second step of the upgrade to remove both settings (i.e. have them both 
=0.10) in one go, 
or 
should the settings be removed one at a time, restarting each time ?


2) what is the actual disadvantage in NOT setting 
inter.broker.protocol.version=0.9 (and log.message.format.version)
and doing the 0.9->0.10 rolling upgrade anyway ? 

I would guess that the brokers still running on 0.9 won't understand 
replication from brokers already on 0.10 ?
But if  a short outage is tolerated and all the brokers are quickly brought 
back to 0.10 this would be ok, wouldn't it ?

thanks! 
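
For illustration, a sketch of the relevant server.properties entries during such 
a 0.9 -> 0.10 rolling upgrade, assuming both settings are pinned together as 
noted above (the exact version strings accepted may vary by release):

{code}
# Step 1: run the 0.10 binaries while still pinned to the 0.9 protocol and format
inter.broker.protocol.version=0.9.0.1
log.message.format.version=0.9.0.1

# Step 2: once every broker runs the 0.10 binaries, bump the protocol version;
# the message format can be bumped after clients have been upgraded
#inter.broker.protocol.version=0.10.0
#log.message.format.version=0.10.0
{code}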


> log.message.format.version should default to inter.broker.protocol.version
> --
>
> Key: KAFKA-3827
> URL: https://issues.apache.org/jira/browse/KAFKA-3827
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Manasvi Gupta
>  Labels: newbie
>
> Currently, if one sets inter.broker.protocol.version to 0.9.0 and restarts 
> the broker, one will get the following exception since 
> log.message.format.version defaults to 0.10.0. It will be more intuitive if 
> log.message.format.version defaults to the value of 
> inter.broker.protocol.version.
> java.lang.IllegalArgumentException: requirement failed: 
> log.message.format.version 0.10.0-IV1 cannot be used when 
> inter.broker.protocol.version is set to 0.9.0.1
>   at scala.Predef$.require(Predef.scala:233)
>   at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:1023)
>   at kafka.server.KafkaConfig.(KafkaConfig.scala:994)
>   at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:743)
>   at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:740)
>   at 
> kafka.server.KafkaServerStartable$.fromProps(KafkaServerStartable.scala:28)
>   at kafka.Kafka$.main(Kafka.scala:58)
>   at kafka.Kafka.main(Kafka.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-06-24 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348181#comment-15348181
 ] 

Edoardo Comar commented on KAFKA-3899:
--

Like in the case of issue https://issues.apache.org/jira/browse/KAFKA-3727
the options include 
- TimeoutException - propagating the timeout passed to poll(timeout)
- another exception thrown out of poll()
- exception passed to a listener of the consumer, like the onCompletion of the 
Callback available to a Producer
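
Until one of those options exists in the client, an application-level workaround 
is to bound poll() from the outside using wakeup(), the only KafkaConsumer method 
that is safe to call from another thread. A minimal sketch (the helper below is 
hypothetical, not a Kafka API):

{code:java}
// Hypothetical application-side workaround: schedule consumer.wakeup() from a
// watchdog thread so that a poll() which is making no progress -- e.g. because
// the broker keeps rejecting the SASL credentials -- is interrupted with a
// WakeupException instead of hanging forever.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.errors.WakeupException;

public final class BoundedPoll {
    private static final ScheduledExecutorService WATCHDOG =
        Executors.newSingleThreadScheduledExecutor();

    public static <K, V> ConsumerRecords<K, V> pollOrFail(Consumer<K, V> consumer,
                                                          long pollTimeoutMs,
                                                          long overallBoundMs) {
        ScheduledFuture<?> watchdog =
            WATCHDOG.schedule(consumer::wakeup, overallBoundMs, TimeUnit.MILLISECONDS);
        try {
            return consumer.poll(pollTimeoutMs);
        } catch (WakeupException e) {
            throw new IllegalStateException(
                "poll() made no progress within " + overallBoundMs + " ms - check broker credentials", e);
        } finally {
            // Note: if the watchdog fires just as poll() returns, the next poll()
            // may still throw a spurious WakeupException; a sketch, not production code.
            watchdog.cancel(false);
        }
    }
}
{code}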



> Consumer.poll() stuck in loop if wrong credentials are supplied
> ---
>
> Key: KAFKA-3899
> URL: https://issues.apache.org/jira/browse/KAFKA-3899
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0, 0.10.0.0
>Reporter: Edoardo Comar
>
> With the broker configured to use SASL PLAIN ,
> if the client is supplying wrong credentials, 
> a consumer calling poll()
> is stuck forever and only inspection of DEBUG-level logging can tell what is 
> wrong.
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
> disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-06-24 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3899:
-
Affects Version/s: 0.10.1.0

> Consumer.poll() stuck in loop if wrong credentials are supplied
> ---
>
> Key: KAFKA-3899
> URL: https://issues.apache.org/jira/browse/KAFKA-3899
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0, 0.10.0.0
>Reporter: Edoardo Comar
>
> With the broker configured to use SASL PLAIN ,
> if the client is supplying wrong credentials, 
> a consumer calling poll()
> is stuck forever and only inspection of DEBUG-level logging can tell what is 
> wrong.
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
> disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-06-24 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348172#comment-15348172
 ] 

Edoardo Comar commented on KAFKA-3899:
--

The Producer is instead better protected against these issues.

The completion callback will be called with
metadata=null 
exception=org.apache.kafka.common.errors.TimeoutException: Failed to update 
metadata after 1000 ms.

and, correspondingly, the TimeoutException will be thrown by the 
Future returned by send()
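
A minimal sketch of where both signals surface, assuming a producer whose SASL 
credentials (configured elsewhere, e.g. via JAAS) are wrong so the metadata 
update keeps failing; the broker address, topic name and 1000 ms bound are 
illustrative:

{code:java}
// Illustrative only: shows where the TimeoutException described above surfaces.
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;

public class ProducerAuthFailureExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093"); // hypothetical SASL listener
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("max.block.ms", "1000"); // bound how long send() waits for metadata

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 1) The callback is invoked with metadata == null and a TimeoutException
            producer.send(new ProducerRecord<>("test-topic", "value"),
                (RecordMetadata metadata, Exception exception) -> {
                    if (exception != null)
                        System.err.println("send failed: " + exception);
                });

            // 2) The same TimeoutException is thrown, wrapped, by the Future from send()
            try {
                producer.send(new ProducerRecord<>("test-topic", "value")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof TimeoutException)
                    System.err.println("metadata update timed out: " + e.getCause().getMessage());
            }
        }
    }
}
{code}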

> Consumer.poll() stuck in loop if wrong credentials are supplied
> ---
>
> Key: KAFKA-3899
> URL: https://issues.apache.org/jira/browse/KAFKA-3899
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Edoardo Comar
>
> With the broker configured to use SASL PLAIN ,
> if the client is supplying wrong credentials, 
> a consumer calling poll()
> is stuck forever and only inspection of DEBUG-level logging can tell what is 
> wrong.
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
> disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-06-24 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348164#comment-15348164
 ] 

Edoardo Comar commented on KAFKA-3899:
--

A less important use case that shows the same behavior is: broker configured to 
use SASL, consumer is not - 
again an infinite loop occurs, with only debug-level tracing showing the problem.
IMHO this is a less important use case, as configuration should be sorted out at 
test time, 
but the main use case - no feedback given to the client on wrong credentials - 
is a serious issue.

this is the stack trace printed in debug for a misconfigured client:
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)


> Consumer.poll() stuck in loop if wrong credentials are supplied
> ---
>
> Key: KAFKA-3899
> URL: https://issues.apache.org/jira/browse/KAFKA-3899
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Edoardo Comar
>
> With the broker configured to use SASL PLAIN ,
> if the client is supplying wrong credentials, 
> a consumer calling poll()
> is stuck forever and only inspection of DEBUG-level logging can tell what is 
> wrong.
> [2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
> disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3899) Consumer.poll() stuck in loop if wrong credentials are supplied

2016-06-24 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-3899:


 Summary: Consumer.poll() stuck in loop if wrong credentials are 
supplied
 Key: KAFKA-3899
 URL: https://issues.apache.org/jira/browse/KAFKA-3899
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.0.0
Reporter: Edoardo Comar


With the broker configured to use SASL PLAIN ,
if the client is supplying wrong credentials, 
a consumer calling poll()
is stuck forever and only inspection of DEBUG-level logging can tell what is 
wrong.

[2016-06-24 12:15:16,455] DEBUG Connection with localhost/127.0.0.1 
disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:239)
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:182)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
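
A minimal reproduction sketch, assuming a broker exposing a SASL/PLAIN listener 
and a consumer given deliberately wrong credentials; the listener address, topic, 
group and the sasl.jaas.config style (supported by newer clients, older ones use 
a JAAS file) are illustrative:

{code:java}
// Illustrative reproduction sketch: the consumer below, given wrong SASL/PLAIN
// credentials, loops forever inside poll() with no exception surfaced to the caller.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerAuthFailureExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093"); // hypothetical SASL listener
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        // Deliberately wrong credentials
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"alice\" password=\"wrong-password\";");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            // With the broker rejecting the credentials, this call never returns
            // and no error is reported unless DEBUG logging is enabled.
            consumer.poll(10_000L);
        }
    }
}
{code}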





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

