[jira] [Assigned] (KAFKA-4041) kafka unable to reconnect to zookeeper behind an ELB

2018-07-16 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-4041:
--

Assignee: Ismael Juma

> kafka unable to reconnect to zookeeper behind an ELB
> 
>
> Key: KAFKA-4041
> URL: https://issues.apache.org/jira/browse/KAFKA-4041
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1, 0.9.0.1
> Environment: RHEL EC2 instances
>Reporter: prabhakar
>Assignee: Ismael Juma
>Priority: Blocker
>
> Kafka brokers are unable to connect to ZooKeeper when it sits behind an ELB.
> Kafka uses zkClient, which caches the resolved IP address of ZooKeeper, so 
> even when the IP behind the ELB changes it keeps using the old ZooKeeper IP.
> server.properties contains a DNS name; ideally Kafka should re-resolve the IP 
> via DNS whenever it fails to connect to ZooKeeper.
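A minimal sketch of the behaviour the reporter asks for (this is not the actual zkClient code; the host name and port below are placeholders): re-resolve the ZooKeeper DNS name from server.properties on every reconnect attempt instead of reusing the address cached at startup.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ZkReconnectSketch {
    public static void main(String[] args) throws UnknownHostException {
        String zkHost = "zookeeper.example.com"; // hypothetical ELB DNS name from server.properties
        int zkPort = 2181;

        // Re-resolve on every attempt so a changed ELB IP is picked up,
        // instead of holding on to the InetAddress resolved the first time.
        for (InetAddress address : InetAddress.getAllByName(zkHost)) {
            InetSocketAddress endpoint = new InetSocketAddress(address, zkPort);
            System.out.println("Would attempt a ZooKeeper connection to " + endpoint);
        }
    }
}
{code}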



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4041) kafka unable to reconnect to zookeeper behind an ELB

2018-07-16 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4041:
---
Fix Version/s: 2.1.0

> kafka unable to reconnect to zookeeper behind an ELB
> 
>
> Key: KAFKA-4041
> URL: https://issues.apache.org/jira/browse/KAFKA-4041
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.1
> Environment: RHEL EC2 instances
>Reporter: prabhakar
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Kafka brokers are unable to connect to ZooKeeper when it sits behind an ELB.
> Kafka uses zkClient, which caches the resolved IP address of ZooKeeper, so 
> even when the IP behind the ELB changes it keeps using the old ZooKeeper IP.
> server.properties contains a DNS name; ideally Kafka should re-resolve the IP 
> via DNS whenever it fails to connect to ZooKeeper.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4041) kafka unable to reconnect to zookeeper behind an ELB

2018-07-16 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4041:
---
Fix Version/s: 2.0.1
   1.1.2

> kafka unable to reconnect to zookeeper behind an ELB
> 
>
> Key: KAFKA-4041
> URL: https://issues.apache.org/jira/browse/KAFKA-4041
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.1
> Environment: RHEL EC2 instances
>Reporter: prabhakar
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Kafka brokers are unable to connect to ZooKeeper when it sits behind an ELB.
> Kafka uses zkClient, which caches the resolved IP address of ZooKeeper, so 
> even when the IP behind the ELB changes it keeps using the old ZooKeeper IP.
> server.properties contains a DNS name; ideally Kafka should re-resolve the IP 
> via DNS whenever it fails to connect to ZooKeeper.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6784) FindCoordinatorResponse cannot be cast to FetchResponse

2018-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545934#comment-16545934
 ] 

ASF GitHub Bot commented on KAFKA-6784:
---

koqizhao closed pull request #4865: KAFKA-6784: FindCoordinatorResponse cannot 
be cast to FetchResponse
URL: https://github.com/apache/kafka/pull/4865
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
index 6d8fb6c2466..01e25196dfc 100644
--- 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
+++ 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
@@ -203,6 +203,10 @@ public int sendFetches() {
                     .addListener(new RequestFutureListener() {
                         @Override
                         public void onSuccess(ClientResponse resp) {
+                            sensors.fetchLatency.record(resp.requestLatencyMs());
+                            if (!(resp.responseBody() instanceof FetchResponse))
+                                return;
+
                             FetchResponse response = (FetchResponse) resp.responseBody();
                             FetchSessionHandler handler = sessionHandlers.get(fetchTarget.id());
                             if (handler == null) {
@@ -227,8 +231,6 @@ public void onSuccess(ClientResponse resp) {
                                 completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator,
                                         resp.requestHeader().apiVersion()));
                             }
-
-                            sensors.fetchLatency.record(resp.requestLatencyMs());
                         }

                         @Override

 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FindCoordinatorResponse cannot be cast to FetchResponse
> ---
>
> Key: KAFKA-6784
> URL: https://issues.apache.org/jira/browse/KAFKA-6784
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.1.0
>Reporter: Qiang Zhao
>Priority: Major
>  Labels: patch
> Fix For: 2.1.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> org.apache.kafka.clients.consumer.internals.Fetcher
>  
> {code:java}
> client.send(fetchTarget, request)
> .addListener(new RequestFutureListener() {
> @Override
> public void onSuccess(ClientResponse resp) {
> FetchResponse response = (FetchResponse) 
> resp.responseBody();
>  
> {code}
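The PR diff earlier in this thread guards that cast; the same idea in isolation looks like the sketch below (self-contained stand-in types, not the real Fetcher code):

{code:java}
public final class ResponseCastSketch {
    // Placeholder stand-ins for the real Kafka client response types.
    interface AbstractResponse {}
    static final class FetchResponse implements AbstractResponse {}
    static final class FindCoordinatorResponse implements AbstractResponse {}

    static void handle(AbstractResponse body) {
        // Only cast when the body really is a FetchResponse.
        if (!(body instanceof FetchResponse)) {
            return; // e.g. a FindCoordinatorResponse delivered to the fetch listener is ignored
        }
        FetchResponse response = (FetchResponse) body; // safe cast
        System.out.println("Handling fetch response: " + response);
    }

    public static void main(String[] args) {
        handle(new FindCoordinatorResponse()); // ignored instead of throwing ClassCastException
        handle(new FetchResponse());
    }
}
{code}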



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7167) old segments not deleted despite short retention.ms

2018-07-16 Thread Vahid Hashemian (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545920#comment-16545920
 ] 

Vahid Hashemian commented on KAFKA-7167:


I tried the 1.1.0 release with the config you specified (and a fresh topic), 
and was not able to reproduce the issue. Log segments are getting deleted as 
expected. 

> old segments not deleted despite short retention.ms
> ---
>
> Key: KAFKA-7167
> URL: https://issues.apache.org/jira/browse/KAFKA-7167
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 1.1.0
>Reporter: Christian Tramnitz
>Priority: Critical
>
> We have a three-node Kafka cluster running 1.1.0 with (global) 
> log.retention.ms configured to 7200000 (2h). (global) log.roll.ms=300000 is 
> set to ensure we get new segments every 5 minutes, so old ones can be cleaned 
> up granularly.
>  The topic policy has cleanup.policy = delete.
>  
> I can see some old segments being deleted, i.e.
> Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,321] INFO [Log 
> partition=test-topic-6, dir=/kafka] Deleting segment 18702312 (kafka.log.Log)
> Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO 
> Deleted log /kafka/test-topic-6/18702312.log.deleted. 
> (kafka.log.LogSegment)
> Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO 
> Deleted offset index /kafka/test-topic-6/18702312.index.deleted. 
> (kafka.log.LogSegment)
> Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO 
> Deleted time index 
> /kafka/test-topic-6/18702312.timeindex.deleted. 
> (kafka.log.LogSegment) 
> But apparently not all of them. There are still old(er!) segments in the 
> topic's dir, ultimately growing to fill up the entire disk.
>  
> Are there any other configuration values that may affect old segment deletion 
> or is this a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-5037) Infinite loop if all input topics are unknown at startup

2018-07-16 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-5037:
--

Assignee: Ted Yu

> Infinite loop if all input topics are unknown at startup
> 
>
> Key: KAFKA-5037
> URL: https://issues.apache.org/jira/browse/KAFKA-5037
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Assignee: Ted Yu
>Priority: Major
>  Labels: newbie++, user-experience
> Attachments: 5037.v2.txt, 5037.v4.txt
>
>
> See discussion: https://github.com/apache/kafka/pull/2815
> We will need some rewrite of {{StreamPartitionsAssignor}} and many more 
> tests for all kinds of corner cases, including pattern subscriptions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6923) Consolidate ExtendedSerializer/Serializer and ExtendedDeserializer/Deserializer

2018-07-16 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545886#comment-16545886
 ] 

Chia-Ping Tsai commented on KAFKA-6923:
---

my bad. :( Let us keep the discussion in the thread!

> Consolidate ExtendedSerializer/Serializer and 
> ExtendedDeserializer/Deserializer
> ---
>
> Key: KAFKA-6923
> URL: https://issues.apache.org/jira/browse/KAFKA-6923
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Ismael Juma
>Assignee: Viktor Somogyi
>Priority: Major
>  Labels: needs-kip
> Fix For: 2.1.0
>
>
> The Javadoc of ExtendedDeserializer states:
> {code}
>  * Prefer {@link Deserializer} if access to the headers is not required. Once 
> Kafka drops support for Java 7, the
>  * {@code deserialize()} method introduced by this interface will be added to 
> Deserializer with a default implementation
>  * so that backwards compatibility is maintained. This interface may be 
> deprecated once that happens.
> {code}
> Since we have dropped Java 7 support, we should figure out how to do this. 
> There are compatibility implications, so a KIP is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6460) Add mocks for state stores used in Streams unit testing

2018-07-16 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545824#comment-16545824
 ] 

Matthias J. Sax commented on KAFKA-6460:


[~shung] Yes, this is not completely resolved yet. Part of this might be 
addressed via 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-267%3A+Add+Processor+Unit+Test+Support+to+Kafka+Streams+Test+Utils]
 

[~guozhang]: should the mock-stores be internal ones? Or part of public 
test-utils package?

> Add mocks for state stores used in Streams unit testing
> ---
>
> Key: KAFKA-6460
> URL: https://issues.apache.org/jira/browse/KAFKA-6460
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie++
>
> We'd like to use mocks for the different types of state stores (key-value, 
> window, session) that can record the number of expected put / get calls made 
> in DSL operator unit testing. This involves implementing the two interfaces 
> {{StoreSupplier}} and {{StoreBuilder}} so that they return an object created 
> from, say, EasyMock, which can then be set up with the expected calls.
> In addition, we should also add a mock record collector which can be returned 
> from the mock processor context so that, with a logging-enabled store, users 
> can also validate whether the changes have been forwarded to the changelog.
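As an illustration of the approach described above (assuming EasyMock, which the description itself mentions; class and variable names here are made up for the sketch), a mocked key-value store with recorded expectations could look like:

{code:java}
import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;

import org.apache.kafka.streams.state.KeyValueStore;

public class MockStoreSketch {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // Record the calls the operator under test is expected to make on the store.
        KeyValueStore<String, Long> store = createMock(KeyValueStore.class);
        store.put("key", 1L);                    // expected write
        expect(store.get("key")).andReturn(1L);  // expected read-back
        replay(store);

        // The code under test would receive this store via a mock StoreSupplier/StoreBuilder.
        store.put("key", 1L);
        Long value = store.get("key");
        System.out.println("value = " + value);

        verify(store); // fails if the expected put/get calls did not happen
    }
}
{code}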



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7170) NPE in ConsumerGroupCommand when describe consumer group

2018-07-16 Thread Vahid Hashemian (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545817#comment-16545817
 ] 

Vahid Hashemian commented on KAFKA-7170:


Could be a duplicate of KAFKA-7044.

> NPE in ConsumerGroupCommand when describe consumer group
> 
>
> Key: KAFKA-7170
> URL: https://issues.apache.org/jira/browse/KAFKA-7170
> Project: Kafka
>  Issue Type: Bug
>Reporter: Yuanjin Xu
>Priority: Major
>
> Got the following error when running kafka-consumer-groups to describe a 
> consumer group.
> {code:java}
> Error: Executing consumer group command failed due to null
> [2018-07-16 20:05:21,353] DEBUG Exception in consumer group command 
> (kafka.admin.ConsumerGroupCommand$)
> java.lang.NullPointerException
> at scala.Predef$.Long2long(Predef.scala:363)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:612)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:610)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:610)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describePartitions(ConsumerGroupCommand.scala:328)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.collectConsumerAssignment(ConsumerGroupCommand.scala:308)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:544)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:571)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:565)
> at scala.collection.immutable.List.flatMap(List.scala:338)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:565)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:558)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectGroupOffsets(ConsumerGroupCommand.scala:558)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeGroup(ConsumerGroupCommand.scala:271)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:544)
> at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:77)
> at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7164) Follower should truncate after every leader epoch change

2018-07-16 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-7164:
--

Assignee: (was: Jason Gustafson)

> Follower should truncate after every leader epoch change
> 
>
> Key: KAFKA-7164
> URL: https://issues.apache.org/jira/browse/KAFKA-7164
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>
> Currently we skip log truncation for followers if a LeaderAndIsr request is 
> received, but the leader does not change. This can lead to log divergence if 
> the follower missed a leader change before the current known leader was 
> reelected. Basically the problem is that the leader may truncate its own log 
> prior to becoming leader again, so the follower would need to reconcile its 
> log again.
> For example, suppose that we have three replicas: r1, r2, and r3. Initially, 
> r1 is the leader in epoch 0 and writes one record at offset 0. r3 replicates 
> this successfully.
> {code}
> r1: 
>   status: leader
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> r2: 
>   status: follower
>   epoch: 0
>   log: []
> r3: 
>   status: follower
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> {code}
> Suppose then that r2 becomes leader in epoch 1. r1 notices the leader change 
> and truncates, but r3, for whatever reason, does not.
> {code}
> r1: 
>   status: follower
>   epoch: 1
>   log: []
> r2: 
>   status: leader
>   epoch: 1
>   log: []
> r3: 
>   status: follower
>   epoch: 0
>   log: [{offset: 0, epoch:0}]
> {code}
> Now suppose that r2 fails and r1 becomes the leader in epoch 2. Immediately 
> it writes a new record:
> {code}
> r1: 
>   status: leader
>   epoch: 2
>   log: [{id: 1, offset: 0, epoch:2}]
> r2: 
>   status: follower
>   epoch: 2
>   log: []
> r3: 
>   status: follower
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> {code}
> If the replica continues fetching with the old epoch, we can have log 
> divergence as noted in KAFKA-6880. However, if r3 successfully receives the 
> new LeaderAndIsr request which updates the epoch to 2, but skips the 
> truncation, then the logs will stay inconsistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7170) NPE in ConsumerGroupCommand when describe consumer group

2018-07-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545748#comment-16545748
 ] 

Ted Yu commented on KAFKA-7170:
---

Can you provide a bit more information: the Kafka release, etc.?

If you can describe the sequence of events leading to the NPE, that would make 
it easier to investigate.

> NPE in ConsumerGroupCommand when describe consumer group
> 
>
> Key: KAFKA-7170
> URL: https://issues.apache.org/jira/browse/KAFKA-7170
> Project: Kafka
>  Issue Type: Bug
>Reporter: Yuanjin Xu
>Priority: Major
>
> Got the following error when running kafka-consumer-groups to describe a 
> consumer group.
> {code:java}
> Error: Executing consumer group command failed due to null
> [2018-07-16 20:05:21,353] DEBUG Exception in consumer group command 
> (kafka.admin.ConsumerGroupCommand$)
> java.lang.NullPointerException
> at scala.Predef$.Long2long(Predef.scala:363)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:612)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:610)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:610)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describePartitions(ConsumerGroupCommand.scala:328)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.collectConsumerAssignment(ConsumerGroupCommand.scala:308)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:544)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:571)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:565)
> at scala.collection.immutable.List.flatMap(List.scala:338)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:565)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:558)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectGroupOffsets(ConsumerGroupCommand.scala:558)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeGroup(ConsumerGroupCommand.scala:271)
> at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:544)
> at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:77)
> at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7170) NPE in ConsumerGroupCommand when describe consumer group

2018-07-16 Thread Yuanjin Xu (JIRA)
Yuanjin Xu created KAFKA-7170:
-

 Summary: NPE in ConsumerGroupCommand when describe consumer group
 Key: KAFKA-7170
 URL: https://issues.apache.org/jira/browse/KAFKA-7170
 Project: Kafka
  Issue Type: Bug
Reporter: Yuanjin Xu


Got the following error when running kafka-consumer-groups to describe a 
consumer group.
{code:java}
Error: Executing consumer group command failed due to null
[2018-07-16 20:05:21,353] DEBUG Exception in consumer group command 
(kafka.admin.ConsumerGroupCommand$)
java.lang.NullPointerException
at scala.Predef$.Long2long(Predef.scala:363)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:612)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$getLogEndOffsets$2.apply(ConsumerGroupCommand.scala:610)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:610)
at 
kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describePartitions(ConsumerGroupCommand.scala:328)
at 
kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.collectConsumerAssignment(ConsumerGroupCommand.scala:308)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectConsumerAssignment(ConsumerGroupCommand.scala:544)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:571)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10$$anonfun$13.apply(ConsumerGroupCommand.scala:565)
at scala.collection.immutable.List.flatMap(List.scala:338)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:565)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService$$anonfun$10.apply(ConsumerGroupCommand.scala:558)
at scala.Option.map(Option.scala:146)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.collectGroupOffsets(ConsumerGroupCommand.scala:558)
at 
kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeGroup(ConsumerGroupCommand.scala:271)
at 
kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:544)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:77)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5037) Infinite loop if all input topics are unknown at startup

2018-07-16 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545735#comment-16545735
 ] 

Guozhang Wang commented on KAFKA-5037:
--

Since {{StreamThread}} is an internal class, adding any public functions on 
that class should not require a KIP.

> Infinite loop if all input topics are unknown at startup
> 
>
> Key: KAFKA-5037
> URL: https://issues.apache.org/jira/browse/KAFKA-5037
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: newbie++, user-experience
> Attachments: 5037.v2.txt, 5037.v4.txt
>
>
> See discussion: https://github.com/apache/kafka/pull/2815
> We will need some rewrite of {{StreamPartitionsAssignor}} and many more 
> tests for all kinds of corner cases, including pattern subscriptions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7169) Add support for Custom SASL extensions in OAuth authentication

2018-07-16 Thread Stanislav Kozlovski (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski updated KAFKA-7169:
---
Description: 
KIP: 
[here|https://cwiki.apache.org/confluence/display/KAFKA/KIP-342%3A+Add+support+for+Custom+SASL+extensions+in+OAuth+authentication]

(WIP...)

  was:KIP: here


> Add support for Custom SASL extensions in OAuth authentication
> --
>
> Key: KAFKA-7169
> URL: https://issues.apache.org/jira/browse/KAFKA-7169
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> KIP: 
> [here|https://cwiki.apache.org/confluence/display/KAFKA/KIP-342%3A+Add+support+for+Custom+SASL+extensions+in+OAuth+authentication]
> (WIP...)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7169) Add support for Custom SASL extensions in OAuth authentication

2018-07-16 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7169:
--

 Summary: Add support for Custom SASL extensions in OAuth 
authentication
 Key: KAFKA-7169
 URL: https://issues.apache.org/jira/browse/KAFKA-7169
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


KIP: here



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5037) Infinite loop if all input topics are unknown at startup

2018-07-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545477#comment-16545477
 ] 

Ted Yu commented on KAFKA-5037:
---

In KafkaStreams, I am adding the following method in order to detect state 
transitions of StreamThread:
{code}
/**
 * An app can set a single {@link StreamThread.StateListener} so that the 
app is notified when state changes.
 *
 * @param listener a new StreamThread state listener
 * @throws IllegalStateException if this {@code KafkaStreams} instance is 
not in state {@link State#CREATED CREATED}.
 */
public void setStreamThreadStateListener(StreamThread.StateListener 
listener) {
{code}
Let me know whether a KIP is needed.
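For comparison, the application-level hook that already exists on the public API looks like the sketch below (KafkaStreams.setStateListener is existing public API; the topology and configs are placeholders). The proposal above would add an analogous hook at the StreamThread level.

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StateListenerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "state-listener-sketch"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder bootstrap

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                        // placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Existing public API: notified on application-level state transitions.
        streams.setStateListener((newState, oldState) ->
                System.out.println("KafkaStreams state: " + oldState + " -> " + newState));

        streams.start();
    }
}
{code}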

> Infinite loop if all input topics are unknown at startup
> 
>
> Key: KAFKA-5037
> URL: https://issues.apache.org/jira/browse/KAFKA-5037
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: newbie++, user-experience
> Attachments: 5037.v2.txt, 5037.v4.txt
>
>
> See discussion: https://github.com/apache/kafka/pull/2815
> We will need some rewrite of {{StreamPartitionsAssignor}} and many more 
> tests for all kinds of corner cases, including pattern subscriptions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7168) Broker shutdown during SSL handshake may be handled as handshake failure

2018-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545438#comment-16545438
 ] 

ASF GitHub Bot commented on KAFKA-7168:
---

rajinisivaram opened a new pull request #5371: KAFKA-7168: Treat connection 
close during SSL handshake as retriable
URL: https://github.com/apache/kafka/pull/5371
 
 
   When the broker closes the connection, the SSL `close_notify` is processed as 
an `SSLException` while unwrapping the final message, before the I/O exception 
due to the remote close is processed. It should instead be handled as a 
retriable `IOException` rather than a non-retriable `SslAuthenticationException`.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Broker shutdown during SSL handshake may be handled as handshake failure
> 
>
> Key: KAFKA-7168
> URL: https://issues.apache.org/jira/browse/KAFKA-7168
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.2, 1.1.1, 2.0.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
>
> If a broker is shut down while the SSL handshake of a client connection is in 
> progress, the client may process the resulting SSLException as a 
> non-retriable handshake failure rather than a retriable I/O exception. This 
> can cause Streams applications to fail during rolling restarts.
> Exception stack trace:
> {quote}
> org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake 
> failed
> Caused by: javax.net.ssl.SSLException: Received close_notify during handshake
>     at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
>     at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1639)
>     at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1607)
>     at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1752)
>     at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1068)
>     at 
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:890)
>     at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:764)
>     at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>     at 
> org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:465)
>     at 
> org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:266)
>     at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:88)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:474)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:258)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:230)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:206)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:219)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:284)
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:848)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:805)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:771)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:741)
> {quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7168) Broker shutdown during SSL handshake may be handled as handshake failure

2018-07-16 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-7168:
-

 Summary: Broker shutdown during SSL handshake may be handled as 
handshake failure
 Key: KAFKA-7168
 URL: https://issues.apache.org/jira/browse/KAFKA-7168
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.2, 1.1.1, 2.0.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


If a broker is shut down while the SSL handshake of a client connection is in 
progress, the client may process the resulting SSLException as a non-retriable 
handshake failure rather than a retriable I/O exception. This can cause Streams 
applications to fail during rolling restarts.

Exception stack trace:

{quote}
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLException: Received close_notify during handshake
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1639)
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1607)
    at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1752)
    at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1068)
    at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:890)
    at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:764)
    at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
    at 
org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:465)
    at 
org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:266)
    at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:88)
    at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:474)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
    at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:258)
    at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:230)
    at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:206)
    at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:219)
    at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
    at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:284)
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:)
    at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:848)
    at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:805)
    at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:771)
    at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:741)
{quote}
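From an application's point of view, the distinction matters because SslAuthenticationException is fatal while retriable errors can simply be retried. A hedged sketch of that handling around poll() follows (topic, group and bootstrap values are placeholders, and whether a given retriable error actually escapes poll depends on the client internals):

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.errors.SslAuthenticationException;

public class RetriableVsAuthSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093"); // placeholder SSL listener
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sketch-group");            // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SSL");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sketch-topic"));    // placeholder topic
            while (true) {
                try {
                    consumer.poll(Duration.ofSeconds(1));
                } catch (SslAuthenticationException fatal) {
                    throw fatal; // handshake failure: non-retriable, give up
                } catch (RetriableException transientError) {
                    // transient broker/network problem: just poll again
                }
            }
        }
    }
}
{code}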
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6882) Wrong producer settings may lead to DoS on Kafka Server

2018-07-16 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki updated KAFKA-6882:
--
Description: 
The documentation of the parameters “linger.ms” and “batch.size” is a bit 
confusing. In fact, those parameters, wrongly set on the producer side, can 
completely destroy BROKER throughput.

I see that smart developers read the documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have a situation where each and every message is sent separately. The 
TCP/IP stack is really busy in that case, and although they needed high 
throughput they get much less of it, because every message goes out on its own, 
making the network and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages once in a 
while (e.g. one message per minute), not when throughput matters and thousands 
of messages are sent per second.

The situation is even worse when smart developers read that, for safety, they 
need acknowledgements from all cluster nodes. So they add:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance!

Even worse, it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy acknowledging *each 
and every* message from all nodes that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve the documentation of these parameters,
and to cover the corner cases by providing detailed information on how extreme 
values of the parameters - namely the lowest and highest - may influence the 
work of the cluster.
That was the documentation issue.

On the other hand it is a security/safety matter.

Technically the problem is that the __commit_offsets topic is loaded with an 
enormous amount of messages. This leads to a situation where the Kafka broker 
is exposed to *DoS* due to the producer settings. Three lines of code, a bit of 
load, and the Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it requires some logic to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?
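For contrast with the snippet above, a hedged sketch of batching-friendly settings (the values are only illustrative: batch.size stays at its 16 KB default and linger.ms gets a few milliseconds so batches can fill; bootstrap server and topic are placeholders):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties kafkaProps = new Properties();
        kafkaProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        kafkaProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Let records accumulate into batches instead of one round trip per record.
        kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 5);      // wait a few ms for a batch to fill
        kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // keep the default 16 KB batch size

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("sketch-topic", "key-" + i, "value-" + i));
            }
        }
    }
}
{code}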


  was:
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading, that for safety, 
they need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safety matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?




[jira] [Updated] (KAFKA-6882) Wrong producer settings may lead to DoS on Kafka Server

2018-07-16 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki updated KAFKA-6882:
--
Description: 
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading, that for safety, 
they need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safety matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?


  was:
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading, that for safety, 
they need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safetly matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?




[jira] [Updated] (KAFKA-6882) Wrong producer settings may lead to DoS on Kafka Server

2018-07-16 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki updated KAFKA-6882:
--
Description: 
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading, that for safety, 
they need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safetly matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?


  was:
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading that for safety they 
need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safetly matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?




[jira] [Updated] (KAFKA-6882) Wrong producer settings may lead to DoS on Kafka Server

2018-07-16 Thread Seweryn Habdank-Wojewodzki (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki updated KAFKA-6882:
--
Description: 
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading that for safety they 
need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safetly matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?


  was:
The documentation of the following parameters “linger.ms” and “batch.size” is a 
bit confusing. In fact those parameters wrongly set on the producer side might 
completely destroy BROKER throughput.

I see, that smart developers they are reading documentation of those parameters.
Then they want to have super performance and super safety, so they set 
something like this below:

{code}
kafkaProps.put(ProducerConfig.LINGER_MS_CONFIG, 1);
kafkaProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 0);
{code}

Then we have situation, when each and every message is send separately. TCP/IP 
protocol is really busy in that case and when they needed high throughput they 
got much less throughput, as every message is goes separately causing all 
network communication and TCP/IP overhead significant.

Those settings are good only if someone sends critical messages like once a 
while (e.g. one message per minute) and not when throughput is important by 
sending thousands messages per second.

Situation is even worse when smart developers are reading that for safety they 
need acknowledges from all cluster nodes. So they are adding:

{code}
kafkaProps.put(ProducerConfig.ACKS_CONFIG, "all");
{code}

And this is the end of Kafka performance! 

Even worse it is not a problem for the Kafka producer. The problem remains at 
the server (cluster, broker) side. The server is so busy by acknowledging *each 
and every* message from all nodes, that other work is NOT performed, so the end 
to end performance is almost none.

I would like to ask you to improve documentation of this parameters.
And consider corner cases is case of providing detailed information how extreme 
values of parameters - namely lowest and highest – may influence work of the 
cluster.
This was documentation issue. 

On the other hand it is security/safetly matter.

Technically the problem is that __commit_offsets topic is loaded with enormous 
amount of messages. It leads to the situation, when Kafka Broker is exposed to 
*DoS *due to the Producer settings. Three lines of code a bit load and the 
Kafka cluster is dead.
I suppose there are ways to prevent such a situation on the cluster side, but 
it require some loginc to be implemented to detect such a simple but efficient 
DoS.

BTW. Do Kafka Admin Tools provide any kind of "kill" connection, when one or 
the other producer makes problems?




[jira] [Created] (KAFKA-7167) old segments not deleted despite short retention.ms

2018-07-16 Thread Christian Tramnitz (JIRA)
Christian Tramnitz created KAFKA-7167:
-

 Summary: old segments not deleted despite short retention.ms
 Key: KAFKA-7167
 URL: https://issues.apache.org/jira/browse/KAFKA-7167
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 1.1.0
Reporter: Christian Tramnitz


We have a three-node Kafka cluster running 1.1.0 with (global) log.retention.ms 
configured to 7200000 (2h). (global) log.roll.ms=300000 is set to ensure we 
get new segments every 5 minutes, so old ones can be cleaned up granularly.

 The topic policy has cleanup.policy = delete.

 

I can see some old segments being deleted, i.e.

Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,321] INFO [Log 
partition=test-topic-6, dir=/kafka] Deleting segment 18702312 (kafka.log.Log)
Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO Deleted 
log /kafka/test-topic-6/18702312.log.deleted. (kafka.log.LogSegment)
Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO Deleted 
offset index /kafka/test-topic-6/18702312.index.deleted. 
(kafka.log.LogSegment)
Jul 16 08:59:17 server460 kafka[26553]: [2018-07-16 08:59:17,329] INFO Deleted 
time index /kafka/test-topic-6/18702312.timeindex.deleted. 
(kafka.log.LogSegment) 

But apparently not all of them. There are still old(er!) segments in the 
topic's dir, ultimately growing to fill up the entire disk.

 

Are there any other configuration values that may affect old segment deletion 
or is this a bug?
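One thing worth ruling out is a topic-level override: retention.ms, retention.bytes and segment.ms set on the topic take precedence over the global log.* defaults. A hedged sketch for dumping the effective topic configuration with the AdminClient (bootstrap server and topic name are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class TopicConfigSketch {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test-topic"); // placeholder
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            // Print every effective setting and whether it is a topic override or a broker default.
            config.entries().forEach(entry ->
                    System.out.println(entry.name() + " = " + entry.value()
                            + " (source: " + entry.source() + ")"));
        }
    }
}
{code}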



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6923) Consolidate ExtendedSerializer/Serializer and ExtendedDeserializer/Deserializer

2018-07-16 Thread Viktor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545156#comment-16545156
 ] 

Viktor Somogyi commented on KAFKA-6923:
---

[~chia7712] thanks for the idea. Would you please also add it to the KIP 
discussion thread to see what the community is thinking about this? (this is 
the KIP: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87298242 ).

Personally I'd prefer not to deprecate the Serializer class. All of the 
serializer/deserializer implementations in the project (Short, Int, String, 
Byte, etc.) use the two-parameter method, and I think it would be good practice 
to keep the telescoping methods, given that Java doesn't really have any other 
way to represent an optional parameter, which is what "headers" would be.
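The consolidation the quoted Javadoc hints at would look roughly like the sketch below (general shape only, with a renamed interface so as not to claim this is the final API; the default method simply delegates, so existing two-parameter implementations keep working):

{code:java}
import org.apache.kafka.common.header.Headers;

// Sketch: with Java 8 default methods, the header-aware overload can live on the
// base interface itself, so a separate Extended* interface is no longer needed.
public interface SketchDeserializer<T> {

    // Existing two-parameter method that all current implementations provide.
    T deserialize(String topic, byte[] data);

    // Header-aware variant; the default implementation preserves backwards
    // compatibility by delegating to the two-parameter method.
    default T deserialize(String topic, Headers headers, byte[] data) {
        return deserialize(topic, data);
    }
}
{code}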

> Consolidate ExtendedSerializer/Serializer and 
> ExtendedDeserializer/Deserializer
> ---
>
> Key: KAFKA-6923
> URL: https://issues.apache.org/jira/browse/KAFKA-6923
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Ismael Juma
>Assignee: Viktor Somogyi
>Priority: Major
>  Labels: needs-kip
> Fix For: 2.1.0
>
>
> The Javadoc of ExtendedDeserializer states:
> {code}
>  * Prefer {@link Deserializer} if access to the headers is not required. Once 
> Kafka drops support for Java 7, the
>  * {@code deserialize()} method introduced by this interface will be added to 
> Deserializer with a default implementation
>  * so that backwards compatibility is maintained. This interface may be 
> deprecated once that happens.
> {code}
> Since we have dropped Java 7 support, we should figure out how to do this. 
> There are compatibility implications, so a KIP is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7166) Links are really difficult to see - insufficient contrast

2018-07-16 Thread Sebb (JIRA)
Sebb created KAFKA-7166:
---

 Summary: Links are really difficult to see - insufficient contrast
 Key: KAFKA-7166
 URL: https://issues.apache.org/jira/browse/KAFKA-7166
 Project: Kafka
  Issue Type: Bug
Reporter: Sebb


{color:#0b6d88}The link colour on the pages is very similar to the text 
colour.{color}

This works OK where the links form part of a menu system.
 In that case it's obvious that the text items must be links.

However it does not provide sufficient contrast where the links are part of 
body text.
 This is particularly true when a whole section of text is a link, so there is 
no contrast within the text.

Users should not have to search to find the links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3530) Making the broker-list option consistent across all tools

2018-07-16 Thread williamguan (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

williamguan updated KAFKA-3530:
---
Affects Version/s: 0.10.0.0
   0.10.1.1
   0.10.2.1

> Making the broker-list option consistent across all tools
> -
>
> Key: KAFKA-3530
> URL: https://issues.apache.org/jira/browse/KAFKA-3530
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.1.1, 0.10.2.1
>Reporter: Jun Rao
>Assignee: williamguan
>Priority: Major
>
> Currently, console-producer uses --broker-list and console-consumer uses 
> --bootstrap-server. This can be confusing to users. We should standardize on 
> one option name for the broker list across all tools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-3530) Making the broker-list option consistent across all tools

2018-07-16 Thread williamguan (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

williamguan reassigned KAFKA-3530:
--

Assignee: williamguan  (was: Liquan Pei)

> Making the broker-list option consistent across all tools
> -
>
> Key: KAFKA-3530
> URL: https://issues.apache.org/jira/browse/KAFKA-3530
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Jun Rao
>Assignee: williamguan
>Priority: Major
>
> Currently, console-producer uses --broker-list and console-consumer uses 
> --bootstrap-server. This can be confusing to users. We should standardize on 
> one option name for the broker list across all tools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7160) Add check for group ID length

2018-07-16 Thread Aleksandar Bircakovic (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544966#comment-16544966
 ] 

Aleksandar Bircakovic commented on KAFKA-7160:
--

Hello, what should be the acceptable group ID length?

> Add check for group ID length
> -
>
> Key: KAFKA-7160
> URL: https://issues.apache.org/jira/browse/KAFKA-7160
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Shaobo Liu
>Priority: Minor
>  Labels: newbie
>
> We should limit the length of the group ID, because other systems (such as 
> monitoring systems) also use the group ID when we run Kafka in a production 
> environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)