[jira] [Commented] (KAFKA-7888) kafka cluster not recovering - Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition) continuously

2019-02-01 Thread Kemal ERDEN (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758105#comment-16758105
 ] 

Kemal ERDEN commented on KAFKA-7888:


The same thing happened last night with the increased heaps. We are not seeing
any long GC pauses (~0.02 seconds) in the logs.

Once we see this error message on the producer:

{{[kafka-producer-network-thread | producer-1] WARN  
o.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] 3 
partitions have leader brokers without a matching listener, including ...}}

we start seeing messages to those partitions expiring:

{{org.apache.kafka.common.errors.TimeoutException: Expiring 44 record(s) for 
ucTrade-2: 600038 ms has passed since batch creation plus linger time}}

> kafka cluster not recovering - Shrinking ISR from 14,13 to 13 
> (kafka.cluster.Partition) continuously
> ---
>
> Key: KAFKA-7888
> URL: https://issues.apache.org/jira/browse/KAFKA-7888
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, replication, zkclient
>Affects Versions: 2.1.0
> Environment: using kafka_2.12-2.1.0
> 3 ZK / 3 broker cluster using 3 boxes (1 ZK and 1 broker on each box), 
> default.replication.factor: 2, 
> offset replication factor was 1 when the error happened, increased to 2 after 
> seeing this error by reassigning partitions.
> compression: default (producer) on the brokers, but producers send gzip.
> Linux (RedHat), ext4, Kafka logs on a single local disk
>Reporter: Kemal ERDEN
>Priority: Major
> Attachments: combined.log, producer.log
>
>
> We're seeing the following repeating logs on our Kafka cluster from time to 
> time, which seem to cause messages expiring on producers and the cluster 
> going into a non-recoverable state. The only fix seems to be to restart the 
> brokers.
>  {{Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)}}
>  {{Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)}}
>  and later on the following log is repeated:
> {{Got user-level KeeperException when processing sessionid:0xe046aa4f8e6 
> type:setData cxid:0x2df zxid:0xa01fd txntype:-1 reqpath:n/a Error 
> Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = 
> BadVersion for /brokers/topics/ucTrade/partitions/6/state}}
> We haven't interfered with any of the brokers/zookeepers whilst this happened.
> I've attached a combined log which represents a combination of controller, 
> server and state change logs from each broker (ids 13,14 and 15, log files 
> have the suffix b13, b14, b15 respectively)
> We have increased the heaps from 1g to 6g for the brokers and from 512m to 4g 
> for the zookeepers since this happened, but we're not sure if it is relevant. The ZK 
> logs are unfortunately overwritten, so we can't provide those.
> We produce varying message sizes; some messages are relatively large (6 MB), 
> but we use compression on the producers (set to gzip).
> I've attached some logs from one of our producers as well.
>  





[jira] [Updated] (KAFKA-7888) kafka cluster not recovering - Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition) continuously

2019-02-01 Thread Kemal ERDEN (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kemal ERDEN updated KAFKA-7888:
---
Description: 
We're seeing the following repeating logs on our Kafka cluster from time to 
time, which seem to cause messages expiring on producers and the cluster going 
into a non-recoverable state. The only fix seems to be to restart the brokers.

{{Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)}}
 {{Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)}}

 and later on the following log is repeated:

{{Got user-level KeeperException when processing sessionid:0xe046aa4f8e6 
type:setData cxid:0x2df zxid:0xa01fd txntype:-1 reqpath:n/a Error 
Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = 
BadVersion for /brokers/topics/ucTrade/partitions/6/state}}

We haven't interfered with any of the brokers/zookeepers whilst this happened.

I've attached a combined log which represents a combination of controller, 
server and state change logs from each broker (ids 13,14 and 15, log files have 
the suffix b13, b14, b15 respectively)

We have increased the heaps from 1g to 6g for the brokers and from 512m to 4g 
for the zookeepers since this happened, but we're not sure if it is relevant. The ZK 
logs are unfortunately overwritten, so we can't provide those.

We produce varying message sizes; some messages are relatively large (6 MB), 
but we use compression on the producers (set to gzip).

I've attached some logs from one of our producers as well.

producer.properties that we've changed:

spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.compression-type=gzip
spring.kafka.producer.retries=5
spring.kafka.producer.acks=-1
spring.kafka.producer.batch-size=1048576

spring.kafka.producer.properties.linger.ms=200
spring.kafka.producer.properties.request.timeout.ms=60
spring.kafka.producer.properties.max.block.ms=24
spring.kafka.producer.properties.max.request.size=104857600
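
A rough sketch of what these Spring Kafka settings map to on a plain KafkaProducer; the bootstrap server and the value serializer below are placeholder assumptions, everything else mirrors the values above:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");                        // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName()); // placeholder
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        props.put(ProducerConfig.RETRIES_CONFIG, 5);
        props.put(ProducerConfig.ACKS_CONFIG, "all");                  // acks=-1
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1048576);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 200);
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 60);       // value as reported above
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 24);             // value as reported above
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 104857600);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
{code}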

 

 

  was:
We're seeing the following repeating logs on our Kafka cluster from time to 
time, which seem to cause messages expiring on producers and the cluster going 
into a non-recoverable state. The only fix seems to be to restart the brokers.


 {{Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)}}
 {{Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)}}

 and later on the following log is repeated:

{{Got user-level KeeperException when processing sessionid:0xe046aa4f8e6 
type:setData cxid:0x2df zxid:0xa01fd txntype:-1 reqpath:n/a Error 
Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = 
BadVersion for /brokers/topics/ucTrade/partitions/6/state}}

We haven't interfered with any of the brokers/zookeepers whilst this happened.

I've attached a combined log which represents a combination of controller, 
server and state change logs from each broker (ids 13,14 and 15, log files have 
the suffix b13, b14, b15 respectively)

We have increased the heaps from 1g to 6g for the brokers and from 512m to 4g 
for the zookeepers since this happened, but we're not sure if it is relevant. The ZK 
logs are unfortunately overwritten, so we can't provide those.

We produce varying message sizes; some messages are relatively large (6 MB), 
but we use compression on the producers (set to gzip).

I've attached some logs from one of our producers as well.

 


> kafka cluster not recovering - Shrinking ISR from 14,13 to 13 
> (kafka.cluster.Partition) continuously
> ---
>
> Key: KAFKA-7888
> URL: https://issues.apache.org/jira/browse/KAFKA-7888
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, replication, zkclient
>Affects Versions: 2.1.0
> Environment: using kafka_2.12-2.1.0
> 3 ZK / 3 broker cluster using 3 boxes (1 ZK and 1 broker on each box), 
> default.replication.factor: 2, 
> offset replication factor was 1 when the error happened, increased to 2 after 
> seeing this error by reassigning partitions.
> compression: default (producer) on the brokers, but producers send gzip.
> Linux (RedHat), ext4, Kafka logs on a single local disk
>Reporter: Kemal ERDEN
>Priority: Major
> Attachments: combined.log, producer.log
>
>
> We're seeing the following repeating logs on our Kafka cluster from time to 
> time, which seem to cause messages expiring on producers and the cluster 
> going into a non-recoverable state. The only fix seems to be to restart the 
> brokers.
> {{Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)}}
>  {{Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)}}
>  and later on the following lo

[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2019-02-01 Thread Kemal ERDEN (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758146#comment-16758146
 ] 

Kemal ERDEN commented on KAFKA-3042:


We've hit this on 2.1.0. I raised a new Jira, KAFKA-7888, which has merged 
broker logs attached and some more detail. Restarting the broker that logs the 
zkVersion issue repeatedly seems to fix the problem, but in the meantime we 
have lost the expired messages.

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.10.0.0
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
>Assignee: Dong Lin
>Priority: Major
>  Labels: reliability
> Fix For: 2.2.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it's 
> a follower.
> So after several failed tries, it needs to find out who the leader is.





[jira] [Commented] (KAFKA-7719) Improve fairness in SocketServer processors

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758327#comment-16758327
 ] 

ASF GitHub Bot commented on KAFKA-7719:
---

ijuma commented on pull request #6022: KAFKA-7719: Improve fairness in 
SocketServer processors (KIP-402)
URL: https://github.com/apache/kafka/pull/6022
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve fairness in SocketServer processors
> ---
>
> Key: KAFKA-7719
> URL: https://issues.apache.org/jira/browse/KAFKA-7719
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Affects Versions: 2.1.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.2.0
>
>
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors
>  for details.





[jira] [Commented] (KAFKA-7401) Broker fails to start when recovering a segment from before the log start offset

2019-02-01 Thread Anna Povzner (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758579#comment-16758579
 ] 

Anna Povzner commented on KAFKA-7401:
-

I verified that the issue was fixed in 2.0/2.1 as part of this PR: 
[https://github.com/apache/kafka/pull/5254]. I will submit the fix for 1.1, and 
will add a unit test to trunk that verifies this behavior.

> Broker fails to start when recovering a segment from before the log start 
> offset
> 
>
> Key: KAFKA-7401
> URL: https://issues.apache.org/jira/browse/KAFKA-7401
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bob Barrett
>Assignee: Anna Povzner
>Priority: Major
>
> If a segment needs to be recovered (for example, because of a missing index 
> file or uncompleted swap operation) and its base offset is less than the log 
> start offset, the broker will crash with the following error:
> Fatal error during KafkaServer startup. Prepare to shutdown 
> (kafka.server.KafkaServer)
>  java.lang.IllegalArgumentException: inconsistent range
>  at java.util.concurrent.ConcurrentSkipListMap$SubMap.<init>(Unknown Source)
>  at java.util.concurrent.ConcurrentSkipListMap.subMap(Unknown Source)
>  at java.util.concurrent.ConcurrentSkipListMap.subMap(Unknown Source)
>  at kafka.log.Log$$anonfun$12.apply(Log.scala:1579)
>  at kafka.log.Log$$anonfun$12.apply(Log.scala:1578)
>  at scala.Option.map(Option.scala:146)
>  at kafka.log.Log.logSegments(Log.scala:1578)
>  at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:358)
>  at kafka.log.Log$$anonfun$completeSwapOperations$1.apply(Log.scala:389)
>  at kafka.log.Log$$anonfun$completeSwapOperations$1.apply(Log.scala:380)
>  at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
>  at kafka.log.Log.completeSwapOperations(Log.scala:380)
>  at kafka.log.Log.loadSegments(Log.scala:408)
>  at kafka.log.Log.<init>(Log.scala:216)
>  at kafka.log.Log$.apply(Log.scala:1765)
>  at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:260)
>  at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$11$$anonfun$apply$15$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:340)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>  at java.util.concurrent.FutureTask.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> Since these segments are outside the log range, we should delete them, or at 
> least not block broker startup because of them.





[jira] [Commented] (KAFKA-7828) Execute Trogdor tasks with external commands

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758599#comment-16758599
 ] 

ASF GitHub Bot commented on KAFKA-7828:
---

cmccabe commented on pull request #6219: KAFKA-7828: Add ExternalCommandWorker
URL: https://github.com/apache/kafka/pull/6219
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Execute Trogdor tasks with external commands
> 
>
> Key: KAFKA-7828
> URL: https://issues.apache.org/jira/browse/KAFKA-7828
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Xi Yang
>Priority: Major
>






[jira] [Updated] (KAFKA-7885) Streams: TopologyDescription violates equals-hashCode contract.

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7885:
---
Priority: Minor  (was: Major)

> Streams: TopologyDescription violates equals-hashCode contract.
> ---
>
> Key: KAFKA-7885
> URL: https://issues.apache.org/jira/browse/KAFKA-7885
> Project: Kafka
>  Issue Type: Bug
>Reporter: Piotr Fras
>Priority: Minor
>
> As per JavaSE documentation:
> > If two objects are *equal* according to the *equals*(Object) method, then 
> >calling the *hashCode* method on each of the two objects must produce the 
> >same integer result.
>  
> This is not the case for TopologyDescription.
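
For context, the contract being referenced: whenever equals() says two objects are equal, hashCode() must return the same value for both. A minimal illustrative sketch (not the actual TopologyDescription code), where both methods derive from the same fields so the contract holds:

{code:java}
import java.util.Objects;

public final class NodeDescription {
    private final String name;
    private final int id;

    public NodeDescription(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeDescription)) return false;
        NodeDescription other = (NodeDescription) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);   // same fields as equals()
    }
}
{code}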





[jira] [Commented] (KAFKA-7888) kafka cluster not recovering - Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition) continuously

2019-02-01 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758638#comment-16758638
 ] 

Jun Rao commented on KAFKA-7888:


[~kemalerden], from the log, the "Cached zkVersion" log started around 23:53:35.

 
{code:java}
server_b13.log.2019-01-26-22:[2019-01-26 23:53:35,041] INFO [Partition 
ucTrade-6 broker=13] Cached zkVersion [21] not equal to that in zookeeper, skip 
updating ISR (kafka.cluster.Partition) 
{code}
 

The controller log showed that broker 13 was never able to re-register itself 
in ZK after 23:53:16.

 
{code:java}
controller_b14.log.2019-01-26-23:[2019-01-26 23:53:16,267] INFO [Controller 
id=14] Newly added brokers: , deleted brokers: 13, all live brokers: 14,15 
(kafka.controller.KafkaController)
controller_b14.log.2019-01-26-23:[2019-01-26 23:53:42,281] INFO [Controller 
id=14] Newly added brokers: , deleted brokers: 15, all live brokers: 14 
(kafka.controller.KafkaController)
controller_b14.log.2019-01-26-23:[2019-01-26 23:53:46,809] INFO [Controller 
id=14] Newly added brokers: 15, deleted brokers: , all live brokers: 14,15 
(kafka.controller.KafkaController)
{code}
 

From broker 13's log, it failed to re-register itself in ZK around 23:53:11.

 
{code:java}
server_b13.log.2019-01-26-22:[2019-01-26 23:53:11,841] ERROR Error while 
creating ephemeral at /brokers/ids/13, node already exists and owner 
'937991457960493056' does not match current session '1010049473220837376' 
(kafka.zk.KafkaZkClient$CheckedEphemeral)
server_b13.log.2019-01-26-22:[2019-01-26 23:53:11,841] INFO Result of znode 
creation at /brokers/ids/13 is: NODEEXISTS (kafka.zk.KafkaZkClient)
{code}
 

We recently fixed KAFKA-7165 which could lead to the above. Perhaps you could 
try 2.2.0 when it's released.

 

 

> kafka cluster not recovering - Shrinking ISR from 14,13 to 13 
> (kafka.cluster.Partition) continuously
> ---
>
> Key: KAFKA-7888
> URL: https://issues.apache.org/jira/browse/KAFKA-7888
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, replication, zkclient
>Affects Versions: 2.1.0
> Environment: using kafka_2.12-2.1.0
> 3 ZK / 3 broker cluster using 3 boxes (1 ZK and 1 broker on each box), 
> default.replication.factor: 2, 
> offset replication factor was 1 when the error happened, increased to 2 after 
> seeing this error by reassigning partitions.
> compression: default (producer) on the brokers, but producers send gzip.
> Linux (RedHat), ext4, Kafka logs on a single local disk
>Reporter: Kemal ERDEN
>Priority: Major
> Attachments: combined.log, producer.log
>
>
> We're seeing the following repeating logs on our Kafka cluster from time to 
> time, which seem to cause messages expiring on producers and the cluster 
> going into a non-recoverable state. The only fix seems to be to restart the 
> brokers.
> {{Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)}}
>  {{Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)}}
>  and later on the following log is repeated:
> {{Got user-level KeeperException when processing sessionid:0xe046aa4f8e6 
> type:setData cxid:0x2df zxid:0xa01fd txntype:-1 reqpath:n/a Error 
> Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = 
> BadVersion for /brokers/topics/ucTrade/partitions/6/state}}
> We haven't interfered with any of the brokers/zookeepers whilst this happened.
> I've attached a combined log which represents a combination of controller, 
> server and state change logs from each broker (ids 13,14 and 15, log files 
> have the suffix b13, b14, b15 respectively)
> We have increased the heaps from 1g to 6g for the brokers and from 512m to 4g 
> for the zookeepers since this happened, but we're not sure if it is relevant. The ZK 
> logs are unfortunately overwritten, so we can't provide those.
> We produce varying message sizes; some messages are relatively large (6 MB), 
> but we use compression on the producers (set to gzip).
> I've attached some logs from one of our producers as well.
> producer.properties that we've changed:
> spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
> spring.kafka.producer.compression-type=gzip
> spring.kafka.producer.retries=5
> spring.kafka.producer.acks=-1
> spring.kafka.producer.batch-size=1048576
> spring.kafka.producer.properties.linger.ms=200
> spring.kafka.producer.properties.request.timeout.ms=60
> spring.kafka.producer.properties.max.block.ms=24
> spring.kafka.producer.properties.max.request.size=104857600
>  
>  





[jira] [Commented] (KAFKA-7885) Streams: TopologyDescription violates equals-hashCode contract.

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758651#comment-16758651
 ] 

Matthias J. Sax commented on KAFKA-7885:


[~MonCalamari] Thanks for the patch. I added you to the list of contributors 
and assigned the ticket to you.

> Streams: TopologyDescription violates equals-hashCode contract.
> ---
>
> Key: KAFKA-7885
> URL: https://issues.apache.org/jira/browse/KAFKA-7885
> Project: Kafka
>  Issue Type: Bug
>Reporter: Piotr Fras
>Assignee: Piotr Fras
>Priority: Minor
>
> As per JavaSE documentation:
> > If two objects are *equal* according to the *equals*(Object) method, then 
> >calling the *hashCode* method on each of the two objects must produce the 
> >same integer result.
>  
> This is not the case for TopologyDescription.





[jira] [Assigned] (KAFKA-7885) Streams: TopologyDescription violates equals-hashCode contract.

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-7885:
--

Assignee: Piotr Fras

> Streams: TopologyDescription violates equals-hashCode contract.
> ---
>
> Key: KAFKA-7885
> URL: https://issues.apache.org/jira/browse/KAFKA-7885
> Project: Kafka
>  Issue Type: Bug
>Reporter: Piotr Fras
>Assignee: Piotr Fras
>Priority: Minor
>
> As per JavaSE documentation:
> > If two objects are *equal* according to the *equals*(Object) method, then 
> >calling the *hashCode* method on each of the two objects must produce the 
> >same integer result.
>  
> This is not the case for TopologyDescription.





[jira] [Updated] (KAFKA-6880) Zombie replicas must be fenced

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6880:
---
Labels: kip  (was: needs-kip)

> Zombie replicas must be fenced
> --
>
> Key: KAFKA-6880
> URL: https://issues.apache.org/jira/browse/KAFKA-6880
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: kip
> Fix For: 2.1.0
>
>
> Let's say we have three replicas for a partition: 1, 2, and 3.
> In epoch 0, broker 1 is the leader and writes up to offset 50. Broker 2 
> replicates up to offset 50, but broker 3 is a little behind at offset 40. The 
> high watermark is 40. 
> Suppose that broker 2 has a zk session expiration event, but fails to detect 
> it or fails to reestablish a session (e.g. due to a bug like KAFKA-6879), and 
> it continues fetching from broker 1.
> For whatever reason, broker 3 is elected the leader for epoch 1 beginning at 
> offset 40. Broker 1 detects the leader change and truncates its log to offset 
> 40. Some new data is appended up to offset 60, which is fully replicated to 
> broker 1. Broker 2 continues fetching from broker 1 at offset 50, but gets 
> NOT_LEADER_FOR_PARTITION errors, which is retriable and hence broker 2 will 
> retry.
> After some time, broker 1 becomes the leader again for epoch 2. Broker 1 
> begins at offset 60. Broker 2 has not exhausted retries and is now able to 
> fetch at offset 50 and append the last 10 records in order to catch up. 
> However, because it did not observe the leader changes, it never saw the 
> need to truncate its log. Hence offsets 40-49 still reflect the uncommitted 
> changes from epoch 0. Neither KIP-101 nor KIP-279 can fix this because the 
> tail of the log is correct.
> The basic problem is that zombie replicas are not fenced properly by the 
> leader epoch. We actually observed a sequence roughly like this after a 
> broker had partially deadlocked from KAFKA-6879. We should add the leader 
> epoch to fetch requests so that the leader can fence the zombie replicas.
> A related problem is that we currently allow such zombie replicas to be added 
> to the ISR even if they are in an offline state. The problem is that the 
> controller will never elect them, so being part of the ISR does not give the 
> availability guarantee that is intended. This would also be fixed by 
> verifying replica leader epoch in fetch requests.





[jira] [Updated] (KAFKA-7395) Add fencing to replication protocol (KIP-320)

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7395:
---
Labels: kip  (was: )

> Add fencing to replication protocol (KIP-320)
> -
>
> Key: KAFKA-7395
> URL: https://issues.apache.org/jira/browse/KAFKA-7395
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: kip
> Fix For: 2.1.0
>
>
> This patch implements the broker-side changes to support fencing improvements 
> from KIP-320: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation





[jira] [Updated] (KAFKA-7440) Use leader epoch in consumer fetch requests

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7440:
---
Labels: kip  (was: )

> Use leader epoch in consumer fetch requests
> ---
>
> Key: KAFKA-7440
> URL: https://issues.apache.org/jira/browse/KAFKA-7440
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: kip
>
> This patch adds support in the consumer to use the leader epoch obtained from 
> the metadata in fetch requests: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation





[jira] [Commented] (KAFKA-7471) Multiple Consumer Group Management (Describe, Reset, Delete)

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758700#comment-16758700
 ] 

Matthias J. Sax commented on KAFKA-7471:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Multiple Consumer Group Management (Describe, Reset, Delete)
> 
>
> Key: KAFKA-7471
> URL: https://issues.apache.org/jira/browse/KAFKA-7471
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Alex Dunayevsky
>Assignee: Alex Dunayevsky
>Priority: Major
> Fix For: 2.2.0
>
>
> Functionality needed:
>  * Describe/Delete/Reset offsets on multiple consumer groups at a time 
> (including each group by repeating `--group` parameter)
>  * Describe/Delete/Reset offsets on ALL consumer groups at a time (add new 
> --groups-all option similar to --topics-all)
>  * Generate CSV for multiple consumer groups
> What are the benefits? 
>  * No need to start a new JVM to perform each query on every single consumer 
> group
>  * Ability to query groups by their status (for instance, `-v grepping` by 
> `Stable` to spot problematic/dead/empty groups)
>  * Ability to export offsets to reset for multiple consumer groups to a CSV 
> file (needs CSV generation export/import format rework)
>  





[jira] [Updated] (KAFKA-7471) Multiple Consumer Group Management (Describe, Reset, Delete)

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7471:
---
Fix Version/s: (was: 2.2.0)

> Multiple Consumer Group Management (Describe, Reset, Delete)
> 
>
> Key: KAFKA-7471
> URL: https://issues.apache.org/jira/browse/KAFKA-7471
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Alex Dunayevsky
>Assignee: Alex Dunayevsky
>Priority: Major
>
> Functionality needed:
>  * Describe/Delete/Reset offsets on multiple consumer groups at a time 
> (including each group by repeating `--group` parameter)
>  * Describe/Delete/Reset offsets on ALL consumer groups at a time (add new 
> --groups-all option similar to --topics-all)
>  * Generate CSV for multiple consumer groups
> What are the benefits? 
>  * No need to start a new JVM to perform each query on every single consumer 
> group
>  * Ability to query groups by their status (for instance, `-v grepping` by 
> `Stable` to spot problematic/dead/empty groups)
>  * Ability to export offsets to reset for multiple consumer groups to a CSV 
> file (needs CSV generation export/import format rework)
>  





[jira] [Updated] (KAFKA-7383) Verify leader epoch in produce requests (KIP-359)

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7383:
---
Fix Version/s: (was: 2.2.0)

> Verify leader epoch in produce requests (KIP-359)
> -
>
> Key: KAFKA-7383
> URL: https://issues.apache.org/jira/browse/KAFKA-7383
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> Implementation of 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+produce+requests.
>  





[jira] [Commented] (KAFKA-7320) Provide ability to disable auto topic creation in KafkaConsumer

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758702#comment-16758702
 ] 

Matthias J. Sax commented on KAFKA-7320:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Provide ability to disable auto topic creation in KafkaConsumer
> ---
>
> Key: KAFKA-7320
> URL: https://issues.apache.org/jira/browse/KAFKA-7320
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Dhruvil Shah
>Assignee: Dhruvil Shah
>Priority: Major
> Fix For: 2.2.0
>
>
> Consumers should have a configuration to control whether subscribing to 
> non-existent topics should automatically create the topic or not.





[jira] [Commented] (KAFKA-7383) Verify leader epoch in produce requests (KIP-359)

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758703#comment-16758703
 ] 

Matthias J. Sax commented on KAFKA-7383:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Verify leader epoch in produce requests (KIP-359)
> -
>
> Key: KAFKA-7383
> URL: https://issues.apache.org/jira/browse/KAFKA-7383
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.2.0
>
>
> Implementation of 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+produce+requests.
>  





[jira] [Updated] (KAFKA-6556) allow to attach callbacks in kafka streams, to be triggered when a window is expired

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6556:
---
Labels: kip  (was: )

> allow to attach callbacks in kafka streams, to be triggered when a window is 
> expired 
> -
>
> Key: KAFKA-6556
> URL: https://issues.apache.org/jira/browse/KAFKA-6556
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: igor mazor
>Assignee: John Roesler
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> Allowing to attach callbacks in kafka streams, to be triggered when a window 
> is expired,
>  would help in use cases when the final state of the window is required.
>  It would also be useful if, together with that functionality, the user would be 
> able to control whether the callback would be triggered in addition to 
> emitting the normal change log down the stream, or only triggering the 
> callback when the window is expired. (for example in cases when only the 
> final window state is required, and any updates to the window state during 
> the window time interval are not important)  
> An example use case could be a left join with proper SQL semantics:
>  A left join B currently would emit (a1,null) and (a1,b1) if b1 arrives 
> within the defined join time window.
>  What I would like is to have ONLY ONE result:
>  (a1,null) if no b1 arrived during the defined join time window, OR ONLY 
> (a1,b1) if b1 did arrive in the specified join time window.
>  One possible solution could be to use the current Kafka Streams left join 
> with a downstream processor which would put the results in a windowed KTable.
>  The window size would be same as for the left join operation, however, only 
> the final state of that window would be emitted down the stream once the time 
> window is expired.
>  So if the left join produces (a1, null) and after X minutes no (a1, b1) was 
> produced, eventually only (a1, null) would be emitted, on the other hand, if 
> the left join produces (a1, null) and after X-t minutes (a1, b1) was produced 
> by the left join operation => only (a1, b1) would be emitted eventually down 
> the stream after X minutes.
>  
> Another use case is when the window state is written to another kafka topic 
> which is then used to persist the window states into a DB. However, many 
> times only the final window state
>  is required, and functionality to get only the last window state would help 
> in reducing load from the db, since only the final window state would be 
> persisted to the db, instead of multiple db writes for each window state 
> update. 
>  
>  
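
The "emit only the final window result" part of this request is close to what the suppress() operator (available since 2.1, KIP-328) provides; a minimal sketch, assuming a plain windowed count rather than the left-join flow described above, with placeholder topic names:

{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class FinalWindowResultSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ZERO))
               .count()
               // buffer updates and emit a single record per key/window once the window closes
               .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
               .toStream((windowedKey, count) -> windowedKey.key())
               .to("final-counts", Produced.with(Serdes.String(), Serdes.Long()));
        // builder.build() would then be passed to a KafkaStreams instance as usual
    }
}
{code}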





[jira] [Updated] (KAFKA-6556) allow to attach callbacks in kafka streams, to be triggered when a window is expired

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6556:
---
Fix Version/s: (was: 2.2.0)

> allow to attach callbacks in kafka streams, to be triggered when a window is 
> expired 
> -
>
> Key: KAFKA-6556
> URL: https://issues.apache.org/jira/browse/KAFKA-6556
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: igor mazor
>Assignee: John Roesler
>Priority: Major
>  Labels: kip
>
> Allowing to attach callbacks in kafka streams, to be triggered when a window 
> is expired,
>  would help in use cases when the final state of the window is required.
>  It would also be useful if, together with that functionality, the user would be 
> able to control whether the callback would be triggered in addition to 
> emitting the normal change log down the stream, or only triggering the 
> callback when the window is expired. (for example in cases when only the 
> final window state is required, and any updates to the window state during 
> the window time interval are not important)  
> An example use case could be a left join with proper SQL semantics:
>  A left join B currently would emit (a1,null) and (a1,b1) if b1 arrives 
> within the defined join time window.
>  What I would like is to have ONLY ONE result:
>  (a1,null) if no b1 arrived during the defined join time window, OR ONLY 
> (a1,b1) if b1 did arrive in the specified join time window.
>  One possible solution could be to use the current Kafka Streams left join 
> with a downstream processor which would put the results in a windowed KTable.
>  The window size would be same as for the left join operation, however, only 
> the final state of that window would be emitted down the stream once the time 
> window is expired.
>  So if the left join produces (a1, null) and after X minutes no (a1, b1) was 
> produced, eventually only (a1, null) would be emitted, on the other hand, if 
> the left join produces (a1, null) and after X-t minutes (a1, b1) was produced 
> by the left join operation => only (a1, b1) would be emitted eventually down 
> the stream after X minutes.
>  
> Another use case is when the window state is written to another kafka topic 
> which is then used to persist the window states into a DB. However, many 
> times only the final window state
>  is required, and functionality to get only the last window state would help 
> in reducing load from the db, since only the final window state would be 
> persisted to the db, instead of multiple db writes for each window state 
> update. 
>  
>  





[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758705#comment-16758705
 ] 

Matthias J. Sax commented on KAFKA-7026:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> In the following scenario sticky assignor assigns a topic partition to two 
> consumers in the group:
>  # Create a topic {{test}} with a single partition
>  # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group 
> leader and gets {{test-0}})
>  # Start consumer {{c2}}  in group {{sticky-group}} ({{c1}} holds onto 
> {{test-0}}, {{c2}} does not get any partition) 
>  # Pause {{c1}} (e.g. using Java debugger) ({{c2}} becomes leader and takes 
> over {{test-0}}, {{c1}} leaves the group)
>  # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>  
> The reason is {{c1}} still has kept its previous assignment ({{test-0}}) from 
> the last assignment it received from the leader (itself) and did not get the 
> next round of assignments (when {{c2}} became leader) because it was paused. 
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their 
> existing assignment. The sticky assignor code does not currently check and 
> avoid this duplication.
>  
> Note: This issue was originally reported on 
> [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].
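
For anyone reproducing this, the sticky assignor is enabled through partition.assignment.strategy on the consumer; a minimal sketch of the setup the steps above assume (broker address is a placeholder):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.StickyAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StickyConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sticky-group");
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StickyAssignor.class.getName());
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test"));
        // poll loop omitted; each instance (c1, c2) in the repro would run this config
    }
}
{code}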





[jira] [Commented] (KAFKA-6161) Add default implementation to close() and configure() for Serializer, Deserializer and Serde

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758706#comment-16758706
 ] 

Matthias J. Sax commented on KAFKA-6161:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Add default implementation to close() and configure() for Serializer, 
> Deserializer and Serde
> 
>
> Key: KAFKA-6161
> URL: https://issues.apache.org/jira/browse/KAFKA-6161
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>Reporter: Evgeny Veretennikov
>Assignee: Chia-Ping Tsai
>Priority: Major
>  Labels: kip
>
> {{Serializer}}, {{Deserializer}} and {{Serde}} interfaces have methods 
> {{configure()}} and {{close()}}. Pretty often one want to leave these methods 
> empty. For example, a lot of serializers inside 
> {{org.apache.kafka.common.serialization}} package have these methods empty:
> {code}
> @Override
> public void configure(Map<String, ?> configs, boolean isKey) {
> // nothing to do
> }
> @Override
> public void close() {
> // nothing to do
> }
> {code}
> To avoid such boilerplate, we may create new interfaces (like 
> {{UnconfiguredSerializer}}), in which we will define these methods empty.
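
A sketch of the default-method flavour of this idea (illustrative only; not the actual Serializer interface, and not necessarily the approach this ticket will settle on):

{code:java}
import java.util.Map;

// An interface with no-op default configure()/close(), so a simple
// serializer only has to implement serialize().
public interface SimpleSerializer<T> {
    default void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to do by default
    }

    byte[] serialize(String topic, T data);

    default void close() {
        // nothing to do by default
    }
}
{code}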





[jira] [Commented] (KAFKA-6049) Kafka Streams: Add Cogroup in the DSL

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758710#comment-16758710
 ] 

Matthias J. Sax commented on KAFKA-6049:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Kafka Streams: Add Cogroup in the DSL
> -
>
> Key: KAFKA-6049
> URL: https://issues.apache.org/jira/browse/KAFKA-6049
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: api, kip, user-experience
>
> When multiple streams aggregate together to form a single larger object (e.g. 
> a shopping website may have a cart stream, a wish list stream, and a 
> purchases stream. Together they make up a Customer), it is very difficult to 
> accommodate this in the Kafka-Streams DSL: it generally requires you to group 
> and aggregate all of the streams to KTables then make multiple outer join 
> calls to end up with a KTable with your desired object. This will create a 
> state store for each stream and a long chain of ValueJoiners that each new 
> record must go through to get to the final object.
> Creating a cogroup method where you use a single state store will:
> * Reduce the number of gets from state stores. With the multiple joins when a 
> new value comes into any of the streams a chain reaction happens where the 
> join processor keep calling ValueGetters until we have accessed all state 
> stores.
> * Slight performance increase. As described above all ValueGetters are called 
> also causing all ValueJoiners to be called forcing a recalculation of the 
> current joined value of all other streams, impacting performance.
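
To make the pain point concrete, a rough sketch of the join-chain workaround described above; the topic names are invented and the values are simplified to Strings instead of real Cart/WishList/Purchases objects:

{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class MultiJoinSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, String> carts = builder.table("carts");
        KTable<String, String> wishLists = builder.table("wishlists");
        KTable<String, String> purchases = builder.table("purchases");

        // Each outerJoin adds another state store and another ValueJoiner that
        // every update has to pass through before the combined "customer" value
        // is produced; a cogroup() operator would aggregate all three streams
        // into a single store instead.
        KTable<String, String> customer =
                carts.outerJoin(wishLists, (cart, wishList) -> cart + "|" + wishList)
                     .outerJoin(purchases, (cartAndWishList, purchase) -> cartAndWishList + "|" + purchase);
    }
}
{code}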





[jira] [Updated] (KAFKA-4133) Provide a configuration to control consumer max in-flight fetches

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4133:
---
Labels: kip  (was: )

> Provide a configuration to control consumer max in-flight fetches
> -
>
> Key: KAFKA-4133
> URL: https://issues.apache.org/jira/browse/KAFKA-4133
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Mickael Maison
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> With KIP-74, we now have a good way to limit the size of fetch responses, but 
> it may still be difficult for users to control overall memory since the 
> consumer will send fetches in parallel to all the brokers which own 
> partitions that it is subscribed to. To give users finer control, it might 
> make sense to add a `max.in.flight.fetches` setting to limit the total number 
> of concurrent fetches at any time. This would require a KIP since it's a new 
> configuration.
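
For contrast, a sketch of the per-response knobs that exist today versus the setting proposed here; fetch.max.bytes and max.partition.fetch.bytes are existing configs (KIP-74), while max.in.flight.fetches is only the proposal from this ticket:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FetchMemorySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Existing knobs: bound the size of each fetch response and each partition's share of it.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024);
        // Proposed (does not exist yet): cap how many fetches are in flight at once,
        // since the consumer fetches from all brokers it owns partitions on in parallel.
        // props.put("max.in.flight.fetches", 3);
    }
}
{code}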





[jira] [Updated] (KAFKA-4133) Provide a configuration to control consumer max in-flight fetches

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4133:
---
Fix Version/s: (was: 2.2.0)

> Provide a configuration to control consumer max in-flight fetches
> -
>
> Key: KAFKA-4133
> URL: https://issues.apache.org/jira/browse/KAFKA-4133
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Mickael Maison
>Priority: Major
>  Labels: kip
>
> With KIP-74, we now have a good way to limit the size of fetch responses, but 
> it may still be difficult for users to control overall memory since the 
> consumer will send fetches in parallel to all the brokers which own 
> partitions that it is subscribed to. To give users finer control, it might 
> make sense to add a `max.in.flight.fetches` setting to limit the total number 
> of concurrent fetches at any time. This would require a KIP since it's a new 
> configuration.





[jira] [Commented] (KAFKA-4133) Provide a configuration to control consumer max in-flight fetches

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758711#comment-16758711
 ] 

Matthias J. Sax commented on KAFKA-4133:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> Provide a configuration to control consumer max in-flight fetches
> -
>
> Key: KAFKA-4133
> URL: https://issues.apache.org/jira/browse/KAFKA-4133
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Mickael Maison
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> With KIP-74, we now have a good way to limit the size of fetch responses, but 
> it may still be difficult for users to control overall memory since the 
> consumer will send fetches in parallel to all the brokers which own 
> partitions that it is subscribed to. To give users finer control, it might 
> make sense to add a `max.in.flight.fetches` setting to limit the total number 
> of concurrent fetches at any time. This would require a KIP since it's a new 
> configuration.





[jira] [Commented] (KAFKA-7873) KafkaBasedLog's consumer should always seek to beginning when starting

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758716#comment-16758716
 ] 

ASF GitHub Bot commented on KAFKA-7873:
---

hachikuji commented on pull request #6203: KAFKA-7873: Always seek to beginning 
in KafkaBasedLog
URL: https://github.com/apache/kafka/pull/6203
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KafkaBasedLog's consumer should always seek to beginning when starting
> --
>
> Key: KAFKA-7873
> URL: https://issues.apache.org/jira/browse/KAFKA-7873
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.1.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>
> KafkaBasedLog expects that callers set the `group.id` for the consumer 
> configuration, and does not itself set the `group.id` if the caller does not 
> explicitly do so. However, 
> [KIP-289|https://cwiki.apache.org/confluence/display/KAFKA/KIP-289%3A+Improve+the+default+group+id+behavior+in+KafkaConsumer]
>  changed the default for the `group.id` from a blank string to be null, which 
> changes how KafkaBasedLog behaves when no `group.id` is set, and it actually 
> deprecates and issues a warning when no `group.id` is specified.
> When KafkaBasedLog starts up, it should always start from the beginning of 
> the topic and consume to the end. The consumer's logic for where to start is 
> always:
> # explicit seek
> # committed offset (skipped if group.id is null)
> # auto reset behavior
> and currently Connect does not explicitly seek to the beginning and instead 
> relies upon `auto.offset.reset=earliest`. However, if a `group.id` is 
> specified *and* there are committed offsets, then the consumer will start 
> from the committed offsets rather than from the beginning. If a 'group.id' is 
> not specified, then the auto reset behavior should work.
> However, to avoid the warning and possible exception when no `group.id` is 
> specified, KafkaBasedLog should always call {{consumer.seekToBeginning()}} 
> during startup. 
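
A minimal sketch of the startup pattern described here: assign the partitions, seek to the beginning explicitly, and read to the end regardless of group.id or committed offsets (not the actual KafkaBasedLog code; the consumer and partition list are assumed to be set up by the caller):

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;

public class ReadToEndSketch {
    static <K, V> void readToEnd(Consumer<K, V> consumer, List<TopicPartition> partitions) {
        consumer.assign(partitions);
        consumer.seekToBeginning(partitions);  // explicit seek takes precedence over committed offsets
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
        while (!endOffsets.entrySet().stream()
                .allMatch(e -> consumer.position(e.getKey()) >= e.getValue())) {
            for (ConsumerRecord<K, V> record : consumer.poll(Duration.ofMillis(500))) {
                // apply the record to the in-memory state here
            }
        }
    }
}
{code}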





[jira] [Updated] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7026:
---
Fix Version/s: (was: 2.2.0)

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
>
> In the following scenario sticky assignor assigns a topic partition to two 
> consumers in the group:
>  # Create a topic {{test}} with a single partition
>  # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group 
> leader and gets {{test-0}})
>  # Start consumer {{c2}}  in group {{sticky-group}} ({{c1}} holds onto 
> {{test-0}}, {{c2}} does not get any partition) 
>  # Pause {{c1}} (e.g. using Java debugger) ({{c2}} becomes leader and takes 
> over {{test-0}}, {{c1}} leaves the group)
>  # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>  
> The reason is {{c1}} still has kept its previous assignment ({{test-0}}) from 
> the last assignment it received from the leader (itself) and did not get the 
> next round of assignments (when {{c2}} became leader) because it was paused. 
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their 
> existing assignment. The sticky assignor code does not currently check and 
> avoid this duplication.
>  
> Note: This issue was originally reported on 
> [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].





[jira] [Updated] (KAFKA-6049) Kafka Streams: Add Cogroup in the DSL

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6049:
---
Fix Version/s: (was: 2.2.0)

> Kafka Streams: Add Cogroup in the DSL
> -
>
> Key: KAFKA-6049
> URL: https://issues.apache.org/jira/browse/KAFKA-6049
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: api, kip, user-experience
>
> When multiple streams aggregate together to form a single larger object (e.g. 
> a shopping website may have a cart stream, a wish list stream, and a 
> purchases stream. Together they make up a Customer), it is very difficult to 
> accommodate this in the Kafka-Streams DSL: it generally requires you to group 
> and aggregate all of the streams to KTables then make multiple outer join 
> calls to end up with a KTable with your desired object. This will create a 
> state store for each stream and a long chain of ValueJoiners that each new 
> record must go through to get to the final object.
> Creating a cogroup method where you use a single state store will:
> * Reduce the number of gets from state stores. With the multiple joins when a 
> new value comes into any of the streams a chain reaction happens where the 
> join processor keep calling ValueGetters until we have accessed all state 
> stores.
> * Slight performance increase. As described above all ValueGetters are called 
> also causing all ValueJoiners to be called forcing a recalculation of the 
> current joined value of all other streams, impacting performance.





[jira] [Commented] (KAFKA-6556) allow to attach callbacks in kafka streams, to be triggered when a window is expired

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758708#comment-16758708
 ] 

Matthias J. Sax commented on KAFKA-6556:


The feature freeze deadline for 2.2 was yesterday. Moving this to the 2.3 release.

> allow to attach callbacks in kafka streams, to be triggered when a window is 
> expired 
> -
>
> Key: KAFKA-6556
> URL: https://issues.apache.org/jira/browse/KAFKA-6556
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: igor mazor
>Assignee: John Roesler
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> Allowing to attach callbacks in kafka streams, to be triggered when a window 
> is expired,
>  would help in use cases when the final state of the window is required.
>  It would also be useful if, together with that functionality, the user would be 
> able to control whether the callback would be triggered in addition to 
> emitting the normal change log down the stream, or only triggering the 
> callback when the window is expired. (for example in cases when only the 
> final window state is required, and any updates to the window state during 
> the window time interval are not important)  
> An example use case could be a left join with proper SQL semantics:
>  A left join B currently would emit (a1,null) and (a1,b1) if b1 arrives 
> within the defined join time window.
>  What I would like is to have ONLY ONE result:
>  (a1,null) if no b1 arrived during the defined join time window, OR ONLY 
> (a1,b1) if b1 did arrive in the specified join time window.
>  One possible solution could be to use the current Kafka Streams left join 
> with a downstream processor which would put the results in a windowed KTable.
>  The window size would be same as for the left join operation, however, only 
> the final state of that window would be emitted down the stream once the time 
> window is expired.
>  So if the left join produces (a1, null) and after X minutes no (a1, b1) was 
> produced, eventually only (a1, null) would be emitted, on the other hand, if 
> the left join produces (a1, null) and after X-t minutes (a1, b1) was produced 
> by the left join operation => only (a1, b1) would be emitted eventually down 
> the stream after X minutes.
>  
> Another use case is when the window state is written to another kafka topic 
> which is then used to persist the window states into a DB. However, many 
> times only the final window state
>  is required, and functionality to get only the last window state would help 
> in reducing load from the db, since only the final window state would be 
> persisted to the db, instead of multiple db writes for each window state 
> update. 
>  
>  





[jira] [Updated] (KAFKA-7320) Provide ability to disable auto topic creation in KafkaConsumer

2019-02-01 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7320:
---
Fix Version/s: (was: 2.2.0)

> Provide ability to disable auto topic creation in KafkaConsumer
> ---
>
> Key: KAFKA-7320
> URL: https://issues.apache.org/jira/browse/KAFKA-7320
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Dhruvil Shah
>Assignee: Dhruvil Shah
>Priority: Major
>
> Consumers should have a configuration to control whether subscribing to 
> non-existent topics should automatically create the topic or not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7433) Introduce broker options in TopicCommand to use AdminClient

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758725#comment-16758725
 ] 

ASF GitHub Bot commented on KAFKA-7433:
---

omkreddy commented on pull request #5683: KAFKA-7433: Introduce broker options 
in TopicCommand to use AdminClient (KIP-377)
URL: https://github.com/apache/kafka/pull/5683
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Introduce broker options in TopicCommand to use AdminClient
> ---
>
> Key: KAFKA-7433
> URL: https://issues.apache.org/jira/browse/KAFKA-7433
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>Priority: Major
>  Labels: kip
>
> This task aims to add --bootstrap-servers and --admin.config options which 
> enable kafka.admin.TopicCommand to work with the Java-based AdminClient.
> Ideally KAFKA-5561 might replace this task, but as an incremental step until 
> that succeeds it might be enough just to add these options to the existing 
> command.
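For context, the describe path of such a --bootstrap-server based TopicCommand boils down to AdminClient calls along these lines; an illustrative sketch rather than the actual command code, assuming a topic named "my-topic" and a broker on localhost:9092:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicViaAdminClient {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the topic over the wire protocol instead of reading ZooKeeper directly.
            TopicDescription desc = admin.describeTopics(Collections.singleton("my-topic"))
                                         .all().get().get("my-topic");
            desc.partitions().forEach(p ->
                System.out.printf("partition %d: leader=%s isr=%s%n",
                                  p.partition(), p.leader(), p.isr()));
        }
    }
}
```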



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7873) KafkaBasedLog's consumer should always seek to beginning when starting

2019-02-01 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7873.

   Resolution: Fixed
Fix Version/s: 2.0.2
   2.1.1
   2.2.0

> KafkaBasedLog's consumer should always seek to beginning when starting
> --
>
> Key: KAFKA-7873
> URL: https://issues.apache.org/jira/browse/KAFKA-7873
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.1.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
> Fix For: 2.2.0, 2.1.1, 2.0.2
>
>
> KafkaBasedLog expects that callers set the `group.id` for the consumer 
> configuration, and does not itself set the `group.id` if the caller does not 
> explicitly do so. However, 
> [KIP-289|https://cwiki.apache.org/confluence/display/KAFKA/KIP-289%3A+Improve+the+default+group+id+behavior+in+KafkaConsumer]
>  changed the default for the `group.id` from a blank string to be null, which 
> changes how KafkaBasedLog behaves when no `group.id` is set, and it actually 
> deprecates and issues a warning when no `group.id` is specified.
> When KafkaBasedLog starts up, it should always start from the beginning of 
> the topic and consume to the end. The consumer's logic for where to start is 
> always:
> # explicit seek
> # committed offset (skipped if group.id is null)
> # auto reset behavior
> and currently Connect does not explicitly seek to the beginning and instead 
> relies upon `auto.offset.reset=earliest`. However, if a `group.id` is 
> specified *and* there are committed offsets, then the consumer will start 
> from the committed offsets rather than from the beginning. If a 'group.id' is 
> not specified, then the auto reset behavior should work.
> However, to avoid the warning and possible exception when no `group.id` is 
> specified, KafkaBasedLog should always call {{consumer.seekToBeginning()}} 
> during startup. 
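A minimal sketch of the suggested behaviour, separate from Connect's actual KafkaBasedLog code: assign the partitions and seek to the beginning explicitly so that neither committed offsets nor auto.offset.reset matter. Topic name and broker address below are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ReadLogFromBeginning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        // No group.id needed: with assign() plus an explicit seek, committed offsets are irrelevant.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions =
                Collections.singletonList(new TopicPartition("connect-configs", 0)); // hypothetical topic
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);              // explicit seek always wins
            Map<TopicPartition, Long> end = consumer.endOffsets(partitions);
            while (partitions.stream().anyMatch(tp -> consumer.position(tp) < end.get(tp))) {
                consumer.poll(Duration.ofMillis(100))
                        .forEach(record -> { /* replay the record into the in-memory log */ });
            }
        }
    }
}
```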



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7893) Refactor ConsumerBounceTest.scala

2019-02-01 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7893:
--

 Summary: Refactor ConsumerBounceTest.scala
 Key: KAFKA-7893
 URL: https://issues.apache.org/jira/browse/KAFKA-7893
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


The 
[ConsumerBounceTest|https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala]
 has logic to create consumers and ensure that they are in the same consumer 
group. There is code that already does this in `BaseConsumerTest` and 
`PlaintextConsumerTest` - it would be better if we could make 
`ConsumerBounceTest` reuse that functionality.

As part of [https://github.com/apache/kafka/pull/6163], it is worth 
investigating whether the newly-introduced test cases there could be asserted with 
unit tests as opposed to integration tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7401) Broker fails to start when recovering a segment from before the log start offset

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758757#comment-16758757
 ] 

ASF GitHub Bot commented on KAFKA-7401:
---

apovzner commented on pull request #6220: KAFKA-7401: Fix inconsistent range 
exception on segment recovery
URL: https://github.com/apache/kafka/pull/6220
 
 
   This PR fixes "java.lang.IllegalArgumentException: inconsistent range", which 
happens on broker startup after an unclean shutdown during the log cleaning phase 
that creates swap files (in the case where base offset < log start offset). Added 
`testRecoveryAfterCrashAndIncrementedLogStartOffset`, which reproduces KAFKA-7401.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Broker fails to start when recovering a segment from before the log start 
> offset
> 
>
> Key: KAFKA-7401
> URL: https://issues.apache.org/jira/browse/KAFKA-7401
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bob Barrett
>Assignee: Anna Povzner
>Priority: Major
>
> If a segment needs to be recovered (for example, because of a missing index 
> file or uncompleted swap operation) and its base offset is less than the log 
> start offset, the broker will crash with the following error:
> Fatal error during KafkaServer startup. Prepare to shutdown 
> (kafka.server.KafkaServer)
>  java.lang.IllegalArgumentException: inconsistent range
>  at java.util.concurrent.ConcurrentSkipListMap$SubMap.<init>(Unknown Source)
>  at java.util.concurrent.ConcurrentSkipListMap.subMap(Unknown Source)
>  at java.util.concurrent.ConcurrentSkipListMap.subMap(Unknown Source)
>  at kafka.log.Log$$anonfun$12.apply(Log.scala:1579)
>  at kafka.log.Log$$anonfun$12.apply(Log.scala:1578)
>  at scala.Option.map(Option.scala:146)
>  at kafka.log.Log.logSegments(Log.scala:1578)
>  at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:358)
>  at kafka.log.Log$$anonfun$completeSwapOperations$1.apply(Log.scala:389)
>  at kafka.log.Log$$anonfun$completeSwapOperations$1.apply(Log.scala:380)
>  at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
>  at kafka.log.Log.completeSwapOperations(Log.scala:380)
>  at kafka.log.Log.loadSegments(Log.scala:408)
>  at kafka.log.Log.<init>(Log.scala:216)
>  at kafka.log.Log$.apply(Log.scala:1765)
>  at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:260)
>  at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$11$$anonfun$apply$15$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:340)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>  at java.util.concurrent.FutureTask.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> Since these segments are outside the log range, we should delete them, or at 
> least not block broker startup because of them.
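The underlying failure is easy to reproduce outside the broker; a standalone Java illustration (not broker code) of why the subMap call in the stack trace throws, using hypothetical offsets:

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class InconsistentRangeDemo {
    public static void main(String[] args) {
        // The broker keeps log segments in a skip list keyed by base offset.
        ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L, "segment-0");
        segments.put(100L, "segment-100");

        long logStartOffset = 150L;     // hypothetical: start offset already moved past the old segment
        long segmentBaseOffset = 100L;

        // subMap(from, ..., to, ...) requires from <= to; here 150 > 100, so this throws
        // java.lang.IllegalArgumentException: inconsistent range -- the same error seen
        // in the startup stack trace above.
        segments.subMap(logStartOffset, true, segmentBaseOffset, false);
    }
}
```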



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7894) When Kafka detached from network; rack isn't logged

2019-02-01 Thread Andrey Falko (JIRA)
Andrey Falko created KAFKA-7894:
---

 Summary: When Kafka detached from network; rack isn't logged
 Key: KAFKA-7894
 URL: https://issues.apache.org/jira/browse/KAFKA-7894
 Project: Kafka
  Issue Type: Bug
Reporter: Andrey Falko


Step 1: Broker is detached from network
Step 2: Logs show "rack: null" even though we have broker.rack set and the correct 
broker.rack shows up in the Kafka start sequence

java.io.IOException: Connection to ip-10-0-80-124.ec2.internal:9092 (id: 9 
rack: null) failed. 
22:58:02.354 WARN  o.a.k.c.NetworkClient - [ReplicaFetcher replicaId=8, 
leaderId=6, fetcherId=0] Error connecting to node 
ip-10-0-11-22.ec2.internal:9092 (id: 6 rack: null) 
22:58:02.354 INFO  o.a.k.c.FetchSessionHandler - [ReplicaFetcher replicaId=8, 
leaderId=6, fetcherId=0] Error sending fetch request (sessionId=685825590, 
epoch=INITIAL) to node 6: java.io.IOException: Connection to 
ip-10-0-11-22.ec2.internal:9092 (id: 6 rack: null) failed.. 
java.io.IOException: Connection to ip-10-0-11-22.ec2.internal:9092 (id: 6 rack: 
null) failed. 
22:58:03.483 WARN  o.a.k.c.NetworkClient - [ReplicaFetcher replicaId=8, 
leaderId=5, fetcherId=0] Error connecting to node 
ip-10-0-13-33.ec2.internal:9092 (id: 5 rack: null)

This seems similar to https://issues.apache.org/jira/browse/KAFKA-5463



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7831) Consumer SubscriptionState missing synchronization

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758773#comment-16758773
 ] 

ASF GitHub Bot commented on KAFKA-7831:
---

hachikuji commented on pull request #6221: KAFKA-7831; Do not modify 
subscription state from background thread
URL: https://github.com/apache/kafka/pull/6221
 
 
   Metadata may be updated from the background thread, so we need to protect 
access to SubscriptionState. This patch restructures the metadata handling so 
that we only check pattern subscriptions in the foreground. Additionally, it 
improves the following:
   
   1. SubscriptionState is now the source of truth for the topics that will be 
fetched. We had a lot of messy logic previously to try and keep the topic 
set in Metadata consistent with the subscription, so this simplifies the logic.
   2. The metadata needs for the producer and consumer are quite different, so 
it made sense to separate the custom logic into separate extensions of 
Metadata. For example, only the producer requires topic expiration.
   3. We've always had an edge case in which a metadata change with an inflight 
request may cause us to effectively miss an expected update. This patch 
implements a separate version inside Metadata which is bumped when the needed 
topics changes.
   4. This patch removes the MetadataListener, which was the cause of 
https://issues.apache.org/jira/browse/KAFKA-7764. 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer SubscriptionState missing synchronization
> --
>
> Key: KAFKA-7831
> URL: https://issues.apache.org/jira/browse/KAFKA-7831
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> ConsumerCoordinator installs a Metadata.Listener in order to update pattern 
> subscriptions after metadata changes. The listener is invoked from 
> NetworkClient.poll, which could happen in the heartbeat thread. Currently, 
> however, there is no synchronization in SubscriptionState to make this safe. 
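A generic sketch of the hazard and the usual fix, not the actual SubscriptionState class: anything touched by both the polling thread and a background thread needs to go through one lock.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: state read by the polling (foreground) thread and updated by a
// background (heartbeat/metadata) thread must be guarded by a single lock, otherwise
// updates can be lost or observed half-applied.
public class SharedSubscriptionState {
    private final Set<String> subscribedTopics = new HashSet<>();

    // Called from the background thread after a metadata update matches the pattern.
    public synchronized void updateMatchedTopics(Set<String> matchedTopics) {
        subscribedTopics.clear();
        subscribedTopics.addAll(matchedTopics);
    }

    // Called from the foreground thread when deciding what to fetch; returns a copy
    // so the caller never iterates the shared set outside the lock.
    public synchronized Set<String> topicsToFetch() {
        return new HashSet<>(subscribedTopics);
    }
}
```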



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3522) Consider adding version information into rocksDB storage format

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758780#comment-16758780
 ] 

ASF GitHub Bot commented on KAFKA-3522:
---

mjsax commented on pull request #6222: KAFKA-3522: Remove TimestampedByteStore 
from public API
URL: https://github.com/apache/kafka/pull/6222
 
 
   Because KIP-258 slipped, this PR moves `TimestampedByteStore` into an internal 
package so as not to leak public API changes into the 2.2 release. We will move it 
back after the 2.2 branch is cut.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider adding version information into rocksDB storage format
> ---
>
> Key: KAFKA-3522
> URL: https://issues.apache.org/jira/browse/KAFKA-3522
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: architecture
>
> Kafka Streams does not introduce any modifications to the data format in the 
> underlying Kafka protocol, but it does use RocksDB for persistent state 
> storage, and currently its data format is fixed and hard-coded. We want to 
> consider the evolution path in the future when we change the data format, and 
> hence having some version info stored along with the storage file / directory 
> would be useful.
> And this information could even live outside the storage file; for example, we 
> could just use a small "version indicator" file in the rocksdb directory for 
> this purpose. Thoughts? [~enothereska] [~jkreps]
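A minimal Java sketch of that "version indicator file" idea; the file name and plain-integer format here are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Keep a tiny file next to the RocksDB directory so future releases can detect
// (and migrate) older on-disk formats. Name and encoding are illustrative only.
public class StoreFormatVersion {
    private static final String VERSION_FILE = ".store-format-version";

    static void write(Path storeDir, int version) throws IOException {
        Files.write(storeDir.resolve(VERSION_FILE),
                    Integer.toString(version).getBytes(StandardCharsets.UTF_8));
    }

    static int read(Path storeDir) throws IOException {
        Path f = storeDir.resolve(VERSION_FILE);
        if (!Files.exists(f)) {
            return 0; // pre-versioned format
        }
        return Integer.parseInt(new String(Files.readAllBytes(f), StandardCharsets.UTF_8).trim());
    }
}
```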



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7236) Add --under-min-isr option to describe topics command

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758810#comment-16758810
 ] 

ASF GitHub Bot commented on KAFKA-7236:
---

KevinLiLu commented on pull request #6224: KAFKA-7236: Add --under-min-isr 
option to describe topics command (KIP-351)
URL: https://github.com/apache/kafka/pull/6224
 
 
   
[KIP-351](https://cwiki.apache.org/confluence/display/KAFKA/KIP-351%3A+Add+--under-min-isr+option+to+describe+topics+command)
   
   - Add `--under-min-isr` option to `TopicCommand` to report partitions under 
the configured `min.insync.replicas` value
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add --under-min-isr option to describe topics command
> -
>
> Key: KAFKA-7236
> URL: https://issues.apache.org/jira/browse/KAFKA-7236
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Kevin Lu
>Assignee: Kevin Lu
>Priority: Minor
>
> The "min.insync.replicas" configuration specifies the minimum number of 
> insync replicas required for a partition to accept messages from the 
> producer. If the insync replica count of a partition falls under the 
> specified "min.insync.replicas", then the broker will reject messages for 
> producers using acks=all. These producers will suffer unavailability as they 
> will see a NotEnoughReplicas or NotEnoughReplicasAfterAppend exception.
> We currently have an UnderMinIsrPartitionCount metric which is useful for 
> identifying when partitions fall under "min.insync.replicas", however it is 
> still difficult to identify which topic partitions are affected and need 
> fixing.
> We can leverage the describe topics command in TopicCommand to add an option 
> "--under-minisr-partitions" to list out exactly which topic partitions are 
> below "min.insync.replicas".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2019-02-01 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758861#comment-16758861
 ] 

Matthias J. Sax commented on KAFKA-7481:


[~hachikuji] This is marked as blocker, was closed, and reopened. Is this going 
into 2.2?

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.2.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7641) Add `consumer.group.max.size` to cap consumer metadata size on broker

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758871#comment-16758871
 ] 

ASF GitHub Bot commented on KAFKA-7641:
---

hachikuji commented on pull request #6163: KAFKA-7641: Introduce 
"group.max.size" config to limit group sizes
URL: https://github.com/apache/kafka/pull/6163
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add `consumer.group.max.size` to cap consumer metadata size on broker
> -
>
> Key: KAFKA-7641
> URL: https://issues.apache.org/jira/browse/KAFKA-7641
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Boyang Chen
>Assignee: Stanislav Kozlovski
>Priority: Major
>  Labels: kip
>
> In the JIRA discussion https://issues.apache.org/jira/browse/KAFKA-7610, 
> Jason described an edge case of the current consumer protocol which could cause 
> a memory burst on the broker side:
> ```the case we observed in practice was caused by a consumer that was slow to 
> rejoin the group after a rebalance had begun. At the same time, there were 
> new members that were trying to join the group for the first time. The 
> request timeout was significantly lower than the rebalance timeout, so the 
> JoinGroup of the new members kept timing out. The timeout caused a retry and 
> the group size eventually become quite large because we could not detect the 
> fact that the new members were no longer there.```
> Since many disorganized join group requests are spamming the group metadata, 
> we should define a cap on the broker side to prevent one consumer group from 
> growing too large. So far I feel it's appropriate to introduce this as a 
> server config, since most of the time this value only deals with error 
> scenarios; client users shouldn't need to worry about this config.
> KIP-389: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7641) Add `consumer.group.max.size` to cap consumer metadata size on broker

2019-02-01 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7641.

   Resolution: Fixed
Fix Version/s: 2.2.0

> Add `consumer.group.max.size` to cap consumer metadata size on broker
> -
>
> Key: KAFKA-7641
> URL: https://issues.apache.org/jira/browse/KAFKA-7641
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Boyang Chen
>Assignee: Stanislav Kozlovski
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> In the JIRA discussion https://issues.apache.org/jira/browse/KAFKA-7610, 
> Jason described an edge case of the current consumer protocol which could cause 
> a memory burst on the broker side:
> ```the case we observed in practice was caused by a consumer that was slow to 
> rejoin the group after a rebalance had begun. At the same time, there were 
> new members that were trying to join the group for the first time. The 
> request timeout was significantly lower than the rebalance timeout, so the 
> JoinGroup of the new members kept timing out. The timeout caused a retry and 
> the group size eventually become quite large because we could not detect the 
> fact that the new members were no longer there.```
> Since many disorganized join group requests are spamming the group metadata, 
> we should define a cap on the broker side to prevent one consumer group from 
> growing too large. So far I feel it's appropriate to introduce this as a 
> server config, since most of the time this value only deals with error 
> scenarios; client users shouldn't need to worry about this config.
> KIP-389: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7798) Expose embedded client context from KafkaStreams threadMetadata

2019-02-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758884#comment-16758884
 ] 

ASF GitHub Bot commented on KAFKA-7798:
---

mjsax commented on pull request #6107: KAFKA-7798: Expose embedded clientIds
URL: https://github.com/apache/kafka/pull/6107
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Expose embedded client context from KafkaStreams threadMetadata
> ---
>
> Key: KAFKA-7798
> URL: https://issues.apache.org/jira/browse/KAFKA-7798
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
>
> A KafkaStreams client today contains multiple embedded clients: producer, 
> consumer, and admin client. Currently these clients' context, such as client ids, 
> is not exposed via KafkaStreams. This ticket proposes to expose that context 
> information on a per-thread basis (since each thread has its own embedded 
> clients) via ThreadMetadata.
> This also has an interplay with KIP-345: as we add group.instance.id in that 
> KIP, this information should be exposed as well.
> Cf: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-414%3A+Expose+Embedded+ClientIds+in+Kafka+Streams]
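A sketch of how an application might read the exposed per-thread context once this lands; "streams" is assumed to be a running KafkaStreams instance, and the example relies only on ThreadMetadata#threadName() and toString(), which the change described above would extend to include the embedded client ids:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.ThreadMetadata;

public class ThreadClientContext {
    // Print each stream thread together with its metadata; after this change the
    // metadata would also carry the client ids of the thread's embedded consumer,
    // producer(s), and admin client.
    static void printThreadClientContext(KafkaStreams streams) {
        for (ThreadMetadata thread : streams.localThreadsMetadata()) {
            System.out.println(thread.threadName() + ": " + thread);
        }
    }
}
```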



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)