[jira] [Created] (KAFKA-10594) Enhance Raft exception handling

2020-10-09 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10594:
---

 Summary: Enhance Raft exception handling
 Key: KAFKA-10594
 URL: https://issues.apache.org/jira/browse/KAFKA-10594
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen
Assignee: Boyang Chen


The current exception handling in the Raft implementation is superficial; for 
example, we don't treat file system exceptions and request handling exceptions 
differently. We need to decide which exceptions should be fatal, which should 
be returned to the client as error responses, and which can be retried.
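To make the proposed classification concrete, here is a minimal stdlib-only sketch. The class `RaftFaultKind`, its enum, and the specific exception-to-kind mapping are all illustrative assumptions for discussion, not Kafka's actual Raft code:

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: RaftFaultKind and the mapping below are
// hypothetical, not Kafka's actual Raft exception-handling code.
public class RaftFaultKind {
    public enum Kind { FATAL, CLIENT_ERROR, RETRIABLE }

    public static Kind classify(Throwable t) {
        if (t instanceof IOException) {
            // File system faults: the log may be corrupted, so fail fast.
            return Kind.FATAL;
        }
        if (t instanceof IllegalArgumentException) {
            // Malformed request: respond to the client with an error code.
            return Kind.CLIENT_ERROR;
        }
        if (t instanceof TimeoutException) {
            // Transient request-handling faults: safe to retry.
            return Kind.RETRIABLE;
        }
        // Unknown exceptions default to the safest behavior.
        return Kind.FATAL;
    }
}
```

A real implementation would hang the shutdown path, the error-response path, and the retry loop off the returned kind.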



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10343) Add IBP based ApiVersion constraint tests

2020-10-13 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Summary: Add IBP based ApiVersion constraint tests  (was: Remove 2.7 IBP 
for redirection enablement)

> Add IBP based ApiVersion constraint tests
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.8.0
>
>
> We need to add ApiVersion constraint tests based on IBP to remind future 
> developers to bump it when a new RPC version is developed.
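As a sketch of the intended pattern (the names and pinned value below are hypothetical, not Kafka's real ApiKeys/ApiVersion code): the test pins the latest RPC version by hand, so bumping the RPC without revisiting the IBP gating fails loudly.

```java
// Hypothetical sketch of an IBP-based version guard; not Kafka's actual test.
public class ApiVersionGuard {
    // Stand-in for looking up the latest version of some RPC (hypothetical).
    static short latestRpcVersion() {
        return 1;
    }

    // Value pinned by hand in the test; whoever bumps the RPC must
    // consciously update this pin and re-check the IBP gating.
    static final short PINNED_VERSION = 1;

    static boolean versionStillPinned() {
        return latestRpcVersion() == PINNED_VERSION;
    }
}
```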





[jira] [Updated] (KAFKA-10343) Remove 2.7 IBP for redirection enablement

2020-10-13 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Description: We need to add ApiVersion constraint tests based on IBP to remind 
future developers to bump it when a new RPC version is developed.  (was: The 
shipment of redirection could not be completed in 2.7. Given that, we need to 
patch a PR to disable it once the release branch is cut, by removing the new 
IBP flag entirely.)

> Remove 2.7 IBP for redirection enablement
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.8.0
>
>
> We need to add ApiVersion constraint tests based on IBP to remind future 
> developers to bump it when a new RPC version is developed.





[jira] [Assigned] (KAFKA-10343) Remove 2.7 IBP for redirection enablement

2020-10-13 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10343:
---

Assignee: Boyang Chen

> Remove 2.7 IBP for redirection enablement
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.8.0
>
>
> We need to add ApiVersion constraint tests based on IBP to remind future 
> developers to bump it when a new RPC version is developed.





[jira] [Updated] (KAFKA-10343) Remove 2.7 IBP for redirection enablement

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Description: The shipment of redirection could not be completed in 2.7. Given 
that, we need to patch a PR to disable it once the release branch is cut, by 
removing the new IBP flag entirely.  (was: In the bridge release broker, 
AlterClientQuotas should be redirected to the active controller.)

> Remove 2.7 IBP for redirection enablement
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> The shipment of redirection could not be completed in 2.7. Given that, we 
> need to patch a PR to disable it once the release branch is cut, by removing 
> the new IBP flag entirely.





[jira] [Updated] (KAFKA-10343) Remove 2.7 IBP for redirection enablement

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Fix Version/s: 2.7.0

> Remove 2.7 IBP for redirection enablement
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the bridge release broker, AlterClientQuotas should be redirected to the 
> active controller.





[jira] [Created] (KAFKA-10508) Consider moving ForwardRequestHandler to a separate class

2020-09-21 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10508:
---

 Summary: Consider moving ForwardRequestHandler to a separate class
 Key: KAFKA-10508
 URL: https://issues.apache.org/jira/browse/KAFKA-10508
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen


With the new redirection template merged in 
[https://github.com/apache/kafka/pull/9103], the KafkaApis file has grown to 
3,500+ lines, which is fairly large. We should consider moving the redirection 
template into a separate file to reduce the main class size.





[jira] [Updated] (KAFKA-10343) Remove 2.7 IBP for redirection enablement

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Summary: Remove 2.7 IBP for redirection enablement  (was: Redirect 
AlterClientQuotas to the controller)

> Remove 2.7 IBP for redirection enablement
> -
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In the bridge release broker, AlterClientQuotas should be redirected to the 
> active controller.





[jira] [Assigned] (KAFKA-10350) Add redirect request monitoring metrics

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10350:
---

Assignee: Boyang Chen

> Add redirect request monitoring metrics
> ---
>
> Key: KAFKA-10350
> URL: https://issues.apache.org/jira/browse/KAFKA-10350
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> We need to add the metric for monitoring redirection progress as stated in 
> the KIP:
> MBean: kafka.server:type=RequestMetrics,name=NumRequestsForwardingToControllerPerSec,clientId=([-.\w]+)
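For illustration, the MBean name from the KIP decomposes into a type, a metric name, and a per-client tag. The helper below is a hypothetical pure-Java sketch of assembling it, not the broker's metrics code:

```java
// Hypothetical helper assembling the MBean name quoted above.
public class ForwardingMetricName {
    static String mbeanName(String clientId) {
        return "kafka.server:type=RequestMetrics"
            + ",name=NumRequestsForwardingToControllerPerSec"
            + ",clientId=" + clientId;
    }
}
```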





[jira] [Updated] (KAFKA-10181) Create redirection template for configuration change RPCs

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10181:

Description: In the bridge release broker, 
AlterConfig/IncrementalAlterConfig/CreateTopics/AlterClientQuota should be 
redirected to the active controller. This ticket will ensure those RPCs get 
redirected.  (was: In the bridge release broker, 
AlterConfig/IncrementalAlterConfig should be redirected to the active 
controller.)

> Create redirection template for configuration change RPCs
> -
>
> Key: KAFKA-10181
> URL: https://issues.apache.org/jira/browse/KAFKA-10181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the bridge release broker, 
> AlterConfig/IncrementalAlterConfig/CreateTopics/AlterClientQuota should be 
> redirected to the active controller. This ticket will ensure those RPCs get 
> redirected.





[jira] [Updated] (KAFKA-10181) Create redirection template for configuration change RPCs

2020-09-21 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10181:

Summary: Create redirection template for configuration change RPCs  (was: 
Redirect AlterConfig/IncrementalAlterConfig to the controller)

> Create redirection template for configuration change RPCs
> -
>
> Key: KAFKA-10181
> URL: https://issues.apache.org/jira/browse/KAFKA-10181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the bridge release broker, AlterConfig/IncrementalAlterConfig should be 
> redirected to the active controller.





[jira] [Commented] (KAFKA-10475) Using same key reports different count of records for groupBy() and groupByKey() in Kafka Streaming Application

2020-09-21 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199689#comment-17199689
 ] 

Boyang Chen commented on KAFKA-10475:
-

[~saad-rasool] [~guozhang] I don't think we have enough information to 
reproduce this issue. Could you give us a sample setup (the application code, 
the input data, and the expected output) so that we can verify there is indeed 
a problem in groupByKey?
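One dependency-free way to frame the suspicion (a sketch under assumptions, not a diagnosis): groupBy() always routes records through a repartition topic keyed by the newly built key, while groupByKey() can reuse the existing partitioning, so any byte-level difference between the rebuilt key and the original key changes which partition sees the record. The toy partitioner below stands in for Kafka's murmur2-based default:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: a toy partitioner standing in for Kafka's murmur2-based default.
// If the rebuilt key is byte-identical to the original, both paths target
// the same partition; a single stray character sends the record elsewhere.
public class KeyRouting {
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(bytes), numPartitions);
    }
}
```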

> Using same key reports different count of records for groupBy() and 
> groupByKey() in Kafka Streaming Application
> ---
>
> Key: KAFKA-10475
> URL: https://issues.apache.org/jira/browse/KAFKA-10475
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
> Environment: Kafka Cluster:
> Kafka Version: kafka_2.12-2.6.0/
> openjdk version "1.8.0_265"
> Kafka Streams:
> Kafka Streams Version: 2.3.0
> openjdk version "11.0.8"
>Reporter: Saad Rasool
>Assignee: Divya Guduru
>Priority: Major
>
>  
> We are experiencing what amounts to “lost packets” in our stream processing 
> when we use custom groupByKey() values. We have a single processor node, with 
> a source topic from which we read packets, do a grouping and aggregation on 
> that group, and output based on a computation that requires access to a 
> statestore.
>  
> Let me give more details of the problem and how we have tried to understand 
> it so far, below:
> *Overview* We are setting up a Kafka Streams application in which we have to 
> perform windowed operations. We are grouping devices based on a specific key. 
> Following are the sample columns we are using for GroupBy:
>  
> ||Field Name ||Field Value||
> |A|12|
> |B|abc|
> |C|x13|
>  
> Sample Key based on the above data: 12abcx13 where key = Field (A) + Field 
> (B) + Field (C)
> *Problem* We get a different count of records in two scenarios against the 
> same key:
> # When specifying the key ourselves using groupBy()
> # When using groupByKey() to group the data on the 'Input Kafka Topic' 
> partitioning key
> *Description* We were first using the groupBy() function of Kafka Streams to 
> group the devices using the key above. In this case, the streams application 
> dropped several records and produced fewer records than expected. However, 
> when we did not specify our own custom grouping using the groupBy() function, 
> and instead used groupByKey() to key the data on the original incoming Kafka 
> partition key, we got the exact number of records that were expected.
> To check that we were using the exact same keys as the input topic for our 
> custom groupBy() function, we compared both keys within the code. The input 
> topic key and the custom key were exactly the same.
> We have therefore concluded that there is some internal behavior of the 
> groupBy() function that we do not understand, which causes groupBy() and 
> groupByKey() to report different counts for the same key. We have searched 
> multiple forums but could not find the reason for this phenomenon.
> *Code Snippet:*
> With groupByKey():
>   
> {code:java}
> KStream myStream = this.stream
> .groupByKey()
> .windowedBy(TimeWindows.of(Duration.ofMinutes(windowTime)).advanceBy(Duration.ofMinutes(advanceByTime)).grace(Duration.ofSeconds(gracePeriod)))
> .reduce((value1, value2) -> value2)
> .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
> .toStream()
> .transform(new myTransformer(this.store.name()), this.store.name());{code}
>  
>   
> With groupBy():
>   
> {code:java}
> KStream myStream = this.stream
> .groupBy((key, value) -> value.A + value.B + value.C, 
> Grouped.with(Serdes.String(), SerdesCollection.getInputSerdes()))
> .windowedBy(TimeWindows.of(Duration.ofMinutes(windowTime)).advanceBy(Duration.ofMinutes(advanceByTime)).grace(Duration.ofSeconds(gracePeriod)))
> .reduce((value1, value2) -> value2)
> .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
> .toStream()
> .transform(new myTransformer(this.store.name()), this.store.name());{code}
>  
>   
> ||*Kafka Cluster Setup*||Value||
> |Number of Nodes|3|
> |CPU Cores|2|
> |RAM|8 GB|
>  
> ||*Streaming Application Setup*||Version||
> |Kafka Streams Version|2.3.0|
> |openjdk version|11.0.8|





[jira] [Commented] (KAFKA-10504) It will not work to skip to InitProducerId as lastError is always null

2020-09-21 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199715#comment-17199715
 ] 

Boyang Chen commented on KAFKA-10504:
-

Is this optimization necessary?

> It will not work to skip to InitProducerId as lastError is always null
> --
>
> Key: KAFKA-10504
> URL: https://issues.apache.org/jira/browse/KAFKA-10504
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: HaiyuanZhao
>Assignee: HaiyuanZhao
>Priority: Major
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> KAFKA-8805 introduced an optimization for the txn abort process: if the last 
> error is an INVALID_PRODUCER_ID_MAPPING error, skip directly to 
> InitProducerId.
> However, this optimization will not work, as the variable lastError is 
> always null: the txn state transitions from ABORTABLE_ERROR to 
> ABORTING_TRANSACTION when beginAbort is called, and lastError is reset to 
> null. As a result, EndTxn is always called before InitProducerId.
>  
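A minimal stdlib-only sketch of the dead branch described above (the class and method names are illustrative, mirroring the reported TransactionManager behavior rather than its actual code):

```java
// Illustrative sketch, not the real TransactionManager.
public class TxnSkipSketch {
    enum State { ABORTABLE_ERROR, ABORTING_TRANSACTION }

    State state = State.ABORTABLE_ERROR;
    String lastError = "INVALID_PRODUCER_ID_MAPPING";

    void beginAbort() {
        // As reported: the transition to ABORTING_TRANSACTION resets lastError.
        state = State.ABORTING_TRANSACTION;
        lastError = null;
    }

    boolean canSkipToInitProducerId() {
        // Checked only after beginAbort(), so lastError is always null here
        // and the skip path is never taken.
        return "INVALID_PRODUCER_ID_MAPPING".equals(lastError);
    }
}
```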





[jira] [Commented] (KAFKA-10513) Newly added topic or partitions are not assigned to running consumer groups using static membership

2020-09-28 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203342#comment-17203342
 ] 

Boyang Chen commented on KAFKA-10513:
-

My understanding is that the consumer is supposed to check the topic partition 
metadata and rejoin as necessary when the partition count changes?

> Newly added topic or partitions are not assigned to running consumer groups 
> using static membership
> ---
>
> Key: KAFKA-10513
> URL: https://issues.apache.org/jira/browse/KAFKA-10513
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.6.0
>Reporter: Marlon Ou
>Priority: Major
>
> If consumers are polling messages from a certain topic with static membership 
> and we add new partitions to this topic while the consumers are running, no 
> partition reassignment is ever triggered (and hence messages published into 
> the new partitions are never consumed). 
> To reproduce, simply set group instance IDs on the consumers: 
> {code:java}
> props.setProperty("group.instance.id", instanceId);
> {code}
> And then while the static consumers are running, use Kafka's admin client to 
> add more partitions to the topic:
> {code:java}
> adminClient.createPartitions(...)
> {code}
>  





[jira] [Assigned] (KAFKA-9878) Block AddPartitionsToTxn call until the txn markers are committed

2020-05-27 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-9878:
--

Assignee: HaiyuanZhao

> Block AddPartitionsToTxn call until the txn markers are committed
> -
>
> Key: KAFKA-9878
> URL: https://issues.apache.org/jira/browse/KAFKA-9878
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Boyang Chen
>Assignee: HaiyuanZhao
>Priority: Major
>
> Currently the EndTxn call from the producer returns immediately once the 
> control record is written to the txn coordinator log. The ongoing 
> transaction then enters a pending state while it waits for all txn markers 
> to be propagated. In the meantime, the producer client starts another 
> transaction but is rejected repeatedly until the pending state is resolved, 
> which causes unnecessary round trips and extra burden on the broker from 
> handling repetitive requests.
> To avoid this situation, we should make the producer client wait for txn 
> marker completion instead. This should yield better performance overall, as 
> no more back-off would be triggered for a subsequent transaction to begin.
> On the other hand, we could also batch-complete the AddPartitionsToTxn 
> results if we have buffered more than one request in the queue.
> The third change is on the client side: maintain the futures of the 
> AddPartitionsToTxn calls to make more inflight changes as necessary.





[jira] [Updated] (KAFKA-10038) ConsumerPerformance.scala supports the setting of client.id

2020-05-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10038:

Component/s: consumer

> ConsumerPerformance.scala supports the setting of client.id
> ---
>
> Key: KAFKA-10038
> URL: https://issues.apache.org/jira/browse/KAFKA-10038
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, core
>Affects Versions: 2.1.1
> Environment: Trunk branch
>Reporter: tigertan
>Priority: Minor
>  Labels: performance
> Fix For: 2.6.0
>
>
> ConsumerPerformance.scala should support setting "client.id", which is a 
> reasonable requirement, and would unify the way the console consumer and 
> console producer handle "client.id". "client.id" defaults to 
> "perf-consumer-client".
> We often use client.id in quotas; if kafka-consumer-perf-test.sh supported 
> setting "client.id", we could do quota testing through scripts without 
> writing our own consumer programs.
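For reference, a hypothetical consumer config fragment of the kind this would enable (the client id value is just an example chosen to match a quota entity):

```properties
# consumer.properties (example): tie perf-test traffic to a quota-managed client id
client.id=quota-test-consumer
```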





[jira] [Updated] (KAFKA-10038) ConsumerPerformance.scala supports the setting of client.id

2020-05-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10038:

Labels: newbie performance  (was: performance)

> ConsumerPerformance.scala supports the setting of client.id
> ---
>
> Key: KAFKA-10038
> URL: https://issues.apache.org/jira/browse/KAFKA-10038
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, core
>Affects Versions: 2.1.1
> Environment: Trunk branch
>Reporter: tigertan
>Priority: Minor
>  Labels: newbie, performance
> Fix For: 2.6.0
>
>
> ConsumerPerformance.scala should support setting "client.id", which is a 
> reasonable requirement, and would unify the way the console consumer and 
> console producer handle "client.id". "client.id" defaults to 
> "perf-consumer-client".
> We often use client.id in quotas; if kafka-consumer-perf-test.sh supported 
> setting "client.id", we could do quota testing through scripts without 
> writing our own consumer programs.





[jira] [Updated] (KAFKA-9878) Block AddPartitionsToTxn call until the txn markers are committed

2020-05-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9878:
---
Component/s: producer 
 core

> Block AddPartitionsToTxn call until the txn markers are committed
> -
>
> Key: KAFKA-9878
> URL: https://issues.apache.org/jira/browse/KAFKA-9878
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Boyang Chen
>Priority: Major
>
> Currently the EndTxn call from the producer returns immediately once the 
> control record is written to the txn coordinator log. The ongoing 
> transaction then enters a pending state while it waits for all txn markers 
> to be propagated. In the meantime, the producer client starts another 
> transaction but is rejected repeatedly until the pending state is resolved, 
> which causes unnecessary round trips and extra burden on the broker from 
> handling repetitive requests.
> To avoid this situation, we should make the producer client wait for txn 
> marker completion instead. This should yield better performance overall, as 
> no more back-off would be triggered for a subsequent transaction to begin.
> On the other hand, we could also batch-complete the AddPartitionsToTxn 
> results if we have buffered more than one request in the queue.
> The third change is on the client side: maintain the futures of the 
> AddPartitionsToTxn calls to make more inflight changes as necessary.





[jira] [Updated] (KAFKA-9146) Add option to force delete members in stream reset tool

2020-05-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9146:
---
Fix Version/s: 2.6.0

> Add option to force delete members in stream reset tool
> ---
>
> Key: KAFKA-9146
> URL: https://issues.apache.org/jira/browse/KAFKA-9146
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Reporter: Boyang Chen
>Assignee: feyman
>Priority: Major
>  Labels: newbie
> Fix For: 2.6.0
>
>
> Sometimes people want to reset the stream application sooner, but are 
> blocked by the left-over members inside the group coordinator, which only 
> expire after the session timeout. When a user configures a really long 
> session timeout, it can prevent the group from clearing. We should consider 
> adding support for cleaning up members by forcing them to leave the group. 
> To do that, 
>  # If the stream application is already on static membership, we can call 
> adminClient.removeMembersFromGroup directly.
>  # If the application is on dynamic membership, we should modify the 
> adminClient.removeMembersFromGroup interface to allow deletion based on 
> member.id.





[jira] [Created] (KAFKA-10042) Make INVALID_PRODUCER_EPOCH abortable from Produce response

2020-05-25 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10042:
---

 Summary: Make INVALID_PRODUCER_EPOCH abortable from Produce 
response
 Key: KAFKA-10042
 URL: https://issues.apache.org/jira/browse/KAFKA-10042
 Project: Kafka
  Issue Type: Sub-task
  Components: producer 
Reporter: Boyang Chen








[jira] [Updated] (KAFKA-9561) Update task input partitions when topic metadata changes

2020-05-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9561:
---
Fix Version/s: 2.6.0

> Update task input partitions when topic metadata changes
> 
>
> Key: KAFKA-9561
> URL: https://issues.apache.org/jira/browse/KAFKA-9561
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Stepanenko Vyacheslav
>Priority: Major
>  Labels: beginner, newbie, stream
> Fix For: 2.6.0
>
>
> With https://issues.apache.org/jira/browse/KAFKA-9545, we exposed the 
> possibility that a task stays alive throughout a rebalance while its input 
> partitions actually change. For example, a regex-subscribed source could 
> match different topics when partitions are added or removed. We should 
> consider adding support to expand/shrink partitions across rebalances to 
> keep task information consistent with subscription data.





[jira] [Created] (KAFKA-10064) Add documentation for KIP-571

2020-05-28 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10064:
---

 Summary: Add documentation for KIP-571
 Key: KAFKA-10064
 URL: https://issues.apache.org/jira/browse/KAFKA-10064
 Project: Kafka
  Issue Type: Task
  Components: docs, streams
Reporter: Boyang Chen
Assignee: feyman
 Fix For: 2.6.0


We need to add documentation for KIP-571, similar to what other KIPs going out 
in 2.6 have done: [https://github.com/apache/kafka/pull/8621]

 

[~feyman] I'm assigning this to you for now; let me know if there is anything 
missing for context. Here are the instructions to set up the Apache wiki 
locally: 
[https://cwiki.apache.org/confluence/display/KAFKA/Setup+Kafka+Website+on+Local+Apache+Server]





[jira] [Updated] (KAFKA-10055) Add shared internal utility class for Streams module

2020-05-29 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10055:

Summary: Add shared internal utility class for Streams module  (was: Add 
shared internal utility class)

> Add shared internal utility class for Streams module
> 
>
> Key: KAFKA-10055
> URL: https://issues.apache.org/jira/browse/KAFKA-10055
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Minor
>  Labels: beginner, newbie
>
> Throughout the code base we added small helper methods (usually static) when 
> needed. Those helpers are often added to classes they don't really belong to 
> (out of convenience) and may lead to code duplication.
> We should add a shared utility class that unifies all those helper methods 
> into a single place.
> For example `StreamsThread#eosEnabled`. Please update this ticket description 
> to add more such methods (or just leave a comment).
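A sketch of what such a shared utility class could look like (the class name and the eosEnabled logic are assumptions based on the Streams processing-guarantee config values, not an actual implementation):

```java
// Hypothetical shared utility class for the Streams module.
public final class StreamsUtils {
    private StreamsUtils() {} // static helpers only, no instances

    // Stand-in for the eosEnabled helper mentioned above: both
    // "exactly_once" and "exactly_once_beta" enable EOS processing.
    public static boolean eosEnabled(String processingGuarantee) {
        return processingGuarantee.startsWith("exactly_once");
    }
}
```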





[jira] [Commented] (KAFKA-9891) Invalid state store content after task migration with exactly_once and standby replicas

2020-05-29 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119899#comment-17119899
 ] 

Boyang Chen commented on KAFKA-9891:


[~mateuszjadczyk] I ran your example again and could see that the assignment 
scenario reproduced with > 1 topic partition. The part I'm confused about is 
that you used 0 ms as the offset commit interval, as this would actually make 
the poison record very likely to be committed and replicated. Could you 
elaborate?

> Invalid state store content after task migration with exactly_once and 
> standby replicas
> ---
>
> Key: KAFKA-9891
> URL: https://issues.apache.org/jira/browse/KAFKA-9891
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.4.1
>Reporter: Mateusz Jadczyk
>Assignee: Boyang Chen
>Priority: Blocker
> Attachments: failedtest, failedtest2, failedtest3, failedtest3_bug, 
> state_store_operations.txt, tasks_assignment.txt
>
>
> We have a simple command id deduplication mechanism (very similar to the one 
> from Kafka Streams examples) based on Kafka Streams State Stores. It stores 
> command ids from the past hour in _persistentWindowStore_. We encountered a 
> problem with the store if there's an exception thrown later in that topology.
>  We run 3 nodes using docker, each with multiple threads set for this 
> particular Streams Application.
> The business flow is as follows (performed within a single subtopology):
>  *  a valid command is sent with command id 
> (_mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f_). NODE 1 is running an active 
> task 1_2. First node in the topology analyses if this is a duplicate by 
> checking in the state store (_COMMAND_ID_STORE_), if not puts the command id 
> in the state store and processes the command properly.
>  * an invalid command is sent with the same key but new command id 
> (_mnl_cmd_9f1752da-45b7-4ef7-9ef8-209d826530bc_). Again, check for the 
> duplicated command id is performed, it's not a duplicate, command id is put 
> into the state store. Next node in the topology throws an exception which 
> causes an error on NODE 1 for task 1_2. As a result, transaction is aborted, 
> offsets are not committed. I double checked for the changelog topic - 
> relevant messages are not committed. Therefore, the changelog topic contains 
> only the first command id _mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f,_ and 
> not the one which caused a failure.
>  * in the meantime a standby task 1_2 running on NODE 3 replicated 
> _mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f_ command id into a local 
> _COMMAND_ID_STORE_
>  * standby task 1_2 on NODE 3 Thread-2 takes over the task as an active one. 
> It checks if this command id is a duplicate - no, it isn't - tries to process 
> the faulty command and throws an exception. Again, transaction aborted, all 
> looks fine.
>  * NODE 3 Thread-1 takes over. It checks for the duplicate. To our surprise, 
> *it is a duplicate!* Even though the transaction has been aborted and the 
> changelog doesn't contain this command id: 
> _mnl_cmd_9f1752da-45b7-4ef7-9ef8-209d826530bc._
>  
> After digging into the Streams logs and some discussion on ([Stack 
> Overflow|https://stackoverflow.com/questions/61247789/invalid-state-store-content-after-aborted-transaction-with-exactly-once-and-stan])
>  we concluded it has something to do with checkpoint files. Here are the 
> detailed logs relevant to checkpoint files.
>  
> {code:java}
> NODE_3 2020-04-15 21:06:14.470 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-2] task 
> [1_2] Checkpointable offsets read from checkpoint: {}
> NODE_3 2020-04-15 21:06:19.413 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-2] task 
> [1_2] Restoring state store COMMAND_ID_STORE from changelog topic 
> Processor-COMMAND_ID_STORE-changelog at checkpoint null
> NODE_3 2020-04-15 21:06:28.470 TRACE 1 --- [-StreamThread-1] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-1] 
> standby-task [1_2] Checkpointable offsets read from checkpoint: {}
> NODE_3 2020-04-15 21:06:29.634 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.s.internals.OffsetCheckpoint : Writing tmp checkpoint file 
> /tmp/kafka-streams/Processor/1_2/.checkpoint.tmp
> NODE_3 2020-04-15 21:06:29.640 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.s.internals.OffsetCheckpoint : Swapping tmp checkpoint file 
> /tmp/kafka-streams/Processor/1_2/.checkpoint.tmp 
> /tmp/kafka-streams/Processor/1_2/.checkpoint
> NODE_3 2020-04-15 

[jira] [Resolved] (KAFKA-10010) Should make state store registration idempotent

2020-06-01 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-10010.
-
Resolution: Fixed

> Should make state store registration idempotent
> ---
>
> Key: KAFKA-10010
> URL: https://issues.apache.org/jira/browse/KAFKA-10010
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> The current handleLostAll logic doesn't close the standby task, which could 
> potentially lead to a tricky condition like the one below:
> 1. The standby task was initializing in `CREATED` state, and a task corrupted 
> exception was thrown from registerStateStores
> 2. The task corrupted exception was caught, and a non-affected task commit was 
> performed
> 3. The task commit failed due to a task migrated exception
> 4. The handleLostAll didn't close the standby task, leaving it in CREATED 
> state
> 5. After the next rebalance completed, the same task was assigned back as a 
> standby task.
> 6. An IllegalArgumentException was caught:
> {code:java}
> [2020-05-16T11:56:18-07:00] 
> (streams-soak-trunk-eos-beta_soak_i-065b27929d3e7014a_streamslog) [2020-05-16 
> 18:56:18,050] ERROR 
> [stream-soak-test-ce5e4abb-24f5-48cd-9608-21c59b6b42a3-StreamThread-1] 
> stream-thread 
> [stream-soak-test-ce5e4abb-24f5-48cd-9608-21c59b6b42a3-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-05-16T11:56:18-07:00] 
> (streams-soak-trunk-eos-beta_soak_i-065b27929d3e7014a_streamslog) 
> java.lang.IllegalArgumentException: stream-thread 
> [stream-soak-test-ce5e4abb-24f5-48cd-9608-21c59b6b42a3-StreamThread-1] 
> standby-task [1_2] Store KSTREAM-AGGREGATE-STATE-STORE-07 has already 
> been registered.
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStore(ProcessorStateManager.java:269)
>         at 
> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:112)
>         at 
> org.apache.kafka.streams.state.internals.AbstractRocksDBSegmentedBytesStore.init(AbstractRocksDBSegmentedBytesStore.java:191)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBWindowStore.init(RocksDBWindowStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.ChangeLoggingWindowBytesStore.init(ChangeLoggingWindowBytesStore.java:54)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.CachingWindowStore.init(CachingWindowStore.java:74)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>         at 
> org.apache.kafka.streams.state.internals.MeteredWindowStore.lambda$init$0(MeteredWindowStore.java:85)
>         at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:804)
>         at 
> org.apache.kafka.streams.state.internals.MeteredWindowStore.init(MeteredWindowStore.java:85)
>         at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:82)
>         at 
> org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:89)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:358)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:664)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:550)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:509)
> {code}
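A minimal sketch of what the ticket title asks for, with hypothetical names (the real code lives in ProcessorStateManager.registerStore): re-registering a store under the same name becomes an idempotent no-op instead of throwing, so a CREATED standby task that is re-initialized after a failed commit cannot hit the "has already been registered" error above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, not the actual Kafka Streams class.
class StateRegistry {
    private final Map<String, Object> stores = new HashMap<>();

    /** Returns true if newly registered, false if the name was already present. */
    boolean registerStore(String name, Object store) {
        if (stores.containsKey(name)) {
            return false; // duplicate registration becomes an idempotent no-op
        }
        stores.put(name, store);
        return true;
    }
}
```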



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10011) lockedTaskDirectories should be cleared when task gets closed dirty in HandleLostAll

2020-06-01 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-10011.
-
Resolution: Fixed

> lockedTaskDirectories should be cleared when task gets closed dirty in 
> HandleLostAll
> 
>
> Key: KAFKA-10011
> URL: https://issues.apache.org/jira/browse/KAFKA-10011
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Tasks that get closed in handleLostAll don't clear out their entries in 
> lockedTaskDirectories, which causes an illegal state afterwards:
> {code:java}
> [2020-05-17T06:21:54-07:00] 
> (streams-soak-trunk-eos-beta_soak_i-0b021dbf00474b6aa_streamslog) [2020-05-17 
> 13:21:54,127] ERROR 
> [stream-soak-test-150cf9ae-793b-4aac-bea0-0fb61d228b39-StreamThread-3] 
> stream-thread 
> [stream-soak-test-150cf9ae-793b-4aac-bea0-0fb61d228b39-StreamThread-3] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-05-17T06:21:54-07:00] 
> (streams-soak-trunk-eos-beta_soak_i-0b021dbf00474b6aa_streamslog) 
> org.apache.kafka.streams.errors.ProcessorStateException: task directory 
> [/mnt/run/streams/state/stream-soak-test/3_1] doesn't exist and couldn't be 
> created
>         at 
> org.apache.kafka.streams.processor.internals.StateDirectory.directoryForTask(StateDirectory.java:112)
>         at 
> org.apache.kafka.streams.processor.internals.StateDirectory.checkpointFileFor(StateDirectory.java:121)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.getTaskOffsetSums(TaskManager.java:498)
>         at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.subscriptionUserData(StreamsPartitionAssignor.java:239)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.metadata(ConsumerCoordinator.java:222)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest(AbstractCoordinator.java:560)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.initiateJoinGroup(AbstractCoordinator.java:495)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:417)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:506)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1265)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:770)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:630)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:550)
> {code}
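A minimal sketch, with hypothetical names, of the cleanup the ticket asks for: when handleLostAll closes a task dirty, it must also drop the task from the locked-directory bookkeeping, so a later getTaskOffsetSums walk cannot reach a task directory that no longer exists.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual TaskManager implementation.
class TaskManagerSketch {
    final Set<String> lockedTaskDirectories = new HashSet<>();

    void lockTaskDirectory(String taskId) {
        lockedTaskDirectories.add(taskId);
    }

    void handleLostAll(Set<String> lostTasks) {
        for (String taskId : lostTasks) {
            closeDirty(taskId);
            lockedTaskDirectories.remove(taskId); // the missing cleanup step
        }
    }

    private void closeDirty(String taskId) {
        // placeholder: release state stores and delete the local task directory
    }
}
```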





[jira] [Commented] (KAFKA-9891) Invalid state store content after task migration with exactly_once and standby replicas

2020-06-01 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121149#comment-17121149
 ] 

Boyang Chen commented on KAFKA-9891:


[~mateuszjadczyk] If we commit keyOne, then it *should* be materialized inside 
the standby state store copy, and the duplicate key check will see it, correct? 
Will sync with [~guozhang] on whether to backport this checkpoint lock to 2.4.
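The deduplication mechanism described in the quoted report can be sketched as follows, with hypothetical names (the real implementation keeps command ids in a persistentWindowStore backed by a changelog topic): process only when the id is absent, and record it on success.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the COMMAND_ID_STORE window store.
class CommandDeduplicator {
    private final Map<String, Long> commandIdStore = new HashMap<>();

    /** Returns true if the command was processed, false if it was a duplicate. */
    boolean process(String commandId, long timestampMs) {
        if (commandIdStore.containsKey(commandId)) {
            return false; // duplicate: skip processing
        }
        commandIdStore.put(commandId, timestampMs);
        return true;
    }
}
```

The bug report hinges on this check: an aborted transaction must not leave the command id visible to the next check, which is why the checkpoint/restore behavior matters.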

> Invalid state store content after task migration with exactly_once and 
> standby replicas
> ---
>
> Key: KAFKA-9891
> URL: https://issues.apache.org/jira/browse/KAFKA-9891
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.4.1
>Reporter: Mateusz Jadczyk
>Assignee: Boyang Chen
>Priority: Blocker
> Attachments: failedtest, failedtest2, failedtest3, failedtest3_bug, 
> state_store_operations.txt, tasks_assignment.txt
>
>
> We have a simple command id deduplication mechanism (very similar to the one 
> from Kafka Streams examples) based on Kafka Streams State Stores. It stores 
> command ids from the past hour in _persistentWindowStore_. We encountered a 
> problem with the store if there's an exception thrown later in that topology.
>  We run 3 nodes using docker, each with multiple threads set for this 
> particular Streams Application.
> The business flow is as follows (performed within a single subtopology):
>  *  a valid command is sent with command id 
> (_mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f_). NODE 1 is running an active 
> task 1_2. First node in the topology analyses if this is a duplicate by 
> checking in the state store (_COMMAND_ID_STORE_), if not puts the command id 
> in the state store and processes the command properly.
>  * an invalid command is sent with the same key but new command id 
> (_mnl_cmd_9f1752da-45b7-4ef7-9ef8-209d826530bc_). Again, check for the 
> duplicated command id is performed, it's not a duplicate, command id is put 
> into the state store. Next node in the topology throws an exception which 
> causes an error on NODE 1 for task 1_2. As a result, transaction is aborted, 
> offsets are not committed. I double checked for the changelog topic - 
> relevant messages are not committed. Therefore, the changelog topic contains 
> only the first command id _mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f,_ and 
> not the one which caused a failure.
>  * in the meantime a standby task 1_2 running on NODE 3 replicated 
> _mnl_cmd_31bad7e5-35e7-490e-89f0-616fe967111f_ command id into a local 
> _COMMAND_ID_STORE_
>  * standby task 1_2 on NODE 3 Thread-2 takes over the task as an active one. 
> It checks if this command id is a duplicate - no, it isn't - tries to process 
> the faulty command and throws an exception. Again, transaction aborted, all 
> looks fine.
>  * NODE 3 Thread-1 takes over. It checks for the duplicate. To our surprise, 
> *it is a duplicate!* Even though the transaction has been aborted and the 
> changelog doesn't contain this command id: 
> _mnl_cmd_9f1752da-45b7-4ef7-9ef8-209d826530bc._
>  
> After digging into the Streams logs and some discussion on ([Stack 
> Overflow|https://stackoverflow.com/questions/61247789/invalid-state-store-content-after-aborted-transaction-with-exactly-once-and-stan])
>  we concluded it has something to do with checkpoint files. Here are the 
> detailed logs relevant to checkpoint files.
>  
> {code:java}
> NODE_3 2020-04-15 21:06:14.470 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-2] task 
> [1_2] Checkpointable offsets read from checkpoint: {}
> NODE_3 2020-04-15 21:06:19.413 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-2] task 
> [1_2] Restoring state store COMMAND_ID_STORE from changelog topic 
> Processor-COMMAND_ID_STORE-changelog at checkpoint null
> NODE_3 2020-04-15 21:06:28.470 TRACE 1 --- [-StreamThread-1] 
> o.a.k.s.p.i.ProcessorStateManager : stream-thread 
> [Processor-94f7be8e-beec-411f-b4ec-9031527bccdf-StreamThread-1] 
> standby-task [1_2] Checkpointable offsets read from checkpoint: {}
> NODE_3 2020-04-15 21:06:29.634 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.s.internals.OffsetCheckpoint : Writing tmp checkpoint file 
> /tmp/kafka-streams/Processor/1_2/.checkpoint.tmp
> NODE_3 2020-04-15 21:06:29.640 TRACE 1 --- [-StreamThread-2] 
> o.a.k.s.s.internals.OffsetCheckpoint : Swapping tmp checkpoint file 
> /tmp/kafka-streams/Processor/1_2/.checkpoint.tmp 
> /tmp/kafka-streams/Processor/1_2/.checkpoint
> NODE_3 2020-04-15 21:11:15.909 TRACE 1 --- [-StreamThread-1] 
> 

[jira] [Updated] (KAFKA-9910) Implement new transaction timed out error

2020-09-19 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9910:
---
Component/s: core
 clients

> Implement new transaction timed out error
> -
>
> Key: KAFKA-9910
> URL: https://issues.apache.org/jira/browse/KAFKA-9910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core
>Reporter: Boyang Chen
>Assignee: HaiyuanZhao
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Updated] (KAFKA-9910) Implement new transaction timed out error

2020-09-19 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9910:
---
Fix Version/s: 2.7.0

> Implement new transaction timed out error
> -
>
> Key: KAFKA-9910
> URL: https://issues.apache.org/jira/browse/KAFKA-9910
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: HaiyuanZhao
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Resolved] (KAFKA-10270) Add a broker to controller channel manager to redirect AlterConfig

2020-07-29 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-10270.
-
Resolution: Fixed

> Add a broker to controller channel manager to redirect AlterConfig
> --
>
> Key: KAFKA-10270
> URL: https://issues.apache.org/jira/browse/KAFKA-10270
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Per the KIP-590 requirement, we need a dedicated communication channel 
> from the broker to the controller.





[jira] [Updated] (KAFKA-10181) Redirect AlterConfig/IncrementalAlterConfig to the controller

2020-07-29 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10181:

Summary: Redirect AlterConfig/IncrementalAlterConfig to the controller  
(was: AlterConfig/IncrementalAlterConfig should go to controller)

> Redirect AlterConfig/IncrementalAlterConfig to the controller
> -
>
> Key: KAFKA-10181
> URL: https://issues.apache.org/jira/browse/KAFKA-10181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the new Admin client, the request should always be routed towards the 
> controller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10270) Add a broker to controller channel manager to redirect AlterConfig

2020-07-29 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10270:

Fix Version/s: 2.7.0

> Add a broker to controller channel manager to redirect AlterConfig
> --
>
> Key: KAFKA-10270
> URL: https://issues.apache.org/jira/browse/KAFKA-10270
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> Per the KIP-590 requirement, we need a dedicated communication channel 
> from the broker to the controller.





[jira] [Created] (KAFKA-10350) Add redirect request monitoring metrics

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10350:
---

 Summary: Add redirect request monitoring metrics
 Key: KAFKA-10350
 URL: https://issues.apache.org/jira/browse/KAFKA-10350
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


We need to add a metric for monitoring redirection progress, as stated in the 
KIP:

MBean: kafka.server:type=RequestMetrics,name=NumRequestsForwardingToControllerPerSec,clientId=([-.\w]+)
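A hedged sketch, not the actual Kafka metrics code (the broker wires this into its RequestMetrics infrastructure): a minimal per-second rate counter in the spirit of NumRequestsForwardingToControllerPerSec.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a per-second forwarding rate meter.
class ForwardingRate {
    private final LongAdder forwarded = new LongAdder();
    private final long startNanos = System.nanoTime();

    void markForwarded() {
        forwarded.increment();
    }

    long total() {
        return forwarded.sum();
    }

    // Average forwarded requests per second since this meter was created.
    double perSec() {
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        return elapsedSec > 0 ? forwarded.sum() / elapsedSec : 0.0;
    }
}
```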





[jira] [Updated] (KAFKA-10181) Redirect AlterConfig/IncrementalAlterConfig to the controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10181:

Description: In the bridge release broker, 
AlterConfig/IncrementalAlterConfig should be redirected to the active 
controller.  (was: In the new Admin client, the 
AlterConfig/IncrementalAlterConfig request should be redirected to the active 
controller.)

> Redirect AlterConfig/IncrementalAlterConfig to the controller
> -
>
> Key: KAFKA-10181
> URL: https://issues.apache.org/jira/browse/KAFKA-10181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the bridge release broker, AlterConfig/IncrementalAlterConfig should be 
> redirected to the active controller.





[jira] [Updated] (KAFKA-10343) Redirect AlterClientQuotas to the controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10343:

Description: In the bridge release broker, AlterClientQuotas should be 
redirected to the active controller.  (was: In the new Admin client, the 
AlterClientQuotas request should be redirected to the active controller.)

> Redirect AlterClientQuotas to the controller
> 
>
> Key: KAFKA-10343
> URL: https://issues.apache.org/jira/browse/KAFKA-10343
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In the bridge release broker, AlterClientQuotas should be redirected to the 
> active controller.





[jira] [Updated] (KAFKA-10342) Redirect Create/DeleteAcls to the controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10342:

Description: In the bridge release broker, Create/DeleteAcls should be 
redirected to the active controller.  (was: In the new Admin client, the 
CreateAcls/DeleteAcls request should be redirected to the active controller.
In the bridge release broker, Create/Renew/ExpireDelegationToken should be 
redirected to the active controller.)

> Redirect Create/DeleteAcls to the controller
> 
>
> Key: KAFKA-10342
> URL: https://issues.apache.org/jira/browse/KAFKA-10342
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In the bridge release broker, Create/DeleteAcls should be redirected to the 
> active controller.





[jira] [Updated] (KAFKA-9705) Zookeeper mutation protocols should be redirected to Controller only

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9705:
---
Fix Version/s: 2.7.0

> Zookeeper mutation protocols should be redirected to Controller only
> 
>
> Key: KAFKA-9705
> URL: https://issues.apache.org/jira/browse/KAFKA-9705
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the bridge release, we need to restrict direct ZK access to the 
> controller only. This means the existing AlterConfig path should be migrated.





[jira] [Created] (KAFKA-10349) Deprecate client side controller access

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10349:
---

 Summary: Deprecate client side controller access
 Key: KAFKA-10349
 URL: https://issues.apache.org/jira/browse/KAFKA-10349
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


As stated in KIP-590, we will disallow the new admin client from discovering 
the controller location, for encapsulation. For communication with older 
brokers, the metadata response will still contain the controller location.





[jira] [Created] (KAFKA-10348) Redirect UpdateFeatures to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10348:
---

 Summary: Redirect UpdateFeatures to the controller
 Key: KAFKA-10348
 URL: https://issues.apache.org/jira/browse/KAFKA-10348
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the bridge release broker, the UpdateFeatures request should be redirected 
to the active controller instead of relying on admin client discovery.





[jira] [Created] (KAFKA-10347) Redirect Create/DeleteTopics to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10347:
---

 Summary: Redirect Create/DeleteTopics to the controller
 Key: KAFKA-10347
 URL: https://issues.apache.org/jira/browse/KAFKA-10347
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the bridge release broker, the Create/DeleteTopics requests should be 
redirected to the active controller instead of relying on admin client 
discovery.





[jira] [Created] (KAFKA-10345) Redirect AlterPartitionReassignment to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10345:
---

 Summary: Redirect AlterPartitionReassignment to the controller 
 Key: KAFKA-10345
 URL: https://issues.apache.org/jira/browse/KAFKA-10345
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the bridge release broker, the AlterPartitionReassignment request should be 
redirected to the active controller instead of relying on admin client 
discovery.





[jira] [Created] (KAFKA-10343) Redirect AlterClientQuotas to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10343:
---

 Summary: Redirect AlterClientQuotas to the controller
 Key: KAFKA-10343
 URL: https://issues.apache.org/jira/browse/KAFKA-10343
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the new Admin client, the AlterClientQuotas request should be redirected to 
the active controller.





[jira] [Created] (KAFKA-10344) Redirect Create/Renew/ExpireDelegationToken to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10344:
---

 Summary: Redirect Create/Renew/ExpireDelegationToken to the 
controller
 Key: KAFKA-10344
 URL: https://issues.apache.org/jira/browse/KAFKA-10344
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the bridge release broker, Create/Renew/ExpireDelegationToken should be 
redirected to the active controller.





[jira] [Updated] (KAFKA-10342) Redirect Create/DeleteAcls to the controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10342:

Summary: Redirect Create/DeleteAcls to the controller  (was: Redirect 
CreateAcls/DeleteAcls to the controller)

> Redirect Create/DeleteAcls to the controller
> 
>
> Key: KAFKA-10342
> URL: https://issues.apache.org/jira/browse/KAFKA-10342
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In the new Admin client, the CreateAcls/DeleteAcls request should be 
> redirected to the active controller.





[jira] [Updated] (KAFKA-10342) Redirect Create/DeleteAcls to the controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10342:

Description: 
In the new Admin client, the CreateAcls/DeleteAcls request should be redirected 
to the active controller.
In the bridge release broker, Create/Renew/ExpireDelegationToken should be 
redirected to the active controller.

  was:In the new Admin client, the CreateAcls/DeleteAcls request should be 
redirected to the active controller.


> Redirect Create/DeleteAcls to the controller
> 
>
> Key: KAFKA-10342
> URL: https://issues.apache.org/jira/browse/KAFKA-10342
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In the new Admin client, the CreateAcls/DeleteAcls request should be 
> redirected to the active controller.
> In the bridge release broker, Create/Renew/ExpireDelegationToken should be 
> redirected to the active controller.





[jira] [Updated] (KAFKA-9751) Internal topic creation should go to controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9751:
---
Summary: Internal topic creation should go to controller  (was: Admin 
`FindCoordinator` call should go to controller instead of ZK)

> Internal topic creation should go to controller
> ---
>
> Key: KAFKA-9751
> URL: https://issues.apache.org/jira/browse/KAFKA-9751
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In current trunk, we are still going to use ZK for topic creation in the 
> routing of FindCoordinatorRequest:
> {code:scala}
> val (partition, topicMetadata) = CoordinatorType.forId(findCoordinatorRequest.data.keyType) match {
>   case CoordinatorType.GROUP =>
>     val partition = groupCoordinator.partitionFor(findCoordinatorRequest.data.key)
>     val metadata = getOrCreateInternalTopic(GROUP_METADATA_TOPIC_NAME, request.context.listenerName)
>     (partition, metadata)
> {code}
> which should be migrated to controller handling.





[jira] [Updated] (KAFKA-9751) Internal topic creation should go to controller

2020-08-03 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9751:
---
Description: For use cases that create internal topics through 
FindCoordinator or Metadata requests, the receiving broker should route the 
topic creation request to the controller instead of handling it itself.  (was: 
In current trunk, we are still going to use ZK for topic creation in the 
routing of FindCoordinatorRequest:
{code:scala}
val (partition, topicMetadata) = CoordinatorType.forId(findCoordinatorRequest.data.keyType) match {
  case CoordinatorType.GROUP =>
    val partition = groupCoordinator.partitionFor(findCoordinatorRequest.data.key)
    val metadata = getOrCreateInternalTopic(GROUP_METADATA_TOPIC_NAME, request.context.listenerName)
    (partition, metadata)
{code}
which should be migrated to controller handling)

> Internal topic creation should go to controller
> ---
>
> Key: KAFKA-9751
> URL: https://issues.apache.org/jira/browse/KAFKA-9751
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> For use cases that create internal topics through FindCoordinator or Metadata 
> requests, the receiving broker should route the topic creation request to the 
> controller instead of handling it itself.
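A minimal sketch of the routing decision described above, with hypothetical names (the real broker forwards the RPC over the broker-to-controller channel): only the active controller creates the topic; any other receiving broker forwards the request.

```java
// Hypothetical router, not the actual KafkaApis code.
class TopicCreationRouter {
    private final boolean isController;

    TopicCreationRouter(boolean isController) {
        this.isController = isController;
    }

    // The controller handles creation itself; other brokers forward
    // the creation request to the active controller.
    String handleCreate(String topic) {
        if (isController) {
            return "created:" + topic;
        }
        return "forward-to-controller:" + topic;
    }
}
```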





[jira] [Created] (KAFKA-10342) Redirect CreateAcls/DeleteAcls to the controller

2020-08-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10342:
---

 Summary: Redirect CreateAcls/DeleteAcls to the controller
 Key: KAFKA-10342
 URL: https://issues.apache.org/jira/browse/KAFKA-10342
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


In the new Admin client, the CreateAcls/DeleteAcls request should be redirected 
to the active controller.





[jira] [Updated] (KAFKA-10349) Deprecate client side controller access

2020-08-06 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10349:

Description: 
As stated in KIP-590, we will disallow the new admin client from discovering 
the controller location, for encapsulation. For communication with older 
brokers, the metadata response will still contain the controller location.

Note that this should be done as the last step of the redirection changes, to 
ensure all the RPCs are migrated and can reach the controller in the new 
version.

  was:As stated in KIP-590, we would disallow new admin client to discover the 
controller location for encapsulation. For older broker communication, the 
metadata response will still contain the controller location.


> Deprecate client side controller access
> ---
>
> Key: KAFKA-10349
> URL: https://issues.apache.org/jira/browse/KAFKA-10349
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> As stated in KIP-590, we will disallow the new admin client from discovering 
> the controller location, for encapsulation. For communication with older 
> brokers, the metadata response will still contain the controller location.
> Note that this should be done as the last step of the redirection changes, to 
> ensure all the RPCs are migrated and can reach the controller in the new 
> version.





[jira] [Resolved] (KAFKA-9911) Implement new producer fenced error

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-9911.

Resolution: Fixed

> Implement new producer fenced error
> ---
>
> Key: KAFKA-9911
> URL: https://issues.apache.org/jira/browse/KAFKA-9911
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Updated] (KAFKA-9911) Implement new producer fenced error

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9911:
---
Fix Version/s: 2.7.0

> Implement new producer fenced error
> ---
>
> Key: KAFKA-9911
> URL: https://issues.apache.org/jira/browse/KAFKA-9911
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Assigned] (KAFKA-9911) Implement new producer fenced error

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-9911:
--

Assignee: Boyang Chen

> Implement new producer fenced error
> ---
>
> Key: KAFKA-9911
> URL: https://issues.apache.org/jira/browse/KAFKA-9911
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>






[jira] [Updated] (KAFKA-9911) Implement new producer fenced error

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9911:
---
Component/s: producer 

> Implement new producer fenced error
> ---
>
> Key: KAFKA-9911
> URL: https://issues.apache.org/jira/browse/KAFKA-9911
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Commented] (KAFKA-10354) Kafka Bug

2020-08-04 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171040#comment-17171040
 ] 

Boyang Chen commented on KAFKA-10354:
-

What's this ticket about?

> Kafka Bug
> -
>
> Key: KAFKA-10354
> URL: https://issues.apache.org/jira/browse/KAFKA-10354
> Project: Kafka
>  Issue Type: Bug
>Reporter: Eeranna Kuruva
>Priority: Minor
>






[jira] [Commented] (KAFKA-4587) Rethink Unification of Caching with Dedupping

2020-08-04 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170971#comment-17170971
 ] 

Boyang Chen commented on KAFKA-4587:


What's the status of this ticket? Do we still want to work on it? [~guozhang] 
[~mjsax]

> Rethink Unification of Caching with Dedupping
> -
>
> Key: KAFKA-4587
> URL: https://issues.apache.org/jira/browse/KAFKA-4587
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>
> This is discussed in PR https://github.com/apache/kafka/pull/1588
> In order to support user-customizable state store suppliers in the DSL we did 
> the following:
> 1) Introduced a {{TupleForwarder}} to forward tuples from cached stores that 
> wrap user-customized stores.
> 2) Narrowed the scope to only dedup on forwarding when the store is the 
> default CachingXXStore wrapping RocksDB.
> With this, the unification of dedupping and caching is less useful, and it 
> complicates the inner forwarding implementation considerably. We need to 
> reconsider this feature with finer-grained control for turning caching on / 
> off per store, potentially with explicit triggers.
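For context, the dedup-on-forward behavior that caching couples together can be modeled minimally as below. `CachingStore` and `forward` are illustrative stand-ins, not Kafka Streams classes:

```python
# Minimal model of forwarding dedup via a cache: updates are buffered per
# key, and only the latest value per key is forwarded downstream on flush.

class CachingStore:
    def __init__(self, forward):
        self._cache = {}        # key -> latest (dirty) value
        self._forward = forward

    def put(self, key, value):
        self._cache[key] = value  # overwriting dedups earlier updates

    def flush(self):
        for key, value in self._cache.items():
            self._forward(key, value)  # one record per key, not per update
        self._cache.clear()

forwarded = []
store = CachingStore(lambda k, v: forwarded.append((k, v)))
store.put("a", 1)
store.put("a", 2)  # supersedes the first update to "a"
store.put("b", 9)
store.flush()
print(forwarded)  # [('a', 2), ('b', 9)]
```

This illustrates why decoupling the two is attractive: dedup only happens when forwarding goes through the cache, which is exactly the coupling the ticket questions.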





[jira] [Assigned] (KAFKA-10042) Make INVALID_PRODUCER_EPOCH abortable from Produce response

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10042:
---

Assignee: Boyang Chen

> Make INVALID_PRODUCER_EPOCH abortable from Produce response
> ---
>
> Key: KAFKA-10042
> URL: https://issues.apache.org/jira/browse/KAFKA-10042
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Resolved] (KAFKA-10042) Make INVALID_PRODUCER_EPOCH abortable from Produce response

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-10042.
-
Resolution: Fixed

> Make INVALID_PRODUCER_EPOCH abortable from Produce response
> ---
>
> Key: KAFKA-10042
> URL: https://issues.apache.org/jira/browse/KAFKA-10042
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-12 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176639#comment-17176639
 ] 

Boyang Chen commented on KAFKA-10284:
-

I would say for all.

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: help-wanted
> Fix For: 2.6.1
>
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].
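A toy model of the bug (illustrative names only, not the actual group coordinator code): the static rejoin swaps the member.id in memory, nothing is written to the log, and a coordinator failover reloads the stale id:

```python
# Toy model: a static member's rejoin updates the in-memory group state but
# is never persisted, so a coordinator failover resurrects the old id.

class GroupCoordinator:
    def __init__(self):
        self.members = {}  # group.instance.id -> member.id (in memory)
        self.log = {}      # persisted copy, survives failover

    def join(self, instance_id, member_id):
        self.members[instance_id] = member_id
        self.log[instance_id] = member_id      # initial join is persisted

    def static_rejoin(self, instance_id, new_member_id):
        self.members[instance_id] = new_member_id
        # BUG (as described above): the update is not written to self.log

    def failover(self):
        self.members = dict(self.log)          # reload from persisted state

coord = GroupCoordinator()
coord.join("instance-1", "member-old")
coord.static_rejoin("instance-1", "member-new")
coord.failover()
print(coord.members["instance-1"])  # member-old: the zombie identity is back
```

The fix the ticket asks for corresponds to also writing the new member.id in `static_rejoin`, so the failover path reloads the current identity.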





[jira] [Updated] (KAFKA-9803) Allow producers to recover gracefully from transaction timeouts

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9803:
---
Fix Version/s: 2.7.0

> Allow producers to recover gracefully from transaction timeouts
> ---
>
> Key: KAFKA-9803
> URL: https://issues.apache.org/jira/browse/KAFKA-9803
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer , streams
>Reporter: Jason Gustafson
>Assignee: Boyang Chen
>Priority: Major
>  Labels: needs-kip
> Fix For: 2.7.0
>
>
> Transaction timeouts are detected by the transaction coordinator. When the 
> coordinator detects a timeout, it bumps the producer epoch and aborts the 
> transaction. The epoch bump is necessary in order to prevent the current 
> producer from being able to begin writing to a new transaction which was not 
> started through the coordinator.  
> Transactions may also be aborted if a new producer with the same 
> `transactional.id` starts up. Similarly this results in an epoch bump. 
> Currently the coordinator does not distinguish these two cases. Both will end 
> up as a `ProducerFencedException`, which means the producer needs to shut 
> itself down. 
> We can improve this with the new APIs from KIP-360. When the coordinator 
> times out a transaction, it can remember that fact and allow the existing 
> producer to claim the bumped epoch and continue. Roughly the logic would work 
> like this:
> 1. When a transaction times out, set lastProducerEpoch to the current epoch 
> and do the normal bump.
> 2. Any transactional requests from the old epoch result in a new 
> TRANSACTION_TIMED_OUT error code, which is propagated to the application.
> 3. The producer recovers by sending InitProducerId with the current epoch. 
> The coordinator returns the bumped epoch.
> One issue that needs to be addressed is how to handle INVALID_PRODUCER_EPOCH 
> from Produce requests. Partition leaders will not generally know if a bumped 
> epoch was the result of a timed out transaction or a fenced producer. 
> Possibly the producer can treat these errors as abortable when they come from 
> Produce responses. In that case, the user would try to abort the transaction 
> and then we can see if it was due to a timeout or otherwise.
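The three steps above can be sketched as a tiny state machine. The error names follow the description; everything else is an illustrative assumption rather than the real broker code:

```python
# Sketch of timeout recovery: remember the pre-bump epoch on a timeout,
# answer old-epoch requests with TRANSACTION_TIMED_OUT, and let the old
# producer reclaim the bumped epoch via InitProducerId.

class TxnCoordinator:
    def __init__(self, epoch=0):
        self.epoch = epoch
        self.last_producer_epoch = None  # set only by a timeout bump

    def timeout_transaction(self):
        # Step 1: remember the pre-bump epoch, then bump as usual.
        self.last_producer_epoch = self.epoch
        self.epoch += 1

    def handle_txn_request(self, producer_epoch):
        if producer_epoch == self.epoch:
            return "OK"
        if producer_epoch == self.last_producer_epoch:
            # Step 2: a distinguishable, recoverable error for the old epoch.
            return "TRANSACTION_TIMED_OUT"
        return "PRODUCER_FENCED"

    def init_producer_id(self, producer_epoch):
        # Step 3: the old producer claims the bumped epoch and continues.
        if producer_epoch == self.last_producer_epoch:
            self.last_producer_epoch = None
        return self.epoch  # simplified: no extra bump in this sketch

coord = TxnCoordinator(epoch=5)
coord.timeout_transaction()             # epoch 5 -> 6, remember 5
print(coord.handle_txn_request(5))      # TRANSACTION_TIMED_OUT
print(coord.init_producer_id(5))        # 6
print(coord.handle_txn_request(6))      # OK
```

A genuinely fenced producer (an epoch older than `last_producer_epoch`) still gets `PRODUCER_FENCED`, which is the distinction the ticket is after.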





[jira] [Updated] (KAFKA-10042) Make INVALID_PRODUCER_EPOCH abortable from Produce response

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10042:

Fix Version/s: 2.7.0

> Make INVALID_PRODUCER_EPOCH abortable from Produce response
> ---
>
> Key: KAFKA-10042
> URL: https://issues.apache.org/jira/browse/KAFKA-10042
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Assigned] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-12 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10284:
---

Assignee: Boyang Chen

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>  Labels: help-wanted
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].





[jira] [Comment Edited] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-12 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176639#comment-17176639
 ] 

Boyang Chen edited comment on KAFKA-10284 at 8/12/20, 10:48 PM:


I would say for all. Let me check if [~feyman] is interested in picking this up.


was (Author: bchen225242):
I would say for all.

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: help-wanted
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].





[jira] [Created] (KAFKA-10400) Add a customized Kafka Streams logo

2020-08-13 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10400:
---

 Summary: Add a customized Kafka Streams logo
 Key: KAFKA-10400
 URL: https://issues.apache.org/jira/browse/KAFKA-10400
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Boyang Chen








[jira] [Commented] (KAFKA-9007) Flaky Test kafka.api.SaslPlaintextConsumerTest.testCoordinatorFailover

2020-08-11 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175662#comment-17175662
 ] 

Boyang Chen commented on KAFKA-9007:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3773/testReport/junit/kafka.api/SaslSslConsumerTest/testCoordinatorFailover/]

 
h3. Error Message

java.lang.AssertionError: expected: but was:

h3. Stacktrace

java.lang.AssertionError: expected: but was:
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195)
	at kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302)
	at kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)

> Flaky Test kafka.api.SaslPlaintextConsumerTest.testCoordinatorFailover
> --
>
> Key: KAFKA-9007
> URL: https://issues.apache.org/jira/browse/KAFKA-9007
> Project: Kafka
>  Issue Type: Test
>  Components: core
>Reporter: Bill Bejeck
>Priority: Major
>  Labels: flaky-test
>
> Failed in 
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25644/testReport/junit/kafka.api/SaslPlaintextConsumerTest/testCoordinatorFailover/]
>  
> {noformat}
> Error Messagejava.lang.AssertionError: expected: but 
> was: cannot be completed since the group has already rebalanced and assigned the 
> partitions to another member. This means that the time between subsequent 
> calls to poll() was longer than the configured max.poll.interval.ms, which 
> typically implies that the poll loop is spending too much time message 
> processing. You can address this either by increasing max.poll.interval.ms or 
> by reducing the maximum size of batches returned in poll() with 
> max.poll.records.)>Stacktracejava.lang.AssertionError: expected: but 
> was: cannot be completed since the group has already rebalanced and assigned the 
> partitions to another member. This means that the time between subsequent 
> calls to poll() was longer than the configured max.poll.interval.ms, which 
> typically implies that the poll loop is spending too much time message 
> processing. You can address this either by increasing max.poll.interval.ms or 
> by reducing the maximum size of batches returned in poll() with 
> max.poll.records.)>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:120)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195)
>   at 
> kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302)
>   at 
> kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> 

[jira] [Updated] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-11 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10284:

Labels: help-wanted  (was: )

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: help-wanted
> Fix For: 2.6.1
>
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].





[jira] [Assigned] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-11 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10284:
---

Assignee: (was: Boyang Chen)

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Priority: Major
> Fix For: 2.6.1
>
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].





[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-11 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175669#comment-17175669
 ] 

Boyang Chen commented on KAFKA-10284:
-

Resigning from this ticket for now; others, feel free to pick it up.

> Group membership update due to static member rejoin should be persisted
> ---
>
> Key: KAFKA-10284
> URL: https://issues.apache.org/jira/browse/KAFKA-10284
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: help-wanted
> Fix For: 2.6.1
>
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and also serves as fencing in case something still uses 
> the old member.id.
> The bug is that we don't actually persist the membership update, so if no 
> rebalance is triggered afterwards, the new member.id is lost during group 
> coordinator immigration, which brings back the zombie member identity.
> Credit for finding the bug goes to [~hachikuji].





[jira] [Commented] (KAFKA-8033) Flaky Test PlaintextConsumerTest#testFetchInvalidOffset

2020-06-29 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148286#comment-17148286
 ] 

Boyang Chen commented on KAFKA-8033:


failed again: [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1290/]
testFetchInvalidOffset

kafka.api.PlaintextConsumerTest > testFetchInvalidOffset FAILED
    org.scalatest.exceptions.TestFailedException: Expected exception org.apache.kafka.clients.consumer.NoOffsetForPartitionException to be thrown, but no exception was thrown
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
	at org.scalatest.Assertions.intercept(Assertions.scala:824)
	at org.scalatest.Assertions.intercept$(Assertions.scala:804)
	at org.scalatest.Assertions$.intercept(Assertions.scala:1389)
	at kafka.api.PlaintextConsumerTest.testFetchInvalidOffset(PlaintextConsumerTest.scala:682)

> Flaky Test PlaintextConsumerTest#testFetchInvalidOffset
> ---
>
> Key: KAFKA-8033
> URL: https://issues.apache.org/jira/browse/KAFKA-8033
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.7.0, 2.6.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2829/testReport/junit/kafka.api/PlaintextConsumerTest/testFetchInvalidOffset/]
> {quote}org.scalatest.junit.JUnitTestFailedError: Expected exception 
> org.apache.kafka.clients.consumer.NoOffsetForPartitionException to be thrown, 
> but no exception was thrown{quote}
> STDOUT prints this over and over again:
> {quote}[2019-03-02 04:01:25,576] ERROR [ReplicaFetcher replicaId=0, 
> leaderId=1, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76){quote}





[jira] [Commented] (KAFKA-9009) Flaky Test kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete

2020-07-01 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149560#comment-17149560
 ] 

Boyang Chen commented on KAFKA-9009:


Failed again: 
[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3200/testReport/junit/kafka.integration/MetricsDuringTopicCreationDeletionTest/testMetricsDuringTopicCreateDelete/]

java.lang.AssertionError: assertion failed: UnderReplicatedPartitionCount not 0: 1
	at kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete(MetricsDuringTopicCreationDeletionTest.scala:122)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
	at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at 

[jira] [Updated] (KAFKA-9009) Flaky Test kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete

2020-07-01 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9009:
---
Affects Version/s: 2.6.0
   2.5.0

> Flaky Test 
> kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete
> --
>
> Key: KAFKA-9009
> URL: https://issues.apache.org/jira/browse/KAFKA-9009
> Project: Kafka
>  Issue Type: Test
>  Components: core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Bill Bejeck
>Priority: Major
>  Labels: flaky-test
>
> Failure seen in 
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25644/testReport/junit/kafka.integration/MetricsDuringTopicCreationDeletionTest/testMetricsDuringTopicCreateDelete/]
>  
> {noformat}
> Error Messagejava.lang.AssertionError: assertion failed: 
> UnderReplicatedPartitionCount not 0: 1Stacktracejava.lang.AssertionError: 
> assertion failed: UnderReplicatedPartitionCount not 0: 1
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete(MetricsDuringTopicCreationDeletionTest.scala:123)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> 

[jira] [Commented] (KAFKA-6520) When a Kafka Stream can't communicate with the server, its Status stays RUNNING

2020-07-02 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150519#comment-17150519
 ] 

Boyang Chen commented on KAFKA-6520:


[~VinceMu] Gave you contributor permission and assign the ticket to you.

> When a Kafka Stream can't communicate with the server, its Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Vince Mu
>Priority: Major
>  Labels: newbie, user-experience
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the Docker container)
>  4) the application doesn't give any indication that it's no longer 
> connected (Stream State is still RUNNING, and the uncaught exception handler 
> isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  This is a link to a related 
> issue.
> -
> Update: there are some discussions on the PR itself which leads me to think 
> that a more general solution should be at the ClusterConnectionStates rather 
> than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, recorded 
> in the `connect()` and `pollSelectionKeys()` functions upon capturing an 
> IOException / RuntimeException that indicates the connection dropped.
>  * Users of Consumer / Streams can then monitor this metric, which normally 
> stays close to zero since disconnects are transient; if it spikes, it means 
> the brokers are consistently unavailable, indicating an unhealthy state.
> [~Yohan123] WDYT?
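The metric proposal above can be sketched as a small self-contained class. Note the names (`FailedConnectionSensor`, `recordFailure`, `isSpiking`) are illustrative and not the actual SelectorMetrics implementation: a counter bumped on each failed connect, plus the spike check a monitoring client would apply.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for the proposed `failedConnection` metric: a counter
// recorded on connect()/pollSelectionKeys() failures, with a windowed spike
// check that distinguishes transient disconnects from persistent unavailability.
class FailedConnectionSensor {
    private final AtomicLong failedConnections = new AtomicLong();
    private volatile long windowStartMs;
    private long windowStartCount;

    FailedConnectionSensor(long nowMs) {
        this.windowStartMs = nowMs;
    }

    /** Called when connect() or pollSelectionKeys() catches an IOException. */
    void recordFailure() {
        failedConnections.incrementAndGet();
    }

    long total() {
        return failedConnections.get();
    }

    /**
     * True when failures within the current window exceed the threshold,
     * i.e. the brokers look consistently unavailable rather than
     * transiently disconnected.
     */
    boolean isSpiking(long nowMs, long windowMs, long threshold) {
        if (nowMs - windowStartMs >= windowMs) {
            windowStartCount = failedConnections.get();  // roll the window
            windowStartMs = nowMs;
        }
        return failedConnections.get() - windowStartCount > threshold;
    }
}
```

A real implementation would register this as a Sensor in the client's metrics registry rather than a standalone class.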



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9831) Failing test: EosIntegrationTest.shouldNotViolateEosIfOneTaskFailsWithState[exactly_once_beta]

2020-07-02 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150516#comment-17150516
 ] 

Boyang Chen commented on KAFKA-9831:


Failed again: 

[https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1377/testReport/junit/org.apache.kafka.streams.integration/EosIntegrationTest/shouldNotViolateEosIfOneTaskFailsWithState_exactly_once_/]
h3. Error Message

java.lang.AssertionError: Expected: <[KeyValue(0, 0), KeyValue(0, 1), 
KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
21), KeyValue(0, 28), KeyValue(0, 36), KeyValue(0, 45), KeyValue(0, 55), 
KeyValue(0, 66), KeyValue(0, 78), KeyValue(0, 91), KeyValue(0, 105)]> but: was 
<[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 
10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 0), KeyValue(0, 1), 
KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
21), KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), 
KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28)]>
h3. Stacktrace

java.lang.AssertionError: Expected: <[KeyValue(0, 0), KeyValue(0, 1), 
KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
21), KeyValue(0, 28), KeyValue(0, 36), KeyValue(0, 45), KeyValue(0, 55), 
KeyValue(0, 66), KeyValue(0, 78), KeyValue(0, 91), KeyValue(0, 105)]> but: was 
<[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 
10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 0), KeyValue(0, 1), 
KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
21), KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), 
KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28)]>

> Failing test: 
> EosIntegrationTest.shouldNotViolateEosIfOneTaskFailsWithState[exactly_once_beta]
> --
>
> Key: KAFKA-9831
> URL: https://issues.apache.org/jira/browse/KAFKA-9831
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: John Roesler
>Assignee: Matthias J. Sax
>Priority: Major
> Attachments: one.stdout.txt, two.stdout.txt
>
>
> I've seen this fail twice in a row on the same build, but with different 
> errors. Stacktraces follow; stdout is attached.
> One:
> {noformat}
> java.lang.AssertionError: Did not receive all 40 records from topic 
> singlePartitionOutputTopic within 6 ms
> Expected: is a value equal to or greater than <40>
>  but: <39> was less than <40>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:517)
>   at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:415)
>   at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:383)
>   at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:513)
>   at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:491)
>   at 
> org.apache.kafka.streams.integration.EosIntegrationTest.readResult(EosIntegrationTest.java:766)
>   at 
> org.apache.kafka.streams.integration.EosIntegrationTest.shouldNotViolateEosIfOneTaskFailsWithState(EosIntegrationTest.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at 

[jira] [Assigned] (KAFKA-6520) When a Kafka Stream can't communicate with the server, its Status stays RUNNING

2020-07-02 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-6520:
--

Assignee: Vince Mu  (was: Milind Jain)

> When a Kafka Stream can't communicate with the server, its Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Vince Mu
>Priority: Major
>  Labels: newbie, user-experience
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the Docker container)
>  4) the application doesn't give any indication that it's no longer 
> connected (Stream State is still RUNNING, and the uncaught exception handler 
> isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  This is a link to a related 
> issue.
> -
> Update: there are some discussions on the PR itself which leads me to think 
> that a more general solution should be at the ClusterConnectionStates rather 
> than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, recorded 
> in the `connect()` and `pollSelectionKeys()` functions upon capturing an 
> IOException / RuntimeException that indicates the connection dropped.
>  * Users of Consumer / Streams can then monitor this metric, which normally 
> stays close to zero since disconnects are transient; if it spikes, it means 
> the brokers are consistently unavailable, indicating an unhealthy state.
> [~Yohan123] WDYT?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-6520) When a Kafka Stream can't communicate with the server, its Status stays RUNNING

2020-07-02 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-6520:
--

Assignee: Milind Jain

> When a Kafka Stream can't communicate with the server, its Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Milind Jain
>Priority: Major
>  Labels: newbie, user-experience
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the Docker container)
>  4) the application doesn't give any indication that it's no longer 
> connected (Stream State is still RUNNING, and the uncaught exception handler 
> isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  This is a link to a related 
> issue.
> -
> Update: there are some discussions on the PR itself which leads me to think 
> that a more general solution should be at the ClusterConnectionStates rather 
> than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, recorded 
> in the `connect()` and `pollSelectionKeys()` functions upon capturing an 
> IOException / RuntimeException that indicates the connection dropped.
>  * Users of Consumer / Streams can then monitor this metric, which normally 
> stays close to zero since disconnects are transient; if it spikes, it means 
> the brokers are consistently unavailable, indicating an unhealthy state.
> [~Yohan123] WDYT?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10219) KStream API support for multiple cluster broker

2020-06-30 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148818#comment-17148818
 ] 

Boyang Chen commented on KAFKA-10219:
-

Thanks for the proposal; the use case you describe is reasonable. However, we 
need to better clarify the feature we are going to introduce and the challenges 
we are facing, such as:



1. What does "multiple clusters" mean? Do we support all input topics in 
cluster A and all output topics in cluster B, or a mix of topics spread across 
arbitrary clusters A, B, and C that Streams must detect automatically?
2. How do we allocate internal topics? Which cluster should the 
changelog/repartition topics go to: the input topic's cluster, or the output 
one?
3. How do we support exactly-once? Right now the entire framework assumes a 
single-cluster context. When switching to multiple clusters, we could no longer 
guarantee exactly-once because a transaction may span multiple clusters, and we 
don't have a centralized coordinator to track the progress.
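Because a Streams app today accepts only a single `bootstrap.servers`, the usual workaround for this use case is a plain consumer pinned to cluster A feeding a plain producer pinned to cluster B. A minimal config sketch follows; the cluster addresses and group id are placeholders, while the config keys themselves are standard Kafka client settings:

```java
import java.util.Properties;

// Separate client configs for a cross-cluster bridge: the consumer is pinned
// to the source cluster and the producer to the target cluster, so neither
// client can "randomly pick" the wrong bootstrap server.
class ClusterBridgeConfig {
    /** Consumer config pointing explicitly at the source cluster (cluster A). */
    static Properties consumerProps(String sourceBootstrap) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", sourceBootstrap);
        props.setProperty("group.id", "bridge-app");
        props.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    /** Producer config pointing explicitly at the target cluster (cluster B). */
    static Properties producerProps(String targetBootstrap) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", targetBootstrap);
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

Note that such a bridge can only give at-least-once delivery: as the comment points out, a transaction cannot span the two clusters.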

> KStream API support for multiple cluster broker
> ---
>
> Key: KAFKA-10219
> URL: https://issues.apache.org/jira/browse/KAFKA-10219
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sachin Kurle
>Priority: Major
>
> We are trying to consume from a cluster A broker via the KStream API and 
> produce to a cluster B broker. We set the bootstrap server in both the 
> consumer and producer configuration, but the KStream API randomly picks the 
> bootstrap server of either cluster A or B.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10242) Adding metrics to track the total count of idempotent producers that Broker need to track

2020-07-07 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153229#comment-17153229
 ] 

Boyang Chen commented on KAFKA-10242:
-

Is this really critical? I think we could downgrade to major. 

Besides, we need to kick off a KIP discussion if we haven't done so yet, and 
link it to the JIRA.

> Adding metrics to track the total count of idempotent producers that Broker 
> need to track
> -
>
> Key: KAFKA-10242
> URL: https://issues.apache.org/jira/browse/KAFKA-10242
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 2.5.0
>Reporter: Ming Liu
>Priority: Critical
>  Labels: needs-kip
> Fix For: 2.7.0
>
>
> We found it very useful to track the total number of idempotent producers 
> that the broker is tracking.
> In our production environment, we have many idempotent producers per 
> cluster, and sometimes that number climbs very high, which requires some 
> attention to mitigate.
> This is especially true for older clients (< 2.4), where client retries 
> might generate too many distinct idempotent producers, which can trigger 
> broker GC pressure.
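The proposed metric can be sketched on the broker side under hypothetical names: the broker already keeps per-producerId state, so the gauge simply samples the size of that map when the metrics registry polls it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch (not the actual ProducerStateManager): one map entry
// per idempotent producerId, expired by timestamp, with a gauge that reports
// how many producers the broker is currently tracking.
class ProducerStateTracker {
    private final Map<Long, Long> lastTimestampByProducerId = new ConcurrentHashMap<>();

    /** Record the latest append time for a producerId. */
    void onProduce(long producerId, long nowMs) {
        lastTimestampByProducerId.put(producerId, nowMs);
    }

    /** Drop producers whose last append is older than the cutoff. */
    void expireBefore(long cutoffMs) {
        lastTimestampByProducerId.values().removeIf(ts -> ts < cutoffMs);
    }

    /** Supplier a metrics registry would poll, e.g. as a gauge. */
    Supplier<Integer> idempotentProducerCountGauge() {
        return lastTimestampByProducerId::size;
    }
}
```

Alerting on this gauge would surface the "too many distinct producers" situation described above before it turns into GC pressure.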



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10270) Add a broker to controller channel manager to redirect AlterConfig

2020-07-13 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10270:

Description: Per the KIP-590 requirement, we need a dedicated communication 
channel from the broker to the controller.

> Add a broker to controller channel manager to redirect AlterConfig
> --
>
> Key: KAFKA-10270
> URL: https://issues.apache.org/jira/browse/KAFKA-10270
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Per the KIP-590 requirement, we need a dedicated communication channel 
> from the broker to the controller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10270) Add a broker to controller channel manager to redirect AlterConfig

2020-07-13 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10270:
---

 Summary: Add a broker to controller channel manager to redirect 
AlterConfig
 Key: KAFKA-10270
 URL: https://issues.apache.org/jira/browse/KAFKA-10270
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen
Assignee: Boyang Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148200#comment-17148200
 ] 

Boyang Chen commented on KAFKA-10158:
-

Failed again: 

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3175/]
h3. Stacktrace

org.junit.ComparisonFailure: --under-replicated-partitions shouldn't return 
anything: ' Topic: 
testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P 
Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ' expected:<[]> but was:<[ Topic: 
testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P 
Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ]> at 
org.junit.Assert.assertEquals(Assert.java:117) at 
kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandWithAdminClientTest.scala:702)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413) at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
 at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
 at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
 at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
 at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
 at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
 at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
 at 

[jira] [Assigned] (KAFKA-10239) The groupInstanceId field in DescribeGroup response should be ignorable

2020-07-06 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-10239:
---

Assignee: Boyang Chen

> The groupInstanceId field in DescribeGroup response should be ignorable
> ---
>
> Key: KAFKA-10239
> URL: https://issues.apache.org/jira/browse/KAFKA-10239
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Jason Gustafson
>Assignee: Boyang Chen
>Priority: Critical
> Fix For: 2.6.0, 2.4.2, 2.5.1
>
>
> We noticed the following error in the logs while handling a DescribeGroup 
> request:
> ```
> org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to 
> write a non-default groupInstanceId at version 3
> ```
> The problem is that the field is not marked as ignorable. So if the user is 
> relying on static membership and uses an older AdminClient, they will see 
> this error.
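In Kafka's RPC schema JSON, the fix is to mark the field as ignorable so serializers silently drop it when writing older versions instead of throwing UnsupportedVersionException. A sketch of what the field entry in DescribeGroupsResponse.json could look like; the version numbers and attribute values here are illustrative, not copied from the actual schema:

```json
{
  "name": "GroupInstanceId",
  "type": "string",
  "versions": "4+",
  "nullableVersions": "4+",
  "default": "null",
  "ignorable": true,
  "about": "The unique identifier of the consumer instance provided by the end user."
}
```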



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10237) Properly handle in-memory stores OOM

2020-07-06 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10237:
---

 Summary: Properly handle in-memory stores OOM
 Key: KAFKA-10237
 URL: https://issues.apache.org/jira/browse/KAFKA-10237
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Boyang Chen


We have seen the in-memory store buffer too much data and eventually hit an 
OOM. Generally speaking, an OOM gives no real indication of the underlying 
problem and makes user debugging harder, since the failed thread may not be 
the actual culprit that caused the blowup. If we could add better protection 
to avoid hitting the memory limit, or at least give a clear pointer, end-user 
debugging would be much simpler. 

To make this work, we need to enforce a memory limit below the heap size and 
take action when hitting it. The first question is whether we set a numeric 
limit, such as 100 MB or 500 MB, or a percentage limit, such as 60% or 80% of 
total memory.

The second question is about the action itself. One approach would be crashing 
the store immediately and informing the user to increase their application 
capacity. A second approach would be opening an on-disk store on the fly and 
offloading the data to it.

Personally I'm in favor of approach #2 because it has minimal impact on the 
ongoing application. However, it is more complex and potentially requires 
significant work to define the proper behavior, such as the default store 
configuration and how to manage its lifecycle.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10168) Rename public StreamsConfig variable

2020-06-15 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136292#comment-17136292
 ] 

Boyang Chen commented on KAFKA-10168:
-

[~sbellapu] There is already a WIP PR for this change.

> Rename public StreamsConfig variable
> 
>
> Key: KAFKA-10168
> URL: https://issues.apache.org/jira/browse/KAFKA-10168
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Trivial
>  Labels: kip
>
> All Kafka Streams configuration parameter are exposed via public variables 
> that all end with `_CONFIG` suffix. However, we added the variable of 
> `topology.optimization` as `TOPOLOGY_OPTIMIZATION` instead of 
> `TOPLOGY_OPTIMIZATION_CONFIG`. We should align the variable name.
> KIP-626: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-626%3A+Rename+StreamsConfig+config+variable+name]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10192) Flaky test BlockingConnectorTest#testBlockInConnectorStop

2020-06-22 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10192:
---

 Summary: Flaky test BlockingConnectorTest#testBlockInConnectorStop
 Key: KAFKA-10192
 URL: https://issues.apache.org/jira/browse/KAFKA-10192
 Project: Kafka
  Issue Type: Bug
Reporter: Boyang Chen


h3. [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1218/]
h3. Error Message

org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
execute PUT request. Error response: \{"error_code":500,"message":"Request 
timed out"}
h3. Stacktrace

org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
execute PUT request. Error response: \{"error_code":500,"message":"Request 
timed out"} at 
org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.putConnectorConfig(EmbeddedConnectCluster.java:346)
 at 
org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.configureConnector(EmbeddedConnectCluster.java:300)
 at 
org.apache.kafka.connect.integration.BlockingConnectorTest.createConnectorWithBlock(BlockingConnectorTest.java:185)
 at 
org.apache.kafka.connect.integration.BlockingConnectorTest.testBlockInConnectorStop(BlockingConnectorTest.java:140)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413) at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
 at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
 at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
 at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at 

[jira] [Commented] (KAFKA-10055) Add shared internal utility class for Streams module

2020-06-22 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142429#comment-17142429
 ] 

Boyang Chen commented on KAFKA-10055:
-

Need to move to the shared utility class as well

> Add shared internal utility class for Streams module
> 
>
> Key: KAFKA-10055
> URL: https://issues.apache.org/jira/browse/KAFKA-10055
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Vito Jeng
>Priority: Minor
>  Labels: beginner, newbie
>
> Throughout the code base we added small helper methods (usually static) when 
> needed. Those helpers are often added to classes they don't really belong to 
> (out of convenience) and may lead to code duplication.
> We should add a shared utility class that unifies all those helper methods 
> into a single place.
> For example `StreamsThread#eosEnabled`. Please update this ticket description 
> to add more such methods (or just leave a comment).
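A shared utility class along these lines could look like the following sketch (the class name and helper set are illustrative assumptions, not the actual Kafka implementation):

```java
// Illustrative sketch only: StreamsUtils and its contents are assumptions,
// not the real Apache Kafka class this ticket eventually produced.
final class StreamsUtils {

    private StreamsUtils() { } // static helpers only, no instances

    // Example helper lifted out of StreamThread: checks whether the
    // configured processing guarantee enables exactly-once semantics.
    static boolean eosEnabled(final String processingGuarantee) {
        return "exactly_once".equals(processingGuarantee)
            || "exactly_once_beta".equals(processingGuarantee);
    }
}
```

Consolidating such predicates in one place lets both Task and TaskManager call them without duplicating the string comparison.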



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10055) Add shared internal utility class for Streams module

2020-06-22 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142429#comment-17142429
 ] 

Boyang Chen edited comment on KAFKA-10055 at 6/22/20, 9:32 PM:
---

Need to move the swallow exception call to the shared utility class as well


was (Author: bchen225242):
Need to move to the shared utility class as well

> Add shared internal utility class for Streams module
> 
>
> Key: KAFKA-10055
> URL: https://issues.apache.org/jira/browse/KAFKA-10055
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Vito Jeng
>Priority: Minor
>  Labels: beginner, newbie
>
> Throughout the code base we added small helper methods (usually static) when 
> needed. Those helpers are often added to classes they don't really belong to 
> (out of convenience) and may lead to code duplication.
> We should add a shared utility class that unifies all those helper methods 
> into a single place.
> For example `StreamsThread#eosEnabled`. Please update this ticket description 
> to add more such methods (or just leave a comment).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10192) Flaky test BlockingConnectorTest#testBlockInConnectorStop

2020-06-23 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143052#comment-17143052
 ] 

Boyang Chen commented on KAFKA-10192:
-

Failed again: 
[https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1225/testReport/junit/org.apache.kafka.connect.integration/BlockingConnectorTest/testBlockInConnectorStop/]

> Flaky test BlockingConnectorTest#testBlockInConnectorStop
> -
>
> Key: KAFKA-10192
> URL: https://issues.apache.org/jira/browse/KAFKA-10192
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
>
> h3. [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1218/]
> h3. Error Message
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
> execute PUT request. Error response: \{"error_code":500,"message":"Request 
> timed out"}
> h3. Stacktrace
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
> execute PUT request. Error response: \{"error_code":500,"message":"Request 
> timed out"} at 
> org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.putConnectorConfig(EmbeddedConnectCluster.java:346)
>  at 
> org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.configureConnector(EmbeddedConnectCluster.java:300)
>  at 
> org.apache.kafka.connect.integration.BlockingConnectorTest.createConnectorWithBlock(BlockingConnectorTest.java:185)
>  at 
> org.apache.kafka.connect.integration.BlockingConnectorTest.testBlockInConnectorStop(BlockingConnectorTest.java:140)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> 

[jira] [Updated] (KAFKA-10135) Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10135:

Fix Version/s: 2.7.0

> Extract Task#executeAndMaybeSwallow to be a general utility function into 
> TaskManager
> -
>
> Key: KAFKA-10135
> URL: https://issues.apache.org/jira/browse/KAFKA-10135
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Boyang Chen
>Assignee: feyman
>Priority: Major
> Fix For: 2.7.0
>
>
> We have a couple of cases where we need to swallow the exception during 
> operations in both Task class and TaskManager class. This utility method 
> should be generalized to at least the TaskManager level. See the discussion 
> [here|https://github.com/apache/kafka/pull/8833#discussion_r437697665].
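A generalized version of that utility might look like the following sketch (the signature and names are assumptions, not the exact Kafka code):

```java
import java.util.function.Consumer;

// Sketch of a generalized executeAndMaybeSwallow: run an action and either
// rethrow (clean close) or hand the exception to a swallow handler that
// typically just logs it (unclean close). Names are illustrative.
final class TaskManagerUtils {

    private TaskManagerUtils() { }

    static void executeAndMaybeSwallow(final boolean clean,
                                       final Runnable action,
                                       final Consumer<RuntimeException> swallowHandler) {
        try {
            action.run();
        } catch (final RuntimeException e) {
            if (clean) {
                throw e;               // clean path: surface the failure
            }
            swallowHandler.accept(e);  // unclean path: swallow (e.g. log) and continue
        }
    }
}
```

Hoisting this into TaskManager means both Task and TaskManager close/cleanup paths share a single swallow policy.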



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10126) Deprecate unused options in ConsumerPerformance

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10126:

Fix Version/s: 2.7.0

> Deprecate unused options in ConsumerPerformance
> ---
>
> Key: KAFKA-10126
> URL: https://issues.apache.org/jira/browse/KAFKA-10126
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, tools
>Reporter: jiamei xie
>Assignee: jiamei xie
>Priority: Major
> Fix For: 2.7.0
>
>
> Options numThreadsOpt and numFetchersOpt are unused in ConsumerPerformance, 
> so testing performance against the number of threads is a waste of time. 
> Deprecate them and print a warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10135) Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10135:

Affects Version/s: 2.6.0
   2.5.0

> Extract Task#executeAndMaybeSwallow to be a general utility function into 
> TaskManager
> -
>
> Key: KAFKA-10135
> URL: https://issues.apache.org/jira/browse/KAFKA-10135
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Boyang Chen
>Assignee: feyman
>Priority: Major
> Fix For: 2.7.0
>
>
> We have a couple of cases where we need to swallow the exception during 
> operations in both Task class and TaskManager class. This utility method 
> should be generalized to at least the TaskManager level. See the discussion 
> [here|https://github.com/apache/kafka/pull/8833#discussion_r437697665].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10126) Deprecate unused options in ConsumerPerformance

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10126:

Issue Type: Improvement  (was: Bug)

> Deprecate unused options in ConsumerPerformance
> ---
>
> Key: KAFKA-10126
> URL: https://issues.apache.org/jira/browse/KAFKA-10126
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, tools
>Affects Versions: 2.4.0, 2.5.0, 2.6.0
>Reporter: jiamei xie
>Assignee: jiamei xie
>Priority: Minor
> Fix For: 2.7.0
>
>
> Options numThreadsOpt and numFetchersOpt are unused in ConsumerPerformance, 
> so testing performance against the number of threads is a waste of time. 
> Deprecate them and print a warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10135) Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-10135.
-
Resolution: Fixed

> Extract Task#executeAndMaybeSwallow to be a general utility function into 
> TaskManager
> -
>
> Key: KAFKA-10135
> URL: https://issues.apache.org/jira/browse/KAFKA-10135
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Boyang Chen
>Assignee: feyman
>Priority: Major
>
> We have a couple of cases where we need to swallow the exception during 
> operations in both Task class and TaskManager class. This utility method 
> should be generalized to at least the TaskManager level. See the discussion 
> [here|https://github.com/apache/kafka/pull/8833#discussion_r437697665].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10126) Deprecate unused options in ConsumerPerformance

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10126:

Affects Version/s: 2.6.0
   2.4.0
   2.5.0

> Deprecate unused options in ConsumerPerformance
> ---
>
> Key: KAFKA-10126
> URL: https://issues.apache.org/jira/browse/KAFKA-10126
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, tools
>Affects Versions: 2.4.0, 2.5.0, 2.6.0
>Reporter: jiamei xie
>Assignee: jiamei xie
>Priority: Major
> Fix For: 2.7.0
>
>
> Options numThreadsOpt and numFetchersOpt are unused in ConsumerPerformance, 
> so testing performance against the number of threads is a waste of time. 
> Deprecate them and print a warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9439) Add more public API tests for KafkaProducer

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9439:
---
Priority: Minor  (was: Major)

> Add more public API tests for KafkaProducer
> ---
>
> Key: KAFKA-9439
> URL: https://issues.apache.org/jira/browse/KAFKA-9439
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Jeff Kim
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.0
>
>
> While working on KIP-447, we realized a lack of test coverage on the 
> KafkaProducer public APIs. For example, `commitTransaction` and 
> `sendOffsetsToTransaction` are not even called in the 
> `KafkaProducerTest.java` and the code coverage is only 75%. 
> Adding more unit tests here will be pretty valuable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10126) Deprecate unused options in ConsumerPerformance

2020-06-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10126:

Priority: Minor  (was: Major)

> Deprecate unused options in ConsumerPerformance
> ---
>
> Key: KAFKA-10126
> URL: https://issues.apache.org/jira/browse/KAFKA-10126
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, tools
>Affects Versions: 2.4.0, 2.5.0, 2.6.0
>Reporter: jiamei xie
>Assignee: jiamei xie
>Priority: Minor
> Fix For: 2.7.0
>
>
> Options numThreadsOpt and numFetchersOpt are unused in ConsumerPerformance, 
> so testing performance against the number of threads is a waste of time. 
> Deprecate them and print a warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10173) BufferUnderflowException during Kafka Streams Upgrade

2020-06-24 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144573#comment-17144573
 ] 

Boyang Chen commented on KAFKA-10173:
-

cc [~rhauch]

> BufferUnderflowException during Kafka Streams Upgrade
> -
>
> Key: KAFKA-10173
> URL: https://issues.apache.org/jira/browse/KAFKA-10173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.5.0
>Reporter: Karsten Schnitter
>Assignee: John Roesler
>Priority: Blocker
>  Labels: suppress
> Fix For: 2.6.0, 2.4.2, 2.5.1
>
>
> I migrated a Kafka Streams application from version 2.3.1 to 2.5.0. I 
> followed the steps described in the upgrade guide and set the property 
> {{upgrade.from=2.3}}. On my dev system with just one running instance I got 
> the following exception:
> {noformat}
> stream-thread [0-StreamThread-2] Encountered the following error during 
> processing:
> java.nio.BufferUnderflowException: null
>   at java.base/java.nio.HeapByteBuffer.get(Unknown Source)
>   at java.base/java.nio.ByteBuffer.get(Unknown Source)
>   at 
> org.apache.kafka.streams.state.internals.BufferValue.extractValue(BufferValue.java:94)
>   at 
> org.apache.kafka.streams.state.internals.BufferValue.deserialize(BufferValue.java:83)
>   at 
> org.apache.kafka.streams.state.internals.InMemoryTimeOrderedKeyValueBuffer.restoreBatch(InMemoryTimeOrderedKeyValueBuffer.java:368)
>   at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)
>   at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:350)
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:94)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:401)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:779)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:697)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670)
> {noformat}
> I figured out that this problem only occurs for stores where I use the 
> suppress feature. If I rename the changelog topics during the migration, the 
> problem does not occur. 
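For reference, the two-bounce upgrade the reporter describes is driven by the Streams `upgrade.from` setting; a minimal sketch follows (the application values are placeholders, not from this ticket):

```java
import java.util.Properties;

// Sketch of the two-bounce rolling upgrade config: upgrade.from is set during
// the first bounce from the old version and removed again in a second bounce.
// application.id / bootstrap.servers values are placeholders.
final class UpgradeConfigDemo {

    static Properties streamsProps(final boolean firstBounce) {
        final Properties props = new Properties();
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        if (firstBounce) {
            // StreamsConfig.UPGRADE_FROM_CONFIG: keeps upgraded instances
            // compatible with still-running 2.3.x instances during the bounce
            props.put("upgrade.from", "2.3");
        }
        return props;
    }
}
```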



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10189) Reset metric EventQueueTimeMs

2020-06-26 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10189:

Labels: core  (was: pull-request-available)

> Reset metric EventQueueTimeMs 
> --
>
> Key: KAFKA-10189
> URL: https://issues.apache.org/jira/browse/KAFKA-10189
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, metrics
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Minor
>  Labels: core
>
> The metric 
> [EventQueueTimeMs|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L81]
>  does not reset and therefore misrepresents the controller event queue time 
> in these two scenarios:
>  1. upon losing leader election - `EventQueueTimeMs` portrays the last event 
> queue time of the previous controller and not the current controller
>  2. no controller events are added to the queue - `EventQueueTimeMs` portrays 
> the most recent event queue time, not the current queue time (which is 0)
> For both cases, we should reset the controller event queue time to 0.
> Implementation:
> Instead of using `LinkedBlockingQueue.take()` 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L118],
>  we can use `LinkedBlockingQueue.poll(long timeout, TimeUnit unit)` and reset 
> `EventQueueTimeMs` if the queue is empty.
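The proposed change can be sketched as follows (class and field names are illustrative, not the actual ControllerEventManager code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the fix: poll with a timeout instead of a blocking take(), so
// the queue-time metric can be reset to 0 while the queue stays empty.
final class EventQueueDemo {

    private final LinkedBlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    private volatile long queueTimeMs = 0L; // stand-in for the EventQueueTimeMs histogram

    void enqueue(final long enqueueTimeMs) {
        queue.offer(enqueueTimeMs);
    }

    // One iteration of the event-processing loop.
    void doWork(final long nowMs) {
        try {
            final Long enqueueTimeMs = queue.poll(100, TimeUnit.MILLISECONDS);
            if (enqueueTimeMs == null) {
                queueTimeMs = 0L;                     // poll timed out: reset the metric
            } else {
                queueTimeMs = nowMs - enqueueTimeMs;  // record the actual queue time
                // ... process the dequeued event here ...
            }
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    long queueTimeMs() {
        return queueTimeMs;
    }
}
```

With `take()` the loop blocks indefinitely on an empty queue, so the last recorded value lingers; the timeout gives the loop a chance to zero the metric.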



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10189) Reset metric EventQueueTimeMs

2020-06-26 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10189:

Component/s: core

> Reset metric EventQueueTimeMs 
> --
>
> Key: KAFKA-10189
> URL: https://issues.apache.org/jira/browse/KAFKA-10189
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core, metrics
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Minor
>  Labels: core
>
> The metric 
> [EventQueueTimeMs|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L81]
>  does not reset and therefore misrepresents the controller event queue time 
> in these two scenarios:
>  1. upon losing leader election - `EventQueueTimeMs` portrays the last event 
> queue time of the previous controller and not the current controller
>  2. no controller events are added to the queue - `EventQueueTimeMs` portrays 
> the most recent event queue time, not the current queue time (which is 0)
> For both cases, we should reset the controller event queue time to 0.
> Implementation:
> Instead of using `LinkedBlockingQueue.take()` 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L118],
>  we can use `LinkedBlockingQueue.poll(long timeout, TimeUnit unit)` and reset 
> `EventQueueTimeMs` if the queue is empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10189) Reset metric EventQueueTimeMs

2020-06-26 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10189:

Labels:   (was: core)

> Reset metric EventQueueTimeMs 
> --
>
> Key: KAFKA-10189
> URL: https://issues.apache.org/jira/browse/KAFKA-10189
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core, metrics
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Minor
>
> The metric 
> [EventQueueTimeMs|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L81]
>  does not reset and therefore misrepresents the controller event queue time 
> in these two scenarios:
>  1. upon losing leader election - `EventQueueTimeMs` portrays the last event 
> queue time of the previous controller and not the current controller
>  2. no controller events are added to the queue - `EventQueueTimeMs` portrays 
> the most recent event queue time, not the current queue time (which is 0)
> For both cases, we should reset the controller event queue time to 0.
> Implementation:
> Instead of using `LinkedBlockingQueue.take()` 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerEventManager.scala#L118],
>  we can use `LinkedBlockingQueue.poll(long timeout, TimeUnit unit)` and reset 
> `EventQueueTimeMs` if the queue is empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10196) Add missing '--version' option to Kafka command producer-performance

2020-06-28 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147509#comment-17147509
 ] 

Boyang Chen commented on KAFKA-10196:
-

Why do we need to display the version from the command line? The AppInfoParser 
should already print the version in the producer log.
{code:java}
public AppInfo(long startTimeMs) {
    this.startTimeMs = startTimeMs;
    log.info("Kafka version: {}", AppInfoParser.getVersion());
    log.info("Kafka commitId: {}", AppInfoParser.getCommitId());
    log.info("Kafka startTimeMs: {}", startTimeMs);
}{code}
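If the flag is still wanted, wiring it up is simple; a minimal sketch (the real tool would delegate to AppInfoParser.getVersion() rather than the placeholder below):

```java
// Sketch only: version() is a placeholder for
// org.apache.kafka.common.utils.AppInfoParser.getVersion().
final class VersionFlagDemo {

    static String version() {
        return "2.7.0-SNAPSHOT"; // placeholder value
    }

    // Returns true when --version was handled and the tool should exit
    // before running the actual benchmark.
    static boolean maybePrintVersion(final String[] args) {
        for (final String arg : args) {
            if ("--version".equals(arg)) {
                System.out.println(version());
                return true;
            }
        }
        return false;
    }
}
```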

> Add missing '--version' option to Kafka command producer-performance
> 
>
> Key: KAFKA-10196
> URL: https://issues.apache.org/jira/browse/KAFKA-10196
> Project: Kafka
>  Issue Type: Bug
>  Components: producer , tools
>Reporter: jiamei xie
>Assignee: jiamei xie
>Priority: Major
>
> Option '--version'  is missing in Kafka command producer-performance



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10181) AlterConfig/IncrementalAlterConfig should go to controller

2020-06-26 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-10181:

Fix Version/s: 2.7.0

> AlterConfig/IncrementalAlterConfig should go to controller
> --
>
> Key: KAFKA-10181
> URL: https://issues.apache.org/jira/browse/KAFKA-10181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.7.0
>
>
> In the new Admin client, the request should always be routed towards the 
> controller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

