[jira] [Commented] (KAFKA-10513) Newly added topic or partitions are not assigned to running consumer groups using static membership

2020-09-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202991#comment-17202991
 ] 

Guozhang Wang commented on KAFKA-10513:
---

After reading the source code, this looks like a valid issue. cc [~bchen225242] 
to take another look.

> Newly added topic or partitions are not assigned to running consumer groups 
> using static membership
> ---
>
> Key: KAFKA-10513
> URL: https://issues.apache.org/jira/browse/KAFKA-10513
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 2.6.0
> Reporter: Marlon Ou
> Priority: Major
>
> If consumers are polling messages from a certain topic with static membership 
> and we add new partitions to this topic while the consumers are running, no 
> partition reassignment is ever triggered (and hence messages published into 
> the new partitions are never consumed). 
> To reproduce, simply set group instance IDs on the consumers: 
> {code:java}
> props.setProperty("group.instance.id", instanceId);
> {code}
> And then while the static consumers are running, use Kafka's admin client to 
> add more partitions to the topic:
> {code:java}
> adminClient.createPartitions(...)
> {code}
>  
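
A minimal sketch of the consumer side of the reproduction above. Only the `group.instance.id` line comes from the report; the other keys are standard consumer settings added so the configuration is complete, and the helper class, method name, and argument values are hypothetical. The admin-client call stays elided, as in the report.

```java
import java.util.Properties;

// Hypothetical helper; the class and method names are illustrative only.
final class StaticMemberConfig {

    // Builds consumer properties with a fixed group.instance.id, the setting
    // that opts a consumer into static membership.
    static Properties build(String bootstrapServers, String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // The line from the report: a stable, per-instance identifier.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

Each running consumer would need its own distinct `instanceId`; the resulting `Properties` object can be passed to the consumer constructor.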



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] guozhangwang merged pull request #9241: MINOR: Update the javadoc in GroupMetadataManager.scala

2020-09-27 Thread GitBox


guozhangwang merged pull request #9241:
URL: https://github.com/apache/kafka/pull/9241


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] guozhangwang opened a new pull request #9342: MINOR: Update doc for raft state metrics

2020-09-27 Thread GitBox


guozhangwang opened a new pull request #9342:
URL: https://github.com/apache/kafka/pull/9342


   Found this while reviewing related code. cc @hachikuji to review and merge.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   







[GitHub] [kafka] showuon commented on pull request #9179: KAFKA-10390: Remove ignore case option when grep process info to be more specific

2020-09-27 Thread GitBox


showuon commented on pull request #9179:
URL: https://github.com/apache/kafka/pull/9179#issuecomment-699738111


   @cmccabe, could you take a look at this PR? Thanks.







[GitHub] [kafka] showuon closed pull request #9159: KAFKA-10378: change jacksonDatabind as compile dependency for clients project

2020-09-27 Thread GitBox


showuon closed pull request #9159:
URL: https://github.com/apache/kafka/pull/9159


   







[GitHub] [kafka] showuon commented on pull request #9241: MINOR: Update the javadoc in GroupMetadataManager.scala

2020-09-27 Thread GitBox


showuon commented on pull request #9241:
URL: https://github.com/apache/kafka/pull/9241#issuecomment-699737966


   @guozhangwang, could you help review this small PR? Thanks.







[GitHub] [kafka] guozhangwang merged pull request #9304: KAFKA-10502:TimestampRouter may occur threadlocal leak

2020-09-27 Thread GitBox


guozhangwang merged pull request #9304:
URL: https://github.com/apache/kafka/pull/9304


   







[jira] [Resolved] (KAFKA-10502) Threadlocal may can not set null,because it may create a memory leak

2020-09-27 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10502.
---
Fix Version/s: 2.7.0
   Resolution: Fixed

> Threadlocal  may can not set null,because it may create a memory leak
> -
>
> Key: KAFKA-10502
> URL: https://issues.apache.org/jira/browse/KAFKA-10502
> Project: Kafka
> Issue Type: Bug
> Reporter: huangyiming
> Priority: Major
> Fix For: 2.7.0
>
>
> Setting a ThreadLocal to null may create a memory leak; see
> [https://stackoverflow.com/questions/12424838/threadlocal-remove]. I think 
> we should invoke its remove() method instead.
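
The difference the issue describes can be sketched with plain `java.lang.ThreadLocal`. This is an illustration only, not the code changed in the pull request; the class and field names are hypothetical. `set(null)` keeps the current thread's entry for the variable and only nulls its value, while `remove()` deletes the entry, so the next `get()` re-runs the initializer.

```java
// Hypothetical holder; names are illustrative and not taken from the Kafka code base.
final class ThreadLocalCleanup {

    // Per-thread buffer created lazily on first get().
    static final ThreadLocal<StringBuilder> BUFFER =
        ThreadLocal.withInitial(StringBuilder::new);

    // Leaves this thread's entry in place with a null value;
    // a later get() on the same thread returns null.
    static void clearBySettingNull() {
        BUFFER.set(null);
    }

    // Deletes this thread's entry; a later get() on the same thread
    // calls the initializer again and returns a fresh StringBuilder.
    static void clearByRemove() {
        BUFFER.remove();
    }
}
```

On long-lived pooled threads, calling `remove()` during cleanup is the usual way to avoid leaving stale entries behind.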





[GitHub] [kafka] guozhangwang commented on pull request #9304: KAFKA-10502:TimestampRouter may occur threadlocal leak

2020-09-27 Thread GitBox


guozhangwang commented on pull request #9304:
URL: https://github.com/apache/kafka/pull/9304#issuecomment-699731935


   Having read the SO thread, I think this fix makes sense. Merging to trunk now.







[jira] [Updated] (KAFKA-10502) Threadlocal may can not set null,because it may create a memory leak

2020-09-27 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-10502:
--
Description: 
Setting a ThreadLocal to null may create a memory leak; see
[https://stackoverflow.com/questions/12424838/threadlocal-remove]. I think 
we should invoke its remove() method instead.

  was:
Threadlocal  may can not set null,because it may create a memory leak, you can 
see the link:
[https://stackoverflow.com/questions/12424838/threadlocal-remove],so I think 
weather can invoke thread local.remove instead 


> Threadlocal  may can not set null,because it may create a memory leak
> -
>
> Key: KAFKA-10502
> URL: https://issues.apache.org/jira/browse/KAFKA-10502
> Project: Kafka
> Issue Type: Bug
> Reporter: huangyiming
> Priority: Major
>
> Setting a ThreadLocal to null may create a memory leak; see
> [https://stackoverflow.com/questions/12424838/threadlocal-remove]. I think 
> we should invoke its remove() method instead.





[jira] [Commented] (KAFKA-6579) Consolidate window store and session store unit tests into a single class

2020-09-27 Thread Rohit Deshpande (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202959#comment-17202959
 ] 

Rohit Deshpande commented on KAFKA-6579:


[~guozhang] can I pick this up? 

> Consolidate window store and session store unit tests into a single class
> -
>
> Key: KAFKA-6579
> URL: https://issues.apache.org/jira/browse/KAFKA-6579
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Priority: Major
> Labels: newbie, unit-test
>
> For key-value stores, we have an {{AbstractKeyValueStoreTest}} that is shared 
> among all its implementations; however for window and session stores, each 
> class has its own independent unit test classes that do not share the test 
> coverage. In fact, many of these test classes share the same unit test 
> functions (e.g. {{RocksDBWindowStoreTest}}, 
> {{CompositeReadOnlyWindowStoreTest}} and {{CachingWindowStoreTest}}).
> It is better to use the same pattern as for key value stores to consolidate 
> these test functions into a shared base class.
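
The consolidation proposed above could look roughly like the sketch below. It is a stand-alone illustration of the shared-base-class pattern, not Kafka Streams code: the store interface, the in-memory implementation, and all class names are hypothetical, and plain checks stand in for a test framework so the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal store interface standing in for the real window store API.
interface SimpleWindowStore {
    void put(String key, long windowStart, String value);
    String fetch(String key, long windowStart);
}

// Shared base class: each check is written once, against the interface.
abstract class AbstractSimpleWindowStoreCheck {

    // Subclasses supply the implementation under test.
    abstract SimpleWindowStore createStore();

    void shouldFetchWhatWasPut() {
        SimpleWindowStore store = createStore();
        store.put("a", 0L, "v1");
        if (!"v1".equals(store.fetch("a", 0L))) {
            throw new AssertionError("fetch after put");
        }
    }

    void shouldReturnNullForMissingWindow() {
        SimpleWindowStore store = createStore();
        store.put("a", 0L, "v1");
        if (store.fetch("a", 100L) != null) {
            throw new AssertionError("missing window should be null");
        }
    }
}

// One hypothetical implementation, keyed by "key@windowStart".
final class InMemorySimpleWindowStore implements SimpleWindowStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, long windowStart, String value) {
        data.put(key + "@" + windowStart, value);
    }

    @Override
    public String fetch(String key, long windowStart) {
        return data.get(key + "@" + windowStart);
    }
}

// Per-implementation subclass: only the factory method differs.
final class InMemorySimpleWindowStoreCheck extends AbstractSimpleWindowStoreCheck {
    @Override
    SimpleWindowStore createStore() {
        return new InMemorySimpleWindowStore();
    }
}
```

Adding another implementation then means adding one small subclass that overrides `createStore()`, and it inherits every shared check.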





[GitHub] [kafka] guozhangwang commented on pull request #9196: KAFKA-10402: Upgrade system tests to python3

2020-09-27 Thread GitBox


guozhangwang commented on pull request #9196:
URL: https://github.com/apache/kafka/pull/9196#issuecomment-699719268


   cc @cmccabe @vvcephei @mimaison for a final look and merge.







[GitHub] [kafka] nizhikov commented on pull request #9196: KAFKA-10402: Upgrade system tests to python3

2020-09-27 Thread GitBox


nizhikov commented on pull request #9196:
URL: https://github.com/apache/kafka/pull/9196#issuecomment-699695812


   @guozhangwang Great news! Thanks for the help!







[jira] [Commented] (KAFKA-10134) High CPU issue during rebalance in Kafka consumer after upgrading to 2.5

2020-09-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202931#comment-17202931
 ] 

Guozhang Wang commented on KAFKA-10134:
---

I don't think there's a concrete plan for 2.6.1 yet; 2.7.0 is planned for 
November.

> High CPU issue during rebalance in Kafka consumer after upgrading to 2.5
> 
>
> Key: KAFKA-10134
> URL: https://issues.apache.org/jira/browse/KAFKA-10134
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.5.0
> Reporter: Sean Guo
> Assignee: Guozhang Wang
> Priority: Blocker
> Fix For: 2.7.0, 2.6.1
>
> Attachments: consumer3.log.2020-08-20.log, 
> consumer5.log.2020-07-22.log
>
>
> We want to use the new rebalance protocol to mitigate the stop-the-world 
> effect during rebalances, as our tasks are long-running.
> After the upgrade, when we kill an instance to trigger a rebalance while 
> there is some load (some tasks run longer than 30 s), the CPU goes 
> sky-high. It reads ~700% in our metrics, so several threads must be in a 
> tight loop. We have several consumer threads consuming from different 
> partitions during the rebalance. This is reproducible with both the new 
> CooperativeStickyAssignor and the old eager rebalance protocol. The 
> difference is that with the old eager rebalance protocol, the high CPU usage 
> drops once the rebalance is done. With the cooperative one, the consumer 
> threads seem to be stuck on something and cannot finish the rebalance, so 
> the high CPU usage does not drop until we stop our load. A small load 
> without long-running tasks does not cause continuous high CPU usage either, 
> as the rebalance can finish in that case.
>  
> "executor.kafka-consumer-executor-4" #124 daemon prio=5 os_prio=0 cpu=76853.07ms elapsed=841.16s tid=0x7fe11f044000 nid=0x1f4 runnable [0x7fe119aab000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:467)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1275)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1241)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
>     at
>  
> By debugging the code, we found that the clients appear to be stuck in a 
> loop trying to find the coordinator.
> I also tried the old rebalance protocol with the new version: the issue 
> still exists, but the CPU returns to normal when the rebalance is done.
> I also tried the same on 2.4.1, which does not seem to have this issue. So 
> it seems related to something that changed between 2.4.1 and 2.5.0.
>  





[GitHub] [kafka] guozhangwang commented on pull request #9196: KAFKA-10402: Upgrade system tests to python3

2020-09-27 Thread GitBox


guozhangwang commented on pull request #9196:
URL: https://github.com/apache/kafka/pull/9196#issuecomment-699693629


   The test 
`tests/kafkatest/tests/streams/streams_broker_down_resilience_test.py` ran 5 
times without failing. I think this is good to go.


