[jira] [Comment Edited] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387028#comment-17387028
 ] 

David Mao edited comment on KAFKA-13135 at 7/26/21, 4:18 AM:
-

Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock, which is held for the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already take the groupLock inside the callback when reading group state. 
-If we can maintain correctness without passing the groupLock to 
DelayedProduce, we can skip locking the group when sending back the offset 
response. I need to take a look at storeGroup to see if the groupLock is 
necessary there.-

 

https://issues.apache.org/jira/browse/KAFKA-6042 looks relevant before changing 
any of the locking semantics here. It looks like the DelayedProduce lock was 
originally added to avoid deadlock. 
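
For illustration, a minimal generic sketch of the ABBA lock-ordering deadlock 
that passing a shared lock into the delayed operation is meant to avoid; the 
class and lock names below are hypothetical, not the actual 
GroupCoordinator/DelayedProduce code:

{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical locks standing in for the group lock and a delayed-operation
// lock. Two threads acquiring them in opposite orders can deadlock, which is
// the general shape of the problem KAFKA-6042 guards against.
public class LockOrderingSketch {

    private final Lock groupLock = new ReentrantLock();
    private final Lock operationLock = new ReentrantLock();

    // Thread 1: request handler holds the group lock, then tries to complete
    // a delayed operation that takes its own lock.
    void handleCommit() {
        groupLock.lock();
        try {
            operationLock.lock();      // blocks if thread 2 holds operationLock
            try {
                // complete the delayed operation
            } finally {
                operationLock.unlock();
            }
        } finally {
            groupLock.unlock();
        }
    }

    // Thread 2: expiration path holds the operation lock, then needs the group
    // lock to read group state -> classic ABBA deadlock with thread 1.
    void expireOperation() {
        operationLock.lock();
        try {
            groupLock.lock();          // blocks if thread 1 holds groupLock
            try {
                // read group state to build the response
            } finally {
                groupLock.unlock();
            }
        } finally {
            operationLock.unlock();
        }
    }
}
{code}

Sharing a single lock (or acquiring both in a fixed order) removes the cycle, 
which is why dropping the groupLock from DelayedProduce needs care.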


was (Author: david.mao):
Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can skip locking the group when sending back the offset response. I need to 
take a look at storeGroup to see if the groupLock is necessary there.

 

https://issues.apache.org/jira/browse/KAFKA-6042 may be relevant before 
changing any of the locking semantics here.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[jira] [Commented] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387032#comment-17387032
 ] 

Ismael Juma commented on KAFKA-13135:
-

The locking for this code is extremely complex and it's been a source of a lot 
of bugs, so we should be very careful.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[jira] [Comment Edited] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387028#comment-17387028
 ] 

David Mao edited comment on KAFKA-13135 at 7/26/21, 3:48 AM:
-

Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can skip locking the group when sending back the offset response. I need to 
take a look at storeGroup to see if the groupLock is necessary there.

 

https://issues.apache.org/jira/browse/KAFKA-6042 may be relevant before 
changing any of the locking semantics here.


was (Author: david.mao):
Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can skip locking the group when sending back the offset response. I need to 
take a look at storeGroup to see if the groupLock is necessary there.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[jira] [Comment Edited] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387028#comment-17387028
 ] 

David Mao edited comment on KAFKA-13135 at 7/26/21, 3:43 AM:
-

Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can skip locking the group when sending back the offset response. I need to 
take a look at storeGroup to see if the groupLock is necessary there.


was (Author: david.mao):
Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can achieve finer grained locking, and skip locking the group when sending 
back the offset response. I need to take a look at storeGroup to see if the 
groupLock is necessary there.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[jira] [Comment Edited] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387028#comment-17387028
 ] 

David Mao edited comment on KAFKA-13135 at 7/26/21, 3:42 AM:
-

Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state. 
If we can maintain correctness without passing the groupLock to DelayedProduce, 
we can achieve finer grained locking, and skip locking the group when sending 
back the offset response. I need to take a look at storeGroup to see if the 
groupLock is necessary there.


was (Author: david.mao):
Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state, 
so we may be able to avoid passing in the groupLock, and achieve some finer 
grained locking. I need to take a look at storeGroup to see if the groupLock is 
necessary there.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[jira] [Commented] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387028#comment-17387028
 ] 

David Mao commented on KAFKA-13135:
---

Taking a closer look, I think we can also optimize this for the happy path.

appendForGroup passes in the groupLock which gets locked during the entire 
putCacheCallback when completing the DelayedProduce from appending offset 
messages.

We already lock the groupLock inside of the callback when reading group state, 
so we may be able to avoid passing in the groupLock, and achieve some finer 
grained locking. I need to take a look at storeGroup to see if the groupLock is 
necessary there.

> Reduce GroupMetadata lock contention for offset commit requests
> ---
>
> Key: KAFKA-13135
> URL: https://issues.apache.org/jira/browse/KAFKA-13135
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mao
>Priority: Major
>
> as suggested by [~lbradstreet], we can look for similar optimizations to 
> https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.
> It looks like there are some straightforward optimizations possible for the 
> error path.





[GitHub] [kafka] chia7712 commented on a change in pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


chia7712 commented on a change in pull request #11128:
URL: https://github.com/apache/kafka/pull/11128#discussion_r676262149



##
File path: clients/src/main/java/org/apache/kafka/common/requests/MetadataResponse.java
##
@@ -100,10 +100,7 @@ public int throttleTimeMs() {
     @Override
     public Map<Errors, Integer> errorCounts() {
         Map<Errors, Integer> errorCounts = new HashMap<>();
-        data.topics().forEach(metadata -> {
-            metadata.partitions().forEach(p -> updateErrorCounts(errorCounts, Errors.forCode(p.errorCode())));

Review comment:
   According to 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/metadata/ZkMetadataCache.scala#L103,
 it is possible for a partition to have an error while the topic-level error is 
NONE. Hence, removing this line produces an inaccurate error count.
   
   If we want a correct count of both errors and `NONE`, we have to add the 
following checks (sketched below):
   
   1. If the topic has an error, we don't need to loop over its partitions.
   2. If the topic has no error, we have to loop over its partitions.
   
   @ijuma WDYT? 
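   
   A minimal sketch of what those two checks might look like, reusing the 
accessors visible in the diff above; this is only an illustration, not a 
committed patch:
   
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.message.MetadataResponseData;
import org.apache.kafka.common.protocol.Errors;

// Sketch of the two-case counting described above: count the topic-level
// error when it is set, otherwise fall back to the partition-level errors.
final class ScopedErrorCountSketch {
    static Map<Errors, Integer> errorCounts(MetadataResponseData data) {
        Map<Errors, Integer> counts = new HashMap<>();
        data.topics().forEach(topic -> {
            Errors topicError = Errors.forCode(topic.errorCode());
            if (topicError != Errors.NONE) {
                // 1. Topic-level error: count it once, no need to loop partitions.
                counts.merge(topicError, 1, Integer::sum);
            } else {
                // 2. No topic-level error: partitions may still carry errors.
                topic.partitions().forEach(p ->
                    counts.merge(Errors.forCode(p.errorCode()), 1, Integer::sum));
            }
        });
        return counts;
    }
}
```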








[jira] [Updated] (KAFKA-13132) Upgrading to topic IDs in LISR requests has gaps introduced in 3.0

2021-07-25 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan updated KAFKA-13132:
---
Description: 
With the change in 3.0 to how topic IDs are assigned to logs, a bug was 
inadvertently introduced. Now, topic IDs will only be assigned on the load of 
the log to a partition in LISR requests. This means we will only assign topic 
IDs for newly created topics/partitions, on broker startup, or potentially when 
a partition is reassigned.

 

In the case of upgrading from an IBP before 2.8, we may have a scenario where 
we upgrade the controller to IBP 3.0 (or even 2.8) last. (I.e., the controller 
is on IBP < 2.8 and all other brokers are on the newest IBP.) Upon the last 
broker upgrading, we will elect a new controller, but its LISR request will not 
result in topic IDs being assigned to logs of existing topics. They will only 
be assigned in the cases mentioned above.

*Keep in mind, in this scenario, topic IDs will still be assigned in the 
controller/ZK to all new and pre-existing topics and will show up in metadata.* 
 This means we are not ensured the same guarantees we had in 2.8. *It is just 
the LISR/partition.metadata part of the code that is affected.* 

 

The problem is twofold:
 1. We ignore LISR requests when the partition leader epoch has not increased 
(previously we assigned the ID before this check).
 2. We only assign the topic ID when we are associating the log with the 
partition in ReplicaManager for the first time. However, in the scenario 
described above, we have logs already associated with partitions that need to 
be upgraded.

 

We should check whether the LISR request results in a topic ID addition and add 
logic to handle logs already associated with partitions in ReplicaManager.
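
A purely illustrative sketch of the shape of that fix; the class and method 
names below are hypothetical and do not correspond to the actual Kafka code:

{code:java}
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch: when a LISR request carries a topic ID and the log that
// is already associated with the partition has none (e.g. it was created
// before the upgrade), adopt the ID instead of ignoring the request.
final class TopicIdUpgradeSketch {

    static final class PartitionLog {
        Optional<UUID> topicId = Optional.empty();   // persisted in partition.metadata
    }

    // Would be invoked for each partition in a LeaderAndIsr request, even when
    // the partition leader epoch has not increased.
    static void assignTopicIdIfMissing(PartitionLog log, Optional<UUID> topicIdFromRequest) {
        if (topicIdFromRequest.isPresent() && !log.topicId.isPresent()) {
            log.topicId = topicIdFromRequest;
        }
        // If both IDs are present they should match; handling a mismatch is
        // outside the scope of this sketch.
    }
}
{code}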

  was:
With the change in 3.0 to how topic IDs are assigned to logs, a bug was 
inadvertently introduced. Now, topic IDs will only be assigned on the load of 
the log to a partition in LISR requests. This means we will only assign topic 
IDs for newly created topics/partitions, on broker startup, or potentially when 
a partition is reassigned.

 

In the case of upgrading from an IBP before 2.8, we may have a scenario where 
we upgrade the controller to IBP 3.0 last. (Ie, the controller is IBP < 2.8 and 
all other brokers are on IBP 3.0) Upon the last broker upgrading, we will elect 
a new controller but its LISR request will not result in topic IDs being 
assigned to logs. They will only be assigned in the cases mentioned above.

Keep in mind, in this scenario, topic IDs will be still be newly assigned to 
all pre-existing topics and will show up in metadata.  This means we are not 
ensured the same guarantees we had in 2.8. *It is just the 
LISR/partition.metadata part of the code that is affected. The controller and 
ZooKeeper will still correctly assign topic IDs to new topics upon upgrade. We 
will also see this reflected in metadata responses.*

 

The problem is two-fold
 1. We ignore LISR requests when the partition leader epoch has not increased 
(previously we assigned the ID before this check)
 2. We only assign the topic ID when we are associating the log with the 
partition in replicamanager for the first time. Though in the scenario 
described above, we have logs associated with partitions that need to be 
upgraded.

 

We should check the if the LISR request is resulting in a topic ID addition and 
add logic to logs already associated to partitions in replica manager.


> Upgrading to topic IDs in LISR requests has gaps introduced in 3.0
> --
>
> Key: KAFKA-13132
> URL: https://issues.apache.org/jira/browse/KAFKA-13132
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Major
>
> With the change in 3.0 to how topic IDs are assigned to logs, a bug was 
> inadvertently introduced. Now, topic IDs will only be assigned on the load of 
> the log to a partition in LISR requests. This means we will only assign topic 
> IDs for newly created topics/partitions, on broker startup, or potentially 
> when a partition is reassigned.
>  
> In the case of upgrading from an IBP before 2.8, we may have a scenario where 
> we upgrade the controller to IBP 3.0 (or even 2.8) last. (Ie, the controller 
> is IBP < 2.8 and all other brokers are on the newest IBP) Upon the last 
> broker upgrading, we will elect a new controller but its LISR request will 
> not result in topic IDs being assigned to logs of existing topics. They will 
> only be assigned in the cases mentioned above.
> *Keep in mind, in this scenario, topic IDs will be still be assigned in the 
> controller/ZK to all new and pre-existing topics and will show up in 
> metadata.*  This means we are not ensured the same 

[GitHub] [kafka] chia7712 commented on pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


chia7712 commented on pull request #11128:
URL: https://github.com/apache/kafka/pull/11128#issuecomment-886328806


   > Negative performance impact
   
   This is a good reason.
   
   > The count for NONE would be wrong
   
   As the "scoped error" can vary between responses, we need to count NONE 
case by case.
   
   Thanks for the response; I will update the code later.






[GitHub] [kafka] ijuma commented on pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


ijuma commented on pull request #11128:
URL: https://github.com/apache/kafka/pull/11128#issuecomment-886323968


   There are two problems:
   1. The count for NONE would be wrong
   2. Negative performance impact






[GitHub] [kafka] chia7712 commented on pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


chia7712 commented on pull request #11128:
URL: https://github.com/apache/kafka/pull/11128#issuecomment-886321557


   > Are there other similar cases in the original PR or is this the only one?
   
   `StopReplicaResponse` 
(https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/requests/StopReplicaResponse.java#L55)
 has a similar issue. I will take a look at the others.
   
   Separately, I'm thinking about a better code style. Since the server side 
does not add partition errors to the response when the top-level error is set, 
`errorCounts` should be fine collecting the error codes from all elements. It 
seems to me the only challenge is that the count of `NONE` may get higher.
   
   Could you share the regression you mentioned, in case I am overlooking the 
real problem?






[GitHub] [kafka] ijuma commented on pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


ijuma commented on pull request #11128:
URL: https://github.com/apache/kafka/pull/11128#issuecomment-886314473


   Thanks for the PR. Are there other similar cases in the original PR or is 
this the only one?






[jira] [Created] (KAFKA-13135) Reduce GroupMetadata lock contention for offset commit requests

2021-07-25 Thread David Mao (Jira)
David Mao created KAFKA-13135:
-

 Summary: Reduce GroupMetadata lock contention for offset commit 
requests
 Key: KAFKA-13135
 URL: https://issues.apache.org/jira/browse/KAFKA-13135
 Project: Kafka
  Issue Type: Improvement
Reporter: David Mao


as suggested by [~lbradstreet], we can look for similar optimizations to 
https://issues.apache.org/jira/browse/KAFKA-13134 in the offset commit path.

It looks like there are some straightforward optimizations possible for the 
error path.





[GitHub] [kafka] chia7712 commented on a change in pull request #9433: KAFKA-10607: Consistent behaviour for response errorCounts()

2021-07-25 Thread GitBox


chia7712 commented on a change in pull request #9433:
URL: https://github.com/apache/kafka/pull/9433#discussion_r676217594



##
File path: clients/src/main/java/org/apache/kafka/common/requests/MetadataResponse.java
##
@@ -109,8 +109,10 @@ public int throttleTimeMs() {
     @Override
     public Map<Errors, Integer> errorCounts() {
         Map<Errors, Integer> errorCounts = new HashMap<>();
-        data.topics().forEach(metadata ->
-            updateErrorCounts(errorCounts, Errors.forCode(metadata.errorCode())));
+        data.topics().forEach(metadata -> {
+            metadata.partitions().forEach(p -> updateErrorCounts(errorCounts, Errors.forCode(p.errorCode())));

Review comment:
   https://github.com/apache/kafka/pull/11127








[GitHub] [kafka] chia7712 opened a new pull request #11128: MINOR: remove partition-level error from MetadataResponse#errorCounts

2021-07-25 Thread GitBox


chia7712 opened a new pull request #11128:
URL: https://github.com/apache/kafka/pull/11128


   see https://github.com/apache/kafka/pull/9433#discussion_r676083224
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   






[GitHub] [kafka] chia7712 commented on a change in pull request #9433: KAFKA-10607: Consistent behaviour for response errorCounts()

2021-07-25 Thread GitBox


chia7712 commented on a change in pull request #9433:
URL: https://github.com/apache/kafka/pull/9433#discussion_r676212229



##
File path: clients/src/main/java/org/apache/kafka/common/requests/MetadataResponse.java
##
@@ -109,8 +109,10 @@ public int throttleTimeMs() {
     @Override
     public Map<Errors, Integer> errorCounts() {
         Map<Errors, Integer> errorCounts = new HashMap<>();
-        data.topics().forEach(metadata ->
-            updateErrorCounts(errorCounts, Errors.forCode(metadata.errorCode())));
+        data.topics().forEach(metadata -> {
+            metadata.partitions().forEach(p -> updateErrorCounts(errorCounts, Errors.forCode(p.errorCode())));

Review comment:
   > A metadata request has topics as the "scoped error" (one cannot 
request a metadata request for a given partition).
   
   I checked the server-side code. You are right; this change should be 
reverted. I will file a PR to fix it ASAP. This behavior can be broken 
inadvertently (if the server side or client side adds or removes a specifically 
scoped error); I'm not sure whether we can prevent that effectively (maybe by 
adding more tests).








[jira] [Comment Edited] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException

2021-07-25 Thread Shamsher Singh Rana (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386931#comment-17386931
 ] 

Shamsher Singh Rana edited comment on KAFKA-5431 at 7/25/21, 5:33 PM:
--

Hi [~mswathi], please update the status of the above issue.


was (Author: rana6627):
Hi Swathi, please update the status of above issue.

> LogCleaner stopped due to 
> org.apache.kafka.common.errors.CorruptRecordException
> ---
>
> Key: KAFKA-5431
> URL: https://issues.apache.org/jira/browse/KAFKA-5431
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Carsten Rietz
>Assignee: huxihx
>Priority: Major
>  Labels: reliability
> Fix For: 0.11.0.1, 1.0.0
>
>
> Hey all,
> I have a strange problem with our UAT cluster of 3 Kafka brokers.
> The __consumer_offsets topic was replicated to two instances and our disks 
> ran full due to a wrong configuration of the log cleaner. We fixed the 
> configuration and updated from 0.10.1.1 to 0.10.2.1.
> Today I increased the replication of the __consumer_offsets topic to 3 and 
> triggered replication to the third cluster via kafka-reassign-partitions.sh. 
> That went well, but I get many errors like
> {code}
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,18] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,24] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> {code}
> Which I think are due to the full disk event.
> The log cleaner threads died on these wrong messages:
> {code}
> [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is less 
> than the minimum record overhead (14)
> [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> Looking at the files I see that some are truncated and some are just empty:
> $ ls -lsh 00594653.log
> 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00594653.log
> Sadly I do not have the logs anymore from the disk full event itself.
> I have three questions:
> * What is the best way to clean this up? Deleting the old log files and 
> restarting the brokers?
> * Why did Kafka not handle the disk full event well? Is this only affecting 
> the cleanup, or may we also lose data?
> * Is this maybe caused by the combination of upgrade and disk full?
> And last but not least: Keep up the good work. Kafka is really performing 
> well while being easy to administer and has good documentation!





[jira] [Commented] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException

2021-07-25 Thread Shamsher Singh Rana (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386931#comment-17386931
 ] 

Shamsher Singh Rana commented on KAFKA-5431:


Hi Swathi, please update the status of the above issue.

> LogCleaner stopped due to 
> org.apache.kafka.common.errors.CorruptRecordException
> ---
>
> Key: KAFKA-5431
> URL: https://issues.apache.org/jira/browse/KAFKA-5431
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Carsten Rietz
>Assignee: huxihx
>Priority: Major
>  Labels: reliability
> Fix For: 0.11.0.1, 1.0.0
>
>
> Hey all,
> I have a strange problem with our UAT cluster of 3 Kafka brokers.
> The __consumer_offsets topic was replicated to two instances and our disks 
> ran full due to a wrong configuration of the log cleaner. We fixed the 
> configuration and updated from 0.10.1.1 to 0.10.2.1.
> Today I increased the replication of the __consumer_offsets topic to 3 and 
> triggered replication to the third cluster via kafka-reassign-partitions.sh. 
> That went well, but I get many errors like
> {code}
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,18] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,24] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> {code}
> Which I think are due to the full disk event.
> The log cleaner threads died on these wrong messages:
> {code}
> [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is less 
> than the minimum record overhead (14)
> [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> Looking at the files I see that some are truncated and some are just empty:
> $ ls -lsh 00594653.log
> 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00594653.log
> Sadly I do not have the logs anymore from the disk full event itself.
> I have three questions:
> * What is the best way to clean this up? Deleting the old log files and 
> restarting the brokers?
> * Why did Kafka not handle the disk full event well? Is this only affecting 
> the cleanup, or may we also lose data?
> * Is this maybe caused by the combination of upgrade and disk full?
> And last but not least: Keep up the good work. Kafka is really performing 
> well while being easy to administer and has good documentation!





[GitHub] [kafka] omkreddy merged pull request #11125: MINOR: Update `./gradlew allDepInsight` example in README

2021-07-25 Thread GitBox


omkreddy merged pull request #11125:
URL: https://github.com/apache/kafka/pull/11125


   






[GitHub] [kafka] splett2 opened a new pull request #11127: KAFKA-13134: Give up group metadata lock before sending heartbeat response

2021-07-25 Thread GitBox


splett2 opened a new pull request #11127:
URL: https://github.com/apache/kafka/pull/11127


   ### What
   Small locking improvement to drop the group metadata lock before invoking 
the response callback.
   
   ### Testing
   Relying on existing unit tests since this is a minor change.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   






[jira] [Created] (KAFKA-13134) Heartbeat Request high lock contention

2021-07-25 Thread David Mao (Jira)
David Mao created KAFKA-13134:
-

 Summary: Heartbeat Request high lock contention
 Key: KAFKA-13134
 URL: https://issues.apache.org/jira/browse/KAFKA-13134
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: David Mao
Assignee: David Mao


On a cluster with a high heartbeat rate, a lock profile showed high contention 
for the GroupMetadata lock.

We can significantly reduce this by invoking the response callback outside of 
the group metadata lock.
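
A minimal sketch of the pattern (hypothetical names, not the actual 
GroupCoordinator code): read or update group state under the lock, then invoke 
the heartbeat response callback after the lock is released.

{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// The callback only hands the response to the network layer and does not
// touch group state, so it does not need to run inside the locked section.
public class HeartbeatCompletionSketch {

    private final Lock groupLock = new ReentrantLock();
    private short memberError = 0;   // stands in for per-member heartbeat state

    void completeHeartbeat(Consumer<Short> responseCallback) {
        short error;
        groupLock.lock();
        try {
            error = memberError;     // group-state access stays under the lock
        } finally {
            groupLock.unlock();
        }
        // Invoking the callback here, outside the lock, shortens the critical
        // section and reduces contention on the GroupMetadata lock.
        responseCallback.accept(error);
    }
}
{code}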





[jira] [Created] (KAFKA-13133) Replace EasyMock and PowerMock with Mockito for AbstractHerderTest

2021-07-25 Thread YI-CHEN WANG (Jira)
YI-CHEN WANG created KAFKA-13133:


 Summary: Replace EasyMock and PowerMock with Mockito for 
AbstractHerderTest
 Key: KAFKA-13133
 URL: https://issues.apache.org/jira/browse/KAFKA-13133
 Project: Kafka
  Issue Type: Sub-task
Reporter: YI-CHEN WANG
Assignee: YI-CHEN WANG








permission request

2021-07-25 Thread kahn chen
Hi,

I'm interested in contributing to the Kafka project.
Jira user name: KahnCheny