Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
ableegoldman commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2109082249 Merged to trunk thanks @lianetm ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
ableegoldman merged PR #15909: URL: https://github.com/apache/kafka/pull/15909 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2107510189 Build completed with 12 unrelated test failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2105274302 Thanks all for the helpful feedback! Let's wait for the build and we should be good @mjsax -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1597219039 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + Review Comment: Done, I did like the simplified log but totally agree with your points, both. I've been myself pushing for avoiding changing the existing logs content when possible because I've also heard about customers basing their apps on them. Also agree about the more complete output on the case of not hitting the next poll in a sensible time. So left the log here unchanged (and simplified the other just to not repeat ourselves on the 2 logs). So in the common case that we end up with the 2 log lines, it's just a first one about the situation when it happens, and the 2nd one with the approximate exceeded time when we have the most accurate info. Makes sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1597219039 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + Review Comment: Done, I did like the simplified log but totally agree with your points, both. I've been myself pushing for avoiding changing the existing logs content when possible because I've also heard about customers basing their apps on them. Also agree about the more complete output on the case of not hitting the next poll. So left the log here unchanged (and simplified the other just to not repeat ourselves on the 2 logs). So in the common case that we end up with the 2 log lines, it's just a first one about the situation when it happens, and the 2nd one with the approximate exceeded time when we have the most accurate info. Makes sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
ableegoldman commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1597202905 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + Review Comment: Can you actually leave this log untouched? On the one hand I kind of agree with this simplification, and logs are by no means a part of the public contract, but I know for a fact that some people have built observation tools and/or dashboards for things like rebalancing issues by searching for relevant log strings such as this one (I know because I built one myself a long time ago) I don't feel super strongly about this so I won't push back if you'd prefer to clean it up, but imo it doesn't hurt to leave the log here as well Also: in some extreme cases, eg an infinite loop in a user's processing logic, the consumer might never return to call `poll` at all. In less extreme cases, eg some kind of long processing that takes on the order of minutes per record, it might be a very very long time before the consumer gets back to poll and logs the message you added. For the latter case, I think it would be valuable to keep this part about increasing the max.poll.interval or lowering the max.poll.records in the message we log here, when the max poll interval is first missed, so that users know what to do immediately and don't have to wait until they actually get through all 1000 records (or whatever max.poll.records is set to) and finally return to poll to see a hint about which configs to change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
AndrewJSchofield commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1597080823 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) { * member to {@link MemberState#JOINING}, so that it rejoins the group. */ public void resetPollTimer(final long pollMs) { +pollTimer.update(pollMs); if (pollTimer.isExpired()) { Review Comment: Yes, makes sense. When I was reviewing the previous iteration, I found myself looking within the Timer at the internal variables and then trying to figure out whether the derivation being performed was valid. Makes sense to do it within the Timer. Perfectly happy with 2 methods like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2104982949 Just to clarify what we're getting here, related to @AndrewJSchofield 's very valid point. With this we get the time between internal poll events, which do not translate exactly to calls to consumer.poll depending on the situation. So the log here will be very helpful to tune the config in cases where the delay that led to leaving the group was due to the client app taking too long to process messages after a call to poll. It would be less accurate in cases where the delay is due to the fetch not getting messages for instance, since we internally generate more poll events while at it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1597000322 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) { * member to {@link MemberState#JOINING}, so that it rejoins the group. */ public void resetPollTimer(final long pollMs) { +pollTimer.update(pollMs); if (pollTimer.isExpired()) { Review Comment: agree, makes total sense, so moved the calculation to the timer, with an `isExpiredBy`. Small twist to what I understand you were suggesting, I kept the `isExpired` check, just to avoid having to deal with the logic of deducing if the timer is expired based on the `isExpiredBy` on the HBManager. Seems better to let the timer know the semantics of when it's considered expired (it does consider >= for instance, so just avoiding to bring those semantics into the HBManager). Makes sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1596994382 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) { * member to {@link MemberState#JOINING}, so that it rejoins the group. */ public void resetPollTimer(final long pollMs) { +pollTimer.update(pollMs); if (pollTimer.isExpired()) { -logger.debug("Poll timer has been reset after it had expired"); +logger.warn("Time between subsequent calls to poll() was longer than the configured" + +"max.poll.interval.ms, exceeded by %s ms. This typically implies that the " + Review Comment: yeap, my bad, I had found it too so it's fixed in a commit above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
AndrewJSchofield commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1596861973 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) { * member to {@link MemberState#JOINING}, so that it rejoins the group. */ public void resetPollTimer(final long pollMs) { +pollTimer.update(pollMs); if (pollTimer.isExpired()) { -logger.debug("Poll timer has been reset after it had expired"); +logger.warn("Time between subsequent calls to poll() was longer than the configured" + +"max.poll.interval.ms, exceeded by %s ms. This typically implies that the " + Review Comment: `{}`? ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) { * member to {@link MemberState#JOINING}, so that it rejoins the group. */ public void resetPollTimer(final long pollMs) { +pollTimer.update(pollMs); if (pollTimer.isExpired()) { Review Comment: I would rather have a method added to `Timer` such as `long hasExpiredBy()` so the check for expiration and the calculation of by how much is encapsulated in the timer itself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2104687487 Done, so I simplified what we log when the background thread realizes time's up and leaves the group to rejoin eventually (that's all the relevant info at that point). I then moved the log that details the expired max.poll.interval to the place where we can give a more accurate exceeded time, which is on the next app poll event that the background handles. Also updated the test to make sure it checks not only how the exceed time is calculated, but also **where** it is calculated. Makes sense? More accurate now indeed, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1596716965 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + +logger.warn("Consumer poll timeout has expired, exceeded by {} ms. This means the time between " + Review Comment: Hey, good point, it would actually take this a step further, where indeed should be more useful. As @AndrewJSchofield pointed, the HB manager will notice sooner in practice (even sooner than the HB interval), but we do know when the next poll happens, so can definitely get a more accurate exceed time (in-between calls to poll, which translates to poll events handled in this same manager). On it...thanks for the comments! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1596716965 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + +logger.warn("Consumer poll timeout has expired, exceeded by {} ms. This means the time between " + Review Comment: Hey, good point, it would actually take this a step further, where indeed should be more useful. As @AndrewJSchofield pointed, the HB manager will notice sooner in practice (even sooner than the HB interval), but we do know when the next poll happens, so can definitely get a more accurate exceed time (in-between calls to poll, which translate to poll events handled in this same manager). On it...thanks for the comments! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
AndrewJSchofield commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1596418576 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + +logger.warn("Consumer poll timeout has expired, exceeded by {} ms. This means the time between " + Review Comment: I see what you mean. I think that the background thread will notice more quickly than you said, but this just means the "time exceeded" is going to be very close to max poll interval. The heartbeat request manager checks to see whether it is time to send a heartbeat more regularly than it actually sends a heartbeat. Maybe enhancing the logging in `HeartbeatRequestManager.resetPollTimer` would be a suitable point. This is where the heartbeat request manager will notice that it has already left the group because of delinquent polling, and rejoins when the next poll occurs. @lianetm that's probably workable I think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
ableegoldman commented on code in PR #15909: URL: https://github.com/apache/kafka/pull/15909#discussion_r1595957783 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ## @@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { } pollTimer.update(currentTimeMs); if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) { -logger.warn("Consumer poll timeout has expired. This means the time between " + +logger.warn("Consumer poll timeout has expired, exceeded by {} ms. This means the time between " + Review Comment: IIUC this is what gets logged when the _heartbeat thread_ notices the consumer has failed to poll in time and dropped out of the group -- so the "time exceeded" is just going to be roughly the max poll interval + the heartbeat interval, no? I do think it's a great idea to log the amount of time by which the max poll interval was exceeded, but imo the more useful information is how long after the max poll interval the consumer took to actually hit poll again, not how long the heartbeat thread took to notice it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]
lianetm commented on PR #15909: URL: https://github.com/apache/kafka/pull/15909#issuecomment-2102939146 Hey @mjsax , here is the improved logging following your suggestion, helpful indeed I expect. Would you have a chance to take a look? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org