Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-13 Thread via GitHub


ableegoldman commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2109082249

   Merged to trunk
   
   thanks @lianetm !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-13 Thread via GitHub


ableegoldman merged PR #15909:
URL: https://github.com/apache/kafka/pull/15909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-13 Thread via GitHub


lianetm commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2107510189

   Build completed with 12 unrelated test failures. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2105274302

   Thanks all for the helpful feedback! Let's wait for the build and we should 
be good @mjsax


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1597219039


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +

Review Comment:
   Done, I did like the simplified log but totally agree with your points, 
both. I've been myself pushing for avoiding changing the existing logs content 
when possible because I've also heard about customers basing their apps on 
them. Also agree about the more complete output on the case of not hitting the 
next poll in a sensible time. 
   
   So left the log here unchanged (and simplified the other just to not repeat 
ourselves on the 2 logs). So in the common case that we end up with the 2 log 
lines, it's just a first one about the situation when it happens, and the 2nd 
one with the approximate exceeded time when we have the most accurate info. 
Makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1597219039


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +

Review Comment:
   Done, I did like the simplified log but totally agree with your points, 
both. I've been myself pushing for avoiding changing the existing logs content 
when possible because I've also heard about customers basing their apps on 
them. Also agree about the more complete output on the case of not hitting the 
next poll. 
   
   So left the log here unchanged (and simplified the other just to not repeat 
ourselves on the 2 logs). So in the common case that we end up with the 2 log 
lines, it's just a first one about the situation when it happens, and the 2nd 
one with the approximate exceeded time when we have the most accurate info. 
Makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


ableegoldman commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1597202905


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,8 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +

Review Comment:
   Can you actually leave this log untouched? On the one hand I kind of agree 
with this simplification, and logs are by no means a part of the public 
contract, but I know for a fact that some people have built observation tools 
and/or dashboards for things like rebalancing issues by searching for relevant 
log strings such as this one (I know because I built one myself a long time ago)
   
   I don't feel super strongly about this so  I won't push back if you'd prefer 
to clean it up, but imo it doesn't hurt to leave the log here as well
   
   Also: in some extreme cases, eg an infinite loop in a user's processing 
logic, the consumer might never return to call `poll` at all. In less extreme 
cases, eg some kind of long processing that takes on the order of minutes per 
record, it might be a very very long time before the consumer gets back to poll 
and logs the message you added. For the latter case, I think it would be 
valuable to keep this part about increasing the max.poll.interval or lowering 
the max.poll.records in the message we log here, when the max poll interval is 
first missed, so that users know what to do immediately and don't have to wait 
until they actually get through all 1000 records (or whatever max.poll.records 
is set to) and finally return to poll to see a hint about which configs to 
change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


AndrewJSchofield commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1597080823


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) {
  * member to {@link MemberState#JOINING}, so that it rejoins the group.
  */
 public void resetPollTimer(final long pollMs) {
+pollTimer.update(pollMs);
 if (pollTimer.isExpired()) {

Review Comment:
   Yes, makes sense. When I was reviewing the previous iteration, I found 
myself looking within the Timer at the internal variables and then trying to 
figure out whether the derivation being performed was valid. Makes sense to do 
it within the Timer. Perfectly happy with 2 methods like this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2104982949

   Just to clarify what we're getting here, related to @AndrewJSchofield 's 
very valid point. With this we get the time between internal poll events, which 
do not translate exactly to calls to consumer.poll depending on the situation. 
So the log here will be very helpful to tune the config in cases where the 
delay that led to leaving the group was due to the client app taking too long 
to process messages after a call to poll. It would be less accurate in cases 
where the delay is due to the fetch not getting messages for instance, since we 
internally generate more poll events while at it. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1597000322


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) {
  * member to {@link MemberState#JOINING}, so that it rejoins the group.
  */
 public void resetPollTimer(final long pollMs) {
+pollTimer.update(pollMs);
 if (pollTimer.isExpired()) {

Review Comment:
   agree, makes total sense, so moved the calculation to the timer, with an 
`isExpiredBy`. Small twist to what I understand you were suggesting, I kept the 
`isExpired` check, just to avoid having to deal with the logic of deducing if 
the timer is expired based on the `isExpiredBy` on the HBManager. Seems better 
to let the timer know the semantics of when it's considered expired (it does 
consider >= for instance, so just avoiding to bring those semantics into the 
HBManager). Makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1596994382


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) {
  * member to {@link MemberState#JOINING}, so that it rejoins the group.
  */
 public void resetPollTimer(final long pollMs) {
+pollTimer.update(pollMs);
 if (pollTimer.isExpired()) {
-logger.debug("Poll timer has been reset after it had expired");
+logger.warn("Time between subsequent calls to poll() was longer 
than the configured" +
+"max.poll.interval.ms, exceeded by %s ms. This typically 
implies that the " +

Review Comment:
   yeap, my bad, I had found it too so it's fixed in a commit above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


AndrewJSchofield commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1596861973


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) {
  * member to {@link MemberState#JOINING}, so that it rejoins the group.
  */
 public void resetPollTimer(final long pollMs) {
+pollTimer.update(pollMs);
 if (pollTimer.isExpired()) {
-logger.debug("Poll timer has been reset after it had expired");
+logger.warn("Time between subsequent calls to poll() was longer 
than the configured" +
+"max.poll.interval.ms, exceeded by %s ms. This typically 
implies that the " +

Review Comment:
   `{}`?



##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -255,11 +257,15 @@ public long maximumTimeToWait(long currentTimeMs) {
  * member to {@link MemberState#JOINING}, so that it rejoins the group.
  */
 public void resetPollTimer(final long pollMs) {
+pollTimer.update(pollMs);
 if (pollTimer.isExpired()) {

Review Comment:
   I would rather have a method added to `Timer` such as `long hasExpiredBy()` 
so the check for expiration and the calculation of by how much is encapsulated 
in the timer itself.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2104687487

   Done, so I simplified what we log when the background thread realizes time's 
up and leaves the group to rejoin eventually (that's all the relevant info at 
that point). I then moved the log that details the expired max.poll.interval to 
the place where we can give a more accurate exceeded time, which is on the next 
app poll event that the background handles. Also updated the test to make sure 
it checks not only how the exceed time is calculated, but also **where** it is 
calculated. Makes sense? More accurate now indeed, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1596716965


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +
+logger.warn("Consumer poll timeout has expired, exceeded by {} ms. 
This means the time between " +

Review Comment:
   Hey, good point, it would actually take this a step further, where indeed 
should be more useful. As @AndrewJSchofield pointed, the HB manager will notice 
sooner in practice (even sooner than the HB interval), but we do know when the 
next poll happens, so can definitely get a more accurate exceed time 
(in-between calls to poll, which translates to poll events handled in this same 
manager). On it...thanks for the comments!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


lianetm commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1596716965


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +
+logger.warn("Consumer poll timeout has expired, exceeded by {} ms. 
This means the time between " +

Review Comment:
   Hey, good point, it would actually take this a step further, where indeed 
should be more useful. As @AndrewJSchofield pointed, the HB manager will notice 
sooner in practice (even sooner than the HB interval), but we do know when the 
next poll happens, so can definitely get a more accurate exceed time 
(in-between calls to poll, which translate to poll events handled in this same 
manager). On it...thanks for the comments!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-10 Thread via GitHub


AndrewJSchofield commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1596418576


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +
+logger.warn("Consumer poll timeout has expired, exceeded by {} ms. 
This means the time between " +

Review Comment:
   I see what you mean. I think that the background thread will notice more 
quickly than you said, but this just means the "time exceeded" is going to be 
very close to max poll interval. The heartbeat request manager checks to see 
whether it is time to send a heartbeat more regularly than it actually sends a 
heartbeat.
   
   Maybe enhancing the logging in `HeartbeatRequestManager.resetPollTimer` 
would be a suitable point. This is where the heartbeat request manager will 
notice that it has already left the group because of delinquent polling, and 
rejoins when the next poll occurs. @lianetm that's probably workable I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-09 Thread via GitHub


ableegoldman commented on code in PR #15909:
URL: https://github.com/apache/kafka/pull/15909#discussion_r1595957783


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##
@@ -193,11 +193,12 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
 }
 pollTimer.update(currentTimeMs);
 if (pollTimer.isExpired() && !membershipManager.isLeavingGroup()) {
-logger.warn("Consumer poll timeout has expired. This means the 
time between " +
+logger.warn("Consumer poll timeout has expired, exceeded by {} ms. 
This means the time between " +

Review Comment:
   IIUC this is what gets logged when the _heartbeat thread_ notices the 
consumer has failed to poll in time and dropped out of the group -- so the 
"time exceeded" is just going to be roughly the max poll interval + the 
heartbeat interval, no?
   
   I do think it's a great idea to log the amount of time by which the max poll 
interval was exceeded, but imo the more useful information is how long after 
the max poll interval the consumer took to actually hit poll again, not how 
long the heartbeat thread took to notice it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16695: Improve expired poll logging [kafka]

2024-05-09 Thread via GitHub


lianetm commented on PR #15909:
URL: https://github.com/apache/kafka/pull/15909#issuecomment-2102939146

   Hey @mjsax , here is the improved logging following your suggestion, helpful 
indeed I expect. Would you have a chance to take a look? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org