jugomezv commented on PR #9994:
URL: https://github.com/apache/pinot/pull/9994#issuecomment-1403953161
> I know this image doesn't have a lot of context, but it's definitely in
> milliseconds, and it seems this affects ~7/48 partitions for this topic.
Thanks a lot. What is the scale on the y-axis, hours? days? Let me continue
looking into the consume loop to see if there are other places where
consumption could get stuck and lead to such increases.
I have looked at the consume loop code and have the following suggestions.
Can you enable debug logs? There is a wealth of debug/info traces in that code
that can help us tell the difference in consumption patterns between your
partitions and see what leads to the ramp-up times. Currently there are two
places where this code blocks: on fetching a message batch from the stream
(with the configurable timeout described above), and right after we get an
empty batch, where we block for 100 milliseconds. A simplified sketch of the
shape of that loop is below.
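For illustration, here is a minimal sketch of the shape of that loop showing the two blocking points. This is not the actual Pinot code; `MessageBatch`, `StreamConsumer`, `fetchMessages` and the timeout values are placeholders I made up for the sketch:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustrative placeholders, not Pinot's real interfaces.
interface MessageBatch { List<byte[]> getMessages(); }
interface StreamConsumer { MessageBatch fetchMessages(long timeoutMs); }

class ConsumeLoopSketch {
  static final long FETCH_TIMEOUT_MS = 5_000; // stand-in for the configurable stream fetch timeout
  static final long IDLE_SLEEP_MS = 100;      // fixed back-off after an empty batch

  void run(StreamConsumer consumer) throws InterruptedException {
    while (true) {
      // Blocking point 1: wait up to FETCH_TIMEOUT_MS for a batch from the stream.
      MessageBatch batch = consumer.fetchMessages(FETCH_TIMEOUT_MS);
      if (batch == null || batch.getMessages().isEmpty()) {
        // Blocking point 2: fixed 100 ms pause when the batch comes back empty.
        TimeUnit.MILLISECONDS.sleep(IDLE_SLEEP_MS);
        continue;
      }
      // ... index the rows, advance offsets, update consumption metrics ...
    }
  }
}
```

The debug logs should tell us how much time a partition spends in each of those two waits.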
We also have a number of other interesting metrics that you should correlate
with the graph above (a rough sketch of one way to pull them over JMX follows
this list):
- LLC_PARTITION_CONSUMING: should indicate whether the partition is actively
consuming or not
- HIGHEST_STREAM_OFFSET_CONSUMED
- REALTIME_ROWS_CONSUMED
- INVALID_REALTIME_ROWS_DROPPED
- INCOMPLETE_REALTIME_ROWS_CONSUMED
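If it helps, here is a rough sketch of how you could dump those values over JMX for correlation. The JMX URL, the metric-name matching, and the attribute names ("Count" vs "Value") are guesses on my part and will depend on how your servers export metrics, so please adjust them to what you see in jconsole:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MetricsDumpSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical JMX endpoint of one Pinot server; replace with your real host/port.
    JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://pinot-server-0:9010/jmxrmi");
    try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      // Scan all registered MBeans and keep the ones whose names look like the metrics above.
      Set<ObjectName> names = conn.queryNames(null, null);
      for (ObjectName name : names) {
        String n = name.toString().toLowerCase();
        if (n.contains("llcpartitionconsuming") || n.contains("realtimerowsconsumed")
            || n.contains("higheststreamoffsetconsumed")) {
          Object value;
          try {
            value = conn.getAttribute(name, "Count"); // meters/counters
          } catch (Exception e) {
            value = conn.getAttribute(name, "Value"); // gauges
          }
          System.out.println(name + " -> " + value);
        }
      }
    }
  }
}
```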
One more question for you: is there any filtering of messages going on?
I noticed that if we get a batch of messages and all of them are filtered out,
the metric could still reflect the lag of the last unfiltered message; see the
toy sketch below.
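To illustrate the scenario I have in mind (a toy sketch, not the real code; `StreamRecord`, `recordIsFiltered`, and the lag bookkeeping are made-up names): if every message in a fetched batch is filtered out, the watermark used for the lag metric never advances, so the reported lag can keep growing even while offsets move forward:

```java
import java.util.List;

class FilteredBatchLagSketch {
  record StreamRecord(long ingestionTimeMs) {}

  // Hypothetical bookkeeping: ingestion time of the last row that survived filtering.
  private long lastUnfilteredIngestionTimeMs = 0;

  void processBatch(List<StreamRecord> batch) {
    for (StreamRecord rec : batch) {
      if (recordIsFiltered(rec)) {
        // Stream offsets still move forward, but the lag watermark does not.
        continue;
      }
      lastUnfilteredIngestionTimeMs = rec.ingestionTimeMs();
    }
  }

  long reportedLagMs(long nowMs) {
    // If every recent batch was fully filtered out, this value keeps growing
    // even though the consumer is not actually behind on the stream.
    return nowMs - lastUnfilteredIngestionTimeMs;
  }

  private boolean recordIsFiltered(StreamRecord rec) {
    return true; // stand-in for the table-config filter function
  }
}
```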