somandal commented on code in PR #15368:
URL: https://github.com/apache/pinot/pull/15368#discussion_r2029139368
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceSummaryResult.java:
##########
@@ -306,18 +297,133 @@ public Map<String, ServerSegmentChangeInfo>
getServerSegmentChangeInfo() {
}
}
+ public static class ConsumingSegmentToBeMovedSummary {
+ private final int _numConsumingSegmentsToBeMoved;
+ private final int _numServersGettingConsumingSegmentsAdded;
+ private final Map<String, Integer>
_consumingSegmentsToBeMovedWithMostOffsetsToCatchUp;
+ private final Map<String, Integer>
_consumingSegmentsToBeMovedWithOldestAgeInMinutes;
+ private final Map<String, ConsumingSegmentSummaryPerServer>
_serverConsumingSegmentSummary;
+
+ /**
+ * Constructor for ConsumingSegmentToBeMovedSummary
+ * @param numConsumingSegmentsToBeMoved total number of consuming segments
to be moved as part of this rebalance
+ * @param numServersGettingConsumingSegmentsAdded maximum bytes of
consuming segments to be moved to catch up
+ * @param consumingSegmentsToBeMovedWithMostOffsetsToCatchUp top consuming
segments to be moved to catch up.
+ * Map from
segment name to its number of offsets to
+ * catch up on
the new server. This is essentially the
+ * difference
between the latest offset of the stream
+ * and the
segment's start offset of the stream. The
+ * map is set to
null if the number of offsets to catch
+ * up could not
be determined for at least one
+ * consuming
segment
+ * @param consumingSegmentsToBeMovedWithOldestAgeInMinutes oldest
consuming segments to be moved to catch up. Map
+ * from segment
name to its age in minutes. The map is
+ * set to null if
ZK metadata is not available or the
+ * creation time
is not found for at least one consuming
+ * segment.
+ * The age of a
segment is determined by its creation
+ * time from ZK
metadata. Segment age is an approximation
+ * to data age for
a consuming segment. It may not reflect
+ * the actual
oldest age of data in the consuming segment.
+ * For reasons, a
segment could consume events which date
+ * before the
segment created. We collect segment age
+ * here as there
is no obvious way to get the age of the
+ * oldest data in
the stream for a specific consuming
+ * segment.
+ *
consumingSegmentsToBeMovedWithMostOffsetsToCatchUp
+ * is a better
indicator to the cost of moving a consuming
+ * segment if it
presents.
Review Comment:
@J-HowHuang let's not add this
I think I've said this multiple times and I feel like my point is getting
completely lost
- Number of offsets to catch up gives information about how much work needs
to be done when the segment moves (and this is also not a very good estimate as
different Kafka topics can have different payload sizes which will have a large
effect on how much memory / cpu is used)
- Segment age, or any age related type data we're trying to track gives
ideas to the user about how stale the data will be when it first moves. This
can be useful if an application is sensitive to querying stale data. E.g.
application where staleness may not be tolerated well would be ad data for
example.
Is there a use case misunderstanding with this age stuff?
If you really want to call out the impact of each, please call out the above
two points rather than saying something is not important or useful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]