J-HowHuang commented on code in PR #16341:
URL: https://github.com/apache/pinot/pull/16341#discussion_r2216614105
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -411,6 +416,31 @@ private RebalanceResult doRebalance(TableConfig
tableConfig, RebalanceConfig reb
tierToInstancePartitionsMap, rebalanceConfig);
}
}
+
+ // If peer-download is enabled, verify that for all segments with
changes in assignment, it is safe to rebalance
+ // Create the DataLossRiskAssessor which is used to check for data loss
scenarios if peer-download is enabled
+ // for a table. Skip this step if allowPeerDownloadDataLoss = true
+ if (peerSegmentDownloadScheme != null && !allowPeerDownloadDataLoss) {
+ // Setting minAvailableReplicas to 0 since that is what's equivalent
to downtime=true
+ DataLossRiskAssessor dataLossRiskAssessor = new
PeerDownloadTableDataLossRiskAssessor(tableNameWithType,
+ tableConfig, 0, _helixManager, _pinotLLCRealtimeSegmentManager);
+ for (Map.Entry<String, Map<String, String>> segmentToAssignment :
currentAssignment.entrySet()) {
+ String segmentName = segmentToAssignment.getKey();
+ Map<String, String> assignment = segmentToAssignment.getValue();
+ if (!assignment.equals(targetAssignment.get(segmentName))
+ && dataLossRiskAssessor.hasDataLossRisk(segmentName)) {
+ // Fail the rebalance if a segment with the potential for data
loss is found
+ String errorMsg = "Moving segment " + segmentName + " as part of
rebalance is risky for peer-download "
+ + "enabled tables, ensure the deep store has a copy of the
segment and if upsert / dedup enabled "
+ + "that it is completed and try again. It is recommended to
forceCommit and pause ingestion prior to "
+ + "rebalancing";
+ onReturnFailure(errorMsg, new IllegalStateException(errorMsg),
tableRebalanceLogger);
+ return new RebalanceResult(rebalanceJobId,
RebalanceResult.Status.FAILED, errorMsg, instancePartitionsMap,
+ tierToInstancePartitionsMap, targetAssignment,
preChecksResult, summaryResult);
+ }
+ }
+ }
+
Review Comment:
If `downtime=false` and `minAvailableReplicas > 1`, then yes at least one
original instance will not be sent `DROPPED` transition.
Generally when `minAvailableReplicas > 1`, there will be one instance up
that holds the data, be it `ONLINE` or `CONSUMING` in external view. If it's
`ONLINE` then it's good, we know there's at least an available copy to peer
download. On the other hand, if this only original instance is in `CONSUMING`,
it might be in a state other than `COMMITTED` or `RETAINED`, this is the time
where data loss can still happen because it does not hold the immutable
segment, and would need to download that from somewhere else (this is what the
current realtime segment state machine does)
So there's still chance for data loss even when `minAvailableReplicas > 1`,
but these are edge cases and hard to handle. I think @somandal here in this PR
only focus on `minAvailableReplicas=0` or `downtime=true`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]