tibrewalpratik17 commented on issue #12400: URL: https://github.com/apache/pinot/issues/12400#issuecomment-2162931989
> In the case of partial-upsert, an entire partition will be moved to another node, not just a few segments. As you mentioned, allSegmentsLoaded will prevent the consumption from starting on the new node until all the old segments for the partition are available, ensuring the data is re-consumed properly. There is a scenario in partial-upserts where some segments of a partition are moved during a rebalance, and the consuming segment continues consumption. This results in inconsistent data because it misses previous record references. If, instead of this consuming segment being rebalanced and reconsume entire data on new node, it commits the data, we would end up with different data across replicas. Thinking out loud, I can think of few solutions to this: - Approach 1: Ensure that segments are not committed during an ongoing rebalance. Whenever a server reaches the threshold for committing a segment (the `endCriteriaReached` method), it should check for an ongoing rebalance for the table and make a `hold` call to prevent committing. The `hold` method will sleep the thread for a certain period and then check again if the rebalance has finished. Note: This approach requires server-to-controller communication (only for partial-upsert tables) to determine if a rebalance is in progress when commit criteria is met everytime. Ingestion stays at the same offset in that partition until further asked to change state. - Approach 2: Similar to Approach 1, but the controller handles the holding logic in the SegmentCompletionManager#holdingConsumed method. Until a rebalance is completed, the controller will ask the servers to hold committing. Note: It is unclear what happens if the consuming segment is relocated to a new instance and the controller then asks the winner to proceed with the commit job. Will it result in a No-Op operation? Even in this ingestion stays at the same offset in that partition until notified by controller to change state. - Approach 3: Pause consumption for partial-upsert tables (pausing consumption triggers a force-commit as well). Then, perform the rebalance followed by resuming consumption. This entire process should be orchestrated within the rebalance job. Approach 1 and Approach 2: These approaches will only come into play when a commit is happening at a partition-level during an ongoing rebalance. This minimizes the likelihood of ingestion lag during or after the rebalance operation. Approach 3: This approach will cause ingestion lag across all partitions every time a rebalance is triggered for a partial-upsert table, as it pauses and resumes ingestion during the rebalance process. Additionally, it requires the rebalance operation to be more intrusive, as it needs to orchestrate the entire flow of pausing consumption, ensuring the current consuming segments are committed, and then resuming consumption after the rebalance is complete. IMO approach 2 might be neater as controller becomes the sole brain here to decide when to allow commit and when to not. cc @Jackie-Jiang @ankitsultana what are your thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org