RjLi13 opened a new pull request, #15223: URL: https://github.com/apache/iceberg/pull/15223
This PR rewrites the commit history of the Async Micro Batch Planner feature to make review easier. Each commit is separated to showcase the flow:

1. SparkMicroBatchStream -> SyncSparkMicroBatchPlanner (this relocates the planning logic to a new class).
2. Migrate duplicated code and circular dependencies to MicroBatchUtils and BaseSparkMicroBatchPlanner.
3. Strip code out of SparkMicroBatchStream so that it leverages the planner and MicroBatchUtils; it becomes the entry point for planners.
4. Restore all code to the PR state to show any unnecessary changes and to avoid deviating from what was reviewed.

Note that I created a new commit history that deviates from the original one. As a result, some of the review comments were merged in to make the history a little cleaner, but it no longer shows the original review process. To ensure the files are the same as in the PR, I used `git checkout origin/async-micro-batch-planner-spark-3-5 -- <file name>`.

AsyncMicroBatchPlanner is the biggest and newest change here: it uses a background thread to put planned files into a queue to be read. It borrows some elements of SparkMicroBatchStream, but its implementation is entirely new.

As always, credit goes to Drew Goya, who authored the original feature here at Netflix.
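For reviewers unfamiliar with the pattern, here is a minimal sketch of a background thread feeding planned files into a bounded queue that the reader drains per micro-batch. This is not the code in this PR: `AsyncPlannerSketch`, `nextBatch`, and the use of plain file-name strings are hypothetical, and the real AsyncMicroBatchPlanner may differ in every detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names and structure are assumptions,
// not the classes introduced by this PR.
class AsyncPlannerSketch implements AutoCloseable {
  private final BlockingQueue<String> queue;
  private final Thread worker;
  private volatile boolean running = true;

  AsyncPlannerSketch(List<String> plannedFiles, int capacity) {
    // A bounded queue applies back-pressure: planning pauses when the reader lags.
    this.queue = new LinkedBlockingQueue<>(capacity);
    this.worker = new Thread(() -> {
      try {
        for (String file : plannedFiles) {
          if (!running) {
            return;
          }
          queue.put(file); // blocks while the queue is full
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop quietly on close()
      }
    }, "planner-sketch");
    worker.setDaemon(true);
    worker.start();
  }

  // Drains up to maxFiles entries, waiting at most timeoutMs in total.
  List<String> nextBatch(int maxFiles, long timeoutMs) {
    List<String> batch = new ArrayList<>();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    try {
      while (batch.size() < maxFiles) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          break;
        }
        String file = queue.poll(remaining, TimeUnit.NANOSECONDS);
        if (file == null) {
          break; // timed out with nothing more to read
        }
        batch.add(file);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return batch;
  }

  @Override
  public void close() {
    running = false;
    worker.interrupt();
  }
}
```

The bounded queue is the main design choice in this sketch: it keeps the background thread from planning arbitrarily far ahead of the reader, at the cost of the reader occasionally waiting on an empty queue.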
