kbendick opened a new pull request #1627: URL: https://github.com/apache/iceberg/pull/1627
I have been looking into adding support for a proper Spark Structured Streaming V2 Data Source. There was previously a PR for this, and it drew a very large amount of discussion. That PR dates all the way back to February 13th, though, and I would love to pick up where that work left off if @jerryshao is no longer working on it. It appears that some code suggested by @aokolnychyi and others in that discussion was added in.

I started by looking through some of the current code and noticed this (likely) doc typo. I also added two behavior-check tests, which I wrote while exploring how the `MicroBatches.from` builder behaves.

Eventually I'd like to have one implementation that Flink and continuous streaming could share, since the implementation is size-based and could also be made dynamic. In my opinion, though, supporting `Trigger.Once()` and then triggering size-based batches with deletes in Spark should come first, as Flink is already working well. Flink should still be a source of inspiration where possible.

Please let me know what you think and I'll open a ticket. cc @rdblue @aokolnychyi @HeartSaVioR @RussellSpitzer
