kbendick opened a new pull request #1627:
URL: https://github.com/apache/iceberg/pull/1627


   I have been looking into adding support for a proper Spark Structured 
Streaming V2 Data Source.
   
   I can see that there was a previous PR for this, with a large amount of 
discussion around it. However, that PR dates all the way back to February 
13th.
   
   I would love to pick up where that work left off if @jerryshao is not 
working on it. It appears that some of the code suggested by @aokolnychyi and 
others during that discussion was added in.
   
   I started by looking through some of the current code and noticed this 
(likely) doc typo. I also added two tests as behavior checks while I was 
exploring the `MicroBatches.from` builder.
   
   I'd like to have one implementation that could eventually be shared by 
Flink and continuous streaming, since the implementation is size-based and 
could be dynamic as well. However, supporting Trigger.Once() and then 
triggering size-based batches with deletes in Spark should be prioritized imo, 
as Flink is already working well. The Flink implementation should still be a 
source of inspiration where possible.
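   
   As a rough illustration of what "size-based" batching means here, the 
sketch below groups a list of file sizes into batches that stay under a target 
size. It is only a sketch under stated assumptions, not the code in this PR 
and not Iceberg's `MicroBatches` implementation: `SizeBasedBatcher` and 
`planBatches` are hypothetical names, and files are modeled as plain byte 
counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg or Spark.
public class SizeBasedBatcher {

    // Groups file sizes (in bytes) into consecutive batches, preserving order.
    // A batch is closed when adding the next file would push it past the target.
    // A single file larger than the target still gets a batch of its own,
    // so every file is always assigned to exactly one batch.
    public static List<List<Long>> planBatches(List<Long> fileSizes, long targetSizeInBytes) {
        if (targetSizeInBytes <= 0) {
            throw new IllegalArgumentException("targetSizeInBytes must be positive");
        }

        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0L;

        for (long size : fileSizes) {
            // Close the current batch if it is non-empty and this file would overflow it.
            if (!current.isEmpty() && currentSize + size > targetSizeInBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0L;
            }
            current.add(size);
            currentSize += size;
        }

        // Emit the trailing partial batch, if any.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

   With a small target, the input is split into several batches. With a very 
large target, everything pending lands in a single batch, which is roughly the 
shape a one-shot trigger would want. The target could also be changed between 
calls, which is the sense in which the sizing could be dynamic.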
   
   Please let me know what you think and I'll open a ticket. cc @rdblue 
@aokolnychyi @HeartSaVioR @RussellSpitzer




