kbendick opened a new issue #1628:
URL: https://github.com/apache/iceberg/issues/1628


   Spark Structured Streaming currently cannot read from an Iceberg table. 
This seems like a very common need.
   
   This was discussed at length back in February, but the discussion has since 
slowed down.
   
   Some of the code that was discussed has been merged in and I'd like to 
revisit this.
   
   The original PR, which has become inactive: 
https://github.com/apache/iceberg/pull/796
   
   My first PR fixes some documentation in the MicroBatch builder class and 
adds tests to start verifying that the functionality already merged works, and 
to start sussing out corner cases: 
https://github.com/apache/iceberg/pull/1627
   
   I will work from the conversation in 
https://github.com/apache/iceberg/pull/796 as well as what I can find in Slack, 
and I hope we can revisit issues as they arise. A decent amount of the 
previously proposed code has already been merged, so I'd like to take a stab at 
piecing the rest together from that earlier discussion, along with the changes 
I think will be needed to support various scenarios, such as deletes, different 
triggers, the global watermark, and the declared per-stream watermark.

