adamyasharma2797 opened a new pull request, #10375:
URL: https://github.com/apache/iceberg/pull/10375

   The IcebergStreamWriter and IcebergFilesCommitter classes support writing 
to only one table, and that table has to be known before instantiating the 
pipeline, i.e. at compile time.
   However, there are use cases where a single stream of data needs to write to 
different tables. 
   More importantly, there can be use cases where new versions of payloads (with 
a different schema) arrive at the pipeline after it has started running. These 
new payloads require run-time discovery of the table they should be 
written to.
   This PR creates 4 new classes:
   1. MultiTableStreamWriter: Writes to multiple Iceberg tables
   2. MultiTableFileCommitter: Commits files to each Iceberg table
   3. TableAwareWriteResult: Captures the mapping between files written by 
MultiTableIcebergWriter and the Iceberg tables they belong to
   4. PayloadSinkProvider: A new interface that lets users supply the Iceberg 
table to write to at run time. Users can read the record and, based on its 
contents, decide which table it should be written to. It has one function, 
getOrCreateTable, which also gives implementations the opportunity to create 
the table externally.
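
   To illustrate the routing idea behind PayloadSinkProvider, here is a minimal, 
self-contained sketch. It does not use the Iceberg or Flink APIs and is not the 
code in this PR: the interface signature (a generic record type in, a table name 
out), the VersionedTableProvider and MultiTableRouter classes, and the "type" and 
"version" record fields are all assumptions made up for the example. The real 
getOrCreateTable presumably works with Iceberg table objects rather than plain 
strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of the provider; the signature in the PR may differ.
interface PayloadSinkProvider<T> {
  // Returns the name of the table this record belongs to, creating it if needed.
  String getOrCreateTable(T record);
}

// Hypothetical provider: derives the table name from the record's
// "type" and "version" fields, so a new payload version gets a new table.
final class VersionedTableProvider implements PayloadSinkProvider<Map<String, String>> {
  // Stand-in for an external catalog; remembers which tables were "created".
  private final Set<String> createdTables = new LinkedHashSet<>();

  @Override
  public String getOrCreateTable(Map<String, String> record) {
    String name = record.get("type") + "_v" + record.get("version");
    createdTables.add(name); // no-op if the table is already known
    return name;
  }

  Set<String> createdTables() {
    return createdTables;
  }
}

// Hypothetical stand-in for the multi-table writer: groups records per table,
// discovering each record's table at run time through the provider.
final class MultiTableRouter {
  static <T> Map<String, List<T>> route(List<T> records, PayloadSinkProvider<T> provider) {
    Map<String, List<T>> perTable = new LinkedHashMap<>();
    for (T record : records) {
      String table = provider.getOrCreateTable(record);
      perTable.computeIfAbsent(table, k -> new ArrayList<>()).add(record);
    }
    return perTable;
  }
}
```

   In this sketch, a record with a previously unseen version is routed to a new 
table without restarting the pipeline, which is the run-time discoverability the 
description asks for. The per-table grouping plays the role that 
TableAwareWriteResult is described as playing for written files.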
   
   Unit tests and integration tests for all the new use cases have been added. 
We have tested this setup in a production environment. Please let us know if 
you see any issues with the proposed changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

