adamyasharma2797 opened a new pull request, #10375: URL: https://github.com/apache/iceberg/pull/10375
The IcebergStreamWriter and IcebergFilesCommitter classes support writing to only one table, and that table has to be known before the pipeline is instantiated, i.e. at compile time. However, there are use cases where a single stream of data needs to write to different tables. More importantly, new versions of payloads (with a different schema) can arrive in the pipeline after it has started running. These new payloads require run-time discovery of the table they should be written to.

This PR creates 4 new classes:

1. MultiTableStreamWriter: writes to multiple Iceberg tables.
2. MultiTableFileCommitter: commits files in each Iceberg table.
3. TableAwareWriteResult: captures the mapping between the files written by MultiTableIcebergWriter and the Iceberg tables.
4. PayloadSinkProvider: a new interface that lets users supply the Iceberg table to write to at run time. Users can read the record and, based on its contents, decide which table to write it to. The interface has one function, getOrCreateTable, which also gives the implementation a place to create the table externally if needed.

Unit tests and integration tests for all the new use cases have been added. We have tested this setup in a production environment. Please let us know if you see any issues with the proposed changes.
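
To make the run-time routing idea concrete, here is a minimal, dependency-free sketch of how a PayloadSinkProvider might be shaped. Only the names PayloadSinkProvider and getOrCreateTable come from the description above; the method signature, the TableHandle stand-in (used here in place of a real Iceberg table so the example compiles without Iceberg or Flink on the classpath), the VersionedRouter class, and the "type"/"version" record fields are all assumptions made for illustration and may not match the actual code in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an Iceberg table handle, so the sketch has no dependencies.
final class TableHandle {
    final String name;
    final List<String> rows = new ArrayList<>();

    TableHandle(String name) {
        this.name = name;
    }
}

// Assumed shape of the interface: only the method name is taken from the PR description.
interface PayloadSinkProvider<T> {
    // Inspect the record and return the table it belongs to, creating it if missing.
    TableHandle getOrCreateTable(T record);
}

// Hypothetical provider that routes each record by its "type" and "version" fields.
final class VersionedRouter implements PayloadSinkProvider<Map<String, String>> {
    private final Map<String, TableHandle> tables = new ConcurrentHashMap<>();

    @Override
    public TableHandle getOrCreateTable(Map<String, String> record) {
        String name = record.get("type") + "_v" + record.get("version");
        // A table is created on first sight of a new payload version, at run time.
        return tables.computeIfAbsent(name, TableHandle::new);
    }

    int tableCount() {
        return tables.size();
    }
}

// Hypothetical writer that asks the provider for a target table on every record.
final class MultiTableWriterSketch<T> {
    private final PayloadSinkProvider<T> provider;

    MultiTableWriterSketch(PayloadSinkProvider<T> provider) {
        this.provider = provider;
    }

    void write(T record) {
        provider.getOrCreateTable(record).rows.add(record.toString());
    }
}
```

In this sketch, a record with a previously unseen version simply yields a new TableHandle on the first call, which mirrors the motivation above: the set of target tables does not have to be fixed when the pipeline is built.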