marvinlanhenke commented on issue #329: URL: https://github.com/apache/iceberg-rust/issues/329#issuecomment-2041548391
> I'm not sure whether my understanding is correct: The target of `table.append()` is used to insert a batch of data into the table. It's seems like a high level API which will use two lower API:
>
> 1. [writer API](https://github.com/apache/iceberg-rust/issues/34) for convert RecordBatch to DataFile
> 2. [transaction API](https://github.com/apache/iceberg-rust/blob/ca9de89ac9d95683c8fe9191f72ab922dc4c7672/crates/iceberg/src/transaction.rs#L30) for commit the DataFile(update the table metadata)
>
> To separate these two interfaces, I think we don't need to delegate the conversion between `RecordBatch` and `DataFile` in the transaction.

I think your understanding is correct, and I agree: if the writer API already converts a `RecordBatch` into a `DataFile`, the Transaction shouldn't be concerned with the conversion itself, since it is a higher-level API. However, it seems reasonable for the Transaction to call the writer that writes the actual DataFile.

So the Transaction `append` (if I understand the Python implementation correctly) does all of the following:

- calls the writer to write the DataFile
- creates an instance of MergingSnapshotProducer, which is responsible for writing the manifest, manifest_list, and snapshot_update
- commits, i.e. calls update_table() on the Catalog with TableUpdate & TableRequirements

@ZENOTME Where would the writer API (which I only know from the design spec in #34) fit best here? Should a Transaction create a new writer every time a new transaction is created? Or should the Table itself hold a ref to a writer?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
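
For illustration only, the three steps attributed to the Transaction `append` in the comment above (write the DataFile, produce a snapshot update, commit via the Catalog) could be sketched roughly as follows. Every type here is a hypothetical stand-in, not the real iceberg-rust or Python API: `CountingWriter`, the single `AddSnapshot` update, and the snapshot-id arithmetic are assumptions made to keep the example self-contained, and the manifest and manifest-list writing done by MergingSnapshotProducer is collapsed into one update. The sketch has the Transaction borrow a writer, which is only one of the options raised in the question.

```rust
// Hypothetical stand-ins; none of these are the real iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    rows: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
    record_count: usize,
}

/// Writer API: turns a RecordBatch into a DataFile.
trait DataFileWriter {
    fn write(&mut self, batch: &RecordBatch) -> DataFile;
}

/// Toy writer that only fabricates file names; it performs no I/O.
struct CountingWriter {
    next_id: usize,
}

impl DataFileWriter for CountingWriter {
    fn write(&mut self, batch: &RecordBatch) -> DataFile {
        let file = DataFile {
            path: format!("data/{:05}.parquet", self.next_id),
            record_count: batch.rows,
        };
        self.next_id += 1;
        file
    }
}

#[derive(Debug, PartialEq)]
enum TableUpdate {
    AddSnapshot { snapshot_id: u64, data_files: Vec<DataFile> },
}

#[derive(Debug, PartialEq)]
enum TableRequirement {
    CurrentSnapshotIdMatches(Option<u64>),
}

/// In-memory catalog that checks requirements before applying updates.
#[derive(Default)]
struct Catalog {
    current_snapshot_id: Option<u64>,
    committed_files: Vec<DataFile>,
}

impl Catalog {
    fn update_table(
        &mut self,
        requirements: Vec<TableRequirement>,
        updates: Vec<TableUpdate>,
    ) -> Result<(), String> {
        for requirement in &requirements {
            match requirement {
                TableRequirement::CurrentSnapshotIdMatches(expected) => {
                    if *expected != self.current_snapshot_id {
                        return Err("table changed since the transaction started".to_string());
                    }
                }
            }
        }
        for update in updates {
            match update {
                TableUpdate::AddSnapshot { snapshot_id, data_files } => {
                    self.current_snapshot_id = Some(snapshot_id);
                    self.committed_files.extend(data_files);
                }
            }
        }
        Ok(())
    }
}

/// Transaction that borrows a writer and collects data files until commit.
struct Transaction<'a, W: DataFileWriter> {
    writer: &'a mut W,
    base_snapshot_id: Option<u64>,
    pending: Vec<DataFile>,
}

impl<'a, W: DataFileWriter> Transaction<'a, W> {
    fn new(writer: &'a mut W, base_snapshot_id: Option<u64>) -> Self {
        Transaction { writer, base_snapshot_id, pending: Vec::new() }
    }

    /// Step 1: delegate the RecordBatch -> DataFile conversion to the writer.
    fn append(&mut self, batch: &RecordBatch) {
        let file = self.writer.write(batch);
        self.pending.push(file);
    }

    /// Steps 2 and 3: build the snapshot update, then commit it to the catalog.
    fn commit(self, catalog: &mut Catalog) -> Result<(), String> {
        let snapshot_id = self.base_snapshot_id.map_or(1, |id| id + 1);
        catalog.update_table(
            vec![TableRequirement::CurrentSnapshotIdMatches(self.base_snapshot_id)],
            vec![TableUpdate::AddSnapshot { snapshot_id, data_files: self.pending }],
        )
    }
}
```

In this arrangement the Transaction never inspects the contents of a `RecordBatch`; it only forwards batches to the writer and tracks the resulting files, which matches the separation of concerns discussed in the comment. Whether the writer should instead be owned by the Table or created per transaction is left open here.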