marvinlanhenke opened a new issue, #329: URL: https://github.com/apache/iceberg-rust/issues/329
...out of curiosity, I took a closer look at the pyiceberg impl and how `Table.append()` works. Now I would like to pick your brain in order to understand and track the next steps we have to take to support `append` as well (since we should be getting close to having write support). The goal here is to extract and create actionable issues.

Here is what I understand from the python impl so far (high-level):
---
1. We call `append()` on the Table class with our DataFrame (`pa.Table`) and the `snapshot_properties: Dict[str, str]`.
2. We create a `Transaction` that basically does two things:
   2.1. It creates a `_MergingSnapshotProducer`, which is (on a high level) responsible for writing a new ManifestList and creating a new Snapshot (returned as `AddSnapshotUpdate`).
   2.2. It calls `update_table` on the respective Catalog, which creates a new metadata.json and returns the new metadata as well as the new metadata_location.

[pyiceberg-link](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1314)

Here is what I think we need to implement (rough sketch):
---
1. [impl](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1314) `fn append(...)` on `struct Table`: This should probably accept a RecordBatch as a param, create a new `Transaction`, and delegate further action to the transaction.
2. [impl](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L362) `fn append(...)` on `struct Transaction`: Receives the RecordBatch and snapshot_properties. Performs validation checks. Converts the RecordBatch to a collection of `DataFiles` and creates a `_MergingSnapshotProducer` with the collection.
3. [impl](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L2745) `_MergingSnapshotProducer`:
   :: write manifests (added, deleted, existing)
   :: get next_sequence_number from `TableMetadata`
   :: update snapshot summaries
   :: generate manifest_list_path
   :: write manifest_list
   :: create a new Snapshot
   :: return TableUpdate: AddSnapshot
4. impl `update_table` on the concrete Catalog implementations

What could be possible issues here? I think we need to start with the `_MergingSnapshotProducer` (possibly split into multiple parts) and work our way up the list? Once we have the MergingSnapshotProducer, we can implement the append function on Transaction, which basically orchestrates the steps above?
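To make the `_MergingSnapshotProducer` steps listed above a bit more concrete, here is a minimal Rust sketch of the bookkeeping part only. Everything in it is an assumption for illustration: `DataFile`, `TableMetadata`, `Snapshot`, `TableUpdate` and `MergingSnapshotProducer` are hypothetical stand-ins defined in the block, not the existing iceberg-rust types, and the manifest list path format is simplified. No manifests or manifest lists are actually written; a real implementation would do that file I/O inside `commit`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an already written data file.
#[derive(Debug, Clone)]
pub struct DataFile {
    pub file_path: String,
    pub record_count: u64,
}

/// Hypothetical subset of the table metadata the producer reads.
#[derive(Debug, Clone)]
pub struct TableMetadata {
    pub location: String,
    pub last_sequence_number: i64,
    pub current_snapshot_id: Option<i64>,
}

impl TableMetadata {
    /// The next snapshot gets the last sequence number plus one.
    pub fn next_sequence_number(&self) -> i64 {
        self.last_sequence_number + 1
    }
}

/// Hypothetical snapshot record produced by an append.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub snapshot_id: i64,
    pub parent_snapshot_id: Option<i64>,
    pub sequence_number: i64,
    pub manifest_list: String,
    pub summary: HashMap<String, String>,
}

/// Hypothetical update handed to the catalog's `update_table`.
#[derive(Debug, Clone, PartialEq)]
pub enum TableUpdate {
    AddSnapshot { snapshot: Snapshot },
}

/// Hypothetical producer that collects added files for one new snapshot.
pub struct MergingSnapshotProducer {
    snapshot_id: i64,
    added_files: Vec<DataFile>,
    snapshot_properties: HashMap<String, String>,
}

impl MergingSnapshotProducer {
    pub fn new(
        snapshot_id: i64,
        added_files: Vec<DataFile>,
        snapshot_properties: HashMap<String, String>,
    ) -> Self {
        Self { snapshot_id, added_files, snapshot_properties }
    }

    /// Update the snapshot summary: user properties plus append counters.
    fn summary(&self) -> HashMap<String, String> {
        let mut summary = self.snapshot_properties.clone();
        let records: u64 = self.added_files.iter().map(|f| f.record_count).sum();
        summary.insert("operation".to_string(), "append".to_string());
        summary.insert("added-data-files".to_string(), self.added_files.len().to_string());
        summary.insert("added-records".to_string(), records.to_string());
        summary
    }

    /// Generate the manifest list path (simplified naming, an assumption).
    fn manifest_list_path(&self, metadata: &TableMetadata) -> String {
        format!("{}/metadata/snap-{}-manifest-list.avro", metadata.location, self.snapshot_id)
    }

    /// Build the new snapshot and return it as a table update.
    /// Writing the manifests and the manifest list is omitted here.
    pub fn commit(self, metadata: &TableMetadata) -> TableUpdate {
        let snapshot = Snapshot {
            snapshot_id: self.snapshot_id,
            parent_snapshot_id: metadata.current_snapshot_id,
            sequence_number: metadata.next_sequence_number(),
            manifest_list: self.manifest_list_path(metadata),
            summary: self.summary(),
        };
        TableUpdate::AddSnapshot { snapshot }
    }
}
```

In this sketch, `Transaction::append` would convert the RecordBatch into `DataFile`s, construct the producer, call `commit`, and pass the resulting `TableUpdate::AddSnapshot` on to the catalog's `update_table`. Keeping the producer free of catalog calls is one possible way to split the work into separate issues.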