rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429246160
##########
pyiceberg/table/__init__.py:
##########
@@ -830,6 +884,49 @@ def history(self) -> List[SnapshotLogEntry]:
     def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive: bool = True) -> UpdateSchema:
         return UpdateSchema(self, allow_incompatible_changes=allow_incompatible_changes, case_sensitive=case_sensitive)
 
+    def write_arrow(self, df: pa.Table, mode: Literal['append', 'overwrite'] = 'overwrite') -> None:
+        if len(self.spec().fields) > 0:
+            raise ValueError("Currently only unpartitioned tables are supported")
+
+        snapshot_id = self.new_snapshot_id()
+        parent_snapshot_id = current_snapshot.snapshot_id if (current_snapshot := self.current_snapshot()) else None
+
+        data_files = _dataframe_to_data_files(self, snapshot_id=snapshot_id, df=df)
+        merge = _MergeAppend(table=self, snapshot_id=snapshot_id)

Review Comment:
   I think that much of the logic here should be in `MergeAppend` instead of here so that the logic can be reused. The responsibility of this method is to create data files and configure an append or overwrite. The logic that writes files, updates summaries, etc. should be in a generic update class that can be shared.
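
The split suggested in the comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `DataFile`, `Snapshot`, `SnapshotUpdate`, `Overwrite`, and `write` are hypothetical in-memory stand-ins, not the pyiceberg classes, and the `MergeAppend` here is not the `_MergeAppend` from the PR. The shared base class owns the summary bookkeeping and snapshot creation, while the caller only supplies data files and picks append or overwrite.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataFile:
    """Hypothetical stand-in for an already written data file."""
    path: str
    record_count: int


@dataclass
class Snapshot:
    """Hypothetical stand-in for a committed table snapshot."""
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    data_files: List[DataFile]
    summary: Dict[str, str]


class SnapshotUpdate(ABC):
    """Generic update: holds the logic shared by every operation."""

    operation: str  # set by each subclass

    def __init__(self, snapshot_id: int, parent: Optional[Snapshot]) -> None:
        self.snapshot_id = snapshot_id
        self.parent = parent
        self.added: List[DataFile] = []

    def append_data_file(self, data_file: DataFile) -> SnapshotUpdate:
        self.added.append(data_file)
        return self

    @abstractmethod
    def _carried_over_files(self) -> List[DataFile]:
        """Files from the parent snapshot that stay visible."""

    def _summary(self) -> Dict[str, str]:
        # Summary bookkeeping lives here once, not in each caller.
        return {
            "operation": self.operation,
            "added-data-files": str(len(self.added)),
            "added-records": str(sum(f.record_count for f in self.added)),
        }

    def commit(self) -> Snapshot:
        return Snapshot(
            snapshot_id=self.snapshot_id,
            parent_snapshot_id=self.parent.snapshot_id if self.parent else None,
            data_files=self._carried_over_files() + self.added,
            summary=self._summary(),
        )


class MergeAppend(SnapshotUpdate):
    operation = "append"

    def _carried_over_files(self) -> List[DataFile]:
        return list(self.parent.data_files) if self.parent else []


class Overwrite(SnapshotUpdate):
    operation = "overwrite"

    def _carried_over_files(self) -> List[DataFile]:
        return []  # previous files are dropped


def write(parent: Optional[Snapshot], new_files: List[DataFile],
          snapshot_id: int, mode: str = "overwrite") -> Snapshot:
    """Thin caller: chooses the operation and hands over the data files."""
    update_cls = MergeAppend if mode == "append" else Overwrite
    update = update_cls(snapshot_id, parent)
    for data_file in new_files:
        update.append_data_file(data_file)
    return update.commit()
```

With this shape, a method like `write_arrow` would shrink to producing the data files and calling the equivalent of `write(...)`, and any other write path could reuse the same update classes.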