Re: [PR] Write support [iceberg-python]

via GitHub Sun, 17 Dec 2023 11:14:25 -0800


rdblue commented on code in PR #41:
URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429246160



##########
pyiceberg/table/__init__.py:
##########
@@ -830,6 +884,49 @@ def history(self) -> List[SnapshotLogEntry]:
     def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive: bool = True) -> UpdateSchema:
         return UpdateSchema(self, allow_incompatible_changes=allow_incompatible_changes, case_sensitive=case_sensitive)
 
+    def write_arrow(self, df: pa.Table, mode: Literal['append', 'overwrite'] = 'overwrite') -> None:
+        if len(self.spec().fields) > 0:
+            raise ValueError("Currently only unpartitioned tables are supported")
+
+        snapshot_id = self.new_snapshot_id()
+        parent_snapshot_id = current_snapshot.snapshot_id if (current_snapshot := self.current_snapshot()) else None
+
+        data_files = _dataframe_to_data_files(self, snapshot_id=snapshot_id, df=df)
+        merge = _MergeAppend(table=self, snapshot_id=snapshot_id)

Review Comment:
   I think much of this logic belongs in `MergeAppend` rather than here, so
that it can be reused. The responsibility of this method is to create data
files and configure an append or overwrite. The logic that writes files,
updates summaries, etc. should live in a generic update class that can be
shared.
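A minimal sketch of the split suggested above, in plain Python. Every name here (`SnapshotProducer`, `MergeAppend`, `Overwrite`, `DataFile`, `Snapshot`, `write_files`) is hypothetical and is not the pyiceberg API. Manifest writing is reduced to carrying a list of files forward, the overwrite is simplified to "drop everything from the parent", and the summary keys only mirror Iceberg's `added-data-files`/`added-records` naming.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional, Type


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a written data file."""

    path: str
    record_count: int


@dataclass
class Snapshot:
    """Hypothetical, simplified snapshot: a flat file list plus a summary."""

    snapshot_id: int
    parent_snapshot_id: Optional[int]
    data_files: List[DataFile]
    summary: Dict[str, str]


class SnapshotProducer(ABC):
    """Hypothetical shared update class.

    Owns the reusable part: collecting added files, building the
    snapshot summary and producing the new snapshot.
    """

    operation: str

    def __init__(self, snapshot_id: int, parent: Optional[Snapshot]) -> None:
        self.snapshot_id = snapshot_id
        self.parent = parent
        self._added: List[DataFile] = []

    def append_data_file(self, data_file: DataFile) -> SnapshotProducer:
        self._added.append(data_file)
        return self

    @abstractmethod
    def _existing_files(self) -> List[DataFile]:
        """Files carried over from the parent snapshot."""

    def _summary(self) -> Dict[str, str]:
        # Shared summary logic: every operation reports what it added.
        return {
            "operation": self.operation,
            "added-data-files": str(len(self._added)),
            "added-records": str(sum(f.record_count for f in self._added)),
        }

    def commit(self) -> Snapshot:
        return Snapshot(
            snapshot_id=self.snapshot_id,
            parent_snapshot_id=self.parent.snapshot_id if self.parent else None,
            data_files=self._existing_files() + self._added,
            summary=self._summary(),
        )


class MergeAppend(SnapshotProducer):
    operation = "append"

    def _existing_files(self) -> List[DataFile]:
        # An append keeps every file from the parent snapshot.
        return list(self.parent.data_files) if self.parent else []


class Overwrite(SnapshotProducer):
    operation = "overwrite"

    def _existing_files(self) -> List[DataFile]:
        # This simplified overwrite drops all of the parent's files.
        return []


_PRODUCERS: Dict[str, Type[SnapshotProducer]] = {"append": MergeAppend, "overwrite": Overwrite}


def write_files(
    files: List[DataFile],
    mode: Literal["append", "overwrite"],
    snapshot_id: int,
    parent: Optional[Snapshot],
) -> Snapshot:
    """Hypothetical counterpart of write_arrow: only picks the update and adds files."""
    producer = _PRODUCERS[mode](snapshot_id, parent)
    for data_file in files:
        producer.append_data_file(data_file)
    return producer.commit()
```

With this split, the write method shrinks to producing data files and choosing the update, while append and overwrite share one summary and commit path.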



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
