Fokko commented on code in PR #506:
URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524358528
##########
pyiceberg/table/__init__.py:
##########
@@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T
                     for data_file in data_files:
                         update_snapshot.append_data_file(data_file)
+    def add_files(self, file_paths: List[str]) -> None:
+        """
+        Shorthand API for adding files as data files to the table.
+
+        Args:
+            file_paths: The list of full file paths to be added as data files to the table
+        """
+        if any(not isinstance(field.transform, IdentityTransform) for field in self.metadata.spec().fields):
+            raise NotImplementedError("Cannot add_files to a table with Transform Partitions")
Review Comment:
We can be more permissive. It isn't a problem if the table's current
partitioning uses something other than an `IdentityTransform`; the issue is
that we cannot add DataFiles that use this partitioning (until we find a clever
way of checking this).
##########
pyiceberg/table/__init__.py:
##########
@@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T
                     for data_file in data_files:
                         update_snapshot.append_data_file(data_file)
+    def add_files(self, file_paths: List[str]) -> None:
+        """
+        Shorthand API for adding files as data files to the table.
+
+        Args:
+            file_paths: The list of full file paths to be added as data files to the table
+        """
Review Comment:
It would be great to add a `Raises:` section here indicating which errors to
expect, for example when a file cannot be found. In such a case, we want to
raise a PyIceberg exception instead of an Arrow-specific exception.
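A minimal sketch of what this could look like. The exception name
`DataFileNotFoundError` and the helper `read_magic_bytes` are hypothetical
stand-ins, not existing PyIceberg APIs, and the built-in `FileNotFoundError`
stands in for the Arrow-specific error so the example stays self-contained:

```python
class DataFileNotFoundError(Exception):
    """Hypothetical library-level error; not an existing PyIceberg class."""


def read_magic_bytes(file_path: str) -> bytes:
    """
    Read the first four bytes of a data file.

    Args:
        file_path: The full path of the file to inspect.

    Raises:
        DataFileNotFoundError: If the file does not exist.
    """
    try:
        with open(file_path, "rb") as handle:
            return handle.read(4)
    except FileNotFoundError as exc:
        # Translate the low-level error into a library-level one, and keep
        # the original exception attached as the cause for debugging.
        raise DataFileNotFoundError(f"Cannot add file that does not exist: {file_path}") from exc
```

Chaining with `from exc` keeps the underlying error visible in the traceback
while callers only need to catch the library's own exception type.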
##########
pyiceberg/table/__init__.py:
##########
@@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T
                     for data_file in data_files:
                         update_snapshot.append_data_file(data_file)
+    def add_files(self, file_paths: List[str]) -> None:
+        """
+        Shorthand API for adding files as data files to the table.
+
+        Args:
+            file_paths: The list of full file paths to be added as data files to the table
+        """
+        if any(not isinstance(field.transform, IdentityTransform) for field in self.metadata.spec().fields):
+            raise NotImplementedError("Cannot add_files to a table with Transform Partitions")
+
+        if self.name_mapping() is None:
Review Comment:
Technically you don't have to add a name-mapping if the field IDs are set.
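A rough sketch of such a check, under stated assumptions: PyArrow exposes a
Parquet column's field ID in the field metadata under the key
`PARQUET:field_id`, and plain dictionaries stand in here for that per-field
metadata so the example runs without PyArrow. The function name
`needs_name_mapping` is hypothetical, and nested fields are ignored:

```python
from typing import Dict, List, Optional

# Metadata key under which PyArrow surfaces a Parquet field ID.
PARQUET_FIELD_ID_KEY = b"PARQUET:field_id"


def needs_name_mapping(fields_metadata: List[Optional[Dict[bytes, bytes]]]) -> bool:
    """Return True if at least one top-level field carries no field ID."""
    return any(
        metadata is None or PARQUET_FIELD_ID_KEY not in metadata
        for metadata in fields_metadata
    )


# One entry per top-level column; None means the column has no metadata at all.
with_ids = [{PARQUET_FIELD_ID_KEY: b"1"}, {PARQUET_FIELD_ID_KEY: b"2"}]
without_ids = [{PARQUET_FIELD_ID_KEY: b"1"}, None]
```

With a check like this, the name-mapping would only be set when
`needs_name_mapping(...)` returns `True` for one of the files.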
##########
pyiceberg/table/__init__.py:
##########
@@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T
                     for data_file in data_files:
                         update_snapshot.append_data_file(data_file)
+    def add_files(self, file_paths: List[str]) -> None:
+        """
+        Shorthand API for adding files as data files to the table.
+
+        Args:
+            file_paths: The list of full file paths to be added as data files to the table
+        """
+        if any(not isinstance(field.transform, IdentityTransform) for field in self.metadata.spec().fields):
+            raise NotImplementedError("Cannot add_files to a table with Transform Partitions")
+
+        if self.name_mapping() is None:
+            with self.transaction() as tx:
+                tx.set_properties(**{TableProperties.DEFAULT_NAME_MAPPING: self.schema().name_mapping.model_dump_json()})
+
+        with self.transaction() as txn:
+            with txn.update_snapshot().fast_append() as update_snapshot:
Review Comment:
Now with https://github.com/apache/iceberg-python/pull/471 merged, this
should work in a single transaction. The updated metadata will be passed into
the UpdateSnapshot class and should pick up the name-mapping.
```suggestion
with tx.update_snapshot().fast_append() as update_snapshot:
```
I think it is important to have this operation in a single transaction,
otherwise, the name mapping might be set, and then if a file is missing, it
will fail and the name-mapping will still be there.
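To illustrate the failure mode, here is a toy model (not PyIceberg code). The
classes `ToyTable` and `ToyTransaction` are hypothetical, and the sketch
assumes a transaction that applies its staged changes only when the `with`
block exits without an exception:

```python
from typing import Dict, List


class ToyTable:
    """Hypothetical stand-in for a table: just properties and data files."""

    def __init__(self) -> None:
        self.properties: Dict[str, str] = {}
        self.data_files: List[str] = []

    def transaction(self) -> "ToyTransaction":
        return ToyTransaction(self)


class ToyTransaction:
    """Stages changes and applies them only if the with-block exits cleanly."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table
        self._properties: Dict[str, str] = {}
        self._data_files: List[str] = []

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        if exc_type is None:
            self._table.properties.update(self._properties)
            self._table.data_files.extend(self._data_files)
        return False  # never swallow the exception

    def set_properties(self, **properties: str) -> None:
        self._properties.update(properties)

    def append_data_file(self, path: str, exists: bool) -> None:
        if not exists:
            raise FileNotFoundError(path)
        self._data_files.append(path)


def add_files_two_transactions(table: ToyTable) -> None:
    # The first transaction commits the name-mapping on its own ...
    with table.transaction() as tx:
        tx.set_properties(name_mapping="[]")
    # ... so a failure here leaves the property behind.
    with table.transaction() as txn:
        txn.append_data_file("missing.parquet", exists=False)


def add_files_single_transaction(table: ToyTable) -> None:
    # Both changes are staged together; a failure discards both.
    with table.transaction() as tx:
        tx.set_properties(name_mapping="[]")
        tx.append_data_file("missing.parquet", exists=False)
```

In this toy model, the two-transaction variant leaves `name_mapping` set on
the table after the missing file raises, while the single-transaction variant
leaves the table untouched.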