kevinjqliu commented on code in PR #1607:
URL: https://github.com/apache/iceberg-python/pull/1607#discussion_r1947923652
##########
mkdocs/docs/configuration.md:
##########
@@ -63,6 +63,7 @@ Iceberg tables support table properties to configure table
behavior.
| `write.parquet.page-row-limit` | Number of rows
| 20000 | Set a target threshold for the maximum number
of rows within a column chunk
|
| `write.parquet.dict-size-bytes` | Size in bytes
| 2MB | Set the dictionary page size limit per row
group
|
| `write.metadata.previous-versions-max` | Integer
| 100 | The max number of previous version metadata
files to keep before deleting after commit.
|
+| `write.metadata.delete-after-commit.enabled` | Boolean
| False | Whether to automatically delete old *tracked*
metadata files after each table commit. It will retain a number of the most
recent metadata files, which can be set using property
`write.metadata.previous-versions-max`. |
Review Comment:
👍
https://iceberg.apache.org/docs/1.6.0/maintenance/#remove-old-metadata-files
##########
pyiceberg/catalog/__init__.py:
##########
@@ -858,6 +860,12 @@ def _update_and_stage_table(
enforce_validation=current_table is None,
metadata_location=current_table.metadata_location if current_table
else None,
)
+ io = self._load_file_io(properties=updated_metadata.properties,
location=updated_metadata.location)
+
+ #
https://github.com/apache/iceberg/blob/f6faa58/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L527
+ # delete old metadata if METADATA_DELETE_AFTER_COMMIT_ENABLED is set
to true
+ if current_table is not None:
+ self._delete_old_metadata(io, current_table.metadata,
updated_metadata)
Review Comment:
is this the right place for this operation?
looking at the java implementation, changes are committed to the table
before old metadata files are deleted
https://github.com/apache/iceberg/blob/d935460bbed5da0ac205a67912b70fa682ee84df/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L125-L126
At this point, changes are only applied to the table metadata but is not yet
committed
##########
pyiceberg/catalog/__init__.py:
##########
@@ -858,6 +860,12 @@ def _update_and_stage_table(
enforce_validation=current_table is None,
metadata_location=current_table.metadata_location if current_table
else None,
)
+ io = self._load_file_io(properties=updated_metadata.properties,
location=updated_metadata.location)
+
+ #
https://github.com/apache/iceberg/blob/f6faa58/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L527
+ # delete old metadata if METADATA_DELETE_AFTER_COMMIT_ENABLED is set
to true
+ if current_table is not None:
+ self._delete_old_metadata(io, current_table.metadata,
updated_metadata)
Review Comment:
the java implementation uses `deleteAfterCommit`
https://github.com/apache/iceberg/blob/d935460bbed5da0ac205a67912b70fa682ee84df/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L533
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]