nicor88 commented on code in PR #140:
URL: https://github.com/apache/iceberg-python/pull/140#discussion_r1423647501
##########
pyiceberg/catalog/glue.py:
##########
@@ -177,6 +191,23 @@ def _create_glue_table(self, database_name: str, table_name: str, table_input: T
         except self.glue.exceptions.EntityNotFoundException as e:
             raise NoSuchNamespaceError(f"Database {database_name} does not exist") from e
 
+    def _update_glue_table(self, database_name: str, table_name: str, table_input: TableInputTypeDef, version_id: str) -> None:
+        try:
+            self.glue.update_table(DatabaseName=database_name, TableInput=table_input, VersionId=version_id)
Review Comment:
Every time a Glue table is updated, a new table version is created, and the
previous versions are retained by default. The number of table versions per AWS
account is limited, and I've hit that limit many times, specifically when using
Iceberg. See also this issue:
https://github.com/dbt-athena/dbt-athena/issues/524 and this PR:
https://github.com/dbt-athena/dbt-athena/pull/522
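To check how close a table is to that limit, here is a minimal sketch (not part of this PR) that counts the versions Glue currently retains for one table. `count_table_versions` is a hypothetical helper; `glue` is assumed to be a boto3 Glue client, and the sketch relies on boto3's `get_table_versions` paginator, since `GetTableVersions` returns results in pages.

```python
from typing import Any


def count_table_versions(glue: Any, database_name: str, table_name: str) -> int:
    """Count the table versions Glue retains for one table (hypothetical helper).

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    paginator = glue.get_paginator("get_table_versions")
    total = 0
    # Walk every page; each page lists a batch of versions under "TableVersions".
    for page in paginator.paginate(DatabaseName=database_name, TableName=table_name):
        total += len(page.get("TableVersions", []))
    return total
```

Running this periodically against a table written by frequent Iceberg commits makes the growth described above visible.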
Have you considered setting `SkipArchive` to `True` by default? (Refer to the
boto3
[docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue/client/update_table.html#Glue.Client.update_table).)
Alternatively, you could give the end user control over this parameter.
Previous table versions are only useful for debugging, e.g. finding the old
metadata location; they don't help with operations like snapshot rollback,
which requires Spark anyway.
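A minimal sketch of the second option, assuming the flag is simply threaded through to boto3's `update_table`: `update_glue_table` and `skip_archive` are hypothetical names, not pyiceberg API, and the `EntityNotFoundException` handling from the diff is left out for brevity.

```python
from typing import Any, Dict


def update_glue_table(
    glue: Any,
    database_name: str,
    table_input: Dict[str, Any],
    version_id: str,
    skip_archive: bool = True,
) -> None:
    """Hypothetical variant of `_update_glue_table` that exposes SkipArchive.

    `glue` is assumed to be a boto3 Glue client. With skip_archive=True,
    Glue updates the table without archiving the previous definition
    as a new table version.
    """
    glue.update_table(
        DatabaseName=database_name,
        TableInput=table_input,
        VersionId=version_id,  # optimistic-locking version, as in the diff
        SkipArchive=skip_archive,
    )
```

Defaulting to `True` avoids accumulating versions on every commit, while callers who want the history for debugging can still pass `skip_archive=False`.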
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.