HonahX commented on code in PR #140: URL: https://github.com/apache/iceberg-python/pull/140#discussion_r1438987608
########## pyiceberg/catalog/__init__.py: ########## @@ -587,8 +590,34 @@ def _write_metadata(metadata: TableMetadata, io: FileIO, metadata_path: str) -> ToOutputFile.table_metadata(metadata, io.new_output(metadata_path)) @staticmethod - def _get_metadata_location(location: str) -> str: - return f"{location}/metadata/00000-{uuid.uuid4()}.metadata.json" + def _get_metadata_location(location: str, new_version: int = 0) -> str: + if new_version < 0: + raise ValueError(f"Table metadata version: {new_version} cannot be negative") Review Comment: Thanks for the suggestion! Since this is an internal method, we can rely on the type check to ensure new_version is never None. I think one benefit of doing type-checking is that it reduces the number of `None` checks in our code. Also, I've incorporated your suggested change to the error message. Please let me know if you think the type-check is not enough and we'd better do the None-check here. Thanks! ########## pyiceberg/catalog/glue.py: ########## @@ -177,6 +198,28 @@ def _create_glue_table(self, database_name: str, table_name: str, table_input: T except self.glue.exceptions.EntityNotFoundException as e: raise NoSuchNamespaceError(f"Database {database_name} does not exist") from e + def _update_glue_table(self, database_name: str, table_name: str, table_input: TableInputTypeDef, version_id: str) -> None: + try: + self.glue.update_table( + DatabaseName=database_name, + TableInput=table_input, + SkipArchive=self.properties.get(GLUE_SKIP_ARCHIVE, GLUE_SKIP_ARCHIVE_DEFAULT), + VersionId=version_id, + ) + except self.glue.exceptions.EntityNotFoundException as e: + raise NoSuchTableError(f"Table does not exist: {database_name}.{table_name}") from e Review Comment: Thanks for the suggestion! Updated. ########## pyiceberg/catalog/glue.py: ########## @@ -247,8 +290,52 @@ def _commit_table(self, table_request: CommitTableRequest) -> CommitTableRespons Raises: NoSuchTableError: If a table with the given identifier does not exist. + CommitFailedException: If the commit failed. """ - raise NotImplementedError + identifier_tuple = self.identifier_to_tuple_without_catalog( + tuple(table_request.identifier.namespace.root + [table_request.identifier.name]) + ) + database_name, table_name = self.identifier_to_database_and_table(identifier_tuple) + + current_glue_table = self._get_glue_table(database_name=database_name, table_name=table_name) + glue_table_version_id = current_glue_table.get("VersionId") + if glue_table_version_id is None: Review Comment: Thanks for the suggestion! I agree it is more robust to use `not` here. `glue_table_version_id` is a string so it could be an empty string when version id is unavailable ########## pyiceberg/catalog/__init__.py: ########## @@ -587,8 +590,34 @@ def _write_metadata(metadata: TableMetadata, io: FileIO, metadata_path: str) -> ToOutputFile.table_metadata(metadata, io.new_output(metadata_path)) @staticmethod - def _get_metadata_location(location: str) -> str: - return f"{location}/metadata/00000-{uuid.uuid4()}.metadata.json" + def _get_metadata_location(location: str, new_version: int = 0) -> str: + if new_version < 0: + raise ValueError(f"Table metadata version: {new_version} cannot be negative") + version_str = f"{new_version:05d}" + return f"{location}/metadata/{version_str}-{uuid.uuid4()}.metadata.json" + + @staticmethod + def _parse_metadata_version(metadata_location: str) -> int: + """Parse the version from the metadata location. + + The version is the first part of the file name, before the first dash. + For example, the version of the metadata file + `s3://bucket/db/tb/metadata/00001-6c97e413-d51b-4538-ac70-12fe2a85cb83.metadata.json` + is 1. + If the path does not comply with the pattern, the version is defaulted to be -1, ensuring + that the next metadata file is treated as having version 0. + + Args: + metadata_location (str): The location of the metadata file. + + Returns: + int: The version of the metadata file. -1 if the file name does not have valid version string + """ + file_name = metadata_location.split("/")[-1] Review Comment: Similar reason as above, since this is an internal method, I think we can rely on the type-check to ensure `metadata_location` is not `None`. But please let me know if you think type-check is not enough. ########## pyiceberg/catalog/__init__.py: ########## @@ -74,6 +75,8 @@ LOCATION = "location" EXTERNAL_TABLE = "EXTERNAL_TABLE" +TABLE_METADATA_FILE_NAME_REGEX = re.compile(r"""(\d+)-.*\.json""", re.X) Review Comment: Thanks for the suggestion. I've updated the regex to capture the UUID. Although in java we do not check the UUID validity: https://github.com/apache/iceberg/blob/81bf8d30766b1b129b87abde15239645cb127046/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L382-L404 I think it make sense to add the check here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org