saitharun15 commented on code in PR #13892:
URL: https://github.com/apache/iceberg/pull/13892#discussion_r2293024797

##########
docs/docs/spark-procedures.md:
##########
@@ -731,10 +731,13 @@ Creates a catalog entry for a metadata.json file which already exists but does n
 #### Usage

-| Argument Name | Required? | Type | Description |
-|---------------|-----------|------|-------------|
-| `table` | ✔️ | string | Table which is to be registered |
-| `metadata_file`| ✔️ | string | Metadata file which is to be registered as a new catalog identifier |
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| `table` | ✔️ | string | Table which is to be registered |
+| `metadata_file`| ✔️ | string | Metadata file which is to be registered as a new catalog identifier. If metadata folder is provided, the latest metadata file will be used. |
+
+When a metadata folder is provided as input, the procedure finds the latest metadata file by selecting the one with the highest version number.

Review Comment:
Hi @singhpk234 @ebyhr, thanks for the feedback. The motivation is to improve the user experience by letting the procedure accept a directory path, which is often easier for users to identify than a specific metadata file.

> picking the highest version number is not correct, what if this belongs to an attempt which never got committed?

This behavior is consistent with the existing register_table implementations in [Trino](https://github.com/trinodb/trino/blob/8d84e536758612b74f5dd437f1d5307e43e4931c/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java#L1159) and [Presto](https://github.com/kiersten-stokes/presto/blob/331e3d8df4a18745a8642dcc8ddbf88d0405a2ca/presto-iceberg/src/main/java/com/facebook/presto/iceberg/RegisterTableProcedure.java#L136), which also select the highest version.

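
As a rough illustration of the "highest version number" selection discussed above, here is a minimal sketch. It is not the code from this PR, Trino, or Presto. The class `LatestMetadataFile`, its methods, and the two file-name patterns (`v<N>.metadata.json` and `<N>-<uuid>.metadata.json`, each optionally carrying a codec suffix) are assumptions made for the example. It works purely on file names, so it cannot tell whether the chosen file was ever committed, which is exactly the concern quoted above.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not code from Iceberg, Trino, or Presto.
final class LatestMetadataFile {

  // Assumed Hadoop-style name: v<N>.metadata.json, optionally v<N>.<codec>.metadata.json.
  private static final Pattern HADOOP_STYLE =
      Pattern.compile("^v(\\d{1,18})(?:\\.\\w+)?\\.metadata\\.json$");

  // Assumed Hive-style name: <N>-<uuid>.metadata.json, optionally with a codec
  // segment before ".metadata.json".
  private static final Pattern HIVE_STYLE =
      Pattern.compile("^(\\d{1,18})-.+\\.metadata\\.json$");

  private LatestMetadataFile() {}

  // Returns the version encoded in the file name, or -1 if the name matches neither pattern.
  static long parseVersion(String fileName) {
    for (Pattern pattern : List.of(HADOOP_STYLE, HIVE_STYLE)) {
      Matcher matcher = pattern.matcher(fileName);
      if (matcher.matches()) {
        return Long.parseLong(matcher.group(1));
      }
    }
    return -1L;
  }

  // Picks the file name with the highest parsed version; empty if no name looks like
  // a metadata file. This does not check whether the chosen file was ever committed.
  static Optional<String> selectLatest(List<String> fileNames) {
    String best = null;
    long bestVersion = -1L;
    for (String name : fileNames) {
      long version = parseVersion(name);
      if (version > bestVersion) {
        best = name;
        bestVersion = version;
      }
    }
    return Optional.ofNullable(best);
  }
}
```

Versions are compared numerically rather than as strings, so `v10.metadata.json` ranks above `v9.metadata.json`. Names that match neither pattern, such as manifest or snapshot files, are skipped.
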
To address the concern about uncommitted files, one possible approach is to rely on version-hint.text for Hadoop-type tables, which would let us fetch the latest metadata file reliably. For Hive-type tables, however, we might still need to fall back to checking version numbers when registering. Looking forward to hearing your thoughts and feedback on this approach. Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org