saitharun15 commented on code in PR #13892:
URL: https://github.com/apache/iceberg/pull/13892#discussion_r2293024797


##########
docs/docs/spark-procedures.md:
##########
@@ -731,10 +731,13 @@ Creates a catalog entry for a metadata.json file which already exists but does n
 
 #### Usage
 
-| Argument Name | Required? | Type | Description |
-|---------------|-----------|------|-------------|
-| `table`       | ✔️  | string | Table which is to be registered |
-| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier. If metadata folder is provided, the latest metadata file will be used. |
+
+When a metadata folder is provided as input, the procedure finds the latest metadata file by selecting the one with the highest version number.

Review Comment:
   Hi @singhpk234 @ebyhr, thanks for the feedback. The motivation is to improve the user experience by allowing the procedure to accept a directory path, which is often easier for users to identify than a specific metadata file.
   
   >  picking the highest version number is not correct, what if this belongs to an attempt which never got committed?
   
   This behavior is consistent with the existing `register_table` implementations in [Trino](https://github.com/trinodb/trino/blob/8d84e536758612b74f5dd437f1d5307e43e4931c/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java#L1159) and [Presto](https://github.com/kiersten-stokes/presto/blob/331e3d8df4a18745a8642dcc8ddbf88d0405a2ca/presto-iceberg/src/main/java/com/facebook/presto/iceberg/RegisterTableProcedure.java#L136), which also select the highest version.
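For illustration only, the "highest version wins" rule can be sketched as follows. This is hypothetical Python, not the code in this PR or in Trino/Presto. The helper names are made up, and the file-name layouts it parses (`v<N>.metadata.json` for Hadoop tables, `<NNNNN>-<uuid>.metadata.json` for metastore-backed tables, optionally with a `.gz` infix) are an assumption about the usual Iceberg naming. It only orders file names; it does nothing about the uncommitted-attempt concern quoted above.

```python
import re
from typing import Iterable, Optional

# Assumed Iceberg metadata file-name layouts (an assumption, not taken from the PR):
#   Hadoop tables:           v<N>.metadata.json            e.g. v3.metadata.json
#   metastore-backed tables: <NNNNN>-<uuid>.metadata.json  e.g. 00003-<uuid>.metadata.json
# An optional ".gz" codec infix before ".metadata.json" is accepted for both.
VERSION_PATTERN = re.compile(r"^(?:v(\d+)|(\d+)-[0-9a-fA-F-]+)(?:\.gz)?\.metadata\.json$")


def parse_metadata_version(file_name: str) -> Optional[int]:
    """Return the version encoded in a metadata file name, or None if the name does not match."""
    match = VERSION_PATTERN.match(file_name)
    if match is None:
        return None
    # Exactly one of the two groups matched; int() drops zero padding such as "00003".
    return int(match.group(1) or match.group(2))


def find_latest_metadata_file(file_names: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: pick the file with the highest parsed version number.

    Versions are compared numerically (v10 beats v2). Nothing here checks whether
    that version was ever committed.
    """
    versioned = []
    for name in file_names:
        version = parse_metadata_version(name)
        if version is not None:  # skip snapshots, manifests, version-hint.text, etc.
            versioned.append((version, name))
    if not versioned:
        return None
    return max(versioned)[1]


if __name__ == "__main__":
    files = ["00001-1a2b.metadata.json", "00002-3c4d.metadata.json", "snap-1-abc.avro"]
    print(find_latest_metadata_file(files))  # 00002-3c4d.metadata.json
```

Comparing parsed integers rather than raw strings matters for the Hadoop layout, where `v10.metadata.json` would sort before `v2.metadata.json` lexically.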
   To address the concern about uncommitted files, one possible approach: for Hadoop-type tables, we could rely on `version-hint.text` to reliably fetch the latest metadata file. For Hive-type tables, however, we might still need to fall back to comparing version numbers when registering.
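A rough sketch of that resolution order, for discussion only. It is hypothetical Python, not code from this PR or from Iceberg; the function names, the `.gz` handling, and the file-name layouts (`v<N>.metadata.json` plus `version-hint.text` for Hadoop tables, `<NNNNN>-<uuid>.metadata.json` for metastore-backed tables) are assumptions about the usual naming.

```python
import os
import re
from typing import Optional

# Assumed layouts (not taken from the PR): Hadoop tables write v<N>.metadata.json plus a
# version-hint.text holding N; metastore-backed (e.g. Hive) tables write
# <NNNNN>-<uuid>.metadata.json. An optional ".gz" infix is accepted in both.
HADOOP_PATTERN = re.compile(r"^v(\d+)(?:\.gz)?\.metadata\.json$")
METASTORE_PATTERN = re.compile(r"^(\d+)-[0-9a-fA-F-]+(?:\.gz)?\.metadata\.json$")
HINT_FILE = "version-hint.text"


def version_of(name: str) -> Optional[int]:
    """Version encoded in a metadata file name, or None if neither layout matches."""
    match = HADOOP_PATTERN.match(name) or METASTORE_PATTERN.match(name)
    return int(match.group(1)) if match else None


def resolve_metadata_file(metadata_dir: str) -> Optional[str]:
    """Hypothetical resolver: trust version-hint.text if present, else take the highest version."""
    names = sorted(os.listdir(metadata_dir))  # sorted for a deterministic scan order

    # 1. Hadoop-type tables: use the version recorded in version-hint.text.
    hint_path = os.path.join(metadata_dir, HINT_FILE)
    if os.path.isfile(hint_path):
        with open(hint_path, encoding="utf-8") as hint_file:
            hint = hint_file.read().strip()
        if hint.isdigit():
            for name in names:
                match = HADOOP_PATTERN.match(name)
                if match and int(match.group(1)) == int(hint):
                    return os.path.join(metadata_dir, name)

    # 2. Fallback (e.g. Hive-type tables, or a missing/unusable hint): highest version wins.
    #    This path still has the uncommitted-attempt caveat raised in review.
    parsed = [(version_of(name), name) for name in names]
    versioned = [(v, name) for v, name in parsed if v is not None]
    if not versioned:
        return None
    return os.path.join(metadata_dir, max(versioned)[1])
```

Checking the hint first means Hadoop-type tables no longer depend on file-name ordering, while Hive-type tables keep the same highest-version behavior as Trino/Presto, along with its caveat.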
   
   Looking forward to hearing your thoughts and feedback on this approach. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
