okumin commented on code in PR #5541:
URL: https://github.com/apache/hive/pull/5541#discussion_r1881689498
##########
parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:
##########
@@ -1840,6 +1841,14 @@ tableImplBuckets
-> ^(TOK_ALTERTABLE_BUCKETS $num)
;
+tableWriteOrdered
+@init { pushMsg("table sorted specification", state); }
+@after { popMsg(state); }
+ :
+ KW_WRITE KW_ORDERED KW_BY sortCols=columnNameOrderList
Review Comment:
+1 to `WRITE LOCALLY ORDERED BY`.
I double-checked `WRITE ORDERED BY` vs `WRITE LOCALLY ORDERED BY` with Spark
3.5.1 and Iceberg 1.6.1.
```
spark-sql (default)> CREATE TABLE hadoop_prod.default.test2 (a int) USING iceberg;
Time taken: 0.089 seconds
spark-sql (default)> ALTER TABLE hadoop_prod.default.test2 WRITE ORDERED BY a;
Time taken: 0.182 seconds
spark-sql (default)> CREATE TABLE hadoop_prod.default.test3 (a int) USING iceberg;
Time taken: 0.086 seconds
spark-sql (default)> ALTER TABLE hadoop_prod.default.test3 WRITE LOCALLY ORDERED BY a;
```
This is the diff between the metadata files of the two tables.
```
zookage@client-node-0:~$ hdfs dfs -cat /user/hive/warehouse/catalog/default/test2/metadata/v2.metadata.json > /tmp/test2.json
zookage@client-node-0:~$ hdfs dfs -cat /user/hive/warehouse/catalog/default/test3/metadata/v2.metadata.json > /tmp/test3.json
zookage@client-node-0:~$ diff /tmp/test2.json /tmp/test3.json
3,4c3,4
< "table-uuid" : "821c3cc2-1320-45dc-bb2a-c805778caa91",
< "location" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test2",
---
> "table-uuid" : "094951f1-6229-43cd-bffe-cb2f086b8dda",
> "location" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test3",
6c6
< "last-updated-ms" : 1733994463428,
---
> "last-updated-ms" : 1733994491871,
40c40
< "write.distribution-mode" : "range",
---
> "write.distribution-mode" : "none",
50,51c50,51
< "timestamp-ms" : 1733994460005,
< "metadata-file" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test2/metadata/v1.metadata.json"
---
> "timestamp-ms" : 1733994472725,
> "metadata-file" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test3/metadata/v1.metadata.json"
```
It looks like the only meaningful difference is
`write.distribution-mode=range` vs. `write.distribution-mode=none`. I think
adding `LOCALLY` makes more sense unless we also set `write.distribution-mode` to `range`.
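
The same comparison can be narrowed to table properties only, which filters out the expected noise (UUIDs, locations, timestamps). This is a minimal sketch, not part of the PR: `diff_properties` is a hypothetical helper, it assumes `write.distribution-mode` sits under the top-level `properties` map of the metadata JSON, and the two inline documents are trimmed-down stand-ins for `/tmp/test2.json` and `/tmp/test3.json`.

```python
import json


def diff_properties(meta_a: dict, meta_b: dict) -> dict:
    """Return {key: (value_a, value_b)} for every table property that differs."""
    # Assumption: table properties live under the top-level "properties" map.
    props_a = meta_a.get("properties", {})
    props_b = meta_b.get("properties", {})
    keys = sorted(set(props_a) | set(props_b))
    return {
        key: (props_a.get(key), props_b.get(key))
        for key in keys
        if props_a.get(key) != props_b.get(key)
    }


# Stand-ins for the real metadata files; only the property from the diff is kept.
test2 = json.loads('{"properties": {"write.distribution-mode": "range"}}')
test3 = json.loads('{"properties": {"write.distribution-mode": "none"}}')

print(diff_properties(test2, test3))
# prints {'write.distribution-mode': ('range', 'none')}
```

To run it against the real files, the stand-ins could be replaced with `json.load(open("/tmp/test2.json"))` and `json.load(open("/tmp/test3.json"))`.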
I am not an Apache Spark specialist, so please feel free to correct me if I
am wrong.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]