zratkai commented on code in PR #5541:
URL: https://github.com/apache/hive/pull/5541#discussion_r1875869433
##########
parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:
##########
@@ -1840,6 +1841,14 @@ tableImplBuckets
-> ^(TOK_ALTERTABLE_BUCKETS $num)
;
+tableWriteOrdered
+@init { pushMsg("table sorted specification", state); }
+@after { popMsg(state); }
+ :
+ KW_WRITE KW_ORDERED KW_BY sortCols=columnNameOrderList
Review Comment:
@zhangbutao , @okumin , @deniskuzZ
First of all, thanks for your comments! This syntax comes from the official
Iceberg recommendation, which is for alter table, and to be consistend I
modified the ALTER to CREATE in the syntax, so at CREATE table it is the same
as ALTER TABLE:
[https://iceberg.apache.org/docs/1.7.0/spark-ddl/#alter-table-write-ordered-by]
My plan is to work as a next step on ALTER TABLE after this PR is done. I
already created the ticket for that at the time this ticket was born:
[https://issues.apache.org/jira/browse/HIVE-28587]
About table property idea and comparisms:
The iceberg syntax defines multiple keywords in the above page, not just the
colum names:
"ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id
DESC NULLS FIRST"
The mentioned examples do not cover all of that. E.g. I could not find in
the linked Trino documentation how the ordering is defined (ASC/DESC) or the
null ordering (NULLS FIRST/LAST). With that the table property solution would
look like ugly and hard to remember and write correctly for the user.
Since it is passedas a table property deep in the code here, it works with
table property as well with this commit :
e.g.:
create table ice_orc_sorted (id int, text string) stored by iceberg stored
as orc
TBLPROPERTIES('iceberg.write-order'='{"order-id":1,"fields":[{"transform":"identity","source-id":1,"direction":"asc","null-order":"nulls-last"}]}');
The syntax comes from Iceberg built in format for sorted order, but I do not
find it user friendly and I do not recommend to go that way.
About SORTED BY/ SORT BY/LOCALSORT BY: as you already mentioned it would be
confusing, and the "ALTER/CREATE TABLE prod.db.sample WRITE ORDERED BY category
ASC NULLS LAST, id DESC NULLS FIRST" is recommended by Iceberg, and easy to
understand and remember.
So every DB engine has it's own syntax for this purpose and **neither is
following Iceberg recommendation except this one.**
As far as I found Spark does not even support yet sort order at alter table:
https://spark.apache.org/docs/3.5.3/sql-ref-syntax-ddl-alter-table.html#content
https://spark.apache.org/docs/3.5.3/sql-ref-syntax-ddl-create-table.html
At the linked iceberg discussion even Ryan Blue said this:
"No, I don't think so. These aren't table properties... "
And someone else said:
" I don't think using TBLPROPERTIES is a good idea..."
I hope I answered all the questions!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]