[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6461: Spark-3.3: Store sort-order-id in manifest_entry's data_file

GitBox Tue, 20 Dec 2022 23:17:04 -0800


ajantha-bhat commented on code in PR #6461:
URL: https://github.com/apache/iceberg/pull/6461#discussion_r1054051036



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -654,6 +654,7 @@ public DataWriter<InternalRow> createWriter(int partitionId, long taskId, long e
               .dataFileFormat(format)
               .dataSchema(writeSchema)
               .dataSparkType(dsSchema)
+              .dataSortOrder(table.sortOrder())

Review Comment:
   I have updated it to handle the `"write.distribution-mode" = none` case.
   
   I also noticed that the test case I added uses the `range` distribution mode.
   
   I couldn't determine whether the `hash` distribution mode uses the sort key, because the table property documentation doesn't mention it:
   
   > Defines distribution of write data: none: don’t shuffle rows; hash: hash distribute by partition key; range: range distribute by partition key or sort key if table has an SortOrder
   
   Meanwhile, I am reading up on this. Please let me know if you know of any other cases where data won't be sorted even though a sort order is configured.
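
The `none` handling described in the comment above could be sketched roughly as follows. This is only an illustration, not the PR's actual code: the class and method names are hypothetical, the distribution mode is passed as a plain string, and `hash` is treated conservatively as unsorted solely because the comment leaves that case open. The one fixed fact relied on is that Iceberg reserves sort order id 0 for the unsorted order.

```java
import java.util.Locale;

public class SortOrderIdSketch {
    // Iceberg reserves order id 0 for the unsorted order.
    public static final int UNSORTED_ORDER_ID = 0;

    /**
     * Hypothetical helper: picks the sort order id to record on a data file.
     * Assumption: only "range" is known (per the quoted property docs) to
     * distribute by sort key, so every other mode records the unsorted id.
     */
    public static int sortOrderIdToRecord(String distributionMode, int tableSortOrderId) {
        String mode = distributionMode.toLowerCase(Locale.ROOT);
        if (mode.equals("range")) {
            return tableSortOrderId;
        }
        // "none" (no shuffle) and, conservatively in this sketch, "hash".
        return UNSORTED_ORDER_ID;
    }

    public static void main(String[] args) {
        System.out.println(sortOrderIdToRecord("none", 1));  // prints 0
        System.out.println(sortOrderIdToRecord("range", 1)); // prints 1
    }
}
```

If `hash` turns out to preserve the table's sort order, the `range` branch would simply be widened to include it.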


