[GitHub] [iceberg] RussellSpitzer edited a comment on pull request #3983: Spark: Spark3 ZOrder Rewrite Strategy

GitBox Fri, 04 Mar 2022 13:00:32 -0800


RussellSpitzer edited a comment on pull request #3983:
URL: https://github.com/apache/iceberg/pull/3983#issuecomment-1045086515



   ```
   ZOrder Sort4 Columns with only 8 Bytes of the String Considered when making the interleave column
   Iteration   1: 298.384 s/op
   Iteration   2: 316.174 s/op
   Iteration   3: 312.037 s/op
   
   For Reference sort with 4 Columns from the previous run
   Sort4 Columns
   Iteration   1: 234.163 s/op
   Iteration   2: 226.640 s/op
   Iteration   3: 418.113 s/op
   ```
   
   
   I'm fairly sure the dominating factors here are the additional bytes that 
need to be serialized back and forth and the cost of building the interleave 
column. I don't think the sort comparison is really that significant compared 
to those costs.
   
   One more test: forcing the ZOrder column into exactly 8 bytes (essentially, 
once 8 bytes have been contributed the row is done, regardless of how many bytes 
are in the input columns).
   
   ```
   ZOrder Sort4 Columns but the ZValue is only allowed to be 8 bytes long, all other information is discarded
   Iteration   1: 284.234 s/op
   Iteration   2: 274.458 s/op
   Iteration   3: 278.901 s/op
   ```
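
For readers following along, the 8-byte cap described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `ZOrderSketch`, the method `interleaveBits`, and the round-robin, most-significant-bit-first order are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch; names and bit order are assumptions, not the PR's code.
class ZOrderSketch {

  // Interleaves bits round-robin across the columns, most significant bit
  // first, and stops as soon as outputSize bytes have been filled. Any
  // remaining input bytes are discarded.
  static byte[] interleaveBits(byte[][] columns, int outputSize) {
    byte[] out = new byte[outputSize];
    int totalBits = outputSize * 8;
    int maxLen = 0;
    for (byte[] column : columns) {
      maxLen = Math.max(maxLen, column.length);
    }

    int outBit = 0;
    for (int bit = 0; bit < maxLen * 8 && outBit < totalBits; bit++) {
      for (int c = 0; c < columns.length && outBit < totalBits; c++) {
        byte[] column = columns[c];
        int byteIdx = bit / 8;
        if (byteIdx >= column.length) {
          continue; // an exhausted column contributes nothing further
        }
        int value = (column[byteIdx] >> (7 - bit % 8)) & 1;
        if (value == 1) {
          out[outBit / 8] |= (byte) (1 << (7 - outBit % 8));
        }
        outBit++;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Four 16-byte inputs, but the Z-value is capped at 8 bytes.
    byte[][] columns = new byte[4][16];
    Arrays.fill(columns[0], (byte) 0xFF);
    System.out.println(Arrays.toString(interleaveBits(columns, 8)));
  }
}
```

With four input columns, an 8-byte output only ever sees the first 16 bits of each column, which is why the amount of data serialized per row stays fixed regardless of input width.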
   
   @rdblue, added a version where we have the ZOrder function return a Long 
instead of a byte array. This is basically the same as the above test, which 
uses a max of 8 interleaved bytes, but it then uses 
`ByteBuffer.position(0).getLong()` to place the value into a Spark column. It 
ends up being slower than the byte comparison above. I don't know why.
   
   ```
   Iteration   1: 317.637 s/op
   Iteration   2: 311.544 s/op
   Iteration   3: 303.514 s/op
   ```
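
As a rough sketch of the Long conversion used in this last run (the class and method names here are hypothetical, not taken from the PR): `ByteBuffer` reads big-endian by default, so the resulting long keeps the byte order of the 8-byte Z-value. One caveat worth keeping in mind, assuming the byte array version is compared as unsigned bytes: a plain signed comparison of the long does not give the same ordering unless the sign bit is flipped.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; names are assumptions, not the PR's code.
class ZValueAsLong {

  // Reads the first 8 bytes as a big-endian long (ByteBuffer's default order).
  // Throws IndexOutOfBoundsException if fewer than 8 bytes are available.
  static long toLong(byte[] zValue) {
    return ByteBuffer.wrap(zValue).getLong(0);
  }

  // Flipping the sign bit makes signed long ordering agree with
  // unsigned lexicographic ordering of the original 8 bytes.
  static long toOrderPreservingLong(byte[] zValue) {
    return toLong(zValue) ^ Long.MIN_VALUE;
  }

  public static void main(String[] args) {
    byte[] low = {0x7F, 0, 0, 0, 0, 0, 0, 0};
    byte[] high = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};

    // Unsigned byte comparison (Java 9+) versus the two long encodings.
    System.out.println(Arrays.compareUnsigned(low, high) < 0);
    System.out.println(Long.compare(toLong(low), toLong(high)) < 0);
    System.out.println(
        Long.compare(toOrderPreservingLong(low), toOrderPreservingLong(high)) < 0);
  }
}
```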
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


