RussellSpitzer edited a comment on pull request #3983: URL: https://github.com/apache/iceberg/pull/3983#issuecomment-1045086515
```
ZOrder Sort4Columns with only 8 Bytes of the String Considered when making the interleave column
Iteration 1: 298.384 s/op
Iteration 2: 316.174 s/op
Iteration 3: 312.037 s/op

For Reference sort with 4 Columns from the previous run
Sort4 Columns
Iteration 1: 234.163 s/op
Iteration 2: 226.640 s/op
Iteration 3: 418.113 s/op
```

I'm pretty sure the dominating factor here is just the additional bytes that need to be serialized back and forth, plus the cost of building the interleave column. I don't think the sort comparison is really that significant compared to those costs.

One more test: forcing the ZOrder column into exactly 8 bytes (essentially, once 8 bytes have been contributed the row is done, regardless of how many bytes are in the input columns).

```
ZOrder Sort4 Columns but the ZValue is only allowed to be 8 bytes long all other information is discarded
Iteration 1: 284.234 s/op
Iteration 2: 274.458 s/op
Iteration 3: 278.901 s/op
```

@rdblue Added a version where the ZOrder function returns a Long instead of a byte array. This is basically the same as the test above, which uses a max of 8 interleaved bytes, except that it then uses ByteBuffer.pos(0).getLong to place the value into a Spark column. It ends up being slower than the byte comparison above. I don't know why.

```
Iteration 1: 317.637 s/op
Iteration 2: 311.544 s/op
Iteration 3: 303.514 s/op
```
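For anyone following along, here is a rough sketch of what "cap the ZValue at 8 bytes, then read it as a Long" could look like. This is not the code from the PR: the class `ZOrderSketch` and the methods `interleaveBits` and `toLong` are hypothetical names made up for illustration, and the sketch assumes bits are taken round-robin from each column, most significant bit first, until the output buffer is full.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the implementation in this PR.
final class ZOrderSketch {

  // Interleaves bits round-robin across the columns (most significant bit first)
  // and stops as soon as outputSize bytes have been filled. Any remaining input
  // bits are discarded. Unused output bits stay zero.
  static byte[] interleaveBits(byte[][] columns, int outputSize) {
    byte[] out = new byte[outputSize];

    int totalBits = 0;
    for (byte[] column : columns) {
      totalBits += column.length * 8;
    }
    int limit = Math.min(totalBits, outputSize * 8);

    int outBit = 0; // next bit position to write in the output
    int round = 0;  // bit position being read from every column
    while (outBit < limit) {
      for (int c = 0; c < columns.length && outBit < limit; c++) {
        byte[] column = columns[c];
        if (round >= column.length * 8) {
          continue; // this column has no bits left to contribute
        }
        int bit = (column[round / 8] >> (7 - round % 8)) & 1;
        out[outBit / 8] |= (byte) (bit << (7 - outBit % 8));
        outBit++;
      }
      round++;
    }
    return out;
  }

  // Reads the first 8 bytes of the z-value as a big-endian long.
  // Assumes zValue has at least 8 bytes.
  static long toLong(byte[] zValue) {
    return ByteBuffer.wrap(zValue).getLong(0);
  }
}
```

With four 8-byte input columns and `outputSize = 8`, only the first 16 bits of each column end up in the result, which is the "all other information is discarded" case from the benchmark above.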
