Re: [PR] Add support for DELTA_BINARY_PACKED Parquet encoding [iceberg]


eric-maynard commented on PR #13391:
URL: https://github.com/apache/iceberg/pull/13391#issuecomment-3046039514


   @RussellSpitzer absolutely. Within `VectorizedDeltaEncodedValuesReader`, 
most of the divergences come from the different ways that Iceberg and Spark 
handle the decoded values. Compare the following pairs of code pointers:
   1. [Iceberg](https://github.com/apache/iceberg/pull/13391/files#diff-00cc677b0a78396297d8f924c78cc867a5e3301bbd75095d0983f8544caa145eR162) / [Spark](https://github.com/apache/spark/blob/branch-4.0/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedDeltaBinaryPackedReader.java#L227)
   2. [Iceberg](https://github.com/apache/iceberg/pull/13391/files#diff-00cc677b0a78396297d8f924c78cc867a5e3301bbd75095d0983f8544caa145eR263) / [Spark](https://github.com/apache/spark/blob/46b6ccbd93c4fe5c2b72f730a776a2739bdbc7b4/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader.java#L78)
   
   Besides this, we _lack_ support for some Spark features, such as skipping 
and Spark's `readIntegersWithRebase`, so much of the corresponding Spark code 
is removed.
   
   Beyond `VectorizedDeltaEncodedValuesReader` itself, only small changes are 
needed to actually plug `VectorizedDeltaEncodedValuesReader` into our reader 
stack, which already diverges from Spark's.
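
   To illustrate why the handling of decoded values is the part that differs, 
here is a minimal sketch of the value-reconstruction step of 
DELTA_BINARY_PACKED with a pluggable output. It is not the code from this PR 
or from Spark: `IntSink` and `DeltaReconstructSketch` are hypothetical names, 
and the block header parsing and bit unpacking are assumed to have already 
happened. The arithmetic itself would be the same in any engine; only the sink 
that receives each value would change.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sink: stands in for whatever structure an engine writes decoded values into. */
interface IntSink {
  void accept(int value);
}

final class DeltaReconstructSketch {
  private DeltaReconstructSketch() {}

  /**
   * Rebuilds values from a run of already-unpacked deltas.
   * Each packed delta is stored relative to minDelta, so
   * value[i] = value[i - 1] + minDelta + packedDeltas[i - 1].
   */
  static void reconstruct(int firstValue, int minDelta, int[] packedDeltas, IntSink sink) {
    int previous = firstValue;
    sink.accept(previous); // the first value is stored directly in the page header
    for (int packed : packedDeltas) {
      previous += minDelta + packed;
      sink.accept(previous);
    }
  }

  public static void main(String[] args) {
    // A plain list plays the role of the engine-specific output vector here.
    List<Integer> collected = new ArrayList<>();
    reconstruct(7, -2, new int[] {0, 3, 5}, collected::add);
    System.out.println(collected); // prints [7, 5, 6, 9]
  }
}
```

   Swapping the list for a different `IntSink` changes where values land 
without touching the decoding loop, which is roughly the shape of the 
difference the code pointers above show.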


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


