jhorstmann opened a new issue, #7739:
URL: https://github.com/apache/arrow-rs/issues/7739

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   While looking into parquet loading performance, I noticed that the 
definition levels for simple nullable columns often contain very short rle 
runs. These seem to mess with branch prediction when decoding and decrease 
performance.
   
   I managed to reproduce this in a benchmark by creating a custom RleEncoder 
for a bitwidth of 1, which only writes bitpacked runs. With 90% of bits being 
set, these are the results:
   
   ```
   rle_decoder/decode_bitpacked
                           time:   [121.83 µs 121.98 µs 122.13 µs]
                           thrpt:  [8.5857 Gelem/s 8.5966 Gelem/s 8.6068 
Gelem/s]
   rle_decoder/decode_hybrid
                           time:   [661.33 µs 662.31 µs 663.56 µs]
                           thrpt:  [1.5802 Gelem/s 1.5832 Gelem/s 1.5856 
Gelem/s]
   ```
   
   Note the **decoder** in this benchmark is the same for both cases, the only 
difference is how the levels are serialized.
   
   I also hacked some statistics into the decoder. For the above benchmark 
these are the distributions of rle vs bitpacked encoded runs. On average, the 
rle runs are only 16 elements long, and the bitpacked only encoding even seems 
to be shorter:
   
   ```
   [parquet/benches/rle.rs:292:9] bitpacked_data.len() = 135169
   [parquet/benches/rle.rs:292:9] &bitpacked_stats = HybridRleStats {
       num_values: 1048576,
       num_rle_values: 0,
       num_rle_runs: 0,
       num_bitpacked_values: 1048576,
       num_bitpacked_runs: 2048,
   }
   [parquet/benches/rle.rs:296:9] hybrid_data.len() = 157050
   [parquet/benches/rle.rs:296:9] &hybrid_stats = HybridRleStats {
       num_values: 1048576,
       num_rle_values: 501915,
       num_rle_runs: 29541,
       num_bitpacked_values: 546661,
       num_bitpacked_runs: 29541,
   }
   ```
   
   
   **Describe the solution you'd like**
   
   I'd like the RleEncoder to be smarter about switching between bitpacked and 
rle runs. Maybe this is a one-line change in 
[`encodings/rle.rs`](https://github.com/apache/arrow-rs/blob/df702cfc714dd25060f6a493570221df18b2c598/parquet/src/encodings/rle.rs#L258),
 but I find the logic there a bit hard to follow. Alternatives could be a 
specialized implementation for single-bit levels, or making this configurable 
for the user.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to