etseidl commented on code in PR #9477:
URL: https://github.com/apache/arrow-rs/pull/9477#discussion_r2879683099


##########
parquet/src/encodings/decoding.rs:
##########
@@ -770,15 +770,44 @@ where
 
             // At this point we have read the deltas to `buffer` we now need to offset
             // these to get back to the original values that were encoded
-            for v in &mut buffer[read..read + batch_read] {
-                // It is OK for deltas to contain "overflowed" values after encoding,
-                // e.g. i64::MAX - i64::MIN, so we use `wrapping_add` to "overflow" again and
-                // restore original value.
-                *v = v
-                    .wrapping_add(&self.min_delta)
-                    .wrapping_add(&self.last_value);
-
-                self.last_value = *v;
+            //
+            // Optimization: if the bit_width for the miniblock is 0, then we can employ
+            // a faster decoding method than setting `value[i] = value[i-1] + value[i] + min_delta`.
+            // Where min_delta is 0 (all values in the miniblock are the same), we can simply
+            // set all values to `self.last_value`. In the case of non-zero min_delta (values
+            // in the mini-block form an arithmetic progression) each value can be computed via
+            // `value[i] = (i + 1) * min_delta + last_value`. In both cases we remove the
+            // dependence on the preceding value.
+            // Kudos to @pitrou for the idea https://github.com/apache/arrow/pull/49296
+            if bit_width == 0 {
+                let min_delta = self.min_delta.as_i64()?;
+                if min_delta == 0 {

Review Comment:
   Done. It's good for a 5-10% speedup on my laptop.
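The reasoning in the diff comment can be checked with a small standalone sketch. This is not the parquet crate's code: `decode_miniblock_generic` and `decode_miniblock_zero_width` are hypothetical helpers, and the sketch assumes `i64` values whose bit-unpacked deltas already sit in a slice. It compares the per-value recurrence from the removed loop with the two bit_width == 0 shortcuts (a fill when min_delta is 0, a closed-form arithmetic progression otherwise).

```rust
/// Hypothetical helper (not the parquet crate's API): decodes one miniblock of
/// unpacked deltas in place using the per-value recurrence from the removed loop.
/// `buf` holds deltas relative to `min_delta`; on return it holds the decoded
/// values and `last_value` holds the final one.
fn decode_miniblock_generic(buf: &mut [i64], min_delta: i64, last_value: &mut i64) {
    for v in buf.iter_mut() {
        // Deltas may have "overflowed" during encoding (e.g. i64::MAX - i64::MIN),
        // so wrapping arithmetic restores the original value.
        *v = v.wrapping_add(min_delta).wrapping_add(*last_value);
        // Each value depends on the one before it: a serial dependency chain.
        *last_value = *v;
    }
}

/// Hypothetical fast path for a miniblock with bit_width == 0, where every
/// unpacked delta is 0 and so each value equals `previous + min_delta`.
/// The contents of `buf` on entry are ignored.
fn decode_miniblock_zero_width(buf: &mut [i64], min_delta: i64, last_value: &mut i64) {
    let base = *last_value;
    if min_delta == 0 {
        // All values in the miniblock repeat the last decoded value.
        buf.fill(base);
    } else {
        // Arithmetic progression: value[i] = base + (i + 1) * min_delta.
        // No value depends on its predecessor, only on `base` and `i`.
        for (i, v) in buf.iter_mut().enumerate() {
            let step = (i as i64 + 1).wrapping_mul(min_delta);
            *v = base.wrapping_add(step);
        }
    }
    // Carry the final value forward to the next miniblock.
    if let Some(&last) = buf.last() {
        *last_value = last;
    }
}
```

Because two's-complement wrapping arithmetic is exact modulo 2^64, computing `last_value + (i + 1) * min_delta` with `wrapping_mul`/`wrapping_add` produces the same bits as applying `wrapping_add(min_delta)` i + 1 times, so the shortcut remains correct even when the encoded deltas overflowed.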



-- 
This is an automated message from the Apache Git Service.