[GitHub] [arrow] tachyonwill opened a new pull request, #13456: PARQUET-2163: Handle decimal schemas with large fixed_len_byte_arrays

GitBox Tue, 28 Jun 2022 15:29:25 -0700


tachyonwill opened a new pull request, #13456:
URL: https://github.com/apache/arrow/pull/13456


   The precision calculation overflowed to infinity when the length of
   the fixed_len_byte_array exceeded 128 bytes, which triggered an error
   on the subsequent conversion of infinity to an int32. We can simplify
   the logic by noting that log_b(a^x) = x * log_b(a), which avoids the
   intermediate infinity. We also added a check for extremely large value
   sizes that imply a max precision too large to fit in an int32. Even a
   129-byte decimal seems extreme.
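
   A minimal sketch of the overflow and of the log identity that avoids
   it, written in Python for illustration only. The function names and
   the exact form of the int32 check are hypothetical and are not taken
   from the Parquet C++ change. One difference to keep in mind: where
   C++ std::pow returns infinity, Python's math.pow raises
   OverflowError.

```python
import math

INT32_MAX = 2**31 - 1


def max_precision_naive(num_bytes: int) -> int:
    """floor(log10(2^(8*B - 1))), computed through an intermediate power.

    For num_bytes > 128 the power 2^(8*B - 1) exceeds the largest double.
    C++ would yield +inf at this point; Python raises OverflowError.
    """
    return math.floor(math.log10(math.pow(2.0, 8 * num_bytes - 1)))


def max_precision_simplified(num_bytes: int) -> int:
    """Same quantity via log_b(a^x) = x * log_b(a); no intermediate power."""
    value = math.floor(math.log10(2.0) * (8 * num_bytes - 1))
    # Hypothetical guard: reject sizes whose max precision cannot fit in int32.
    if value > INT32_MAX:
        raise ValueError("max precision does not fit in int32")
    return value


if __name__ == "__main__":
    for num_bytes in (4, 16, 128, 129):
        try:
            naive = max_precision_naive(num_bytes)
        except OverflowError:
            naive = "overflow"
        print(num_bytes, naive, max_precision_simplified(num_bytes))
```

   The simplified form only ever multiplies a small constant by the bit
   count, so its intermediate values stay finite for any realistic
   length.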
   
   The formula Parquet C++ was using is technically incorrect with respect
   to the Parquet specification. The specification says that the max
   precision is floor(log_10(2^(B*8 - 1) - 1)), whereas the C++
   implementation omitted the outer -1 (the one outside the exponent).
   However, this is okay: 2^(B*8 - 1) is never a power of 10, so
   subtracting 1 cannot change the floor of the logarithm, and the two
   values are always the same (ignoring the realities of FP arithmetic).
   In practice, all three formulas agree through a length of 128 when
   evaluated in FP.
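
   The equivalence can be checked with exact integer arithmetic. The
   sketch below is illustrative Python with hypothetical function names.
   It relies on the fact that, for a positive integer n, floor(log_10(n))
   equals the number of decimal digits of n minus one.

```python
import math


def spec_precision_exact(num_bytes: int) -> int:
    """floor(log10(2^(8*B - 1) - 1)), evaluated exactly on Python integers."""
    return len(str(2 ** (8 * num_bytes - 1) - 1)) - 1


def no_outer_minus_one_exact(num_bytes: int) -> int:
    """floor(log10(2^(8*B - 1))), i.e. the formula without the outer -1."""
    return len(str(2 ** (8 * num_bytes - 1))) - 1


def simplified_float(num_bytes: int) -> int:
    """floor((8*B - 1) * log10(2)), evaluated in double precision."""
    return math.floor(math.log10(2.0) * (8 * num_bytes - 1))


if __name__ == "__main__":
    # Exact formulas agree because 2^k is never a power of 10 for k >= 1.
    exact_ok = all(
        spec_precision_exact(b) == no_outer_minus_one_exact(b)
        for b in range(1, 257)
    )
    # The floating-point shortcut matches the exact value through 128 bytes.
    float_ok = all(
        simplified_float(b) == spec_precision_exact(b) for b in range(1, 129)
    )
    print(exact_ok, float_ok)
```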
   
   Bug found through fuzzing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
