pgaref edited a comment on pull request #586:
URL: https://github.com/apache/orc/pull/586#issuecomment-752990616


   Apologies for the slight delay but this got more complicated than I expected.
   While testing the newly added [Java 
test](https://github.com/apache/orc/pull/586/files#diff-8f851a9e2ba9c8fd9f357a9df6482df5c264fdfdbb0e7625e9b2fe71ca9366a7R1226)
 I realised that the values were encoded using **DIRECT** encoding instead of 
the expected **PATCHED_BASE**.
   
   The reason for picking **DIRECT** encoding was that some of the values 
exceed the 
[**BASE_VALUE_LIMIT**](https://github.com/apache/orc/blob/f2201f396cdd0c4e5b15589ec499e477235a553c/java/core/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java#L537)
 that was introduced as part of ORC-616 (setting an upper limit to 56 bits 
numbers).
   
   More precisely, **RunLengthIntegerWriterV2** checks if Math.abs(min) -- 
where min is the min input number of the sequence -- is greater than or equal 
to (1 << 56) bits -- to make sure the value of baseBytes is never above 8.  
Seems that this check was never introduced to the Cpp side of things.
   
   That said, I would recommend to port the above logic to the Cpp library to 
keep things consistent.
   @chaoyli  please let me know if you want to make the change and again thanks 
for the patience :) 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to