pitrou opened a new issue, #47053:
URL: https://github.com/apache/arrow/issues/47053

   ### Describe the enhancement requested
   
   The `ValidateRunEndEncodedChildren` function uses ordered comparisons in two places to validate run-end encoded data:
   
   1. To check that there are at least as many values as run ends: https://github.com/apache/arrow/blob/8b2336058c1dd5eba3293ab736cfbe8e0c38dc2b/cpp/src/arrow/util/ree_util.cc#L199-L202
   2. To check the last run end against the logical offset and length: https://github.com/apache/arrow/blob/8b2336058c1dd5eba3293ab736cfbe8e0c38dc2b/cpp/src/arrow/util/ree_util.cc#L218-L224
   
   It seems that the current checks can let through some programming errors. An example is https://github.com/apache/arrow/issues/47029, where the JSON C++ reader would read the integration data as having logical length 7 even though the generated run-ends were much larger.
   
   Is there a reason for not doing equality testing for these checks?
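
To make the question concrete, here is a minimal standalone sketch of the two kinds of check. It is not the code in `ree_util.cc`: the `ReeSketch` struct, the `CheckMode` enum, the `ValidateSketch` function, and the sample run ends are all hypothetical, and the direction of the ordered comparisons (`>` on the child lengths, `<` on the last run end) is an assumption based on the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a run-end encoded array (not Arrow's ArrayData).
struct ReeSketch {
  std::vector<int32_t> run_ends;  // logical end position of each run
  std::size_t num_values;         // length of the values child
  int64_t logical_offset;
  int64_t logical_length;
};

enum class CheckMode { kOrdered, kEquality };

// Returns an error message, or std::nullopt when both checks pass.
std::optional<std::string> ValidateSketch(const ReeSketch& a, CheckMode mode) {
  const bool strict = (mode == CheckMode::kEquality);
  const std::size_t n_run_ends = a.run_ends.size();

  // Check 1: number of run ends versus number of values.
  const bool count_bad =
      strict ? (n_run_ends != a.num_values) : (n_run_ends > a.num_values);
  if (count_bad) {
    return std::string("run_ends/values length mismatch");
  }

  if (a.run_ends.empty()) {
    if (a.logical_length == 0) return std::nullopt;
    return std::string("run_ends must not be empty for a non-zero length");
  }

  // Check 2: last run end versus logical offset + length.
  const int64_t expected = a.logical_offset + a.logical_length;
  const int64_t last = a.run_ends.back();
  const bool end_bad = strict ? (last != expected) : (last < expected);
  if (end_bad) {
    return std::string("last run end is ") + std::to_string(last) +
           ", expected " + std::to_string(expected);
  }
  return std::nullopt;
}

// Made-up data mirroring the situation described above: logical length 7,
// but run ends that are much larger.
inline ReeSketch MakeSuspiciousExample() {
  return ReeSketch{{100, 200, 300}, 3, 0, 7};
}
```

With `MakeSuspiciousExample()`, the ordered variant reports no error while the equality variant flags the mismatch between the last run end and the logical length.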
   
   ### Component(s)
   
   C++, Integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

