AlenkaF commented on issue #36388:
URL: https://github.com/apache/arrow/issues/36388#issuecomment-1623362606

   Oh sorry, the last link to the C++ code was wrong. I meant to add this:
   
   
https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L851
   
   After talking to @jorisvandenbossche about this issue, I would like to propose a fix for it on the C++ side (contributions welcome).
   
   The _Negative offsets in binary array_ message comes from `CreateOffsetsBuffer`:
   
   
https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L808-L817
   
   This happens when `value_length` multiplied by the number of repetitions exceeds the `int64` limit.
   
   We could add an overflow check in `RepeatedArrayFactory` for the binary type:
   
   
https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L638-L649
   
   using `MultiplyWithOverflow`, similar to what we do here:
   
   
https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/python/pyarrow/src/arrow/python/datetime.h#L162-L163
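
To make the idea concrete, here is a minimal standalone sketch of the kind of check I have in mind. It is not the actual Arrow code: `RepeatedSizeFits` is a hypothetical helper name, and it uses the GCC/Clang builtin `__builtin_mul_overflow` as a stand-in for `MultiplyWithOverflow`. The offset type is a template parameter, so the same check applies whichever offset width the array type uses.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Arrow): computes the total data size of
// `length` repetitions of a value that is `value_length` bytes long.
// Returns true and stores the size in *out if it fits in OffsetType;
// returns false if the inputs are negative or the product would overflow.
template <typename OffsetType>
bool RepeatedSizeFits(OffsetType value_length, int64_t length, OffsetType* out) {
  if (value_length < 0 || length < 0) return false;

  // Multiply in 64 bits first. The builtin (GCC/Clang only) returns true
  // when the product does not fit in the result type.
  int64_t total = 0;
  if (__builtin_mul_overflow(static_cast<int64_t>(value_length), length, &total)) {
    return false;
  }

  // Then make sure the result also fits in the (possibly narrower) offset type.
  if (total > static_cast<int64_t>(std::numeric_limits<OffsetType>::max())) {
    return false;
  }

  *out = static_cast<OffsetType>(total);
  return true;
}
```

In the factory, a failed check would presumably be turned into an error `Status` before the offsets buffer is built, so the user sees a clear message about the size limit instead of the negative-offsets one.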

