AlenkaF commented on issue #49740:
URL: https://github.com/apache/arrow/issues/49740#issuecomment-4295607756

   Thank you for opening the issue!
   I can reproduce the segfault locally; the repro provided is very clear. I 
looked into the issue a bit, with some help from Copilot. What I found is 
that the segfault only happens for short strings, while long strings export 
without problems. For example:
   
   ```python
   In [2]: long = pa.chunked_array([
      ...:     pa.array([b'a' * 13, b'e' * 13], type=pa.binary()),
      ...:     pa.array([b'a' * 13, b'e' * 13], type=pa.binary()),
      ...: ]).combine_chunks().cast(pa.binary_view())
   
   In [3]: long._export_to_c(ctypes.addressof(c_array), ctypes.addressof(c_schema))
      ...: print("Long strings: OK")
      ...: 
   Long strings: OK
   ```
   
   ```python
   In [4]: short = pa.chunked_array([
      ...:     pa.array([b'a', b'e'], type=pa.binary()),
      ...:     pa.array([b'a', b'e'], type=pa.binary()),
      ...: ]).combine_chunks().cast(pa.binary_view())
   
   In [5]: short._export_to_c(ctypes.addressof(c_array), ctypes.addressof(c_schema))
      ...: print("Short strings: OK")  # never reached
   [1]    18460 segmentation fault  ipython
   ```
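
For context on why 13 bytes counts as "long" and 1 byte as "short": in the Arrow columnar format, each binary view is a 16-byte struct, and values of up to 12 bytes are stored inline in it, while longer values reference a separate data buffer. Below is a minimal pure-Python sketch of that layout. The helper names (`encode_view`, `is_inline`) are hypothetical and only illustrate the format; this is not how pyarrow or the cast kernel implements it.

```python
import struct

# Per the Arrow columnar format, a view is 16 bytes: a 4-byte length plus
# either 12 bytes of inline data, or a 4-byte prefix, a buffer index and an offset.
INLINE_MAX = 12


def is_inline(value: bytes) -> bool:
    """True if the value fits entirely inside the 16-byte view."""
    return len(value) <= INLINE_MAX


def encode_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Pack one value into a 16-byte little-endian view (illustrative only)."""
    length = len(value)
    if is_inline(value):
        # Short value: length followed by the data, zero-padded to 12 bytes.
        return struct.pack("<i12s", length, value)
    # Long value: length, first 4 bytes as prefix, then where the data lives.
    return struct.pack("<i4sii", length, value[:4], buffer_index, offset)


short_values = [b"a", b"e", b"a", b"e"]
long_values = [b"a" * 13, b"e" * 13, b"a" * 13, b"e" * 13]

# The short case needs no data buffer at all; the long case does.
all_short_inline = all(is_inline(v) for v in short_values)
all_long_inline = all(is_inline(v) for v in long_values)
```

In the short example every value is inline, so no data buffer is needed, which matches the case that crashes on export.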
   
   Copilot points to the `scalar_cast_string.cc` implementation as the source 
of the bug: the extra data buffer is dropped when `all_entries_are_inline` 
is `True`.
   
   
https://github.com/apache/arrow/blob/e8b7b4e35e231a0fcdbfa74f6a6b0075108dd5dc/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L465-L467
   
   So this would be a bug in the cast kernel. We might also update the bridge 
file to guard against null-pointer variadic buffers:
   
https://github.com/apache/arrow/blob/e8b7b4e35e231a0fcdbfa74f6a6b0075108dd5dc/cpp/src/arrow/c/bridge.cc#L606-L614
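
As a rough way to check this from Python before exporting, one could look for missing data buffers in the list returned by `Array.buffers()`. The sketch below uses a hypothetical helper, `null_variadic_buffers`, and assumes the buffer list is laid out as validity, views, then the variadic data buffers. Whether the affected array actually reports `None` in that position is an assumption I have not verified.

```python
def null_variadic_buffers(buffers):
    """Return indices of missing (None) data buffers in a view-type buffer list.

    Assumes the layout [validity, views, data_0, data_1, ...]. The validity
    buffer at index 0 may legitimately be None, so only indices >= 2 are checked.
    """
    return [i for i, buf in enumerate(buffers) if i >= 2 and buf is None]


# Hypothetical usage with pyarrow (not run here):
#     missing = null_variadic_buffers(short.buffers())
#     if missing:
#         print("null variadic buffers at indices", missing)
```

This is only a diagnostic aid on the Python side; the guard itself would have to live in the C++ export code.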
   
   cc @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
