raulcd opened a new issue, #49321:
URL: https://github.com/apache/arrow/issues/49321

   ### Describe the enhancement requested
   
   As part of:
   - https://github.com/apache/arrow/issues/36411
   
   The topic of adding a sanitizers build for PyArrow came up there. I am 
creating this issue to track that discussion as a separate enhancement.
   
   Summary of the discussion so far:
   
   > I think the main difficulty for a PyArrow sanitizers build is that the 
sanitizer instrumentation should be enabled in CPython as well (and potentially 
NumPy?).
   
    _Originally posted by @pitrou in 
[#36411](https://github.com/apache/arrow/issues/36411#issuecomment-3916508307)_
   
   > You may be interested in how numpy & scipy are doing this, in conjunction 
with CPython. That setup uses pixi as a kind of "light-weight conda-build" 
orchestrator that wraps the various rebuilds (independent of whether that's via 
CMake/meson/whatever):
   > * https://github.com/python/cpython/issues/142466
   > * https://github.com/python/cpython/pull/142872
   > * https://github.com/numpy/numpy/pull/30510
   > * https://github.com/scipy/scipy/pull/24066
   > * etc. 
   
    _Originally posted by @h-vetinari in 
[#36411](https://github.com/apache/arrow/issues/36411#issuecomment-3916859990)_
   
   > That's an ideal setup, but I don't think it's required: you could point 
LD_PRELOAD at the sanitizer library so that it is loaded correctly into a process 
that was not built with sanitizers enabled (i.e. Python). We used to do that in 
CI with pandas, although we abandoned it after a while because it was a 
maintenance burden.
   
    _Originally posted by @WillAyd in 
[#36411](https://github.com/apache/arrow/issues/36411#issuecomment-3916925502)_
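   As a rough illustration of the LD_PRELOAD approach described above, the following shell sketch preloads the ASan runtime into an interpreter that was not built with sanitizers. It assumes Linux and extension modules compiled with GCC's `-fsanitize=address` (the runtime library name and lookup differ for Clang). The `ASAN_LIB` override variable and the `asan_runtime`/`run_with_asan` function names are hypothetical, chosen only for this sketch. Leak detection is disabled because CPython does not release all of its memory at interpreter exit.

```shell
#!/usr/bin/env bash
# Sketch of the LD_PRELOAD approach: run an interpreter that was NOT built
# with ASan, but load the ASan runtime into the process before anything else.
set -euo pipefail

# Locate the ASan runtime. ASAN_LIB is a hypothetical override for this
# sketch; otherwise ask GCC where its libasan.so lives (assumption: the
# extension modules were built with GCC and -fsanitize=address).
asan_runtime() {
  if [[ -n "${ASAN_LIB:-}" ]]; then
    printf '%s\n' "$ASAN_LIB"
  else
    gcc -print-file-name=libasan.so
  fi
}

# Run an arbitrary command with the ASan runtime preloaded.
run_with_asan() {
  local lib
  lib="$(asan_runtime)"
  # detect_leaks=0: CPython does not free all memory at interpreter exit,
  # so LeakSanitizer would otherwise report many leaks unrelated to PyArrow.
  LD_PRELOAD="$lib" ASAN_OPTIONS="detect_leaks=0" "$@"
}

# Usage (assumes PyArrow and pytest are installed in the active environment):
#   run_with_asan python -m pytest --pyargs pyarrow
```

   As the following comments point out, this only checks code that was itself instrumented at compile time; memory accesses performed inside an uninstrumented CPython or NumPy go unchecked.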
   
   > Is that enough, though? Ideally, the code is instrumented at compile time 
(memory accesses etc.). For example, if PyArrow passes a bogus memory pointer 
to NumPy, we want ASan to notice and that might not happen if NumPy was not 
compiled with ASan enabled.
   >  
   
    _Originally posted by @pitrou in 
[#36411](https://github.com/apache/arrow/issues/36411#issuecomment-3916949900)_
   
   > Yeah, for ASAN/TSAN you need to instrument the other relevant libraries, 
which means rebuilding them. That is generally a huge pain, and it is why the 
approach I referenced above provides a real benefit. Once all the pieces are in 
place, it comes down to
   > ```
   > pixi run test-asan -t some_test
   > ```
   > which rebuilds (& caches) instrumented cpython, numpy etc. as necessary. I 
haven't been very involved, but the scipy PR contains more details; and I'm 
pretty sure that Lucas wouldn't mind answering questions (not tagged here 
because it's already a bit OT). 
   
    _Originally posted by @h-vetinari in 
[#36411](https://github.com/apache/arrow/issues/36411#issuecomment-3917154312)_
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.