bdice opened a new issue, #34944:
URL: https://github.com/apache/arrow/issues/34944
### Describe the bug, including details regarding any error messages, version, and platform.
I worked with @shwina recently on a problem we saw in cudf, and we
identified a bug in PyArrow. The bug can be reproduced (but only
intermittently) with the following snippet:
```python
import pyarrow as pa

class A:
    def __getitem__(self, key):
        return 3

pa.array(A())
```
I can run this snippet under `pdb` by inserting `breakpoint()` before
`pa.array(A())` and then continuing from the breakpoint by pressing `c`. With
`pdb`, I consistently get errors like:
```
WARNING: Logging before InitGoogleLogging() is written to STDERR
F20230406 15:12:07.595571 153661 inference.cc:348] Check failed: _s.ok()
Operation failed: internal::ImportDecimalType(&decimal_type_)
Bad status: Unknown error: <built-in function __import__> returned a result with an exception set. Detail: Python exception: SystemError
*** Check failure stack trace: ***
```
Without `pdb`, the error is _sometimes_ the `TypeError` below (which I expect) and
_sometimes_ the crash shown above:
```python
Traceback (most recent call last):
  File "/home/bdice/issue.py", line 16, in <module>
    pa.array(A())
  File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
TypeError: object of type 'A' has no len()
```
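For context on why the `TypeError` is the expected outcome: the message in the traceback matches the one Python's built-in `len()` raises for an object that defines `__getitem__` but not `__len__`. The sketch below is pure Python and does not involve PyArrow, so it makes no claim about how `pa.array` inspects its argument internally. The helper name `len_error_message` is hypothetical and exists only for illustration.

```python
class A:
    # Defines only __getitem__, so instances have no length.
    def __getitem__(self, key):
        return 3


def len_error_message(obj):
    """Return the TypeError message raised by len(obj), or None if len() succeeds."""
    try:
        len(obj)
    except TypeError as exc:
        return str(exc)
    return None


print(len_error_message(A()))
```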
We expect a `TypeError` to be raised every time, rather than the crash above.
The crash is also reproducible without `pdb` if the snippet is run
repeatedly, which suggests that some kind of memory corruption may be
happening behind the scenes.
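One way to observe the intermittent behavior without `pdb` is to run the snippet many times, each in a fresh interpreter, and tally the outcomes. The following is a minimal sketch, not part of the original report: the helper names `classify` and `run_repeatedly`, the default run count, and the string checks used to tell the crash from the `TypeError` are all assumptions chosen for illustration.

```python
import subprocess
import sys
from collections import Counter

# The reproducer from this issue, executed in a separate process each time
# so that a hard crash does not take down the harness itself.
SNIPPET = """
import pyarrow as pa

class A:
    def __getitem__(self, key):
        return 3

pa.array(A())
"""


def classify(returncode, stderr):
    """Bucket one run as 'crash', 'TypeError', or 'other' (heuristic, assumed)."""
    # On POSIX a negative return code means the process was killed by a signal
    # (for example SIGABRT); the fatal log line contains "Check failed".
    if returncode < 0 or "Check failed" in stderr:
        return "crash"
    if "TypeError" in stderr:
        return "TypeError"
    return "other"


def run_repeatedly(snippet, runs=20):
    """Run `snippet` in `runs` fresh interpreters and count each outcome."""
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
        )
        outcomes[classify(proc.returncode, proc.stderr)] += 1
    return outcomes
```

Calling `run_repeatedly(SNIPPET)` returns a `Counter` of outcomes; a nonzero `"crash"` count alongside `"TypeError"` entries would correspond to the intermittent behavior described above.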
### Component(s)
C++, Python