`pyarrow` (version `0.10.0`) appears to crash sporadically with a segmentation 
fault when reading Parquet files if `torch` is imported before `pyarrow` in 
the same program.

A self-contained example is available here: 
https://gitlab.com/ostrokach/pyarrow_pytorch_segfault.

Basically, running

```bash
python -X faulthandler -c "import torch; import pyarrow.parquet as pq; _ = pq.ParquetFile('example.parquet').read_row_group(0)"
```

sooner or later results in a segfault:

```python-traceback
Fatal Python error: Segmentation fault

Current thread 0x00007f52959bb740 (most recent call first):
  File "/home/kimlab1/strokach/anaconda/lib/python3.6/site-packages/pyarrow/parquet.py", line 125 in read_row_group
  File "<string>", line 1 in <module>
./test_fail.sh: line 5: 42612 Segmentation fault      (core dumped) python -X faulthandler -c "import torch; import pyarrow.parquet as pq; _ = pq.ParquetFile('example.parquet').read_row_group(0)"
```

The number of iterations before a segfault varies, but it usually happens 
within the first several calls.
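Because the crash is sporadic, it can help to rerun the command in a fresh interpreter until it fails. The following is a minimal sketch of such a loop, not the script from the linked repository: `first_failure`, `TORCH_FIRST`, and `PYARROW_FIRST` are hypothetical names introduced here, and `example.parquet` is assumed to be in the current directory.

```python
import subprocess
import sys

# The read that crashes in the report above.
SNIPPET = "_ = pq.ParquetFile('example.parquet').read_row_group(0)"

# Hypothetical command lists mirroring the two import orders in this report.
TORCH_FIRST = [sys.executable, "-X", "faulthandler", "-c",
               "import torch; import pyarrow.parquet as pq; " + SNIPPET]
PYARROW_FIRST = [sys.executable, "-X", "faulthandler", "-c",
                 "import pyarrow.parquet as pq; import torch; " + SNIPPET]


def first_failure(cmd, max_runs=100):
    """Run cmd in a new process up to max_runs times.

    Returns (run_number, returncode) for the first non-zero exit,
    or None if every run exits cleanly.
    """
    for run in range(1, max_runs + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            return run, returncode
    return None
```

Calling `first_failure(TORCH_FIRST)` and `first_failure(PYARROW_FIRST)` compares the two import orders. On POSIX, a negative return code `-N` means the child was killed by signal `N`, so a segfault on Linux would show up as `-11`. Any other non-zero code (for example from a missing file or a failed import) also stops the loop, so the returned code should be checked before counting a run as a crash.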

Running 

```bash
python -X faulthandler -c "import pyarrow.parquet as pq; import torch; _ = pq.ParquetFile('example.parquet').read_row_group(0)"
```

works without a problem.

 

[ Full content available at: https://github.com/apache/arrow/issues/2637 ]