Yevgeni Litvin created ARROW-5260:
-------------------------------------
Summary: [Python][C++] Crash when deserializing from components in
a fresh new process
Key: ARROW-5260
URL: https://issues.apache.org/jira/browse/ARROW-5260
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.12.1, 0.13.0, 0.12.0
Reporter: Yevgeni Litvin
Deserializing a table from its components in a fresh process crashes with
SIGSEGV:
{noformat}
#1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#4 0x00000000004ad919 in PyCFunction_Call ()
#5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
#10 0x0000000000545e34 in ?? ()
#11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
#12 0x0000000000545a51 in ?? ()
#13 0x0000000000546890 in PyEval_EvalCode ()
#14 0x000000000042a9a8 in PyRun_FileExFlags ()
#15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
#16 0x000000000043e0ba in Py_Main ()
#17 0x0000000000421b04 in main ()
{noformat}
The following snippet can be used to reproduce the issue:
{code:python}
import pickle
import sys

import pandas as pd
import pyarrow as pa

if __name__ == '__main__':
    if sys.argv[1] == 'w':
        df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
        table = pa.Table.from_pandas(df)
        table_serialized = pa.serialize(table)
        table_serialized_components = table_serialized.to_components()
        with open('/tmp/p.pickle', 'wb') as f:
            pickle.dump(table_serialized_components, f)
        print('/tmp/p.pickle written ok')

    if sys.argv[1] == 'r':
        # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
        #pa.serialize(0)
        with open('/tmp/p.pickle', 'rb') as f:
            table_serialized_components = pickle.load(f)
        table = pa.deserialize_components(table_serialized_components)
        print(table)
{code}
Then run:
{code:bash}
$ python pa_serialization_crashes.py w
/tmp/p.pickle written ok
$ python pa_serialization_crashes.py r
Segmentation fault (core dumped){code}
The crash does not occur if unrelated data is serialized before the
deserialization (see the commented-out pa.serialize(0) line in the
reproduction script).
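Because the crash only shows up when the read step runs in a fresh interpreter, it can be convenient to drive each step from a small harness that starts a new process and reports how it ended. The following is a minimal sketch using only the standard library; the helper name run_in_fresh_process is hypothetical and not part of pyarrow, and the signal detection relies on the POSIX convention that a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def run_in_fresh_process(args):
    """Run `python <args...>` in a new interpreter and describe how it exited.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    completed = subprocess.run([sys.executable, *args])
    code = completed.returncode
    if code < 0:
        # POSIX only: a negative return code is the number of the signal
        # that terminated the child (for example SIGSEGV).
        return "killed by " + signal.Signals(-code).name
    return "exited with code {}".format(code)
```

For example, calling run_in_fresh_process(["pa_serialization_crashes.py", "w"]) and then run_in_fresh_process(["pa_serialization_crashes.py", "r"]) runs the two steps in separate processes; on an affected version the second call would be expected to report a SIGSEGV rather than a normal exit.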