[ 
https://issues.apache.org/jira/browse/ARROW-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164722#comment-17164722
 ] 

Jim Pivarski commented on ARROW-9556:
-------------------------------------

The second case (segfault on construction) is due to top-level nulls, but in 
the first case (segfault on get-item), the nulls are on the leaf nodes.

I'll take a look at the revised format specification, but top-level nulls have 
only been removed from unions, right? (Top-level vs not top-level isn't 
distinguishable on unions, but it would be visible on records or lists.)
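
To illustrate that distinction, here is a toy model in plain Python lists (not pyarrow; the helper names are hypothetical). A record keeps its own validity next to each field's validity, so a null record and a record with a null field read back differently, whereas a union slot yields a single value, so a null at either level reads back as the same None.

```python
# Toy model only: plain Python lists stand in for Arrow buffers.
# The function names are hypothetical and are not part of pyarrow.

def materialize_record(valid, field_valid, field_values):
    """Record with one field "x": top-level validity plus field validity."""
    out = []
    for ok, field_ok, value in zip(valid, field_valid, field_values):
        if not ok:
            out.append(None)          # the whole record is null
        elif not field_ok:
            out.append({"x": None})   # the record exists, its field is null
        else:
            out.append({"x": value})
    return out


def materialize_union_slot(top_valid, child_valid, child_value):
    """A union slot yields one value, so both null levels collapse to None."""
    if not top_valid or not child_valid:
        return None
    return child_value


records = materialize_record(
    valid=[True, False, True],
    field_valid=[True, True, False],
    field_values=[1, 2, 3],
)
top_null = materialize_union_slot(False, True, 1.1)
leaf_null = materialize_union_slot(True, False, 1.1)
print(records, top_null, leaf_null)
```

In this model the record output keeps the two kinds of null apart, while the two union results are indistinguishable.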

> Segfaults in UnionArray with null values
> ----------------------------------------
>
>                 Key: ARROW-9556
>                 URL: https://issues.apache.org/jira/browse/ARROW-9556
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 1.0.0
>         Environment: Conda, but pyarrow was installed using pip (in the conda 
> environment)
>            Reporter: Jim Pivarski
>            Priority: Major
>
> Extracting null values from a UnionArray containing nulls and constructing a 
> UnionArray with a bitmask in pyarrow.Array.from_buffers causes segfaults in 
> pyarrow 1.0.0. I have an environment with pyarrow 0.17.0 and all of the 
> following run correctly without segfaults in the older version.
> Here's a UnionArray that works (because there are no nulls):
>  
> {code:java}
> import pyarrow
> 
> # GOOD
> a = pyarrow.UnionArray.from_sparse(
>  pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
>  [
>  pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4]),
>  pyarrow.array([True, True, False, True, False]),
>  ],
> )
> a.to_pylist(){code}
>  
> Here's one that fails when you try a.to_pylist() or even just a[2], because 
> one of the children has a null at 2:
>  
> {code:java}
> # SEGFAULT
> a = pyarrow.UnionArray.from_sparse(
>  pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
>  [
>  pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
>  pyarrow.array([True, True, False, True, False]),
>  ],
> )
> a.to_pylist() # also just a[2] causes a segfault{code}
>  
> Here's another that fails because both children have nulls; the segfault 
> occurs at both positions with nulls:
>  
> {code:java}
> # SEGFAULT
> a = pyarrow.UnionArray.from_sparse(
>  pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
>  [
>  pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
>  pyarrow.array([True, None, False, True, False]),
>  ],
> )
> a.to_pylist() # also a[1] and a[2] cause segfaults{code}
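
As a rough sketch of why positions 1 and 2 are the ones affected, the sparse layout can be modeled with plain Python lists (this is not pyarrow, and `resolve_sparse` is a hypothetical helper): slot i takes its value from the child selected by the type id, at the same index i.

```python
# Toy model only: plain lists, not pyarrow. `resolve_sparse` is hypothetical.

def resolve_sparse(type_ids, children):
    """Slot i of a sparse union reads children[type_ids[i]][i]."""
    return [children[t][i] for i, t in enumerate(type_ids)]


type_ids = [0, 1, 0, 0, 1]
children = [
    [0.0, 1.1, None, 3.3, 4.4],        # child 0 has a null at index 2
    [True, None, False, True, False],  # child 1 has a null at index 1
]
values = resolve_sparse(type_ids, children)
null_positions = [i for i, v in enumerate(values) if v is None]
print(values)
print(null_positions)
```

A child null only surfaces where the type id actually selects that child, which in this example happens at both null positions.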
>  
> Here's one that succeeds, but it's dense, rather than sparse:
>  
> {code:java}
> # GOOD
> a = pyarrow.UnionArray.from_dense(
>  pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
>  pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
>  [pyarrow.array([0.0, 1.1, 2.2, 3.3]), pyarrow.array([True, True, False])],
> )
> a.to_pylist(){code}
>  
> Here's a dense that fails because one child has a null:
>  
> {code:java}
> # SEGFAULT
> a = pyarrow.UnionArray.from_dense(
>  pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
>  pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
>  [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, True, False])],
> )
> a.to_pylist() # also just a[3] causes a segfault{code}
>  
> Here's a dense that fails in two positions because both children have a null:
>  
> {code:java}
> # SEGFAULT
> a = pyarrow.UnionArray.from_dense(
>  pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
>  pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
>  [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, None, False])],
> )
> a.to_pylist() # also a[3] and a[5] cause segfaults{code}
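
The dense case adds one level of indirection. A minimal sketch, again with plain Python lists rather than pyarrow (`resolve_dense` is a hypothetical helper): slot i reads the child chosen by the type id at the position given by the offsets buffer, which is why the nulls show up at logical positions 3 and 5 rather than at their positions inside the children.

```python
# Toy model only: plain lists, not pyarrow. `resolve_dense` is hypothetical.

def resolve_dense(type_ids, offsets, children):
    """Slot i of a dense union reads children[type_ids[i]][offsets[i]]."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


type_ids = [0, 1, 0, 0, 0, 1, 1]
offsets = [0, 0, 1, 2, 3, 1, 2]
children = [
    [0.0, 1.1, None, 3.3],  # child 0 has a null at its own index 2
    [True, None, False],    # child 1 has a null at its own index 1
]
values = resolve_dense(type_ids, offsets, children)
null_positions = [i for i, v in enumerate(values) if v is None]
print(values)
print(null_positions)
```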
>  
> In all of the above, we created the UnionArray using its from_sparse or 
> from_dense method. We could instead create it with 
> pyarrow.Array.from_buffers. If created with content0 and content1 that have 
> no nulls, it's fine, but if created with nulls in the content, it segfaults 
> as soon as you view the null value.
>  
> {code:java}
> import numpy
> 
> # GOOD
> content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
> content1 = pyarrow.array([True, True, False, True, False])
> # SEGFAULT
> content0 = pyarrow.array([0.0, 1.1, 2.2, None, 4.4])
> content1 = pyarrow.array([True, True, False, True, False])
> types = pyarrow.union(
>  [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
>  "sparse",
>  [0, 1],
> )
> a = pyarrow.Array.from_buffers(
>  types,
>  5,
>  [
>  None,
>  pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
>  ],
>  children=[content0, content1],
> )
> a.to_pylist() # also just a[3] causes a segfault{code}
>  
> Similarly for a dense union.
>  
> {code:java}
> # GOOD
> content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
> content1 = pyarrow.array([True, True, False])
> # SEGFAULT
> content0 = pyarrow.array([0.0, 1.1, None, 3.3])
> content1 = pyarrow.array([True, True, False])
> types = pyarrow.union(
>  [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
>  "dense",
>  [0, 1],
> )
> a = pyarrow.Array.from_buffers(
>  types,
>  7,
>  [
>  None,
>  pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
>  pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
>  ],
>  children=[content0, content1],
> )
> a.to_pylist() # also just a[3] causes a segfault{code}
>  
> The next segfaults are different: instead of putting the null values in the 
> content, we put the null value in the UnionArray itself. This time, it 
> segfaults when it is being created. It also prints some output (all of the 
> above were silent segfaults).
>  
> {code:java}
> # SEGFAULT (even to create)
> content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
> content1 = pyarrow.array([True, True, False, True, False])
> types = pyarrow.union(
>  [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
>  "sparse",
>  [0, 1],
> )
> a = pyarrow.Array.from_buffers(
>  types,
>  5,
>  [
>  pyarrow.py_buffer(numpy.array([251], numpy.uint8)), # (11111011)
>  pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
>  # expect null here -----^
> # None <--- placeholder required in pyarrow 0.17.0, not 1.0.0
>  ],
>  children=[content0, content1],
> )
> # /arrow/cpp/src/arrow/array/array_nested.cc:617: Check failed: (data_->buffers[0]) == (nullptr) 
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7feea9937938]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7feea993814d]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArray7SetDataESt10shared_ptrINS_9ArrayDataEE+0x144)[0x7feea9a869a4]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArrayC1ESt10shared_ptrINS_9ArrayDataEE+0x5a)[0x7feea9a86a2a]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0x9fc)[0x7feea9a5145c]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7feea9a2698f]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7feeaa998853]
> # python(+0x13af9e)[0x56146ee77f9e]
> # python(_PyObject_MakeTpCall+0x3bf)[0x56146ee6d30f]
> # python(_PyEval_EvalFrameDefault+0x5452)[0x56146ef20602]
> # python(_PyEval_EvalCodeWithName+0x260)[0x56146ef06190]
> # python(PyEval_EvalCode+0x23)[0x56146ef07a03]
> # python(+0x23e2f2)[0x56146ef7b2f2]
> # python(+0x251082)[0x56146ef8e082]
> # python(+0x1063b9)[0x56146ee433b9]
> # python(PyRun_InteractiveLoopFlags+0xea)[0x56146ee43559]
> # python(+0x1065f3)[0x56146ee435f3]
> # python(+0x106817)[0x56146ee43817]
> # python(Py_BytesMain+0x39)[0x56146ef91a19]
> # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7feeac198b97]
> # python(+0x1f8807)[0x56146ef35807]
> # Aborted (core dumped)
> {code}
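
For reference, the byte 251 in the bitmask above can be decoded with a small stdlib-only sketch (`decode_validity` is a hypothetical helper, not a pyarrow function). It assumes Arrow's least-significant-bit-first numbering for validity bitmaps, so the `(11111011)` in the comment reads right to left and the cleared bit marks index 2.

```python
# Stdlib-only sketch; `decode_validity` is a hypothetical helper.
# Assumes LSB-first bit numbering: bit i of the bitmap is
# (bitmap[i // 8] >> (i % 8)) & 1, with 1 meaning valid and 0 meaning null.

def decode_validity(bitmap_bytes, length):
    """Return one bool per slot: True if valid, False if null."""
    return [
        bool((bitmap_bytes[i // 8] >> (i % 8)) & 1)
        for i in range(length)
    ]


valid = decode_validity(bytes([251]), 5)
null_positions = [i for i, ok in enumerate(valid) if not ok]
print(format(251, "08b"))  # 11111011
print(valid)
print(null_positions)
```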
>  
> And similarly for dense.
>  
> {code:java}
> # SEGFAULT (even to create)
> content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
> content1 = pyarrow.array([True, True, False])
> types = pyarrow.union(
>  [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
>  "dense",
>  [0, 1],
> )
> a = pyarrow.Array.from_buffers(
>  types,
>  7,
>  [
>  pyarrow.py_buffer(numpy.array([251], numpy.uint8)), # (11111011)
>  pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
>  pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
>  # expect null here -----^
>  ],
>  children=[content0, content1],
> )
> # /arrow/cpp/src/arrow/array/array_nested.cc:627: Check failed: (data_->buffers[0]) == (nullptr) 
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7f2fb6ad7938]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7f2fb6ad814d]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArray7SetDataERKSt10shared_ptrINS_9ArrayDataEE+0x174)[0x7f2fb6c274a4]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x44)[0x7f2fb6c27524]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0xb14)[0x7f2fb6bf1574]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7f2fb6bc698f]
> # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7f2fb7b38853]
> # python(+0x13af9e)[0x558cf09edf9e]
> # python(_PyObject_MakeTpCall+0x3bf)[0x558cf09e330f]
> # python(_PyEval_EvalFrameDefault+0x5452)[0x558cf0a96602]
> # python(_PyEval_EvalCodeWithName+0x260)[0x558cf0a7c190]
> # python(PyEval_EvalCode+0x23)[0x558cf0a7da03]
> # python(+0x23e2f2)[0x558cf0af12f2]
> # python(+0x251082)[0x558cf0b04082]
> # python(+0x1063b9)[0x558cf09b93b9]
> # python(PyRun_InteractiveLoopFlags+0xea)[0x558cf09b9559]
> # python(+0x1065f3)[0x558cf09b95f3]
> # python(+0x106817)[0x558cf09b9817]
> # python(Py_BytesMain+0x39)[0x558cf0b07a19]
> # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f2fb9338b97]
> # python(+0x1f8807)[0x558cf0aab807]
> # Aborted (core dumped){code}
>  
> It might be two distinct bugs, but they're both related to UnionArrays and 
> nulls, and they're both newer than 0.17.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
