[ https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siyuan Zhuang reassigned ARROW-3765:
------------------------------------

    Assignee: Siyuan Zhuang

> [Gandiva] Segfault when the validity bitmap has not been allocated
> ------------------------------------------------------------------
>
>                 Key: ARROW-3765
>                 URL: https://issues.apache.org/jira/browse/ARROW-3765
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Gandiva
>            Reporter: Siyuan Zhuang
>            Assignee: Siyuan Zhuang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The segfault occurs because an array's validity buffer can be `None`. Compare an integer column with a float column:
> {code}
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [None, <pyarrow.lib.Buffer object at 0x110c1a228>]
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [<pyarrow.lib.Buffer object at 0x11a2b3030>, <pyarrow.lib.Buffer object at 0x11a2b3228>]
> {code}
> Gandiva does not handle this case yet, so it dereferences a nullptr:
> {code}
> void Annotator::PrepareBuffersForField(const FieldDescriptor& desc,
>                                        const arrow::ArrayData& array_data,
>                                        EvalBatch* eval_batch) {
>     int buffer_idx = 0;
>     // TODO:  
>     // - validity is optional 
>     uint8_t* validity_buf =
>         const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data());
>     eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
>     ++buffer_idx;
> {code}
>  
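> Code to reproduce:
> {code:python}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
>
> # Integer data: the resulting Arrow column has no validity buffer.
> df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10)))
> table = pa.Table.from_pandas(df)
> filt = ...  # Create any gandiva filter
> r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool())  # segfault
> {code}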
> Backtrace:
> {code}
> * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
>  * frame #0: 0x00000001060184fc libarrow.12.dylib`arrow::Buffer::data(this=0x0000000000000000) const at buffer.h:162
>    frame #1: 0x0000000106fbed78 libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x0000000100624dc8, desc=0x000000010101e138, array_data=0x000000010061f8e8, eval_batch=0x0000000100796848) at annotator.cc:65
>    frame #2: 0x0000000106fbf4ed libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x0000000100624dc8, record_batch=0x00000001007a45b8, out_vector=size=1) at annotator.cc:94
>    frame #3: 0x00000001071449b7 libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x0000000100624da0, record_batch=0x00000001007a45b8, output_vector=size=1) at llvm_generator.cc:102
>    frame #4: 0x0000000107059a4f libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x000000010079c668, batch=0x00000001007a45b8, out_selection=std::__1::shared_ptr<gandiva::SelectionVector>::element_type @ 0x00000001007a43e8 strong=2 weak=1) at filter.cc:106
>    frame #5: 0x000000010948e002 gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*, _object*, _object*) + 1986
>    frame #6: 0x0000000100140e8b Python`_PyCFunction_FastCallDict + 475
>    frame #7: 0x00000001001d28ca Python`call_function + 602
>    frame #8: 0x00000001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>    frame #9: 0x00000001001d3cf9 Python`fast_function + 569
>    frame #10: 0x00000001001d2899 Python`call_function + 553
>    frame #11: 0x00000001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>    frame #12: 0x00000001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
>    frame #13: 0x00000001001c96e0 Python`PyEval_EvalCode + 48
>    frame #14: 0x00000001002029ae Python`PyRun_FileExFlags + 174
>    frame #15: 0x0000000100201f75 Python`PyRun_SimpleFileExFlags + 277
>    frame #16: 0x000000010021ef46 Python`Py_Main + 3558
>    frame #17: 0x0000000100000e08 Python`___lldb_unnamed_symbol1$$Python + 248
>    frame #18: 0x00007fff6ea72085 libdyld.dylib`start + 1
> {code}
>  
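
The null dereference described above could be avoided by checking the first buffer before calling `data()` on it. The following is only a minimal sketch of that guard, not the actual Gandiva fix: `Buffer` and `ArrayData` are hypothetical stand-ins for `arrow::Buffer` and `arrow::ArrayData`, and `ValidityOrNull` is a hypothetical helper name. How the generated code should treat a null validity pointer (presumably as "all values valid") is outside the scope of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Buffer, with just enough for this sketch.
struct Buffer {
  explicit Buffer(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}
  const uint8_t* data() const { return bytes_.data(); }

 private:
  std::vector<uint8_t> bytes_;
};

// Hypothetical stand-in for arrow::ArrayData. By convention buffers[0] is the
// validity bitmap, and it may be a null pointer when no bitmap was allocated.
struct ArrayData {
  std::vector<std::shared_ptr<Buffer>> buffers;
};

// Hypothetical helper: returns the validity bitmap, or nullptr when the
// bitmap is absent, instead of dereferencing a null Buffer pointer.
const uint8_t* ValidityOrNull(const ArrayData& array_data) {
  if (array_data.buffers.empty() || array_data.buffers[0] == nullptr) {
    return nullptr;
  }
  return array_data.buffers[0]->data();
}
```

With a guard like this, the value passed to `eval_batch->SetBuffer(desc.validity_idx(), ...)` would be `nullptr` for arrays without a bitmap, so every consumer of that slot would also need to accept a null pointer.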



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
