[ https://issues.apache.org/jira/browse/ARROW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396514#comment-16396514 ]
Wes McKinney commented on ARROW-2082:
-------------------------------------

Here's the backtrace for this:

{code}
#0  0x00007fffece34769 in arrow::PoolBuffer::Reserve (this=0x139c180, capacity=1024) at ../src/arrow/buffer.cc:101
#1  0x00007fffece34b2f in arrow::PoolBuffer::Resize (this=0x139c180, new_size=1024, shrink_to_fit=true) at ../src/arrow/buffer.cc:112
#2  0x00007fffcb5fc506 in parquet::AllocateBuffer (pool=0x7fffed519300 <completed>, size=1024) at ../src/parquet/util/memory.cc:501
#3  0x00007fffcb5fc75e in parquet::InMemoryOutputStream::InMemoryOutputStream (this=0x1487090, pool=0x7fffed519300 <completed>, initial_capacity=1024) at ../src/parquet/util/memory.cc:423
#4  0x00007fffcb5335ca in parquet::PlainEncoder<parquet::DataType<(parquet::Type::type)2> >::PlainEncoder (this=0x7fffffff9170, descr=0x1104060, pool=0x7fffed519300 <completed>) at ../src/parquet/encoding-internal.h:188
#5  0x00007fffcb5defa2 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::PlainEncode (this=0xbbee60, src=@0xbbeec8: -729020189051312384, dst=0x7fffffff9258) at ../src/parquet/statistics.cc:228
#6  0x00007fffcb5def07 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::EncodeMin (this=0xbbee60) at ../src/parquet/statistics.cc:204
#7  0x00007fffcb5df1c3 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::Encode (this=0xbbee60) at ../src/parquet/statistics.cc:219
#8  0x00007fffcb5348f7 in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2> >::GetPageStatistics (this=0x81d2b0) at ../src/parquet/column_writer.cc:520
#9  0x00007fffcb52ca76 in parquet::ColumnWriter::AddDataPage (this=0x81d2b0) at ../src/parquet/column_writer.cc:386
#10 0x00007fffcb52c0eb in parquet::ColumnWriter::FlushBufferedDataPages (this=0x81d2b0) at ../src/parquet/column_writer.cc:447
#11 0x00007fffcb52ddb0 in parquet::ColumnWriter::Close (this=0x81d2b0) at ../src/parquet/column_writer.cc:431
#12 0x00007fffcb4d6657 in parquet::arrow::(anonymous namespace)::ArrowColumnWriter::Close (this=0x7fffffff9b48) at ../src/parquet/arrow/writer.cc:347
#13 0x00007fffcb4e758e in parquet::arrow::FileWriter::Impl::WriteColumnChunk (this=0x15adee0, data=
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5) at ../src/parquet/arrow/writer.cc:982
#14 0x00007fffcb4d507b in parquet::arrow::FileWriter::WriteColumnChunk (this=0x125bc30, data=
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5) at ../src/parquet/arrow/writer.cc:1011
#15 0x00007fffcb4d5ba6 in parquet::arrow::FileWriter::WriteTable (this=0x125bc30, table=..., chunk_size=5) at ../src/parquet/arrow/writer.cc:1086
{code}

Not sure what's going wrong yet
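For reference, a minimal sketch of the write path that hits this stack, based on the pq.write_table call quoted in the issue description below (the actual repro lives on the linked GitHub issue; the five-row DataFrame, column names, and output file name here are only illustrative):

{code:python}
import datetime

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative data: a datetime column plus an int column, five rows
# (matching the chunk_size=5 / size=5 seen in the backtrace above).
df = pd.DataFrame({
    'when': [datetime.datetime(2018, 1, 1) + datetime.timedelta(days=i)
             for i in range(5)],
    'value': list(range(5)),
})

table = pa.Table.from_pandas(df)

# The call pandas issues under the covers for df.to_parquet(..., flavor='spark');
# on 0.8.0 this reportedly crashes while encoding the page statistics for the
# timestamp column (frames #5-#8 in the backtrace).
pq.write_table(table, 'output.parquet', flavor='spark',
               compression='snappy', coerce_timestamps='ms')
{code}

This mirrors the pandas-level df.to_parquet('filename.parquet', flavor='spark') trigger described in the issue below.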
> [Python] SegFault in pyarrow.parquet.write_table with specific options
> ----------------------------------------------------------------------
>
>                 Key: ARROW-2082
>                 URL: https://issues.apache.org/jira/browse/ARROW-2082
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: tested on macOS High Sierra with Python 3.6 and Ubuntu Xenial (Python 3.5)
>            Reporter: Clément Bouscasse
>            Priority: Major
>             Fix For: 0.9.0
>
>
> I originally filed an issue in the pandas project, but we've tracked it down
> to Arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> Basically, using
> {code:java}
> df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a segfault if `df` contains a datetime column.
> Under the covers, pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy',
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the GitHub ticket.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)