[ https://issues.apache.org/jira/browse/ARROW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396514#comment-16396514 ]
Wes McKinney commented on ARROW-2082:
-------------------------------------

Here's the backtrace for this:

{code}
#0  0x00007fffece34769 in arrow::PoolBuffer::Reserve (this=0x139c180, capacity=1024) at ../src/arrow/buffer.cc:101
#1  0x00007fffece34b2f in arrow::PoolBuffer::Resize (this=0x139c180, new_size=1024, shrink_to_fit=true) at ../src/arrow/buffer.cc:112
#2  0x00007fffcb5fc506 in parquet::AllocateBuffer (pool=0x7fffed519300 <completed>, size=1024) at ../src/parquet/util/memory.cc:501
#3  0x00007fffcb5fc75e in parquet::InMemoryOutputStream::InMemoryOutputStream (this=0x1487090, pool=0x7fffed519300 <completed>, initial_capacity=1024) at ../src/parquet/util/memory.cc:423
#4  0x00007fffcb5335ca in parquet::PlainEncoder<parquet::DataType<(parquet::Type::type)2> >::PlainEncoder (this=0x7fffffff9170, descr=0x1104060, pool=0x7fffed519300 <completed>) at ../src/parquet/encoding-internal.h:188
#5  0x00007fffcb5defa2 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::PlainEncode (this=0xbbee60, src=@0xbbeec8: -729020189051312384, dst=0x7fffffff9258) at ../src/parquet/statistics.cc:228
#6  0x00007fffcb5def07 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::EncodeMin (this=0xbbee60) at ../src/parquet/statistics.cc:204
#7  0x00007fffcb5df1c3 in parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2> >::Encode (this=0xbbee60) at ../src/parquet/statistics.cc:219
#8  0x00007fffcb5348f7 in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2> >::GetPageStatistics (this=0x81d2b0) at ../src/parquet/column_writer.cc:520
#9  0x00007fffcb52ca76 in parquet::ColumnWriter::AddDataPage (this=0x81d2b0) at ../src/parquet/column_writer.cc:386
#10 0x00007fffcb52c0eb in parquet::ColumnWriter::FlushBufferedDataPages (this=0x81d2b0) at ../src/parquet/column_writer.cc:447
#11 0x00007fffcb52ddb0 in parquet::ColumnWriter::Close (this=0x81d2b0) at ../src/parquet/column_writer.cc:431
#12 0x00007fffcb4d6657 in parquet::arrow::(anonymous namespace)::ArrowColumnWriter::Close (this=0x7fffffff9b48) at ../src/parquet/arrow/writer.cc:347
#13 0x00007fffcb4e758e in parquet::arrow::FileWriter::Impl::WriteColumnChunk (this=0x15adee0, data=
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5) at ../src/parquet/arrow/writer.cc:982
#14 0x00007fffcb4d507b in parquet::arrow::FileWriter::WriteColumnChunk (this=0x125bc30, data=
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray, std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5) at ../src/parquet/arrow/writer.cc:1011
#15 0x00007fffcb4d5ba6 in parquet::arrow::FileWriter::WriteTable (this=0x125bc30, table=..., chunk_size=5) at ../src/parquet/arrow/writer.cc:1086
{code}

Not sure what's going wrong yet
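For reference, a minimal sketch of the write path that hits this stack, based on the pq.write_table call quoted in the issue description below (the actual repro lives on the linked GitHub issue; the five-row DataFrame, column names, and output file name here are only illustrative):

{code:python}
import datetime

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative data: a datetime column plus an int column, five rows
# (matching the chunk_size=5 / size=5 seen in the backtrace above).
df = pd.DataFrame({
    'when': [datetime.datetime(2018, 1, 1) + datetime.timedelta(days=i)
             for i in range(5)],
    'value': list(range(5)),
})

table = pa.Table.from_pandas(df)

# The call pandas issues under the covers for df.to_parquet(..., flavor='spark');
# on 0.8.0 this reportedly crashes while encoding the page statistics for the
# timestamp column (frames #5-#8 in the backtrace).
pq.write_table(table, 'output.parquet', flavor='spark',
               compression='snappy', coerce_timestamps='ms')
{code}

This mirrors the pandas-level df.to_parquet('filename.parquet', flavor='spark') trigger described in the issue below.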
> [Python] SegFault in pyarrow.parquet.write_table with specific options
> ----------------------------------------------------------------------
>
>                 Key: ARROW-2082
>                 URL: https://issues.apache.org/jira/browse/ARROW-2082
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: tested on macOS High Sierra with Python 3.6 and Ubuntu Xenial (Python 3.5)
>            Reporter: Clément Bouscasse
>            Priority: Major
>             Fix For: 0.9.0
>
>
> I originally filed an issue in the pandas project, but we've tracked it down
> to Arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> Basically, using
> {code:java}
> df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a segfault if `df` contains a datetime column.
> Under the covers, pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy',
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the GitHub ticket.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)