[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Knupp updated ARROW-1167:
------------------------------
    Description:
When writing a pyarrow Table (instantiated from a pandas DataFrame that reads in a ~5GB CSV file) to a Parquet file, the interpreter cores with the following stack trace from gdb:
{code}
#0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:181
#1 0x00007fbaa5c779f1 in parquet::InMemoryOutputStream::Write(unsigned char const*, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#2 0x00007fbaa5c0ce97 in parquet::PlainEncoder<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#3 0x00007fbaa5c18855 in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)6> >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#4 0x00007fbaa5c189d5 in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)6> >::WriteBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#5 0x00007fbaa5be0900 in arrow::Status parquet::arrow::FileWriter::Impl::TypedWriteBatch<parquet::DataType<(parquet::Type::type)6>, arrow::BinaryType>(parquet::ColumnWriter*, std::shared_ptr<arrow::Array> const&, long, short const*, short const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#6 0x00007fbaa5be171d in parquet::arrow::FileWriter::Impl::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#7 0x00007fbaa5be1dad in parquet::arrow::FileWriter::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#8 0x00007fbaa5be2047 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#9 0x00007fbaa51e1f53 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/_parquet.cpython-35m-x86_64-linux-gnu.so
#10 0x00000000004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
#11 0x0000000000529885 in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7ffe6510a6c0, func=<optimized out>) at ../Python/ceval.c:4933
#12 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a6c0) at ../Python/ceval.c:4732
#13 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#14 0x000000000052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#15 0x0000000000528eee in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffe6510a8d0, func=<optimized out>) at ../Python/ceval.c:4813
#16 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a8d0) at ../Python/ceval.c:4730
#17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#18 0x000000000052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#19 0x0000000000528eee in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffe6510aae0, func=<optimized out>) at ../Python/ceval.c:4813
#20 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510aae0) at ../Python/ceval.c:4730
#21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#22 0x0000000000528814 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffe6510ac10, func=<optimized out>) at ../Python/ceval.c:4803
#23 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ac10) at ../Python/ceval.c:4730
#24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#25 0x0000000000528814 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffe6510ad40, func=<optimized out>) at ../Python/ceval.c:4803
#26 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ad40) at ../Python/ceval.c:4730
#27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#28 0x000000000052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#29 0x000000000052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#30 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#31 0x00000000005fd2c2 in run_mod () at ../Python/pythonrun.c:976
#32 0x00000000005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#33 0x00000000005ff95c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#34 0x000000000063e7d6 in run_file (p_cf=0x7ffe6510afb0, filename=0x2161260 L"scripts/parquet_export.py", fp=0x226fde0) at ../Modules/main.c:318
#35 Py_Main () at ../Modules/main.c:768
#36 0x00000000004cfe41 in main () at ../Programs/python.c:65
#37 0x00007fbadf0db830 in __libc_start_main (main=0x4cfd60 <main>, argc=2, argv=0x7ffe6510b1c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6510b1b8) at ../csu/libc-start.c:291
#38 0x00000000005d5f29 in _start ()
{code}
This is occurring in a pretty vanilla call to {{pq.write_table(table, output)}}. Before the crash, I'm able to print out the table's schema, and it looks a little odd (all columns are explicitly specified in {{pandas.read_csv()}} to be strings):
{code}
_id: string
ref_id: string
ref_no: string
stage: string
stage2_ref_id: string
org_id: string
classification: string
solicitation_no: string
notice_type: string
business_category: string
procurement_mode: string
funding_instrument: string
funding_source: string
approved_budget: string
publish_date: string
closing_date: string
contract_duration: string
calendar_type: string
trade_agreement: string
pre_bid_date: string
pre_bid_venue: string
procuring_entity_org_id: string
procuring_entity_org: string
client_agency_org_id: string
client_agency_org: string
contact_person: string
contact_person_address: string
tender_title: string
description: string
other_info: string
reason: string
created_by: string
creation_date: string
modified_date: string
special_instruction: string
collection_contact: string
tender_status: string
collection_point: string
date_available: string
serialid: string
__index_level_0__: int64
-- metadata --
pandas: {"index_columns": ["__index_level_0__"], "columns": [{"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "_id"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "ref_id"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "ref_no"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "stage"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "stage2_ref_id"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "org_id"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "classification"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "solicitation_no"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "notice_type"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "business_category"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": 
"procurement_mode"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "funding_instrument"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "funding_source"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "approved_budget"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "publish_date"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "closing_date"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "contract_duration"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "calendar_type"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "trade_agreement"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "pre_bid_date"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "pre_bid_venue"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "procuring_entity_org_id"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "procuring_entity_org"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "client_agency_org_id"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "client_agency_org"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "contact_person"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "contact_person_address"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "tender_title"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "description"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "other_info"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "reason"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": 
null, "name": "created_by"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "creation_date"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "modified_date"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "special_instruction"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "collection_contact"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "tender_status"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "collection_point"}, {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "date_available"}, {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "serialid"}, {"pandas_type": "int64", "numpy_type": "int64", "metadata": null, "name": "__index_level_0__"}], "pandas_version": "0.19.2"}
Segmentation fault (core dumped)
{code}

> Writing pyarrow Table to Parquet core dumps
> -------------------------------------------
>
>                 Key: ARROW-1167
>                 URL: https://issues.apache.org/jira/browse/ARROW-1167
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Knupp

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
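
The schema dump in the description lists every data column as {{string}}, while the attached pandas metadata records some of those columns with a {{pandas_type}} of "mixed" and others as "unicode". A minimal sketch of how that metadata could be grouped by recorded type is shown here. It uses only the standard library, the helper name {{columns_by_pandas_type}} is hypothetical, and the sample string is a trimmed copy of the metadata above (the first six column entries plus the index entry), not the full blob.

```python
import json

# Trimmed copy of the "pandas" metadata from the schema dump in this issue:
# the first six column entries plus the index entry. The full blob has the
# same shape, just with more entries.
PANDAS_METADATA = """
{"index_columns": ["__index_level_0__"],
 "columns": [
  {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "_id"},
  {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "ref_id"},
  {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "ref_no"},
  {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "stage"},
  {"pandas_type": "mixed", "numpy_type": "object", "metadata": null, "name": "stage2_ref_id"},
  {"pandas_type": "unicode", "numpy_type": "object", "metadata": null, "name": "org_id"},
  {"pandas_type": "int64", "numpy_type": "int64", "metadata": null, "name": "__index_level_0__"}
 ],
 "pandas_version": "0.19.2"}
"""


def columns_by_pandas_type(metadata_json):
    """Group column names by the pandas_type recorded in the metadata.

    Hypothetical helper for illustration; it only parses the JSON text and
    does not touch pyarrow or pandas.
    """
    meta = json.loads(metadata_json)
    groups = {}
    for column in meta["columns"]:
        groups.setdefault(column["pandas_type"], []).append(column["name"])
    return groups


if __name__ == "__main__":
    # Print one line per recorded type with the columns that carry it.
    for pandas_type, names in sorted(columns_by_pandas_type(PANDAS_METADATA).items()):
        print(pandas_type, len(names), names)
```

Running the same helper on the full metadata string would list every column recorded as "mixed", which may help narrow down which columns to inspect first. Whether those columns are related to the crash is not established by anything in this report.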