[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089168#comment-16089168 ]

Wes McKinney commented on ARROW-1167:
---

Moving this to 0.5.0. Added the overflow checks in ARROW-1177 (https://github.com/apache/arrow/pull/853), but I think we can resolve this temporarily by chunking the binary column when it gets too large in {{Table.from_pandas}}.

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Jeff Knupp
> Fix For: 0.5.0
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in
> a ~5GB CSV file) to a parquet file, the interpreter cores with the following
> stack trace from gdb:
> {code}
> #0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:181
> #1  0x7fbaa5c779f1 in parquet::InMemoryOutputStream::Write(unsigned char const*, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #2  0x7fbaa5c0ce97 in parquet::PlainEncoder > >::Put(parquet::ByteArray const*, int) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #3  0x7fbaa5c18855 in parquet::TypedColumnWriter > >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #4  0x7fbaa5c189d5 in parquet::TypedColumnWriter > >::WriteBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #5  0x7fbaa5be0900 in arrow::Status parquet::arrow::FileWriter::Impl::TypedWriteBatch, arrow::BinaryType>(parquet::ColumnWriter*, std::shared_ptr const&, long, short const*, short const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #6  0x7fbaa5be171d in parquet::arrow::FileWriter::Impl::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #7  0x7fbaa5be1dad in parquet::arrow::FileWriter::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #8  0x7fbaa5be2047 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #9  0x7fbaa51e1f53 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/_parquet.cpython-35m-x86_64-linux-gnu.so
> #10 0x004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
> #11 0x00529885 in do_call (nk=, na=, pp_stack=0x7ffe6510a6c0, func=) at ../Python/ceval.c:4933
> #12 call_function (oparg=, pp_stack=0x7ffe6510a6c0) at ../Python/ceval.c:4732
> #13 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #14 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #15 0x00528eee in fast_function (nk=, na= out>, n=, pp_stack=0x7ffe6510a8d0, func=) at ../Python/ceval.c:4813
> #16 call_function (oparg=, pp_stack=0x7ffe6510a8d0) at ../Python/ceval.c:4730
> #17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #18 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #19 0x00528eee in fast_function (nk=, na= out>, n=, pp_stack=0x7ffe6510aae0, func=) at ../Python/ceval.c:4813
> #20 call_function (oparg=, pp_stack=0x7ffe6510aae0) at ../Python/ceval.c:4730
> #21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #22 0x00528814 in fast_function (nk=, na= out>, n=, pp_stack=0x7ffe6510ac10, func=) at ../Python/ceval.c:4803
> #23 call_function (oparg=, pp_stack=0x7ffe6510ac10) at ../Python/ceval.c:4730
> #24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #25 0x00528814 in fast_function (nk=, na= out>, n=, pp_stack=0x7ffe6510ad40, func=) at ../Python/ceval.c:4803
> #26 call_function (oparg=, pp_stack=0x7ffe6510ad40) at ../Python/ceval.c:4730
> #27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #28 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #29 0x0052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
> #30 PyEval_EvalCode (co=, globals=, locals=) at ../Python/ceval.c:777
> #31 0x005fd2c2 in run_mod () at ../Python/pythonrun.c:976
> #32 0x005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
> #33 0x005ff95c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
> #34 0x0063e7d6 in run_file (p_cf=0x7ffe6510afb0, filename=0x2161260 L"scripts/parquet_export.py", fp=0x226fde0) at ../Modules/main.c:318
> #35 Py_Main () at ../Modules/main.c:76
> {code}
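The chunking workaround described above can be sketched in plain Python (a hypothetical illustration only, not Arrow's actual implementation): split a column's values into chunks whose cumulative encoded size stays under the int32 offset limit, so each chunk's value offsets still fit in a signed 32-bit integer.

```python
# Hypothetical sketch of the chunking idea: split a list of strings into
# chunks whose cumulative UTF-8 size stays below the int32 offset limit.
# The function name and shape are illustrative, not Arrow API.
INT32_MAX = 2**31 - 1

def split_into_chunks(values, limit=INT32_MAX):
    chunks, current, current_bytes = [], [], 0
    for v in values:
        size = len(v.encode("utf-8"))
        # Start a new chunk when adding this value would pass the limit.
        if current and current_bytes + size > limit:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(v)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks

# With a tiny limit for illustration: no chunk exceeds 10 encoded bytes.
print(split_into_chunks(["aaaa", "bbbb", "cccc"], limit=10))
# → [['aaaa', 'bbbb'], ['cccc']]
```

Each resulting chunk could then back one array of a chunked column, so no single array's offsets overflow.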
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076847#comment-16076847 ]

Jeff Knupp commented on ARROW-1167:
---

Ah, OK. I misunderstood. The pandas problem is somewhat similar, but that one _is_ caused by the size of the type used to calculate memory (re)allocations. Never mind! I'll ask this question on the pandas PR. I'll also look into implementing a chunked array in {{Table.from_pandas}}.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076837#comment-16076837 ]

Wes McKinney commented on ARROW-1167:
---

What do you mean by "Does it make sense to move to int64 to track buffer sizes?"? The problem in Arrow is different, I think -- the variable-length offsets are overflowing; the underlying memory buffers all use 64-bit integers. There is ARROW-750 to add string/binary types with 64-bit offsets, but in the meantime the easier route is to create a chunked array in {{Table.from_pandas}} rather than one huge array.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076822#comment-16076822 ]

Jeff Knupp commented on ARROW-1167:
---

So [~wesmckinn], pandas has the exact same bug (a bit easier to trigger) as reported here: https://github.com/pandas-dev/pandas/issues/16798. I tracked down where the allocation that triggers the issue occurs, and unsurprisingly it's when growing the buffer to accommodate the size of the data. I've confirmed that this, too, results in an integer overflow of the size to be allocated. Now, that's all well and good, but I'd actually like to fix all of these issues in the two projects. *Does it make sense to move to int64 to track buffer sizes?* We can still check for overflow, but this solves the underlying issue as well. Let me know what you think.
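The overflow mode under discussion can be demonstrated in a few lines of Python (an illustration only, not pandas or Arrow code): a byte count just past INT32_MAX wraps to a negative value when forced into a signed 32-bit integer, which is how a size for >2 GB of string data goes wrong.

```python
# Illustration of the failure mode: a cumulative byte count one past
# INT32_MAX wraps negative when stored in a signed 32-bit integer.
import ctypes

INT32_MAX = 2**31 - 1
total_bytes = INT32_MAX + 2        # just over 2 GB of string data
as_int32 = ctypes.c_int32(total_bytes).value

print(total_bytes)   # 2147483649
print(as_int32)      # → -2147483647 (wrapped)
assert as_int32 < 0
```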
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071805#comment-16071805 ]

Wes McKinney commented on ARROW-1167:
---

OK, I believe the root cause is that one of the columns in this dataset has over 2GB of string data in it, which is causing an undetected overflow in the int32 offsets in the underlying {{BinaryArray}} object. So there's a bunch of things that need to happen:

* Detecting int32 overflow in BinaryBuilder (so constructing a malformed BinaryArray like this isn't possible)
* Making sure such overflows are raised properly out of {{Table.from_pandas}}
* Providing for chunked table construction in {{Table.from_pandas}} (which will help you fix this problem)

cc [~xhochy]
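The first item Wes lists (detecting int32 overflow in BinaryBuilder) might look like the following simplified Python stand-in for the C++ builder; {{CheckedBinaryBuilder}} is a hypothetical name, not Arrow API. The point is to refuse the append before the offset silently wraps.

```python
# Sketch of an overflow-checked binary builder: appends are rejected once
# the next int32 value offset would exceed INT32_MAX, instead of wrapping.
# Hypothetical Python stand-in for Arrow's C++ BinaryBuilder.
INT32_MAX = 2**31 - 1

class CheckedBinaryBuilder:
    def __init__(self):
        self.offsets = [0]       # value offsets, as in Arrow's BinaryArray
        self.data = bytearray()  # concatenated value bytes

    def append(self, value: bytes):
        new_end = self.offsets[-1] + len(value)
        if new_end > INT32_MAX:
            raise OverflowError(
                "BinaryArray offsets would exceed int32; "
                "build a chunked array instead")
        self.data += value
        self.offsets.append(new_end)

b = CheckedBinaryBuilder()
b.append(b"hello")
print(b.offsets)   # → [0, 5]
```

With a guard like this, the malformed array can never be constructed, and the error can be surfaced cleanly out of {{Table.from_pandas}} rather than corrupting the Parquet writer downstream.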
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071802#comment-16071802 ]

Wes McKinney commented on ARROW-1167:
---

Thanks. I'm able to reproduce; I will dig in and try to figure out the root cause.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071612#comment-16071612 ] Jeff Knupp commented on ARROW-1167:
---
[~wesmckinn] I've attached a link to a bzip2-compressed version of the source file that reliably reproduces the issue. I tried to reproduce it with a subset of the file's data, but stepping in increments of 1,000,000 lines, I was only able to get it to crash at 18,000,000 out of ~18,800,000 lines, so I am just posting the original file in its entirety. Let me know if you have issues reproducing.

Link: [test_data.csv.bz2|https://www.dropbox.com/s/hguhamz0gdv2uzv/test_data.csv.bz2?dl=0]

MD5 for the uncompressed file:
MD5 (./test_data.csv) = 9f92942dab60d1fde04773d57759fce2

> Writing pyarrow Table to Parquet core dumps
> -------------------------------------------
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Jeff Knupp
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in a ~5GB CSV file) to a parquet file, the interpreter cores with the following stack trace from gdb:
> {code}
> #0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:181
> #1  0x7fbaa5c779f1 in parquet::InMemoryOutputStream::Write(unsigned char const*, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #2  0x7fbaa5c0ce97 in parquet::PlainEncoder > >::Put(parquet::ByteArray const*, int) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #3  0x7fbaa5c18855 in parquet::TypedColumnWriter > >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #4  0x7fbaa5c189d5 in parquet::TypedColumnWriter > >::WriteBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #5  0x7fbaa5be0900 in arrow::Status parquet::arrow::FileWriter::Impl::TypedWriteBatch, arrow::BinaryType>(parquet::ColumnWriter*, std::shared_ptr const&, long, short const*, short const*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #6  0x7fbaa5be171d in parquet::arrow::FileWriter::Impl::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #7  0x7fbaa5be1dad in parquet::arrow::FileWriter::WriteColumnChunk(arrow::Array const&) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #8  0x7fbaa5be2047 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #9  0x7fbaa51e1f53 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/_parquet.cpython-35m-x86_64-linux-gnu.so
> #10 0x004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
> #11 0x00529885 in do_call (nk=, na=, pp_stack=0x7ffe6510a6c0, func=) at ../Python/ceval.c:4933
> #12 call_function (oparg=, pp_stack=0x7ffe6510a6c0) at ../Python/ceval.c:4732
> #13 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #14 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #15 0x00528eee in fast_function (nk=, na=, n=, pp_stack=0x7ffe6510a8d0, func=) at ../Python/ceval.c:4813
> #16 call_function (oparg=, pp_stack=0x7ffe6510a8d0) at ../Python/ceval.c:4730
> #17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #18 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #19 0x00528eee in fast_function (nk=, na=, n=, pp_stack=0x7ffe6510aae0, func=) at ../Python/ceval.c:4813
> #20 call_function (oparg=, pp_stack=0x7ffe6510aae0) at ../Python/ceval.c:4730
> #21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #22 0x00528814 in fast_function (nk=, na=, n=, pp_stack=0x7ffe6510ac10, func=) at ../Python/ceval.c:4803
> #23 call_function (oparg=, pp_stack=0x7ffe6510ac10) at ../Python/ceval.c:4730
> #24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #25 0x00528814 in fast_function (nk=, na=, n=, pp_stack=0x7ffe6510ad40, func=) at ../Python/ceval.c:4803
> #26 call_function (oparg=, pp_stack=0x7ffe6510ad40) at ../Python/ceval.c:4730
> #27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #28 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
> #29 0x0052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
> #30 PyEval_EvalCode (co=, globals=, locals=) at ../Python/ceval.c:777
> #31 0x005fd2c2 in run_mod () at ../Python/pythonrun.c:976
> #32 0x005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
> #33 0x005ff95c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
> #34 0x0063e7d6 in run_file (p_cf=0x7ffe6510afb0, filename=0x2161260 L"scripts/parquet_export.py", fp=0x226fde0) at ../Modules/main.c:318
> #35 Py_Main () at ../Modules/main.c:768
> {code}
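The MD5 digests posted in this thread are for multi-gigabyte uncompressed CSVs, so it is worth hashing the file in a streaming fashion rather than reading it into memory at once. A minimal sketch for checking the posted digest (the helper name is mine, not from the thread):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB blocks so a multi-GB CSV
    never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

After decompressing the download, `md5_of_file('test_data.csv')` should match the digest posted above.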
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071476#comment-16071476 ] Jeff Knupp commented on ARROW-1167:
---
Ah, it may be that the file isn't hitting whatever limit/line was causing the error, since I was actually building the file to aid in recreating https://github.com/pandas-dev/pandas/issues/16798. I'll post a link to the smallest file I can create that reproduces the error (though it may require most of the 3+ GB file).
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070891#comment-16070891 ] Wes McKinney commented on ARROW-1167:
---
With that data file I'm running the following code with master branches in a debug build and no core dump:
{code}
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

DATA_PATH = os.path.expanduser('~/Downloads/test_data.csv.bz2')

dtypes = {'_id': str, 'approved_budget': str, 'business_category': str,
          'calendar_type': str, 'classification': str,
          'client_agency_org': str, 'client_agency_org_id': str,
          'closing_date': str, 'collection_contact': str,
          'collection_point': str, 'contact_person': str,
          'contact_person_address': str, 'contract_duration': str,
          'created_by': str, 'creation_date': str, 'date_available': str,
          'description': str, 'funding_instrument': str,
          'funding_source': str, 'modified_date': str, 'notice_type': str,
          'org_id': str, 'other_info': str, 'pre_bid_date': str,
          'pre_bid_venue': str, 'procurement_mode': str,
          'procuring_entity_org': str, 'procuring_entity_org_id': str,
          'publish_date': str, 'reason': str, 'ref_id': str, 'ref_no': str,
          'serialid': str, 'solicitation_no': str,
          'special_instruction': str, 'stage': str, 'stage2_ref_id': str,
          'tender_status': str, 'tender_title': str,
          'trade_agreement': str}

df = pd.read_csv(DATA_PATH, dtype=dtypes)
table = pa.Table.from_pandas(df)
pq.write_table(table, 'test.parquet')
{code}
I'm using pandas 0.18.1; I will try the latest pandas version / release builds / pyarrow 0.4.1 later.
Let me know if I should use different code.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070576#comment-16070576 ] Jeff Knupp commented on ARROW-1167:
---
Smallest amount I could get it to core reliably on is 500,000 lines (1 GB uncompressed). Here is a link to a bzip2-compressed version: https://www.dropbox.com/s/hguhamz0gdv2uzv/test_data.csv.bz2?dl=0
MD5 of the uncompressed file is listed below:
MD5 (test_data.csv) = 9a66139195677008b4fcb56468e19234
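The crash appearing only once the file passes a size threshold is consistent with the diagnosis elsewhere in this issue: the overflow checks added in ARROW-1177 and the suggestion to chunk a binary column in Table.from_pandas when it gets too large, since a single binary chunk addresses its values with 32-bit offsets. A pure-Python sketch of that chunking idea (the function and constant names are mine; this is not Arrow's actual implementation):

```python
# Hypothetical illustration of the chunking idea: split a sequence of byte
# strings into chunks whose total payload stays within what a signed 32-bit
# offset can address. Not Arrow's real code path.
MAX_CHUNK_BYTES = 2**31 - 1

def chunk_binary_values(values, max_bytes=MAX_CHUNK_BYTES):
    chunks, current, current_size = [], [], 0
    for value in values:
        # Start a new chunk when adding this value would exceed the cap.
        if current and current_size + len(value) > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(value)
        current_size += len(value)
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk could then be written as its own column chunk instead of one oversized buffer.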
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068952#comment-16068952 ] Jeff Knupp commented on ARROW-1167:
---
I can't upload the whole thing (it's > 5GB) but I can certainly upload a portion of it. Let me grab it and see if I can reproduce on a small portion of the file.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068944#comment-16068944 ] Phillip Cloud commented on ARROW-1167:
---
[~jeffknupp] Can you upload all or part of that CSV file?
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068821#comment-16068821 ] Wes McKinney commented on ARROW-1167:
---
There's another bug here ("mixed" is not a valid pandas_type in the metadata), reported in ARROW-1168.
[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068742#comment-16068742 ] Jeff Knupp commented on ARROW-1167:
---
Yeah, I thought the same thing re: https://github.com/apache/parquet-cpp/pull/195. It's "odd" because there shouldn't be/aren't any bytes objects. It's a plain-text CSV and I'd be shocked if there were even any non-ASCII values in there. This is from the stock pyarrow 0.4.1 install from PyPI. I can do a debug build of the latest version and report what I find.
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068718#comment-16068718 ] Wes McKinney commented on ARROW-1167: - It seems like this could be a manifestation of https://github.com/apache/parquet-cpp/pull/195; if so, one of the DCHECKs will get triggered in a debug build, and then we can try to figure out the underlying cause.
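For context on why a debug-build DCHECK might fire here: the overflow checks later added in ARROW-1177 suggest the crash occurs when the concatenated binary data in a single column chunk exceeds what a signed 32-bit value offset can address. The numbers below are purely illustrative (not taken from the reporter's file), but a back-of-the-envelope check shows how a ~5GB CSV can cross that line:

```python
# Arrow's BinaryArray stores value offsets as signed 32-bit integers.
# If the concatenated string data in one column chunk exceeds
# 2**31 - 1 bytes, the offsets can wrap, producing the kind of
# out-of-bounds memmove seen in the backtrace.
INT32_MAX = 2**31 - 1

avg_value_len = 100        # hypothetical: ~100 bytes per cell
n_rows = 25_000_000        # hypothetical: rows in one large string column
total_bytes = avg_value_len * n_rows

print(total_bytes)                 # 2500000000
print(total_bytes > INT32_MAX)     # True -> offsets cannot address this
```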
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068674#comment-16068674 ] Wes McKinney commented on ARROW-1167: - Also, could you clarify how the schema is "odd"? It looks like some columns have both unicode and bytes objects.
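One quick way to check for the mixed unicode/bytes columns mentioned above is to scan the raw records before handing them to pyarrow. The helper below is hypothetical (it is not part of pyarrow); a column mixing str and bytes is what would push schema inference toward a binary type instead of a string type:

```python
from collections import defaultdict

def mixed_type_columns(records):
    """Return names of columns whose non-null values mix str and bytes."""
    seen = defaultdict(set)
    for record in records:
        for col, val in record.items():
            if val is not None:
                seen[col].add(type(val))
    return sorted(col for col, types in seen.items() if {str, bytes} <= types)

rows = [
    {"a": "x", "b": b"raw"},   # column "b" holds bytes here...
    {"a": "y", "b": "text"},   # ...and str here -> mixed
]
print(mixed_type_columns(rows))  # ['b']
```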
[ https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068672#comment-16068672 ] Wes McKinney commented on ARROW-1167: - What version of the software is this? Can you see if you can reproduce the failure with a debug build? A backtrace with debug symbols enabled would be helpful. If there's any way one of us can repro the issue ourselves, that would be very helpful.
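Wes McKinney's later comment on this issue suggests resolving the crash by chunking the binary column when it gets too large in Table.from_pandas. Until that lands, a caller-side workaround is to split the rows into slices small enough that each chunk's binary data stays under the int32 offset limit and write each slice as its own row group. The helper and chunk size below are hypothetical, a sketch rather than a pyarrow API:

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row slices covering n_rows in chunk_size steps."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)

# With pyarrow one might then convert each df[start:stop] slice via
# pa.Table.from_pandas(...) and write the slices sequentially, so no
# single column chunk accumulates more than ~2 GiB of binary data.
print(list(chunk_bounds(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```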