[jira] [Commented] (ARROW-1169) C++: jemalloc externalproject doesn't build with CMake's ninja generator

2017-06-29 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069009#comment-16069009
 ] 

Uwe L. Korn commented on ARROW-1169:


PR: https://github.com/apache/arrow/pull/796

> C++: jemalloc externalproject doesn't build with CMake's ninja generator
> 
>
> Key: ARROW-1169
> URL: https://issues.apache.org/jira/browse/ARROW-1169
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.4.1
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
> Fix For: 0.5.0
>
>
> To build {{jemalloc}} we currently use {{$CMAKE_MAKE_COMMAND}}, which is the 
> command CMake generates makefiles for. But jemalloc is set up to always use 
> configure & make, so this fails with the Ninja generator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1169) C++: jemalloc externalproject doesn't build with CMake's ninja generator

2017-06-29 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1169:
--

 Summary: C++: jemalloc externalproject doesn't build with CMake's 
ninja generator
 Key: ARROW-1169
 URL: https://issues.apache.org/jira/browse/ARROW-1169
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.4.1
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.5.0


To build {{jemalloc}} we currently use {{$CMAKE_MAKE_COMMAND}}, which is the 
command CMake generates makefiles for. But jemalloc is set up to always use 
configure & make, so this fails with the Ninja generator.
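A minimal sketch of the kind of fix involved (illustrative only; the actual change is in the PR linked in the comment above, and `jemalloc_ep` / `JEMALLOC_SOURCE_URL` are assumed names): tell {{ExternalProject_Add}} to drive jemalloc with its own configure/make steps instead of {{$CMAKE_MAKE_COMMAND}}, which resolves to "ninja" under the Ninja generator.

```cmake
# Illustrative sketch only; the real fix is in the Arrow PR referenced above.
# jemalloc always builds with configure + make, so spell that out explicitly
# instead of reusing ${CMAKE_MAKE_COMMAND} (which is "ninja" under Ninja).
ExternalProject_Add(jemalloc_ep
  URL ${JEMALLOC_SOURCE_URL}            # assumed variable name
  CONFIGURE_COMMAND ./configure "--prefix=<INSTALL_DIR>"
  BUILD_COMMAND make                    # plain make, not ${CMAKE_MAKE_COMMAND}
  INSTALL_COMMAND make install
  BUILD_IN_SOURCE 1)
```

With explicit commands the generator choice no longer leaks into the external project's build.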





[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Jeff Knupp (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068952#comment-16068952
 ] 

Jeff Knupp commented on ARROW-1167:
---

I can't upload the whole thing (it's > 5GB) but I can certainly upload a
portion of it. Let me grab it and see if I can reproduce on a small portion
of the file.

On Thu, Jun 29, 2017 at 4:36 PM, Phillip Cloud (JIRA) wrote:


> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in 
> a ~5GB CSV file) to a parquet file, the interpreter dumps core with the 
> following stack trace from gdb:
> {code}
> #0  __memmove_avx_unaligned () at 
> ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:181
> #1  0x7fbaa5c779f1 in parquet::InMemoryOutputStream::Write(unsigned char 
> const*, long) () from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #2  0x7fbaa5c0ce97 in 
> parquet::PlainEncoder 
> >::Put(parquet::ByteArray const*, int) ()
>from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #3  0x7fbaa5c18855 in 
> parquet::TypedColumnWriter 
> >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray 
> const*) ()
>from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #4  0x7fbaa5c189d5 in 
> parquet::TypedColumnWriter 
> >::WriteBatch(long, short const*, short const*, parquet::ByteArray const*) ()
>from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #5  0x7fbaa5be0900 in arrow::Status 
> parquet::arrow::FileWriter::Impl::TypedWriteBatch,
>  arrow::BinaryType>(parquet::ColumnWriter*, std::shared_ptr 
> const&, long, short const*, short const*) () from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #6  0x7fbaa5be171d in 
> parquet::arrow::FileWriter::Impl::WriteColumnChunk(arrow::Array const&) () 
> from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #7  0x7fbaa5be1dad in 
> parquet::arrow::FileWriter::WriteColumnChunk(arrow::Array const&) () from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #8  0x7fbaa5be2047 in parquet::arrow::FileWriter::WriteTable(arrow::Table 
> const&, long) () from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
> #9  0x7fbaa51e1f53 in 
> __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, 
> _object*) ()
>from 
> /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/_parquet.cpython-35m-x86_64-linux-gnu.so
> #10 0x004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
> #11 0x00529885 in do_call (nk=<optimized out>, na=<optimized out>, 
> pp_stack=0x7ffe6510a6c0, func=<optimized out>) at ../Python/ceval.c:4933
> #12 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a6c0) at 
> ../Python/ceval.c:4732
> #13 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #14 0x0052d2e3 in _PyEval_EvalCodeWithName () at 
> ../Python/ceval.c:4018
> #15 0x00528eee in fast_function (nk=<optimized out>, na=<optimized 
> out>, n=<optimized out>, pp_stack=0x7ffe6510a8d0, func=<optimized out>) at 
> ../Python/ceval.c:4813
> #16 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a8d0) at 
> ../Python/ceval.c:4730
> #17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #18 0x0052d2e3 in _PyEval_EvalCodeWithName () at 
> ../Python/ceval.c:4018
> #19 0x00528eee in fast_function (nk=<optimized out>, na=<optimized 
> out>, n=<optimized out>, pp_stack=0x7ffe6510aae0, func=<optimized out>) at 
> ../Python/ceval.c:4813
> #20 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510aae0) at 
> ../Python/ceval.c:4730
> #21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #22 0x00528814 in fast_function (nk=<optimized out>, na=<optimized 
> out>, n=<optimized out>, pp_stack=0x7ffe6510ac10, func=<optimized out>) at 
> ../Python/ceval.c:4803
> #23 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ac10) at 
> ../Python/ceval.c:4730
> #24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #25 0x00528814 in fast_function (nk=<optimized out>, na=<optimized 
> out>, n=<optimized out>, pp_stack=0x7ffe6510ad40, func=<optimized out>) at 
> ../Python/ceval.c:4803
> #26 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ad40) at 
> ../Python/ceval.c:4730
> #27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
> #28 0x0052d2e3 in _PyEval_EvalCodeWithName () at 
> ../Python/ceval.c:4018
> #29 0x0052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
> #30 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, 
> locals=<optimized out>) at ../Python/ceval.c:777
> #31 0x005fd2c2 in run_mod () at ../Python/pythonrun.c:976
> #32 0x005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
> #33 0x005ff95c in PyRun_SimpleFileExFlags () at 
> ../Python/pythonrun.c:396
> #34 0x0063e7d6 in run_file (p_cf=0x7ffe6510afb0, filename=0x2161260 
> L"scripts/parquet_export.py", fp=0x226fde0) at ../Modules/main.c:318
> #35 Py_Main () at ../Modules/main.c:768
> #36 0x004cf

[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068944#comment-16068944
 ] 

Phillip Cloud commented on ARROW-1167:
--

[~jeffknupp] Can you upload all or part of that CSV file?

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in 
> a ~5GB CSV file) to a parquet file, the interpreter dumps core; the full gdb 
> stack trace is quoted in the issue description.

[jira] [Assigned] (ARROW-1168) [Python] pandas metadata may contain "mixed" data types

2017-06-29 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud reassigned ARROW-1168:


Assignee: Phillip Cloud

> [Python] pandas metadata may contain "mixed" data types
> ---
>
> Key: ARROW-1168
> URL: https://issues.apache.org/jira/browse/ARROW-1168
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Phillip Cloud
> Fix For: 0.5.0
>
>
> cc [~cpcloud] -- see schema in ARROW-1167. If infer_dtype returns "mixed" 
> then it will pass through 
> https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L42





[jira] [Resolved] (ARROW-1164) C++: Templated functions need ARROW_EXPORT instead of ARROW_TEMPLATE_EXPORT

2017-06-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-1164.

Resolution: Won't Fix

Solved instead by https://github.com/apache/arrow/pull/794

> C++: Templated functions need ARROW_EXPORT instead of ARROW_TEMPLATE_EXPORT
> ---
>
> Key: ARROW-1164
> URL: https://issues.apache.org/jira/browse/ARROW-1164
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
> Fix For: 0.5.0
>
>
> This only breaks on older Unix linkers. Weird things happening there.





[jira] [Commented] (ARROW-1166) Errors in Struct type's example and missing reference in Layout.md

2017-06-29 Thread Fang Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068903#comment-16068903
 ] 

Fang Zheng commented on ARROW-1166:
---

I see. Thanks for the clarification.

> Errors in Struct type's example and missing reference in Layout.md
> --
>
> Key: ARROW-1166
> URL: https://issues.apache.org/jira/browse/ARROW-1166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.4.1
>Reporter: Fang Zheng
>Assignee: Fang Zheng
>Priority: Trivial
>  Labels: documentation, easyfix
> Fix For: 0.5.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> Hi,
> There seem to be several minor issues in Layout.md: 
> 1. In the example for Struct type 
> (https://github.com/apache/arrow/blob/master/format/Layout.md#struct-type), 
> the second array element is "{null, 2}" in the text, but in the Arrow 
> representation, the value buffer for field 0 has "bob" in the corresponding 
> slot. Either the text should be changed from "null" to "bob", or the Null 
> count, Null bitmap buffer, and Offsets buffer should be changed to count the 
> second element in field 0 as null. 
> 2. In the same Struct type example, the third element in the Struct array is 
> "null" in text, but the value buffer for field 1 has 3 in the corresponding 
> slot in the Arrow representation. The Null count, Null bitmap buffer, and 
> Value buffer should be changed to count the third element in field 1 as null.
> 3. In the Dictionary encoding section 
> (https://github.com/apache/arrow/blob/master/format/Layout.md#dictionary-encoding),
>  there is a missing reference to "Message.fbs". 





[jira] [Commented] (ARROW-1166) Errors in Struct type's example and missing reference in Layout.md

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068897#comment-16068897
 ] 

Wes McKinney commented on ARROW-1166:
-

As I commented in the PR, regarding item 2: it is acceptable for the child 
array to have null count 0 even though the parent struct has a null in that 
slot. That said, I agree your changes make the example clearer (as though we 
were constructing the struct from the input data).
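This point can be illustrated with a toy model of the Arrow struct layout (plain Python, not the pyarrow API; the names and data below are illustrative): the struct's own validity bitmap alone decides whether slot i is null, so a child array may carry an arbitrary ignored value in that slot while keeping its own null count at 0.

```python
# Toy model of Arrow struct validity (illustrative only, not the pyarrow API).
def bit(bitmap: bytes, i: int) -> int:
    """Read validity bit i, LSB-first within each byte, as in the Arrow layout."""
    return (bitmap[i // 8] >> (i % 8)) & 1

def struct_slot(parent_bitmap, children, i):
    """Resolve slot i of a struct array from its parent bitmap and child values."""
    if not bit(parent_bitmap, i):
        return None  # parent-level null masks whatever the children hold
    return tuple(child[i] for child in children)

# Four slots; slot 2 is null only at the struct level (bitmap 0b1011).
names = ["joe", "bob", "unused", "mark"]   # child field 0
ages = [1, 2, 3, 4]                        # child field 1: slot 2 still holds 3
parent = bytes([0b1011])
print([struct_slot(parent, (names, ages), i) for i in range(4)])
# -> [('joe', 1), ('bob', 2), None, ('mark', 4)]
```

Here the child {{ages}} array legitimately has null count 0 even though the struct's slot 2 is null, which is the behavior the comment above describes.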

> Errors in Struct type's example and missing reference in Layout.md
> --
>
> Key: ARROW-1166
> URL: https://issues.apache.org/jira/browse/ARROW-1166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.4.1
>Reporter: Fang Zheng
>Assignee: Fang Zheng
>Priority: Trivial
>  Labels: documentation, easyfix
> Fix For: 0.5.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>





[jira] [Assigned] (ARROW-1166) Errors in Struct type's example and missing reference in Layout.md

2017-06-29 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1166:
---

Assignee: Fang Zheng

> Errors in Struct type's example and missing reference in Layout.md
> --
>
> Key: ARROW-1166
> URL: https://issues.apache.org/jira/browse/ARROW-1166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.4.1
>Reporter: Fang Zheng
>Assignee: Fang Zheng
>Priority: Trivial
>  Labels: documentation, easyfix
> Fix For: 0.5.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>





[jira] [Resolved] (ARROW-1165) [C++] Refactor PythonDecimalToArrowDecimal to not use templates

2017-06-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-1165.

Resolution: Fixed

Issue resolved by pull request 794
[https://github.com/apache/arrow/pull/794]

> [C++] Refactor PythonDecimalToArrowDecimal to not use templates
> ---
>
> Key: ARROW-1165
> URL: https://issues.apache.org/jira/browse/ARROW-1165
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.4.1
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.5.0
>
>






[jira] [Resolved] (ARROW-1166) Errors in Struct type's example and missing reference in Layout.md

2017-06-29 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1166.
-
Resolution: Fixed

Issue resolved by pull request 795
[https://github.com/apache/arrow/pull/795]

> Errors in Struct type's example and missing reference in Layout.md
> --
>
> Key: ARROW-1166
> URL: https://issues.apache.org/jira/browse/ARROW-1166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.4.1
>Reporter: Fang Zheng
>Priority: Trivial
>  Labels: documentation, easyfix
> Fix For: 0.5.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>





[jira] [Resolved] (ARROW-1127) pyarrow 4.1 import failure on Travis

2017-06-29 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1127.
-
   Resolution: Fixed
 Assignee: Wes McKinney
Fix Version/s: 0.4.1

This should be good now; please re-open if the problem recurs.

> pyarrow 4.1 import failure on Travis
> 
>
> Key: ARROW-1127
> URL: https://issues.apache.org/jira/browse/ARROW-1127
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.4.1
>Reporter: Jeff Reback
>Assignee: Wes McKinney
> Fix For: 0.4.1
>
>
> Our last good pandas run was 3 days ago; failures started 15 hours ago 
> (https://travis-ci.org/pandas-dev/pandas/jobs/244286287). 
> Here pyarrow (0.4.1) is a dependency of feather (0.4.0).
> The failure appears *only* on Unix (it works OK on macOS).





[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068821#comment-16068821
 ] 

Wes McKinney commented on ARROW-1167:
-

There's another bug here ("mixed" is not a valid pandas_type in the metadata), 
reported in ARROW-1168

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in 
> a ~5GB CSV file) to a parquet file, the interpreter dumps core; the full gdb 
> stack trace is quoted in the issue description.

[jira] [Created] (ARROW-1168) [Python] pandas metadata may contain "mixed" data types

2017-06-29 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1168:
---

 Summary: [Python] pandas metadata may contain "mixed" data types
 Key: ARROW-1168
 URL: https://issues.apache.org/jira/browse/ARROW-1168
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.5.0


cc [~cpcloud] -- see schema in ARROW-1167. If infer_dtype returns "mixed" then 
it will pass through 
https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L42
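For context, a toy sketch (not pandas' real infer_dtype implementation) of why a column's inferred kind can come back as "mixed": a specific kind is reported only when every non-null element agrees, and "mixed" is exactly the fallback value that is not a valid pandas_type for the Arrow metadata.

```python
# Toy re-implementation of dtype inference (illustrative; pandas'
# pandas.api.types.infer_dtype handles many more kinds than this).
def infer_kind(values):
    kinds = {type(v).__name__ for v in values if v is not None}
    if kinds == {"int"}:
        return "integer"
    if kinds == {"str"}:
        return "string"
    return "mixed"  # heterogeneous column: not a valid pandas_type

print(infer_kind([1, 2, 3]))       # -> integer
print(infer_kind(["a", "b"]))      # -> string
print(infer_kind([1, "a", None]))  # -> mixed
```

A column like the last one would pass "mixed" straight through into the pandas metadata, which is the bug tracked here.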





[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Jeff Knupp (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068742#comment-16068742
 ] 

Jeff Knupp commented on ARROW-1167:
---

Yeah, I thought the same thing re: 
https://github.com/apache/parquet-cpp/pull/195. It's "odd" because there 
shouldn't be (and aren't) any bytes objects. It's a plain-text CSV, and I'd be 
shocked if there were even any non-ASCII values in there. This is from the 
stock pyarrow 0.4.1 install from PyPI. I can do a debug build of the latest 
version and report what I find.

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>
> When writing a pyarrow Table (instantiated from a Pandas dataframe reading in 
> a ~5GB CSV file) to a parquet file, the interpreter dumps core; the full gdb 
> stack trace is quoted in the issue description.

[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068718#comment-16068718
 ] 

Wes McKinney commented on ARROW-1167:
-

It seems like this could be a manifestation of 
https://github.com/apache/parquet-cpp/pull/195; if so, one of the DCHECKs 
will get triggered in a debug build, and then we can try to figure out the 
underlying cause.

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>

[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068674#comment-16068674
 ] 

Wes McKinney commented on ARROW-1167:
-

Also could you clarify how the schema is "odd"? It looks like some columns have 
both unicode and bytes objects. 

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>

[jira] [Commented] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068672#comment-16068672
 ] 

Wes McKinney commented on ARROW-1167:
-

What version of the software is this? Can you see if you can reproduce the 
failure with a debug build? A backtrace with debug symbols enabled would be 
helpful. If there's any way for one of us to reproduce the issue ourselves, 
that would be very helpful.

> Writing pyarrow Table to Parquet core dumps
> ---
>
> Key: ARROW-1167
> URL: https://issues.apache.org/jira/browse/ARROW-1167
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Knupp
>

[jira] [Updated] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Jeff Knupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Knupp updated ARROW-1167:
--
Description: 
When writing a pyarrow Table (instantiated from a Pandas dataframe reading in a 
~5GB CSV file) to a parquet file, the interpreter cores with the following 
stack trace from gdb:

{code}
#0  __memmove_avx_unaligned () at 
../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:181
#1  0x7fbaa5c779f1 in parquet::InMemoryOutputStream::Write(unsigned char 
const*, long) () from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#2  0x7fbaa5c0ce97 in 
parquet::PlainEncoder<parquet::ByteArrayType>::Put(parquet::ByteArray const*, int) ()
   from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#3  0x7fbaa5c18855 in 
parquet::TypedColumnWriter<parquet::ByteArrayType>::WriteMiniBatch(long, short 
const*, short const*, parquet::ByteArray const*) ()
   from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#4  0x7fbaa5c189d5 in 
parquet::TypedColumnWriter<parquet::ByteArrayType>::WriteBatch(long, short 
const*, short const*, parquet::ByteArray const*) ()
   from /home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#5  0x7fbaa5be0900 in arrow::Status 
parquet::arrow::FileWriter::Impl::TypedWriteBatch<parquet::ByteArrayType,
 arrow::BinaryType>(parquet::ColumnWriter*, std::shared_ptr<arrow::Array> 
const&, long, short const*, short const*) () from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#6  0x7fbaa5be171d in 
parquet::arrow::FileWriter::Impl::WriteColumnChunk(arrow::Array const&) () from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#7  0x7fbaa5be1dad in 
parquet::arrow::FileWriter::WriteColumnChunk(arrow::Array const&) () from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#8  0x7fbaa5be2047 in parquet::arrow::FileWriter::WriteTable(arrow::Table 
const&, long) () from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/libparquet.so.1
#9  0x7fbaa51e1f53 in 
__pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, 
_object*) ()
   from 
/home/ubuntu/.local/lib/python3.5/site-packages/pyarrow/_parquet.cpython-35m-x86_64-linux-gnu.so
#10 0x004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
#11 0x00529885 in do_call (nk=<optimized out>, na=<optimized out>, 
pp_stack=0x7ffe6510a6c0, func=<optimized out>) at ../Python/ceval.c:4933
#12 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a6c0) at 
../Python/ceval.c:4732
#13 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#14 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#15 0x00528eee in fast_function (nk=<optimized out>, na=<optimized out>, 
n=<optimized out>, pp_stack=0x7ffe6510a8d0, func=<optimized out>) at 
../Python/ceval.c:4813
#16 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510a8d0) at 
../Python/ceval.c:4730
#17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#18 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#19 0x00528eee in fast_function (nk=<optimized out>, na=<optimized out>, 
n=<optimized out>, pp_stack=0x7ffe6510aae0, func=<optimized out>) at 
../Python/ceval.c:4813
#20 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510aae0) at 
../Python/ceval.c:4730
#21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#22 0x00528814 in fast_function (nk=<optimized out>, na=<optimized out>, 
n=<optimized out>, pp_stack=0x7ffe6510ac10, func=<optimized out>) at 
../Python/ceval.c:4803
#23 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ac10) at 
../Python/ceval.c:4730
#24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#25 0x00528814 in fast_function (nk=<optimized out>, na=<optimized out>, 
n=<optimized out>, pp_stack=0x7ffe6510ad40, func=<optimized out>) at 
../Python/ceval.c:4803
#26 call_function (oparg=<optimized out>, pp_stack=0x7ffe6510ad40) at 
../Python/ceval.c:4730
#27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#28 0x0052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#29 0x0052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#30 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, 
locals=<optimized out>) at ../Python/ceval.c:777
#31 0x005fd2c2 in run_mod () at ../Python/pythonrun.c:976
#32 0x005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#33 0x005ff95c in PyRun_SimpleFileExFlags () at 
../Python/pythonrun.c:396
#34 0x0063e7d6 in run_file (p_cf=0x7ffe6510afb0, filename=0x2161260 
L"scripts/parquet_export.py", fp=0x226fde0) at ../Modules/main.c:318
#35 Py_Main () at ../Modules/main.c:768
#36 0x004cfe41 in main () at ../Programs/python.c:65
#37 0x7fbadf0db830 in __libc_start_main (main=0x4cfd60 <main>, argc=2, 
argv=0x7ffe6510b1c8, init=<optimized out>, fini=<optimized out>, 
rtld_fini=<optimized out>, stack_end=0x7ffe6510b1b8)
    at ../csu/libc-start.c:291
#38 0x005d5f29 in _start ()
{code}

This is occurring in a pretty vanilla call to `pq.write_table(table, output)`. 
Before the crash, I'm able to print out the table's schema, and it looks a 
little odd (all columns are explicitly specified in {{pandas.read_csv()}} to be 
strings)...

{code}
_id: string
ref_id: string
ref_no: string
stage: string
stage2_ref_id: string
org_id: s

[jira] [Updated] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Jeff Knupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Knupp updated ARROW-1167:
--
Description: 
When writing a pyarrow Table (instantiated from a Pandas dataframe reading in a 
~5GB CSV file) to a parquet file, the interpreter cores with the following 
stack trace from gdb:


This is occurring in a pretty vanilla call to `pq.write_table(table, output)`. 
Before the crash, I'm able to print out the table's schema, and it looks a 
little odd (all columns are explicitly specified in {{pandas.read_csv()}} to be 
strings)...

{code}
_id: string
ref_id: string
ref_no: string
stage: string
stage2_ref_id: string
org_id: s

[jira] [Created] (ARROW-1167) Writing pyarrow Table to Parquet core dumps

2017-06-29 Thread Jeff Knupp (JIRA)
Jeff Knupp created ARROW-1167:
-

 Summary: Writing pyarrow Table to Parquet core dumps
 Key: ARROW-1167
 URL: https://issues.apache.org/jira/browse/ARROW-1167
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jeff Knupp


When writing a pyarrow Table (instantiated from a Pandas dataframe reading in a 
~5GB CSV file) to a parquet file, the interpreter cores with the following 
stack trace from gdb:


This is occurring in a pretty vanilla call to `pq.write_table(table, output)`. 
Before the crash, I'm able to print out the table's schema and it looks a 
little odd (all columns are explicitly specified in {{pandas.read_csv

[jira] [Updated] (ARROW-1165) [C++] Refactor PythonDecimalToArrowDecimal to not use templates

2017-06-29 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud updated ARROW-1165:
-
Summary: [C++] Refactor PythonDecimalToArrowDecimal to not use templates  
(was: Refactor PythonDecimalToArrowDecimal to not use templates)

> [C++] Refactor PythonDecimalToArrowDecimal to not use templates
> ---
>
> Key: ARROW-1165
> URL: https://issues.apache.org/jira/browse/ARROW-1165
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.4.1
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1166) Errors in Struct type's example and missing reference in Layout.md

2017-06-29 Thread Fang Zheng (JIRA)
Fang Zheng created ARROW-1166:
-

 Summary: Errors in Struct type's example and missing reference in 
Layout.md
 Key: ARROW-1166
 URL: https://issues.apache.org/jira/browse/ARROW-1166
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.4.1
Reporter: Fang Zheng
Priority: Trivial
 Fix For: 0.5.0


Hi,
There seem to be several minor issues in Layout.md: 

1. In the example for Struct type 
(https://github.com/apache/arrow/blob/master/format/Layout.md#struct-type), the 
second array element is "{null, 2}" in the text, but in the Arrow 
representation, the value buffer for field 0 has "bob" in the corresponding 
slot. Either the text should be changed from "null" to "bob", or the Null 
count, Null bitmap buffer, and Offsets buffer should be changed to count the 
second element in field 0 as null. 

2. In the same Struct type example, the third element in the Struct array is 
"null" in text, but the value buffer for field 1 has 3 in the corresponding 
slot in the Arrow representation. The Null count, Null bitmap buffer, and Value 
buffer should be changed to count the third element in field 1 as null.

3. In the Dictionary encoding section 
(https://github.com/apache/arrow/blob/master/format/Layout.md#dictionary-encoding),
 there is a missing reference to "Message.fbs". 
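The first two points above rest on the invariant that a slot is null exactly when its bit in the null bitmap is 0, and that the null count equals the number of zero bits over the array's length. A plain-Java sketch of that invariant (illustrative only, not Arrow library code; the sample validity values are hypothetical, not the exact buffers from Layout.md):

```java
public class NullBitmapSketch {
    // Build a least-significant-bit-numbered validity bitmap from a
    // per-slot validity array, as described in Layout.md.
    static byte[] buildBitmap(boolean[] valid) {
        byte[] bitmap = new byte[(valid.length + 7) / 8];
        for (int i = 0; i < valid.length; i++) {
            if (valid[i]) {
                bitmap[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return bitmap;
    }

    // The null count must equal the number of invalid (zero-bit) slots.
    static int nullCount(boolean[] valid) {
        int nulls = 0;
        for (boolean v : valid) {
            if (!v) nulls++;
        }
        return nulls;
    }

    public static void main(String[] args) {
        // Hypothetical child field where slots 1 and 2 are null.
        boolean[] validity = {true, false, false, true};
        System.out.println(nullCount(validity));      // prints 2
        System.out.println(buildBitmap(validity)[0]); // prints 9 (0b00001001)
    }
}
```

Whichever way the example is corrected, the text, the null count, and the bitmap must stay consistent with each other in this way.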




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1165) Refactor PythonDecimalToArrowDecimal to not use templates

2017-06-29 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-1165:


 Summary: Refactor PythonDecimalToArrowDecimal to not use templates
 Key: ARROW-1165
 URL: https://issues.apache.org/jira/browse/ARROW-1165
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.4.1
Reporter: Phillip Cloud
Assignee: Phillip Cloud
 Fix For: 0.5.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1162) Transfer Between Empty Lists Should Not Invoke Callback

2017-06-29 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned ARROW-1162:
---

Assignee: Deneche A. Hakim

> Transfer Between Empty Lists Should Not Invoke Callback
> ---
>
> Key: ARROW-1162
> URL: https://issues.apache.org/jira/browse/ARROW-1162
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Affects Versions: 0.4.1
>Reporter: Sudheesh Katkam
>Assignee: Deneche A. Hakim
> Fix For: 0.5.0
>
>
> Here's the test that fails:
> {code}
> public class TestBufferOwnershipTransfer {
>   // ...
>   private static class Pointer<T> {
>     T value;
>   }
>   private static CallBack newTriggerCallback(final Pointer<Boolean> trigger) {
>     trigger.value = false;
>     return new CallBack() {
>       @Override
>       public void doWork() {
>         trigger.value = true;
>       }
>     };
>   }
>   @Test
>   public void emptyListTransferShouldNotTriggerSchemaChange() {
>     final BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>     final Pointer<Boolean> trigger1 = new Pointer<>();
>     final Pointer<Boolean> trigger2 = new Pointer<>();
>     final ListVector v1 = new ListVector("v1", allocator,
>         FieldType.nullable(ArrowType.Null.INSTANCE),
>         newTriggerCallback(trigger1));
>     final ListVector v2 = new ListVector("v2", allocator,
>         FieldType.nullable(ArrowType.Null.INSTANCE),
>         newTriggerCallback(trigger2));
>     v1.makeTransferPair(v2).transfer();
>     assertFalse(trigger1.value);
>     assertFalse(trigger2.value); // fails
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1162) Transfer Between Empty Lists Should Not Invoke Callback

2017-06-29 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved ARROW-1162.
-
Resolution: Fixed

Issue resolved by pull request 791
[https://github.com/apache/arrow/pull/791]

> Transfer Between Empty Lists Should Not Invoke Callback
> ---
>
> Key: ARROW-1162
> URL: https://issues.apache.org/jira/browse/ARROW-1162
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Affects Versions: 0.4.1
>Reporter: Sudheesh Katkam
> Fix For: 0.5.0
>
>
> Here's the test that fails:
> {code}
> public class TestBufferOwnershipTransfer {
>   // ...
>   private static class Pointer<T> {
>     T value;
>   }
>   private static CallBack newTriggerCallback(final Pointer<Boolean> trigger) {
>     trigger.value = false;
>     return new CallBack() {
>       @Override
>       public void doWork() {
>         trigger.value = true;
>       }
>     };
>   }
>   @Test
>   public void emptyListTransferShouldNotTriggerSchemaChange() {
>     final BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>     final Pointer<Boolean> trigger1 = new Pointer<>();
>     final Pointer<Boolean> trigger2 = new Pointer<>();
>     final ListVector v1 = new ListVector("v1", allocator,
>         FieldType.nullable(ArrowType.Null.INSTANCE),
>         newTriggerCallback(trigger1));
>     final ListVector v2 = new ListVector("v2", allocator,
>         FieldType.nullable(ArrowType.Null.INSTANCE),
>         newTriggerCallback(trigger2));
>     v1.makeTransferPair(v2).transfer();
>     assertFalse(trigger1.value);
>     assertFalse(trigger2.value); // fails
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1163) [Plasma] Java client for Plasma

2017-06-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068415#comment-16068415
 ] 

Wes McKinney commented on ARROW-1163:
-

Along with the 0.5.0 release, we should publish a blog post that explains how the 
object store works and calls for contributions, which might help with recruiting 
some Java developers to get involved.

> [Plasma] Java client for Plasma
> ---
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Philipp Moritz
>
> We should start thinking about what a Java client for Plasma would look like. 
> Given Arrow's focus on supporting Python, C++, and Java really well, Java is 
> the next important target after Python and C++.
> My preliminary thoughts are the following: we can either go with JNI and wrap 
> the C++ client, or (in my opinion preferable) write a pure Java client that 
> communicates with the Plasma store via Java flatbuffers over sockets.
> It seems that the only thing blocking a pure Java client at the moment is the 
> way we ship file descriptors for the memory mapped files between store and 
> client (see the file fling.cc in the Plasma repo). We would need to get rid 
> of that, because there is no pure Java API for transferring file descriptors 
> across a process boundary. The way to transfer memory mapped files over 
> process boundaries is then probably to use the file system: keep the memory 
> mapped files in the file system instead of unlinking them immediately (as we 
> do at the moment), so they can be opened by the client process via their path.
> The challenge in this case is how to clean the files up and make sure they 
> are not lying around if the plasma store crashes. One option is to store the 
> plasma store PID with the file (i.e. as part of the file name) and let the 
> plasma store clean them up the next time it is started; maybe there is 
> OS-level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has 
> free cycles, they should feel free to chime in. Also opinions on the design 
> are appreciated!
> -- Philipp.
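The path-based approach proposed above, mapping the store's file by name instead of receiving a file descriptor, is indeed possible in pure Java via NIO. A minimal sketch (the file naming and the temp-file setup are hypothetical stand-ins, not Plasma's actual protocol):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PlasmaMmapSketch {
    // Map a store-created file into this process's address space by path.
    // Unlike fd passing over a Unix socket (fling.cc), this needs no native code.
    static MappedByteBuffer mapStoreFile(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Per the FileChannel javadoc, the mapping remains valid
            // after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file the plasma store would create; embedding the
        // store's PID in the name (as suggested above) would aid cleanup.
        Path path = Files.createTempFile("plasma-store-demo", ".mmap");
        Files.write(path, new byte[4096]);
        MappedByteBuffer data = mapStoreFile(path);
        System.out.println(data.capacity()); // prints 4096
        // Cleaning such files up after a store crash is the hard part
        // discussed in the message above.
    }
}
```

This only covers the mapping itself; the flatbuffer handshake that tells the client which path and offset to map would still have to be designed.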



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1164) C++: Templated functions need ARROW_EXPORT instead of ARROW_TEMPLATE_EXPORT

2017-06-29 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1164:
--

 Summary: C++: Templated functions need ARROW_EXPORT instead of 
ARROW_TEMPLATE_EXPORT
 Key: ARROW-1164
 URL: https://issues.apache.org/jira/browse/ARROW-1164
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.5.0


This only breaks on older Unix linkers; weird things happen there.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)