[jira] [Commented] (ARROW-3754) [Packaging] Zstd configure error on linux package builds

2018-11-11 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683287#comment-16683287
 ] 

Kouhei Sutou commented on ARROW-3754:
-

We need to add {{libzstd.so}} support to use {{libzstd-dev}}.

> [Packaging] Zstd configure error on linux package builds
> 
>
> Key: ARROW-3754
> URL: https://issues.apache.org/jira/browse/ARROW-3754
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.12.0
>
>
> Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759
> Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805
> Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811
> Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727
> Perhaps this commit is related: 
> https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f
> cc [~kou]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-3754) [Packaging] Zstd configure error on linux package builds

2018-11-11 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683287#comment-16683287
 ] 

Kouhei Sutou edited comment on ARROW-3754 at 11/12/18 6:33 AM:
---

We need to add {{libzstd.so}} to support {{libzstd-dev}}.


was (Author: kou):
We need to add {{libzstd.so}} support to use {{libzstd-dev}}.

> [Packaging] Zstd configure error on linux package builds
> 
>
> Key: ARROW-3754
> URL: https://issues.apache.org/jira/browse/ARROW-3754
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.12.0
>
>
> Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759
> Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805
> Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811
> Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727
> Perhaps this commit is related: 
> https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f
> cc [~kou]





[jira] [Commented] (ARROW-3754) [Packaging] Zstd configure error on linux package builds

2018-11-11 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683238#comment-16683238
 ] 

Kouhei Sutou commented on ARROW-3754:
-

CMake on Ubuntu Xenial is 3.5.1, but CMake 3.5.1 doesn't support the 
{{SOURCE_SUBDIR}} option of {{ExternalProject_Add}}.

BTW, we should use {{libzstd-dev}} instead of the vendored Zstandard.
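For illustration, a hedged CMake sketch (not the actual Arrow patch) of the version constraint described above: {{SOURCE_SUBDIR}} was only added to {{ExternalProject_Add}} in CMake 3.7, so a build that must also work with Xenial's CMake 3.5.1 has to guard on the version. The target name, URL, and version below are hypothetical examples.

```cmake
# Hypothetical sketch: SOURCE_SUBDIR requires CMake >= 3.7, so fall back
# to an explicit configure command on older CMake (e.g. 3.5.1 on Xenial).
include(ExternalProject)

if(CMAKE_VERSION VERSION_LESS "3.7")
  # No SOURCE_SUBDIR: point the configure step at the subdirectory by hand.
  ExternalProject_Add(zstd_ep
    URL "https://github.com/facebook/zstd/archive/v1.3.7.tar.gz"  # example version
    CONFIGURE_COMMAND ${CMAKE_COMMAND} <SOURCE_DIR>/build/cmake
    BUILD_COMMAND ${CMAKE_COMMAND} --build .)
else()
  ExternalProject_Add(zstd_ep
    URL "https://github.com/facebook/zstd/archive/v1.3.7.tar.gz"  # example version
    SOURCE_SUBDIR "build/cmake")  # zstd keeps its CMakeLists.txt here
endif()
```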

> [Packaging] Zstd configure error on linux package builds
> 
>
> Key: ARROW-3754
> URL: https://issues.apache.org/jira/browse/ARROW-3754
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.12.0
>
>
> Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759
> Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805
> Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811
> Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727
> Perhaps this commit is related: 
> https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f
> cc [~kou]





[jira] [Assigned] (ARROW-3754) [Packaging] Zstd configure error on linux package builds

2018-11-11 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3754:
---

Assignee: Kouhei Sutou

> [Packaging] Zstd configure error on linux package builds
> 
>
> Key: ARROW-3754
> URL: https://issues.apache.org/jira/browse/ARROW-3754
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.12.0
>
>
> Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759
> Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805
> Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811
> Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727
> Perhaps this commit is related: 
> https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f
> cc [~kou]





[jira] [Commented] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray

2018-11-11 Thread Jason Kiley (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683132#comment-16683132
 ] 

Jason Kiley commented on ARROW-3762:


No worries; I know there's a ton going on. When I ran into it again yesterday, 
I looked up the issues and noticed that there were some differences in what 
folks were reporting (e.g. categoricals), so I thought it might help to point 
out that I see it without them (and with a little context about when I do and 
what the data looks like). Sorry if I'm the motivation for your tweet, but my 
intent was to add information.

> [C++] Arrow table reads error when overflowing capacity of BinaryArray
> --
>
> Key: ARROW-3762
> URL: https://issues.apache.org/jira/browse/ARROW-3762
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Chris Ellison
>Priority: Major
> Fix For: 0.12.0
>
>
> When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError 
> due to it not creating chunked arrays. Reading each row group individually 
> and then concatenating the tables works, however.
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> x = pa.array(list('1' * 2**30))
> demo = 'demo.parquet'
> def scenario():
> t = pa.Table.from_arrays([x], ['x'])
> writer = pq.ParquetWriter(demo, t.schema)
> for i in range(2):
> writer.write_table(t)
> writer.close()
> pf = pq.ParquetFile(demo)
> # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot
> # contain more than 2147483646 bytes, have 2147483647
> t2 = pf.read()
> # Works, but note, there are 32 row groups, not 2 as suggested by:
> # https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
> tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)]
> t3 = pa.concat_tables(tables)
> scenario()
> {code}





[jira] [Assigned] (ARROW-3597) [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests

2018-11-11 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra reassigned ARROW-3597:
-

Assignee: Pindikura Ravindra

> [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests
> 
>
> Key: ARROW-3597
> URL: https://issues.apache.org/jira/browse/ARROW-3597
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>






[jira] [Created] (ARROW-3765) Gandiva segfault when using int64 recordbatch as its input

2018-11-11 Thread Siyuan Zhuang (JIRA)
Siyuan Zhuang created ARROW-3765:


 Summary: Gandiva segfault when using int64 recordbatch as its input
 Key: ARROW-3765
 URL: https://issues.apache.org/jira/browse/ARROW-3765
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Gandiva
Reporter: Siyuan Zhuang


This happens because the validity buffer can be `None`:
{code}
>>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
>>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
[None, <pyarrow.lib.Buffer object at 0x...>]
>>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
>>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
[<pyarrow.lib.Buffer object at 0x...>, <pyarrow.lib.Buffer object at 0x11a2b3228>]{code}
But Gandiva does not handle this case yet, so it dereferences a nullptr:
{code}
void Annotator::PrepareBuffersForField(const FieldDescriptor& desc,
                                       const arrow::ArrayData& array_data,
                                       EvalBatch* eval_batch) {
  int buffer_idx = 0;
  // TODO:
  // - validity is optional
  uint8_t* validity_buf =
      const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data());
  eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
  ++buffer_idx;
{code}
 

Code to reproduce:
{code:java}
df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10)))
table = pa.Table.from_pandas(df)
filt = ...  # Create any gandiva filter
r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool())  # segfault{code}
 Backtrace:
{code:java}
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0x10)
 * frame #0: 0x0001060184fc 
libarrow.12.dylib`arrow::Buffer::data(this=0x) const at 
buffer.h:162
 frame #1: 0x000106fbed78 
libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8,
 desc=0x00010101e138, array_data=0x00010061f8e8, 
eval_batch=0x000100796848) at annotator.cc:65
 frame #2: 0x000106fbf4ed 
libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8,
 record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94
 frame #3: 0x0001071449b7 
libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, 
record_batch=0x0001007a45b8, output_vector=size=1) at llvm_generator.cc:102
 frame #4: 0x000107059a4f 
libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, 
batch=0x0001007a45b8, 
out_selection=std::__1::shared_ptr::element_type @ 
0x0001007a43e8 strong=2 weak=1) at filter.cc:106
 frame #5: 0x00010948e002 
gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*,
 _object*, _object*) + 1986
 frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475
 frame #7: 0x0001001d28ca Python`call_function + 602
 frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
 frame #9: 0x0001001d3cf9 Python`fast_function + 569
 frame #10: 0x0001001d2899 Python`call_function + 553
 frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
 frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
 frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48
 frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174
 frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277
 frame #16: 0x00010021ef46 Python`Py_Main + 3558
 frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248
 frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code}
 





[jira] [Commented] (ARROW-3765) [Gandiva] Segfault when validity bitmap has not been allocated

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683047#comment-16683047
 ] 

Wes McKinney commented on ARROW-3765:
-

Yes, in a number of places we don't allocate a validity bitmap for data without 
nulls. For large datasets, it would be wasteful to have to create a bitmap with 
all 1's.
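The convention described above can be sketched in plain Python (a hypothetical helper, not Gandiva's actual code): an absent validity bitmap, represented as None, means every slot is valid, and Arrow validity bitmaps are bit-packed least-significant bit first.

```python
def is_valid(validity_buf, i):
    # Arrow convention: an absent (None) validity bitmap means the array
    # has no nulls, so every slot is valid.
    if validity_buf is None:
        return True
    # Validity bitmaps are bit-packed, least-significant bit first.
    byte_idx, bit_idx = divmod(i, 8)
    return (validity_buf[byte_idx] >> bit_idx) & 1 == 1

# No validity buffer: everything is valid.
assert is_valid(None, 7)
# Bitmap 0b00000101: slots 0 and 2 valid, slot 1 null.
assert is_valid(bytes([0b00000101]), 0)
assert not is_valid(bytes([0b00000101]), 1)
```

A consumer that checks for the None case before touching the buffer avoids exactly the nullptr dereference reported in this issue.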

> [Gandiva] Segfault when validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Priority: Major
>
> This happens because the validity buffer can be `None`:
> {code}
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [None, <pyarrow.lib.Buffer object at 0x...>]
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [<pyarrow.lib.Buffer object at 0x...>, <pyarrow.lib.Buffer object at 0x11a2b3228>]{code}
> But Gandiva does not handle this case yet, so it dereferences a nullptr:
> {code}
> void Annotator::PrepareBuffersForField(const FieldDescriptor& desc,
>                                        const arrow::ArrayData& array_data,
>                                        EvalBatch* eval_batch) {
>   int buffer_idx = 0;
>   // TODO:
>   // - validity is optional
>   uint8_t* validity_buf =
>       const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data());
>   eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
>   ++buffer_idx;
> {code}
>  
> Code to reproduce:
> {code:java}
> df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10)))
> table = pa.Table.from_pandas(df)
> filt = ...  # Create any gandiva filter
> r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool())  # segfault{code}
>  Backtrace:
> {code:java}
> * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>  * frame #0: 0x0001060184fc 
> libarrow.12.dylib`arrow::Buffer::data(this=0x) const at 
> buffer.h:162
>  frame #1: 0x000106fbed78 
> libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8,
>  desc=0x00010101e138, array_data=0x00010061f8e8, 
> eval_batch=0x000100796848) at annotator.cc:65
>  frame #2: 0x000106fbf4ed 
> libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8,
>  record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94
>  frame #3: 0x0001071449b7 
> libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, 
> record_batch=0x0001007a45b8, output_vector=size=1) at 
> llvm_generator.cc:102
>  frame #4: 0x000107059a4f 
> libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, 
> batch=0x0001007a45b8, 
> out_selection=std::__1::shared_ptr::element_type @ 
> 0x0001007a43e8 strong=2 weak=1) at filter.cc:106
>  frame #5: 0x00010948e002 
> gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*,
>  _object*, _object*) + 1986
>  frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475
>  frame #7: 0x0001001d28ca Python`call_function + 602
>  frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #9: 0x0001001d3cf9 Python`fast_function + 569
>  frame #10: 0x0001001d2899 Python`call_function + 553
>  frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
>  frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48
>  frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174
>  frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277
>  frame #16: 0x00010021ef46 Python`Py_Main + 3558
>  frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248
>  frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code}
>  





[jira] [Updated] (ARROW-3765) [Gandiva] Segfault when validity bitmap has not been allocated

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3765:

Summary: [Gandiva] Segfault when validity bitmap has not been allocated  
(was: Gandiva segfault when using int64 recordbatch as its input)

> [Gandiva] Segfault when validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Priority: Major
>
> This happens because the validity buffer can be `None`:
> {code}
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [None, <pyarrow.lib.Buffer object at 0x...>]
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [<pyarrow.lib.Buffer object at 0x...>, <pyarrow.lib.Buffer object at 0x11a2b3228>]{code}
> But Gandiva does not handle this case yet, so it dereferences a nullptr:
> {code}
> void Annotator::PrepareBuffersForField(const FieldDescriptor& desc,
>                                        const arrow::ArrayData& array_data,
>                                        EvalBatch* eval_batch) {
>   int buffer_idx = 0;
>   // TODO:
>   // - validity is optional
>   uint8_t* validity_buf =
>       const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data());
>   eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
>   ++buffer_idx;
> {code}
>  
> Code to reproduce:
> {code:java}
> df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10)))
> table = pa.Table.from_pandas(df)
> filt = ...  # Create any gandiva filter
> r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool())  # segfault{code}
>  Backtrace:
> {code:java}
> * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>  * frame #0: 0x0001060184fc 
> libarrow.12.dylib`arrow::Buffer::data(this=0x) const at 
> buffer.h:162
>  frame #1: 0x000106fbed78 
> libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8,
>  desc=0x00010101e138, array_data=0x00010061f8e8, 
> eval_batch=0x000100796848) at annotator.cc:65
>  frame #2: 0x000106fbf4ed 
> libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8,
>  record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94
>  frame #3: 0x0001071449b7 
> libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, 
> record_batch=0x0001007a45b8, output_vector=size=1) at 
> llvm_generator.cc:102
>  frame #4: 0x000107059a4f 
> libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, 
> batch=0x0001007a45b8, 
> out_selection=std::__1::shared_ptr::element_type @ 
> 0x0001007a43e8 strong=2 weak=1) at filter.cc:106
>  frame #5: 0x00010948e002 
> gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*,
>  _object*, _object*) + 1986
>  frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475
>  frame #7: 0x0001001d28ca Python`call_function + 602
>  frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #9: 0x0001001d3cf9 Python`fast_function + 569
>  frame #10: 0x0001001d2899 Python`call_function + 553
>  frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
>  frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48
>  frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174
>  frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277
>  frame #16: 0x00010021ef46 Python`Py_Main + 3558
>  frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248
>  frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code}
>  





[jira] [Created] (ARROW-3764) [C++] Port Python "ParquetDataset" business logic to C++

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3764:
---

 Summary: [C++] Port Python "ParquetDataset" business logic to C++
 Key: ARROW-3764
 URL: https://issues.apache.org/jira/browse/ARROW-3764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


Along with defining appropriate abstractions for dealing with generic 
filesystems in C++, we should implement the machinery for reading multiple 
Parquet files in C++ so that it can be reused from GLib, R, and Ruby. Otherwise 
these languages will have to reimplement the same logic, which would surely 
result in inconsistent features and bugs in some implementations but not others.





[jira] [Commented] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683036#comment-16683036
 ] 

Wes McKinney commented on ARROW-3763:
-

I moved this JIRA here from Apache Parquet as it's more Arrow-related

> [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly 
> into arrow::BinaryBuilder
> ---
>
> Key: ARROW-3763
> URL: https://issues.apache.org/jira/browse/ARROW-3763
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> As a follow up to PARQUET-820. This may yield some performance benefits. 





[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3763:

Fix Version/s: 0.13.0

> [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly 
> into arrow::BinaryBuilder
> ---
>
> Key: ARROW-3763
> URL: https://issues.apache.org/jira/browse/ARROW-3763
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> As a follow up to PARQUET-820. This may yield some performance benefits. 





[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3763:

Summary: [C++] Write Parquet ByteArray / FixedLenByteArray reader batches 
directly into arrow::BinaryBuilder  (was: [C++] Write ByteArray / 
FixedLenByteArray reader batches directly into arrow::BinaryBuilder)

> [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly 
> into arrow::BinaryBuilder
> ---
>
> Key: ARROW-3763
> URL: https://issues.apache.org/jira/browse/ARROW-3763
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> As a follow up to PARQUET-820. This may yield some performance benefits. 





[jira] [Moved] (ARROW-3763) [C++] Write ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney moved PARQUET-832 to ARROW-3763:
-

Component/s: (was: parquet-cpp)
 C++
   Workflow: jira  (was: patch-available, re-open possible)
Key: ARROW-3763  (was: PARQUET-832)
Project: Apache Arrow  (was: Parquet)

> [C++] Write ByteArray / FixedLenByteArray reader batches directly into 
> arrow::BinaryBuilder
> ---
>
> Key: ARROW-3763
> URL: https://issues.apache.org/jira/browse/ARROW-3763
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> As a follow up to PARQUET-820. This may yield some performance benefits. 





[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3763:

Labels: parquet  (was: )

> [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly 
> into arrow::BinaryBuilder
> ---
>
> Key: ARROW-3763
> URL: https://issues.apache.org/jira/browse/ARROW-3763
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> As a follow up to PARQUET-820. This may yield some performance benefits. 





[jira] [Commented] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683024#comment-16683024
 ] 

Wes McKinney commented on ARROW-3762:
-

I just moved this issue back to Apache Arrow since it's more Arrow-related and 
the codebases are now one

> [C++] Arrow table reads error when overflowing capacity of BinaryArray
> --
>
> Key: ARROW-3762
> URL: https://issues.apache.org/jira/browse/ARROW-3762
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Chris Ellison
>Priority: Major
> Fix For: 0.12.0
>
>
> When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError 
> due to it not creating chunked arrays. Reading each row group individually 
> and then concatenating the tables works, however.
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> x = pa.array(list('1' * 2**30))
> demo = 'demo.parquet'
> def scenario():
> t = pa.Table.from_arrays([x], ['x'])
> writer = pq.ParquetWriter(demo, t.schema)
> for i in range(2):
> writer.write_table(t)
> writer.close()
> pf = pq.ParquetFile(demo)
> # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot
> # contain more than 2147483646 bytes, have 2147483647
> t2 = pf.read()
> # Works, but note, there are 32 row groups, not 2 as suggested by:
> # https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
> tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)]
> t3 = pa.concat_tables(tables)
> scenario()
> {code}





[jira] [Moved] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney moved PARQUET-1239 to ARROW-3762:
--

Fix Version/s: (was: cpp-1.6.0)
   0.12.0
Affects Version/s: (was: cpp-1.4.0)
  Component/s: (was: parquet-cpp)
 Workflow: jira  (was: patch-available, re-open possible)
  Key: ARROW-3762  (was: PARQUET-1239)
  Project: Apache Arrow  (was: Parquet)

> [C++] Arrow table reads error when overflowing capacity of BinaryArray
> --
>
> Key: ARROW-3762
> URL: https://issues.apache.org/jira/browse/ARROW-3762
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Chris Ellison
>Priority: Major
> Fix For: 0.12.0
>
>
> When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError 
> due to it not creating chunked arrays. Reading each row group individually 
> and then concatenating the tables works, however.
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> x = pa.array(list('1' * 2**30))
> demo = 'demo.parquet'
> def scenario():
> t = pa.Table.from_arrays([x], ['x'])
> writer = pq.ParquetWriter(demo, t.schema)
> for i in range(2):
> writer.write_table(t)
> writer.close()
> pf = pq.ParquetFile(demo)
> # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot
> # contain more than 2147483646 bytes, have 2147483647
> t2 = pf.read()
> # Works, but note, there are 32 row groups, not 2 as suggested by:
> # https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
> tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)]
> t3 = pa.concat_tables(tables)
> scenario()
> {code}





[jira] [Assigned] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2018-11-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-3746:
-

Assignee: Philipp Moritz

> [Gandiva] [Python] Make it possible to list all functions registered with 
> Gandiva
> -
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very 
> easy to get a list of all the functions that are registered).





[jira] [Resolved] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2018-11-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3746.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2933
[https://github.com/apache/arrow/pull/2933]

> [Gandiva] [Python] Make it possible to list all functions registered with 
> Gandiva
> -
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very 
> easy to get a list of all the functions that are registered).





[jira] [Commented] (ARROW-3673) [Go] implement Time64 array

2018-11-11 Thread Alexandre Crayssac (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683009#comment-16683009
 ] 

Alexandre Crayssac commented on ARROW-3673:
---

Submitted PR: https://github.com/apache/arrow/pull/2944

> [Go] implement Time64 array
> ---
>
> Key: ARROW-3673
> URL: https://issues.apache.org/jira/browse/ARROW-3673
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>






[jira] [Commented] (ARROW-3672) [Go] implement Time32 array

2018-11-11 Thread Alexandre Crayssac (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683010#comment-16683010
 ] 

Alexandre Crayssac commented on ARROW-3672:
---

Submitted PR: https://github.com/apache/arrow/pull/2944

> [Go] implement Time32 array
> ---
>
> Key: ARROW-3672
> URL: https://issues.apache.org/jira/browse/ARROW-3672
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3672) [Go] implement Time32 array

2018-11-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3672:
--
Labels: pull-request-available  (was: )

> [Go] implement Time32 array
> ---
>
> Key: ARROW-3672
> URL: https://issues.apache.org/jira/browse/ARROW-3672
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-3759) [R] Run test suite on Windows in Appveyor

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3759:
---

 Summary: [R] Run test suite on Windows in Appveyor
 Key: ARROW-3759
 URL: https://issues.apache.org/jira/browse/ARROW-3759
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Wes McKinney








[jira] [Created] (ARROW-3761) [R] Bindings for CompressedInputStream, CompressedOutputStream

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3761:
---

 Summary: [R] Bindings for CompressedInputStream, 
CompressedOutputStream
 Key: ARROW-3761
 URL: https://issues.apache.org/jira/browse/ARROW-3761
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Wes McKinney


See also {{pyarrow.input_stream/output_stream}} which can automatically 
construct compressed reader/writer objects





[jira] [Commented] (ARROW-3310) [R] Create wrapper classes for various Arrow IO interfaces

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682993#comment-16682993
 ] 

Wes McKinney commented on ARROW-3310:
-

[~romainfrancois] this seems partially complete. What remains? Also, check out 
the {{pyarrow.input_stream}} and {{output_stream}} methods that were just added 
to the project to make things simpler for users. R should probably implement 
the same thing

https://github.com/apache/arrow/blob/master/python/pyarrow/io.pxi#L1437

> [R] Create wrapper classes for various Arrow IO interfaces
> --
>
> Key: ARROW-3310
> URL: https://issues.apache.org/jira/browse/ARROW-3310
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> * InputStream
> * OutputStream
> * RandomAccessFile
> * WritableFile
> * BufferOutputStream
> * BufferReader
> * OSFile
> * MemoryMappedFile
> * HdfsFile
> and so on. depends on ARROW-3306





[jira] [Commented] (ARROW-912) [Python] Account for multiarch systems in development.rst

2018-11-11 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682996#comment-16682996
 ] 

Krisztian Szucs commented on ARROW-912:
---

[~wesmckinn] Simply mention it in the developer documentation? 


> [Python] Account for multiarch systems in development.rst
> -
>
> Key: ARROW-912
> URL: https://issues.apache.org/jira/browse/ARROW-912
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Some systems will install libraries in lib64





[jira] [Created] (ARROW-3760) [R] Support Arrow CSV reader

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3760:
---

 Summary: [R] Support Arrow CSV reader 
 Key: ARROW-3760
 URL: https://issues.apache.org/jira/browse/ARROW-3760
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Wes McKinney


This should compose with any of the other file interfaces





[jira] [Created] (ARROW-3758) [R] Build R library on Windows, document build instructions for Windows developers

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3758:
---

 Summary: [R] Build R library on Windows, document build 
instructions for Windows developers
 Key: ARROW-3758
 URL: https://issues.apache.org/jira/browse/ARROW-3758
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Wes McKinney








[jira] [Created] (ARROW-3757) [R] R bindings for Flight RPC client

2018-11-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3757:
---

 Summary: [R] R bindings for Flight RPC client
 Key: ARROW-3757
 URL: https://issues.apache.org/jira/browse/ARROW-3757
 Project: Apache Arrow
  Issue Type: New Feature
  Components: FlightRPC, R
Reporter: Wes McKinney








[jira] [Updated] (ARROW-3316) [R] Multi-threaded conversion from R data.frame to Arrow table / record batch

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3316:

Description: This is the companion issue to ARROW-2968, like 
{{pyarrow.Table.from_pandas}}

> [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
> -
>
> Key: ARROW-3316
> URL: https://issues.apache.org/jira/browse/ARROW-3316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> This is the companion issue to ARROW-2968, like {{pyarrow.Table.from_pandas}}





[jira] [Commented] (ARROW-2968) [R] Multi-threaded conversion from Arrow table to R data.frame

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682992#comment-16682992
 ] 

Wes McKinney commented on ARROW-2968:
-

As part of this, should also expose the global thread pool options in the R API

> [R] Multi-threaded conversion from Arrow table to R data.frame
> --
>
> Key: ARROW-2968
> URL: https://issues.apache.org/jira/browse/ARROW-2968
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
>
> like {{pyarrow.Table.to_pandas}} with {{use_threads=True}}





[jira] [Commented] (ARROW-1445) [Python] Segfault when using libhdfs3 in pyarrow using latest API

2018-11-11 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682991#comment-16682991
 ] 

Krisztian Szucs commented on ARROW-1445:


[~wesmckinn] Nope, we have a Docker setup for testing HDFS integration with 
libhdfs3==2.2.31.

> [Python] Segfault when using libhdfs3 in pyarrow using latest API
> -
>
> Key: ARROW-1445
> URL: https://issues.apache.org/jira/browse/ARROW-1445
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.6.0
>Reporter: James Porritt
>Priority: Major
>
> I'm encountering a segfault when using libhdfs3 with pyarrow.
> My script is:
> {code}
> import pyarrow
> def main():
> hdfs = pyarrow.hdfs.connect("", , "", 
> driver='libhdfs')
> print hdfs.ls('')
> hdfs3a = pyarrow.HdfsClient("", , "", 
> driver='libhdfs3')
> print hdfs3a.ls('')
> hdfs3b = pyarrow.hdfs.connect("", , "", 
> driver='libhdfs3')
> print hdfs3b.ls('')
> main()
> {code}
> The first two hdfs connections yield the correct list. The third yields:
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f69c0c8b57f, pid=88070, tid=140092200666880
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 
> 1.8.0_60-b27)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x13357f]  __strlen_sse42+0xf
> {noformat}
> It dumps an error report file too.
> I created my conda environment with:
> {noformat}
> conda create -n parquet
> source activate parquet
> conda install pyarrow libhdfs3 -c conda-forge
> {noformat}
> The packages used are:
> {noformat}
> arrow-cpp 0.6.0   np113py27_1conda-forge
> boost-cpp 1.64.01conda-forge
> bzip2 1.0.6 1conda-forge
> ca-certificates   2017.7.27.1   0conda-forge
> certifi   2017.7.27.1  py27_0conda-forge
> curl  7.54.10conda-forge
> icu   58.1  1conda-forge
> krb5  1.14.20conda-forge
> libgcrypt 1.8.0 0conda-forge
> libgpg-error  1.27  0conda-forge
> libgsasl  1.8.0 1conda-forge
> libhdfs3  2.3   0conda-forge
> libiconv  1.14  4conda-forge
> libntlm   1.4   0conda-forge
> libssh2   1.8.0 1conda-forge
> libuuid   1.0.3 1conda-forge
> libxml2   2.9.4 4conda-forge
> mkl   2017.0.3  0  
> ncurses   5.9  10conda-forge
> numpy 1.13.1   py27_0  
> openssl   1.0.2l0conda-forge
> pandas0.20.3   py27_1conda-forge
> parquet-cpp   1.3.0.pre 1conda-forge
> pip   9.0.1py27_0conda-forge
> protobuf  3.3.2py27_0conda-forge
> pyarrow   0.6.0   np113py27_1conda-forge
> python2.7.131conda-forge
> python-dateutil   2.6.1py27_0conda-forge
> pytz  2017.2   py27_0conda-forge
> readline  6.2   0conda-forge
> setuptools36.2.2   py27_0conda-forge
> six   1.10.0   py27_1conda-forge
> sqlite3.13.01conda-forge
> tk8.5.192conda-forge
> wheel 0.29.0   py27_0conda-forge
> xz5.2.3 0conda-forge
> zlib  1.2.110conda-forge
> {noformat}
> I've set my ARROW_LIBHDFS_DIR to point at the location of the libhdfs3.so 
> file.
> I've populated my CLASSPATH as per the documentation.
> Please advise.





[jira] [Updated] (ARROW-3366) [R] Dockerfile for docker-compose setup

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3366:

Component/s: R

> [R] Dockerfile for docker-compose setup
> ---
>
> Key: ARROW-3366
> URL: https://issues.apache.org/jira/browse/ARROW-3366
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Introduced by https://github.com/apache/arrow/pull/2572





[jira] [Commented] (ARROW-3750) [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682990#comment-16682990
 ] 

Wes McKinney commented on ARROW-3750:
-

Methods would need to be added to the Cython extension types to give the memory 
address of the smart pointer object they contain

> [R] Pass various wrapped Arrow objects created in Python into R with zero 
> copy via reticulate
> -
>
> Key: ARROW-3750
> URL: https://issues.apache.org/jira/browse/ARROW-3750
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
>
> A user may wish to use some functionality available only in pyarrow using 
> reticulate; it would be useful to be able to construct an R wrapper object to 
> the C++ object inside the corresponding Python type, e.g. {{pyarrow.Table}}. 
> This probably will require some new functions to return the memory address of 
> the shared_ptr/unique_ptr inside the Cython types so that a function on the R 
> side can copy the smart pointer and create the corresponding R wrapper type
> cc [~pitrou]





[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2018-11-11 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682989#comment-16682989
 ] 

Krisztian Szucs commented on ARROW-1581:


What is the desired user flow here? Listing nightly wheel artifacts on a web 
page? And having a latest link to download the most recent wheel?

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682983#comment-16682983
 ] 

Wes McKinney commented on ARROW-1581:
-

Reopening until we figure out the developer workflow for using the nightly 
builds, and document it on https://cwiki.apache.org/confluence/display/ARROW

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Reopened] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-1581:
-

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2018-11-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682978#comment-16682978
 ] 

Wes McKinney commented on ARROW-1581:
-

So I have to object a little bit to resolving this issue, as there's no 
visibility into these builds. If someone wanted to start using them, how would 
they do that? I know that [~robertnishihara] [~pcmoritz] and others would make 
use of these for development.

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Commented] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests

2018-11-11 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682941#comment-16682941
 ] 

Krisztian Szucs commented on ARROW-3488:


Resolved via https://github.com/apache/arrow/pull/2755

> [Packaging] Separate crossbow task definition files for packaging and tests
> ---
>
> Key: ARROW-3488
> URL: https://issues.apache.org/jira/browse/ARROW-3488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>
> In the first iteration tests.yml should contain just a single task:
> hdfs-integration =>  docker-compose hdfs-integration





[jira] [Resolved] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2018-11-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-1581.

Resolution: Fixed

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Resolved] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests

2018-11-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-3488.

Resolution: Fixed

> [Packaging] Separate crossbow task definition files for packaging and tests
> ---
>
> Key: ARROW-3488
> URL: https://issues.apache.org/jira/browse/ARROW-3488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>
> In the first iteration tests.yml should contain just a single task:
> hdfs-integration =>  docker-compose hdfs-integration





[jira] [Reopened] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests

2018-11-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-3488:


> [Packaging] Separate crossbow task definition files for packaging and tests
> ---
>
> Key: ARROW-3488
> URL: https://issues.apache.org/jira/browse/ARROW-3488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>
> In the first iteration tests.yml should contain just a single task:
> hdfs-integration =>  docker-compose hdfs-integration





[jira] [Resolved] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests

2018-11-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-3488.

Resolution: Fixed

> [Packaging] Separate crossbow task definition files for packaging and tests
> ---
>
> Key: ARROW-3488
> URL: https://issues.apache.org/jira/browse/ARROW-3488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>
> In the first iteration tests.yml should contain just a single task:
> hdfs-integration =>  docker-compose hdfs-integration





[jira] [Resolved] (ARROW-3747) [C++] Flip order of data members in arrow::Decimal128

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3747.
-
Resolution: Fixed

Issue resolved by pull request 2940
[https://github.com/apache/arrow/pull/2940]

> [C++] Flip order of data members in arrow::Decimal128
> -
>
> Key: ARROW-3747
> URL: https://issues.apache.org/jira/browse/ARROW-3747
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As discussed in https://github.com/apache/arrow/pull/2845, this will enable a 
> data buffer to be correctly interpreted as {{Decimal128**}}, so memcpy and 
> other operations will work





[jira] [Resolved] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2018-11-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3710.
-
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2943
[https://github.com/apache/arrow/pull/2943]

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755





[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2018-11-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3710:
--
Labels: pull-request-available  (was: )

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755





[jira] [Updated] (ARROW-3701) [Gandiva] Add support for decimal operations

2018-11-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3701:
--
Labels: pull-request-available  (was: )

> [Gandiva] Add support for decimal operations
> 
>
> Key: ARROW-3701
> URL: https://issues.apache.org/jira/browse/ARROW-3701
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
>
> To begin with, this will add support for 128-bit decimals. There are two parts:
>  # llvm_generator needs to understand decimal types (value, precision, scale)
>  # code decimal operations: add/subtract/multiply/divide/mod/..
>  ** This will be C++ code that can be pre-compiled to emit IR code





[jira] [Updated] (ARROW-3756) [CI/Docker/Java] Java tests are failing in docker-compose setup

2018-11-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3756:
--
Labels: docker pull-request-available  (was: docker)

> [CI/Docker/Java] Java tests are failing in docker-compose setup
> ---
>
> Key: ARROW-3756
> URL: https://issues.apache.org/jira/browse/ARROW-3756
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: docker, pull-request-available
>



