[jira] [Commented] (ARROW-3754) [Packaging] Zstd configure error on linux package builds
[ https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683287#comment-16683287 ] Kouhei Sutou commented on ARROW-3754: - We need to add {{libzstd.so}} support to use {{libzstd-dev}}. > [Packaging] Zstd configure error on linux package builds > > > Key: ARROW-3754 > URL: https://issues.apache.org/jira/browse/ARROW-3754 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.12.0 > > > Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759 > Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805 > Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811 > Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727 > Perhaps this commit is related: > https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f > cc [~kou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-3754) [Packaging] Zstd configure error on linux package builds
[ https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683287#comment-16683287 ] Kouhei Sutou edited comment on ARROW-3754 at 11/12/18 6:33 AM: --- We need to add {{libzstd.so}} to support {{libzstd-dev}}. was (Author: kou): We need to add {{libzstd.so}} support to use {{libzstd-dev}}. > [Packaging] Zstd configure error on linux package builds > > > Key: ARROW-3754 > URL: https://issues.apache.org/jira/browse/ARROW-3754 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.12.0 > > > Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759 > Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805 > Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811 > Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727 > Perhaps this commit is related: > https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f > cc [~kou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3754) [Packaging] Zstd configure error on linux package builds
[ https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683238#comment-16683238 ] Kouhei Sutou commented on ARROW-3754: - CMake on Ubuntu Xenial is 3.5.1, but CMake 3.5.1 doesn't support {{SOURCE_SUBDIR}} of {{ExternalProject_Add}}. BTW, we should use {{libzstd-dev}} instead of vendored Zstandard. > [Packaging] Zstd configure error on linux package builds > > > Key: ARROW-3754 > URL: https://issues.apache.org/jira/browse/ARROW-3754 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.12.0 > > > Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759 > Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805 > Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811 > Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727 > Perhaps this commit is related: > https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f > cc [~kou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
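For context, {{SOURCE_SUBDIR}} was added to {{ExternalProject_Add}} in CMake 3.7, so on Xenial's CMake 3.5.1 the vendored Zstandard's CMake directory has to be passed to the configure step by hand. A hedged sketch of what such a fallback could look like (variable names and URL placeholder are illustrative, not Arrow's actual build files):

```cmake
include(ExternalProject)

# Sketch only: on CMake >= 3.7 the Zstandard CMake files under build/cmake
# can be selected with SOURCE_SUBDIR; on older CMake the same effect needs
# an explicit CONFIGURE_COMMAND pointing at that subdirectory.
if(CMAKE_VERSION VERSION_LESS 3.7)
  ExternalProject_Add(zstd_ep
    URL ${ZSTD_SOURCE_URL}  # illustrative variable
    CONFIGURE_COMMAND ${CMAKE_COMMAND} <SOURCE_DIR>/build/cmake
                      -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>)
else()
  ExternalProject_Add(zstd_ep
    URL ${ZSTD_SOURCE_URL}
    SOURCE_SUBDIR build/cmake
    CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>)
endif()
```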
[jira] [Assigned] (ARROW-3754) [Packaging] Zstd configure error on linux package builds
[ https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3754: --- Assignee: Kouhei Sutou > [Packaging] Zstd configure error on linux package builds > > > Key: ARROW-3754 > URL: https://issues.apache.org/jira/browse/ARROW-3754 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.12.0 > > > Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759 > Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805 > Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811 > Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727 > Perhaps this commit is related: > https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f > cc [~kou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683132#comment-16683132 ] Jason Kiley commented on ARROW-3762: No worries; I know there's a ton going on. When I ran into it again yesterday, I looked up the issues and noticed that there were some differences in what folks were reporting (e.g. categoricals), so I thought it might help to point out that I see it without them (and with a little context about when I do and what the data looks like). Sorry if I'm the motivation for your tweet, but my intent was to add information. > [C++] Arrow table reads error when overflowing capacity of BinaryArray > -- > > Key: ARROW-3762 > URL: https://issues.apache.org/jira/browse/ARROW-3762 > Project: Apache Arrow > Issue Type: Bug >Reporter: Chris Ellison >Priority: Major > Fix For: 0.12.0 > > > When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError > due to it not creating chunked arrays. Reading each row group individually > and then concatenating the tables works, however. > > {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > x = pa.array(list('1' * 2**30)) > demo = 'demo.parquet' > def scenario(): > t = pa.Table.from_arrays([x], ['x']) > writer = pq.ParquetWriter(demo, t.schema) > for i in range(2): > writer.write_table(t) > writer.close() > pf = pq.ParquetFile(demo) > # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot > contain more than 2147483646 bytes, have 2147483647 > t2 = pf.read() > # Works, but note, there are 32 row groups, not 2 as suggested by: > # > https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing > tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)] > t3 = pa.concat_tables(tables) > scenario() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
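As background for the error quoted above: BinaryArray addresses its value bytes with signed 32-bit offsets, which is where the 2147483646-byte cap in the message comes from. A plain-Python back-of-the-envelope sketch (no pyarrow required; the constant is taken from the error text) of how many chunks a chunked read would have to produce:

```python
# A single BinaryArray can hold at most 2**31 - 2 bytes of value data
# (the limit quoted in the ArrowIOError above).
MAX_BINARY_BYTES = 2**31 - 2

def chunks_needed(total_bytes: int) -> int:
    """Minimum number of BinaryArray chunks to hold total_bytes of data."""
    return -(-total_bytes // MAX_BINARY_BYTES)  # ceiling division

# The reproduction writes 2 * 2**30 = 2**31 bytes of value data, which is
# just over the limit, so a chunked read needs two arrays:
print(chunks_needed(2**31))  # 2
```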
[jira] [Assigned] (ARROW-3597) [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests
[ https://issues.apache.org/jira/browse/ARROW-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pindikura Ravindra reassigned ARROW-3597: - Assignee: Pindikura Ravindra > [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests > > > Key: ARROW-3597 > URL: https://issues.apache.org/jira/browse/ARROW-3597 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3765) Gandiva segfault when using int64 recordbatch as its input
Siyuan Zhuang created ARROW-3765: Summary: Gandiva segfault when using int64 recordbatch as its input Key: ARROW-3765 URL: https://issues.apache.org/jira/browse/ARROW-3765 Project: Apache Arrow Issue Type: Bug Components: C++, Gandiva Reporter: Siyuan Zhuang This is because the `validity buffer` could be `None`: {code} >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))) >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() [None, ] >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0) >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() [, ]{code} But Gandiva does not handle this case yet, and so it dereferences a nullptr: {code} void Annotator::PrepareBuffersForField(const FieldDescriptor& desc, const arrow::ArrayData& array_data, EvalBatch* eval_batch) { int buffer_idx = 0; // TODO: // - validity is optional uint8_t* validity_buf = const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data()); eval_batch->SetBuffer(desc.validity_idx(), validity_buf); ++buffer_idx; {code} Reproduction code: {code:java} df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10))) table = pa.Table.from_pandas(df) filt = ... 
# Create any gandiva filter r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool()) # segfault{code} Backtrace: {code:java} * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10) * frame #0: 0x0001060184fc libarrow.12.dylib`arrow::Buffer::data(this=0x) const at buffer.h:162 frame #1: 0x000106fbed78 libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8, desc=0x00010101e138, array_data=0x00010061f8e8, eval_batch=0x000100796848) at annotator.cc:65 frame #2: 0x000106fbf4ed libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8, record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94 frame #3: 0x0001071449b7 libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, record_batch=0x0001007a45b8, output_vector=size=1) at llvm_generator.cc:102 frame #4: 0x000107059a4f libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, batch=0x0001007a45b8, out_selection=std::__1::shared_ptr::element_type @ 0x0001007a43e8 strong=2 weak=1) at filter.cc:106 frame #5: 0x00010948e002 gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*, _object*, _object*) + 1986 frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475 frame #7: 0x0001001d28ca Python`call_function + 602 frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 frame #9: 0x0001001d3cf9 Python`fast_function + 569 frame #10: 0x0001001d2899 Python`call_function + 553 frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902 frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48 frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174 frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277 frame #16: 0x00010021ef46 Python`Py_Main + 3558 frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248 frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code} -- This 
message was sent by Atlassian JIRA (v7.6.3#76005)
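The nullptr dereference described above can be mimicked and guarded in plain Python; the names below are illustrative stand-ins, not the real Gandiva API:

```python
# Pure-Python stand-in for Annotator::PrepareBuffersForField: buffers[0]
# is the optional validity bitmap, buffers[1] the data buffer.
def prepare_validity(buffers):
    """Return the validity bitmap, or None when it was never allocated."""
    validity = buffers[0]
    if validity is None:
        # Bitmap never allocated: the column has no nulls, so hand the
        # evaluator a null pointer (None here) instead of dereferencing.
        return None
    return validity

# Int64 column built from a pandas frame with no nulls: validity is None.
print(prepare_validity([None, b"\x2a\x00"]))  # None
# Float column (NaN-capable): a real bitmap is present.
print(prepare_validity([b"\xff", b"\x2a"]))   # b'\xff'
```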
[jira] [Commented] (ARROW-3765) [Gandiva] Segfault when validity bitmap has not been allocated
[ https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683047#comment-16683047 ] Wes McKinney commented on ARROW-3765: - Yes, in a number of places we don't allocate a validity bitmap for data without nulls. For large datasets, it would be wasteful to have to create a bitmap with all 1's > [Gandiva] Segfault when validity bitmap has not been allocated > -- > > Key: ARROW-3765 > URL: https://issues.apache.org/jira/browse/ARROW-3765 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Gandiva >Reporter: Siyuan Zhuang >Priority: Major > > This is because the `validity buffer` could be `None`: > {code} > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [None, ] > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [, 0x11a2b3228>]{code} > But Gandiva has not implemented it yet, thus accessing a nullptr: > {code} > void Annotator::PrepareBuffersForField(const FieldDescriptor& desc, const > arrow::ArrayData& array_data, EvalBatch* eval_batch) { > int buffer_idx = 0; > // TODO: > // - validity is optional > uint8_t* validity_buf = > const_cast(array_data.buffers[buffer_idx]->data()); > eval_batch->SetBuffer(desc.validity_idx(), validity_buf); > ++buffer_idx; > {code} > > Reproduce code: > {code:java} > frame_data = np.random.randint(0, 100, size=(2**22, 10)) > table = pa.Table.from_pandas(df) > filt = ... 
# Create any gandiva filter > r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool()) # > segfault{code} > Backtrace: > {code:java} > * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS > (code=1, address=0x10) > * frame #0: 0x0001060184fc > libarrow.12.dylib`arrow::Buffer::data(this=0x) const at > buffer.h:162 > frame #1: 0x000106fbed78 > libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8, > desc=0x00010101e138, array_data=0x00010061f8e8, > eval_batch=0x000100796848) at annotator.cc:65 > frame #2: 0x000106fbf4ed > libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8, > record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94 > frame #3: 0x0001071449b7 > libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, > record_batch=0x0001007a45b8, output_vector=size=1) at > llvm_generator.cc:102 > frame #4: 0x000107059a4f > libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, > batch=0x0001007a45b8, > out_selection=std::__1::shared_ptr::element_type @ > 0x0001007a43e8 strong=2 weak=1) at filter.cc:106 > frame #5: 0x00010948e002 > gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*, > _object*, _object*) + 1986 > frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475 > frame #7: 0x0001001d28ca Python`call_function + 602 > frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #9: 0x0001001d3cf9 Python`fast_function + 569 > frame #10: 0x0001001d2899 Python`call_function + 553 > frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902 > frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48 > frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174 > frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277 > frame #16: 0x00010021ef46 Python`Py_Main + 3558 > frame #17: 0x00010e08 
Python`___lldb_unnamed_symbol1$$Python + 248 > frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
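To put a number on how wasteful an always-allocated all-ones bitmap would be, here is a small back-of-the-envelope helper (plain Python; one validity bit per value, rounded up to whole bytes):

```python
def validity_bitmap_bytes(length: int) -> int:
    """Bytes a validity bitmap occupies for `length` values
    (one bit per value, rounded up to whole bytes)."""
    return (length + 7) // 8

# For a billion-value column with no nulls, materializing an all-ones
# bitmap would cost ~125 MB that carries no information:
print(validity_bitmap_bytes(10**9))  # 125000000
```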
[jira] [Updated] (ARROW-3765) [Gandiva] Segfault when validity bitmap has not been allocated
[ https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3765: Summary: [Gandiva] Segfault when validity bitmap has not been allocated (was: Gandiva segfault when using int64 recordbatch as its input) > [Gandiva] Segfault when validity bitmap has not been allocated > -- > > Key: ARROW-3765 > URL: https://issues.apache.org/jira/browse/ARROW-3765 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Gandiva >Reporter: Siyuan Zhuang >Priority: Major > > This is because the `validity buffer` could be `None`: > {code} > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [None, ] > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [, 0x11a2b3228>]{code} > But Gandiva has not implemented it yet, thus accessing a nullptr: > {code} > void Annotator::PrepareBuffersForField(const FieldDescriptor& desc, const > arrow::ArrayData& array_data, EvalBatch* eval_batch) { > int buffer_idx = 0; > // TODO: > // - validity is optional > uint8_t* validity_buf = > const_cast(array_data.buffers[buffer_idx]->data()); > eval_batch->SetBuffer(desc.validity_idx(), validity_buf); > ++buffer_idx; > {code} > > Reproduce code: > {code:java} > frame_data = np.random.randint(0, 100, size=(2**22, 10)) > table = pa.Table.from_pandas(df) > filt = ... 
# Create any gandiva filter > r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool()) # > segfault{code} > Backtrace: > {code:java} > * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS > (code=1, address=0x10) > * frame #0: 0x0001060184fc > libarrow.12.dylib`arrow::Buffer::data(this=0x) const at > buffer.h:162 > frame #1: 0x000106fbed78 > libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8, > desc=0x00010101e138, array_data=0x00010061f8e8, > eval_batch=0x000100796848) at annotator.cc:65 > frame #2: 0x000106fbf4ed > libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8, > record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94 > frame #3: 0x0001071449b7 > libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, > record_batch=0x0001007a45b8, output_vector=size=1) at > llvm_generator.cc:102 > frame #4: 0x000107059a4f > libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, > batch=0x0001007a45b8, > out_selection=std::__1::shared_ptr::element_type @ > 0x0001007a43e8 strong=2 weak=1) at filter.cc:106 > frame #5: 0x00010948e002 > gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*, > _object*, _object*) + 1986 > frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475 > frame #7: 0x0001001d28ca Python`call_function + 602 > frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #9: 0x0001001d3cf9 Python`fast_function + 569 > frame #10: 0x0001001d2899 Python`call_function + 553 > frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902 > frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48 > frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174 > frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277 > frame #16: 0x00010021ef46 Python`Py_Main + 3558 > frame #17: 0x00010e08 
Python`___lldb_unnamed_symbol1$$Python + 248 > frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3764) [C++] Port Python "ParquetDataset" business logic to C++
Wes McKinney created ARROW-3764: --- Summary: [C++] Port Python "ParquetDataset" business logic to C++ Key: ARROW-3764 URL: https://issues.apache.org/jira/browse/ARROW-3764 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Along with defining appropriate abstractions for dealing with generic filesystems in C++, we should implement the machinery for reading multiple Parquet files in C++ so that it can be reused in GLib, R, and Ruby. Otherwise these languages will have to reimplement things, and this would surely result in inconsistent features and bugs in some implementations but not others. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
[ https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683036#comment-16683036 ] Wes McKinney commented on ARROW-3763: - I moved this JIRA here from Apache Parquet as it's more Arrow-related > [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly > into arrow::BinaryBuilder > --- > > Key: ARROW-3763 > URL: https://issues.apache.org/jira/browse/ARROW-3763 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > As a follow up to PARQUET-820. This may yield some performance benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
[ https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3763: Fix Version/s: 0.13.0 > [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly > into arrow::BinaryBuilder > --- > > Key: ARROW-3763 > URL: https://issues.apache.org/jira/browse/ARROW-3763 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > As a follow up to PARQUET-820. This may yield some performance benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
[ https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3763: Summary: [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder (was: [C++] Write ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder) > [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly > into arrow::BinaryBuilder > --- > > Key: ARROW-3763 > URL: https://issues.apache.org/jira/browse/ARROW-3763 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > As a follow up to PARQUET-820. This may yield some performance benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Moved] (ARROW-3763) [C++] Write ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
[ https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney moved PARQUET-832 to ARROW-3763: - Component/s: (was: parquet-cpp) C++ Workflow: jira (was: patch-available, re-open possible) Key: ARROW-3763 (was: PARQUET-832) Project: Apache Arrow (was: Parquet) > [C++] Write ByteArray / FixedLenByteArray reader batches directly into > arrow::BinaryBuilder > --- > > Key: ARROW-3763 > URL: https://issues.apache.org/jira/browse/ARROW-3763 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > As a follow up to PARQUET-820. This may yield some performance benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3763) [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
[ https://issues.apache.org/jira/browse/ARROW-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3763: Labels: parquet (was: ) > [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly > into arrow::BinaryBuilder > --- > > Key: ARROW-3763 > URL: https://issues.apache.org/jira/browse/ARROW-3763 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.13.0 > > > As a follow up to PARQUET-820. This may yield some performance benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683024#comment-16683024 ] Wes McKinney commented on ARROW-3762: - I just moved this issue back to Apache Arrow since it's more Arrow-related and the codebases are now one > [C++] Arrow table reads error when overflowing capacity of BinaryArray > -- > > Key: ARROW-3762 > URL: https://issues.apache.org/jira/browse/ARROW-3762 > Project: Apache Arrow > Issue Type: Bug >Reporter: Chris Ellison >Priority: Major > Fix For: 0.12.0 > > > When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError > due to it not creating chunked arrays. Reading each row group individually > and then concatenating the tables works, however. > > {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > x = pa.array(list('1' * 2**30)) > demo = 'demo.parquet' > def scenario(): > t = pa.Table.from_arrays([x], ['x']) > writer = pq.ParquetWriter(demo, t.schema) > for i in range(2): > writer.write_table(t) > writer.close() > pf = pq.ParquetFile(demo) > # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot > contain more than 2147483646 bytes, have 2147483647 > t2 = pf.read() > # Works, but note, there are 32 row groups, not 2 as suggested by: > # > https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing > tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)] > t3 = pa.concat_tables(tables) > scenario() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Moved] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney moved PARQUET-1239 to ARROW-3762: -- Fix Version/s: (was: cpp-1.6.0) 0.12.0 Affects Version/s: (was: cpp-1.4.0) Component/s: (was: parquet-cpp) Workflow: jira (was: patch-available, re-open possible) Key: ARROW-3762 (was: PARQUET-1239) Project: Apache Arrow (was: Parquet) > [C++] Arrow table reads error when overflowing capacity of BinaryArray > -- > > Key: ARROW-3762 > URL: https://issues.apache.org/jira/browse/ARROW-3762 > Project: Apache Arrow > Issue Type: Bug >Reporter: Chris Ellison >Priority: Major > Fix For: 0.12.0 > > > When reading a parquet file with binary data > 2 GiB, we get an ArrowIOError > due to it not creating chunked arrays. Reading each row group individually > and then concatenating the tables works, however. > > {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > x = pa.array(list('1' * 2**30)) > demo = 'demo.parquet' > def scenario(): > t = pa.Table.from_arrays([x], ['x']) > writer = pq.ParquetWriter(demo, t.schema) > for i in range(2): > writer.write_table(t) > writer.close() > pf = pq.ParquetFile(demo) > # pyarrow.lib.ArrowIOError: Arrow error: Invalid: BinaryArray cannot > contain more than 2147483646 bytes, have 2147483647 > t2 = pf.read() > # Works, but note, there are 32 row groups, not 2 as suggested by: > # > https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing > tables = [pf.read_row_group(i) for i in range(pf.num_row_groups)] > t3 = pa.concat_tables(tables) > scenario() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-3746: - Assignee: Philipp Moritz > [Gandiva] [Python] Make it possible to list all functions registered with > Gandiva > - > > Key: ARROW-3746 > URL: https://issues.apache.org/jira/browse/ARROW-3746 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This will also be useful for documentation purposes (right now it is not very > easy to get a list of all the functions that are registered). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3746. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 2933 [https://github.com/apache/arrow/pull/2933] > [Gandiva] [Python] Make it possible to list all functions registered with > Gandiva > - > > Key: ARROW-3746 > URL: https://issues.apache.org/jira/browse/ARROW-3746 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This will also be useful for documentation purposes (right now it is not very > easy to get a list of all the functions that are registered). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3673) [Go] implement Time64 array
[ https://issues.apache.org/jira/browse/ARROW-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683009#comment-16683009 ] Alexandre Crayssac commented on ARROW-3673: --- Submitted PR: https://github.com/apache/arrow/pull/2944 > [Go] implement Time64 array > --- > > Key: ARROW-3673 > URL: https://issues.apache.org/jira/browse/ARROW-3673 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3672) [Go] implement Time32 array
[ https://issues.apache.org/jira/browse/ARROW-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683010#comment-16683010 ] Alexandre Crayssac commented on ARROW-3672: --- Submitted PR: https://github.com/apache/arrow/pull/2944 > [Go] implement Time32 array > --- > > Key: ARROW-3672 > URL: https://issues.apache.org/jira/browse/ARROW-3672 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3672) [Go] implement Time32 array
[ https://issues.apache.org/jira/browse/ARROW-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3672: -- Labels: pull-request-available (was: ) > [Go] implement Time32 array > --- > > Key: ARROW-3672 > URL: https://issues.apache.org/jira/browse/ARROW-3672 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3759) [R] Run test suite on Windows in Appveyor
Wes McKinney created ARROW-3759: --- Summary: [R] Run test suite on Windows in Appveyor Key: ARROW-3759 URL: https://issues.apache.org/jira/browse/ARROW-3759 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3761) [R] Bindings for CompressedInputStream, CompressedOutputStream
Wes McKinney created ARROW-3761: --- Summary: [R] Bindings for CompressedInputStream, CompressedOutputStream Key: ARROW-3761 URL: https://issues.apache.org/jira/browse/ARROW-3761 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Wes McKinney See also {{pyarrow.input_stream/output_stream}} which can automatically construct compressed reader/writer objects -- This message was sent by Atlassian JIRA (v7.6.3#76005)
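For reference, {{pyarrow.input_stream}} defaults to {{compression='detect'}} and infers the codec from the file extension. A rough stdlib-only stand-in for that kind of dispatch (not pyarrow's actual implementation; pyarrow supports more codecs than shown here):

```python
import bz2
import gzip
import lzma

# Extension -> opener map; each opener returns a file-like binary stream.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def input_stream(path):
    """Open `path` for binary reading, decompressing based on extension."""
    for ext, opener in _OPENERS.items():
        if path.endswith(ext):
            return opener(path, "rb")
    return open(path, "rb")  # no known suffix: plain uncompressed file
```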
[jira] [Commented] (ARROW-3310) [R] Create wrapper classes for various Arrow IO interfaces
[ https://issues.apache.org/jira/browse/ARROW-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682993#comment-16682993 ] Wes McKinney commented on ARROW-3310: - [~romainfrancois] this seems partially complete. What remains? Also, check out the {{pyarrow.input_stream}} and {{output_stream}} methods that were just added to the project to make things simpler for users. R should probably implement the same thing https://github.com/apache/arrow/blob/master/python/pyarrow/io.pxi#L1437 > [R] Create wrapper classes for various Arrow IO interfaces > -- > > Key: ARROW-3310 > URL: https://issues.apache.org/jira/browse/ARROW-3310 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > * InputStream > * OutputStream > * RandomAccessFile > * WritableFile > * BufferOutputStream > * BufferReader > * OSFile > * MemoryMappedFile > * HdfsFile > and so on. depends on ARROW-3306 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-912) [Python] Account for multiarch systems in development.rst
[ https://issues.apache.org/jira/browse/ARROW-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682996#comment-16682996 ] Krisztian Szucs commented on ARROW-912: --- [~wesmckinn] Should we simply mention it in the developer documentation? > [Python] Account for multiarch systems in development.rst > - > > Key: ARROW-912 > URL: https://issues.apache.org/jira/browse/ARROW-912 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > Some systems will install libraries in lib64
[jira] [Created] (ARROW-3760) [R] Support Arrow CSV reader
Wes McKinney created ARROW-3760: --- Summary: [R] Support Arrow CSV reader Key: ARROW-3760 URL: https://issues.apache.org/jira/browse/ARROW-3760 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Wes McKinney This should compose with any of the other file interfaces -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3758) [R] Build R library on Windows, document build instructions for Windows developers
Wes McKinney created ARROW-3758: --- Summary: [R] Build R library on Windows, document build instructions for Windows developers Key: ARROW-3758 URL: https://issues.apache.org/jira/browse/ARROW-3758 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3757) [R] R bindings for Flight RPC client
Wes McKinney created ARROW-3757: --- Summary: [R] R bindings for Flight RPC client Key: ARROW-3757 URL: https://issues.apache.org/jira/browse/ARROW-3757 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC, R Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3316) [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
[ https://issues.apache.org/jira/browse/ARROW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3316: Description: This is the companion issue to ARROW-2968, like {{pyarrow.Table.from_pandas}} > [R] Multi-threaded conversion from R data.frame to Arrow table / record batch > - > > Key: ARROW-3316 > URL: https://issues.apache.org/jira/browse/ARROW-3316 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > This is the companion issue to ARROW-2968, like {{pyarrow.Table.from_pandas}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2968) [R] Multi-threaded conversion from Arrow table to R data.frame
[ https://issues.apache.org/jira/browse/ARROW-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682992#comment-16682992 ] Wes McKinney commented on ARROW-2968: - As part of this, should also expose the global thread pool options in the R API > [R] Multi-threaded conversion from Arrow table to R data.frame > -- > > Key: ARROW-2968 > URL: https://issues.apache.org/jira/browse/ARROW-2968 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > > like {{pyarrow.Table.to_pandas}} with {{use_threads=True}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
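The {{use_threads=True}} behaviour referenced here amounts to fanning the per-column conversion work out over a thread pool. A minimal Python sketch of the idea (names are illustrative; the real Arrow kernels run in C++ with the GIL released, driven by the global thread pool the comment proposes exposing):

```python
from concurrent.futures import ThreadPoolExecutor

def convert_columns(columns, convert, use_threads=True):
    """Apply *convert* to every column, optionally in parallel.

    Hypothetical sketch of a use_threads toggle like the one on
    pyarrow.Table.to_pandas; not the actual implementation.
    """
    if not use_threads:
        return [convert(col) for col in columns]
    # map() preserves column order even though conversions may finish
    # out of order on the pool's worker threads.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(convert, columns))
```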
[jira] [Commented] (ARROW-1445) [Python] Segfault when using libhdfs3 in pyarrow using latest API
[ https://issues.apache.org/jira/browse/ARROW-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682991#comment-16682991 ] Krisztian Szucs commented on ARROW-1445: [~wesmckinn] Nope, we have a docker setup for testing hdfs integration with libhdfs3==2.2.31 > [Python] Segfault when using libhdfs3 in pyarrow using latest API > - > > Key: ARROW-1445 > URL: https://issues.apache.org/jira/browse/ARROW-1445 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.6.0 >Reporter: James Porritt >Priority: Major > > I'm encountering a segfault when using libhdfs3 with pyarrow. > My script is:
> {code}
> import pyarrow
>
> def main():
>     hdfs = pyarrow.hdfs.connect("", , "", driver='libhdfs')
>     print hdfs.ls('')
>     hdfs3a = pyarrow.HdfsClient("", , "", driver='libhdfs3')
>     print hdfs3a.ls('')
>     hdfs3b = pyarrow.hdfs.connect("", , "", driver='libhdfs3')
>     print hdfs3b.ls('')
>
> main()
> {code}
> The first two hdfs connections yield the correct list. The third yields:
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x7f69c0c8b57f, pid=88070, tid=140092200666880
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # C [libc.so.6+0x13357f] __strlen_sse42+0xf
> {noformat}
> It dumps an error report file too.
> I created my conda environment with:
> {noformat}
> conda create -n parquet
> source activate parquet
> conda install pyarrow libhdfs3 -c conda-forge
> {noformat}
> The packages used are:
> {noformat}
> arrow-cpp        0.6.0        np113py27_1  conda-forge
> boost-cpp        1.64.0       1            conda-forge
> bzip2            1.0.6        1            conda-forge
> ca-certificates  2017.7.27.1  0            conda-forge
> certifi          2017.7.27.1  py27_0       conda-forge
> curl             7.54.1       0            conda-forge
> icu              58.1         1            conda-forge
> krb5             1.14.2       0            conda-forge
> libgcrypt        1.8.0        0            conda-forge
> libgpg-error     1.27         0            conda-forge
> libgsasl         1.8.0        1            conda-forge
> libhdfs3         2.3          0            conda-forge
> libiconv         1.14         4            conda-forge
> libntlm          1.4          0            conda-forge
> libssh2          1.8.0        1            conda-forge
> libuuid          1.0.3        1            conda-forge
> libxml2          2.9.4        4            conda-forge
> mkl              2017.0.3     0
> ncurses          5.9          10           conda-forge
> numpy            1.13.1       py27_0
> openssl          1.0.2l       0            conda-forge
> pandas           0.20.3       py27_1       conda-forge
> parquet-cpp      1.3.0.pre    1            conda-forge
> pip              9.0.1        py27_0       conda-forge
> protobuf         3.3.2        py27_0       conda-forge
> pyarrow          0.6.0        np113py27_1  conda-forge
> python           2.7.13       1            conda-forge
> python-dateutil  2.6.1        py27_0       conda-forge
> pytz             2017.2       py27_0       conda-forge
> readline         6.2          0            conda-forge
> setuptools       36.2.2       py27_0       conda-forge
> six              1.10.0       py27_1       conda-forge
> sqlite           3.13.0       1            conda-forge
> tk               8.5.19       2            conda-forge
> wheel            0.29.0       py27_0       conda-forge
> xz               5.2.3        0            conda-forge
> zlib             1.2.11       0            conda-forge
> {noformat}
> I've set my ARROW_LIBHDFS_DIR to point at the location of the libhdfs3.so file.
> I've populated my CLASSPATH as per the documentation.
> Please advise.
[jira] [Updated] (ARROW-3366) [R] Dockerfile for docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3366: Component/s: R > [R] Dockerfile for docker-compose setup > --- > > Key: ARROW-3366 > URL: https://issues.apache.org/jira/browse/ARROW-3366 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3750) [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate
[ https://issues.apache.org/jira/browse/ARROW-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682990#comment-16682990 ] Wes McKinney commented on ARROW-3750: - Methods would need to be added to the Cython extension types to give the memory address of the smart pointer object they contain > [R] Pass various wrapped Arrow objects created in Python into R with zero > copy via reticulate > - > > Key: ARROW-3750 > URL: https://issues.apache.org/jira/browse/ARROW-3750 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Wes McKinney >Priority: Major > > A user may wish to use some functionality available only in pyarrow using > reticulate; it would be useful to be able to construct an R wrapper object to > the C++ object inside the corresponding Python type, e.g. {{pyarrow.Table}}. > This probably will require some new functions to return the memory address of > the shared_ptr/unique_ptr inside the Cython types so that a function on the R > side can copy the smart pointer and create the corresponding R wrapper type > cc [~pitrou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
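The proposal is for the Python side to expose the raw address of the native object it holds, so the R side can wrap the same memory instead of copying it. An illustrative Python/ctypes analogue of that handoff (class and method names here are hypothetical, not the actual Cython additions being discussed):

```python
import ctypes

class PyWrapper:
    """Stand-in for a Cython extension type that owns native memory."""

    def __init__(self, data: bytes):
        self._buf = ctypes.create_string_buffer(data, len(data))

    def memory_address(self):
        # The kind of accessor the comment proposes: hand out the raw
        # address (plus length) so another runtime can view the memory.
        return ctypes.addressof(self._buf), len(self._buf)

def wrap_zero_copy(address, length):
    """What the receiving side would do, sketched in Python: construct a
    view over the existing allocation rather than copying it. The owner
    (here, the PyWrapper) must stay alive while the view is in use."""
    return (ctypes.c_char * length).from_address(address)
```

With the real types the address handed across would be that of the `shared_ptr`/`unique_ptr`, so the R wrapper can copy the smart pointer and share ownership.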
[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682989#comment-16682989 ] Krisztian Szucs commented on ARROW-1581: What is the desired user flow here? Listing nightly wheel artifacts on a web page? And having a latest link to download the most recent wheel? > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682983#comment-16682983 ] Wes McKinney commented on ARROW-1581: - Reopening until we figure out the developer workflow for using the nightly builds, and document it on https://cwiki.apache.org/confluence/display/ARROW > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reopened ARROW-1581: - > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682978#comment-16682978 ] Wes McKinney commented on ARROW-1581: - So I have to object a little bit to resolving this issue as there's no visibility into these builds. If someone wanted to start using them, how would they do that? I know that [~robertnishihara] [~pcmoritz] and others would make use of these for development > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests
[ https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682941#comment-16682941 ] Krisztian Szucs commented on ARROW-3488: Resolved via https://github.com/apache/arrow/pull/2755 > [Packaging] Separate crossbow task definition files for packaging and tests > --- > > Key: ARROW-3488 > URL: https://issues.apache.org/jira/browse/ARROW-3488 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > In the first iteration tests.yml should contain just a single task: > hdfs-integration => docker-compose hdfs-integration -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-1581. Resolution: Fixed > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests
[ https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-3488. Resolution: Fixed > [Packaging] Separate crossbow task definition files for packaging and tests > --- > > Key: ARROW-3488 > URL: https://issues.apache.org/jira/browse/ARROW-3488 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > In the first iteration tests.yml should contain just a single task: > hdfs-integration => docker-compose hdfs-integration -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests
[ https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-3488: > [Packaging] Separate crossbow task definition files for packaging and tests > --- > > Key: ARROW-3488 > URL: https://issues.apache.org/jira/browse/ARROW-3488 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > In the first iteration tests.yml should contain just a single task: > hdfs-integration => docker-compose hdfs-integration -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3488) [Packaging] Separate crossbow task definition files for packaging and tests
[ https://issues.apache.org/jira/browse/ARROW-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-3488. Resolution: Fixed > [Packaging] Separate crossbow task definition files for packaging and tests > --- > > Key: ARROW-3488 > URL: https://issues.apache.org/jira/browse/ARROW-3488 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > In the first iteration tests.yml should contain just a single task: > hdfs-integration => docker-compose hdfs-integration -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3747) [C++] Flip order of data members in arrow::Decimal128
[ https://issues.apache.org/jira/browse/ARROW-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3747. - Resolution: Fixed Issue resolved by pull request 2940 [https://github.com/apache/arrow/pull/2940] > [C++] Flip order of data members in arrow::Decimal128 > - > > Key: ARROW-3747 > URL: https://issues.apache.org/jira/browse/ARROW-3747 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > As discussed in https://github.com/apache/arrow/pull/2845, this will enable a > data buffer to be correctly interpreted as {{Decimal128**}}, so memcpy and > other operations will work -- This message was sent by Atlassian JIRA (v7.6.3#76005)
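The reason member order matters here: on a little-endian machine the low 64 bits of a 128-bit integer come first in memory, so storing the low limb before the high limb is what lets a raw data buffer be reinterpreted as an array of {{Decimal128}}. A small sketch of that byte layout (pure illustration, not Arrow code):

```python
import struct

def decimal128_bytes(value):
    """Serialize a signed 128-bit integer as two 64-bit limbs,
    low limb first, little-endian -- the in-memory layout a
    low-before-high Decimal128 would have on a little-endian machine."""
    mask = (1 << 64) - 1
    low = value & mask            # two's-complement low 64 bits
    high = (value >> 64) & mask   # two's-complement high 64 bits
    return struct.pack("<QQ", low, high)
```

With this ordering the limb serialization is byte-for-byte identical to the plain little-endian 128-bit integer, so a memcpy of the buffer round-trips correctly.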
[jira] [Resolved] (ARROW-3710) [CI/Python] Run nightly tests against pandas master
[ https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3710. - Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 2943 [https://github.com/apache/arrow/pull/2943] > [CI/Python] Run nightly tests against pandas master > --- > > Key: ARROW-3710 > URL: https://issues.apache.org/jira/browse/ARROW-3710 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Follow-up of [https://github.com/apache/arrow/pull/2758] and > https://github.com/apache/arrow/pull/2755 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master
[ https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3710: -- Labels: pull-request-available (was: ) > [CI/Python] Run nightly tests against pandas master > --- > > Key: ARROW-3710 > URL: https://issues.apache.org/jira/browse/ARROW-3710 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > > Follow-up of [https://github.com/apache/arrow/pull/2758] and > https://github.com/apache/arrow/pull/2755 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3701) [Gandiva] Add support for decimal operations
[ https://issues.apache.org/jira/browse/ARROW-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3701: -- Labels: pull-request-available (was: ) > [Gandiva] Add support for decimal operations > > > Key: ARROW-3701 > URL: https://issues.apache.org/jira/browse/ARROW-3701 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > Labels: pull-request-available > > To begin with, will add support for 128-bit decimals. There are two parts : > # llvm_generator needs to understand decimal types (value, precision, scale) > # code decimal operations : add/subtract/multiply/divide/mod/.. > ** This will be c++ code that can be pre-compiled to emit IR code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
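For the operation list in part 2, here is a plain-Python sketch of fixed-point decimal addition under common SQL-style result rules (the tuple representation and the precision formula are illustrative assumptions; the actual Gandiva kernels are pre-compiled C++ emitted as LLVM IR):

```python
from collections import namedtuple

# A decimal value as (unscaled integer, precision, scale):
# Dec(573, 4, 2) represents 5.73 with up to 4 significant digits.
Dec = namedtuple("Dec", "unscaled precision scale")

def decimal_add(a, b):
    """Add two fixed-point decimals. Result scale is max(s1, s2); the
    operand with the smaller scale is rescaled up before the integer
    add. Precision grows by one digit for a possible carry, capped at
    38 (the 128-bit limit). Sketch only, not the Gandiva kernel."""
    scale = max(a.scale, b.scale)
    ua = a.unscaled * 10 ** (scale - a.scale)
    ub = b.unscaled * 10 ** (scale - b.scale)
    integer_digits = max(a.precision - a.scale, b.precision - b.scale)
    precision = min(38, integer_digits + scale + 1)
    return Dec(ua + ub, precision, scale)
```

The same rescale-then-operate pattern extends to subtract and modulo; multiply and divide instead combine the operand scales.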
[jira] [Updated] (ARROW-3756) [CI/Docker/Java] Java tests are failing in docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3756: -- Labels: docker pull-request-available (was: docker) > [CI/Docker/Java] Java tests are failing in docker-compose setup > --- > > Key: ARROW-3756 > URL: https://issues.apache.org/jira/browse/ARROW-3756 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Java >Reporter: Krisztian Szucs >Priority: Major > Labels: docker, pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)