[jira] [Assigned] (ARROW-9688) [C++] Supporting Windows ARM64 builds
[ https://issues.apache.org/jira/browse/ARROW-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-9688: --- Assignee: Niyas > [C++] Supporting Windows ARM64 builds > - > > Key: ARROW-9688 > URL: https://issues.apache.org/jira/browse/ARROW-9688 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.0 > Environment: Windows >Reporter: Mukul Sabharwal >Assignee: Niyas >Priority: Minor > Labels: pull-request-available > Original Estimate: 336h > Time Spent: 1h > Remaining Estimate: 335h > > I was trying to build the Arrow library so I could use it to generate parquet > files on Windows ARM64, but it currently fails to compile for a few reasons. > I thought I'd enumerate them here so someone more familiar with the project > could spearhead it. > In SetupCxxFlags.cmake > * the MSVC branch for ARROW_CPU_FLAG STREQUAL "x86" is taken even though I'm > building ARM64; this may be a more fundamental error somewhere else that > needs correction, but an inspection of > other branches seemed to indicate that ARM64 is assumed to be missing from > MSVC, and the keyword "aarch64" (not a term used in the Windows ecosystem) is > prevalent in the cmake files. So the first thing I did was stub it out > and mark SSE4.2, AVX and AVX512 as not present > * In bit_util.h I provided implementations for popcount32 and popcount64 that > were not NEON accelerated, although neon_cnt is provided by MSVC (for ARM64) > * Removed nmmintrin.h since that is x86/x64 specific. Note, _BitScanReverse > and _BitScanForward are Microsoft specific and supported on ARM64. > * cpu_info.cc needed tweaks for the cpuid stuff; I just returned false and > didn't worry too much about any upstream effects. flag_mappings and > num_flags ought to be defined in the non-WIN32 ifdef, since they're not actually > used. 
> After these changes I was able to remove the vcpkg restriction that > artificially prevented the library from compiling on arm64, and I was able to > successfully compile for both arm64-windows-static and arm64-windows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9688) [C++] Supporting Windows ARM64 builds
[ https://issues.apache.org/jira/browse/ARROW-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-9688: Summary: [C++] Supporting Windows ARM64 builds (was: Supporting Windows ARM64 builds) > [C++] Supporting Windows ARM64 builds > - > > Key: ARROW-9688 > URL: https://issues.apache.org/jira/browse/ARROW-9688 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.0 > Environment: Windows >Reporter: Mukul Sabharwal >Priority: Minor > Labels: pull-request-available > Original Estimate: 336h > Time Spent: 50m > Remaining Estimate: 335h 10m -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14291) Use CI for linting
[ https://issues.apache.org/jira/browse/ARROW-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427443#comment-17427443 ] Kouhei Sutou commented on ARROW-14291: -- Lint is done by https://github.com/apache/arrow/blob/master/.github/workflows/dev.yml#L35 . > Use CI for linting > -- > > Key: ARROW-14291 > URL: https://issues.apache.org/jira/browse/ARROW-14291 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benson Muite >Assignee: Benson Muite >Priority: Minor > > Currently the development process requires the developer to lint their code > before committing it. This can be inefficient for changes made in the browser > and when one has a different compiler setup than that used for linting. As > described in > [https://dev.to/flipp-engineering/linting-only-changed-files-with-github-actions-4ddp] > development efficiency can be improved if altered code is automatically > linted by a CI action. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-14291) Use CI for linting
[ https://issues.apache.org/jira/browse/ARROW-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427442#comment-17427442 ] Kouhei Sutou edited comment on ARROW-14291 at 10/12/21, 4:34 AM: - We already have lint fix by CI feature: https://github.com/apache/arrow/blob/master/.github/workflows/comment_bot.yml#L52 Our C++ lint targets are only {{cpp/src/}}: https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L243 They don't include {{cpp/examples}}. was (Author: kou): We already have this feature: https://github.com/apache/arrow/blob/master/.github/workflows/comment_bot.yml#L52 Our C++ lint targets are only {{cpp/src/}}: https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L243 They don't include {{cpp/examples}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14291) Use CI for linting
[ https://issues.apache.org/jira/browse/ARROW-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427442#comment-17427442 ] Kouhei Sutou commented on ARROW-14291: -- We already have this feature: https://github.com/apache/arrow/blob/master/.github/workflows/comment_bot.yml#L52 Our C++ lint targets are only {{cpp/src/}}: https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L243 They don't include {{cpp/examples}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14291) Use CI for linting
Benson Muite created ARROW-14291: Summary: Use CI for linting Key: ARROW-14291 URL: https://issues.apache.org/jira/browse/ARROW-14291 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Benson Muite Assignee: Benson Muite Currently the development process requires the developer to lint their code before committing it. This can be inefficient for changes made in the browser and when one has a different compiler setup than that used for linting. As described in [https://dev.to/flipp-engineering/linting-only-changed-files-with-github-actions-4ddp] development efficiency can be improved if altered code is automatically linted by a CI action. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-14269) [C++] Consolidate utf8 benchmark
[ https://issues.apache.org/jira/browse/ARROW-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai resolved ARROW-14269. -- Fix Version/s: 6.0.0 Resolution: Fixed Issue resolved by pull request 11376 [https://github.com/apache/arrow/pull/11376] > [C++] Consolidate utf8 benchmark > > > Key: ARROW-14269 > URL: https://issues.apache.org/jira/browse/ARROW-14269 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I found that some trivial (and obviously irrelevant) changes to the UTF8 validation > code can cause large variance in benchmark results. > The UTF8 validation functions are inlined and called directly in the benchmark. The > compiler may try to optimize them together with the benchmark loop. > Un-inlining the benchmarked functions makes the results predictable and > explainable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14290) [C++] String comparison in between ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427434#comment-17427434 ] Benson Muite commented on ARROW-14290: -- Yes, the initial implementation does not allow for comparison of strings with keys, which is important for many applications. As you pointed out, a related issue for sorting is https://issues.apache.org/jira/browse/ARROW-12046 > [C++] String comparison in between ternary kernel > - > > Key: ARROW-14290 > URL: https://issues.apache.org/jira/browse/ARROW-14290 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benson Muite >Assignee: Benson Muite >Priority: Minor > > String comparisons in C++ order strings by Unicode code point. This may not be > suitable for many applications, for example when using characters from languages > that need more than ASCII. Sorting algorithms often allow the use of custom > comparison functions. It would be helpful to allow this for the between kernel > as well. Initial work on the between kernel is being tracked in > https://issues.apache.org/jira/browse/ARROW-9843 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14260) [C++] GTest linker error with vcpkg and Visual Studio 2019
[ https://issues.apache.org/jira/browse/ARROW-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427347#comment-17427347 ] Kouhei Sutou commented on ARROW-14260: -- Can we see build command line for the link failure? > [C++] GTest linker error with vcpkg and Visual Studio 2019 > -- > > Key: ARROW-14260 > URL: https://issues.apache.org/jira/browse/ARROW-14260 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Ian Cook >Priority: Major > > The *test-build-vcpkg-win* nightly Crossbow job is failing with these linker > errors: > {code:java} > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) void __cdecl > testing::internal2::PrintBytesInObjectTo(unsigned char const *,unsigned > __int64,class std::basic_ostream > *)" > (__imp_?PrintBytesInObjectTo@internal2@testing@@YAXPEBE_KPEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z) > referenced in function "class std::basic_ostream std::char_traits > & __cdecl testing::internal2::operator<< std::char_traits,class std::_Vector_iterator std::_Vector_val > > > >(class std::basic_ostream > &,class > std::_Vector_iterator arrow::compute::ExecNode *> > > const &)" > (??$?6DU?$char_traits@D@std@@V?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@1@@internal2@testing@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@AEAV23@AEBV?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@3@@Z) > > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > referenced in function "void __cdecl arrow::fs::AssertFileContents(class > arrow::fs::FileSystem *,class std::basic_string std::char_traits,class std::allocator > const &,class > std::basic_string,class > 
std::allocator > const &)" > (?AssertFileContents@fs@arrow@@YAXPEAVFileSystem@12@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@1@Z) > > unity_0_cxx.obj : error LNK2001: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > {code} > Link to the error where it occurs in the full log: > https://github.com/ursacomputing/crossbow/runs/3799925986#step:4:2737 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14290) [C++] String comparison in between ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427318#comment-17427318 ] Antoine Pitrou commented on ARROW-14290: Is this different from ARROW-12046 ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14290) [C++] String comparison in between ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-14290: --- Component/s: C++ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-14252) [R] Partial matching of arguments warning
[ https://issues.apache.org/jira/browse/ARROW-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-14252. - Fix Version/s: (was: 7.0.0) 6.0.0 Resolution: Fixed Issue resolved by pull request 11371 [https://github.com/apache/arrow/pull/11371] > [R] Partial matching of arguments warning > - > > Key: ARROW-14252 > URL: https://issues.apache.org/jira/browse/ARROW-14252 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Nicola Crane >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are a few examples of partially matched arguments in the code. One > example is below, but there could be others. > {code:r} > Failure (test-dplyr-query.R:46:3): dim() on query > `via_batch <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(input = > record_batch(tbl` threw an unexpected warning. > Message: partial match of 'filtered' to 'filtered_rows' > Class: simpleWarning/warning/condition > Backtrace: > 1. arrow:::expect_dplyr_equal(...) test-dplyr-query.R:46:2 > 11. arrow::dim.arrow_dplyr_query(.) > 12. base::isTRUE(x$filtered) /Users/dragos/Documents/arrow/r/R/dplyr.R:147:2 > Failure (test-dplyr-query.R:46:3): dim() on query > `via_table <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(input = > Table$create(tbl` threw an unexpected warning. > Message: partial match of 'filtered' to 'filtered_rows' > Class: simpleWarning/warning/condition > Backtrace: > 1. arrow:::expect_dplyr_equal(...) test-dplyr-query.R:46:2 > 11. arrow::dim.arrow_dplyr_query(.) > 12. base::isTRUE(x$filtered) /Users/dragos/Documents/arrow/r/R/dplyr.R:147:2 > {code} > This is the relevant line of code in the example above: > https://github.com/apache/arrow/blob/25a6f591d1f162106b74e29870ebd4012e9874cc/r/R/dplyr.R#L150 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14074) [C++][Compute] Sketch a C++ consumer of compute IR
[ https://issues.apache.org/jira/browse/ARROW-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14074: --- Labels: pull-request-available query-engine (was: query-engine) > [C++][Compute] Sketch a C++ consumer of compute IR > -- > > Key: ARROW-14074 > URL: https://issues.apache.org/jira/browse/ARROW-14074 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Compute IR >Reporter: Ben Kietzman >Assignee: Ben Kietzman >Priority: Major > Labels: pull-request-available, query-engine > Fix For: 7.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-14062 adds a basic compute Intermediate Representation. Allowing c++ > compute to consume this and produce ExecPlans will allow more straightforward > and less tightly coupled usage of ExecPlans from bindings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14161) [C++][Parquet][Docs] Reading/Writing Parquet Files
[ https://issues.apache.org/jira/browse/ARROW-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rares Vernica updated ARROW-14161: -- Description: Missing documentation on Reading/Writing Parquet files C++ api: * [WriteTable|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10WriteTableERKN5arrow5TableEP10MemoryPoolNSt10shared_ptrIN5arrow2io12OutputStreamEEE7int64_tNSt10shared_ptrI16WriterPropertiesEENSt10shared_ptrI21ArrowWriterPropertiesEE] missing docs on chunk_size found some [here|https://github.com/apache/parquet-cpp/blob/642da055adf009652689b20e68a198cffb857651/examples/parquet-arrow/src/reader-writer.cc#L53] _size of the RowGroup in the parquet file. Normally you would choose this to be rather large_ * Typo in file reader [example|https://arrow.apache.org/docs/cpp/parquet.html#filereader] the include should be {{#include "parquet/arrow/reader.h"}} * [WriteProperties/Builder|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterPropertiesE] missing docs on {{compression}} * Missing example on using WriteProperties was: Missing documentation on Reading/Writing Parquet files C++ api: * [WriteTable|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10WriteTableERKN5arrow5TableEP10MemoryPoolNSt10shared_ptrIN5arrow2io12OutputStreamEEE7int64_tNSt10shared_ptrI16WriterPropertiesEENSt10shared_ptrI21ArrowWriterPropertiesEE] missing docs on chunk_size found some [here|https://github.com/apache/parquet-cpp/blob/642da055adf009652689b20e68a198cffb857651/examples/parquet-arrow/src/reader-writer.cc#L53] _size of the RowGroup in the parquet file. 
Normally you would choose this to be rather large_ * Typo in file reader [example|https://arrow.apache.org/docs/cpp/parquet.html#filereader] the include should be {{#include "parquet/arrow/reader.h"}} * {{[WriteProperties/Builder|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterPropertiesE] missing docs on compression}} * Missing example on using WriteProperties > [C++][Parquet][Docs] Reading/Writing Parquet Files > -- > > Key: ARROW-14161 > URL: https://issues.apache.org/jira/browse/ARROW-14161 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: Rares Vernica >Priority: Minor > > Missing documentation on Reading/Writing Parquet files C++ api: > * > [WriteTable|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10WriteTableERKN5arrow5TableEP10MemoryPoolNSt10shared_ptrIN5arrow2io12OutputStreamEEE7int64_tNSt10shared_ptrI16WriterPropertiesEENSt10shared_ptrI21ArrowWriterPropertiesEE] > missing docs on chunk_size found some > [here|https://github.com/apache/parquet-cpp/blob/642da055adf009652689b20e68a198cffb857651/examples/parquet-arrow/src/reader-writer.cc#L53] > _size of the RowGroup in the parquet file. Normally you would choose this to > be rather large_ > * Typo in file reader > [example|https://arrow.apache.org/docs/cpp/parquet.html#filereader] the > include should be {{#include "parquet/arrow/reader.h"}} > * > [WriteProperties/Builder|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterPropertiesE] > missing docs on {{compression}} > * Missing example on using WriteProperties -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14161) [C++][Parquet][Docs] Reading/Writing Parquet Files
[ https://issues.apache.org/jira/browse/ARROW-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rares Vernica updated ARROW-14161: -- Description: Missing documentation on Reading/Writing Parquet files C++ api: * [WriteTable|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10WriteTableERKN5arrow5TableEP10MemoryPoolNSt10shared_ptrIN5arrow2io12OutputStreamEEE7int64_tNSt10shared_ptrI16WriterPropertiesEENSt10shared_ptrI21ArrowWriterPropertiesEE] missing docs on chunk_size found some [here|https://github.com/apache/parquet-cpp/blob/642da055adf009652689b20e68a198cffb857651/examples/parquet-arrow/src/reader-writer.cc#L53] _size of the RowGroup in the parquet file. Normally you would choose this to be rather large_ * Typo in file reader [example|https://arrow.apache.org/docs/cpp/parquet.html#filereader] the include should be {{#include "parquet/arrow/reader.h"}} * {{[WriteProperties/Builder|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterPropertiesE] missing docs on compression}} * Missing example on using WriteProperties was: Missing documentation on Reading/Writing Parquet files C++ api: * [WriteTable|https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10WriteTableERKN5arrow5TableEP10MemoryPoolNSt10shared_ptrIN5arrow2io12OutputStreamEEE7int64_tNSt10shared_ptrI16WriterPropertiesEENSt10shared_ptrI21ArrowWriterPropertiesEE] missing docs on chunk_size found some [here|https://github.com/apache/parquet-cpp/blob/642da055adf009652689b20e68a198cffb857651/examples/parquet-arrow/src/reader-writer.cc#L53] _size of the RowGroup in the parquet file. 
Normally you would choose this to be rather large_ * Typo in file reader [example|https://arrow.apache.org/docs/cpp/parquet.html#filereader] the include should be {{#include "parquet/arrow/reader.h"}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9843) [C++] Implement Between ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Muite updated ARROW-9843: Summary: [C++] Implement Between ternary kernel (was: [C++] Implement Between trinary kernel) > [C++] Implement Between ternary kernel > -- > > Key: ARROW-9843 > URL: https://issues.apache.org/jira/browse/ARROW-9843 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Benson Muite >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > A specialized {{between(arr, left_bound, right_bound)}} kernel would avoid > multiple scans and an AND operation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14262) [C++] Document and rename is_in_meta_binary
[ https://issues.apache.org/jira/browse/ARROW-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-14262: - Labels: kernel (was: ) > [C++] Document and rename is_in_meta_binary > --- > > Key: ARROW-14262 > URL: https://issues.apache.org/jira/browse/ARROW-14262 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Weston Pace >Priority: Major > Labels: kernel > > The is_in_meta_binary and index_in_meta_binary functions do not have any > "_doc" elements. I had simply ignored them, assuming they were some kind of > specialized function that shouldn't be exposed for general consumption (see > ARROW-13949), but I recently discovered they are legitimate binary variants of > their unary counterparts. > If we want to continue to expose these functions, we should rename them ("meta" > presumably means meta function, but the Python/R user has no idea what a meta > function is) and add _doc elements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14289) [C++] Change Scanner::Head to return a RecordBatchReader
Neal Richardson created ARROW-14289: --- Summary: [C++] Change Scanner::Head to return a RecordBatchReader Key: ARROW-14289 URL: https://issues.apache.org/jira/browse/ARROW-14289 Project: Apache Arrow Issue Type: Improvement Components: C++, R Reporter: Neal Richardson Fix For: 7.0.0 Following ARROW-9731 and ARROW-13893. This would make it more natural to work with ExecPlans, which return a RecordBatchReader when you run them. Alternatively, we could move this functionality to RecordBatchReader::Head. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14288) [R] Implement nrow on some collapsed queries
Neal Richardson created ARROW-14288: --- Summary: [R] Implement nrow on some collapsed queries Key: ARROW-14288 URL: https://issues.apache.org/jira/browse/ARROW-14288 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Fix For: 7.0.0 collapse() doesn't always mean we can't determine the number of rows. We can try to solve some cases: * head/tail: compute number of rows, take the smaller of that and the head/tail number * if filter == TRUE, take the number of rows of .data (which may contain a query) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9688) Supporting Windows ARM64 builds
[ https://issues.apache.org/jira/browse/ARROW-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427264#comment-17427264 ] Niyas commented on ARROW-9688: -- I've created a [PR|https://github.com/apache/arrow/pull/11383] to enable building with clang-cl > Supporting Windows ARM64 builds > --- > > Key: ARROW-9688 > URL: https://issues.apache.org/jira/browse/ARROW-9688 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.0 > Environment: Windows >Reporter: Mukul Sabharwal >Priority: Minor > Labels: pull-request-available > Original Estimate: 336h > Time Spent: 20m > Remaining Estimate: 335h 40m -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9688) Supporting Windows ARM64 builds
[ https://issues.apache.org/jira/browse/ARROW-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9688: -- Labels: pull-request-available (was: ) > Supporting Windows ARM64 builds > --- > > Key: ARROW-9688 > URL: https://issues.apache.org/jira/browse/ARROW-9688 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 1.0.0 > Environment: Windows >Reporter: Mukul Sabharwal >Priority: Minor > Labels: pull-request-available > Original Estimate: 336h > Time Spent: 10m > Remaining Estimate: 335h 50m > > I was trying to build the Arrow library so I could use it to generate parquet > files on Windows ARM64, but it currently fails to compile for a few reasons. > I thought I'd enumerate them here, so someone more familiar with the project > could spearhead it. > In SetupCxxFlags.cmake > * the MSVC branch for ARROW_CPU_FLAG STREQUAL "x86" is taken even though I'm > building ARM64, this may be a more fundamental error somewhere else that > needs correction and maybe things would work better, but an inspection of > other branches seemed to indicate that ARM64 is assumed to be missing from > MSVC and the keywrod "aarch64" (not a term used in the Windows ecosystem) is > prevalent in the cmake files. So the first thing I did was I stubbed it out > and set SSE42, AVX and AVX512 to be not present > * In bit_util.h I provided implementations for popcount32, popcount64 that > were not neon accelerated, although neon_cnt is provided by msvc (for n64) > * Removed nmintrin.h since that is x64/x64 specific. Note, _BitScanReverse > and _BitScanForward are Microsoft specific and support on ARM64. > * cpu_info.cc needed tweaks for cpuid stuff, I just returned false and > didn't really care too much about any upstream effects. flag_mappings and > num_flags ought be defined in the not WIN32 ifdef, since they're not actually > used. 
> After these changes I was able to remove the vcpkg restriction that > artificially failed the library from compiling on arm64 and I was able to > successfully compile for both arm64-windows-static and arm64-windows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
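The portable popcount fallback described above for bit_util.h can be illustrated with a branch-free SWAR (SIMD-within-a-register) bit count. This is only a Python sketch of the technique; Arrow's actual implementation is C++ and may differ in detail:

```python
def popcount64(x: int) -> int:
    """Branch-free 64-bit population count (SWAR), illustrating the kind of
    portable, non-NEON fallback described above for bit_util.h."""
    x &= 0xFFFFFFFFFFFFFFFF
    # Fold pairs of bits, then nibbles, then bytes, then sum the bytes.
    x -= (x >> 1) & 0x5555555555555555
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56
```

A popcount32 variant follows the same pattern with 32-bit masks.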
[jira] [Commented] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427218#comment-17427218 ] Joris Van den Bossche commented on ARROW-14196: --- bq. 2. Make it possible to select columns by eliding the list name components. Currently, I think the C++ API only deals with column indices? (at least for the Python bindings, the translation of column names to field indices happens in Python) For Python that should be relatively straightforward to implement. Opened ARROW-14286 for this. bq. If so, I'd have to support both naming conventions because both would exist in the wild. [~jpivarski] yes, but that's already the case right now as well. Parquet files written by (py)arrow will use a different name for the list element compared to parquet files written by other tools (that's actually what we are trying to harmonize). So if you select a subfield of a list field by name, you already need to take into account potentially different names at the moment. > [C++][Parquet] Default to compliant nested types in Parquet writer > -- > > Key: ARROW-14196 > URL: https://issues.apache.org/jira/browse/ARROW-14196 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet >Reporter: Joris Van den Bossche >Priority: Major > Fix For: 6.0.0 > > > In C++ there is already an option to get the "compliant_nested_types" (to > have the list columns follow the Parquet specification), and ARROW-11497 > exposed this option in Python. > This is still set to False by default, but in the source it says "TODO: At > some point we should flip this.", and in ARROW-11497 there was also some > discussion about what it would take to change the default. > cc [~emkornfield] [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14287) [R] Selecting columns while reading Parquet file with nested types can give wrong column
Joris Van den Bossche created ARROW-14287: - Summary: [R] Selecting columns while reading Parquet file with nested types can give wrong column Key: ARROW-14287 URL: https://issues.apache.org/jira/browse/ARROW-14287 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Joris Van den Bossche I created two small files (using Python for my convenience): {code:python} import pyarrow as pa import pyarrow.parquet as pq table = pa.table({"a": [1, 2], "b": [3, 4]}) pq.write_table(table, "test1.parquet") table = pa.table({"a": [1, 2], "nested": [[{'f1': 1, 'f2': 3}, {'f1': 3, 'f2': 4}], None], "b": [3, 4]}) pq.write_table(table, "test2.parquet") {code} where the first is a simple file, and the second contains a column with a nested list of struct type. Reading that in R with a column selection works in the first case, but actually reads the second column instead of the third in the second case: {code:r} > arrow::read_parquet("test1.parquet", col_select=c("b")) b 1 3 2 4 > arrow::read_parquet("test2.parquet", col_select=c("b")) nested 1 3, 4 2 NULL {code} This is due to the simple conversion of column names to integer indices in the R code, while Parquet counts the individual fields of nested columns separately. -- This message was sent by Atlassian Jira (v8.3.4#803005)
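The root cause can be sketched with a simplified, hypothetical model in plain Python (not Arrow's actual R or C++ code): Parquet numbers every primitive leaf field, so the flat index of a top-level column shifts once a nested column precedes it.

```python
def leaf_indices(schema):
    """Map top-level column names to flat Parquet leaf indices.

    `schema` is a list of (name, n_leaves) pairs, where n_leaves is how many
    primitive leaf fields the column flattens to in a Parquet file.
    """
    indices, start = {}, 0
    for name, n_leaves in schema:
        indices[name] = list(range(start, start + n_leaves))
        start += n_leaves
    return indices

# test2.parquet above: "a" (1 leaf), "nested" (2 leaves: f1 and f2), "b" (1 leaf)
idx = leaf_indices([("a", 1), ("nested", 2), ("b", 1)])
# Naively reusing the top-level position of "b" (index 2) hits a leaf of
# "nested"; the correct flat leaf index for "b" is 3.
```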
[jira] [Resolved] (ARROW-13944) [C++] Bump xsimd to latest version
[ https://issues.apache.org/jira/browse/ARROW-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-13944. Fix Version/s: 6.0.0 Resolution: Fixed Issue resolved by pull request 11142 [https://github.com/apache/arrow/pull/11142] > [C++] Bump xsimd to latest version > -- > > Key: ARROW-13944 > URL: https://issues.apache.org/jira/browse/ARROW-13944 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > xsimd is refactored to use architecture instead of register size to define a > batch. > I've adapted arrow code to this change. > There's one xsimd bug [1] needs to fix before we can upgrade. > [1] https://github.com/xtensor-stack/xsimd/pull/553 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14286) [Python][Parquet] Allow to select columns of a list field without requiring the list component names
Joris Van den Bossche created ARROW-14286: - Summary: [Python][Parquet] Allow to select columns of a list field without requiring the list component names Key: ARROW-14286 URL: https://issues.apache.org/jira/browse/ARROW-14286 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche Subtask for ARROW-14196. Currently, if you have a list column, where the list elements themselves are nested items (e.g. a list of structs), selecting a subset of that list column requires something like {{columns=["columnA.list.item.subfield"]}}. This "list.item" is superfluous, since a list always contains a single child, so ideally we would allow specifying this as {{columns=["columnA.subfield"]}}. This also avoids relying on the exact name of the list item (item vs element), for which the default differs between Parquet and Arrow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
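The proposed path normalization could look roughly like the following hypothetical helper (strip_list_components is an illustrative name, not an actual pyarrow API): it drops the superfluous list child segment from a dotted column path.

```python
def strip_list_components(path, list_child_names=("item", "element")):
    """Drop "list.<child>" pairs from a dotted column path, since a list
    type always has exactly one child (named "item" by Arrow's default,
    "element" in the Parquet spec)."""
    parts = path.split(".")
    out, i = [], 0
    while i < len(parts) - 1:
        if parts[i] == "list" and parts[i + 1] in list_child_names:
            i += 2  # skip both the "list" segment and its single child
        else:
            out.append(parts[i])
            i += 1
    if i < len(parts):
        out.append(parts[i])
    return ".".join(out)
```

With this normalization both naming conventions resolve to the same selection, which addresses the item-vs-element concern raised in the ARROW-14196 comments above.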
[jira] [Updated] (ARROW-14285) [C++] Fix crashes when pretty-printing data from valid IPC file (OSS-Fuzz)
[ https://issues.apache.org/jira/browse/ARROW-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14285: --- Labels: pull-request-available (was: ) > [C++] Fix crashes when pretty-printing data from valid IPC file (OSS-Fuzz) > -- > > Key: ARROW-14285 > URL: https://issues.apache.org/jira/browse/ARROW-14285 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > > Fix the following issues found by OSS-Fuzz: > * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39677 > * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39703 > * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39763 > * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39773 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14285) [C++] Fix crashes when pretty-printing data from valid IPC file (OSS-Fuzz)
Antoine Pitrou created ARROW-14285: -- Summary: [C++] Fix crashes when pretty-printing data from valid IPC file (OSS-Fuzz) Key: ARROW-14285 URL: https://issues.apache.org/jira/browse/ARROW-14285 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Assignee: Antoine Pitrou Fix For: 6.0.0 Fix the following issues found by OSS-Fuzz: * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39677 * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39703 * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39763 * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39773 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10140) [Python][C++] Add test for map column of a parquet file created from pyarrow and pandas
[ https://issues.apache.org/jira/browse/ARROW-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-10140: -- Fix Version/s: (was: 6.0.0) 7.0.0 > [Python][C++] Add test for map column of a parquet file created from pyarrow > and pandas > --- > > Key: ARROW-10140 > URL: https://issues.apache.org/jira/browse/ARROW-10140 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 1.0.1 >Reporter: Chen Ming >Assignee: Joris Van den Bossche >Priority: Minor > Fix For: 7.0.0 > > Attachments: pyspark.snappy.parquet, test_map.parquet, test_map.py, > test_map_2.0.0.parquet, test_map_200.parquet > > > Hi, > I'm having problems reading parquet files with 'map' data type created by > pyarrow. > I followed > [https://stackoverflow.com/questions/63553715/pyarrow-data-types-for-columns-that-have-lists-of-dictionaries] > to convert a pandas DF to an arrow table, then call write_table to output a > parquet file: > (We also referred to https://issues.apache.org/jira/browse/ARROW-9812) > {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > print(f'PyArrow Version = {pa.__version__}') > print(f'Pandas Version = {pd.__version__}') > df = pd.DataFrame({ > 'col1': pd.Series([ > [('id', 'something'), ('value2', 'else')], > [('id', 'something2'), ('value','else2')], > ]), > 'col2': pd.Series(['foo', 'bar']) > }) > udt = pa.map_(pa.string(), pa.string()) > schema = pa.schema([pa.field('col1', udt), pa.field('col2', pa.string())]) > table = pa.Table.from_pandas(df, schema) > pq.write_table(table, './test_map.parquet') > {code} > The above code (attached as test_map.py) runs smoothly on my developing > computer: > {code:java} > PyArrow Version = 1.0.1 > Pandas Version = 1.1.2 > {code} > And generated the test_map.parquet file (attached as test_map.parquet) > successfully. 
> Then I use parquet-tools (1.11.1) to read the file, but get the following > output: > {code:java} > $ java -jar parquet-tools-1.11.1.jar head test_map.parquet > col1: > .key_value: > .key_value: > col2 = foo > col1: > .key_value: > .key_value: > col2 = bar > {code} > I also checked the schema of the parquet file: > {code:java} > java -jar parquet-tools-1.11.1.jar schema test_map.parquet > message schema { > optional group col1 (MAP) { > repeated group key_value { > required binary key (STRING); > optional binary value (STRING); > } > } > optional binary col2 (STRING); > }{code} > Am I doing something wrong? > We need to output the data to parquet files, and query them later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14284) [C++][Python] Improve error message when trying to use SyncScanner when requiring async
Joris Van den Bossche created ARROW-14284: - Summary: [C++][Python] Improve error message when trying to use SyncScanner when requiring async Key: ARROW-14284 URL: https://issues.apache.org/jira/browse/ARROW-14284 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Joris Van den Bossche See ARROW-14257 The current error message gives "Asynchronous scanning is not supported by SyncScanner" Copying the comment of [~westonpace]: In Python it is always use_async=True. In R the scanner is hidden from the user on dataset writes but the option there is use_async as well. In C++ the option is UseAsync in the ScannerBuilder. How about, "Writing datasets requires that the input scanner is configured to scan asynchronously via the use_async or UseAsync options." -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-14257) [Doc][Python] dataset doc build fails
[ https://issues.apache.org/jira/browse/ARROW-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-14257. --- Resolution: Fixed Issue resolved by pull request 11364 [https://github.com/apache/arrow/pull/11364] > [Doc][Python] dataset doc build fails > - > > Key: ARROW-14257 > URL: https://issues.apache.org/jira/browse/ARROW-14257 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Python >Reporter: Antoine Pitrou >Assignee: Joris Van den Bossche >Priority: Blocker > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {code} > >>>- > Exception in /home/antoine/arrow/dev/docs/source/python/dataset.rst at block > ending on line 578 > Specify :okexcept: as an option in the ipython:: block to suppress this > message > --- > ArrowNotImplementedError Traceback (most recent call last) > in > > 1 ds.write_dataset(scanner, new_root, format="parquet", > partitioning=new_part) > ~/arrow/dev/python/pyarrow/dataset.py in write_dataset(data, base_dir, > basename_template, format, partitioning, partitioning_flavor, schema, > filesystem, file_options, use_threads, max_partitions, file_visitor) > 861 _filesystemdataset_write( > 862 scanner, base_dir, basename_template, filesystem, > partitioning, > --> 863 file_options, max_partitions, file_visitor > 864 ) > ~/arrow/dev/python/pyarrow/_dataset.pyx in > pyarrow._dataset._filesystemdataset_write() > ~/arrow/dev/python/pyarrow/error.pxi in pyarrow.lib.check_status() > ArrowNotImplementedError: Asynchronous scanning is not supported by > SyncScanner > /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:367 > scanner->ScanBatchesAsync() > <<<- > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2659) [Python] More graceful reading of empty String columns in ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2659: - Fix Version/s: (was: 6.0.0) 7.0.0 > [Python] More graceful reading of empty String columns in ParquetDataset > > > Key: ARROW-2659 > URL: https://issues.apache.org/jira/browse/ARROW-2659 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Uwe Korn >Assignee: Joris Van den Bossche >Priority: Major > Labels: dataset, dataset-parquet-read, parquet > Fix For: 7.0.0 > > Attachments: read_parquet_dataset.error.read_table.novalidation.txt, > read_parquet_dataset.error.read_table.txt > > > When currently saving a {{ParquetDataset}} from Pandas, we don't get > consistent schemas, even if the source was a single DataFrame. This is due to > the fact that in some partitions object columns like string can become empty. > Then the resulting Arrow schema will differ. In the central metadata, we will > store this column as {{pa.string}} whereas in the partition file with the > empty columns, this columns will be stored as {{pa.null}}. > The two schemas are still a valid match in terms of schema evolution and we > should respect that in > https://github.com/apache/arrow/blob/79a22074e0b059a24c5cd45713f8d085e24f826a/python/pyarrow/parquet.py#L754 > Instead of doing a {{pa.Schema.equals}} in > https://github.com/apache/arrow/blob/79a22074e0b059a24c5cd45713f8d085e24f826a/python/pyarrow/parquet.py#L778 > we should introduce a new method {{pa.Schema.can_evolve_to}} that is more > graceful and returns {{True}} if a dataset piece has a null column where the > main metadata states a nullable column of any type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
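The proposed check can be sketched as follows. This is a hypothetical, simplified model of the suggested pa.Schema.can_evolve_to, with schemas reduced to name-to-type-string dicts; the real method would operate on pa.Schema objects:

```python
def can_evolve_to(piece_schema, main_schema):
    """True if the piece's schema is a valid evolution of the main metadata:
    same column names, and each column either matches the declared type or
    is "null" (an all-null column, e.g. an empty string partition)."""
    if set(piece_schema) != set(main_schema):
        return False
    return all(
        piece_schema[name] in ("null", main_schema[name])
        for name in main_schema
    )
```

Under this rule the partition file with an empty string column ("null") still validates against the central metadata's "string" declaration, instead of failing the strict equality check.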
[jira] [Updated] (ARROW-5248) [Python] support dateutil timezones
[ https://issues.apache.org/jira/browse/ARROW-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5248: - Fix Version/s: (was: 6.0.0) 7.0.0 > [Python] support dateutil timezones > --- > > Key: ARROW-5248 > URL: https://issues.apache.org/jira/browse/ARROW-5248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Minor > Labels: beginner > Fix For: 7.0.0 > > > The {{dateutil}} packages also provides a set of timezone objects > (https://dateutil.readthedocs.io/en/stable/tz.html) in addition to {{pytz}}. > In pyarrow, we only support pytz timezones (and the stdlib datetime.timezone > fixed offset): > {code} > In [2]: import dateutil.tz > > > In [3]: import pyarrow as pa > > > In [5]: pa.timestamp('us', dateutil.tz.gettz('Europe/Brussels')) > > > ... > ~/miniconda3/envs/dev37/lib/python3.7/site-packages/pyarrow/types.pxi in > pyarrow.lib.tzinfo_to_string() > ValueError: Unable to convert timezone > `tzfile('/usr/share/zoneinfo/Europe/Brussels')` to string > {code} > But pandas also supports dateutil timezones. As a consequence, when having a > pandas DataFrame that uses a dateutil timezone, you get an error when > converting to an arrow table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
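One possible direction for the tzinfo_to_string failure above, sketched under the assumption that the dateutil tzfile was loaded from the system zoneinfo database (tzfile_to_string is a hypothetical helper, not pyarrow's API): recover the IANA name from the file path the tzfile reports.

```python
def tzfile_to_string(tz_path, zoneinfo_root="/usr/share/zoneinfo/"):
    """Recover an IANA zone name like "Europe/Brussels" from the path of a
    dateutil tzfile. Only a sketch: real support would also need to handle
    tzfiles loaded from arbitrary paths or from in-memory data."""
    if tz_path.startswith(zoneinfo_root):
        return tz_path[len(zoneinfo_root):]
    raise ValueError(f"Unable to convert timezone {tz_path!r} to string")
```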
[jira] [Updated] (ARROW-10726) [Python] Reading multiple parquet files with different index column dtype (originating pandas) reads wrong data
[ https://issues.apache.org/jira/browse/ARROW-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-10726: -- Fix Version/s: (was: 6.0.0) 7.0.0 > [Python] Reading multiple parquet files with different index column dtype > (originating pandas) reads wrong data > --- > > Key: ARROW-10726 > URL: https://issues.apache.org/jira/browse/ARROW-10726 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Fix For: 7.0.0 > > > See https://github.com/pandas-dev/pandas/issues/38058 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-14004) [Python] to_pandas() converts to float instead of using pandas nullable types
[ https://issues.apache.org/jira/browse/ARROW-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-14004: - Assignee: Joris Van den Bossche > [Python] to_pandas() converts to float instead of using pandas nullable types > - > > Key: ARROW-14004 > URL: https://issues.apache.org/jira/browse/ARROW-14004 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Python >Reporter: Miguel Cantón Cortés >Assignee: Joris Van den Bossche >Priority: Major > Labels: pandas > Fix For: 6.0.0 > > Attachments: image.png > > > We've noticed that when converting an Arrow Table to pandas using > `.to_pandas()` integer columns with null values get converted to float > instead of using pandas nullable types. > If the column was created with pandas first it is correctly preserved (I > guess it's using stored metadata for this). > I've attached a screenshot showing this behavior. > As currently there is support for nullable types in pandas, just as in Arrow, > it would be great to use these types when dealing with columns with null > values. > If you are reticent to change this behavior, a param would be nice too (e.g. > `to_pandas(use_nullable_types: True)`). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
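The current versus requested behavior can be summarized with a small decision function in Python. This is a simplified, hypothetical model of the mapping described in the report; the use_nullable_types flag is the reporter's suggestion, not an existing pyarrow parameter:

```python
def pandas_dtype_for(arrow_type, has_nulls, use_nullable_types=False):
    """Which pandas dtype an integer Arrow column maps to today, and which
    one it would map to under the proposed option."""
    if not arrow_type.startswith("int"):
        raise NotImplementedError("this sketch covers integer types only")
    if not has_nulls:
        return arrow_type               # plain numpy dtype, e.g. int64
    if use_nullable_types:
        return arrow_type.capitalize()  # pandas nullable dtype, e.g. Int64
    return "float64"                    # current behavior: nulls become NaN
```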
[jira] [Updated] (ARROW-13525) [Python] Mention alternatives in deprecation message of ParquetDataset attributes
[ https://issues.apache.org/jira/browse/ARROW-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13525: --- Labels: pull-request-available (was: ) > [Python] Mention alternatives in deprecation message of ParquetDataset > attributes > - > > Key: ARROW-13525 > URL: https://issues.apache.org/jira/browse/ARROW-13525 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 5.0.1, 6.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Follow-up on ARROW-13074. > We should maybe also expose the {{partitioning}} attribute on ParquetDataset > (if constructed with {{use_legacy_dataset=False}}), as I did for the > {{filesystem}}/{{files}}/{{fragments}} attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14283) [C++][CI] LLVM cannot be found on macOS GHA builds
[ https://issues.apache.org/jira/browse/ARROW-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14283: --- Labels: pull-request-available (was: ) > [C++][CI] LLVM cannot be found on macOS GHA builds > -- > > Key: ARROW-14283 > URL: https://issues.apache.org/jira/browse/ARROW-14283 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/pull/11372/checks?check_run_id=3859972940 > https://github.com/apache/arrow/pull/11372/checks?check_run_id=3859973472 > https://github.com/apache/arrow/pull/11372/checks?check_run_id=3859973399 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14282) [R] altrep vectors for factors (dictionaries)
Romain Francois created ARROW-14282: --- Summary: [R] altrep vectors for factors (dictionaries) Key: ARROW-14282 URL: https://issues.apache.org/jira/browse/ARROW-14282 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Romain Francois Assignee: Romain Francois As is the case in Converter_Dictionary, this should probably be split into 2 paths depending on whether the arrays need unification or all the levels are already the same -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14281) How to Review PRs Guidelines
Alessandro Molina created ARROW-14281: - Summary: How to Review PRs Guidelines Key: ARROW-14281 URL: https://issues.apache.org/jira/browse/ARROW-14281 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Assignee: Antoine Pitrou Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14280) R-Arrow Architectural Overview
[ https://issues.apache.org/jira/browse/ARROW-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Molina updated ARROW-14280: -- Parent: ARROW-14278 Issue Type: Sub-task (was: Improvement) > R-Arrow Architectural Overview > -- > > Key: ARROW-14280 > URL: https://issues.apache.org/jira/browse/ARROW-14280 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Alessandro Molina >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14279) PyArrow Architectural Overview
Alessandro Molina created ARROW-14279: - Summary: PyArrow Architectural Overview Key: ARROW-14279 URL: https://issues.apache.org/jira/browse/ARROW-14279 Project: Apache Arrow Issue Type: Sub-task Reporter: Alessandro Molina Assignee: Alessandro Molina Fix For: 7.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14280) R-Arrow Architectural Overview
Alessandro Molina created ARROW-14280: - Summary: R-Arrow Architectural Overview Key: ARROW-14280 URL: https://issues.apache.org/jira/browse/ARROW-14280 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14278) New Contributors Guide
Alessandro Molina created ARROW-14278: - Summary: New Contributors Guide Key: ARROW-14278 URL: https://issues.apache.org/jira/browse/ARROW-14278 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Fix For: 7.0.0 Umbrella Issue for the Guide for new contributors for Python and R -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14277) R Tutorials 2021-Q4 Initiative
Alessandro Molina created ARROW-14277: - Summary: R Tutorials 2021-Q4 Initiative Key: ARROW-14277 URL: https://issues.apache.org/jira/browse/ARROW-14277 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Fix For: 7.0.0 An umbrella ticket for the initiative of writing up a set of Tutorials for R users -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-14259) [R] converting from R vector to Array when the R vector is altrep
[ https://issues.apache.org/jira/browse/ARROW-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-14259. - Fix Version/s: 6.0.0 Resolution: Fixed Issue resolved by pull request 11366 [https://github.com/apache/arrow/pull/11366] > [R] converting from R vector to Array when the R vector is altrep > - > > Key: ARROW-14259 > URL: https://issues.apache.org/jira/browse/ARROW-14259 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Romain Francois >Assignee: Romain Francois >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When we have an R vector that was created from an Array with altrep, and then > we want to convert again to an Array, currently it materializes it, and it > should not. Instead it should be grabbing the array from the internals of > the altrep object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-14260) [C++] GTest linker error with vcpkg and Visual Studio 2019
[ https://issues.apache.org/jira/browse/ARROW-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425872#comment-17425872 ] Ian Cook edited comment on ARROW-14260 at 10/11/21, 1:51 PM: - [~apitrou] this seems like it might be related to ARROW-14247. Let's see if the fix for that in [#11356|https://github.com/apache/arrow/pull/11356/] makes this go away. (update: it did not) was (Author: icook): [~apitrou] this seems like it might be related to ARROW-14247. Let's see if the fix for that in [#11356|https://github.com/apache/arrow/pull/11356/] makes this go away. > [C++] GTest linker error with vcpkg and Visual Studio 2019 > -- > > Key: ARROW-14260 > URL: https://issues.apache.org/jira/browse/ARROW-14260 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Ian Cook >Priority: Major > > The *test-build-vcpkg-win* nightly Crossbow job is failing with these linker > errors: > {code:java} > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) void __cdecl > testing::internal2::PrintBytesInObjectTo(unsigned char const *,unsigned > __int64,class std::basic_ostream > *)" > (__imp_?PrintBytesInObjectTo@internal2@testing@@YAXPEBE_KPEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z) > referenced in function "class std::basic_ostream std::char_traits > & __cdecl testing::internal2::operator<< std::char_traits,class std::_Vector_iterator std::_Vector_val > > > >(class std::basic_ostream > &,class > std::_Vector_iterator arrow::compute::ExecNode *> > > const &)" > (??$?6DU?$char_traits@D@std@@V?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@1@@internal2@testing@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@AEAV23@AEBV?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@3@@Z) > > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > 
testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > referenced in function "void __cdecl arrow::fs::AssertFileContents(class > arrow::fs::FileSystem *,class std::basic_string std::char_traits,class std::allocator > const &,class > std::basic_string,class > std::allocator > const &)" > (?AssertFileContents@fs@arrow@@YAXPEAVFileSystem@12@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@1@Z) > > unity_0_cxx.obj : error LNK2001: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > {code} > Link to the error where it occurs in the full log: > https://github.com/ursacomputing/crossbow/runs/3799925986#step:4:2737 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14213) R arrow package not working on RStudio/Ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427137#comment-17427137 ] Neal Richardson commented on ARROW-14213: - Re-reading the logs, it looks like the installation was successful, but you were installing arrow from a session where you already had arrow loaded. Have you restarted R? > R arrow package not working on RStudio/Ubuntu > - > > Key: ARROW-14213 > URL: https://issues.apache.org/jira/browse/ARROW-14213 > Project: Apache Arrow > Issue Type: Bug > Components: R > Environment: R version 3.6.3 (2020-02-29) -- "Holding the Windsock" > Copyright (C) 2020 The R Foundation for Statistical Computing > Platform: x86_64-pc-linux-gnu (64-bit) >Reporter: Thomas Wutzler >Priority: Major > > I try reading feather files in R with the arrow package that were generated > in Python. > I run on R 3.6.3 on an RStudio server window on linux machine, for which I > have no other access. I get the message: > {{Cannot call io___MemoryMappedFile__Open().}} > According to the advice in the linked help-file: > [https://cran.r-project.org/web/packages/arrow/vignettes/install.html] I > create this issue with the full log of the installation: > {{}} > > arrow::install_arrow(verbose = TRUE)Installing package into > > '/Net/Groups/BGI/scratch/twutz/R/atacama-library/3.6' > (as 'lib' is unspecified)trying URL > 'https://ftp5.gwdg.de/pub/misc/cran/src/contrib/arrow_5.0.0.2.tar.gz'Content > type 'application/octet-stream' length 483642 bytes (472 > KB)==downloaded 472 KB* > installing *source* package 'arrow' ...** package 'arrow' successfully > unpacked and MD5 sums checked** using staged installationtrying URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/ubuntu-16.04/arrow-5.0.0.2.zip'Content > type 'binary/octet-stream' length 17214781 bytes (16.4 > MB)==downloaded 16.4 MB*** > Successfully retrieved C++ binaries for ubuntu-16.04 > Binary package requires libcurl and openssl > If installation fails, 
retry after installing those system requirements > PKG_CFLAGS=-I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 > PKG_LIBS=-L/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/lib > -larrow_dataset -lparquet -larrow -larrow -larrow_bundled_dependencies > -larrow_dataset -lparquet -lssl -lcrypto -lcurl** libsg++ -std=gnu++11 > -I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -g -c RTasks.cpp -o RTasks.o > g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -g -c altrep.cpp -o altrep.o > g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -g -c array.cpp -o array.o > g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -g -c array_to_vector.cpp -o array_to_vector.o > g++ -std=gnu++11 
-I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -g -c arraydata.cpp -o arraydata.o > g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG > -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include > -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET > -DARROW_R_WITH_S3 -I../inst/include/-fpic -g -O2 >
[jira] [Closed] (ARROW-14276) [Packaging] Dependency resolution issues in the nightly conda builds
[ https://issues.apache.org/jira/browse/ARROW-14276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou closed ARROW-14276. -- Fix Version/s: (was: 6.0.0) Resolution: Duplicate > [Packaging] Dependency resolution issues in the nightly conda builds > > > Key: ARROW-14276 > URL: https://issues.apache.org/jira/browse/ARROW-14276 > Project: Apache Arrow > Issue Type: New Feature > Components: Packaging >Reporter: Krisztian Szucs >Priority: Major > > The majority of the conda nightly builds are failing due to dependency > resolution problems: > {code} > - conda-linux-gcc-py37-arm64: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-arm64 > - conda-linux-gcc-py37-cpu-r41: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-cpu-r41 > - conda-linux-gcc-py37-cuda: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-cuda > - conda-linux-gcc-py38-arm64: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-arm64 > - conda-linux-gcc-py38-cpu: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-cpu > - conda-linux-gcc-py38-cuda: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-cuda > - conda-linux-gcc-py39-arm64: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-arm64 > - conda-linux-gcc-py39-cpu: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-cpu > - conda-linux-gcc-py39-cuda: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-cuda > - 
conda-win-vs2017-py36-r40: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py36-r40 > - conda-win-vs2017-py38: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py38 > - conda-win-vs2017-py39: > URL: > https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py39 > {code} > I assume that we need to sync the recipes again with up to date pin files. > cc @uwe -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14275) [C++][Parquet][Doc] default output option for Parquet Scan Example
[ https://issues.apache.org/jira/browse/ARROW-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Muite updated ARROW-14275: - Summary: [C++][Parquet][Doc] default output option for Parquet Scan Example (was: should have default output option) > [C++][Parquet][Doc] default output option for Parquet Scan Example > -- > > Key: ARROW-14275 > URL: https://issues.apache.org/jira/browse/ARROW-14275 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation, Parquet >Reporter: Benson Muite >Assignee: Benson Muite >Priority: Minor > > [Parquet scan > example|https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset_parquet_scan_example.cc] > should not fake success if no argument is given, but should instead create > a new directory in the current directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
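The default-output behavior the issue proposes can be sketched as follows. The actual example is C++ (dataset_parquet_scan_example.cc); this is an illustrative Python sketch, and the function and prefix names are hypothetical:

```python
import pathlib
import tempfile

def resolve_output_dir(argv):
    """Return the output directory for the scan example.

    With an argument, use (and create) that path; with no argument, create a
    fresh default directory under the current directory instead of silently
    reporting success and writing nothing.
    """
    if argv:
        out = pathlib.Path(argv[0])
    else:
        # Hypothetical default: a new directory in the current directory.
        out = pathlib.Path(tempfile.mkdtemp(prefix="parquet_scan_", dir="."))
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Either way the caller always gets back an existing directory, so the example can proceed to write output rather than faking success.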
[jira] [Created] (ARROW-14276) [Packaging] Dependency resolution issues in the nightly conda builds
Krisztian Szucs created ARROW-14276: --- Summary: [Packaging] Dependency resolution issues in the nightly conda builds Key: ARROW-14276 URL: https://issues.apache.org/jira/browse/ARROW-14276 Project: Apache Arrow Issue Type: New Feature Components: Packaging Reporter: Krisztian Szucs Fix For: 6.0.0 The majority of the conda nightly builds are failing due to dependency resolution problems: {code} - conda-linux-gcc-py37-arm64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-arm64 - conda-linux-gcc-py37-cpu-r41: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-cpu-r41 - conda-linux-gcc-py37-cuda: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py37-cuda - conda-linux-gcc-py38-arm64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-arm64 - conda-linux-gcc-py38-cpu: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-cpu - conda-linux-gcc-py38-cuda: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py38-cuda - conda-linux-gcc-py39-arm64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-arm64 - conda-linux-gcc-py39-cpu: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-cpu - conda-linux-gcc-py39-cuda: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-linux-gcc-py39-cuda - conda-win-vs2017-py36-r40: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py36-r40 - conda-win-vs2017-py38: URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py38 - conda-win-vs2017-py39: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-10-11-0-azure-conda-win-vs2017-py39 {code} I assume that we need to sync the recipes again with up to date pin files. cc @uwe -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13710) [Doc][Cookbook] Sending and receiving data over a network using an Arrow Flight RPC server - Python
[ https://issues.apache.org/jira/browse/ARROW-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13710: --- Labels: pull-request-available (was: ) > [Doc][Cookbook] Sending and receiving data over a network using an Arrow > Flight RPC server - Python > --- > > Key: ARROW-13710 > URL: https://issues.apache.org/jira/browse/ARROW-13710 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Alessandro Molina >Assignee: Alessandro Molina >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-14275) should have default output option
[ https://issues.apache.org/jira/browse/ARROW-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Muite updated ARROW-14275: - Component/s: Documentation > should have default output option > - > > Key: ARROW-14275 > URL: https://issues.apache.org/jira/browse/ARROW-14275 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation, Parquet >Reporter: Benson Muite >Assignee: Benson Muite >Priority: Minor > > [Parquet scan > example|https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset_parquet_scan_example.cc] > should not fake success if no argument is given, but should instead create > a new directory in the current directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14275) should have default output option
Benson Muite created ARROW-14275: Summary: should have default output option Key: ARROW-14275 URL: https://issues.apache.org/jira/browse/ARROW-14275 Project: Apache Arrow Issue Type: Improvement Components: C++, Parquet Reporter: Benson Muite Assignee: Benson Muite [Parquet scan example|https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset_parquet_scan_example.cc] should not fake success if no argument is given, but should instead create a new directory in the current directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14274) [C++] Upgrade vendored base64 code
[ https://issues.apache.org/jira/browse/ARROW-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427069#comment-17427069 ] Antoine Pitrou commented on ARROW-14274: I am not aware that base64 is performance critical currently. That said, I'm ok with improving the code, or even using a totally different implementation if desired. > [C++] Upgrade vendored base64 code > -- > > Key: ARROW-14274 > URL: https://issues.apache.org/jira/browse/ARROW-14274 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > > The vendored base64 code looks suboptimal. [1] > We should at least upgrade to latest upstream code which improved a lot. [2] > Maybe adopt more optimized implementation if base64 performance matters for > arrow. [3] > [1] > https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/base64.cpp#L49 > [2] https://github.com/ReneNyffenegger/cpp-base64/blob/master/base64.cpp#L129 > [3] https://github.com/aklomp/base64 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-14274) [C++] Upgrade vendored base64 code
Yibo Cai created ARROW-14274: Summary: [C++] Upgrade vendored base64 code Key: ARROW-14274 URL: https://issues.apache.org/jira/browse/ARROW-14274 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Yibo Cai Assignee: Yibo Cai The vendored base64 code looks suboptimal. [1] We should at least upgrade to the latest upstream code, which has improved a lot. [2] Maybe adopt a more optimized implementation if base64 performance matters for Arrow. [3] [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/base64.cpp#L49 [2] https://github.com/ReneNyffenegger/cpp-base64/blob/master/base64.cpp#L129 [3] https://github.com/aklomp/base64 -- This message was sent by Atlassian Jira (v8.3.4#803005)
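The open question above is whether base64 throughput matters in practice. A quick, illustrative way to get a baseline number is a stdlib round-trip measurement; this uses Python's base64 module, not Arrow's vendored C++ code, so it only shows the shape of such a check:

```python
import base64
import time

# ~1 MiB payload covering all byte values.
payload = bytes(range(256)) * 4096

start = time.perf_counter()
encoded = base64.b64encode(payload)
decoded = base64.b64decode(encoded)
elapsed = time.perf_counter() - start

# Correctness: the round trip must be lossless.
assert decoded == payload

# Rough throughput figure; whether this is "fast enough" depends on workload.
mib_per_s = len(payload) / (1024 * 1024) / elapsed
```

Benchmarks like this, run against the vendored code via a C++ harness, would show whether the aklomp/base64 SIMD implementation is worth the extra complexity.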
[jira] [Updated] (ARROW-13730) [Doc][Cookbook] Adding a column to an existing Table - Python
[ https://issues.apache.org/jira/browse/ARROW-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13730: --- Labels: pull-request-available (was: ) > [Doc][Cookbook] Adding a column to an existing Table - Python > - > > Key: ARROW-13730 > URL: https://issues.apache.org/jira/browse/ARROW-13730 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Alessandro Molina >Assignee: Alessandro Molina >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13317) [Python] Improve documentation on what 'use_threads' does in 'read_feather'
[ https://issues.apache.org/jira/browse/ARROW-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13317: -- Labels: documentation good-first-issue (was: documentation) > [Python] Improve documentation on what 'use_threads' does in 'read_feather' > --- > > Key: ARROW-13317 > URL: https://issues.apache.org/jira/browse/ARROW-13317 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 4.0.1 >Reporter: Arun Joseph >Priority: Trivial > Labels: documentation, good-first-issue > Fix For: 7.0.0 > > > The current documentation for > [read_feather|https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_feather.html] > states the following: > *use_threads* (_bool__,_ _default True_) – Whether to parallelize reading > using multiple threads. > if the underlying file uses compression, then multiple threads can still be > spawned. The verbiage of the *use_threads* is ambiguous on whether the > restriction on multiple threads is only for the conversion from pyarrow to > the pandas dataframe vs the reading/decompression of the file itself which > might spawn additional threads. > [set_cpu_count|http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count] > might be good to mention as a way to actually limit threads spawned -- This message was sent by Atlassian Jira (v8.3.4#803005)
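The ambiguity described above is between two separate knobs: a per-call use_threads flag for one step, and a global thread-pool cap like pyarrow.set_cpu_count(). A stdlib-only model of that distinction (this is not pyarrow's implementation; the conversion function is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a global cap such as pyarrow.set_cpu_count(4): other steps
# (e.g. decompression) could still draw from a pool of this size even when
# use_threads=False is passed to one call.
GLOBAL_POOL_SIZE = 4

def convert_columns(columns, use_threads=True):
    """Convert each column; `use_threads` parallelizes only this step."""
    def convert(col):
        return [x * 2 for x in col]  # stand-in for Arrow-to-pandas conversion
    if not use_threads:
        return [convert(c) for c in columns]
    with ThreadPoolExecutor(max_workers=GLOBAL_POOL_SIZE) as pool:
        return list(pool.map(convert, columns))

cols = [[1, 2], [3, 4], [5, 6]]
# Both paths yield the same result; only the parallelism differs.
assert convert_columns(cols, use_threads=False) == convert_columns(cols, use_threads=True)
```

This is why the docstring fix matters: use_threads=False alone does not guarantee a single-threaded read, whereas lowering the global pool size does bound total threads.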
[jira] [Updated] (ARROW-13436) [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
[ https://issues.apache.org/jira/browse/ARROW-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13436: -- Labels: good-first-issue (was: ) > [Python][Doc] Clarify what should be expected if read_table is passed an > empty list of columns > -- > > Key: ARROW-13436 > URL: https://issues.apache.org/jira/browse/ARROW-13436 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Weston Pace >Assignee: Weston Pace >Priority: Major > Labels: good-first-issue > > The documentation for pyarrow.parquet.read_table states: > > * *columns* (_list_) – If not None, only these columns will be read from the > file. A column name may be a prefix of a nested field, e.g. ‘a’ will select > ‘a.b’, ‘a.c’, and ‘a.d.e’. > > It is not clear what should be the expected result if columns is an empty > list. In pyarrow 3.0 this read in all columns (as long as > use_legacy_dataset=False). In pyarrow 4.0 this doesn't read in any columns. > I think this behavior (not reading in any columns) is the correct behavior > (since None can be used for all columns) but we should clarify that in the > docs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
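The contract proposed in the issue (None means all columns, an empty list means no columns) can be modeled with a plain dict; this is a minimal sketch of the semantics, not pyarrow's implementation:

```python
def select_columns(table, columns=None):
    """Model of read_table's `columns` parameter.

    None selects every column; an empty list selects none, since None is
    already available for the "all columns" case.
    """
    if columns is None:
        return dict(table)
    return {name: table[name] for name in columns}

table = {"a": [1, 2], "b": [3, 4]}
assert select_columns(table) == table             # None -> all columns
assert select_columns(table, []) == {}            # [] -> no columns
assert select_columns(table, ["a"]) == {"a": [1, 2]}
```

Spelling this out in the docs removes the 3.0-vs-4.0 behavior ambiguity the issue describes.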
[jira] [Updated] (ARROW-13525) [Python] Mention alternatives in deprecation message of ParquetDataset attributes
[ https://issues.apache.org/jira/browse/ARROW-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13525: -- Fix Version/s: 6.0.0 > [Python] Mention alternatives in deprecation message of ParquetDataset > attributes > - > > Key: ARROW-13525 > URL: https://issues.apache.org/jira/browse/ARROW-13525 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Fix For: 5.0.1, 6.0.0 > > > Follow-up on ARROW-13074. > We should maybe also expose the {{partitioning}} attribute on ParquetDataset > (if constructed with {{use_legacy_dataset=False}}), as I did for the > {{filesystem}}/{{files}}/{{fragments}} attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13922) ParquetDataset throws error when len(path_or_paths) = 1
[ https://issues.apache.org/jira/browse/ARROW-13922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13922: -- Labels: good-second-issue (was: ) > ParquetDataset throws error when len(path_or_paths) = 1 > --- > > Key: ARROW-13922 > URL: https://issues.apache.org/jira/browse/ARROW-13922 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Ashish Gupta >Assignee: Weston Pace >Priority: Major > Labels: good-second-issue > > > After updating pyarrow to version 5.0.0, ParquetDataset doesn't take a list > of length 1 for path_or_paths. Is this by design or a bug? > > {code:java} > In [1]: import pyarrow.parquet as pq > In [2]: import pandas as pd > In [3]: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']}) > In [4]: df.to_parquet('test.parquet', index=False) > In [5]: pq.ParquetDataset('test.parquet', > use_legacy_dataset=False).read(use_threads=False).to_pandas() > Out[5]: >A B > 0 1 a > 1 2 b > 2 3 c > In [6]: pq.ParquetDataset(['test.parquet'], > use_legacy_dataset=False).read(use_threads=False).to_pandas() > --- > ValueErrorTraceback (most recent call last) > ValueError: cannot construct a FileSource from a path without a FileSystem > Exception ignored in: 'pyarrow._dataset._make_file_source' > Traceback (most recent call last): > File > "/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py", > line 1676, in __init__ > fragment = parquet_format.make_fragment(single_file, filesystem) > ValueError: cannot construct a FileSource from a path without a FileSystem > --- > ArrowInvalid Traceback (most recent call last) > in > > 1 pq.ParquetDataset(['test.parquet'], > use_legacy_dataset=False).read(use_threads=False).to_pandas()/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py > in __new__(cls, path_or_paths, filesystem, schema, metadata, > split_row_groups, validate_schema, filters, metadata_nthreads, > read_dictionary, memory_map, buffer_size, 
partitioning, use_legacy_dataset, > pre_buffer, coerce_int96_timestamp_unit) >1284 >1285 if not use_legacy_dataset: > -> 1286 return _ParquetDatasetV2( >1287 path_or_paths, filesystem=filesystem, >1288 > filters=filters,/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py > in __init__(self, path_or_paths, filesystem, filters, partitioning, > read_dictionary, buffer_size, memory_map, ignore_prefixes, pre_buffer, > coerce_int96_timestamp_unit, **kwargs) >1677 >1678 self._dataset = ds.FileSystemDataset( > -> 1679 [fragment], schema=fragment.physical_schema, >1680 format=parquet_format, >1681 > filesystem=fragment.filesystem/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/_dataset.pyx > in > pyarrow._dataset.Fragment.physical_schema.__get__()/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/error.pxi > in > pyarrow.lib.pyarrow_internal_check_status()/data/install/anaconda3/lib/python3.8/site-packages/pyarrow/error.pxi > in pyarrow.lib.check_status()ArrowInvalid: Called Open() on an uninitialized > FileSource > In [7]: pq.ParquetDataset(['test.parquet', 'test.parquet'], > use_legacy_dataset=False).read(use_threads=False).to_pandas() > Out[7]: >A B > 0 1 a > 1 2 b > 2 3 c > 3 1 a > 4 2 b > 5 3 c > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
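The traceback above shows make_fragment being reached with filesystem still None for the single-element list path. A likely shape of the fix is to resolve a default filesystem before building fragments, as the plain-string path already does; this sketch uses stand-in classes, not pyarrow's actual API:

```python
class LocalFileSystem:
    """Stand-in for a concrete filesystem implementation."""

def make_fragment(path, filesystem):
    # Mirrors the failing call: a path alone is not enough.
    if filesystem is None:
        raise ValueError(
            "cannot construct a FileSource from a path without a FileSystem")
    return (path, filesystem)

def build_fragments(paths, filesystem=None):
    """Build one fragment per path, inferring a filesystem when none is given."""
    if filesystem is None:
        filesystem = LocalFileSystem()  # infer a default before the loop
    return [make_fragment(p, filesystem) for p in paths]

# A length-1 list now behaves the same as a single string path.
frags = build_fragments(["test.parquet"])
assert len(frags) == 1
```

Until a fix lands, passing an explicit filesystem to ParquetDataset is the obvious workaround for length-1 lists.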
[jira] [Updated] (ARROW-13735) [Python] Creating a Map array with non-default field names segfaults
[ https://issues.apache.org/jira/browse/ARROW-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13735: -- Labels: good-second-issue (was: ) > [Python] Creating a Map array with non-default field names segfaults > > > Key: ARROW-13735 > URL: https://issues.apache.org/jira/browse/ARROW-13735 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Labels: good-second-issue > Fix For: 6.0.0 > > > With ARROW-13696, you can create a MapType with non-default field names (the > default being "key" and "value"). > However, when then trying to create an array with it from python tuples, it > crashes: > {code:python} > >>> t = pa.map_(pa.field("name", "string", nullable=False), "int64") > >>> pa.array([[('a', 1), ('b', 2)], [('c', 3)]], type=t) > ../src/arrow/array/array_nested.cc:192: Check failed: > self->list_type_->value_type()->Equals(data->child_data[0]->type) > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b882)[0x7f298d497882] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b800)[0x7f298d497800] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b822)[0x7f298d497822] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f298d497b81] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xb39d31)[0x7f298d0c5d31] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow8MapArray7SetDataERKSt10shared_ptrINS_9ArrayDataEE+0x198)[0x7f298d0c06be] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow8MapArrayC1ERKSt10shared_ptrINS_9ArrayDataEE+0x64)[0x7f298d0bed14] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN9__gnu_cxx13new_allocatorIN5arrow8MapArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEvPT_DpOT0_+0x49)[0x7f298d1a0f13] > 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt16allocator_traitsISaIN5arrow8MapArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEvRS2_PT_DpOT0_+0x38)[0x7f298d19ebe6] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow8MapArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC1IJRKSt10shared_ptrINS0_9ArrayDataES2_DpOT_+0xaf)[0x7f298d19b547] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow8MapArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7f298d195a64] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt12__shared_ptrIN5arrow8MapArrayELN9__gnu_cxx12_Lock_policyE2EEC2ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7f298d1918bc] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt10shared_ptrIN5arrow8MapArrayEEC1ISaIS1_EJRKS_INS0_9ArrayDataESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7f298d18f617] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZSt15allocate_sharedIN5arrow8MapArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataS3_IT_ERKT0_DpOT1_+0x38)[0x7f298d18d254] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZSt11make_sharedIN5arrow8MapArrayEJRKSt10shared_ptrINS0_9ArrayDataS2_IT_EDpOT0_+0x54)[0x7f298d1897b7] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xbf5d6a)[0x7f298d181d6a] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xbef0f3)[0x7f298d17b0f3] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7f298d173f6b] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow12ArrayBuilder6FinishEPSt10shared_ptrINS_5ArrayEE+0x115)[0x7f298d0e4ed9] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow12ArrayBuilder6FinishEv+0x47)[0x7f298d0e4fb7] > 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x28cc91)[0x7f29d05d2c91] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x292774)[0x7f29d05d8774] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x28ca00)[0x7f29d05d2a00] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x288f63)[0x7f29d05cef63] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(_ZN5arrow2py17ConvertPySequenceEP7_objectS2_NS0_19PyConversionOptionsEPNS_10MemoryPoolE+0xa9d)[0x7f29d05cadb7] > /home/joris/scipy/repos/arrow/python/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c890d)[0x7f29d08f190d] > /home/joris/miniconda3/envs/arrow-dev/bin/python(PyCFunction_Call+0x54)[0x5581d331a814] > /home/joris/miniconda3/envs/arrow-dev/bin/python(_PyObject_MakeTpCall+0x31e)[0x5581d332988e] >
[jira] [Updated] (ARROW-13735) [Python] Creating a Map array with non-default field names segfaults
[ https://issues.apache.org/jira/browse/ARROW-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13735: -- Labels: (was: good-second-issue) > [Python] Creating a Map array with non-default field names segfaults > > > Key: ARROW-13735 > URL: https://issues.apache.org/jira/browse/ARROW-13735 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Fix For: 6.0.0 > > > With ARROW-13696, you can create a MapType with non-default field names (the > default being "key" and "value"). > However, when then trying to create an array with it from python tuples, it > crashes: > {code:python} > >>> t = pa.map_(pa.field("name", "string", nullable=False), "int64") > >>> pa.array([[('a', 1), ('b', 2)], [('c', 3)]], type=t) > ../src/arrow/array/array_nested.cc:192: Check failed: > self->list_type_->value_type()->Equals(data->child_data[0]->type) > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b882)[0x7f298d497882] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b800)[0x7f298d497800] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xf0b822)[0x7f298d497822] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f298d497b81] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xb39d31)[0x7f298d0c5d31] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow8MapArray7SetDataERKSt10shared_ptrINS_9ArrayDataEE+0x198)[0x7f298d0c06be] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow8MapArrayC1ERKSt10shared_ptrINS_9ArrayDataEE+0x64)[0x7f298d0bed14] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN9__gnu_cxx13new_allocatorIN5arrow8MapArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEvPT_DpOT0_+0x49)[0x7f298d1a0f13] > 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt16allocator_traitsISaIN5arrow8MapArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEvRS2_PT_DpOT0_+0x38)[0x7f298d19ebe6] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow8MapArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC1IJRKSt10shared_ptrINS0_9ArrayDataES2_DpOT_+0xaf)[0x7f298d19b547] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow8MapArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7f298d195a64] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt12__shared_ptrIN5arrow8MapArrayELN9__gnu_cxx12_Lock_policyE2EEC2ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7f298d1918bc] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZNSt10shared_ptrIN5arrow8MapArrayEEC1ISaIS1_EJRKS_INS0_9ArrayDataESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7f298d18f617] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZSt15allocate_sharedIN5arrow8MapArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataS3_IT_ERKT0_DpOT1_+0x38)[0x7f298d18d254] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZSt11make_sharedIN5arrow8MapArrayEJRKSt10shared_ptrINS0_9ArrayDataS2_IT_EDpOT0_+0x54)[0x7f298d1897b7] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xbf5d6a)[0x7f298d181d6a] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(+0xbef0f3)[0x7f298d17b0f3] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7f298d173f6b] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow12ArrayBuilder6FinishEPSt10shared_ptrINS_5ArrayEE+0x115)[0x7f298d0e4ed9] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.600(_ZN5arrow12ArrayBuilder6FinishEv+0x47)[0x7f298d0e4fb7] > 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x28cc91)[0x7f29d05d2c91] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x292774)[0x7f29d05d8774] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x28ca00)[0x7f29d05d2a00] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(+0x288f63)[0x7f29d05cef63] > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_python.so.600(_ZN5arrow2py17ConvertPySequenceEP7_objectS2_NS0_19PyConversionOptionsEPNS_10MemoryPoolE+0xa9d)[0x7f29d05cadb7] > /home/joris/scipy/repos/arrow/python/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c890d)[0x7f29d08f190d] > /home/joris/miniconda3/envs/arrow-dev/bin/python(PyCFunction_Call+0x54)[0x5581d331a814] > /home/joris/miniconda3/envs/arrow-dev/bin/python(_PyObject_MakeTpCall+0x31e)[0x5581d332988e] >
[jira] [Commented] (ARROW-14260) [C++] GTest linker error with vcpkg and Visual Studio 2019
[ https://issues.apache.org/jira/browse/ARROW-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427008#comment-17427008 ] Antoine Pitrou commented on ARROW-14260: The CI job is building GTest using vcpkg and the CMake is building GTest from source during the Arrow build process. There's probably a mismatch between the two versions. > [C++] GTest linker error with vcpkg and Visual Studio 2019 > -- > > Key: ARROW-14260 > URL: https://issues.apache.org/jira/browse/ARROW-14260 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Ian Cook >Priority: Major > > The *test-build-vcpkg-win* nightly Crossbow job is failing with these linker > errors: > {code:java} > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) void __cdecl > testing::internal2::PrintBytesInObjectTo(unsigned char const *,unsigned > __int64,class std::basic_ostream > *)" > (__imp_?PrintBytesInObjectTo@internal2@testing@@YAXPEBE_KPEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z) > referenced in function "class std::basic_ostream std::char_traits > & __cdecl testing::internal2::operator<< std::char_traits,class std::_Vector_iterator std::_Vector_val > > > >(class std::basic_ostream > &,class > std::_Vector_iterator arrow::compute::ExecNode *> > > const &)" > (??$?6DU?$char_traits@D@std@@V?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@1@@internal2@testing@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@AEAV23@AEBV?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@3@@Z) > > unity_1_cxx.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > referenced in function "void __cdecl arrow::fs::AssertFileContents(class > 
arrow::fs::FileSystem *,class std::basic_string std::char_traits,class std::allocator > const &,class > std::basic_string,class > std::allocator > const &)" > (?AssertFileContents@fs@arrow@@YAXPEAVFileSystem@12@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@1@Z) > > unity_0_cxx.obj : error LNK2001: unresolved external symbol > "__declspec(dllimport) class testing::AssertionResult __cdecl > testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" > (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) > {code} > Link to the error where it occurs in the full log: > https://github.com/ursacomputing/crossbow/runs/3799925986#step:4:2737 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14260) [C++] GTest linker error with vcpkg and Visual Studio 2019
[ https://issues.apache.org/jira/browse/ARROW-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427007#comment-17427007 ]

Antoine Pitrou commented on ARROW-14260:
----------------------------------------

I would say it looks more like a GTest linking or packaging issue to me. cc [~kou] [~kszucs]

> [C++] GTest linker error with vcpkg and Visual Studio 2019
> ----------------------------------------------------------
>
>             Key: ARROW-14260
>             URL: https://issues.apache.org/jira/browse/ARROW-14260
>         Project: Apache Arrow
>      Issue Type: Bug
>      Components: C++
>        Reporter: Ian Cook
>        Priority: Major
>
> The *test-build-vcpkg-win* nightly Crossbow job is failing with these linker errors:
> {code:java}
> unity_1_cxx.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) void __cdecl testing::internal2::PrintBytesInObjectTo(unsigned char const *,unsigned __int64,class std::basic_ostream<char,struct std::char_traits<char> > *)" (__imp_?PrintBytesInObjectTo@internal2@testing@@YAXPEBE_KPEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z) referenced in function "class std::basic_ostream<char,struct std::char_traits<char> > & __cdecl testing::internal2::operator<<<char,struct std::char_traits<char>,class std::_Vector_iterator<class std::_Vector_val<struct std::_Simple_types<class arrow::compute::ExecNode *> > > >(class std::basic_ostream<char,struct std::char_traits<char> > &,class std::_Vector_iterator<class std::_Vector_val<struct std::_Simple_types<class arrow::compute::ExecNode *> > > const &)" (??$?6DU?$char_traits@D@std@@V?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@1@@internal2@testing@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@AEAV23@AEBV?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@3@@Z)
> unity_1_cxx.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) class testing::AssertionResult __cdecl testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) referenced in function "void __cdecl arrow::fs::AssertFileContents(class arrow::fs::FileSystem *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?AssertFileContents@fs@arrow@@YAXPEAVFileSystem@12@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@1@Z)
> unity_0_cxx.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) class testing::AssertionResult __cdecl testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z)
> {code}
> Link to the error where it occurs in the full log:
> https://github.com/ursacomputing/crossbow/runs/3799925986#step:4:2737

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-14260) [C++] GTest linker error with vcpkg and Visual Studio 2019
[ https://issues.apache.org/jira/browse/ARROW-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-14260:
-----------------------------------
    Summary: [C++] GTest linker error with vcpkg and Visual Studio 2019  (was: [C++] Linker error with Visual Studio 2019)

> [C++] GTest linker error with vcpkg and Visual Studio 2019
> ----------------------------------------------------------
>
>             Key: ARROW-14260
>             URL: https://issues.apache.org/jira/browse/ARROW-14260
>         Project: Apache Arrow
>      Issue Type: Bug
>      Components: C++
>        Reporter: Ian Cook
>        Priority: Major
>
> The *test-build-vcpkg-win* nightly Crossbow job is failing with these linker errors:
> {code:java}
> unity_1_cxx.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) void __cdecl testing::internal2::PrintBytesInObjectTo(unsigned char const *,unsigned __int64,class std::basic_ostream<char,struct std::char_traits<char> > *)" (__imp_?PrintBytesInObjectTo@internal2@testing@@YAXPEBE_KPEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z) referenced in function "class std::basic_ostream<char,struct std::char_traits<char> > & __cdecl testing::internal2::operator<<<char,struct std::char_traits<char>,class std::_Vector_iterator<class std::_Vector_val<struct std::_Simple_types<class arrow::compute::ExecNode *> > > >(class std::basic_ostream<char,struct std::char_traits<char> > &,class std::_Vector_iterator<class std::_Vector_val<struct std::_Simple_types<class arrow::compute::ExecNode *> > > const &)" (??$?6DU?$char_traits@D@std@@V?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@1@@internal2@testing@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@AEAV23@AEBV?$_Vector_iterator@V?$_Vector_val@U?$_Simple_types@PEAVExecNode@compute@arrow@@@std@@@std@@@3@@Z)
> unity_1_cxx.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) class testing::AssertionResult __cdecl testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z) referenced in function "void __cdecl arrow::fs::AssertFileContents(class arrow::fs::FileSystem *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?AssertFileContents@fs@arrow@@YAXPEAVFileSystem@12@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@1@Z)
> unity_0_cxx.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) class testing::AssertionResult __cdecl testing::internal::CmpHelperEQ(char const *,char const *,__int64,__int64)" (__imp_?CmpHelperEQ@internal@testing@@YA?AVAssertionResult@2@PEBD0_J1@Z)
> {code}
> Link to the error where it occurs in the full log:
> https://github.com/ursacomputing/crossbow/runs/3799925986#step:4:2737
[jira] [Updated] (ARROW-11238) [Python] Make SubTreeFileSystem print method more informative
[ https://issues.apache.org/jira/browse/ARROW-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-11238:
------------------------------------------
    Labels: good-first-issue  (was: )

> [Python] Make SubTreeFileSystem print method more informative
> -------------------------------------------------------------
>
>             Key: ARROW-11238
>             URL: https://issues.apache.org/jira/browse/ARROW-11238
>         Project: Apache Arrow
>      Issue Type: Improvement
>      Components: Python
>        Reporter: Ian Cook
>        Priority: Minor
>          Labels: good-first-issue
>
> The {{SubTreeFileSystem}} class does not have a {{__str__}} or {{__repr__}} method. Define these methods to show useful information when these objects are printed, such as a filesystem URI including the scheme.
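The requested methods could look roughly like the following pure-Python sketch. This is an illustrative model, not pyarrow's actual implementation; the class layout and attribute names (`base_path`, `base_fs`) are assumptions made for the sketch:

```python
# Illustrative sketch of __repr__/__str__ for a subtree-style filesystem
# wrapper. Names and attributes are hypothetical, not pyarrow internals.

class LocalFileSystem:
    def __repr__(self):
        return "LocalFileSystem()"


class SubTreeFileSystem:
    def __init__(self, base_path, base_fs):
        self.base_path = base_path  # root of the subtree inside base_fs
        self.base_fs = base_fs      # the wrapped filesystem

    def __repr__(self):
        # Surface the anchoring path and the wrapped filesystem, so a
        # printed object identifies which subtree it exposes.
        return (f"SubTreeFileSystem(base_path={self.base_path!r}, "
                f"base_fs={self.base_fs!r})")

    __str__ = __repr__


fs = SubTreeFileSystem("/bucket/data", LocalFileSystem())
print(fs)  # SubTreeFileSystem(base_path='/bucket/data', base_fs=LocalFileSystem())
```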
[jira] [Commented] (ARROW-14213) R arrow package not working on RStudio/Ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426995#comment-17426995 ]

Thomas Wutzler commented on ARROW-14213:
----------------------------------------

The admins responded that these packages are installed with the following versions:

ii  libcurl3:amd64             7.47.0-1ubuntu2.19  amd64  easy-to-use client-side URL transfer library (OpenSSL flavour)
ii  libcurl3-gnutls:amd64      7.47.0-1ubuntu2.19  amd64  easy-to-use client-side URL transfer library (GnuTLS flavour)
ii  libcurl4-gnutls-dev:amd64  7.47.0-1ubuntu2.19  amd64  development files and documentation for libcurl (GnuTLS flavour)
ii  openssl                    1.0.2g-1ubuntu4.20  amd64  Secure Sockets Layer toolkit - cryptographic utility

> R arrow package not working on RStudio/Ubuntu
> ---------------------------------------------
>
>             Key: ARROW-14213
>             URL: https://issues.apache.org/jira/browse/ARROW-14213
>         Project: Apache Arrow
>      Issue Type: Bug
>      Components: R
>     Environment: R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>        Reporter: Thomas Wutzler
>        Priority: Major
>
> I am trying to read feather files, generated in Python, into R with the arrow package.
> I run R 3.6.3 in an RStudio Server window on a Linux machine to which I have no other access.
> I get the message:
> {{Cannot call io___MemoryMappedFile__Open().}}
> Following the advice in the linked help file
> https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> I am creating this issue with the full log of the installation:
> > arrow::install_arrow(verbose = TRUE)
> Installing package into '/Net/Groups/BGI/scratch/twutz/R/atacama-library/3.6'
> (as 'lib' is unspecified)
> trying URL 'https://ftp5.gwdg.de/pub/misc/cran/src/contrib/arrow_5.0.0.2.tar.gz'
> Content type 'application/octet-stream' length 483642 bytes (472 KB)
> ==
> downloaded 472 KB
> * installing *source* package 'arrow' ...
> ** package 'arrow' successfully unpacked and MD5 sums checked
> ** using staged installation
> trying URL 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/ubuntu-16.04/arrow-5.0.0.2.zip'
> Content type 'binary/octet-stream' length 17214781 bytes (16.4 MB)
> ==
> downloaded 16.4 MB
> *** Successfully retrieved C++ binaries for ubuntu-16.04
> Binary package requires libcurl and openssl
> If installation fails, retry after installing those system requirements
> PKG_CFLAGS=-I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3
> PKG_LIBS=-L/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/lib -larrow_dataset -lparquet -larrow -larrow -larrow_bundled_dependencies -larrow_dataset -lparquet -lssl -lcrypto -lcurl
> ** libs
> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3 -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RTasks.cpp -o RTasks.o
> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3 -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c altrep.cpp -o altrep.o
> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3 -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c array.cpp -o array.o
> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3 -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c array_to_vector.cpp -o array_to_vector.o
> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/tmp/RtmpXvu6Oc/R.INSTALL1451f6ede9ea2/arrow/libarrow/arrow-5.0.0.2/include -DARROW_R_WITH_ARROW -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_S3 -I../inst/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security
[jira] [Updated] (ARROW-14273) PlasmaClient::Contains should return false before the corresponding object is sealed
[ https://issues.apache.org/jira/browse/ARROW-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chimucong updated ARROW-14273:
------------------------------
    Component/s: C++

> PlasmaClient::Contains should return false before the corresponding object is sealed
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-14273
>                 URL: https://issues.apache.org/jira/browse/ARROW-14273
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: chimucong
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
[jira] [Updated] (ARROW-14273) PlasmaClient::Contains should return false before the corresponding object is sealed
[ https://issues.apache.org/jira/browse/ARROW-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-14273:
-----------------------------------
    Labels: pull-request-available  (was: )

> PlasmaClient::Contains should return false before the corresponding object is sealed
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-14273
>                 URL: https://issues.apache.org/jira/browse/ARROW-14273
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: chimucong
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
[jira] [Created] (ARROW-14273) PlasmaClient::Contains should return false before the corresponding object is sealed
chimucong created ARROW-14273:
------------------------------

             Summary: PlasmaClient::Contains should return false before the corresponding object is sealed
                 Key: ARROW-14273
                 URL: https://issues.apache.org/jira/browse/ARROW-14273
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: chimucong
[jira] [Created] (ARROW-14272) PlasmaClient::Contains should return false before the corresponding object is sealed
chimucong created ARROW-14272:
------------------------------

             Summary: PlasmaClient::Contains should return false before the corresponding object is sealed
                 Key: ARROW-14272
                 URL: https://issues.apache.org/jira/browse/ARROW-14272
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++ - Plasma
            Reporter: chimucong

According to the documentation ([https://arrow.apache.org/docs/python/generated/pyarrow.plasma.PlasmaClient.html?highlight=contains#pyarrow.plasma.PlasmaClient.contains]), {{Contains}} should return false before the corresponding object is sealed.
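The documented contract can be modeled with a small pure-Python stand-in. This is not the real C++ {{PlasmaClient}}; the class and its in-memory bookkeeping are purely illustrative of the create/seal/contains lifecycle the report describes:

```python
# Toy model of the documented Plasma contract (not the real client):
# an object that has been created but not yet sealed must not be
# reported as present by contains().

class FakePlasmaClient:
    def __init__(self):
        self._objects = {}  # object_id -> sealed flag

    def create(self, object_id, size):
        # Allocates the buffer; the object is visible only to its creator.
        self._objects[object_id] = False

    def seal(self, object_id):
        # Sealing makes the object immutable and visible to readers.
        self._objects[object_id] = True

    def contains(self, object_id):
        # Per the documentation, only sealed objects count as present.
        return self._objects.get(object_id, False)


client = FakePlasmaClient()
client.create("obj1", 64)
print(client.contains("obj1"))  # False: created but not yet sealed
client.seal("obj1")
print(client.contains("obj1"))  # True: sealed objects are visible
```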
[jira] [Created] (ARROW-14271) [Java] Inconsistent logic for type IDs in Union vectors
Roee Shlomo created ARROW-14271:
--------------------------------

             Summary: [Java] Inconsistent logic for type IDs in Union vectors
                 Key: ARROW-14271
                 URL: https://issues.apache.org/jira/browse/ARROW-14271
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
    Affects Versions: 6.0.0
            Reporter: Roee Shlomo

The current logic for calculating the type IDs in UnionVector#getField and DenseUnionVector#getField is:
# DenseUnionVector uses an increasing counter
# UnionVector uses the ordinal of the type enum
# Both completely ignore the type IDs provided at construction as part of fieldType (if provided)

We encountered this inconsistency while testing a direct roundtrip of a union vector between pyarrow and Java with the C Data Interface ("direct" here means without using VectorSchemaRoot/RecordBatch). The type IDs differ after completing the roundtrip.