[jira] [Resolved] (ARROW-16296) [GLib][Parquet] Add missing casts for GArrowRoundMode
[ https://issues.apache.org/jira/browse/ARROW-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-16296. -- Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12971 [https://github.com/apache/arrow/pull/12971] > [GLib][Parquet] Add missing casts for GArrowRoundMode > - > > Key: ARROW-16296 > URL: https://issues.apache.org/jira/browse/ARROW-16296 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib, Parquet >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16296) [GLib][Parquet] Add missing casts for GArrowRoundMode
[ https://issues.apache.org/jira/browse/ARROW-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16296: --- Labels: pull-request-available (was: ) > [GLib][Parquet] Add missing casts for GArrowRoundMode > - > > Key: ARROW-16296 > URL: https://issues.apache.org/jira/browse/ARROW-16296 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib, Parquet >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16296) [GLib][Parquet] Add missing casts for GArrowRoundMode
Kouhei Sutou created ARROW-16296: Summary: [GLib][Parquet] Add missing casts for GArrowRoundMode Key: ARROW-16296 URL: https://issues.apache.org/jira/browse/ARROW-16296 Project: Apache Arrow Issue Type: Improvement Components: GLib, Parquet Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16295) [CI][Release] verify-rc-source-windows still uses windows-2016
[ https://issues.apache.org/jira/browse/ARROW-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16295: --- Labels: pull-request-available (was: ) > [CI][Release] verify-rc-source-windows still uses windows-2016 > -- > > Key: ARROW-16295 > URL: https://issues.apache.org/jira/browse/ARROW-16295 > Project: Apache Arrow > Issue Type: Test > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > windows-2016 is deprecated: > https://github.com/actions/virtual-environments/issues/4312 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16295) [CI][Release] verify-rc-source-windows still uses windows-2016
Kouhei Sutou created ARROW-16295: Summary: [CI][Release] verify-rc-source-windows still uses windows-2016 Key: ARROW-16295 URL: https://issues.apache.org/jira/browse/ARROW-16295 Project: Apache Arrow Issue Type: Test Components: Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou windows-2016 is deprecated: https://github.com/actions/virtual-environments/issues/4312 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16293) [CI][GLib] Tests are unstable
[ https://issues.apache.org/jira/browse/ARROW-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-16293. -- Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12964 [https://github.com/apache/arrow/pull/12964] > [CI][GLib] Tests are unstable > - > > Key: ARROW-16293 > URL: https://issues.apache.org/jira/browse/ARROW-16293 > Project: Apache Arrow > Issue Type: Test > Components: Continuous Integration, GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > 1. macOS test is timed out because ccache cache isn't available: > https://github.com/apache/arrow/runs/6134456502?check_suite_focus=true > 2. {{gparquet_row_group_metadata_equal()}} isn't stable on Windows: > https://github.com/apache/arrow/runs/6134457213?check_suite_focus=true#step:14:308 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16294) [C++] Improve performance of parquet readahead
[ https://issues.apache.org/jira/browse/ARROW-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16294: --- Labels: pull-request-available (was: ) > [C++] Improve performance of parquet readahead > -- > > Key: ARROW-16294 > URL: https://issues.apache.org/jira/browse/ARROW-16294 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Weston Pace >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The 7.0.0 readahead for parquet would read up to 256 row groups at once which > meant that, if the consumer were too slow, we would almost certainly run out > of memory. > ARROW-15410 improved readahead as a whole and, in the process, changed > parquet so it's always reading 1 row group in advance. > This is not always ideal in S3 scenarios. We may want to read many row > groups in advance if the row groups are small. To fix this we should > continue reading in parallel until there are at least batch_size * > batch_readahead rows being fetched. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16291) [Java]: Support JSE17 for Java Cookbooks
[ https://issues.apache.org/jira/browse/ARROW-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16291: --- Labels: pull-request-available (was: ) > [Java]: Support JSE17 for Java Cookbooks > > > Key: ARROW-16291 > URL: https://issues.apache.org/jira/browse/ARROW-16291 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: David Dali Susanibar Arce >Assignee: David Dali Susanibar Arce >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > > Realize changes needed to run cookbooks through JSE17. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16294) [C++] Improve performance of parquet readahead
[ https://issues.apache.org/jira/browse/ARROW-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526714#comment-17526714 ] David Li commented on ARROW-16294: -- This is very similar to ARROW-14648 right? Or rather ARROW-14648 is the fully general solution? > [C++] Improve performance of parquet readahead > -- > > Key: ARROW-16294 > URL: https://issues.apache.org/jira/browse/ARROW-16294 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Weston Pace >Priority: Major > > The 7.0.0 readahead for parquet would read up to 256 row groups at once which > meant that, if the consumer were too slow, we would almost certainly run out > of memory. > ARROW-15410 improved readahead as a whole and, in the process, changed > parquet so it's always reading 1 row group in advance. > This is not always ideal in S3 scenarios. We may want to read many row > groups in advance if the row groups are small. To fix this we should > continue reading in parallel until there are at least batch_size * > batch_readahead rows being fetched. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16294) [C++] Improve performance of parquet readahead
Weston Pace created ARROW-16294: --- Summary: [C++] Improve performance of parquet readahead Key: ARROW-16294 URL: https://issues.apache.org/jira/browse/ARROW-16294 Project: Apache Arrow Issue Type: Improvement Reporter: Weston Pace The 7.0.0 readahead for parquet would read up to 256 row groups at once which meant that, if the consumer were too slow, we would almost certainly run out of memory. ARROW-15410 improved readahead as a whole and, in the process, changed parquet so it's always reading 1 row group in advance. This is not always ideal in S3 scenarios. We may want to read many row groups in advance if the row groups are small. To fix this we should continue reading in parallel until there are at least batch_size * batch_readahead rows being fetched. -- This message was sent by Atlassian Jira (v8.20.7#820007)
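For illustration, here is a minimal sketch of the proposed policy; the names ({{PlanReadahead}}, {{RowGroup}}) are hypothetical and not Arrow APIs. Reads keep being issued until the rows already in flight reach batch_size * batch_readahead, so small row groups trigger many parallel reads while large row groups stop the loop after one or two.

{code:cpp}
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a row group's metadata; not an Arrow type.
struct RowGroup {
  int64_t num_rows;
};

// Returns the indices of the row groups to fetch eagerly: keep launching
// reads until the rows already in flight cover the readahead window of
// batch_size * batch_readahead rows.
std::vector<std::size_t> PlanReadahead(const std::vector<RowGroup>& row_groups,
                                       int64_t batch_size,
                                       int64_t batch_readahead) {
  const int64_t target_rows = batch_size * batch_readahead;
  std::vector<std::size_t> to_fetch;
  int64_t rows_in_flight = 0;
  for (std::size_t i = 0;
       i < row_groups.size() && rows_in_flight < target_rows; ++i) {
    to_fetch.push_back(i);
    rows_in_flight += row_groups[i].num_rows;
  }
  return to_fetch;
}
{code}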
[jira] [Resolved] (ARROW-15410) [C++][Datasets] Improve memory usage of datasets API when scanning parquet
[ https://issues.apache.org/jira/browse/ARROW-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-15410. - Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12228 [https://github.com/apache/arrow/pull/12228] > [C++][Datasets] Improve memory usage of datasets API when scanning parquet > -- > > Key: ARROW-15410 > URL: https://issues.apache.org/jira/browse/ARROW-15410 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Assignee: Weston Pace >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > This is a more targeted fix to improve memory usage when scanning parquet > files. It is related to broader issues like ARROW-14648 but those will > likely take longer to fix. The goal here is to make it possible to scan > large parquet datasets with many files where each file has reasonably sized > row groups (e.g. 1 million rows). Currently we run out of memory scanning a > configuration as simple as: > 21 parquet files > Each parquet file has 10 million rows split into row groups of size 1 million -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16273) [C++] Valgrind error in arrow-compute-scalar-test
[ https://issues.apache.org/jira/browse/ARROW-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526712#comment-17526712 ] Weston Pace commented on ARROW-16273: - I have not been able to reproduce this and it appears the nightly valgrind is now passing. I'm not sure if some issue got fixed concurrently or if this is just flaky. > [C++] Valgrind error in arrow-compute-scalar-test > - > > Key: ARROW-16273 > URL: https://issues.apache.org/jira/browse/ARROW-16273 > Project: Apache Arrow > Issue Type: Bug >Reporter: Weston Pace >Priority: Major > > Currently valgrind is failing earlier on the tpch-node-test and hash-join-node-test. Once we fix those tests it seems the next error is this:
> {noformat}
> [ RUN ] TestStringKernels/0.Strptime
> ==9928== Conditional jump or move depends on uninitialised value(s)
> ==9928==    at 0x411AEA2: arrow::TestInitialized(arrow::ArrayData const&) (gtest_util.cc:682)
> ==9928==    by 0xAE1C79: arrow::compute::(anonymous namespace)::ValidateOutput(arrow::ArrayData const&) (test_util.cc:287)
> ==9928==    by 0xAE23FC: arrow::compute::ValidateOutput(arrow::Datum const&) (test_util.cc:320)
> ==9928==    by 0xAE4946: arrow::compute::CheckScalarNonRecursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::Datum const&, arrow::compute::FunctionOptions const*) (test_util.cc:80)
> ==9928==    by 0xAE63A4: arrow::compute::CheckScalar(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::Datum, arrow::compute::FunctionOptions const*) (test_util.cc:108)
> ==9928==    by 0xAE7E28: arrow::compute::CheckScalarUnary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, arrow::Datum, arrow::Datum, arrow::compute::FunctionOptions const*) (test_util.cc:254)
> ==9928==    by 0xAE80D3: arrow::compute::CheckScalarUnary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<arrow::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<arrow::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, arrow::compute::FunctionOptions const*) (test_util.cc:260)
> ==9928==    by 0x9F783F: arrow::compute::BaseTestStringKernels<arrow::StringType>::CheckUnary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<arrow::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, arrow::compute::FunctionOptions const*) (scalar_string_test.cc:56)
> ==9928==    by 0xA2A62D: arrow::compute::TestStringKernels_Strptime_Test<arrow::StringType>::TestBody() (scalar_string_test.cc:1855)
> ==9928==    by 0x64974DC: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
> ==9928==    by 0x648E90C: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
> ==9928==    by 0x6469CDC: testing::Test::Run() (gtest.cc:2682)
> ==9928==    by 0x646A6FE: testing::TestInfo::Run() (gtest.cc:2861)
> ==9928==    by 0x646B0BD: testing::TestSuite::Run() (gtest.cc:3015)
> ==9928==    by 0x647B1DB: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
> ==9928==    by 0x6498497: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
> ==9928==    by 0x648FAF9: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
> ==9928==    by 0x64796A8: testing::UnitTest::Run() (gtest.cc:5438)
> ==9928==    by 0x4204918: RUN_ALL_TESTS() (gtest.h:2490)
> ==9928==    by 0x420495B: main (gtest_main.cc:52)
> ==9928==
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Cond
>    fun:_ZN5arrow15TestInitializedERKNS_9ArrayDataE
>    fun:_ZN5arrow7compute12_GLOBAL__N_114ValidateOutputERKNS_9ArrayDataE
>    fun:_ZN5arrow7compute14ValidateOutputERKNS_5DatumE
>    fun:_ZN5arrow7compute23CheckScalarNonRecursiveERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EERKSA_PKNS0_15FunctionOptionsE
>    fun:_ZN5arrow7compute11CheckScalarENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaIS8_EES8_PKNS0_15FunctionOptionsE
>    fun:_ZN5arrow7compute16CheckScalarUnaryENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_5DatumES7_PKNS0_15FunctionOptionsE
>    fun:_ZN5arrow7compute16CheckScalarUnaryENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10shared_ptrINS_8DataTypeEES6_S9_S6_PKNS0_15FunctionOptionsE
>    fun:_ZN5arrow7
[jira] [Resolved] (ARROW-16264) [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test
[ https://issues.apache.org/jira/browse/ARROW-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16264. - Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12944 [https://github.com/apache/arrow/pull/12944] > [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test > --- > > Key: ARROW-16264 > URL: https://issues.apache.org/jira/browse/ARROW-16264 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Assignee: Weston Pace >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > This is starting to show up once we fixed the valgrind errors in the tpch > node test. > https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=23628&view=results -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (ARROW-16173) [C++] Add benchmarks for temporal functions/kernels
[ https://issues.apache.org/jira/browse/ARROW-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-16173: -- Assignee: Rok Mihevc > [C++] Add benchmarks for temporal functions/kernels > --- > > Key: ARROW-16173 > URL: https://issues.apache.org/jira/browse/ARROW-16173 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: David Li >Assignee: Rok Mihevc >Priority: Major > Labels: good-second-issue, kernel > > See ML: https://lists.apache.org/thread/bp2f036sgfj72o46yqmglnx20zfc6tfq -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526694#comment-17526694 ] David Li commented on ARROW-16234: -- Ah, sounds reasonable. > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16293) [CI][GLib] Tests are unstable
[ https://issues.apache.org/jira/browse/ARROW-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16293: --- Labels: pull-request-available (was: ) > [CI][GLib] Tests are unstable > - > > Key: ARROW-16293 > URL: https://issues.apache.org/jira/browse/ARROW-16293 > Project: Apache Arrow > Issue Type: Test > Components: Continuous Integration, GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > 1. macOS test is timed out because ccache cache isn't available: > https://github.com/apache/arrow/runs/6134456502?check_suite_focus=true > 2. {{gparquet_row_group_metadata_equal()}} isn't stable on Windows: > https://github.com/apache/arrow/runs/6134457213?check_suite_focus=true#step:14:308 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16240) [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset with use_legacy_dataset=False
[ https://issues.apache.org/jira/browse/ARROW-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16240. - Resolution: Fixed Issue resolved by pull request 12955 [https://github.com/apache/arrow/pull/12955] > [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset > with use_legacy_dataset=False > --- > > Key: ARROW-16240 > URL: https://issues.apache.org/jira/browse/ARROW-16240 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Alenka Frim >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The {{pq.write_to_dataset}} (legacy implementation) supports the > {{row_group_size}}/{{chunk_size}} keyword to specify the row group size of > the written parquet files. > Now that we made {{use_legacy_dataset=False}} the default, this keyword > doesn't work anymore. > This is because {{dataset.write_dataset(..)}} doesn't support the parquet > {{row_group_size}} keyword. The {{ParquetFileWriteOptions}} class doesn't > support this keyword. > On the parquet side, this is also the only keyword that is not passed to the > {{ParquetWriter}} init (and thus to parquet's {{WriterProperties}} or > {{ArrowWriterProperties}}), but to the actual {{write_table}} call. In C++ > this can be seen at > https://github.com/apache/arrow/blob/76d064c729f5e2287bf2a2d5e02d1fb192ae5738/cpp/src/parquet/arrow/writer.h#L62-L71 > See discussion: > [https://github.com/apache/arrow/pull/12811#discussion_r845304218] -- This message was sent by Atlassian Jira (v8.20.7#820007)
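For reference, a minimal C++ sketch of the API shape described above, built on the public {{parquet::arrow::WriteTable()}} entry point: the row group size travels as the {{chunk_size}} call argument rather than inside {{WriterProperties}} or {{ArrowWriterProperties}}, which is why {{ParquetFileWriteOptions}} has nowhere to put it. The helper name {{WriteWithRowGroupSize}} is invented for illustration.

{code:cpp}
#include <cstdint>
#include <memory>

#include "arrow/io/api.h"
#include "arrow/memory_pool.h"
#include "arrow/status.h"
#include "arrow/table.h"
#include "parquet/arrow/writer.h"

// Illustrative helper (not an Arrow API): write `table` to `sink` with the
// requested row group size. Note that the row group size is passed per
// write call, not through the WriterProperties builders.
arrow::Status WriteWithRowGroupSize(
    const arrow::Table& table,
    std::shared_ptr<arrow::io::OutputStream> sink, int64_t row_group_size) {
  std::shared_ptr<parquet::WriterProperties> props =
      parquet::WriterProperties::Builder().build();
  std::shared_ptr<parquet::ArrowWriterProperties> arrow_props =
      parquet::ArrowWriterProperties::Builder().build();
  // chunk_size caps the number of rows per row group in the output file.
  return parquet::arrow::WriteTable(table, arrow::default_memory_pool(), sink,
                                    /*chunk_size=*/row_group_size, props,
                                    arrow_props);
}
{code}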
[jira] [Created] (ARROW-16293) [CI][GLib] Tests are unstable
Kouhei Sutou created ARROW-16293: Summary: [CI][GLib] Tests are unstable Key: ARROW-16293 URL: https://issues.apache.org/jira/browse/ARROW-16293 Project: Apache Arrow Issue Type: Test Components: Continuous Integration, GLib Reporter: Kouhei Sutou Assignee: Kouhei Sutou 1. macOS test is timed out because ccache cache isn't available: https://github.com/apache/arrow/runs/6134456502?check_suite_focus=true 2. {{gparquet_row_group_metadata_equal()}} isn't stable on Windows: https://github.com/apache/arrow/runs/6134457213?check_suite_focus=true#step:14:308 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526689#comment-17526689 ] Will Ayd commented on ARROW-16234: -- I was thinking they wouldn't - the returning array would just give back NULL where NULL was initially provided. You'll see this in the pandas docs as "na_option" https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15542) [GLib][Parquet] Add GParquet*Metadata
[ https://issues.apache.org/jira/browse/ARROW-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-15542. -- Fix Version/s: 8.0.0 Resolution: Fixed We can use {{parquet_arrow_file_reader_get_metadata()}}, {{gparquet_file_metadata_get_row_group()}}, {{gparquet_row_group_metadata_get_column_chunk()}}, {{gparquet_column_chunk_metadata_get_statistics()}}, {{gparquet_statistics_get_n_distinct_values()}}, {{gparquet_*_statistics_get_min()}} and {{gparquet_*_statistics_get_max()}} for this. > [GLib][Parquet] Add GParquet*Metadata > - > > Key: ARROW-15542 > URL: https://issues.apache.org/jira/browse/ARROW-15542 > Project: Apache Arrow > Issue Type: Wish > Components: GLib >Affects Versions: 6.0.1 >Reporter: FrankJiao >Assignee: Kouhei Sutou >Priority: Major > Fix For: 8.0.0 > > > how to read ColumnChunkMetaData in parquet-glib? -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16251) [GLib][Parquet] Add GParquetStatistics
[ https://issues.apache.org/jira/browse/ARROW-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-16251. -- Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12953 [https://github.com/apache/arrow/pull/12953] > [GLib][Parquet] Add GParquetStatistics > -- > > Key: ARROW-16251 > URL: https://issues.apache.org/jira/browse/ARROW-16251 > Project: Apache Arrow > Issue Type: Sub-task > Components: GLib, Parquet >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12885) [C++] Error: template with C linkage template
[ https://issues.apache.org/jira/browse/ARROW-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526683#comment-17526683 ] Kouhei Sutou commented on ARROW-12885: -- Thanks for the information. It seems that we need something like https://github.com/protocolbuffers/protobuf/pull/9065/files . > [C++] Error: template with C linkage template > --- > > Key: ARROW-12885 > URL: https://issues.apache.org/jira/browse/ARROW-12885 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Environment: IBM i | AS400 | AIX >Reporter: Menno >Priority: Major > Attachments: 2021-05-26 16_31_09-Window.png, thrift_ep-build-err.log > > > When installing arrow on IBM i it fails the install at the thrift dependency > install with the following output: > !2021-05-26 16_31_09-Window.png! -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16282) [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526678#comment-17526678 ] Kouhei Sutou commented on ARROW-16282: -- We have an issue for this: ARROW-16176. I don't think that this is a blocker for 8.0.0 because we didn't have support for Ubuntu 22.04 (OpenSSL 3) in the previous release. This does not break backward compatibility; it is just a new feature. > [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Assignee: Jacob Wujciak-Jens >Priority: Blocker > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related with missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15757) [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
[ https://issues.apache.org/jira/browse/ARROW-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-15757. - Resolution: Fixed Issue resolved by pull request 12838 [https://github.com/apache/arrow/pull/12838] > [Python] Missing bindings for existing_data_behavior makes it impossible to > maintain old behavior > -- > > Key: ARROW-15757 > URL: https://issues.apache.org/jira/browse/ARROW-15757 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet, Python >Affects Versions: 7.0.0 >Reporter: christophe bagot >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0, 7.0.1 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Shouldn't the missing bindings reported earlier in > [https://github.com/apache/arrow/pull/11632] be propagated higher up [here in > the parquet.py > module|https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L2217]? > Passing **kwargs as is the case for {{write_table}} would do the trick I > think. > I am finding myself stuck while using pandas.to_parquet with > {{use_legacy_dataset=false}} and no way to set the {{existing_data_behavior}} > flag to {{overwrite_or_ignore}} > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16291) [Java]: Support JSE17 for Java Cookbooks
[ https://issues.apache.org/jira/browse/ARROW-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Dali Susanibar Arce updated ARROW-16291: -- Fix Version/s: 9.0.0 > [Java]: Support JSE17 for Java Cookbooks > > > Key: ARROW-16291 > URL: https://issues.apache.org/jira/browse/ARROW-16291 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: David Dali Susanibar Arce >Assignee: David Dali Susanibar Arce >Priority: Major > Fix For: 9.0.0 > > > Realize changes needed to run cookbooks through JSE17. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16292) [Java][Doc]: Upgrade java documentation for JSE17
David Dali Susanibar Arce created ARROW-16292: - Summary: [Java][Doc]: Upgrade java documentation for JSE17 Key: ARROW-16292 URL: https://issues.apache.org/jira/browse/ARROW-16292 Project: Apache Arrow Issue Type: Sub-task Components: Documentation, Java Affects Versions: 9.0.0 Reporter: David Dali Susanibar Arce Assignee: David Dali Susanibar Arce Document the changes needed to support JSE17: # Changes on the Arrow side: changes related to {{--add-exports}} are needed to continue supporting errorProne on JSE11+ ([installation doc|https://errorprone.info/docs/installation]). This means you won't need these changes if you build the Arrow Java code without errorProne validation ({{mvn clean install -P-error-prone-jdk11+}}). # Changes as a user of Arrow: users planning to use Arrow with JSE17 need to pass the required modules. For example, running the IO cookbook [https://arrow.apache.org/cookbook/java/io.html] fails with the error {{Unable to make field long java.nio.Buffer.address accessible: module java.base does not "opens java.nio" to unnamed module}}; for that reason, as a JSE17 user (with no Arrow changes involved), it is necessary to add VM arguments such as {{-ea --add-opens=java.base/java.nio=ALL-UNNAMED}}, after which it finishes without errors. This ticket is related to https://github.com/apache/arrow/pull/12941#pullrequestreview-950090643 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16291) [Java]: Support JSE17 for Java Cookbooks
[ https://issues.apache.org/jira/browse/ARROW-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Dali Susanibar Arce updated ARROW-16291: -- Component/s: Java > [Java]: Support JSE17 for Java Cookbooks > > > Key: ARROW-16291 > URL: https://issues.apache.org/jira/browse/ARROW-16291 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: David Dali Susanibar Arce >Assignee: David Dali Susanibar Arce >Priority: Major > > Realize changes needed to run cookbooks through JSE17. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16291) [Java]: Support JSE17 for Java Cookbooks
David Dali Susanibar Arce created ARROW-16291: - Summary: [Java]: Support JSE17 for Java Cookbooks Key: ARROW-16291 URL: https://issues.apache.org/jira/browse/ARROW-16291 Project: Apache Arrow Issue Type: Sub-task Reporter: David Dali Susanibar Arce Assignee: David Dali Susanibar Arce Realize changes needed to run cookbooks through JSE17. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526665#comment-17526665 ] David Li commented on ARROW-16234: -- Not sure 'ignore nulls' works for sorting in general, how would nulls get compared to other items? > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526664#comment-17526664 ] David Li commented on ARROW-16234: -- We could offer function options for different modes which will also impact the return type. This is what we do for other similar kernels (e.g. quantile I think where you can choose to interpolate and get a float, or not interpolate and get the original data type). > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16280) [C++] Avoid copying shared_ptr in Expression::type()
[ https://issues.apache.org/jira/browse/ARROW-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li resolved ARROW-16280. -- Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12957 [https://github.com/apache/arrow/pull/12957] > [C++] Avoid copying shared_ptr in Expression::type() > > > Key: ARROW-16280 > URL: https://issues.apache.org/jira/browse/ARROW-16280 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Split off from ARROW-16161, since this is a fairly straightforward fix and > completely independent of ExecBatch. > Expression::type() currently copies a shared_ptr, while the return > value is often used directly. We can avoid copying the shared_ptr, by > returning a reference to it. This reduces thread contention on these > shared_ptrs (ARROW-16161). -- This message was sent by Atlassian Jira (v8.20.7#820007)
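A simplified sketch of the change (the class below is a stand-in, not Arrow's actual declaration): returning a const reference lets hot callers inspect the type without the atomic reference-count update that copying a {{shared_ptr}} performs, which is the contention noted in ARROW-16161.

{code:cpp}
#include <memory>

struct DataType {};  // stand-in for arrow::DataType

class Expression {
 public:
  // Before: every call copied the shared_ptr, bumping the atomic refcount.
  // std::shared_ptr<DataType> type() const { return type_; }

  // After: callers borrow the pointer; no refcount traffic on inspection.
  const std::shared_ptr<DataType>& type() const { return type_; }

 private:
  std::shared_ptr<DataType> type_;
};
{code}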
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526659#comment-17526659 ] Will Ayd commented on ARROW-16234: -- I think we also need to consider how to handle NULL. In my current design I was thinking we should delegate as much responsibility as possible to the standard sorting behavior, but AFAICT there are only SortOptions to rank NULLs at the start or the end, not necessarily to ignore NULL altogether. If we want to completely remove NULL from being calculated in the ranking algorithm I wonder if we should try and work that up the class hierarchy a bit to do the same thing in general sorting. > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526658#comment-17526658 ] Will Ayd commented on ARROW-16234: -- I pushed up a rough draft for this on GH just to make sure the foundation was right. However, I'm wondering if you think we should mirror what pandas does in cases of ties or pick another default. Pandas interpolates an average for tied rankings by default, which of course is going to change our returned data type. Not sure if we want to stray from the integral return value as a default or instead pick another thing like dense ranking > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
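To make the trade-off concrete, a self-contained sketch of the tie-handling modes under discussion (illustrative only, not the kernel implementation): min and dense tiebreakers keep integral ranks, while pandas-style average interpolation forces a floating-point result, which is exactly the return-type question raised above.

{code:cpp}
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

enum class Tiebreaker { Min, Dense, Average };

// Computes 1-based ranks of `values` under the given tie-handling mode.
// Returns doubles because Average can yield non-integral ranks, e.g.
// Rank({1, 2, 2, 3}, Tiebreaker::Average) == {1, 2.5, 2.5, 4}.
std::vector<double> Rank(const std::vector<double>& values, Tiebreaker mode) {
  const std::size_t n = values.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });
  std::vector<double> ranks(n);
  std::size_t i = 0, dense_rank = 0;
  while (i < n) {
    std::size_t j = i;
    while (j < n && values[order[j]] == values[order[i]]) ++j;  // tie run [i, j)
    ++dense_rank;
    for (std::size_t k = i; k < j; ++k) {
      switch (mode) {
        case Tiebreaker::Min:
          ranks[order[k]] = static_cast<double>(i + 1);
          break;
        case Tiebreaker::Dense:
          ranks[order[k]] = static_cast<double>(dense_rank);
          break;
        case Tiebreaker::Average:
          ranks[order[k]] = (i + 1 + j) / 2.0;  // mean of positions i+1..j
          break;
      }
    }
    i = j;
  }
  return ranks;
}
{code}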
[jira] [Commented] (ARROW-16289) [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays
[ https://issues.apache.org/jira/browse/ARROW-16289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526656#comment-17526656 ] Eduardo Ponce commented on ARROW-16289: --- The term Scalar is used in different (but related) contexts. For example, the notion of a Scalar value, Scalar kernels, Scalar expressions, etc. I recall an ad-hoc conversation last year where it was discussed that we should consider treating Scalars as a 1-element Array to make the compute layer logic more straightforward. The front-end API would still have the concept of a Scalar but it would be disguised as an Array for execution purposes. I think such a proposal has its merits, but we should decide where the concept of Scalar will remain and make these distinctions clear. > [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE > encoded arrays > > > Key: ARROW-16289 > URL: https://issues.apache.org/jira/browse/ARROW-16289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Priority: Major > > This JIRA is a proposal / discussion. I am not asserting this is the way to > go but I would like to consider it. > From the execution engine's perspective an exec batch's columns are always > either arrays or scalars. The only time we make use of scalars today is for > the four augmented columns (e.g. __filename). Once we have support for RLE > arrays a scalar could easily be encoded as an RLE array and there would be no > need to use scalars here. > The advantage would be reducing the complexity in exec nodes and avoiding > issues like ARROW-16288. It is already rather difficult to explain the idea > of a "scalar" and "vector" function and then have to turn around and explain > that the word "scalar" has an entirely different meaning when talking about > field shape. > I think it's worth considering taking this even further and removing the > concept from the compute layer entirely. Kernel functions that want to have > special logic for scalars could do so using the RLE array. This would be a > significant change to many kernels which currently declare the ANY shape and > determine which logic to apply within the kernel itself (e.g. there is one > array OR scalar kernel and not one kernel for each). > Admittedly there is probably a few instructions and a few bytes more to > handle an RLE scalar than the scalar we have today. However, this is just > different flavors of O(1) and not likely to have significant impact. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16234) [C++] Implement Rank Kernel
[ https://issues.apache.org/jira/browse/ARROW-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16234: --- Labels: C++ good-second-issue kernel pull-request-available (was: C++ good-second-issue kernel) > [C++] Implement Rank Kernel > --- > > Key: ARROW-16234 > URL: https://issues.apache.org/jira/browse/ARROW-16234 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Will Ayd >Assignee: Will Ayd >Priority: Minor > Labels: C++, good-second-issue, kernel, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Didn't see this in the library already so apologies if overlooked, but I > think it would be nice to add a compute kernel for ranking. Here is a similar > function in pandas: > [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16290) [C++] ExecuteScalarExpression, when calling a nullary function on a nullary batch, resets the batch length to 1
Weston Pace created ARROW-16290: --- Summary: [C++] ExecuteScalarExpression, when calling a nullary function on a nullary batch, resets the batch length to 1 Key: ARROW-16290 URL: https://issues.apache.org/jira/browse/ARROW-16290 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Weston Pace At the moment ARROW-16286 prevents us from using ExecuteScalarExpression on nullary functions. However, if we bypass constant folding, then we run into another problem. The batch passed to the function always has length = 1. This appears to be tied up with the logic of ExecBatchIterator that I don't quite follow entirely. However, we should be preserving the batch length of the input to ExecuteScalarExpression and passing that to the function. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16289) [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays
[ https://issues.apache.org/jira/browse/ARROW-16289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526637#comment-17526637 ] David Li commented on ARROW-16289: -- The concept of scalars would still exist (e.g. in expressions, options) so there's still potential for confusion though this would reduce it. Aggregations would presumably still return scalars, too. It does seem being able to accept scalars is more confusing than it's worth, though. > [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE > encoded arrays > > > Key: ARROW-16289 > URL: https://issues.apache.org/jira/browse/ARROW-16289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Priority: Major > > This JIRA is a proposal / discussion. I am not asserting this is the way to > go but I would like to consider it. > From the execution engine's perspective an exec batch's columns are always > either arrays or scalars. The only time we make use of scalars today is for > the four augmented columns (e.g. __filename). Once we have support for RLE > arrays a scalar could easily be encoded as an RLE array and there would be no > need to use scalars here. > The advantage would be reducing the complexity in exec nodes and avoiding > issues like ARROW-16288. It is already rather difficult to explain the idea > of a "scalar" and "vector" function and then have to turn around and explain > that the word "scalar" has an entirely different meaning when talking about > field shape. > I think it's worth considering taking this even further and removing the > concept from the compute layer entirely. Kernel functions that want to have > special logic for scalars could do so using the RLE array. This would be a > significant change to many kernels which currently declare the ANY shape and > determine which logic to apply within the kernel itself (e.g. there is one > array OR scalar kernel and not one kernel for each). > Admittedly there is probably a few instructions and a few bytes more to > handle an RLE scalar than the scalar we have today. However, this is just > different flavors of O(1) and not likely to have significant impact. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15015) [R] Test / CI flag for ensuring all tests are run?
[ https://issues.apache.org/jira/browse/ARROW-15015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane resolved ARROW-15015. Fix Version/s: 8.0.0 Resolution: Fixed Issue resolved by pull request 12940 [https://github.com/apache/arrow/pull/12940] > [R] Test / CI flag for ensuring all tests are run? > -- > > Key: ARROW-15015 > URL: https://issues.apache.org/jira/browse/ARROW-15015 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Jonathan Keane >Assignee: Jacob Wujciak-Jens >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We frequently skip [tests that depend on features that are not > available|https://github.com/apache/arrow/blob/b9ac245afef081339093cd1930153d6b18b0479d/r/tests/testthat/helper-skip.R#L24-L34] > which is nice (especially for CRAN tests, where the features might not be > buildable. > But should we have a CI flag that we could turn on (that is, off by default) > that forces all of these tests to be run so we can know there is at least one > build that runs all the tests? AFAICT, right now if we have no CI that > successfully builds parquet, for example, most of our parquet tests would > silently not run. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16289) [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays
[ https://issues.apache.org/jira/browse/ARROW-16289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526627#comment-17526627 ] Weston Pace commented on ARROW-16289: - CC [~lidavidm] [~edponce] [~apitrou] [~michalno] [~yibocai] > [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE > encoded arrays > > > Key: ARROW-16289 > URL: https://issues.apache.org/jira/browse/ARROW-16289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Priority: Major > > This JIRA is a proposal / discussion. I am not asserting this is the way to > go but I would like to consider it. > From the execution engine's perspective an exec batch's columns are always > either arrays or scalars. The only time we make use of scalars today is for > the four augmented columns (e.g. __filename). Once we have support for RLE > arrays a scalar could easily be encoded as an RLE array and there would be no > need to use scalars here. > The advantage would be reducing the complexity in exec nodes and avoiding > issues like ARROW-16288. It is already rather difficult to explain the idea > of a "scalar" and "vector" function and then have to turn around and explain > that the word "scalar" has an entirely different meaning when talking about > field shape. > I think it's worth considering taking this even further and removing the > concept from the compute layer entirely. Kernel functions that want to have > special logic for scalars could do so using the RLE array. This would be a > significant change to many kernels which currently declare the ANY shape and > determine which logic to apply within the kernel itself (e.g. there is one > array OR scalar kernel and not one kernel for each). > Admittedly there is probably a few instructions and a few bytes more to > handle an RLE scalar than the scalar we have today. However, this is just > different flavors of O(1) and not likely to have significant impact. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16282) [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16282: --- Labels: pull-request-available (was: ) > [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Assignee: Jacob Wujciak-Jens >Priority: Blocker > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related with missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16289) [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays
Weston Pace created ARROW-16289: --- Summary: [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays Key: ARROW-16289 URL: https://issues.apache.org/jira/browse/ARROW-16289 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Weston Pace This JIRA is a proposal / discussion. I am not asserting this is the way to go but I would like to consider it. From the execution engine's perspective an exec batch's columns are always either arrays or scalars. The only time we make use of scalars today is for the four augmented columns (e.g. __filename). Once we have support for RLE arrays a scalar could easily be encoded as an RLE array and there would be no need to use scalars here. The advantage would be reducing the complexity in exec nodes and avoiding issues like ARROW-16288. It is already rather difficult to explain the idea of a "scalar" and "vector" function and then have to turn around and explain that the word "scalar" has an entirely different meaning when talking about field shape. I think it's worth considering taking this even further and removing the concept from the compute layer entirely. Kernel functions that want to have special logic for scalars could do so using the RLE array. This would be a significant change to many kernels which currently declare the ANY shape and determine which logic to apply within the kernel itself (e.g. there is one array OR scalar kernel and not one kernel for each). Admittedly there is probably a few instructions and a few bytes more to handle an RLE scalar than the scalar we have today. However, this is just different flavors of O(1) and not likely to have significant impact. -- This message was sent by Atlassian Jira (v8.20.7#820007)
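A conceptual sketch of the idea, using hypothetical types rather than whatever RLE layout Arrow ends up with: a scalar column becomes an RLE array whose single run spans the batch length, so exec nodes only ever see arrays.

{code:cpp}
#include <cstdint>
#include <vector>

// Hypothetical run-length-encoded array: one value per run, with the
// exclusive end offset of each run. Not Arrow's actual RLE design.
template <typename T>
struct RleArray {
  std::vector<int64_t> run_ends;
  std::vector<T> values;
};

// A "scalar" is just an RLE array with a single run covering the batch,
// e.g. ScalarAsRle(42, 1024) stands for 42 repeated over a 1024-row batch.
template <typename T>
RleArray<T> ScalarAsRle(T value, int64_t batch_length) {
  return RleArray<T>{{batch_length}, {value}};
}
{code}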
[jira] [Created] (ARROW-16288) [C++] ValueDescr::SCALAR nearly unused and does not work for projection
Weston Pace created ARROW-16288: --- Summary: [C++] ValueDescr::SCALAR nearly unused and does not work for projection Key: ARROW-16288 URL: https://issues.apache.org/jira/browse/ARROW-16288 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Weston Pace First, there are almost no kernels that actually use this shape. Only the functions "all", "any", "list_element", "mean", "product", "struct_field", and "sum" have kernels with this shape. Most kernels that have special logic for scalars handle it by using {{ValueDescr::ANY}}. Second, when passing an expression to the project node, the expression must be bound based on the dataset schema. Since the binding happens based on a schema (and not a batch) the function is bound to ValueDescr::ARRAY (https://github.com/apache/arrow/blob/a16be6b7b6c8271202ff766b99c199b2e29bdfa8/cpp/src/arrow/compute/exec/expression.cc#L461). This results in an error if the function has only ValueDescr::SCALAR kernels and would likely be a problem even if the function had both types of kernels because it would get bound to the wrong kernel. The simplest fix may be to just get rid of ValueDescr and change all kernels to ValueDescr::ANY behavior. If we choose to keep it we will need to figure out how to handle this kind of binding. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (ARROW-16282) [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Wujciak-Jens reassigned ARROW-16282: -- Assignee: Jacob Wujciak-Jens > [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Assignee: Jacob Wujciak-Jens >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related with missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16121) [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16121. - Resolution: Fixed Issue resolved by pull request 12952 [https://github.com/apache/arrow/pull/12952] > [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset > > > Key: ARROW-16121 > URL: https://issues.apache.org/jira/browse/ARROW-16121 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Joris Van den Bossche >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The custom python ParquetDataset implementation exposes the {{metadata}}, > {{metadata_path}}, {{common_metadata}} and {{common_metadata_path}} > attributes, something for which we didn't add an equivalent to the new > dataset API. > Unless we still want to add something for this, we should deprecate those > attributes in the legacy ParquetDataset. > In addition, we should also deprecate passing the {{metadata}} keyword in the > ParquetDataset constructor. -- This message was sent by Atlassian Jira (v8.20.7#820007)
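For orientation, a minimal sketch of the legacy surface being deprecated (the path is a stand-in, and the {{use_legacy_dataset}} keyword is per the pyarrow docs of this era; treat the details as assumptions):

{code:python}
import pyarrow.parquet as pq

# "some_dataset/" is a hypothetical partitioned-dataset path.
dataset = pq.ParquetDataset("some_dataset/", use_legacy_dataset=True)
dataset.metadata              # deprecated
dataset.metadata_path         # deprecated
dataset.common_metadata       # deprecated
dataset.common_metadata_path  # deprecated
{code}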
[jira] [Commented] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading Ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526597#comment-17526597 ] Jacob Wujciak-Jens commented on ARROW-16282: This is most likely due to ubuntu 22 using openssl 3 https://discourse.ubuntu.com/t/openssl-3-0-transition-plans/24453 > [CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related with missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15989) [R] Implement rbind for Table and RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-15989. - Resolution: Fixed Issue resolved by pull request 12751 [https://github.com/apache/arrow/pull/12751] > [R] Implement rbind for Table and RecordBatch > - > > Key: ARROW-15989 > URL: https://issues.apache.org/jira/browse/ARROW-15989 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Affects Versions: 7.0.0 >Reporter: Will Jones >Assignee: Will Jones >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 12h 50m > Remaining Estimate: 0h > > In ARROW-15013 we implemented c() for Arrow arrays. We should now be able to > implement rbind for Tables and RecordBatches (rbind on batches would produce > a table). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels
[ https://issues.apache.org/jira/browse/ARROW-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16284. - Resolution: Fixed Issue resolved by pull request 12959 [https://github.com/apache/arrow/pull/12959] > [Python][Packaging] Use delocate-fuse to create universal2 wheels > - > > Key: ARROW-16284 > URL: https://issues.apache.org/jira/browse/ARROW-16284 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Previously we used specific universal2 configurations for vcpkg to build the > dependencies containing symbols for both architectures. This approach proved > to be fragile to vcpkg changes making it hard to upgrade the vcpkg version. > As an example https://github.com/apache/arrow/pull/12893 bumps the vcpkg > version where absl has stopped compiling for two CMAKE_OSX_ARCHITECTURES, it > has been already fixed in absl's upstream but that hasn't been released yet. > The new approach uses multibuild's delocate to build the wheels for both > arm64 and amd64 separately and fuse them in an upcoming step to a universal2 > wheel (using {{lipo}} under the hood). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16287) PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file
[ https://issues.apache.org/jira/browse/ARROW-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Barron updated ARROW-16287: Description: I'm trying to follow the example here: [https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-medata-files] to write an example partitioned dataset. But I'm consistently getting an error about non-equal schemas. Here's an MCVE: {code:python} from pathlib import Path import numpy as np import pandas as pd import pyarrow as pa import pyarrow.parquet as pq size = 100_000_000 partition_col = np.random.randint(0, 10, size) values = np.random.rand(size) table = pa.Table.from_pandas( pd.DataFrame({"partition_col": partition_col, "values": values}) ) metadata_collector = [] root_path = Path("random.parquet") pq.write_to_dataset( table, root_path, partition_cols=["partition_col"], metadata_collector=metadata_collector, ) # Write the ``_common_metadata`` parquet file without row groups statistics pq.write_metadata(table.schema, root_path / "_common_metadata") # Write the ``_metadata`` parquet file with row groups statistics of all files pq.write_metadata( table.schema, root_path / "_metadata", metadata_collector=metadata_collector ) {code} This raises the error {code} --- RuntimeError Traceback (most recent call last) Input In [92], in () > 1 pq.write_metadata( 2 table.schema, root_path / "_metadata", metadata_collector=metadata_collector 3 ) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/parquet.py:2324, in write_metadata(schema, where, metadata_collector, **kwargs) 2322 metadata = read_metadata(where) 2323 for m in metadata_collector: -> 2324 metadata.append_row_groups(m) 2325 metadata.write_metadata_file(where) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/_parquet.pyx:628, in pyarrow._parquet.FileMetaData.append_row_groups() RuntimeError: AppendRowGroups requires equal schemas. {code} But all schemas in the `metadata_collector` list seem to be the same: {code:python} all(metadata_collector[0].schema == meta.schema for meta in metadata_collector) # True {code} was: I'm trying to follow the example here: [https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-medata-files] to write an example partitioned dataset. But I'm consistently getting an error about non-equal schemas.
Here's an MCVE: ``` from pathlib import Path import numpy as np import pandas as pd import pyarrow as pa import pyarrow.parquet as pq size = 100_000_000 partition_col = np.random.randint(0, 10, size) values = np.random.rand(size) table = pa.Table.from_pandas( pd.DataFrame({"partition_col": partition_col, "values": values}) ) metadata_collector = [] root_path = Path("random.parquet") pq.write_to_dataset( table, root_path, partition_cols=["partition_col"], metadata_collector=metadata_collector, ) # Write the ``_common_metadata`` parquet file without row groups statistics pq.write_metadata(table.schema, root_path / "_common_metadata") # Write the ``_metadata`` parquet file with row groups statistics of all files pq.write_metadata( table.schema, root_path / "_metadata", metadata_collector=metadata_collector ) ``` This raises the error ``` --- RuntimeError Traceback (most recent call last) Input In [92], in () > 1 pq.write_metadata( 2 table.schema, root_path / "_metadata", metadata_collector=metadata_collector 3 ) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/parquet.py:2324, in write_metadata(schema, where, metadata_collector, **kwargs) 2322 metadata = read_metadata(where) 2323 for m in metadata_collector: -> 2324 metadata.append_row_groups(m) 2325 metadata.write_metadata_file(where) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/_parquet.pyx:628, in pyarrow._parquet.FileMetaData.append_row_groups() RuntimeError: AppendRowGroups requires equal schemas. ``` But all schemas in the `metadata_collector` list seem to be the same: ``` all(metadata_collector[0].schema == meta.schema for meta in metadata_collector) # True ``` > PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing > _metadata file > - > > Key: ARROW-16287 > URL: https://issues.apache.org/jira/browse/ARROW-16287 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet >Affects Versions: 7.0.0 > Environment: MacOS. Python 3.8.10. > pyarrow: '7.0.0' > pandas: '1.4.2' > numpy: '1.22.3' >Reporter: Kyle Barron >Priority: Major > > I'm trying to follow the example h
[jira] [Created] (ARROW-16287) PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file
Kyle Barron created ARROW-16287: --- Summary: PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file Key: ARROW-16287 URL: https://issues.apache.org/jira/browse/ARROW-16287 Project: Apache Arrow Issue Type: Bug Components: Parquet Affects Versions: 7.0.0 Environment: MacOS. Python 3.8.10. pyarrow: '7.0.0' pandas: '1.4.2' numpy: '1.22.3' Reporter: Kyle Barron I'm trying to follow the example here: [https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-medata-files] to write an example partitioned dataset. But I'm consistently getting an error about non-equal schemas. Here's an MCVE: ``` from pathlib import Path import numpy as np import pandas as pd import pyarrow as pa import pyarrow.parquet as pq size = 100_000_000 partition_col = np.random.randint(0, 10, size) values = np.random.rand(size) table = pa.Table.from_pandas( pd.DataFrame({"partition_col": partition_col, "values": values}) ) metadata_collector = [] root_path = Path("random.parquet") pq.write_to_dataset( table, root_path, partition_cols=["partition_col"], metadata_collector=metadata_collector, ) # Write the ``_common_metadata`` parquet file without row groups statistics pq.write_metadata(table.schema, root_path / "_common_metadata") # Write the ``_metadata`` parquet file with row groups statistics of all files pq.write_metadata( table.schema, root_path / "_metadata", metadata_collector=metadata_collector ) ``` This raises the error ``` --- RuntimeError Traceback (most recent call last) Input In [92], in () > 1 pq.write_metadata( 2 table.schema, root_path / "_metadata", metadata_collector=metadata_collector 3 ) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/parquet.py:2324, in write_metadata(schema, where, metadata_collector, **kwargs) 2322 metadata = read_metadata(where) 2323 for m in metadata_collector: -> 2324 metadata.append_row_groups(m) 2325 metadata.write_metadata_file(where) File ~/tmp/env/lib/python3.8/site-packages/pyarrow/_parquet.pyx:628, in pyarrow._parquet.FileMetaData.append_row_groups() RuntimeError: AppendRowGroups requires equal schemas. ``` But all schemas in the `metadata_collector` list seem to be the same: ``` all(metadata_collector[0].schema == meta.schema for meta in metadata_collector) # True ``` -- This message was sent by Atlassian Jira (v8.20.7#820007)
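For context, a plausible cause, not stated in the report and so flagged as an assumption: {{pq.write_to_dataset}} drops the partition columns from the files it writes, so the schemas gathered in {{metadata_collector}} lack {{partition_col}} while {{table.schema}} still includes it. A hedged workaround sketch, continuing the MCVE above:

{code:python}
# Assumption: the schemas differ only by the partition column; removing it
# from the schema passed to write_metadata may make them match again.
metadata_schema = table.schema.remove(
    table.schema.get_field_index("partition_col")
)
pq.write_metadata(
    metadata_schema,
    root_path / "_metadata",
    metadata_collector=metadata_collector,
)
{code}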
[jira] [Closed] (ARROW-13505) [R] Installation of Arrow fails on Debian Gnu/Linux
[ https://issues.apache.org/jira/browse/ARROW-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane closed ARROW-13505. -- Resolution: Done > [R] Installation of Arrow fails on Debian Gnu/Linux > --- > > Key: ARROW-13505 > URL: https://issues.apache.org/jira/browse/ARROW-13505 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 5.0.0 > Environment: Debian Gnu/Linux 11 > R 4.0.4 >Reporter: Amit Ramon >Priority: Major > Labels: Linux, debian, r > Attachments: arrow-1.log > > > I'm trying to install Arrow on Debian Gnu/Linux using R 4.0.4. Arrow is not > installed, and I tried both `install.packages("arrow")` and the script > provided by the Arrow project that contains the `install_arrow()` function. > The installation always fails at some point after lots of compilation with a > message > {code:java} > /usr/bin/ld: cannot find > /home/amit/tmp/Rtmp54dYjQ/R.INSTALL49a67e7832c7/arrow/libarrow/arrow-5.0.0/lib: > file format not recognized{code} > I've tried calling the `install_arrow()` function in the following ways: > > {code:java} > install_arrow(binary = TRUE, minimal = TRUE, verbose = TRUE) > install_arrow(binary = FALSE, minimal = TRUE, verbose = TRUE) {code} > I also tried to install the Arrow binaries (using the command in > [https://arrow.apache.org/install/]) > and then ran the above commands again but got the same error. > I'm attaching the log from running the first command above. > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12654) [R] Bundled C++ build fails with ccache
[ https://issues.apache.org/jira/browse/ARROW-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526574#comment-17526574 ] Jonathan Keane commented on ARROW-12654: We resolved an issue that looks very similar in ARROW-14638 and I suspect this will be resolved by that as well. I'm going to close this for now, but if you do still run into this issue after our next release (8.0.0, which should be done shortly) either re-open this, or please create a new Jira so we can dig into it. Thanks! > [R] Bundled C++ build fails with ccache > --- > > Key: ARROW-12654 > URL: https://issues.apache.org/jira/browse/ARROW-12654 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.0 > Environment: debian:buster-slim, pkg-config installed >Reporter: Stephan Radel >Priority: Major > Labels: Debian > > Dear Apache Team, > I'm not able to install Arrow 4.0.0 properly on Debian 10 via Docker. I tried > several switches from recommendation here: > [https://arrow.apache.org/docs/r/articles/install.html|https://arrow.apache.org/docs/r/articles/install.html.] > (ARROW_USE_PKG_CONFIG TRUE/FALSE, LIBARROW_BINARY TRUE/FALSE) but with no > success so far. > Here the latest log: > {code:java} > * installing *source* package ‘arrow’ ... > ** package ‘arrow’ successfully unpacked and MD5 sums checked > trying URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/debian-10/arrow-4.0.0.zip' > Error in download.file(from_url, to_file, quiet = quietly) : > cannot open URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/debian-10/arrow-4.0.0.zip' > *** No C++ binaries found for debian-10 > trying URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/src/arrow-4.0.0.zip' > Error in download.file(from_url, to_file, quiet = quietly) : > cannot open URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/src/arrow-4.0.0.zip' > trying URL > 'https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/arrow-4.0.0/apache-arrow-4.0.0.tar.gz' > Content type 'application/octet-stream' length 9042294 bytes (8.6 MB) > == > downloaded 8.6 MB > *** Successfully retrieved C++ source > *** Building C++ libraries > *** Building with MAKEFLAGS= -j2 > cmake > trying URL > 'https://github.com/Kitware/CMake/releases/download/v3.19.2/cmake-3.19.2-Linux-x86_64.tar.gz' > Content type 'application/octet-stream' length 42931014 bytes (40.9 MB) > == > downloaded 40.9 MB > arrow with > SOURCE_DIR="/tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp" > BUILD_DIR="/tmp/RtmpykibIC/file77bf609e654f" DEST_DIR="libarrow/arrow-4.0.0" > CMAKE="/tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake" > CC="ccache gcc" CXX="ccache g++ -std=gnu++11" LDFLAGS="-Wl,-z,relro" > ARROW_S3=OFF ARROW_MIMALLOC=OFF > ++ pwd > + : /tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow > + : /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > + : /tmp/RtmpykibIC/file77bf609e654f > + : libarrow/arrow-4.0.0 > + : /tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake > ++ cd /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > ++ pwd > + SOURCE_DIR=/tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > ++ mkdir -p libarrow/arrow-4.0.0 > ++ cd libarrow/arrow-4.0.0 > ++ pwd > + DEST_DIR=/tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow/libarrow/arrow-4.0.0 > + '[' '' = false ']' > + ARROW_DEFAULT_PARAM=OFF > + mkdir -p /tmp/RtmpykibIC/file77bf609e654f > + pushd /tmp/RtmpykibIC/file77bf609e654f > /tmp/RtmpykibIC/file77bf609e654f 
/tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow > + /tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake > -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF > -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON > -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON > -DARROW_MIMALLOC=OFF -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_S3=OFF > -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF -DARROW_WITH_LZ4=OFF > -DARROW_WITH_RE2=ON -DARROW_WITH_SNAPPY=OFF -DARROW_WITH_UTF8PROC=ON > -DARROW_WITH_ZLIB=OFF -DARROW_WITH_ZSTD=OFF -DCMAKE_BUILD_TYPE=Release > -DCMAKE_INSTALL_LIBDIR=lib > -DCMAKE_INSTALL_PREFIX=/tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow/libarrow/arrow-4.0.0 > -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON > -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON -G 'Unix > Makefiles' /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > -- Building using CMake version: 3.19.2 > -- The C compiler identification is GNU 8.3.0 > -- The CXX compiler identification is GNU 8.3.0 > -- Detecting C compiler ABI info > --
[jira] [Closed] (ARROW-12654) [R] Bundled C++ build fails with ccache
[ https://issues.apache.org/jira/browse/ARROW-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane closed ARROW-12654. -- Resolution: Fixed > [R] Bundled C++ build fails with ccache > --- > > Key: ARROW-12654 > URL: https://issues.apache.org/jira/browse/ARROW-12654 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.0 > Environment: debian:buster-slim, pkg-config installed >Reporter: Stephan Radel >Priority: Major > Labels: Debian > > Dear Apache Team, > I'm not able to install Arrow 4.0.0 properly on Debian 10 via Docker. I tried > several switches from recommendation here: > [https://arrow.apache.org/docs/r/articles/install.html|https://arrow.apache.org/docs/r/articles/install.html.] > (ARROW_USE_PKG_CONFIG TRUE/FALSE, LIBARROW_BINARY TRUE/FALSE) but with no > success so far. > Here the latest log: > {code:java} > * installing *source* package ‘arrow’ ... > ** package ‘arrow’ successfully unpacked and MD5 sums checked > trying URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/debian-10/arrow-4.0.0.zip' > Error in download.file(from_url, to_file, quiet = quietly) : > cannot open URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/debian-10/arrow-4.0.0.zip' > *** No C++ binaries found for debian-10 > trying URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/src/arrow-4.0.0.zip' > Error in download.file(from_url, to_file, quiet = quietly) : > cannot open URL > 'https://arrow-r-nightly.s3.amazonaws.com/libarrow/src/arrow-4.0.0.zip' > trying URL > 'https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/arrow-4.0.0/apache-arrow-4.0.0.tar.gz' > Content type 'application/octet-stream' length 9042294 bytes (8.6 MB) > == > downloaded 8.6 MB > *** Successfully retrieved C++ source > *** Building C++ libraries > *** Building with MAKEFLAGS= -j2 > cmake > trying URL > 'https://github.com/Kitware/CMake/releases/download/v3.19.2/cmake-3.19.2-Linux-x86_64.tar.gz' > Content type 'application/octet-stream' length 42931014 bytes (40.9 MB) > == > downloaded 40.9 MB > arrow with > SOURCE_DIR="/tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp" > BUILD_DIR="/tmp/RtmpykibIC/file77bf609e654f" DEST_DIR="libarrow/arrow-4.0.0" > CMAKE="/tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake" > CC="ccache gcc" CXX="ccache g++ -std=gnu++11" LDFLAGS="-Wl,-z,relro" > ARROW_S3=OFF ARROW_MIMALLOC=OFF > ++ pwd > + : /tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow > + : /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > + : /tmp/RtmpykibIC/file77bf609e654f > + : libarrow/arrow-4.0.0 > + : /tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake > ++ cd /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > ++ pwd > + SOURCE_DIR=/tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > ++ mkdir -p libarrow/arrow-4.0.0 > ++ cd libarrow/arrow-4.0.0 > ++ pwd > + DEST_DIR=/tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow/libarrow/arrow-4.0.0 > + '[' '' = false ']' > + ARROW_DEFAULT_PARAM=OFF > + mkdir -p /tmp/RtmpykibIC/file77bf609e654f > + pushd /tmp/RtmpykibIC/file77bf609e654f > /tmp/RtmpykibIC/file77bf609e654f /tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow > + /tmp/RtmpykibIC/file77bf5d6bcf01/cmake-3.19.2-Linux-x86_64/bin/cmake > -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF > -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON > -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON > -DARROW_MIMALLOC=OFF 
-DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_S3=OFF > -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF -DARROW_WITH_LZ4=OFF > -DARROW_WITH_RE2=ON -DARROW_WITH_SNAPPY=OFF -DARROW_WITH_UTF8PROC=ON > -DARROW_WITH_ZLIB=OFF -DARROW_WITH_ZSTD=OFF -DCMAKE_BUILD_TYPE=Release > -DCMAKE_INSTALL_LIBDIR=lib > -DCMAKE_INSTALL_PREFIX=/tmp/Rtmpru2gYI/R.INSTALL779923ea9560/arrow/libarrow/arrow-4.0.0 > -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON > -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON -G 'Unix > Makefiles' /tmp/RtmpykibIC/file77bf2da3d338/apache-arrow-4.0.0/cpp > -- Building using CMake version: 3.19.2 > -- The C compiler identification is GNU 8.3.0 > -- The CXX compiler identification is GNU 8.3.0 > -- Detecting C compiler ABI info > -- Detecting C compiler ABI info - done > -- Check for working C compiler: /usr/bin/ccache - skipped > -- Detecting C compile features > -- Detecting C compile features - done > -- Detecting CXX compiler ABI info > -- Detecting CXX compiler ABI info - done > -- Check for working CXX compiler: /usr/bin/ccache - skipped > -- Detecting CXX compile features > -- Detecting
[jira] [Assigned] (ARROW-16281) [R] [CI] Bump versions with the release of 4.2
[ https://issues.apache.org/jira/browse/ARROW-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld reassigned ARROW-16281: Assignee: Dragoș Moldovan-Grünfeld (was: Jacob Wujciak-Jens) > [R] [CI] Bump versions with the release of 4.2 > -- > > Key: ARROW-16281 > URL: https://issues.apache.org/jira/browse/ARROW-16281 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, R >Reporter: Jonathan Keane >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > > Now that R 4.2 is released, we should bump all of our R versions where we > have ones hardcoded. > This will mean dropping support for 3.4 entirely and adding in 4.0 to > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/r/github.linux.versions.yml#L34 > There are a few other places that we have hard-coded versions (we might need > to wait a few days for these to catch up): > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/tasks.yml#L1291-L1295 > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/.github/workflows/r.yml#L60 > (and a few other places in that file — though one note: we build an old > version of windows that uses rtools35 in the GHA CI so that we catch when we > break that — we'll want to keep that!) -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15163) [R] lubridate functions for 8.0.0
[ https://issues.apache.org/jira/browse/ARROW-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld resolved ARROW-15163. -- Resolution: Fixed > [R] lubridate functions for 8.0.0 > - > > Key: ARROW-15163 > URL: https://issues.apache.org/jira/browse/ARROW-15163 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Alessandro Molina >Assignee: Jonathan Keane >Priority: Major > Fix For: 8.0.0 > > > *Umbrella ticket for the Initiative aimed at reaching support for the most > important lubridate functions in the R bindings* -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-9235) [R] Support for `connection` class when reading and writing files
[ https://issues.apache.org/jira/browse/ARROW-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-9235. Resolution: Fixed Issue resolved by pull request 12323 [https://github.com/apache/arrow/pull/12323] > [R] Support for `connection` class when reading and writing files > - > > Key: ARROW-9235 > URL: https://issues.apache.org/jira/browse/ARROW-9235 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Michael Quinn >Assignee: Dewey Dunnington >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > We have an internal filesystem that we interact with through objects that > inherit from the connection class. These files aren't necessarily local, > making it slightly more complicated to read and write parquet files, for > example. > For now, we're generating raw vectors and using that to create the file. For > example, to read files > {noformat} > ReadParquet <- function(filename, ...) { > file <- file(filename, "rb") > on.exit(close(file)) > raw <- readBin(file, "raw", FileInfo(filename)$size) > return(arrow::read_parquet(raw, ...)) > } > {noformat} > And to write, > {noformat} > WriteParquet <- function(df, filepath, ...) { > stream <- BufferOutputStream$create() > write_parquet(df, stream, ...) > raw <- stream$finish()$data() > file <- file(filepath, "wb") > on.exit(close(file)) > writeBin(raw, file) > return(invisible()) > } > {noformat} > At the C++ level, we are interacting with `R_new_custom_connection` defined > here: > [https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h] > I've been very impressed with how feature-rich arrow is. It would be nice to > see this API supported as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16286) [C++] SimplifyWithGuarantee does not work with non-deterministic expressions
Weston Pace created ARROW-16286: --- Summary: [C++] SimplifyWithGuarantee does not work with non-deterministic expressions Key: ARROW-16286 URL: https://issues.apache.org/jira/browse/ARROW-16286 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Weston Pace If an expression is non-deterministic (e.g. "random") then SimplifyWithGuarantee may incorrectly think it can fold constants. For example, if the call is {{random()}} then {{SimplifyWithGuarantee}} will detect that all the arguments are constants (or, more accurately, there are zero non-constant arguments) and decide it can execute the expression immediately and fold it into a constant. We could maybe add a hack for the random case since it is the only nullary function but, in general, we will probably need a way to define functions as "non-deterministic" and prevent constant folding. -- This message was sent by Atlassian Jira (v8.20.7#820007)
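To make the folding hazard concrete, here is a small self-contained sketch (hypothetical Python, not Arrow's C++ implementation): a naive folder that treats "zero non-constant arguments" as foldable will collapse {{random()}} into a single frozen value.

{code:python}
import random

FUNCS = {"add": lambda a, b: a + b, "random": random.random}

def fold_constants(call):
    """Naive constant folding: evaluate a call whose arguments are all
    constants. This is exactly the step that is unsafe for
    non-deterministic functions."""
    name, args = call
    if all(isinstance(a, (int, float)) for a in args):
        return FUNCS[name](*args)
    return call

print(fold_constants(("add", (1, 2))))  # 3 -- safe: add is deterministic
print(fold_constants(("random", ())))   # one frozen value -- incorrect fold
{code}

Marking functions as non-deterministic and skipping them in the fold, as the issue suggests, removes the second case.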
[jira] [Updated] (ARROW-16189) [CI][C++] Implement CI on Apple M1 for C++
[ https://issues.apache.org/jira/browse/ARROW-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Wujciak-Jens updated ARROW-16189: --- Parent: ARROW-10657 Issue Type: Sub-task (was: Improvement) > [CI][C++] Implement CI on Apple M1 for C++ > -- > > Key: ARROW-16189 > URL: https://issues.apache.org/jira/browse/ARROW-16189 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++, Continuous Integration >Reporter: Jacob Wujciak-Jens >Priority: Critical > Fix For: 9.0.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16190) [CI][R] Implement CI on Apple M1 for R
[ https://issues.apache.org/jira/browse/ARROW-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Wujciak-Jens updated ARROW-16190: --- Parent: ARROW-10657 Issue Type: Sub-task (was: Improvement) > [CI][R] Implement CI on Apple M1 for R > -- > > Key: ARROW-16190 > URL: https://issues.apache.org/jira/browse/ARROW-16190 > Project: Apache Arrow > Issue Type: Sub-task > Components: Continuous Integration, R >Reporter: Jacob Wujciak-Jens >Priority: Critical > Fix For: 9.0.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16285) [CI][Python] Enable skipped kartothek integration tests
[ https://issues.apache.org/jira/browse/ARROW-16285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526561#comment-17526561 ] Jacob Wujciak-Jens commented on ARROW-16285: This issue is blocked until kartothek fixes the linked issue. > [CI][Python] Enable skipped kartothek integration tests > > > Key: ARROW-16285 > URL: https://issues.apache.org/jira/browse/ARROW-16285 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Jacob Wujciak-Jens >Priority: Critical > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12203) [C++][Python] Switch default Parquet version to 2.4
[ https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526558#comment-17526558 ] Antoine Pitrou commented on ARROW-12203: Marking this as critical for 9.0 so that we finally do it. > [C++][Python] Switch default Parquet version to 2.4 > --- > > Key: ARROW-12203 > URL: https://issues.apache.org/jira/browse/ARROW-12203 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Python >Reporter: Antoine Pitrou >Priority: Critical > Fix For: 9.0.0 > > > Currently, Parquet write APIs default to maximum-compatibility Parquet > version "1.0", which disables some logical types such as UINT32. We may want > to switch the default to "2.0" instead, to allow faithful representation of > more types. -- This message was sent by Atlassian Jira (v8.20.7#820007)
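Until the default changes, the format version is an explicit knob on the write APIs; a brief hedged sketch (parameter values per the pyarrow docs of this era, with "2.4" being the proposed new default):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# UINT32 is one of the logical types that the "1.0" compatibility default
# cannot represent faithfully.
table = pa.table({"u": pa.array([1, 2, 3], type=pa.uint32())})
pq.write_table(table, "u.parquet", version="2.4")
{code}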
[jira] [Commented] (ARROW-16181) [CI][C++] Valgrind failure in TPCH node tests
[ https://issues.apache.org/jira/browse/ARROW-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526559#comment-17526559 ] Krisztian Szucs commented on ARROW-16181: - [~apitrou] is this still valid? > [CI][C++] Valgrind failure in TPCH node tests > - > > Key: ARROW-16181 > URL: https://issues.apache.org/jira/browse/ARROW-16181 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Priority: Critical > Fix For: 8.0.0 > > > See > [https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=23077&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=7667] > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-12203) [C++][Python] Switch default Parquet version to 2.4
[ https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-12203: --- Priority: Critical (was: Major) > [C++][Python] Switch default Parquet version to 2.4 > --- > > Key: ARROW-12203 > URL: https://issues.apache.org/jira/browse/ARROW-12203 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Python >Reporter: Antoine Pitrou >Priority: Critical > Fix For: 9.0.0 > > > Currently, Parquet write APIs default to maximum-compatibility Parquet > version "1.0", which disables some logical types such as UINT32. We may want > to switch the default to "2.0" instead, to allow faithful representation of > more types. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16285) [CI][Python] Enable skipped kartothek integration tests
[ https://issues.apache.org/jira/browse/ARROW-16285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Wujciak-Jens updated ARROW-16285: --- Summary: [CI][Python] Enable skipped kartothek integration tests (was: [CI][Python} Enable skipped kartothek integration tests ) > [CI][Python] Enable skipped kartothek integration tests > > > Key: ARROW-16285 > URL: https://issues.apache.org/jira/browse/ARROW-16285 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Jacob Wujciak-Jens >Priority: Critical > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16285) [CI][Python} Enable skipped kartothek integration tests
Jacob Wujciak-Jens created ARROW-16285: -- Summary: [CI][Python} Enable skipped kartothek integration tests Key: ARROW-16285 URL: https://issues.apache.org/jira/browse/ARROW-16285 Project: Apache Arrow Issue Type: Task Components: Continuous Integration, Python Reporter: Jacob Wujciak-Jens -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12203) [C++][Python] Switch default Parquet version to 2.4
[ https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526556#comment-17526556 ] Krisztian Szucs commented on ARROW-12203: - Postponing to 9.0 > [C++][Python] Switch default Parquet version to 2.4 > --- > > Key: ARROW-12203 > URL: https://issues.apache.org/jira/browse/ARROW-12203 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Python >Reporter: Antoine Pitrou >Priority: Major > Fix For: 8.0.0 > > > Currently, Parquet write APIs default to maximum-compatibility Parquet > version "1.0", which disables some logical types such as UINT32. We may want > to switch the default to "2.0" instead, to allow faithful representation of > more types. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-12203) [C++][Python] Switch default Parquet version to 2.4
[ https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-12203: Fix Version/s: 9.0.0 (was: 8.0.0) > [C++][Python] Switch default Parquet version to 2.4 > --- > > Key: ARROW-12203 > URL: https://issues.apache.org/jira/browse/ARROW-12203 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Python >Reporter: Antoine Pitrou >Priority: Major > Fix For: 9.0.0 > > > Currently, Parquet write APIs default to maximum-compatibility Parquet > version "1.0", which disables some logical types such as UINT32. We may want > to switch the default to "2.0" instead, to allow faithful representation of > more types. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16182) [C++][CI] TPCH node tests timeout under ThreadSanitizer
[ https://issues.apache.org/jira/browse/ARROW-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16182. - Resolution: Fixed > [C++][CI] TPCH node tests timeout under ThreadSanitizer > --- > > Key: ARROW-16182 > URL: https://issues.apache.org/jira/browse/ARROW-16182 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Assignee: Sasha Krassovsky >Priority: Critical > Fix For: 8.0.0 > > > See > https://github.com/ursacomputing/crossbow/runs/6000716964?check_suite_focus=true#step:5:4854 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16182) [C++][CI] TPCH node tests timeout under ThreadSanitizer
[ https://issues.apache.org/jira/browse/ARROW-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526554#comment-17526554 ] Krisztian Szucs commented on ARROW-16182: - Seems to be resolved now https://github.com/apache/arrow/pull/12843#issuecomment-1106673435 > [C++][CI] TPCH node tests timeout under ThreadSanitizer > --- > > Key: ARROW-16182 > URL: https://issues.apache.org/jira/browse/ARROW-16182 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Assignee: Sasha Krassovsky >Priority: Critical > Fix For: 8.0.0 > > > See > https://github.com/ursacomputing/crossbow/runs/6000716964?check_suite_focus=true#step:5:4854 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16257) [R] Break-up as_date and as_datetime into individual functions
[ https://issues.apache.org/jira/browse/ARROW-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld resolved ARROW-16257. -- Resolution: Fixed > [R] Break-up as_date and as_datetime into individual functions > -- > > Key: ARROW-16257 > URL: https://issues.apache.org/jira/browse/ARROW-16257 > Project: Apache Arrow > Issue Type: Sub-task > Components: R >Reporter: Dragoș Moldovan-Grünfeld >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 8.0.0 > > > A follow-up from > [ARROW-15800|https://issues.apache.org/jira/browse/ARROW-15800]. > See also: https://github.com/apache/arrow/pull/12738#discussion_r854329903 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-15801) [R] Implement bindings for lubridate date-time helpers
[ https://issues.apache.org/jira/browse/ARROW-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld resolved ARROW-15801. -- Resolution: Fixed > [R] Implement bindings for lubridate date-time helpers > -- > > Key: ARROW-15801 > URL: https://issues.apache.org/jira/browse/ARROW-15801 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Dragoș Moldovan-Grünfeld >Assignee: Jonathan Keane >Priority: Major > Fix For: 8.0.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16219) [CI][Python] Install failure on s390x
[ https://issues.apache.org/jira/browse/ARROW-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16219. - Resolution: Fixed Issue resolved by pull request 12945 [https://github.com/apache/arrow/pull/12945] > [CI][Python] Install failure on s390x > - > > Key: ARROW-16219 > URL: https://issues.apache.org/jira/browse/ARROW-16219 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Antoine Pitrou >Assignee: Raúl Cumplido >Priority: Blocker > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Seems to happen quite reliably on Travis-CI: > https://app.travis-ci.com/github/apache/arrow/builds/249511328 > Perhaps just a matter of setting the SETUPTOOLS_SCM_VERSION environment > variable to some dummy value? -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16278) [CI] Git installation failure on homebrew
[ https://issues.apache.org/jira/browse/ARROW-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16278. - Resolution: Fixed Issue resolved by pull request 12958 [https://github.com/apache/arrow/pull/12958] > [CI] Git installation failure on homebrew > - > > Key: ARROW-16278 > URL: https://issues.apache.org/jira/browse/ARROW-16278 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Blocker > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Some builds are failing due to git unable to install on homebrew. This seems > to be related to the new git release: > _With the fixes for CVE-2022-24765 that are common with versions of_ > _Git 2.30.4, 2.31.3, 2.32.2, 2.33.3, 2.34.3, and 2.35.3, Git has_ > _been taught not to recognise repositories owned by other users, in_ > _order to avoid getting affected by their config files and hooks._ > _You can list the path to the safe/trusted repositories that may be_ > _owned by others on a multi-valued configuration variable_ > _safe.directory to override this behaviour, or use '*' to declare_ > _that you trust anything._ > Failed job example > https://github.com/apache/arrow/runs/6114985460?check_suite_focus=true: > {code:java} > Installing automake > Installing aws-sdk-cpp > Installing boost > Using brotli > Using c-ares > Installing ccache > Using cmake > Installing flatbuffers > Installing git > ==> Downloading https://ghcr.io/v2/homebrew/core/git/manifests/2.36.0 > ==> Downloading > https://ghcr.io/v2/homebrew/core/git/blobs/sha256:5739e703f9ad34dba01e343d76f363143f740bf6e05c945c8f19a073546c6ce5 > ==> Downloading from > https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:5739e703f9ad34dba01e343d76f363143f740bf6e05c945c8f19a073546c6ce5?se=2022-04-21T18%3A35%3A00Z&sig=ZdiaSBdomnIwd4Ga4PORXPs2%2FYZXrrLLaks61mgmyEs%3D&sp=r&spr=https&sr=b&sv=2019-12-12 > ==> Pouring git--2.36.0.big_sur.bottle.tar.gz > Error: The `brew link` step did not complete successfully > The formula built, but is not symlinked into /usr/local > Could not symlink etc/bash_completion.d/git-completion.bash > Target /usr/local/etc/bash_completion.d/git-completion.bash > is a symlink belonging to git@2.35.1. You can unlink it: > brew unlink git@2.35.1To force the link and overwrite all conflicting files: > brew link --overwrite gitTo list all files that would be deleted: > brew link --overwrite --dry-run gitPossible conflicting files are: > /usr/local/etc/bash_completion.d/git-completion.bash -> > /usr/local/Cellar/git@2.35.1/2.35.1/etc/bash_completion.d/git-completion.bash > /usr/local/etc/bash_completion.d/git-prompt.sh -> > /usr/local/Cellar/git@2.35.1/2.35.1/etc/bash_completion.d/git-prompt.sh > /usr/local/bin/git -> /usr/local/Cellar/git@2.35.1/2.35.1/bin/git > /usr/local/bin/git-cvsserver -> > /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-cvsserver > /usr/local/bin/git-receive-pack -> > /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-receive-pack > /usr/local/bin/git-shell -> /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-shell > /usr/local/bin/git-upload-archive -> > /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-upload-archive > /usr/local/bin/git-upload-pack -> > /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-upload-pack > Error: Could not symlink share/doc/git-doc/MyFirstContribution.html > Target /usr/local/share/doc/git-doc/MyFirstContribution.html > is a symlink belonging to git@2.35.1. 
You can unlink it: > brew unlink git@2.35.1To force the link and overwrite all conflicting files: > brew link --overwrite git@2.35.1To list all files that would be deleted: > brew link --overwrite --dry-run git@2.35.1 > Installing git has failed! > Installing glog > Installing grpc > Using llvm > Installing llvm@12 > Using lz4 > Installing minio > Installing ninja > Installing numpy > Using openssl@1.1 > Installing protobuf > Using python > Installing rapidjson > Installing snappy > Installing thrift > Using wget > Using zstd > Homebrew Bundle failed! 1 Brewfile dependency failed to install. > Error: Process completed with exit code 1. {code} > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16121) [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-16121: -- Fix Version/s: 8.0.0 (was: 9.0.0) > [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset > > > Key: ARROW-16121 > URL: https://issues.apache.org/jira/browse/ARROW-16121 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Joris Van den Bossche >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The custom python ParquetDataset implementation exposes the {{metadata}}, > {{metadata_path}}, {{common_metadata}} and {{common_metadata_path}} > attributes, something for which we didn't add an equivalent to the new > dataset API. > Unless we still want to add something for this, we should deprecate those > attributes in the legacy ParquetDataset. > In addition, we should also deprecate passing the {{metadata}} keyword in the > ParquetDataset constructor. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16121) [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526530#comment-17526530 ] Joris Van den Bossche commented on ARROW-16121: --- Moved back to 8.0.0 milestone, the PR is ready > [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset > > > Key: ARROW-16121 > URL: https://issues.apache.org/jira/browse/ARROW-16121 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Joris Van den Bossche >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The custom python ParquetDataset implementation exposes the {{metadata}}, > {{metadata_path}}, {{common_metadata}} and {{common_metadata_path}} > attributes, something for which we didn't add an equivalent to the new > dataset API. > Unless we still want to add something for this, we should deprecate those > attributes in the legacy ParquetDataset. > In addition, we should also deprecate passing the {{metadata}} keyword in the > ParquetDataset constructor. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16204) [C++][Dataset] Default error existing_data_behaviour for writing dataset ignores a single file
[ https://issues.apache.org/jira/browse/ARROW-16204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-16204. --- Resolution: Fixed Issue resolved by pull request 12898 [https://github.com/apache/arrow/pull/12898] > [C++][Dataset] Default error existing_data_behaviour for writing dataset > ignores a single file > -- > > Key: ARROW-16204 > URL: https://issues.apache.org/jira/browse/ARROW-16204 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: dataset, pull-request-available > Fix For: 8.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > While trying to understand a failing test in > https://github.com/apache/arrow/pull/12811#discussion_r851128672, I noticed > that the {{write_dataset}} function does not actually always raise an error > by default if there is already existing data in the target location. > The documentation says it will raise "if any data exists in the destination" > (which is also what I would expect), but in practice it seems that it does > ignore certain file names: > {code:python} > import pyarrow as pa > import pyarrow.dataset as ds > table = pa.table({'a': [1, 2, 3]}) > # write a first time to new directory: OK > >>> ds.write_dataset(table, "test_overwrite", format="parquet") > >>> !ls test_overwrite > part-0.parquet > # write a second time to the same directory: passes, but should raise? > >>> ds.write_dataset(table, "test_overwrite", format="parquet") > >>> !ls test_overwrite > part-0.parquet > # write another time to the same directory with different name: still passes > >>> ds.write_dataset(table, "test_overwrite", format="parquet", > >>> basename_template="data-{i}.parquet") > >>> !ls test_overwrite > data-0.parquet part-0.parquet > # now writing again finally raises an error > >>> ds.write_dataset(table, "test_overwrite", format="parquet") > ... > ArrowInvalid: Could not write to test_overwrite as the directory is not empty > and existing_data_behavior is to error > {code} > So it seems that, when checking whether existing data exists, it ignores > any files that match the basename template pattern. > cc [~westonpace] do you know if this was intentional? (I would find that a > strange corner case, and in any case it is also not documented) -- This message was sent by Atlassian Jira (v8.20.7#820007)
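Whatever the default ends up doing, callers can make the intent explicit; a short hedged sketch continuing the snippet above (option values per the pyarrow.dataset documentation):

{code:python}
import pyarrow.dataset as ds

# "error" (the default), "overwrite_or_ignore", and "delete_matching" are the
# documented options; being explicit sidesteps the corner case above.
ds.write_dataset(table, "test_overwrite", format="parquet",
                 existing_data_behavior="overwrite_or_ignore")
{code}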
[jira] [Resolved] (ARROW-15800) [R] Implement bindings for lubridate::as_date() and lubridate::as_datetime()
[ https://issues.apache.org/jira/browse/ARROW-15800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane resolved ARROW-15800. Resolution: Fixed Issue resolved by pull request 12738 [https://github.com/apache/arrow/pull/12738] > [R] Implement bindings for lubridate::as_date() and lubridate::as_datetime() > > > Key: ARROW-15800 > URL: https://issues.apache.org/jira/browse/ARROW-15800 > Project: Apache Arrow > Issue Type: Sub-task > Components: R >Reporter: Dragoș Moldovan-Grünfeld >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels
[ https://issues.apache.org/jira/browse/ARROW-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16284: --- Labels: pull-request-available (was: ) > [Python][Packaging] Use delocate-fuse to create universal2 wheels > - > > Key: ARROW-16284 > URL: https://issues.apache.org/jira/browse/ARROW-16284 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 8.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Previously we used specific universal2 configurations for vcpkg to build the > dependencies containing symbols for both architectures. This approach proved > to be fragile to vcpkg changes making it hard to upgrade the vcpkg version. > As an example https://github.com/apache/arrow/pull/12893 bumps the vcpkg > version where absl has stopped compiling for two CMAKE_OSX_ARCHITECTURES, it > has been already fixed in absl's upstream but that hasn't been released yet. > The new approach uses multibuild's delocate to build the wheels for both > arm64 and amd64 separately and fuse them in an upcoming step to a universal2 > wheel (using {{lipo}} under the hood). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels
[ https://issues.apache.org/jira/browse/ARROW-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16284: Fix Version/s: 8.0.0 > [Python][Packaging] Use delocate-fuse to create universal2 wheels > - > > Key: ARROW-16284 > URL: https://issues.apache.org/jira/browse/ARROW-16284 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Fix For: 8.0.0 > > > Previously we used specific universal2 configurations for vcpkg to build the > dependencies containing symbols for both architectures. This approach proved > to be fragile to vcpkg changes making it hard to upgrade the vcpkg version. > As an example https://github.com/apache/arrow/pull/12893 bumps the vcpkg > version where absl has stopped compiling for two CMAKE_OSX_ARCHITECTURES, it > has been already fixed in absl's upstream but that hasn't been released yet. > The new approach uses multibuild's delocate to build the wheels for both > arm64 and amd64 separately and fuse them in an upcoming step to a universal2 > wheel (using {{lipo}} under the hood). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels
Krisztian Szucs created ARROW-16284: --- Summary: [Python][Packaging] Use delocate-fuse to create universal2 wheels Key: ARROW-16284 URL: https://issues.apache.org/jira/browse/ARROW-16284 Project: Apache Arrow Issue Type: Improvement Components: Packaging, Python Reporter: Krisztian Szucs Previously we used specific universal2 configurations for vcpkg to build the dependencies containing symbols for both architectures. This approach proved to be fragile to vcpkg changes, making it hard to upgrade the vcpkg version. As an example, https://github.com/apache/arrow/pull/12893 bumps the vcpkg version where absl has stopped compiling for two CMAKE_OSX_ARCHITECTURES; it has already been fixed in absl's upstream, but that hasn't been released yet. The new approach uses multibuild's delocate to build the wheels for arm64 and amd64 separately and fuse them in a subsequent step into a universal2 wheel (using {{lipo}} under the hood). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels
[ https://issues.apache.org/jira/browse/ARROW-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-16284: --- Assignee: Krisztian Szucs > [Python][Packaging] Use delocate-fuse to create universal2 wheels > - > > Key: ARROW-16284 > URL: https://issues.apache.org/jira/browse/ARROW-16284 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > Previously we used specific universal2 configurations for vcpkg to build the > dependencies containing symbols for both architectures. This approach proved > to be fragile to vcpkg changes making it hard to upgrade the vcpkg version. > As an example https://github.com/apache/arrow/pull/12893 bumps the vcpkg > version where absl has stopped compiling for two CMAKE_OSX_ARCHITECTURES, it > has been already fixed in absl's upstream but that hasn't been released yet. > The new approach uses multibuild's delocate to build the wheels for both > arm64 and amd64 separately and fuse them in an upcoming step to a universal2 > wheel (using {{lipo}} under the hood). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16283) [Go] Cleanup Panics in new Buffered Reader
Matthew Topol created ARROW-16283: - Summary: [Go] Cleanup Panics in new Buffered Reader Key: ARROW-16283 URL: https://issues.apache.org/jira/browse/ARROW-16283 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Matthew Topol Assignee: Matthew Topol -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-6390) [Python][Flight] Add Python documentation / tutorial for Flight
[ https://issues.apache.org/jira/browse/ARROW-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526515#comment-17526515 ] David Li commented on ARROW-6390: - ARROW-16065 adds a basic Python/Flight documentation page. Maybe further work can be done in the cookbook? > [Python][Flight] Add Python documentation / tutorial for Flight > --- > > Key: ARROW-6390 > URL: https://issues.apache.org/jira/browse/ARROW-6390 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, FlightRPC, Python >Reporter: Wes McKinney >Assignee: Alessandro Molina >Priority: Major > > There is no Sphinx documentation for using Flight from Python. I have found > that writing documentation is an effective way to uncover usability problems > -- I would suggest we write comprehensive documentation for using Flight from > Python as a way to refine the public Python API -- This message was sent by Atlassian Jira (v8.20.7#820007)
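[Editor's note] To give a flavor of what such a documentation page can cover, here is a minimal, self-contained Flight server and client sketch. The port, ticket contents, and table are arbitrary choices for illustration, not the content of the documentation page itself:
{code:python}
import pyarrow as pa
import pyarrow.flight as flight

class TinyServer(flight.FlightServerBase):
    """Serves one fixed table, regardless of the ticket the client sends."""

    def do_get(self, context, ticket):
        table = pa.table({"x": [1, 2, 3]})
        return flight.RecordBatchStream(table)

# Server side (blocking call, typically run in its own process):
#     TinyServer("grpc://0.0.0.0:8815").serve()

# Client side, once a server is listening:
client = flight.connect("grpc://localhost:8815")
reader = client.do_get(flight.Ticket(b"anything"))
print(reader.read_all())
{code}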
[jira] [Commented] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526514#comment-17526514 ] Krisztian Szucs commented on ARROW-16282: - cc [~eerhardt] > [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related to a missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282 ] Krisztian Szucs deleted comment on ARROW-16282: - was (Author: kszucs): cc @eerhardt > [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related to a missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526513#comment-17526513 ] Krisztian Szucs commented on ARROW-16282: - cc @eerhardt > [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related to a missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-9825) [FlightRPC] Add a "Flight SQL" extension on top of FlightRPC
[ https://issues.apache.org/jira/browse/ARROW-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li resolved ARROW-9825. - Resolution: Fixed This was added; we can open follow-up tickets for any other work. > [FlightRPC] Add a "Flight SQL" extension on top of FlightRPC > - > > Key: ARROW-9825 > URL: https://issues.apache.org/jira/browse/ARROW-9825 > Project: Apache Arrow > Issue Type: New Feature > Components: FlightRPC, Format >Reporter: Ryan Nicholson >Priority: Major > > As a developer of database clients and backends, I would like to have a > standard in place to communicate between the two while being able to leverage > the data transfer features of Arrow Flight. > The Arrow Flight RPC specification allows for extensibility by using opaque > payloads to perform "Actions", specify "Commands" and so on. > I propose the addition of a Flight SQL extension consisting of predefined > protobuf messages and workflows to enable features such as: > * Discovering specific database capabilities from generic clients. > * Browsing catalogs. > * Executing different types of SQL commands. > Supporting documentation and a POC changelist will be sent to the mailing > list in the coming days, describing the protobuf messages and workflows > enabling these features. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526487#comment-17526487 ] Lance Dacey commented on ARROW-12358: - Nice, thanks. I can try to test with a nightly build this weekend. > [C++][Python][R][Dataset] Control overwriting vs appending when writing to > existing dataset > --- > > Key: ARROW-12358 > URL: https://issues.apache.org/jira/browse/ARROW-12358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Weston Pace >Priority: Major > Labels: dataset > Fix For: 9.0.0 > > > Currently, the dataset writing (eg with {{pyarrow.dataset.write_dataset}}) > uses a fixed filename template ({{"part\{i\}.ext"}}). This means that when > you are writing to an existing dataset, you de facto overwrite previous data > when using this default template. > There is some discussion in ARROW-10695 about how the user can avoid this by > ensuring the file names are unique (the user can specify the > {{basename_template}} to be something unique). There is also ARROW-7706 about > silently doubling data (so _not_ overwriting existing data) with the legacy > {{parquet.write_to_dataset}} implementation. > It could be good to have a "mode" when writing datasets that controls the > different possible behaviours. And erroring when there is pre-existing data > in the target directory is maybe the safest default, because both appending > vs overwriting silently can be surprising behaviour depending on your > expectations. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526483#comment-17526483 ] Weston Pace commented on ARROW-12358: - [~ldacey] now that ARROW-16159 has merged this is probably ready to test again. Are you able to test with the nightly builds? Or do you want to wait for the release? > [C++][Python][R][Dataset] Control overwriting vs appending when writing to > existing dataset > --- > > Key: ARROW-12358 > URL: https://issues.apache.org/jira/browse/ARROW-12358 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Weston Pace >Priority: Major > Labels: dataset > Fix For: 9.0.0 > > > Currently, the dataset writing (eg with {{pyarrow.dataset.write_dataset}}) > uses a fixed filename template ({{"part\{i\}.ext"}}). This means that when > you are writing to an existing dataset, you de facto overwrite previous data > when using this default template. > There is some discussion in ARROW-10695 about how the user can avoid this by > ensuring the file names are unique (the user can specify the > {{basename_template}} to be something unique). There is also ARROW-7706 about > silently doubling data (so _not_ overwriting existing data) with the legacy > {{parquet.write_to_dataset}} implementation. > It could be good to have a "mode" when writing datasets that controls the > different possible behaviours. And erroring when there is pre-existing data > in the target directory is maybe the safest default, because both appending > vs overwriting silently can be surprising behaviour depending on your > expectations. -- This message was sent by Atlassian Jira (v8.20.7#820007)
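[Editor's note] As a sketch of the behavior under test here: the snippet below assumes the {{existing_data_behavior}} option that this line of work adds to {{pyarrow.dataset.write_dataset}} (value names as merged at the time; treat them as illustrative rather than final):
{code:python}
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({"part": ["a", "a", "b"], "value": [1, 2, 3]})

# Default mode: raise if the target directory already contains data.
ds.write_dataset(table, "out", format="parquet", partitioning=["part"],
                 existing_data_behavior="error")

# Append-style mode: keep existing files; a unique basename_template avoids
# silently overwriting the files from the first write.
ds.write_dataset(table, "out", format="parquet", partitioning=["part"],
                 basename_template="batch-2-{i}.parquet",
                 existing_data_behavior="overwrite_or_ignore")
{code}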
[jira] [Commented] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
[ https://issues.apache.org/jira/browse/ARROW-16282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526475#comment-17526475 ] Raúl Cumplido commented on ARROW-16282: --- I can reproduce locally by using: {code:java} $ UBUNTU=22.04 docker-compose run --rm ubuntu-verify-rc {code} > [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu > to 22.04 > - > > Key: ARROW-16282 > URL: https://issues.apache.org/jira/browse/ARROW-16282 > Project: Apache Arrow > Issue Type: Bug > Components: C#, Continuous Integration >Reporter: Raúl Cumplido >Priority: Blocker > Fix For: 8.0.0 > > > We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu > 22.04 and we can see how the nightly release job has been failing since then. > Working for ubuntu 20.04 on 2022-04-08: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] > Failing for ubuntu 22.04 on 2022-04-09: > [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] > The error seems to be related to a missing libssl: > {code:java} > === > Build and test C# libraries > === > └ Ensuring that C# is installed... > └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! > - > SDK Version: 3.1.405Telemetry > - > The .NET Core tools collect usage data in order to help us improve your > experience. It is collected by Microsoft and shared with the community. You > can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT > environment variable to '1' or 'true' using your favorite shell.Read more > about .NET Core CLI Tools telemetry: > https://aka.ms/dotnet-cli-telemetry > Explore documentation: https://aka.ms/dotnet-docs > Report issues and find source on GitHub: https://github.com/dotnet/core > Find out what's new: https://aka.ms/dotnet-whats-new > Learn about the installed HTTPS developer cert: > https://aka.ms/aspnet-core-https > Use 'dotnet --help' to see available commands or visit: > https://aka.ms/dotnet-cli-docs > Write your first app: https://aka.ms/first-net-core-app > -- > No usable version of libssl was found > /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted > (core dumped) dotnet tool install --tool-path ${csharp_bin} > sourcelink > Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. > 134 > Error: `docker-compose --file > /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e > VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 > ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log > above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Closed] (ARROW-14591) [R] Implement bindings for lubridate duration types
[ https://issues.apache.org/jira/browse/ARROW-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld closed ARROW-14591. Resolution: Fixed > [R] Implement bindings for lubridate duration types > --- > > Key: ARROW-14591 > URL: https://issues.apache.org/jira/browse/ARROW-14591 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nicola Crane >Assignee: Jonathan Keane >Priority: Major > Fix For: 8.0.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-14596) [Python] parquet.read_table nested fields in columns does not work for use_legacy_dataset=False
[ https://issues.apache.org/jira/browse/ARROW-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526407#comment-17526407 ] Alenka Frim commented on ARROW-14596: - I would like to add the observations we made today when pairing with [~jorisvandenbossche] on this topic. The first concerns the result of using {{pq.read_table}} with the legacy implementation vs using {{ds.dataset}} with column projection. The data is selected correctly with the dataset implementation, but the structure of the nested field is not kept (the struct is flattened to a string column). When using column selection with a list in {{ds.dataset}}, it errors, as reported in the issue.
{code:python}
>>> import pandas as pd
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>>
>>> df = pd.DataFrame({
...     'user_id': ['abc123', 'qrs456'],
...     'interaction': [{'type': 'click', 'element': 'button'}, {'type': 'scroll', 'element': 'window'}]
... })
>>>
>>> table = pa.Table.from_pandas(df)
>>> pq.write_table(table, 'example.parquet')
{code}
{code:python}
>>> pq.read_table('example.parquet', columns=['user_id', 'interaction.type'],
...               use_legacy_dataset=True)
pyarrow.Table
user_id: string
interaction: struct<type: string>
  child 0, type: string
----
user_id: [["abc123","qrs456"]]
interaction: [
  -- is_valid: all not null
  -- child 0 type: string
    ["click","scroll"]]
{code}
{code:python}
>>> import pyarrow.dataset as ds
>>> projection = {
...     'user_id': ds.field('user_id'),
...     'new': ds.field(('interaction', 'type'))
... }
>>> ds.dataset('example.parquet').to_table(columns=projection)
pyarrow.Table
user_id: string
new: string
----
user_id: [["abc123","qrs456"]]
new: [["click","scroll"]]
{code}
{code:python}
>>> ds.dataset('example.parquet').to_table(columns=['user_id', 'interaction.type'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_dataset.pyx", line 303, in pyarrow._dataset.Dataset.to_table
    return self.scanner(**kwargs).to_table()
  File "pyarrow/_dataset.pyx", line 270, in pyarrow._dataset.Dataset.scanner
    return Scanner.from_dataset(self, **kwargs)
  File "pyarrow/_dataset.pyx", line 2322, in pyarrow._dataset.Scanner.from_dataset
    _populate_builder(builder, columns=columns, filter=filter,
  File "pyarrow/_dataset.pyx", line 2168, in pyarrow._dataset._populate_builder
    check_status(builder.ProjectColumns([tobytes(c) for c in columns]))
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(interaction.type) in user_id: string
interaction: struct<type: string, element: string>
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
/Users/alenkafrim/repos/arrow/cpp/src/arrow/type.h:1722  CheckNonEmpty(matches, root)
/Users/alenkafrim/repos/arrow/cpp/src/arrow/type.h:1757  FindOne(root)
/Users/alenkafrim/repos/arrow/cpp/src/arrow/dataset/scanner.cc:714  ref->GetOne(dataset_schema)
/Users/alenkafrim/repos/arrow/cpp/src/arrow/dataset/scanner.cc:784  ProjectionDescr::FromNames(std::move(columns), *scan_options_->dataset_schema)
{code}
When the Scanner object is created from the dataset via {{to_table}} (through {{_populate_builder}}) and the columns are given as a list, the {{ProjectColumns}} method ({{arrow::dataset::ScannerBuilder}}) is called; it only accepts plain string column names and errors when a column refers to a struct field.
We were wondering whether it would be a good idea to add a new method in {{scanner.cc}} that mimics the {{FromNames}} method but takes a {{field_ref}} as an argument. Afterwards, there would also be a need to recreate the struct field, which we are not sure how to approach. cc [~westonpace] [~apitrou] do you think that would be a correct way to go? > [Python] parquet.read_table nested fields in columns does not work for > use_legacy_dataset=False > --- > > Key: ARROW-14596 > URL: https://issues.apache.org/jira/browse/ARROW-14596 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Tom Scheffers >Assignee: Alenka Frim >Priority: Critical > Fix For: 9.0.0 > > > Reading nested fields does not work with use_legacy_dataset=False. > This works: > > {code:java} > import pyarrow.parquet as pq > t = pq.read_table( > source=*filename*, > columns=['store_key', 'properties.country'], > use_legacy_dataset=True, > ).to_pandas() > {code} > This does not work (for the same parquet file): > > {code:java} > import pyarrow.parquet as pq > t = pq.read_table( > source=*filename*, > columns=['store_key', 'properties.country'], > use_legacy_dataset=False, > ).to_pandas(){code} > -- This message was sent by Atlassian Jira (v8.20.7#820007)
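[Editor's note] For anyone following along, a rough sketch of the manual workaround implied above: select the nested child with the dict/{{FieldRef}} projection that already works, then rebuild the struct column by hand. The column names here are illustrative only, not a proposed API:
{code:python}
import pyarrow as pa
import pyarrow.dataset as ds

# Project the nested child explicitly (this path works today).
flat = ds.dataset('example.parquet').to_table(columns={
    'user_id': ds.field('user_id'),
    'interaction_type': ds.field(('interaction', 'type')),  # FieldRef path
})

# Recreate a one-field struct column from the flattened child.
type_array = pa.concat_arrays(flat['interaction_type'].chunks)
interaction = pa.StructArray.from_arrays([type_array], names=['type'])
rebuilt = flat.drop(['interaction_type']).append_column('interaction', interaction)
{code}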
[jira] [Closed] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dragoș Moldovan-Grünfeld closed ARROW-15224. Resolution: Won't Fix A corresponding {{dplyr::not_between()}} function does not exist. > [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526444#comment-17526444 ] Dragoș Moldovan-Grünfeld commented on ARROW-15224: -- I will _close_ the issue with _won't fix._ Thanks __ > [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526444#comment-17526444 ] Dragoș Moldovan-Grünfeld edited comment on ARROW-15224 at 4/22/22 1:53 PM: --- I will _close_ the issue with _won't fix._ Thanks was (Author: dragosmg): I will _close_ the issue with _won't fix._ Thanks __ > [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16282) [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
Raúl Cumplido created ARROW-16282: - Summary: [CI] [C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04 Key: ARROW-16282 URL: https://issues.apache.org/jira/browse/ARROW-16282 Project: Apache Arrow Issue Type: Bug Components: C#, Continuous Integration Reporter: Raúl Cumplido Fix For: 8.0.0 We upgraded the verify-release job for c-sharp from Ubuntu 20.04 to Ubuntu 22.04 and we can see how the nightly release job has been failing since then. Working for ubuntu 20.04 on 2022-04-08: [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64] Failing for ubuntu 22.04 on 2022-04-09: [https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64] The error seems to be related to a missing libssl: {code:java} === Build and test C# libraries === └ Ensuring that C# is installed... └ Installed C# at (.NET 3.1.405)Welcome to .NET Core 3.1! - SDK Version: 3.1.405Telemetry - The .NET Core tools collect usage data in order to help us improve your experience. It is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry Explore documentation: https://aka.ms/dotnet-docs Report issues and find source on GitHub: https://github.com/dotnet/core Find out what's new: https://aka.ms/dotnet-whats-new Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs Write your first app: https://aka.ms/first-net-core-app -- No usable version of libssl was found /arrow/dev/release/verify-release-candidate.sh: line 325: 49 Aborted (core dumped) dotnet tool install --tool-path ${csharp_bin} sourcelink Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details. 134 Error: `docker-compose --file /home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log above.{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526442#comment-17526442 ] Eduardo Ponce commented on ARROW-15224: --- Based on these observations, it seems we can conclude that a {{not_between}} function will not be included, so we can close this issue. > [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526439#comment-17526439 ] Dragoș Moldovan-Grünfeld commented on ARROW-15224: -- I think the situation is similar in {{dplyr}} - the data manipulation R package we link to.
{code:r}
library(dplyr, warn.conflicts = FALSE)
starwars %>% filter(between(height, 100, 150))
#> # A tibble: 5 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
#> 2 Mon Moth…    150    NA auburn     fair       blue              48 fema… femin…
#> 3 Watto        137    NA black      blue, grey yellow            NA male  mascu…
#> 4 Sebulba      112    40 none       grey, red  orange            NA male  mascu…
#> 5 Gasgano      122    NA none       white, bl… black             NA male  mascu…
#> # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
starwars %>% filter(!between(height, 100, 150))
#> # A tibble: 76 × 14
#>    name     height  mass hair_color skin_color eye_color birth_year sex   gender
#>    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#>  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
#>  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
#>  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
#>  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
#>  5 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
#>  6 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
#>  7 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
#>  8 Biggs D…    183    84 black      light      brown           24   male  mascu…
#>  9 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
#> 10 Anakin …    188    84 blond      fair       blue            41.9 male  mascu…
#> # … with 66 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
{code}
> [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (ARROW-15224) [R] Add binding for not_between() ternary kernel
[ https://issues.apache.org/jira/browse/ARROW-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526406#comment-17526406 ] Dragoș Moldovan-Grünfeld edited comment on ARROW-15224 at 4/22/22 1:43 PM: --- Given {{dplyr::not_between()}} does not exist, do we need an R {{not_between()}} binding? What do you think? [~jonkeane] [~thisisnic][~paleolimbot] was (Author: dragosmg): Given {{dplyr::not_between()}} does not exist, do we need an R {{not_between()} binding? What do you think? [~jonkeane] [~thisisnic][~paleolimbot] > [R] Add binding for not_between() ternary kernel > > > Key: ARROW-15224 > URL: https://issues.apache.org/jira/browse/ARROW-15224 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Eduardo Ponce >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Fix For: 9.0.0 > > > Add R binding for {{not_between()}} compute function from ARROW-15223. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16281) [R] [CI] Bump versions with the release of 4.2
[ https://issues.apache.org/jira/browse/ARROW-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane updated ARROW-16281: --- Summary: [R] [CI] Bump versions with the release of 4.2 (was: [R] [CI]) > [R] [CI] Bump versions with the release of 4.2 > -- > > Key: ARROW-16281 > URL: https://issues.apache.org/jira/browse/ARROW-16281 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, R >Reporter: Jonathan Keane >Assignee: Jacob Wujciak-Jens >Priority: Major > > Now that R 4.2 is released, we should bump all of our R versions where we > have ones hardcoded. > This will mean dropping support for 3.4 entirely and adding in 4.0 to > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/r/github.linux.versions.yml#L34 > There are a few other places that we have hard-coded versions (we might need > to wait a few days for these to catch up): > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/tasks.yml#L1291-L1295 > https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/.github/workflows/r.yml#L60 > (and a few other places in that file — though one note: we build an old > version of windows that uses rtools35 in the GHA CI so that we catch when we > break that — we'll want to keep that!) -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16281) [R] [CI]
Jonathan Keane created ARROW-16281: -- Summary: [R] [CI] Key: ARROW-16281 URL: https://issues.apache.org/jira/browse/ARROW-16281 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, R Reporter: Jonathan Keane Assignee: Jacob Wujciak-Jens Now that R 4.2 is released, we should bump all of our R versions where we have ones hardcoded. This will mean dropping support for 3.4 entirely and adding in 4.0 to https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/r/github.linux.versions.yml#L34 There are a few other places that we have hard-coded versions (we might need to wait a few days for these to catch up): https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/tasks.yml#L1291-L1295 https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/.github/workflows/r.yml#L60 (and a few other places in that file — though one note: we build an old version of windows that uses rtools35 in the GHA CI so that we catch when we break that — we'll want to keep that!) -- This message was sent by Atlassian Jira (v8.20.7#820007)