[jira] [Resolved] (ARROW-18341) [Doc][Python] Update note about bundling Arrow C++ on Windows

2022-11-21 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche resolved ARROW-18341.
---
Resolution: Fixed

Issue resolved by pull request 14660
[https://github.com/apache/arrow/pull/14660]

> [Doc][Python] Update note about bundling Arrow C++ on Windows
> -
>
> Key: ARROW-18341
> URL: https://issues.apache.org/jira/browse/ARROW-18341
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Alenka Frim
>Assignee: Alenka Frim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is a note on the Python development page, under the Windows section, about 
> bundling the Arrow C++ libraries with Python extensions:
> [https://arrow.apache.org/docs/dev/developers/python.html#building-on-windows]
> This note can be revised:
>  * If you are using conda, the fact that the Arrow C++ libs are not bundled is 
> fine, since conda will ensure those libs are found.
>  * If you are not using conda, you have to ensure those libs can be found, 
> either by updating {{PATH}} (every time before importing pyarrow) or by 
> bundling them (using the {{PYARROW_BUNDLE_ARROW_CPP}} env variable 
> instead of {{{}--bundle-arrow-cpp{}}}), with the caveat that the bundled libs 
> won't be automatically updated when rebuilding the arrow-cpp libs.
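
The non-conda option of updating PATH before importing pyarrow could look roughly like the sketch below. The directory name is a made-up placeholder, the helper is hypothetical and not part of pyarrow, and whether PATH alone is enough may depend on the Python version and how the libraries were built.

```python
import os


def prepend_to_path(directory, env=None):
    """Return a PATH value with `directory` first and no duplicate entries."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    # Drop empty entries and any existing copy of the directory.
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


# Hypothetical location of the Arrow C++ DLLs; adjust to your own build.
ARROW_DLL_DIR = r"C:\arrow-dist\bin"

# Usage (must run before the first `import pyarrow` in the process):
#   os.environ["PATH"] = prepend_to_path(ARROW_DLL_DIR)
#   import pyarrow
```

Because this has to run in every process that imports pyarrow, bundling via {{PYARROW_BUNDLE_ARROW_CPP}} may be more convenient when the C++ libs change rarely.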



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-18225) [Python] write_metadata does not fully use **kwargs

2022-11-21 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche resolved ARROW-18225.
---
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14574
[https://github.com/apache/arrow/pull/14574]

> [Python] write_metadata does not fully use **kwargs
> ---
>
> Key: ARROW-18225
> URL: https://issues.apache.org/jira/browse/ARROW-18225
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: François Chareyron
>Assignee: Miles Granger
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When using {{write_metadata}}, {{kwargs}} can be used to pass a FileSystem to 
> a ParquetWriter. However, those {{kwargs}} are not passed to 
> {{read_metadata}} later on despite the function accepting a filesystem 
> argument.
> This causes an error when trying to write metadata to an S3FileSystem, for 
> example.
> {code:python}
> def write_metadata(schema, where, metadata_collector=None, **kwargs):
>     writer = ParquetWriter(where, schema, **kwargs)
>     writer.close()
>     if metadata_collector is not None:
>         metadata = read_metadata(where)  # kwargs should be passed here
>         for m in metadata_collector:
>             metadata.append_row_groups(m)
>         metadata.write_metadata_file(where)  # kwargs should be passed here
> {code}
> {code:python}
> def read_metadata(where, memory_map=False, decryption_properties=None,
>                   filesystem=None):
>     ...{code}
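
The forwarding problem described above can be illustrated with a small stand-alone sketch. Everything here is a stand-in: `fake_write`, `fake_read_metadata` and `write_metadata_sketch` are hypothetical names, not pyarrow functions, and this is not necessarily how the linked pull request fixed the issue.

```python
calls = []  # records (step, filesystem) so the forwarding can be inspected


def fake_write(where, filesystem=None, **kwargs):
    """Stand-in for the writer step; only records which filesystem it saw."""
    calls.append(("write", filesystem))


def fake_read_metadata(where, filesystem=None):
    """Stand-in for the read step; only records which filesystem it saw."""
    calls.append(("read_metadata", filesystem))
    return {"where": where, "row_groups": []}


def write_metadata_sketch(where, metadata_collector=None, filesystem=None,
                          **kwargs):
    """Name `filesystem` explicitly so every step receives the same value."""
    fake_write(where, filesystem=filesystem, **kwargs)
    if metadata_collector is not None:
        metadata = fake_read_metadata(where, filesystem=filesystem)
        for m in metadata_collector:
            metadata["row_groups"].append(m)
        return metadata
    return None
```

Pulling `filesystem` out of `**kwargs` into a named parameter is one way to make sure the read step cannot silently fall back to the local filesystem.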





[jira] [Resolved] (ARROW-18366) [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9

2022-11-21 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-18366.
--
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14680
[https://github.com/apache/arrow/pull/14680]

> [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9 
> 
>
> Key: ARROW-18366
> URL: https://issues.apache.org/jira/browse/ARROW-18366
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> https://github.com/ursacomputing/crossbow/actions/runs/3502784911/jobs/5867407921#step:6:4748
> {noformat}
> FAILED: gandiva-glib/Gandiva-1.0.gir 
> env 
> PKG_CONFIG_PATH=/usr/lib64/pkgconfig:/usr/share/pkgconfig:/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/meson-uninstalled
>  /usr/bin/g-ir-scanner --quiet --no-libtool --namespace=Gandiva 
> --nsversion=1.0 --warn-all --output gandiva-glib/Gandiva-1.0.gir 
> --c-include=gandiva-glib/gandiva-glib.h --warn-all 
> --include-uninstalled=./arrow-glib/Arrow-1.0.gir 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/gandiva-glib 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src
>  
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src
>  -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src
>  
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src
>  -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src 
> --filelist=/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib/libgandiva-glib.so.1100.0.0.p/Gandiva_1.0_gir_filelist
>  --include=Arrow-1.0 --symbol-prefix=ggandiva --identifier-prefix=GGandiva 
> --pkg-export=gandiva-glib --cflags-begin 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src
>  
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src
>  -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src 
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
> -I/usr/include/sysprof-4 -I/usr/include/gobject-introspection-1.0 
> --cflags-end 
> --add-include-path=/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/arrow-glib
>  --add-include-path=/usr/share/gir-1.0 
> -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib 
> --library gandiva-glib 
> -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/arrow-glib 
> -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../../cpp/redhat-linux-build/release
>  --extra-library=gobject-2.0 --extra-library=glib-2.0 
> --extra-library=girepository-1.0 --sources-top-dirs 
> /build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/ --sources-top-dirs 
> /build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/ --warn-error
> /usr/bin/ld: 
> /build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../../cpp/redhat-linux-build/release/libgandiva.so.1100:
>  undefined reference to `std::__glibcxx_assert_fail(char const*, int, char 
> const*, char const*)'
> collect2: error: ld returned 1 exit status
> {noformat}





[jira] [Updated] (ARROW-18366) [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9

2022-11-21 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-18366:
-
Fix Version/s: 10.0.2

> [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9 
> 
>
> Key: ARROW-18366
> URL: https://issues.apache.org/jira/browse/ARROW-18366
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.2, 11.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> https://github.com/ursacomputing/crossbow/actions/runs/3502784911/jobs/5867407921#step:6:4748





[jira] [Comment Edited] (ARROW-18355) [R] support the quoted_na argument in open_dataset for CSVs by mapping it to CSVConvertOptions$strings_can_be_null

2022-11-21 Thread Nicola Crane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636936#comment-17636936
 ] 

Nicola Crane edited comment on ARROW-18355 at 11/22/22 12:27 AM:
-

Nah, I don't think it's worth us spending time adding and then later removing 
it unless users are clamouring for it, which I don't see here.  Thanks for 
looking into this!


was (Author: thisisnic):
Nah, I don't think it's worth us spending time adding and then later removing 
it unless users are clamouring for it, which I don't see here.

> [R] support the quoted_na argument in open_dataset for CSVs by mapping it to 
> CSVConvertOptions$strings_can_be_null
> --
>
> Key: ARROW-18355
> URL: https://issues.apache.org/jira/browse/ARROW-18355
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Will Jones
>Priority: Major
>






[jira] [Closed] (ARROW-18355) [R] support the quoted_na argument in open_dataset for CSVs by mapping it to CSVConvertOptions$strings_can_be_null

2022-11-21 Thread Nicola Crane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Crane closed ARROW-18355.

Resolution: Won't Fix

> [R] support the quoted_na argument in open_dataset for CSVs by mapping it to 
> CSVConvertOptions$strings_can_be_null
> --
>
> Key: ARROW-18355
> URL: https://issues.apache.org/jira/browse/ARROW-18355
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Will Jones
>Priority: Major
>






[jira] [Commented] (ARROW-18355) [R] support the quoted_na argument in open_dataset for CSVs by mapping it to CSVConvertOptions$strings_can_be_null

2022-11-21 Thread Nicola Crane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636936#comment-17636936
 ] 

Nicola Crane commented on ARROW-18355:
--

Nah, I don't think it's worth us spending time adding and then later removing 
it unless users are clamouring for it, which I don't see here.

> [R] support the quoted_na argument in open_dataset for CSVs by mapping it to 
> CSVConvertOptions$strings_can_be_null
> --
>
> Key: ARROW-18355
> URL: https://issues.apache.org/jira/browse/ARROW-18355
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Will Jones
>Priority: Major
>






[jira] [Assigned] (ARROW-15812) [R] Allow user to supply col_names argument when reading in a CSV dataset

2022-11-21 Thread Will Jones (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Jones reassigned ARROW-15812:
--

Assignee: Will Jones

> [R] Allow user to supply col_names argument when reading in a CSV dataset
> -
>
> Key: ARROW-15812
> URL: https://issues.apache.org/jira/browse/ARROW-15812
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Will Jones
>Priority: Major
>
> Allow the user to supply the {{col_names}} argument from {{readr}} when 
> reading in a dataset.  
> This is already possible when reading in a single CSV file with 
> {{arrow::read_csv_arrow()}}, through the {{readr_to_csv_read_options}} function, 
> and so once the C++ functionality to autogenerate column names for Datasets 
> is implemented, we should hook up {{readr_to_csv_read_options}} in 
> {{csv_file_format_read_opts}} just like we have with 
> {{readr_to_csv_parse_options}} in {{csv_file_format_parse_options}}.





[jira] [Commented] (ARROW-15812) [R] Allow user to supply col_names argument when reading in a CSV dataset

2022-11-21 Thread Will Jones (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636918#comment-17636918
 ] 

Will Jones commented on ARROW-15812:


Auto-generation of column names was added to Datasets in 
https://issues.apache.org/jira/browse/ARROW-16436

> [R] Allow user to supply col_names argument when reading in a CSV dataset
> -
>
> Key: ARROW-15812
> URL: https://issues.apache.org/jira/browse/ARROW-15812
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Priority: Major
>
> Allow the user to supply the {{col_names}} argument from {{readr}} when 
> reading in a dataset.  
> This is already possible when reading in a single CSV file with 
> {{arrow::read_csv_arrow()}}, through the {{readr_to_csv_read_options}} function, 
> and so once the C++ functionality to autogenerate column names for Datasets 
> is implemented, we should hook up {{readr_to_csv_read_options}} in 
> {{csv_file_format_read_opts}} just like we have with 
> {{readr_to_csv_parse_options}} in {{csv_file_format_parse_options}}.





[jira] [Created] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18378:
---

 Summary: MIGRATION: Disable issue reporting in ASF Jira
 Key: ARROW-18378
 URL: https://issues.apache.org/jira/browse/ARROW-18378
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even 
though existing Jira issues have not yet been migrated and are still being 
worked in the Jira system, we should assess disabling creation of new issues in 
ASF Jira, and instead pointing users to GitHub issues. This may benefit the 
project by reducing the need to monitor inflow in two discrete systems.





[jira] [Created] (ARROW-18377) MIGRATION: Automate component labels from issue form content

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18377:
---

 Summary: MIGRATION: Automate component labels from issue form 
content
 Key: ARROW-18377
 URL: https://issues.apache.org/jira/browse/ARROW-18377
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


ARROW-18364 added the ability to report issues in GitHub, and includes GitHub 
issue templates with a drop-down component(s) selector. These form elements 
drive resulting issue markdown only, and cannot dynamically drive issue labels. 
Applying labels requires GitHub Actions, which also have a few limitations. First, the 
issue form does not produce any structured data; it only produces the issue 
description markdown, so a parser is required. Second, ASF restricts GitHub 
actions to a selection of approved actions. It is likely that while community 
actions exist to generate structured data from issue forms, the Apache Arrow 
project will need to write its own parser and label application action.
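
A parser of the kind described above might look like the following sketch. It assumes the issue form renders each field as a `### <field label>` heading followed by the selected values on one comma-separated line, and that an empty field renders as `_No response_`. The heading text `Component(s)` and the function name are illustrative assumptions, not taken from the Arrow repository.

```python
def component_labels(issue_body, heading="Component(s)"):
    """Extract 'Component: X' labels from issue-form markdown."""
    labels = []
    in_section = False
    for line in issue_body.splitlines():
        stripped = line.strip()
        if stripped.startswith("### "):
            # A new field starts; check whether it is the component field.
            in_section = stripped[4:].strip() == heading
            continue
        if not in_section or not stripped or stripped == "_No response_":
            continue
        for name in stripped.split(","):
            name = name.strip()
            if name:
                labels.append(f"Component: {name}")
    return labels


example_body = """### Describe the bug

Something broke.

### Component(s)

Python, C++
"""
# component_labels(example_body) -> ["Component: Python", "Component: C++"]
```

A GitHub action would still be needed to apply the returned labels to the issue; that step is outside this sketch.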





[jira] [Commented] (ARROW-18116) [R][Doc] correct paths for the read_parquet examples in cloud storage vignette

2022-11-21 Thread Sam Albers (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636887#comment-17636887
 ] 

Sam Albers commented on ARROW-18116:


This was fixed with https://issues.apache.org/jira/browse/ARROW-17448 so it can 
probably be closed as it is correct on the main docs now: 
https://arrow.apache.org/docs/r/articles/fs.html

 

> [R][Doc] correct paths for the read_parquet examples in cloud storage vignette
> --
>
> Key: ARROW-18116
> URL: https://issues.apache.org/jira/browse/ARROW-18116
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, R
>Reporter: Stephanie Hazlitt
>Priority: Major
>  Labels: triaged
>
> {{The S3 file paths don't run:}}
> {code:java}
> > library(arrow) 
> > read_parquet(file = 
> > "s3://voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/data.parquet") 
> Error in url(file, open = "rb") : URL scheme unsupported by this method{code}
> {{It looks like the file names are `part-0.parquet` not `data.parquet`.}}
> {{This runs:}}
> {code:java}
> read_parquet(file = 
> "s3://voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/part-0.parquet"){code}
>  





[jira] [Resolved] (ARROW-18303) [Go] Missing tag for compute module

2022-11-21 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-18303.
---
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14690
[https://github.com/apache/arrow/pull/14690]

> [Go] Missing tag for compute module
> ---
>
> Key: ARROW-18303
> URL: https://issues.apache.org/jira/browse/ARROW-18303
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 10.0.0
>Reporter: Lilian Maurel
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>   Original Estimate: 1h
>  Time Spent: 0.5h
>  Remaining Estimate: 0.5h
>
> Since https://issues.apache.org/jira/browse/ARROW-17456, compute has been split 
> into a separate module.
>  
> The import changes from github.com/apache/arrow/go/v9/arrow/compute to 
> github.com/apache/arrow/go/arrow/compute/v10
>  
> The tag go/arrow/compute/v10.0.0 must be created for go mod resolution.
>  
> Also, in go.mod, the line
> module github.com/apache/arrow/go/v10/arrow/compute
> must be changed to module github.com/apache/arrow/go/arrow/compute/v10
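
The relationship between the module path and the tag named above can be sketched as follows (in Python, for illustration only). The helper name is hypothetical, and the sketch assumes the major-version suffix such as `/v10` is part of the module path only, not a real directory in the repository.

```python
import re


def submodule_tag(module_path, repo_root, version):
    """Return the git tag expected for a Go module living in a repo subdirectory."""
    if module_path != repo_root and not module_path.startswith(repo_root + "/"):
        raise ValueError(f"{module_path!r} is not inside {repo_root!r}")
    subdir = module_path[len(repo_root):].strip("/")
    # Drop a trailing major-version element such as "v10" (assumed not a directory).
    subdir = re.sub(r"(^|/)v\d+$", "", subdir)
    return f"{subdir}/{version}" if subdir else version


tag = submodule_tag(
    "github.com/apache/arrow/go/arrow/compute/v10",
    "github.com/apache/arrow",
    "v10.0.0",
)
# tag == "go/arrow/compute/v10.0.0"
```

With the module path proposed in the issue, this gives the same tag the reporter asks for.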





[jira] [Created] (ARROW-18376) MIGRATION: Add component labels to GitHub

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18376:
---

 Summary: MIGRATION: Add component labels to GitHub
 Key: ARROW-18376
 URL: https://issues.apache.org/jira/browse/ARROW-18376
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


Similar to ARROW-18375, component labels have been established based on 
existing component values defined in ASF Jira. The following labels are needed:

* Component: Archery
* Component: Benchmarking
* Component: C
* Component: C#
* Component: C++
* Component: C++ - Gandiva
* Component: C++ - Plasma
* Component: Continuous Integration
* Component: Dart
* Component: Developer Tools
* Component: Documentation
* Component: FlightRPC
* Component: Format
* Component: GLib
* Component: Go
* Component: GPU
* Component: Integration
* Component: Java
* Component: JavaScript
* Component: MATLAB
* Component: Packaging
* Component: Parquet
* Component: Python
* Component: R
* Component: Ruby
* Component: Swift
* Component: Website
* Component: Other





[jira] [Created] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18375:
---

 Summary: MIGRATION: Enable GitHub issue type labels
 Key: ARROW-18375
 URL: https://issues.apache.org/jira/browse/ARROW-18375
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


As part of enabling GitHub issue reporting, the following labels have been 
defined and need to be added to the repository label options. Without these 
labels added, [new issues|https://github.com/apache/arrow/issues/14692] do not 
get the issue template-defined issue type labels set properly.

 

Labels:
 * Type: bug
 * Type: enhancement
 * Type: usage

 





[jira] [Resolved] (ARROW-17610) [C++] Support additional source types in SourceNode

2022-11-21 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace resolved ARROW-17610.
-
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14207
[https://github.com/apache/arrow/pull/14207]

> [C++] Support additional source types in SourceNode
> ---
>
> Key: ARROW-17610
> URL: https://issues.apache.org/jira/browse/ARROW-17610
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Yaron Gvili
>Assignee: Yaron Gvili
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> This issue will add support for `ArrayVector`, `ExecBatch`, and `RecordBatch` 
> sources in `SourceNode`. See [this 
> thread|https://lists.apache.org/thread/9l23c0w48ywx314klbyshz8ntyzgs1zw] for 
> context.





[jira] [Commented] (ARROW-18303) [Go] Missing tag for compute module

2022-11-21 Thread Matthew Topol (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636826#comment-17636826
 ] 

Matthew Topol commented on ARROW-18303:
---

Instead of keeping the compute module as a separate module, the attached PR 
marks every file in the Compute package and its sub-packages as only buildable 
in go1.18+ so that it can be maintained as part of the arrow module (rather 
than requiring separate git tags and an entirely separate module) without 
breaking compatibility for earlier Go versions. The module docs have been 
updated to make this explicit and state that the {{compute}} package requires 
go1.18, but the rest of the Arrow module maintains compatibility with Go 1.17+

> [Go] Missing tag for compute module
> ---
>
> Key: ARROW-18303
> URL: https://issues.apache.org/jira/browse/ARROW-18303
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 10.0.0
>Reporter: Lilian Maurel
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> Since https://issues.apache.org/jira/browse/ARROW-17456, compute has been split 
> into a separate module.
>  
> The import changes from github.com/apache/arrow/go/v9/arrow/compute to 
> github.com/apache/arrow/go/arrow/compute/v10
>  
> The tag go/arrow/compute/v10.0.0 must be created for go mod resolution.
>  
> Also, in go.mod, the line
> module github.com/apache/arrow/go/v10/arrow/compute
> must be changed to module github.com/apache/arrow/go/arrow/compute/v10





[jira] [Resolved] (ARROW-18374) [Go][CI][Benchmarks] Fix Go Bench Script after conbench change

2022-11-21 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-18374.
---
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14689
[https://github.com/apache/arrow/pull/14689]

> [Go][CI][Benchmarks] Fix Go Bench Script after conbench change
> --
>
> Key: ARROW-18374
> URL: https://issues.apache.org/jira/browse/ARROW-18374
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, Continuous Integration, Go
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Change [https://github.com/conbench/conbench/pull/417/files#] now requires 
> passing an explicit {{github=None}} argument to {{BenchmarkResult}} to 
> have it get the GitHub info from the locally cloned repo.





[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636820#comment-17636820
 ] 

Weston Pace commented on ARROW-18371:
-

{{MakeBasicBatches}} I agree is a definite no.  The new source types being 
added in ARROW-17610 should remove any real need for {{BatchesWithSchema}} so I 
think that is a no too.

I agree the random data generation could be quite useful.  I think it would 
also be quite interesting to expose random data generation as an exec node but 
probably shouldn't add more work :)

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> The {Array,{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Updated] (ARROW-18303) [Go] Missing tag for compute module

2022-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-18303:
---
Labels: pull-request-available  (was: )

> [Go] Missing tag for compute module
> ---
>
> Key: ARROW-18303
> URL: https://issues.apache.org/jira/browse/ARROW-18303
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 10.0.0
>Reporter: Lilian Maurel
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Time Spent: 10m
>  Remaining Estimate: 50m
>
> Since https://issues.apache.org/jira/browse/ARROW-17456, compute has been split 
> into a separate module.
>  
> The import changes from github.com/apache/arrow/go/v9/arrow/compute to 
> github.com/apache/arrow/go/arrow/compute/v10
>  
> The tag go/arrow/compute/v10.0.0 must be created for go mod resolution.
>  
> Also, in go.mod, the line
> module github.com/apache/arrow/go/v10/arrow/compute
> must be changed to module github.com/apache/arrow/go/arrow/compute/v10





[jira] [Updated] (ARROW-18374) [Go][CI][Benchmarks] Fix Go Bench Script after conbench change

2022-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-18374:
---
Labels: pull-request-available  (was: )

> [Go][CI][Benchmarks] Fix Go Bench Script after conbench change
> --
>
> Key: ARROW-18374
> URL: https://issues.apache.org/jira/browse/ARROW-18374
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, Continuous Integration, Go
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Change [https://github.com/conbench/conbench/pull/417/files#] now requires 
> passing an explicit {{github=None}} argument to {{BenchmarkResult}} to 
> have it get the GitHub info from the locally cloned repo.





[jira] [Created] (ARROW-18374) [Go][CI][Benchmarks] Fix Go Bench Script after conbench change

2022-11-21 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-18374:
-

 Summary: [Go][CI][Benchmarks] Fix Go Bench Script after conbench 
change
 Key: ARROW-18374
 URL: https://issues.apache.org/jira/browse/ARROW-18374
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, Continuous Integration, Go
Reporter: Matthew Topol
Assignee: Matthew Topol


Change [https://github.com/conbench/conbench/pull/417/files#] now requires 
passing an explicit {{github=None}} argument to {{BenchmarkResult}} to 
have it get the GitHub info from the locally cloned repo.





[jira] [Resolved] (ARROW-18343) [C++] AllocateBitmap() with out parameter is declared but not defined

2022-11-21 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-18343.

Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14657
[https://github.com/apache/arrow/pull/14657]

> [C++] AllocateBitmap() with out parameter is declared but not defined
> -
>
> Key: ARROW-18343
> URL: https://issues.apache.org/jira/browse/ARROW-18343
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Jin Shang
>Assignee: Jin Shang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [This variant of 
> AllocateBitmap|https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h#L483]
>  is declared but not defined in buffer.cc.





[jira] [Assigned] (ARROW-17504) [C++] Adding Fetch Node ToProto

2022-11-21 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-17504:
-

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Adding Fetch Node ToProto
> ---
>
> Key: ARROW-17504
> URL: https://issues.apache.org/jira/browse/ARROW-17504
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> Add serialization of the Fetch declaration for roundtrip testing.





[jira] [Commented] (ARROW-17502) [C++] Fetch Node Substrait Integration

2022-11-21 Thread Apache Arrow JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636797#comment-17636797
 ] 

Apache Arrow JIRA Bot commented on ARROW-17502:
---

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Fetch Node Substrait Integration
> --
>
> Key: ARROW-17502
> URL: https://issues.apache.org/jira/browse/ARROW-17502
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> Fetch Node is a newly added node (WIP at the moment[1]). After finalizing the 
> Fetch node creation, it needs to be integrated with Substrait. 
>  
> [1].[Fetch Node Creation |https://issues.apache.org/jira/browse/ARROW-17190]





[jira] [Commented] (ARROW-17504) [C++] Adding Fetch Node ToProto

2022-11-21 Thread Apache Arrow JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636796#comment-17636796
 ] 

Apache Arrow JIRA Bot commented on ARROW-17504:
---

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Adding Fetch Node ToProto
> ---
>
> Key: ARROW-17504
> URL: https://issues.apache.org/jira/browse/ARROW-17504
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> Add serialization of the Fetch declaration for roundtrip testing.





[jira] [Assigned] (ARROW-17502) [C++] Fetch Node Substrait Integration

2022-11-21 Thread Apache Arrow JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Arrow JIRA Bot reassigned ARROW-17502:
-

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Fetch Node Substrait Integration
> --
>
> Key: ARROW-17502
> URL: https://issues.apache.org/jira/browse/ARROW-17502
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> Fetch Node is a newly added node (WIP at the moment[1]). After finalizing the 
> Fetch node creation, it needs to be integrated with Substrait. 
>  
> [1].[Fetch Node Creation |https://issues.apache.org/jira/browse/ARROW-17190]





[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636794#comment-17636794
 ] 

Antoine Pitrou commented on ARROW-18371:


> I assume the comment is regarding BatchesWithSchema and MakeBasicBatches.

Yes, this is what I meant. Sorry for the imprecision.

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Updated] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates

2022-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-18373:
---
Labels: pull-request-available  (was: )

> MIGRATION: Enable multiple component selection in issue templates
> -
>
> Key: ARROW-18373
> URL: https://issues.apache.org/jira/browse/ARROW-18373
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], 
> we would like to enable selection of multiple components when reporting 
> issues via GitHub issues.
> Additionally, we may want to add the needed Apache license to the issue 
> templates and remove the exclusion rules from rat_exclude_files.txt.





[jira] [Comment Edited] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636755#comment-17636755
 ] 

Rok Mihevc edited comment on ARROW-18371 at 11/21/22 4:10 PM:
--

I assume the comment is regarding BatchesWithSchema and MakeBasicBatches.


was (Author: rokm):
I assume the comment regarding BatchesWithSchema and MakeBasicBatches.

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636755#comment-17636755
 ] 

Rok Mihevc commented on ARROW-18371:


I assume the comment regarding BatchesWithSchema and MakeBasicBatches.

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Li Jin (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636754#comment-17636754
 ] 

Li Jin commented on ARROW-18371:


> Definitely not. These are functions generating ad hoc data tailored for 
> specific tests, with little consistency.

To clarify, do you know the \{Array,{{Exec,Record}Batch}FromJSON or 
BatchesWithSchema/MakeBasicBatches

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Comment Edited] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Li Jin (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636754#comment-17636754
 ] 

Li Jin edited comment on ARROW-18371 at 11/21/22 4:08 PM:
--

> Definitely not. These are functions generating ad hoc data tailored for 
> specific tests, with little consistency.

To clarify, do you mean the \{Array,{{Exec,Record}Batch}FromJSON or 
BatchesWithSchema/MakeBasicBatches?


was (Author: icexelloss):
> Definitely not. These are functions generating ad hoc data tailored for 
> specific tests, with little consistency.

To clarify, do you know the \{Array,{{Exec,Record}Batch}FromJSON or 
BatchesWithSchema/MakeBasicBatches

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636747#comment-17636747
 ] 

Antoine Pitrou commented on ARROW-18371:


Definitely not. These are functions generating ad hoc data tailored for 
specific tests, with little consistency.
We could expose the Random generation class, though, possibly together with 
some API cleanup.

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Resolved] (ARROW-18110) [Go] Scalar Comparisons

2022-11-21 Thread Matthew Topol (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Topol resolved ARROW-18110.
---
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14669
[https://github.com/apache/arrow/pull/14669]

> [Go] Scalar Comparisons
> ---
>
> Key: ARROW-18110
> URL: https://issues.apache.org/jira/browse/ARROW-18110
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636736#comment-17636736
 ] 

Rok Mihevc commented on ARROW-18371:


*FromJSON functions seem clear cut. How about also adding BatchesWithSchema and 
MakeBasicBatches? [~apitrou] [~westonpace]

> [C++] Expose *FromJSON helpers
> --
>
> Key: ARROW-18371
> URL: https://issues.apache.org/jira/browse/ARROW-18371
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Rok Mihevc
>Priority: Major
>  Labels: testing
>
> {Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
> testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
> could be considered as well.





[jira] [Assigned] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates

2022-11-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-18373:
---

Assignee: Todd Farmer

> MIGRATION: Enable multiple component selection in issue templates
> -
>
> Key: ARROW-18373
> URL: https://issues.apache.org/jira/browse/ARROW-18373
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], 
> we would like to enable selection of multiple components when reporting 
> issues via GitHub issues.
> Additionally, we may want to add the needed Apache license to the issue 
> templates and remove the exclusion rules from rat_exclude_files.txt.





[jira] [Created] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18373:
---

 Summary: MIGRATION: Enable multiple component selection in issue 
templates
 Key: ARROW-18373
 URL: https://issues.apache.org/jira/browse/ARROW-18373
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], we 
would like to enable selection of multiple components when reporting issues via 
GitHub issues.

Additionally, we may want to add the needed Apache license to the issue 
templates and remove the exclusion rules from rat_exclude_files.txt.





[jira] [Resolved] (ARROW-18323) MIGRATION TEST ISSUE #2

2022-11-21 Thread Nicola Crane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Crane resolved ARROW-18323.
--
Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14675
[https://github.com/apache/arrow/pull/14675]

> MIGRATION TEST ISSUE #2
> ---
>
> Key: ARROW-18323
> URL: https://issues.apache.org/jira/browse/ARROW-18323
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue was created to help test migration-related process and tooling.





[jira] [Assigned] (ARROW-18363) [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)

2022-11-21 Thread Alenka Frim (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alenka Frim reassigned ARROW-18363:
---

Assignee: Alenka Frim

> [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)
> -
>
> Key: ARROW-18363
> URL: https://issues.apache.org/jira/browse/ARROW-18363
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Joris Van den Bossche
>Assignee: Alenka Frim
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Now that we have versioned docs, we also have the old versions of the developers 
> docs (e.g. 
> https://arrow.apache.org/docs/9.0/developers/guide/communication.html). Those 
> might be outdated (e.g. regarding communication channels, build instructions, 
> etc.), and typically when contributing / developing with the latest Arrow, one 
> should _always_ check the latest dev version of the contributing docs.
> We could add a warning box pointing this out and linking to the dev docs, 
> similar to how some projects warn about viewing old docs in 
> general and point to the stable docs (e.g. https://mne.tools/1.1/index.html or 
> https://scikit-learn.org/1.0/user_guide.html). In this case we could have a 
> custom box on pages under /developers that points to the dev docs instead of 
> the stable docs.





[jira] [Updated] (ARROW-18363) [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)

2022-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-18363:
---
Labels: pull-request-available  (was: )

> [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)
> -
>
> Key: ARROW-18363
> URL: https://issues.apache.org/jira/browse/ARROW-18363
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we have versioned docs, we also have the old versions of the developers 
> docs (e.g. 
> https://arrow.apache.org/docs/9.0/developers/guide/communication.html). Those 
> might be outdated (e.g. regarding communication channels, build instructions, 
> etc.), and typically when contributing / developing with the latest Arrow, one 
> should _always_ check the latest dev version of the contributing docs.
> We could add a warning box pointing this out and linking to the dev docs, 
> similar to how some projects warn about viewing old docs in 
> general and point to the stable docs (e.g. https://mne.tools/1.1/index.html or 
> https://scikit-learn.org/1.0/user_guide.html). In this case we could have a 
> custom box on pages under /developers that points to the dev docs instead of 
> the stable docs.





[jira] [Created] (ARROW-18372) [R] "Error in `collect()`: ! Invalid: negative malloc size" after large computation returning one cell

2022-11-21 Thread Lucas Mation (Jira)
Lucas Mation created ARROW-18372:


 Summary: [R] "Error in `collect()`: ! Invalid: negative malloc 
size" after large computation returning one cell
 Key: ARROW-18372
 URL: https://issues.apache.org/jira/browse/ARROW-18372
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 10.0.0
Reporter: Lucas Mation


I have a large Parquet dataset (900 million rows, 40 columns), subdivided 
into folders for each year. I was trying to calculate how many unique 
combinations of id1+id2+id3+id4 there are in the dataset.

 

Notice that the "collected" dataset is supposed to be only one row and one cell, 
containing the count (I've confirmed this by subsetting the dataset ("%>% 
head(10^6)") before computing the count, and it works). That is why the error 
below is so weird.

```

fa <- 'myparteq folder' #huge 

va <- open_dataset(fa)

tic()
d <- va  %>% head(10^6) %>% count(id1,id2,id3,id4) %>% count %>% collect

toc()

 

Error in `collect()`:
! Invalid: negative malloc size
Run `rlang::last_error()` to see where the error occurred.

 

> rlang::last_error()

Error in `collect()`:
! Invalid: negative malloc size
---
Backtrace:
 1. ... %>% collect
 3. arrow:::collect.arrow_dplyr_query(.)
Run `rlang::last_trace()` to see the full context.

 

> rlang::last_trace()

Error in `collect()`:
! Invalid: negative malloc size
---
Backtrace:
    x
 1. +-... %>% collect
 2. +-dplyr::collect(.)
 3. \-arrow:::collect.arrow_dplyr_query(.)
 4.   \-base::tryCatch(...)
 5.     \-base (local) tryCatchList(expr, classes, parentenv, handlers)
 6.       \-base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7.         \-value[[3L]](cond)
 8.           \-arrow:::augment_io_error_msg(e, call, schema = x$.data$schema)
 9.             \-rlang::abort(msg, call = call)

 

```
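
For reference, the quantity being requested is a single distinct count over four key columns. The following is a minimal Python sketch of that logic only, not how arrow or dplyr evaluate the query; the function name `count_unique_combinations` and the toy rows are hypothetical stand-ins for the yearly folders.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, int]


def count_unique_combinations(
    partitions: Iterable[List[Row]],
    keys: Tuple[str, ...] = ("id1", "id2", "id3", "id4"),
) -> int:
    """Return the number of distinct key tuples across all partitions."""
    seen = set()
    for rows in partitions:  # e.g. one partition per year folder
        for row in rows:
            # A combination is identified by the tuple of its key values.
            seen.add(tuple(row[k] for k in keys))
    return len(seen)


# Hypothetical toy data standing in for two yearly folders.
year_a = [
    {"id1": 1, "id2": 1, "id3": 1, "id4": 1},
    {"id1": 1, "id2": 1, "id3": 1, "id4": 2},
]
year_b = [
    {"id1": 1, "id2": 1, "id3": 1, "id4": 1},  # repeats a combination from year_a
    {"id1": 2, "id2": 1, "id3": 1, "id4": 1},
]

n_unique = count_unique_combinations([year_a, year_b])
print(n_unique)  # prints 3
```

Whatever the input size, the final result is one number, although the intermediate set of distinct combinations can itself be very large.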

I am running this on a Windows server with 512 GB of RAM.

 sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252    
LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] arrow_10.0.0      data.table_1.14.4 forcats_0.5.2     dplyr_1.0.10      
purrr_0.3.5  readr_2.1.3       tidyr_1.2.1       tibble_3.1.8     
 [9] ggplot2_3.3.6     tidyverse_1.3.2   gt_0.7.0          xtable_1.8-4      
ggthemes_4.2.4    collapse_1.8.6    pryr_0.1.5        janitor_2.1.0    
[17] tictoc_1.1        lubridate_1.8.0   stringr_1.4.1     readxl_1.4.1     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9          assertthat_0.2.1    digest_0.6.30       utf8_1.2.2     
     R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
 [8] reprex_2.0.2        httr_1.4.4          pillar_1.8.1        rlang_1.0.6    
     googlesheets4_1.0.1 rstudioapi_0.14     googledrive_2.0.0  
[15] bit_4.0.4           munsell_0.5.0       broom_1.0.1         compiler_4.2.1 
     modelr_0.1.9        pkgconfig_2.0.3     htmltools_0.5.3    
[22] tidyselect_1.2.0    codetools_0.2-18    fansi_1.0.3         crayon_1.5.2   
     tzdb_0.3.0          dbplyr_2.2.1        withr_2.5.0        
[29] grid_4.2.1          jsonlite_1.8.3      gtable_0.3.1        
lifecycle_1.0.3     DBI_1.1.3           magrittr_2.0.3      scales_1.2.1       
[36] cli_3.4.1           stringi_1.7.8       fs_1.5.2            
snakecase_0.11.0    xml2_1.3.3          ellipsis_0.3.2      generics_0.1.3     
[43] vctrs_0.5.0         tools_4.2.1         bit64_4.0.5         glue_1.6.2     
     hms_1.1.2           parallel_4.2.1      fastmap_1.1.0      
[50] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3         haven_2.5.1    

 

 arrow_info()
Arrow package version: 10.0.0

Capabilities:
               
dataset    TRUE
substrait FALSE
parquet    TRUE
json       TRUE
s3         TRUE
gcs        TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Arrow options():
                       
arrow.use_threads FALSE

Memory:
                  
Allocator mimalloc
Current   74.82 Gb
Max       97.75 Gb

Runtime:
                        
SIMD Level          avx2
Detected SIMD Level avx2

Build:
                                                             
C++ Library Version                                    10.0.0
C++ Compiler                                              GNU
C++ Compiler Version                                   10.3.0
Git ID               aa7118b6e5f49b354fa8a93d9cf363c9ebe9a3f0

 





[jira] [Created] (ARROW-18371) [C++] Expose *FromJSON helpers

2022-11-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18371:
--

 Summary: [C++] Expose *FromJSON helpers
 Key: ARROW-18371
 URL: https://issues.apache.org/jira/browse/ARROW-18371
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Rok Mihevc


{Array,{{Exec,Record}Batch}FromJSON helper functions would be useful when 
testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches 
could be considered as well.


