[jira] [Assigned] (ARROW-17079) [C++] Improve error message propagation from AWS SDK
[ https://issues.apache.org/jira/browse/ARROW-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou reassigned ARROW-17079:

    Assignee: Philipp Moritz

> [C++] Improve error message propagation from AWS SDK
>
> Key: ARROW-17079
> URL: https://issues.apache.org/jira/browse/ARROW-17079
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 8.0.0
> Reporter: Philipp Moritz
> Assignee: Philipp Moritz
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Dear all,
> I'd like to see if there is interest in improving the error messages that
> originate from the AWS SDK. Especially when loading datasets from S3, there
> are many things that can go wrong, and the error messages that (Py)Arrow gives
> are not always actionable, especially if the call involves many
> different SDK functions. In particular, it would be great to have the
> following attached to each error message:
> * A machine-parseable status code from the AWS SDK
> * Information as to exactly which AWS SDK call failed, so it can be
> disambiguated for Arrow API calls that use multiple AWS SDK calls
> In the ideal case, as a developer I could reconstruct the AWS SDK call that
> failed from the error message (e.g. in a form that allows me to run the API
> call via the "aws" CLI program) so I can debug errors and see how they relate
> to my AWS infrastructure. Any progress in this direction would be super
> helpful.
>
> For context: I was recently debugging some permission issues in S3 based
> on the current error codes, and it was pretty hard to figure out what was
> going on (see
> [https://github.com/ray-project/ray/issues/19799#issuecomment-1185035602]).
>
> I'm happy to take a stab at this problem but might need some help. Is
> implementing a custom StatusDetail class for AWS errors and propagating
> errors that way the right hunch here?
> [https://github.com/apache/arrow/blob/50f6fcad6cc09c06e78dcd09ad07218b86e689de/cpp/src/arrow/status.h#L110]
>
> All the best,
> Philipp.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
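The two items requested in the issue (a machine-parseable status code and the name of the failing SDK call) could be carried by one small structured detail object. What follows is a minimal sketch in Python rather than Arrow's C++ StatusDetail. The class name AwsErrorDetail, its fields, and the example values are hypothetical and are not part of Arrow or the AWS SDK.

```python
import shlex
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AwsErrorDetail:
    """Hypothetical structured detail attached to an error (not an Arrow API)."""

    service: str      # "aws" CLI command group, e.g. "s3api"
    operation: str    # SDK operation name, e.g. "HeadObject"
    error_code: str   # machine-parseable code reported by the SDK
    message: str      # human-readable message reported by the SDK
    params: dict = field(default_factory=dict)  # request parameters, if known

    def cli_command(self) -> str:
        """Render an approximate "aws" CLI call for reproducing the failure."""
        # CamelCase operation name -> kebab-case subcommand (HeadObject -> head-object).
        subcommand = "".join(
            "-" + c.lower() if c.isupper() else c for c in self.operation
        ).lstrip("-")
        args = " ".join(
            f"--{name} {shlex.quote(str(value))}" for name, value in self.params.items()
        )
        return f"aws {self.service} {subcommand} {args}".strip()

    def __str__(self) -> str:
        return f"AWS Error [{self.error_code}] during {self.operation}: {self.message}"


# Example values are made up for illustration.
detail = AwsErrorDetail(
    service="s3api",
    operation="HeadObject",
    error_code="ACCESS_DENIED",
    message="Access Denied",
    params={"bucket": "my-bucket", "key": "data/part-0.parquet"},
)
print(detail)
print(detail.cli_command())
```

Keeping the code, the operation, and the parameters as separate fields lets callers branch on the code programmatically while the string form stays readable in logs.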
[jira] [Updated] (ARROW-17079) [C++] Improve error message propagation from AWS SDK
[ https://issues.apache.org/jira/browse/ARROW-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-17079: Summary: [C++] Improve error message propagation from AWS SDK (was: Improve error message propagation from AWS SDK)
[jira] [Updated] (ARROW-17079) Improve error message propagation from AWS SDK
[ https://issues.apache.org/jira/browse/ARROW-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17079: Labels: pull-request-available (was: )
[jira] [Updated] (ARROW-16340) [C++][Python] Move all Python related code into PyArrow
[ https://issues.apache.org/jira/browse/ARROW-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-16340: - Summary: [C++][Python] Move all Python related code into PyArrow (was: [Python] Move all Python related code into PyArrow) > [C++][Python] Move all Python related code into PyArrow > --- > > Key: ARROW-16340 > URL: https://issues.apache.org/jira/browse/ARROW-16340 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Alenka Frim >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 32h 10m > Remaining Estimate: 0h > > Move {{src/arrow/python}} directory into {{pyarrow}} and arrange PyArrow to > build it. > More details can be found on this thread: > https://lists.apache.org/thread/jbxyldhqff4p9z53whhs95y4jcomdgd2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17535) [Python] List arrays aren't supported in to_pandas calls
Micah Kornfield created ARROW-17535: --- Summary: [Python] List arrays aren't supported in to_pandas calls Key: ARROW-17535 URL: https://issues.apache.org/jira/browse/ARROW-17535 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield EXTENSION is not in the list of allowed types. I think that in order to enable EXTENSION we need to be able to call to_pylist or similar on the original extension array from C++ code, in case there are user-provided overrides. Off the top of my head, one way of doing this would be to pass through an additional std::unordered_map where PyObject is the bound to_pylist Python function. Are there other alternatives that might be cleaner? -- This message was sent by Atlassian Jira (v8.20.10#820010)
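The map-of-overrides idea from the issue can be pictured with a small pure-Python sketch. It does not use pyarrow or the C++ conversion code. The function name convert_list_column, the Overrides alias, and the "example.celsius" type name are all hypothetical and only stand in for the proposed map from an extension type to its bound to_pylist function.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the proposed map: extension type name -> the
# user's bound to_pylist-style conversion function.
Overrides = Dict[str, Callable[[List[Any]], List[Any]]]


def convert_list_column(
    rows: List[List[Any]], value_type: str, overrides: Overrides
) -> List[List[Any]]:
    """Convert each list element, deferring to a user override when present."""
    convert = overrides.get(value_type)
    if convert is None:
        # No override registered: hand back copies of the storage values.
        return [list(row) for row in rows]
    return [convert(list(row)) for row in rows]


# Made-up override that renders storage integers as labelled strings.
overrides: Overrides = {
    "example.celsius": lambda storage: [f"{v} C" for v in storage],
}

print(convert_list_column([[1, 2], [3]], "example.celsius", overrides))
print(convert_list_column([[1, 2], [3]], "example.unregistered", overrides))
```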
[jira] [Assigned] (ARROW-17534) [C++] Support optional arguments in aggregation function mapping in the Substrait consumer.
[ https://issues.apache.org/jira/browse/ARROW-17534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vibhatha Lakmal Abeykoon reassigned ARROW-17534: Assignee: Vibhatha Lakmal Abeykoon > [C++] Support optional arguments in aggregation function mapping in the > Substrait consumer. > --- > > Key: ARROW-17534 > URL: https://issues.apache.org/jira/browse/ARROW-17534 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Weston Pace >Assignee: Vibhatha Lakmal Abeykoon >Priority: Major > Labels: substrait > > It appears that {{sum}} and {{avg}} have an optional enum argument to specify > overflow behavior. I'm not certain if I just missed this or if it is new. > Either way the current function mapping does not account for this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
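One way to picture the missing piece is a mapping step that separates an optional enum argument from the value arguments before dispatching to the aggregate. The sketch below is plain Python, not the Arrow Substrait consumer. The argument shape (dicts with an "enum" or "value" key), the function name split_aggregate_args, and the listed option values are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Assumed option values for illustration only; the Substrait function
# definitions are the authoritative source.
OVERFLOW_OPTIONS = {"SILENT", "SATURATE", "ERROR"}


def split_aggregate_args(
    name: str, args: List[Dict[str, Any]]
) -> Tuple[str, List[Any], Dict[str, str]]:
    """Separate value arguments from an optional overflow enum argument.

    Each argument is a hypothetical dict: {"enum": "ERROR"} for an enum
    argument or {"value": <expression>} for a value argument.
    """
    values: List[Any] = []
    options: Dict[str, str] = {}
    for arg in args:
        if "enum" in arg:
            if arg["enum"] not in OVERFLOW_OPTIONS:
                raise ValueError(
                    f"unsupported overflow option for {name}: {arg['enum']}"
                )
            options["overflow"] = arg["enum"]
        else:
            values.append(arg["value"])
    return name, values, options


# The enum argument is optional, so both call shapes must map cleanly.
print(split_aggregate_args("sum", [{"enum": "ERROR"}, {"value": "field(0)"}]))
print(split_aggregate_args("avg", [{"value": "field(1)"}]))
```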
[jira] [Commented] (ARROW-17519) [R] RTools35 job is failing
[ https://issues.apache.org/jira/browse/ARROW-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585073#comment-17585073 ]

Kouhei Sutou commented on ARROW-17519:

Thanks. It seems that the discussion referenced from your link, https://lists.apache.org/thread/9g14n3odhj6kzsgjxr6k6d3q73hg2njr, covers R 3.5 on Windows:

{quote}
It is reasonable to drop support for R < 4.0 on Windows, as suggested in this JIRA comment: https://issues.apache.org/jira/browse/ARROW-17110?focusedCommentId=17571472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17571472
{quote}

> [R] RTools35 job is failing
>
> Key: ARROW-17519
> URL: https://issues.apache.org/jira/browse/ARROW-17519
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dewey Dunnington
> Priority: Major
>
> After ARROW-17436, the RTools35 job is consistently failing with:
> {noformat}
> Error: Error: package or namespace load failed for 'arrow' in inDL(x, as.logical(local), as.logical(now), ...):
> unable to load shared object 'D:/a/arrow/arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/i386/arrow.dll':
> LoadLibrary failure: A dynamic link library (DLL) initialization routine failed.
> {noformat}
> Given that there is a mailing list discussion about dropping support for that
> platform, should we disable the check now, or wait until that discussion is
> resolved before disabling it?

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585071#comment-17585071 ]

Kouhei Sutou edited comment on ARROW-17531 at 8/26/22 12:36 AM:

Thanks. Could you try the following instead?

{noformat}
install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest")
{noformat}

was (Author: kou):

Thanks. Could you try {{install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest")}} instead?

> /lib64/libm.so.6: version `GLIBC_2.27' not found
>
> Key: ARROW-17531
> URL: https://issues.apache.org/jira/browse/ARROW-17531
> Project: Apache Arrow
> Issue Type: Bug
> Environment: R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
> Reporter: Net Zhang
> Priority: Major
>
> Hi, I've followed the [instructions|https://cran.r-project.org/web/packages/arrow/vignettes/install.html] to
> install the arrow R package on a Linux machine.
> {noformat}
> options(
>   HTTPUserAgent =
>     sprintf(
>       "R/%s R (%s)",
>       getRversion(),
>       paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
>     )
> )
> install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
> {noformat}
> The installation was successful, but when I load the library I received an error
> message indicating
> {noformat}
> /lib64/libm.so.6: version `GLIBC_2.27' not found
> {noformat}
> Here's my full log, containing machine information:
> {noformat}
> > HTTPUserAgent =
> + sprintf(
> + "R/%s R (%s)",
> + getRversion(),
> + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
> + )
> > HTTPUserAgent
> [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"
> > install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
> Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’
> (as ‘lib’ is unspecified)
> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz'
> Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB)
> ==
> downloaded 33.1 MB
> * installing *binary* package ‘arrow’ ...
> * DONE (arrow)
> The downloaded source packages are in
> ‘/tmp/RtmpUfdX4s/downloaded_packages’
> > library(arrow)
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
> unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so':
> /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so)
> In addition: Warning message:
> package ‘arrow’ was built under R version 4.0.5
> {noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
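The loader error in the log names the glibc symbol version the prebuilt binary needs, so the mismatch can be checked before picking a binary repository. The sketch below is Python rather than R and uses only string handling. The helper names are hypothetical, and the "2.17" and "2.31" values are example inputs; on a real machine the installed version could be read with `platform.libc_ver()` or `ldd --version`.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "2.27" or "GLIBC_2.27" into a comparable tuple such as (2, 27)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def required_glibc(loader_error: str) -> Optional[str]:
    """Extract the GLIBC version named in a dynamic loader error, if any."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)*)", loader_error)
    return match.group(1) if match else None


def binary_is_loadable(system_glibc: str, required: str) -> bool:
    """A binary loads only if the system glibc is at least the required version."""
    return parse_version(system_glibc) >= parse_version(required)


error = "/lib64/libm.so.6: version `GLIBC_2.27' not found"
needed = required_glibc(error)
print(needed)
# Example inputs: an older and a newer system glibc.
print(binary_is_loadable("2.17", needed))
print(binary_is_loadable("2.31", needed))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.9" would sort above "2.27".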
[jira] [Comment Edited] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585071#comment-17585071 ] Kouhei Sutou edited comment on ARROW-17531 at 8/26/22 12:36 AM: Thanks. Could you try {{install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest")}} instead? was (Author: kou): Thanks. Could you try {{install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest}} instead?
[jira] [Commented] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585071#comment-17585071 ] Kouhei Sutou commented on ARROW-17531: Thanks. Could you try {{install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest}} instead?
[jira] [Comment Edited] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585037#comment-17585037 ]

Kouhei Sutou edited comment on ARROW-17531 at 8/26/22 12:32 AM:

{noformat}
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2 packrat_0.6.0
{noformat}
Here's the session info

was (Author: JIRAUSER294961): `` > sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS/LAPACK: /opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.2 tools_4.0.2 packrat_0.6.0 ``` Here's the session info
[jira] [Created] (ARROW-17534) [C++] Support optional arguments in aggregation function mapping in the Substrait consumer.
Weston Pace created ARROW-17534: --- Summary: [C++] Support optional arguments in aggregation function mapping in the Substrait consumer. Key: ARROW-17534 URL: https://issues.apache.org/jira/browse/ARROW-17534 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Weston Pace It appears that {{sum}} and {{avg}} have an optional enum argument to specify overflow behavior. I'm not certain if I just missed this or if it is new. Either way the current function mapping does not account for this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16772) [C++] Implement encode and decode functions for Run-Length encoding
[ https://issues.apache.org/jira/browse/ARROW-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldrin Montana updated ARROW-16772: --- Component/s: C++ > [C++] Implement encode and decode functions for Run-Length encoding > --- > > Key: ARROW-16772 > URL: https://issues.apache.org/jira/browse/ARROW-16772 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
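The sub-task itself carries no description, so as background only, here is a generic run-length encode/decode pair in Python. It illustrates the general technique over plain lists and is not Arrow's C++ implementation or its array layout. The function names rle_encode and rle_decode and the (values, run lengths) representation are assumptions chosen for this sketch.

```python
from typing import Any, List, Tuple


def rle_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Collapse consecutive equal values into (run values, run lengths)."""
    run_values: List[Any] = []
    run_lengths: List[int] = []
    for value in values:
        if run_values and run_values[-1] == value:
            # Same value as the current run: extend it.
            run_lengths[-1] += 1
        else:
            # Value changed (or first element): start a new run.
            run_values.append(value)
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values: List[Any], run_lengths: List[int]) -> List[Any]:
    """Expand (run values, run lengths) back into the original sequence."""
    decoded: List[Any] = []
    for value, length in zip(run_values, run_lengths):
        decoded.extend([value] * length)
    return decoded


data = [1, 1, 1, 2, 2, 3]
encoded = rle_encode(data)
print(encoded)
print(rle_decode(*encoded) == data)
```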
[jira] [Commented] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585037#comment-17585037 ] Net Zhang commented on ARROW-17531: `` > sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS/LAPACK: /opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.2 tools_4.0.2 packrat_0.6.0 ``` Here's the session info
[jira] [Commented] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585032#comment-17585032 ] Kouhei Sutou commented on ARROW-17531: Could you show your OS information?
[jira] [Updated] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-17531: - Description: Hi, I've followed the [instructions |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to install the arrow R package on a Linux machine. {noformat} options( HTTPUserAgent = sprintf( "R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) ) ) install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest;) {noformat} The installation was successful but when I load the library I received error message indicating {noformat} /lib64/libm.so.6: version `GLIBC_2.27' not found {noformat} Here's my full log, containing machine information {noformat} > HTTPUserAgent = + sprintf( + "R/%s R (%s)", + getRversion(), + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) + ) > HTTPUserAgent [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)" > install.packages("arrow", repos = > "https://packagemanager.rstudio.com/all/__linux__/focal/latest;) Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz' Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB) == downloaded 33.1 MB * installing *binary* package ‘arrow’ ... 
* DONE (arrow) The downloaded source packages are in ‘/tmp/RtmpUfdX4s/downloaded_packages’ > library(arrow) Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so': /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so) In addition: Warning message: package ‘arrow’ was built under R version 4.0.5 {noformat} was: Hi, I've followed the [instructions |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to install the arrow R package on a Linux machine. {noformat} options( HTTPUserAgent = sprintf( "R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) ) ) install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest") ``` The installation was successful but when I load the library I received error message indicating ``` /lib64/libm.so.6: version `GLIBC_2.27' not found ``` Here's my full log, containing machine information ``` > HTTPUserAgent = + sprintf( + "R/%s R (%s)", + getRversion(), + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) + ) > HTTPUserAgent [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)" > install.packages("arrow", repos = > "https://packagemanager.rstudio.com/all/__linux__/focal/latest") Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz' Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB) == downloaded 33.1 MB * installing *binary* package ‘arrow’ ... 
* DONE (arrow) The downloaded source packages are in ‘/tmp/RtmpUfdX4s/downloaded_packages’ > library(arrow) Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so': /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so) In addition: Warning message: package ‘arrow’ was built under R version 4.0.5 {noformat} > /lib64/libm.so.6: version `GLIBC_2.27' not found > > > Key: ARROW-17531 > URL: https://issues.apache.org/jira/browse/ARROW-17531 > Project: Apache Arrow > Issue Type: Bug > Environment: R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >Reporter: Net Zhang >Priority: Major > > Hi, I've followed the [instructions > |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to > install the arrow R package on a Linux machine. > {noformat} > options( > HTTPUserAgent = > sprintf( > "R/%s R (%s)", > getRversion(), > paste(getRversion(), R.version["platform"], R.version["arch"], > R.version["os"]) > ) > ) >
[jira] [Updated] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
[ https://issues.apache.org/jira/browse/ARROW-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-17531: - Description: Hi, I've followed the [instructions |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to install the arrow R package on a Linux machine. {noformat} options( HTTPUserAgent = sprintf( "R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) ) ) install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest") ``` The installation was successful but when I load the library I received error message indicating ``` /lib64/libm.so.6: version `GLIBC_2.27' not found ``` Here's my full log, containing machine information ``` > HTTPUserAgent = + sprintf( + "R/%s R (%s)", + getRversion(), + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) + ) > HTTPUserAgent [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)" > install.packages("arrow", repos = > "https://packagemanager.rstudio.com/all/__linux__/focal/latest") Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz' Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB) == downloaded 33.1 MB * installing *binary* package ‘arrow’ ... 
* DONE (arrow) The downloaded source packages are in ‘/tmp/RtmpUfdX4s/downloaded_packages’ > library(arrow) Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so': /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so) In addition: Warning message: package ‘arrow’ was built under R version 4.0.5 {noformat} was: Hi, I've followed the [instructions |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to install the arrow R package on a Linux machine. ``` options( HTTPUserAgent = sprintf( "R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) ) ) install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest") ``` The installation was successful but when I load the library I received error message indicating ``` /lib64/libm.so.6: version `GLIBC_2.27' not found ``` Here's my full log, containing machine information ``` > HTTPUserAgent = + sprintf( + "R/%s R (%s)", + getRversion(), + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) + ) > HTTPUserAgent [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)" > install.packages("arrow", repos = > "https://packagemanager.rstudio.com/all/__linux__/focal/latest") Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz' Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB) == downloaded 33.1 MB * installing *binary* package ‘arrow’ ... 
* DONE (arrow) The downloaded source packages are in ‘/tmp/RtmpUfdX4s/downloaded_packages’ > library(arrow) Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so': /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so) In addition: Warning message: package ‘arrow’ was built under R version 4.0.5 ``` > /lib64/libm.so.6: version `GLIBC_2.27' not found > > > Key: ARROW-17531 > URL: https://issues.apache.org/jira/browse/ARROW-17531 > Project: Apache Arrow > Issue Type: Bug > Environment: R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >Reporter: Net Zhang >Priority: Major > > Hi, I've followed the [instructions > |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to > install the arrow R package on a Linux machine. > {noformat} > options( > HTTPUserAgent = > sprintf( > "R/%s R (%s)", > getRversion(), > paste(getRversion(), R.version["platform"], R.version["arch"], > R.version["os"]) > ) > ) > install.packages("arrow", repos = >
[jira] [Created] (ARROW-17533) [R] Implement asof join
Jonathan Keane created ARROW-17533: -- Summary: [R] Implement asof join Key: ARROW-17533 URL: https://issues.apache.org/jira/browse/ARROW-17533 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Jonathan Keane With ARROW-16083 we have asof joins, could we expose this in R? Docs for the node: https://arrow.apache.org/docs/cpp/api/compute.html?highlight=asof#_CPPv4N5arrow7compute19AsofJoinNodeOptionsE A possible syntax might be (there does not appear to be a syntax in dplyr for this already): {code} asof_join(table1, table2, by = "field", tolerance = 1) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17533) [R] Implement asof join
[ https://issues.apache.org/jira/browse/ARROW-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585027#comment-17585027 ] Jonathan Keane commented on ARROW-17533: A bit more prior art, and folks asking for this: https://stackoverflow.com/questions/58538114/is-there-an-r-equivalent-of-pythons-pandas-merge-asof > [R] Implement asof join > --- > > Key: ARROW-17533 > URL: https://issues.apache.org/jira/browse/ARROW-17533 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Jonathan Keane >Priority: Major > > With ARROW-16083 we have asof joins, could we expose this in R? > Docs for the node: > https://arrow.apache.org/docs/cpp/api/compute.html?highlight=asof#_CPPv4N5arrow7compute19AsofJoinNodeOptionsE > A possible syntax might be (there does not appear to be a syntax in dplyr for > this already): > {code} > asof_join(table1, table2, by = "field", tolerance = 1) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17532) [Go] Implement Numeric Cast functions
Matthew Topol created ARROW-17532: - Summary: [Go] Implement Numeric Cast functions Key: ARROW-17532 URL: https://issues.apache.org/jira/browse/ARROW-17532 Project: Apache Arrow Issue Type: Sub-task Components: Go Reporter: Matthew Topol Assignee: Matthew Topol -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17528) [R] Tidy up the pkgdown articles site index
[ https://issues.apache.org/jira/browse/ARROW-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585023#comment-17585023 ] Stephanie Hazlitt commented on ARROW-17528: --- We could consider a few broad categories for the first level e.g. Developers (already there), Installation, Users. > [R] Tidy up the pkgdown articles site index > > > Key: ARROW-17528 > URL: https://issues.apache.org/jira/browse/ARROW-17528 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nicola Crane >Priority: Major > > We could better organise the different articles we have to make it easier for > users to find the right info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-12711) [R] Bindings for paste(collapse), str_c(collapse), and str_flatten()
[ https://issues.apache.org/jira/browse/ARROW-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585019#comment-17585019 ] Travis Lim commented on ARROW-12711: [~icook] Any updates on bindings for dplyr summarise with paste(collapse) or str_c(collapse) in upcoming releases? A potential workaround was floated for Python here https://issues.apache.org/jira/browse/ARROW-12710 but having this in R would be a game changer, especially for NLP applications :pray: :pray: :pray: > [R] Bindings for paste(collapse), str_c(collapse), and str_flatten() > > > Key: ARROW-12711 > URL: https://issues.apache.org/jira/browse/ARROW-12711 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Ian Cook >Priority: Major > Labels: query-engine > > These are the aggregating versions of string concatenation; they combine > values from a set of rows into a single value. > The bindings for {{paste()}} and {{str_c()}} might be tricky to implement > because when these functions are called with the {{collapse}} argument > unset, they do _not_ aggregate. > In {{summarise()}} we need to be able to use scalar concatenation within > aggregate concatenation, like this: > {code:java} > starwars %>% > filter(!is.na(hair_color) & !is.na(eye_color)) %>% > group_by(homeworld) %>% > summarise(hair_and_eyes = paste0(paste0(hair_color, "-haired and ", > eye_color, "-eyed"), collapse = ", ")){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17527) [Go] Implement Cast to Boolean Functions
[ https://issues.apache.org/jira/browse/ARROW-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17527: --- Labels: pull-request-available (was: ) > [Go] Implement Cast to Boolean Functions > > > Key: ARROW-17527 > URL: https://issues.apache.org/jira/browse/ARROW-17527 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17530) [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the current vector collection
[ https://issues.apache.org/jira/browse/ARROW-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry White updated ARROW-17530: Description: The current implementation of Java VectorSchemaRoot cannot add a vector at the end of the current list (which is the generally understood meaning of "add"). The Precondition check in the method's second line prevents providing an appropriate index for adding at the end: {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index < fieldVectors.size()); List newVectors = new ArrayList<>(); for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } return new VectorSchemaRoot(newVectors); } {code} One possible implementation resolving the issue is shown below. {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index <= fieldVectors.size()); List newVectors = new ArrayList<>(); if (index == fieldVectors.size()) { newVectors.addAll(fieldVectors); newVectors.add(vector); } else { for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } } return new VectorSchemaRoot(newVectors); } {code} was: The current implementation of Java VectorSchemaRoot cannot add a vector at the end of the current list (which is the generally understood meaning of "add"). 
The Precondition check in the method's second line prevents providing an appropriate index for adding at the end: {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index < fieldVectors.size()); List newVectors = new ArrayList<>(); for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } return new VectorSchemaRoot(newVectors); } {code} One possible implementation resolving the issue is shown below. {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index <= fieldVectors.size()); List newVectors = new ArrayList<>(); if (index == fieldVectors.size()) { newVectors.addAll(fieldVectors); newVectors.add(vector); } else { for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } } return new VectorSchemaRoot(newVectors); } {code} > [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the > current vector collection > --- > > Key: ARROW-17530 > URL: https://issues.apache.org/jira/browse/ARROW-17530 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 9.0.0, 9.0.1 >Reporter: Larry White >Assignee: Larry White >Priority: Major > > The current implementation of Java VectorSchemaRoot cannot add a vector at > the end of the current list (which is the generally understood meaning of > "add"). 
> The Precondition check in the method's second line prevents providing an > appropriate index for adding at the end: > {code:java} > public VectorSchemaRoot addVector(int index, FieldVector vector) { > Preconditions.checkNotNull(vector); > Preconditions.checkArgument(index >= 0 && index < fieldVectors.size()); > List newVectors = new ArrayList<>(); > for (int i = 0; i < fieldVectors.size(); i++) { > if (i == index) { > newVectors.add(vector); > } > newVectors.add(fieldVectors.get(i)); > } > return new VectorSchemaRoot(newVectors); > } > {code} > One possible implementation resolving the issue is shown below. > {code:java} > public VectorSchemaRoot addVector(int index, FieldVector vector) { > Preconditions.checkNotNull(vector); > Preconditions.checkArgument(index >= 0 && index <= fieldVectors.size()); > List newVectors = new ArrayList<>(); > if (index == fieldVectors.size()) { > newVectors.addAll(fieldVectors); > newVectors.add(vector); > } else { > for (int i = 0; i < fieldVectors.size(); i++) { > if (i == index) { > newVectors.add(vector); > } > newVectors.add(fieldVectors.get(i)); > } > } > return new VectorSchemaRoot(newVectors); > } > {code} > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17530) [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the current vector collection
[ https://issues.apache.org/jira/browse/ARROW-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry White updated ARROW-17530: Issue Type: Bug (was: Improvement) > [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the > current vector collection > --- > > Key: ARROW-17530 > URL: https://issues.apache.org/jira/browse/ARROW-17530 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 9.0.0, 9.0.1 >Reporter: Larry White >Assignee: Larry White >Priority: Major > > The current implementation of Java VectorSchemaRoot cannot add a vector at > the end of the current list (which is the generally understood meaning of > "add"). > The Precondition check in the method's second line prevents providing an > appropriate index for adding at the end: > > {code:java} > public VectorSchemaRoot addVector(int index, FieldVector vector) { > Preconditions.checkNotNull(vector); > Preconditions.checkArgument(index >= 0 && index < fieldVectors.size()); > List newVectors = new ArrayList<>(); > for (int i = 0; i < fieldVectors.size(); i++) { > if (i == index) { > newVectors.add(vector); > } > newVectors.add(fieldVectors.get(i)); > } > return new VectorSchemaRoot(newVectors); > } > {code} > > > One possible implementation resolving the issue is shown below. > > {code:java} > public VectorSchemaRoot addVector(int index, FieldVector vector) { > Preconditions.checkNotNull(vector); > Preconditions.checkArgument(index >= 0 && index <= fieldVectors.size()); > List newVectors = new ArrayList<>(); > if (index == fieldVectors.size()) { > newVectors.addAll(fieldVectors); > newVectors.add(vector); > } else { > for (int i = 0; i < fieldVectors.size(); i++) { > if (i == index) { > newVectors.add(vector); > } > newVectors.add(fieldVectors.get(i)); > } > } > return new VectorSchemaRoot(newVectors); > } > {code} > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17262) [C++] Kernel input type matcher for RLE
[ https://issues.apache.org/jira/browse/ARROW-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17262: --- Labels: pull-request-available (was: ) > [C++] Kernel input type matcher for RLE > --- > > Key: ARROW-17262 > URL: https://issues.apache.org/jira/browse/ARROW-17262 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Builds on top of ARROW-17261 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17531) /lib64/libm.so.6: version `GLIBC_2.27' not found
Net Zhang created ARROW-17531: - Summary: /lib64/libm.so.6: version `GLIBC_2.27' not found Key: ARROW-17531 URL: https://issues.apache.org/jira/browse/ARROW-17531 Project: Apache Arrow Issue Type: Bug Environment: R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu) Reporter: Net Zhang Hi, I've followed the [instructions |https://cran.r-project.org/web/packages/arrow/vignettes/install.html]to install the arrow R package on a Linux machine. ``` options( HTTPUserAgent = sprintf( "R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) ) ) install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest") ``` The installation was successful but when I load the library I received error message indicating ``` /lib64/libm.so.6: version `GLIBC_2.27' not found ``` Here's my full log, containing machine information ``` > HTTPUserAgent = + sprintf( + "R/%s R (%s)", + getRversion(), + paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"]) + ) > HTTPUserAgent [1] "R/4.0.2 R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)" > install.packages("arrow", repos = > "https://packagemanager.rstudio.com/all/__linux__/focal/latest") Installing package into ‘/users/PZS1008/netzhang/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_9.0.0.tar.gz' Content type 'binary/octet-stream' length 34655538 bytes (33.1 MB) == downloaded 33.1 MB * installing *binary* package ‘arrow’ ... 
* DONE (arrow) The downloaded source packages are in ‘/tmp/RtmpUfdX4s/downloaded_packages’ > library(arrow) Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so': /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /users/xx/R/x86_64-pc-linux-gnu-library/4.0/arrow/libs/arrow.so) In addition: Warning message: package ‘arrow’ was built under R version 4.0.5 ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17530) [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the current vector collection
Larry White created ARROW-17530: --- Summary: [Java] VectorSchemaRoot#addVector() cannot add a vector to the end of the current vector collection Key: ARROW-17530 URL: https://issues.apache.org/jira/browse/ARROW-17530 Project: Apache Arrow Issue Type: Improvement Components: Java Affects Versions: 9.0.0, 9.0.1 Reporter: Larry White Assignee: Larry White The current implementation of Java VectorSchemaRoot cannot add a vector at the end of the current list (which is the generally understood meaning of "add"). The Precondition check in the method's second line prevents providing an appropriate index for adding at the end: {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index < fieldVectors.size()); List newVectors = new ArrayList<>(); for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } return new VectorSchemaRoot(newVectors); } {code} One possible implementation resolving the issue is shown below. {code:java} public VectorSchemaRoot addVector(int index, FieldVector vector) { Preconditions.checkNotNull(vector); Preconditions.checkArgument(index >= 0 && index <= fieldVectors.size()); List newVectors = new ArrayList<>(); if (index == fieldVectors.size()) { newVectors.addAll(fieldVectors); newVectors.add(vector); } else { for (int i = 0; i < fieldVectors.size(); i++) { if (i == index) { newVectors.add(vector); } newVectors.add(fieldVectors.get(i)); } } return new VectorSchemaRoot(newVectors); } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
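The fix proposed in the description can be sketched without the Arrow classes. This is a minimal, self-contained illustration and not the Arrow implementation: `AddVectorSketch` and `insertAt` are hypothetical names, and a plain `List<T>` stands in for the list of field vectors. It assumes only that `ArrayList.add(int, E)` accepts an index equal to the list size, so once the precondition is relaxed to `<=` the append case needs no separate branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddVectorSketch {
    /**
     * Returns a new list with {@code item} inserted at {@code index}.
     * An index equal to {@code items.size()} appends at the end.
     * (Hypothetical helper; stands in for VectorSchemaRoot#addVector.)
     */
    public static <T> List<T> insertAt(List<T> items, int index, T item) {
        if (item == null) {
            throw new NullPointerException("item must not be null");
        }
        // Note the inclusive upper bound: index == size is a valid append position.
        if (index < 0 || index > items.size()) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        List<T> result = new ArrayList<>(items); // copy, leaving the input untouched
        result.add(index, item);                 // shifts later elements to the right
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("a", "b");
        System.out.println(insertAt(columns, 0, "x")); // [x, a, b]
        System.out.println(insertAt(columns, 2, "x")); // [a, b, x]
    }
}
```

The copy-then-insert form keeps the original list unchanged, which matches the description's pattern of returning a new `VectorSchemaRoot` rather than mutating the existing one.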
[jira] [Updated] (ARROW-17516) [C++] Concatenate implementation for RLE
[ https://issues.apache.org/jira/browse/ARROW-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17516: --- Labels: pull-request-available (was: ) > [C++] Concatenate implementation for RLE > > > Key: ARROW-17516 > URL: https://issues.apache.org/jira/browse/ARROW-17516 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > builds on top of ARROW-17419 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17529) Clean up how the CSV reader handles the first buffer
Ziheng Wang created ARROW-17529: --- Summary: Clean up how the CSV reader handles the first buffer Key: ARROW-17529 URL: https://issues.apache.org/jira/browse/ARROW-17529 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Ziheng Wang Assignee: Ziheng Wang The way the CSV reader currently handles the first block of a CSV file is not great. In fact, I think the first block is read multiple times: first in the Peek in file_csv.cc, and then in the InitFromBlock in the OpenReaderAsync in reader.cc. This could be problematic if the first block is pretty big, and it also delays the synchronous opening of a dataset. A possible solution is to use a smaller block size for the peek in file_csv.cc, since you don't need to read the entire block to GetConvertOptions. So we could really just have another option in reader_options, named first_peek_size or something like that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17528) [R] Tidy up the pkgdown articles site
Nicola Crane created ARROW-17528: Summary: [R] Tidy up the pkgdown articles site Key: ARROW-17528 URL: https://issues.apache.org/jira/browse/ARROW-17528 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Nicola Crane We could better organise the different articles we have to make it easier for users to find the right info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17528) [R] Tidy up the pkgdown articles site index
[ https://issues.apache.org/jira/browse/ARROW-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Crane updated ARROW-17528: - Summary: [R] Tidy up the pkgdown articles site index (was: [R] Tidy up the pkgdown articles site ) > [R] Tidy up the pkgdown articles site index > > > Key: ARROW-17528 > URL: https://issues.apache.org/jira/browse/ARROW-17528 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nicola Crane >Priority: Major > > We could better organise the different articles we have to make it easier for > users to find the right info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17258) [C++] Handling of array-only types using VisitTypeInline
[ https://issues.apache.org/jira/browse/ARROW-17258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tobias Zagorni updated ARROW-17258: --- Summary: [C++] Handling of array-only types using VisitTypeInline (was: [C++] Separate VisitTypeInline for types that can exist as a Scalar) > [C++] Handling of array-only types using VisitTypeInline > > > Key: ARROW-17258 > URL: https://issues.apache.org/jira/browse/ARROW-17258 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17432) [R] messed up rows when importing large csv into parquet
[ https://issues.apache.org/jira/browse/ARROW-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584903#comment-17584903 ] SHIMA Tatsuya commented on ARROW-17432: --- Hi, how about passing the schema to the {{col_types}} argument? {code:r} csv_stream <- open_dataset(csv_file, format = "csv", col_types = sch) {code} Or, using {{readr::read_csv()}}? I also wonder if the number of rows in the dataset fetched is the same in all cases. > [R] messed up rows when importing large csv into parquet > > > Key: ARROW-17432 > URL: https://issues.apache.org/jira/browse/ARROW-17432 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 8.0.0, 9.0.0 > Environment: R version 4.2.1 > Running in Arch Linux - EndeavourOS > arrow_info() > Arrow package version: 9.0.0 > Capabilities: > > datasetTRUE > substrait FALSE > parquetTRUE > json TRUE > s3 TRUE > gcsTRUE > utf8proc TRUE > re2TRUE > snappy TRUE > gzip TRUE > brotli TRUE > zstd TRUE > lz4TRUE > lz4_frame TRUE > lzo FALSE > bz2TRUE > jemalloc TRUE > mimalloc TRUE > Memory: > > Allocator jemalloc > Current 49.31 Kb > Max1.63 Mb > Runtime: > > SIMD Level avx2 > Detected SIMD Level avx2 > Build: > > C++ Library Version 9.0.0 > C++ Compiler GNU > C++ Compiler Version 7.5.0 > > print(pa.__version__) > 9.0.0 >Reporter: Guillermo Duran >Priority: Major > > This is a weird issue that creates new rows when importing a large csv (56 > GB) into parquet in R. It occurred with both R Arrow 8.0.0 and 9.0.0 BUT > didn't occur with the Python Arrow library 9.0.0. Due to the large size of > the original csv it's difficult to create a reproducible example, but I share > the code and outputs. 
> The code I use in R to import the csv: > {code:java} > library(arrow) > library(dplyr) > > csv_file <- "/ebird_erd2021/full/obs.csv" > dest <- "/ebird_erd2021/full/obs_parquet/" > sch = arrow::schema(checklist_id = float32(), > species_code = string(), > exotic_category = float32(), > obs_count = float32(), > only_presence_reported = float32(), > only_slash_reported = float32(), > valid = float32(), > reviewed = float32(), > has_media = float32() > ) > csv_stream <- open_dataset(csv_file, format = "csv", > schema = sch, skip_rows = 1) > write_dataset(csv_stream, dest, format = "parquet", > max_rows_per_file=100L, > hive_style = TRUE, > existing_data_behavior = "overwrite"){code} > When I load the dataset and check one random _checklist_id_ I get rows that > are not part of the _obs.csv_ file. There shouldn't be duplicated species in > a checklist but there are ({_}amerob{_} for example)... also note that the > duplicated species have different {_}obs_count{_}. 50 species in total in > that specific {_}checklist_id{_}. 
> {code:java} > parquet_arrow <- open_dataset(dest, format = "parquet") > parquet_arrow |> > filter(checklist_id == 18543372) |> > arrange(species_code) |> > collect() > # A tibble: 50 × 3 >checklist_id species_code obs_count > > 1 18543372 altori 3 > 2 18543372 amekes 1 > 3 18543372 amered 40 > 4 18543372 amerob 30 > 5 18543372 amerob 9 > 6 18543372 balori 9 > 7 18543372 blkter 9 > 8 18543372 blkvul 20 > 9 18543372 buggna 1 > 10 18543372 buwwar 1 > # … with 40 more rows > # ℹ Use `print(n = ...)` to see more rows{code} > If I use awk to query the csv file with that same checklist id, I get > something different: > {code:java} > $ awk -F "," '{ if ($1 == 18543372) { print } }' obs.csv > 18543372.0,rewbla,,60.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,amerob,,30.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,robgro,,2.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,eastow,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,sedwre1,,2.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,ovenbi1,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,buggna,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,reshaw,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,turvul,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,gowwar,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,balori,,9.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,buwwar,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,grycat,,1.0,0.0,0.0,1.0,0.0,0.0 > 18543372.0,cangoo,,6.0,0.0,0.0,1.0,0.0,0.0 >
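One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis of this issue: the schema in the report reads checklist_id as float32(). A 32-bit float has a 24-bit significand, so integers above 2^24 = 16777216 are no longer all distinct. The Java sketch below (the class and method names are hypothetical, and Java is used only because its `float` is the same IEEE 754 single-precision type) shows that the IDs next to 18543372 collapse to the same float value, which would let an equality filter match rows from adjacent checklists.

```java
public class Float32IdDemo {
    /** True when two integer IDs become indistinguishable after conversion to a 32-bit float. */
    public static boolean collides(int a, int b) {
        // int -> float conversion rounds to the nearest representable float.
        return (float) a == (float) b;
    }

    public static void main(String[] args) {
        System.out.println(collides(18543372, 18543373)); // true
        System.out.println(collides(18543372, 18543371)); // true
        System.out.println(collides(18543372, 18543374)); // false
        System.out.println(collides(1000, 1001));         // false (well below 2^24)
    }
}
```

If this turns out to be relevant, declaring checklist_id with a wider type such as float64() in the schema would keep IDs of this magnitude distinct.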
[jira] [Updated] (ARROW-17527) [Go] Implement Cast to Boolean Functions
[ https://issues.apache.org/jira/browse/ARROW-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Topol updated ARROW-17527: -- Summary: [Go] Implement Cast to Boolean Functions (was: [Go] Implement Cast Functions) > [Go] Implement Cast to Boolean Functions > > > Key: ARROW-17527 > URL: https://issues.apache.org/jira/browse/ARROW-17527 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17526) [R] [Docs] Improve (or really actually document) our Python bridge documentation
[ https://issues.apache.org/jira/browse/ARROW-17526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584896#comment-17584896 ] Nicola Crane commented on ARROW-17526: -- [~jonkeane] Mind opening a cookbook issue too? We can just swap out the scalar/array content for tables as it's a much more compelling use case. > [R] [Docs] Improve (or really actually document) our Python bridge > documentation > - > > Key: ARROW-17526 > URL: https://issues.apache.org/jira/browse/ARROW-17526 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, R >Reporter: Jonathan Keane >Priority: Major > > https://twitter.com/jonkeane/status/1560016227824721920?s=20=g2MhdOOJbh0q0MpxPI4R_Q > When I wrote this, I wished there was a one-page I could show passing a table > or recordbatchreader back and forth. > https://arrow.apache.org/cookbook/r/using-pyarrow-from-r.html#introduction-4 > also has some details, but is more focused on scalars and arrays than tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17527) [Go] Implement Cast Functions
Matthew Topol created ARROW-17527: - Summary: [Go] Implement Cast Functions Key: ARROW-17527 URL: https://issues.apache.org/jira/browse/ARROW-17527 Project: Apache Arrow Issue Type: Sub-task Components: Go Reporter: Matthew Topol Assignee: Matthew Topol -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17455) [Go] Implement Initial Function and Kernel architecture
[ https://issues.apache.org/jira/browse/ARROW-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Topol resolved ARROW-17455. --- Fix Version/s: 10.0.0 Resolution: Fixed Issue resolved by pull request 13964 [https://github.com/apache/arrow/pull/13964] > [Go] Implement Initial Function and Kernel architecture > --- > > Key: ARROW-17455 > URL: https://issues.apache.org/jira/browse/ARROW-17455 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17526) [R] [Docs] Improve (or really actually document) our Python bridge documentation
Jonathan Keane created ARROW-17526: -- Summary: [R] [Docs] Improve (or really actually document) our Python bridge documentation Key: ARROW-17526 URL: https://issues.apache.org/jira/browse/ARROW-17526 Project: Apache Arrow Issue Type: Improvement Components: Documentation, R Reporter: Jonathan Keane https://twitter.com/jonkeane/status/1560016227824721920?s=20=g2MhdOOJbh0q0MpxPI4R_Q When I wrote this, I wished there was a one-page I could show passing a table or recordbatchreader back and forth. https://arrow.apache.org/cookbook/r/using-pyarrow-from-r.html#introduction-4 also has some details, but is more focused on scalars and arrays than tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17458) [C++] CSV Writer: Unsupported cast from decimal to utf8
[ https://issues.apache.org/jira/browse/ARROW-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-17458: --- Fix Version/s: 10.0.0 > [C++] CSV Writer: Unsupported cast from decimal to utf8 > > > Key: ARROW-17458 > URL: https://issues.apache.org/jira/browse/ARROW-17458 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 6.0.1 >Reporter: Pavel Kovalenko >Priority: Critical > Labels: csv, decimal, good-first-issue, good-second-issue, > unsupported > Fix For: 10.0.0 > > > The following code snippet fails with an Unsupported cast error if a table > has a decimal column. > {code:cpp} > std::shared_ptr table; > ARROW_CHECK_OK(reader->ReadAll()); > std::shared_ptr output = > arrow::io::FileOutputStream::Open(csvPath).ValueOrDie(); > auto writeOptions = arrow::csv::WriteOptions::Defaults(); > writeOptions.include_header = false; > auto status = arrow::csv::WriteCSV(*table, writeOptions, output.get()); > if (!status.ok()) { > SETHROW_ERROR(std::runtime_error, "Couldn't write table csv: {}", > status.message()); > } > {code} > {code:cpp} > Unsupported cast from decimal128(7, 2) to utf8 using function cast_string > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17458) [C++] CSV Writer: Unsupported cast from decimal to utf8
[ https://issues.apache.org/jira/browse/ARROW-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-17458: --- Priority: Critical (was: Major) > [C++] CSV Writer: Unsupported cast from decimal to utf8 > > > Key: ARROW-17458 > URL: https://issues.apache.org/jira/browse/ARROW-17458 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 6.0.1 >Reporter: Pavel Kovalenko >Priority: Critical > Labels: csv, decimal, good-first-issue, good-second-issue, > unsupported > > The following code snippet fails with an Unsupported cast error if a table > has a decimal column. > {code:cpp} > std::shared_ptr table; > ARROW_CHECK_OK(reader->ReadAll()); > std::shared_ptr output = > arrow::io::FileOutputStream::Open(csvPath).ValueOrDie(); > auto writeOptions = arrow::csv::WriteOptions::Defaults(); > writeOptions.include_header = false; > auto status = arrow::csv::WriteCSV(*table, writeOptions, output.get()); > if (!status.ok()) { > SETHROW_ERROR(std::runtime_error, "Couldn't write table csv: {}", > status.message()); > } > {code} > {code:cpp} > Unsupported cast from decimal128(7, 2) to utf8 using function cast_string > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17525) [Java] Read ORC files using org.apache.arrow.dataset.jni.NativeDatasetFactory
[ https://issues.apache.org/jira/browse/ARROW-17525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17525: --- Labels: pull-request-available (was: ) > [Java] Read ORC files using org.apache.arrow.dataset.jni.NativeDatasetFactory > -- > > Key: ARROW-17525 > URL: https://issues.apache.org/jira/browse/ARROW-17525 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Affects Versions: 9.0.0 >Reporter: Igor Suhorukov >Assignee: Igor Suhorukov >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Support ORC file format in java Dataset API -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17525) [Java] Read ORC files using org.apache.arrow.dataset.jni.NativeDatasetFactory
Igor Suhorukov created ARROW-17525: -- Summary: [Java] Read ORC files using org.apache.arrow.dataset.jni.NativeDatasetFactory Key: ARROW-17525 URL: https://issues.apache.org/jira/browse/ARROW-17525 Project: Apache Arrow Issue Type: Improvement Components: Java Affects Versions: 9.0.0 Reporter: Igor Suhorukov Assignee: Igor Suhorukov Support ORC file format in java Dataset API -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17519) [R] RTools35 job is failing
[ https://issues.apache.org/jira/browse/ARROW-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584876#comment-17584876 ] Dewey Dunnington commented on ARROW-17519: -- Sure! https://lists.apache.org/thread/h9v83rwdl015z2j6s8zwdr1qp4svb5j8 > [R] RTools35 job is failing > --- > > Key: ARROW-17519 > URL: https://issues.apache.org/jira/browse/ARROW-17519 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Dewey Dunnington >Priority: Major > > After ARROW-17436, the RTools35 job is consistently failing with: > {noformat} > Error: Error: package or namespace load failed for 'arrow' in inDL(x, > as.logical(local), as.logical(now), ...): > unable to load shared object > 'D:/a/arrow/arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/i386/arrow.dll': > LoadLibrary failure: A dynamic link library (DLL) initialization routine > failed. > {noformat} > Given that there is a mailing list discussion about dropping support for that > platform, should we disable the check? Or wait until that is resolved to > disable the check? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17524) The ORC reader method ReadStripe does not work when we specify fields to selected as a list of integers
[ https://issues.apache.org/jira/browse/ARROW-17524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17524: --- Labels: pull-request-available (was: ) > The ORC reader method ReadStripe does not work when we specify fields to > selected as a list of integers > --- > > Key: ARROW-17524 > URL: https://issues.apache.org/jira/browse/ARROW-17524 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 8.0.1 >Reporter: Louis Calot >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I think there is a bug in the ORC reader: when we specify the field indexes > that we want to keep, it does not work correctly. Looking at the code, this > seems to be because we call "includeTypes" in lieu of "include" when setting > the ORC options. > This is problematic when we want to import an ORC table containing Union > types, because the import raises an error even if we try not to import > those specific fields. > The definitions of the corresponding ORC methods are here: > [https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L185-L191] > and > [https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L201-L207] -- This message was sent by Atlassian Jira (v8.20.10#820010)
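To illustrate why mixing up "include" and "includeTypes" matters: as far as I understand the ORC options, "include" takes top-level field indexes (first field is 0), while "includeTypes" takes type IDs, where the root struct is 0 and the remaining types are numbered in a pre-order walk of the schema tree. The sketch below does not call ORC or Arrow at all; it is a toy model with a hypothetical schema that shows the same integer naming different columns under the two numbering schemes.

```python
# Hypothetical schema: struct<a:int, b:struct<x:int, y:int>, c:string>
# Each node is (name, children).
SCHEMA = ("root", [("a", []), ("b", [("x", []), ("y", [])]), ("c", [])])


def assign_type_ids(node, ids, path=""):
    """Walk the schema in pre-order; a node's position in `ids` is its type ID."""
    name, children = node
    full_name = f"{path}.{name}" if path else name
    ids.append(full_name)
    for child in children:
        assign_type_ids(child, ids, full_name)
    return ids


type_ids = assign_type_ids(SCHEMA, [])          # index == type ID, root is 0
field_names = [name for name, _ in SCHEMA[1]]   # index == top-level field index

selected = [2]  # what a caller passes when asking for "the third field"
by_field_index = [field_names[i] for i in selected]
by_type_id = [type_ids[i] for i in selected]

print("as field indexes:", by_field_index)
print("as type IDs:     ", by_type_id)
```

In this toy schema, index 2 means field c when read as a field index but struct b when read as a type ID, which matches the kind of mismatch described in the report.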
[jira] [Created] (ARROW-17524) The ORC reader method ReadStripe does not work when we specify fields to selected as a list of integers
Louis Calot created ARROW-17524: --- Summary: The ORC reader method ReadStripe does not work when we specify fields to selected as a list of integers Key: ARROW-17524 URL: https://issues.apache.org/jira/browse/ARROW-17524 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 8.0.1 Reporter: Louis Calot I think there is a bug in the ORC reader: when we specify the field indexes that we want to keep, it does not work correctly. Looking at the code, this seems to be because we call "includeTypes" in lieu of "include" when setting the ORC options. This is problematic when we want to import an ORC table containing Union types, because the import raises an error even if we try not to import those specific fields. The definitions of the corresponding ORC methods are here: [https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L185-L191] and [https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L201-L207] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17458) [C++] CSV Writer: Unsupported cast from decimal to utf8
[ https://issues.apache.org/jira/browse/ARROW-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-17458: - Labels: csv decimal good-first-issue good-second-issue unsupported (was: csv decimal unsupported) > [C++] CSV Writer: Unsupported cast from decimal to utf8 > > > Key: ARROW-17458 > URL: https://issues.apache.org/jira/browse/ARROW-17458 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 6.0.1 >Reporter: Pavel Kovalenko >Priority: Major > Labels: csv, decimal, good-first-issue, good-second-issue, > unsupported > > The following code snippet fails with an Unsupported cast error if a table > has a decimal column. > {code:cpp} > std::shared_ptr table; > ARROW_CHECK_OK(reader->ReadAll()); > std::shared_ptr output = > arrow::io::FileOutputStream::Open(csvPath).ValueOrDie(); > auto writeOptions = arrow::csv::WriteOptions::Defaults(); > writeOptions.include_header = false; > auto status = arrow::csv::WriteCSV(*table, writeOptions, output.get()); > if (!status.ok()) { > SETHROW_ERROR(std::runtime_error, "Couldn't write table csv: {}", > status.message()); > } > {code} > {code:cpp} > Unsupported cast from decimal128(7, 2) to utf8 using function cast_string > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17458) [C++] CSV Writer: Unsupported cast from decimal to utf8
[ https://issues.apache.org/jira/browse/ARROW-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584853#comment-17584853 ] Jonathan Keane commented on ARROW-17458: We ran into this issue today as well, working on conversions for benchmarking datasets > [C++] CSV Writer: Unsupported cast from decimal to utf8 > > > Key: ARROW-17458 > URL: https://issues.apache.org/jira/browse/ARROW-17458 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 6.0.1 >Reporter: Pavel Kovalenko >Priority: Major > Labels: csv, decimal, unsupported > > The following code snippet fails with an Unsupported cast error if a table > has a decimal column. > {code:cpp} > std::shared_ptr table; > ARROW_CHECK_OK(reader->ReadAll()); > std::shared_ptr output = > arrow::io::FileOutputStream::Open(csvPath).ValueOrDie(); > auto writeOptions = arrow::csv::WriteOptions::Defaults(); > writeOptions.include_header = false; > auto status = arrow::csv::WriteCSV(*table, writeOptions, output.get()); > if (!status.ok()) { > SETHROW_ERROR(std::runtime_error, "Couldn't write table csv: {}", > status.message()); > } > {code} > {code:cpp} > Unsupported cast from decimal128(7, 2) to utf8 using function cast_string > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17262) [C++] Kernel input type matcher for RLE
[ https://issues.apache.org/jira/browse/ARROW-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tobias Zagorni updated ARROW-17262: --- Description: Builds on top of ARROW-17261 > [C++] Kernel input type matcher for RLE > --- > > Key: ARROW-17262 > URL: https://issues.apache.org/jira/browse/ARROW-17262 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > > Builds on top of ARROW-17261 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17263) [C++] Utility functions for working with RLE
[ https://issues.apache.org/jira/browse/ARROW-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tobias Zagorni updated ARROW-17263: --- Description: based on top of ARROW-17261 > [C++] Utility functions for working with RLE > > > Key: ARROW-17263 > URL: https://issues.apache.org/jira/browse/ARROW-17263 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > based on top of ARROW-17261 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17261) [C++] Add type ID, Type and Array classes for RLE
[ https://issues.apache.org/jira/browse/ARROW-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tobias Zagorni updated ARROW-17261: --- Description: Based on top of ARROW-17258 Mostly picking these parts from ARROW-16772 and ARROW-16781 to create an easier order to merge things was:Mostly picking these parts from ARROW-16772 and ARROW-16781 to create an easier order to merge things > [C++] Add type ID, Type and Array classes for RLE > - > > Key: ARROW-17261 > URL: https://issues.apache.org/jira/browse/ARROW-17261 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Tobias Zagorni >Assignee: Tobias Zagorni >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Based on top of ARROW-17258 > Mostly picking these parts from ARROW-16772 and ARROW-16781 to create an > easier order to merge things -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17459) [C++] Support nested data conversions for chunked array
[ https://issues.apache.org/jira/browse/ARROW-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584808#comment-17584808 ] Arthur Passos commented on ARROW-17459: --- I am also trying to write a test to cover this case, but have not managed to so far. For some reason, the files I generate with the very same schema and size don't get chunked when they are read. The original file was provided by a customer and contains confidential data, so it can't be used. All the files I generated use the above-mentioned schema; they differ only in data length. Some had maps of 50~300 elements with keys of random strings of 20~50 characters and values of random strings of 50~5000 characters. I also tried a low-cardinality example and a large string example (2^30 characters). I'd be very thankful if someone could give me some tips on how to generate a file that will trigger the exception. > [C++] Support nested data conversions for chunked array > --- > > Key: ARROW-17459 > URL: https://issues.apache.org/jira/browse/ARROW-17459 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Arthur Passos >Priority: Blocker > > `FileReaderImpl::ReadRowGroup` fails with "Nested data conversions not > implemented for chunked array outputs". It fails on > [ChunksToSingle]([https://github.com/apache/arrow/blob/7f6b074b84b1ca519b7c5fc7da318e8d47d44278/cpp/src/parquet/arrow/reader.cc#L95]) > Data schema is: > {code:java} > optional group fields_map (MAP) = 217 { > repeated group key_value { > required binary key (STRING) = 218; > optional binary value (STRING) = 219; > } > } > fields_map.key_value.value-> Size In Bytes: 13243589 Size In Ratio: 0.20541047 > fields_map.key_value.key-> Size In Bytes: 3008860 Size In Ratio: 0.046667963 > {code} > Is there a way to work around this issue in the cpp lib? > In any case, I am willing to implement this, but I need some guidance. I am > very new to parquet (as in started reading about it yesterday). 
> > Probably related to: https://issues.apache.org/jira/browse/ARROW-10958 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17327) [Python] Parquet should be listed in PyArrow's get_libraries() function
[ https://issues.apache.org/jira/browse/ARROW-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584795#comment-17584795 ] Antoine Pitrou commented on ARROW-17327: [~willjones127] {{get_libraries()}} is tested in {{test_cython.py}}. I wonder what is different here that requires adding {{parquet}} while the tests generally run fine. > [Python] Parquet should be listed in PyArrow's get_libraries() function > --- > > Key: ARROW-17327 > URL: https://issues.apache.org/jira/browse/ARROW-17327 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Steven Silvester >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > We are updating {{PyMongoArrow}} to use PyArrow 8.0, and saw the following > [failure| > https://github.com/mongodb-labs/mongo-arrow/runs/7696619223?check_suite_focus=true] > when building wheels: "@rpath/libparquet.800.dylib not found". > We overcame the error by explicitly adding "parquet" to the list of libraries > returned by {{get_libraries}}. I am happy to submit a PR. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17523) [C++] Support more substrait function
Jin Chengcheng created ARROW-17523: -- Summary: [C++] Support more substrait function Key: ARROW-17523 URL: https://issues.apache.org/jira/browse/ARROW-17523 Project: Apache Arrow Issue Type: Improvement Affects Versions: 10.0.0 Reporter: Jin Chengcheng Assignee: Jin Chengcheng support is_null, is_not_null, count function -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17518) [CI][Docs][Python] Development version is not correctly detected from git
[ https://issues.apache.org/jira/browse/ARROW-17518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-17518. Fix Version/s: 10.0.0 Resolution: Fixed Issue resolved by pull request 13966 [https://github.com/apache/arrow/pull/13966] > [CI][Docs][Python] Development version is not correctly detected from git > - > > Key: ARROW-17518 > URL: https://issues.apache.org/jira/browse/ARROW-17518 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Documentation, Python >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Attachments: image-2022-08-24-18-32-00-888.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The current glob used on our git commands to detect the development version > is not correct and can be seen on the published docs: > !image-2022-08-24-18-32-00-888.png! > Reproduced on bash: > {code:java} > $ git describe --dirty --tags --long > apache-arrow-10.0.0.dev-113-g28b81ec-dirty > $ git describe --dirty --tags --long --match "apache-arrow-[0-9].*" > apache-arrow-9.0.0.dev-640-g28b81ec-dirty {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
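The reproduction above can be explained by how the glob is read: in "apache-arrow-[0-9].*" the bracket matches exactly one digit and must be followed by a literal dot, so a two-digit major version such as 10 never matches. The sketch below uses Python's fnmatch as a stand-in for git's own glob matcher (an assumption; the two agree for patterns this simple), and the second pattern is only a candidate I made up for illustration, not necessarily what pull request 13966 merged.

```python
from fnmatch import fnmatchcase

# Tag names implied by the `git describe` output in the report.
TAGS = ["apache-arrow-9.0.0.dev", "apache-arrow-10.0.0.dev"]

OLD_GLOB = "apache-arrow-[0-9].*"        # pattern from the report
CANDIDATE_GLOB = "apache-arrow-[0-9]*.*"  # hypothetical alternative

# The old glob requires a dot right after a single digit.
old_matches = [tag for tag in TAGS if fnmatchcase(tag, OLD_GLOB)]
# The candidate allows any run of characters between the first digit and a dot.
new_matches = [tag for tag in TAGS if fnmatchcase(tag, CANDIDATE_GLOB)]

print("old glob matches:      ", old_matches)
print("candidate glob matches:", new_matches)
```

With the old pattern only the 9.0.0.dev tag is eligible, which is consistent with `git describe --match` falling back to apache-arrow-9.0.0.dev in the output shown.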
[jira] [Resolved] (ARROW-15277) [Python] Use Make to create ChunkedArray and remove checks
[ https://issues.apache.org/jira/browse/ARROW-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-15277. Fix Version/s: 10.0.0 Resolution: Fixed Issue resolved by pull request 13950 [https://github.com/apache/arrow/pull/13950] > [Python] Use Make to create ChunkedArray and remove checks > -- > > Key: ARROW-15277 > URL: https://issues.apache.org/jira/browse/ARROW-15277 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Eduardo Ponce >Assignee: Miles Granger >Priority: Minor > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > In PyArrow, the {{ChunkedArray}} constructor function validates the input > {{Arrays}} in terms of omitted type and same types, but these checks are > already made in the underlying C++ via {{ChunkedArray::Make}}. Need to expose > the {{Make()}} to use it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17495) [R] arrow_eval: do we need both nse_funcs and .cache$functions?
[ https://issues.apache.org/jira/browse/ARROW-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584703#comment-17584703 ] Dragoș Moldovan-Grünfeld commented on ARROW-17495: -- I can look into it (maybe after I finish with the capstone). As far as I can tell, both {{.cache$functions}} and {{nse_funcs}} are created at load time. {{.cache$functions}} is {{nse_funcs}} + Arrow Compute functions (prefixed with {{arrow_}}). I bumped into this while trying to register / translate a user-defined function with pre-existing bindings. I needed to update either {{.cache$functions}} or {{nse_funcs}} - we can update the former via {{update_cache = TRUE}}, but then I had to change {{call_binding()}} to fetch from the updated {{.cache}} and not from {{nse_funcs}}. This led me to think that folks might be confused by these 2 objects, which overlap by quite a bit and are in some situations interchangeable (mostly, {{nse_funcs}} can be replaced by {{.cache$functions}}, which includes it). > [R] arrow_eval: do we need both nse_funcs and .cache$functions? > --- > > Key: ARROW-17495 > URL: https://issues.apache.org/jira/browse/ARROW-17495 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 9.0.0 >Reporter: Dragoș Moldovan-Grünfeld >Priority: Minor > > Currently we have 2 copies of the same information, once in {{nse_funcs}} and > once in {{.cache$functions}}. I wasn't able to figure out the reason for > this. Maybe I am missing something or maybe this is just legacy code that we > can update. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17433) [C++] AppVeyor build fails due to Boost/S3
[ https://issues.apache.org/jira/browse/ARROW-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-17433. -- Fix Version/s: 10.0.0 Resolution: Fixed Issue resolved by pull request 13903 [https://github.com/apache/arrow/pull/13903] > [C++] AppVeyor build fails due to Boost/S3 > -- > > Key: ARROW-17433 > URL: https://issues.apache.org/jira/browse/ARROW-17433 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: David Li >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Observed on master > {noformat} > [182/351] Building CXX object > src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj > FAILED: > src/arrow/filesystem/CMakeFiles/arrow-s3fs-test.dir/Unity/unity_0_cxx.cxx.obj > C:\Miniconda37-x64\Scripts\clcache.exe /nologo /TP -DARROW_HAVE_RUNTIME_AVX2 > -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 > -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC > -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 > -DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD > -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT > -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT > -DAWS_IO_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 > -DAWS_SDK_VERSION_MINOR=8 -DAWS_SDK_VERSION_PATCH=186 > -DAWS_USE_IO_COMPLETION_PORTS -DBOOST_ALL_DYN_LINK -DBOOST_ALL_NO_LIB > -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB -DBOOST_FILESYSTEM_DYN_LINK > -DBOOST_FILESYSTEM_NO_LIB -DBOOST_SYSTEM_DYN_LINK -DBOOST_SYSTEM_NO_LIB > -DPROTOBUF_USE_DLLS -DURI_STATIC_BUILD -DUSE_IMPORT_EXPORT > -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS > -D_ENABLE_EXTENDED_ALIGNED_STORAGE -IC:\projects\arrow\cpp\build\src > -IC:\projects\arrow\cpp\src -IC:\projects\arrow\cpp\src\generated > 
-IC:\projects\arrow\cpp\thirdparty\flatbuffers\include > -IC:\Miniconda37-x64\envs\arrow\Library\include > -IC:\projects\arrow\cpp\thirdparty\hadoop\include > -IC:\projects\arrow\cpp\build\mimalloc_ep\src\mimalloc_ep\include\mimalloc-2.0 > /DWIN32 /D_WINDOWS /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING > /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065 /WX /MP /MD /Od > /UNDEBUG /showIncludes > /Fosrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj > /Fdsrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\ /FS -c > C:\projects\arrow\cpp\build\src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx > Please define _WIN32_WINNT or _WIN32_WINDOWS appropriately. For example: > - add -D_WIN32_WINNT=0x0601 to the compiler command line; or > - add _WIN32_WINNT=0x0601 to your project's Preprocessor Definitions. > Assuming _WIN32_WINNT=0x0601 (i.e. Windows 7 target). > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(266): > error C2220: warning treated as error - no 'object' file generated > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(261): > note: while compiling class template member function > 'boost::iterators::transform_iterator>,Char > > **,boost::process::detail::entry>,boost::process::detail::entry>> > > boost::process::basic_environment_impl::find(const > std::basic_string,std::allocator> &)' > with > [ > Char=char > ] > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(361): > note: see reference to function template instantiation > 'boost::iterators::transform_iterator>,Char > > **,boost::process::detail::entry>,boost::process::detail::entry>> > > boost::process::basic_environment_impl::find(const > std::basic_string,std::allocator> &)' > being compiled > with > [ > Char=char > ] > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(632): > note: see reference to class template instantiation > 
'boost::process::basic_environment_impl' > being compiled > with > [ > Char=char > ] > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(176): > note: see reference to class template instantiation > 'boost::process::basic_environment' being compiled > C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(183): > note: see reference to class template instantiation > 'boost::process::detail::env_init' being compiled > C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/relationship.hpp(595): > note: see reference to class template instantiation >