[DISCUSS] [Rust] Deleting the legacy query execution code
I have been working on ARROW-5227 [1] for quite a while now (with help from some other contributors), and the new trait-based physical query plan has now reached feature parity with the previous query execution implementation. That implementation directly executed the logical plan enum, so it could not be extended by other projects and did not support parallel execution. I would now like to delete the original query execution code, which would reduce the code base by approximately 2,500 lines and remove a lot of code duplication. I have a WIP PR for this [2]. I am raising this for discussion because there is an argument for just deprecating this code in 1.0.0 and removing it in a later release, but I would prefer to delete it before the 1.0.0 release to reduce confusion and maintenance overhead. Does anybody have any objections? [1] https://issues.apache.org/jira/browse/ARROW-5227 [2] https://github.com/apache/arrow/pull/5583
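To illustrate why a trait-based plan is extensible where a closed enum is not, here is a rough C++ analogue in which an abstract base class stands in for a Rust trait. The names `ExecutionPlan`, `MemoryScan`, `EvenFilter`, `Batch` and `collect` are hypothetical and are not DataFusion's actual API; a plain `std::vector<int>` stands in for a record batch. The sketch only shows the shape of the design: a downstream project adds an operator by implementing the interface, and independent per-partition `execute` calls are what make parallel execution possible.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch.
using Batch = std::vector<int>;

// Open extension point, analogous to a Rust trait object: any project can
// implement a new physical operator without modifying the engine.
class ExecutionPlan {
 public:
  virtual ~ExecutionPlan() = default;
  // Partitions can be executed independently, which is what allows parallelism.
  virtual std::size_t partition_count() const = 0;
  virtual Batch execute(std::size_t partition) const = 0;
};

// Leaf operator: returns pre-partitioned in-memory data.
class MemoryScan : public ExecutionPlan {
 public:
  explicit MemoryScan(std::vector<Batch> partitions) : partitions_(std::move(partitions)) {}
  std::size_t partition_count() const override { return partitions_.size(); }
  Batch execute(std::size_t partition) const override { return partitions_.at(partition); }

 private:
  std::vector<Batch> partitions_;
};

// An operator that could live in a downstream project: keeps even values.
class EvenFilter : public ExecutionPlan {
 public:
  explicit EvenFilter(std::shared_ptr<ExecutionPlan> input) : input_(std::move(input)) {}
  std::size_t partition_count() const override { return input_->partition_count(); }
  Batch execute(std::size_t partition) const override {
    Batch out;
    for (int v : input_->execute(partition)) {
      if (v % 2 == 0) out.push_back(v);
    }
    return out;
  }

 private:
  std::shared_ptr<ExecutionPlan> input_;
};

// Runs every partition sequentially and concatenates the results; a real
// engine could dispatch the execute() calls to a thread pool instead.
Batch collect(const ExecutionPlan& plan) {
  Batch all;
  for (std::size_t p = 0; p < plan.partition_count(); ++p) {
    Batch part = plan.execute(p);
    all.insert(all.end(), part.begin(), part.end());
  }
  return all;
}
```

With an enum-based design, adding `EvenFilter` would instead require editing the enum and every `match`/`switch` over it inside the engine itself.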
Re: [DISCUSS] Result vs Status
> > > > > It was my impression that we had workable solutions for using Result in at > > least Python and Glib/Ruby (I don't know about R). > > In Python we do (though it needed a C++-side helper). > OK, so could more context be provided on the following: > From the discussion in the sync call, it seems reasonable to require that: > Public APIs which are likely to be directly wrapped in a binding should not > use Result<> to the exclusion of Status. An equivalent Status API should > always be provided for ease of binding. Thanks, Micah On Thu, Oct 3, 2019 at 3:42 AM Antoine Pitrou wrote: > > On 03/10/2019 at 06:13, Micah Kornfield wrote: > > > > It was my impression that we had workable solutions for using Result in at > > least Python and Glib/Ruby (I don't know about R). > > In Python we do (though it needed a C++-side helper). > > Regards > > Antoine. >
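The sync-call guideline above can be sketched as a `Result<T>`-returning function for C++ callers plus an equivalent `Status`-returning overload with an out-parameter, which is usually simpler to wrap in a binding. The `Status` and `Result` classes here are deliberately minimal stand-ins written for this example, not Arrow's `arrow::Status`/`arrow::Result`, and `ParseNonNegative` is a hypothetical function.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for an error status (not arrow::Status).
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status Invalid(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  Status(bool ok, std::string message) : ok_(ok), message_(std::move(message)) {}
  bool ok_;
  std::string message_;
};

// Minimal stand-in for Result<T> (not arrow::Result): a value or an error.
// Requires T to be default-constructible to keep the sketch short.
template <typename T>
class Result {
 public:
  Result(T value) : status_(Status::OK()), value_(std::move(value)) {}
  Result(Status status) : status_(std::move(status)), value_() {}
  bool ok() const { return status_.ok(); }
  const Status& status() const { return status_; }
  // Caller must check ok() first; this sketch does not abort on error.
  const T& ValueOrDie() const { return value_; }

 private:
  Status status_;
  T value_;
};

// Result-returning API, convenient for C++ callers.
// (Hypothetical; no overflow check for very long inputs.)
Result<int> ParseNonNegative(const std::string& text) {
  if (text.empty()) return Status::Invalid("empty input");
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return Status::Invalid("not a digit");
    value = value * 10 + (c - '0');
  }
  return value;
}

// Equivalent Status-returning overload with an out-parameter, which a
// binding generator can wrap without knowing about Result<T>.
Status ParseNonNegative(const std::string& text, int* out) {
  Result<int> result = ParseNonNegative(text);
  if (!result.ok()) return result.status();
  *out = result.ValueOrDie();
  return Status::OK();
}
```

Implementing the `Status` overload in terms of the `Result` one keeps a single source of truth for the logic, so providing both costs only a thin wrapper per API.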
Re: [jira] [Created] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
Very recently I had the pleasure of installing Arrow on Linux. Let me first remark that without the help of @xhochy and @kou I certainly would have failed. I have now managed to install it in a rocker container (I think; there are still quite a lot of warning messages). I have published the Docker image here: https://hub.docker.com/r/tschm/rocker-arrow Maybe one of the experts could fix and/or improve it? Many thanks, Thomas On Fri, 4 Oct 2019 at 20:07, Neal Richardson (Jira) wrote: > Neal Richardson created ARROW-6793: > -- > > Summary: [R] Arrow C++ binary packaging for Linux > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R > Reporter: Neal Richardson > Assignee: Neal Richardson > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, > you get a shell that tells you to install the C++ library. That was a > useful approach to allow us to get the package on CRAN, which makes it easy > for macOS and Windows users to install, but it doesn't improve the > installation experience for Linux users. This is an impediment to adoption > of arrow not only by users but also by package maintainers who might want > to depend on arrow. > > macOS and Windows have a better experience because at installation time, > the configure scripts download and statically link a prebuilt C++ library. > CRAN bundles the whole thing up and delivers that as a binary R package. > > Python wheels do a similar thing: they're binaries that contain all > external dependencies. And there are pyarrow wheels for Linux. This > suggests that we could do something similar for R: build a generic Linux > binary of the C++ library and download it in the R package configure script > at install time. > > I experimented with using the Arrow C++ binaries included in the Python > wheels in R.
See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, > but it turned out that the "manylinux2010" standard is too archaic to work > with contemporary Rcpp. > > Proposal: do a similar workflow to what the manylinux2010 pyarrow build > does, just with slightly more modern compiler/settings. Publish that C++ > binary package to bintray. Then download it in the R configure script if a > local/system package isn't found. > > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid > everywhere and/or ensure the current fallback behavior when we encounter a > distro that this doesn't work for. If necessary, we can make multiple > flavors of this C++ binary for debian, centos, etc. > > > > -- > This message was sent by Atlassian Jira > (v8.3.4#803005) > -- Dr. Thomas Schmelzer
[jira] [Created] (ARROW-6795) [C#] Reading large Arrow files in C# results in an exception
Eric Erhardt created ARROW-6795: --- Summary: [C#] Reading large Arrow files in C# results in an exception Key: ARROW-6795 URL: https://issues.apache.org/jira/browse/ARROW-6795 Project: Apache Arrow Issue Type: Bug Components: C# Reporter: Eric Erhardt If you try to read a large Arrow file (2GB+) using the C# reader, you get an exception because it is casting the file position (a 64-bit long) to a 32-bit integer. When the file size is large See [https://github.com/apache/arrow/pull/5412]
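The failure mode described in the issue can be reproduced in a few lines. The new examples in this digest use C++, so this is an analogue of the C# bug rather than the reader's code, and `UncheckedPosition` and `TryNarrowPosition` are hypothetical helpers. A 32-bit signed integer tops out at 2,147,483,647 (2 GiB minus one byte), so any file position beyond that cannot survive the cast.

```cpp
#include <cstdint>
#include <limits>

// Analogue of the reported bug: an unchecked cast of a 64-bit file position
// to a 32-bit integer. Positions above INT32_MAX cannot be represented; before
// C++20 the resulting value is implementation-defined (wraps on common compilers).
std::int32_t UncheckedPosition(std::int64_t position) {
  return static_cast<std::int32_t>(position);
}

// Checked alternative: refuse to narrow instead of silently corrupting the offset.
bool TryNarrowPosition(std::int64_t position, std::int32_t* out) {
  if (position < std::numeric_limits<std::int32_t>::min() ||
      position > std::numeric_limits<std::int32_t>::max()) {
    return false;
  }
  *out = static_cast<std::int32_t>(position);
  return true;
}
```

Keeping offsets in a 64-bit type end to end avoids the problem entirely; a checked narrowing like this is only a fallback where an API genuinely requires 32-bit values.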
Re: Should Arrow adopt C++14 / 17?
On Fri, Oct 4, 2019 at 3:18 PM Zhuo Peng wrote: > > > > On 2019/10/04 19:43:04, Wes McKinney wrote: > > On Fri, Oct 4, 2019 at 12:45 PM Zhuo Peng wrote: > > > > > > > > > > > > On 2019/10/04 17:05:00, Antoine Pitrou wrote: > > > > > > > > On 04/10/2019 at 19:01, Zhuo Peng wrote: > > > > > > > > > > backports are cool for internal use, but probably not so if a public > > > > > API accepts it? (because you vendor the headers in (i.e. namespace, > > > > > symbol names unchanged), they might clash with headers that a client > > > > > uses). > > > > > > > > This is true unfortunately. > > > > > > > > >>> And btw, was -std=gnu++11 an intentional choice? what gnu > > > > >>> extensions does the library rely on? > > > > >> > > > > >> None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 > > > > >> added? > > > > > https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 > > > > > > > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html > > > > > > > > Right, so this is a CMake decision. I think we require only plain C++11 > > > > (but we may enable additional features on some compilers, provided > > > > there's a fallback). > > > Extensions can be disabled through: > > > set(CMAKE_CXX_EXTENSIONS OFF) > > > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_EXTENSIONS.html > > > > > > Is that something more desirable than the current state? > > > > Yes, I think so, I don't think we need to be relying on GNU gcc > > extensions, but we should open a JIRA issue about disabling it in case > > some tests break because of something we didn't realize we were > > depending on. > sg. I'll create one then. > > > > As far as C++14/17 upgrading, it seems like it will be at least 2 > > years before we could upgrade to C++17 given the state of compiler > > support across the spectrum.
Using C++17 would mean requiring at least > > VS 2017 on Windows, since at least in the Python world I think > > everything is on VS 2015. > > > > Are there ways we could create defines to switch between backports and > > STL things (like string_view, optional, etc.) so that developers using > > the Arrow library in a C++17 application can use the built-in types? > This is dangerous unless they build the Arrow library from source with C++17. > If libarrow takes arrow::string_view but the user gives it a > std::string_view, it's UB. > > If we are talking about allowing users to build Arrow with C++17 and support > transparently the new STL types in the public APIs, the ABSL[1] library could > be something to consider. absl::{string_view,optional,variant} become their > std:: counterparts when compiled under C++17, e.g. [2]. > Yes, the presumption would be a monotoolchain environment, and Arrow would need to have some CMake options to build in C++17 mode. > And inline namespaces are used [3] to make sure different libraries can > depend on different versions of absl. > > [1] https://abseil.io/ > [2] > https://github.com/abseil/abseil-cpp/blob/25597bdfc148e91e27678ec30efa52f4fc8c164f/absl/strings/string_view.h#L38 > [3] > https://github.com/abseil/abseil-cpp/blob/aa844899c937bde5d2b24f276b59997e5b668bde/absl/strings/string_view.h#L38 > > > > > > > > > > Regards > > > > > > > > Antoine. > > > > > >
[jira] [Created] (ARROW-6794) [Release] dev/release/post-03-website.sh is out of date in a couple of ways
Wes McKinney created ARROW-6794: --- Summary: [Release] dev/release/post-03-website.sh is out of date in a couple of ways Key: ARROW-6794 URL: https://issues.apache.org/jira/browse/ARROW-6794 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Wes McKinney Fix For: 1.0.0 * Need to add APACHE_ prefix to environment variables * arrow-site repository is now separate
Re: Should Arrow adopt C++14 / 17?
On 2019/10/04 19:43:04, Wes McKinney wrote: > On Fri, Oct 4, 2019 at 12:45 PM Zhuo Peng wrote: > > > > > > > > On 2019/10/04 17:05:00, Antoine Pitrou wrote: > > > > > > On 04/10/2019 at 19:01, Zhuo Peng wrote: > > > > > > > > backports are cool for internal use, but probably not so if a public > > > > API accepts it? (because you vendor the headers in (i.e. namespace, > > > > symbol names unchanged), they might clash with headers that a client > > > > uses). > > > > > > This is true unfortunately. > > > > > > >>> And btw, was -std=gnu++11 an intentional choice? what gnu extensions > > > >>> does the library rely on? > > > >> > > > >> None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 > > > >> added? > > > > https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 > > > > > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html > > > > > > Right, so this is a CMake decision. I think we require only plain C++11 > > > (but we may enable additional features on some compilers, provided > > > there's a fallback). > > Extensions can be disabled through: > > set(CMAKE_CXX_EXTENSIONS OFF) > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_EXTENSIONS.html > > > > Is that something more desirable than the current state? > > Yes, I think so, I don't think we need to be relying on GNU gcc > extensions, but we should open a JIRA issue about disabling it in case > some tests break because of something we didn't realize we were > depending on. sg. I'll create one then. > > As far as C++14/17 upgrading, it seems like it will be at least 2 > years before we could upgrade to C++17 given the state of compiler > support across the spectrum. Using C++17 would mean requiring at least > VS 2017 on Windows, since at least in the Python world I think > everything is on VS 2015. > > Are there ways we could create defines to switch between backports and > STL things (like string_view, optional, etc.)
so that developers using > the Arrow library in a C++17 application can use the built-in types? This is dangerous unless they build the Arrow library from source with C++17. If libarrow takes arrow::string_view but the user gives it a std::string_view, it's UB. If we are talking about allowing users to build Arrow with C++17 and transparently support the new STL types in the public APIs, the ABSL[1] library could be something to consider. absl::{string_view,optional,variant} become their std:: counterparts when compiled under C++17, e.g. [2]. And inline namespaces are used [3] to make sure different libraries can depend on different versions of absl. [1] https://abseil.io/ [2] https://github.com/abseil/abseil-cpp/blob/25597bdfc148e91e27678ec30efa52f4fc8c164f/absl/strings/string_view.h#L38 [3] https://github.com/abseil/abseil-cpp/blob/aa844899c937bde5d2b24f276b59997e5b668bde/absl/strings/string_view.h#L38 > > > > > > > Regards > > > > > > Antoine. > > > >
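A sketch of the define-based switching discussed above, assuming a library that controls its own build: compiled as C++17 it aliases `std::string_view`, otherwise it falls back to a minimal backport. All public declarations go into an inline namespace whose name encodes the configuration, so objects built in the two modes get different mangled symbol names and fail to link together instead of silently invoking the UB mentioned above. The `mylib` namespace, the `MYLIB_*` macros and `CountChar` are hypothetical; Abseil's real mechanism is more elaborate.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is set,
// but _MSVC_LANG reflects the selected standard.
#if defined(_MSVC_LANG)
#define MYLIB_CPLUSPLUS _MSVC_LANG
#else
#define MYLIB_CPLUSPLUS __cplusplus
#endif

#if MYLIB_CPLUSPLUS >= 201703L
#include <string_view>
#define MYLIB_HAS_STD_STRING_VIEW 1
#define MYLIB_ABI_NS cxx17
#else
#define MYLIB_HAS_STD_STRING_VIEW 0
#define MYLIB_ABI_NS cxx11
#endif

namespace mylib {
// Configuration-specific inline namespace: callers still write
// mylib::string_view and mylib::CountChar, but the mangled names differ.
inline namespace MYLIB_ABI_NS {

#if MYLIB_HAS_STD_STRING_VIEW
using string_view = std::string_view;
#else
// Minimal non-owning backport: pointer plus length, implicitly constructible
// from the same argument types as std::string_view.
class string_view {
 public:
  string_view() : data_(nullptr), size_(0) {}
  string_view(const char* s) : data_(s), size_(std::strlen(s)) {}
  string_view(const std::string& s) : data_(s.data()), size_(s.size()) {}
  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const char* data_;
  std::size_t size_;
};
#endif

// Hypothetical API taking the configured string_view. It is inline only to keep
// the sketch self-contained; the link-time protection applies to functions
// compiled into the library itself.
inline std::size_t CountChar(string_view text, char c) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text.data()[i] == c) ++count;
  }
  return count;
}

}  // inline namespace MYLIB_ABI_NS
}  // namespace mylib
```

This only helps in the monotoolchain setting Wes describes: the library and every consumer must agree on the mode, and the inline namespace turns a disagreement into a link error.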
Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)
I think for now we just want to expose the gRPC implementation under a different namespace: FlightGrpcServer would expose FlightBindingService, and FlightGrpcClient would expose FlightClient. On Fri, Oct 4, 2019 at 11:48 AM David Li wrote: > Hi Rohit, > > This sounds interesting, and I think we've voiced support for > something similar before :) > > Given that Flight does want to abstract over the exact backends, > though, how should we approach this? Is the proposal to also refactor > Flight/Java such that the core classes are just interfaces (or > delegate to interfaces) that anyone can implement, and have the gRPC > implementation as the reference one? Or is this just proposing to > expose the gRPC implementation under a separate namespace, and leave > that question for later? > > Best, > David > > On 10/4/19, Rohit Gupta wrote: > > Hi, > > > > At Dremio we are using gRPC for JobsService. One of the APIs relies on > > Arrow Flight. We want access to the Flight service so we can bind it to the > > same managed channel as the rest of JobsService (& not have a completely > > separate server). > > > > The approach would be to create a new module within the same package > > (org.apache.arrow.flight) and have 2 classes FlightGrpcServer & > > FlightGrpcClient that expose the client & server, and also make > > FlightClient ctor package-private. > > > > Please let us know if you have questions or concerns. > > > > Best, > > Rohit > > >
Re: [VOTE] Release Apache Arrow 0.15.0 - RC2
The commits from your local RC branch aren't available, so I cannot rebase master yet; I'll just wait for you to be available again. If anyone has some spare time, we should try to complete as many post-release tasks as possible this weekend so we can announce the release on Monday or Tuesday next week. Thanks all for your help getting this release ready! On Fri, Oct 4, 2019 at 6:40 AM Krisztián Szűcs wrote: > > We have 5 binding +1 votes and 2 non-binding +1 votes so far. > The 72 hours have passed, so we can close the release vote. > > Sadly I won't be available for the rest of the day, so I will only be able > to close the vote and start working on the post-release tasks > tomorrow. > @Wes if you have bandwidth feel free to close the vote sooner. > > > On Thu, Oct 3, 2019 at 1:14 AM Bryan Cutler wrote: > > > Accidentally sent too soon. The ORC build error I got was probably just an > > env issue for me, but here it is in case anyone else had the same issue: > > > > In file included from > > > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep/c++/src/wrap/orc-proto-wrapper.cc:44:0: > > > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc:970:13: > > error: > > ‘dynamic_init_dummy_orc_5fproto_2eproto’ > > defined but not used [-Werror=unused-variable] > > static bool dynamic_init_dummy_orc_5fproto_2eproto = []() { > > AddDescriptors_orc_5fproto_2eproto(); return true; }(); > > ^ > > cc1plus: all warnings being treated as errors > > make[5]: *** [c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o] Error > > 1 > > make[5]: *** Waiting for unfinished jobs > > make[4]: *** [c++/src/CMakeFiles/orc.dir/all] Error 2 > > make[3]: *** [all] Error 2 > > > > [ 29%] Performing build step for 'orc_ep' > > CMake Error at > > > >
/tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-RELEASE.cmake:16 > > (message): > > Command failed: 2 > > > >'make' > > > > See also > > > > > > > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-*.log > > > > CMakeFiles/orc_ep.dir/build.make:111: recipe for target > > 'orc_ep-prefix/src/orc_ep-stamp/orc_ep-build' failed > > make[2]: *** [orc_ep-prefix/src/orc_ep-stamp/orc_ep-build] Error 1 > > CMakeFiles/Makefile2:1248: recipe for target 'CMakeFiles/orc_ep.dir/all' > > failed > > make[1]: *** [CMakeFiles/orc_ep.dir/all] Error 2 > > > > On Wed, Oct 2, 2019 at 4:12 PM Bryan Cutler wrote: > > > > > +1 (non-binding) > > > > > > I ran the following on Ubuntu 16.04 4.15.0-64-generic: > > > > dev/release/verify-release-candidate.sh binaries 0.15.0 2 > > > > ARROW_CUDA=OFF \ > > > TEST_DEFAULT=0 \ > > > TEST_SOURCE=1 \ > > > TEST_CPP=1 \ > > > TEST_PYTHON=1 \ > > > TEST_JAVA=1 \ > > > TEST_INTEGRATION=1 \ > > > dev/release/verify-release-candidate.sh source 0.15.0 2 > > > > > > For source verification I set INTEGRATION_TEST_ARGS="--enable-js=0 > > > --enable-go=0" > > > > > > When attempting source verification with defaults, I got the below error > > > when building the ORC adapter. It looks like just a warning that is being > > > treated as error and seems to be only in > > > > > > On Wed, Oct 2, 2019 at 7:53 AM Andy Grove wrote: > > > > > >> +1 (binding) > > >> > > >> On Mon, Sep 30, 2019 at 11:57 PM Krisztián Szűcs < > > >> szucs.kriszt...@gmail.com> > > >> wrote: > > >> > > >> > Hi, > > >> > > > >> > I would like to propose the following release candidate (RC2) of > > Apache > > >> > Arrow version 0.15.0. This is a release consisting of 697 > > >> > resolved JIRA issues[1]. > > >> > > > >> > This release candidate is based on commit: > > >> > 40d468e162e88e1761b1e80b3ead060f0be927ee [2] > > >> > > > >> > The source release rc2 is hosted at [3].
> > >> > The binary artifacts are hosted at [4][5][6][7]. > > >> > The changelog is located at [8]. > > >> > > > >> > Please download, verify checksums and signatures, run the unit tests, > > >> > and vote on the release. See [9] for how to validate a release > > >> candidate. > > >> > > > >> > The vote will be open for at least 72 hours. > > >> > > > >> > [ ] +1 Release this as Apache Arrow 0.15.0 > > >> > [ ] +0 > > >> > [ ] -1 Do not release this as Apache Arrow 0.15.0 because... > > >> > > > >> > [1]: > > >> > > > >> > > > >> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.0 > > >> > [2]: > > >> > > > >> > > > >> > > https://github.com/apache/arrow/tree/40d468e162e88e1761b1e80b3ead060f0be927ee > > >> > [3]: > > >> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.0-rc2 > > >> > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.0-rc2 > > >> > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.0-rc2 > > >>
Re: Should Arrow adopt C++14 / 17?
On Fri, Oct 4, 2019 at 12:45 PM Zhuo Peng wrote: > > > > On 2019/10/04 17:05:00, Antoine Pitrou wrote: > > > > On 04/10/2019 at 19:01, Zhuo Peng wrote: > > > > > > backports are cool for internal use, but probably not so if a public API > > > accepts it? (because you vendor the headers in (i.e. namespace, symbol > > > names unchanged), they might clash with headers that a client uses). > > > > This is true unfortunately. > > > > >>> And btw, was -std=gnu++11 an intentional choice? what gnu extensions > > >>> does the library rely on? > > >> > > >> None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 added? > > > https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 > > > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html > > > > Right, so this is a CMake decision. I think we require only plain C++11 > > (but we may enable additional features on some compilers, provided > > there's a fallback). > Extensions can be disabled through: > set(CMAKE_CXX_EXTENSIONS OFF) > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_EXTENSIONS.html > > Is that something more desirable than the current state? Yes, I think so. I don't think we need to rely on GNU gcc extensions, but we should open a JIRA issue about disabling them in case some tests break because of something we didn't realize we were depending on. As far as upgrading to C++14/17 goes, it seems like it will be at least 2 years before we could upgrade to C++17 given the state of compiler support across the spectrum. Using C++17 would mean requiring at least VS 2017 on Windows, since at least in the Python world I think everything is on VS 2015. Are there ways we could create defines to switch between backports and STL things (like string_view, optional, etc.) so that developers using the Arrow library in a C++17 application can use the built-in types? > > > > Regards > > > > Antoine. > >
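When weighing a move to C++14/17, or checking whether `CMAKE_CXX_EXTENSIONS OFF` took effect, it can help to see which standard and dialect a translation unit is actually compiled with. The hypothetical helpers below read `__cplusplus`, preferring `_MSVC_LANG` on MSVC (which reports `__cplusplus` as 199711L unless `/Zc:__cplusplus` is passed), and test `__STRICT_ANSI__`, which GCC and Clang define for `-std=c++11` but not for `-std=gnu++11`. Treat this as a diagnostic sketch, not a replacement for the CMake properties.

```cpp
// Hypothetical diagnostic helpers; not part of Arrow.

// Raw language-version macro. MSVC keeps __cplusplus at 199711L unless
// /Zc:__cplusplus is given, but sets _MSVC_LANG to the selected standard.
long LanguageVersionMacro() {
#if defined(_MSVC_LANG)
  return static_cast<long>(_MSVC_LANG);
#else
  return static_cast<long>(__cplusplus);
#endif
}

// Maps the macro to the newest standard it is at least as new as.
int CxxStandardYear() {
  const long v = LanguageVersionMacro();
  if (v >= 202002L) return 20;
  if (v >= 201703L) return 17;
  if (v >= 201402L) return 14;
  if (v >= 201103L) return 11;
  return 98;
}

// True in strict ISO mode (e.g. -std=c++11); false under GNU dialects such as
// -std=gnu++11. Only meaningful on GCC/Clang: MSVC never defines __STRICT_ANSI__.
bool StrictIsoMode() {
#if defined(__STRICT_ANSI__)
  return true;
#else
  return false;
#endif
}
```

Actual feature selection (for example choosing `std::string_view` over a backport) has to happen at preprocessing time, since both branches of a runtime check must still compile; these functions are only for reporting.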
Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)
Is it possible for a single gRPC server to expose multiple services through the same port (it sounds like it is)? It would be a good idea to do similar refactoring in C++ so that Flight RPC endpoints can be provided alongside some other non-Flight endpoints in the same gRPC server. On Fri, Oct 4, 2019 at 1:49 PM David Li wrote: > > Hi Rohit, > > This sounds interesting, and I think we've voiced support for > something similar before :) > > Given that Flight does want to abstract over the exact backends, > though, how should we approach this? Is the proposal to also refactor > Flight/Java such that the core classes are just interfaces (or > delegate to interfaces) that anyone can implement, and have the gRPC > implementation as the reference one? Or is this just proposing to > expose the gRPC implementation under a separate namespace, and leave > that question for later? > > Best, > David > > On 10/4/19, Rohit Gupta wrote: > > Hi, > > > > At Dremio we are using gRPC for JobsService. One of the APIs relies on > > Arrow Flight. We want access to the Flight service so we can bind it to the > > same managed channel as the rest of JobsService (& not have a completely > > separate server). > > > > The approach would be to create a new module within the same package > > (org.apache.arrow.flight) and have 2 classes FlightGrpcServer & > > FlightGrpcClient that expose the client & server, and also make > > FlightClient ctor package-private. > > > > Please let us know if you have questions or concerns. > > > > Best, > > Rohit > >
Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)
Hi Rohit, This sounds interesting, and I think we've voiced support for something similar before :) Given that Flight does want to abstract over the exact backends, though, how should we approach this? Is the proposal to also refactor Flight/Java such that the core classes are just interfaces (or delegate to interfaces) that anyone can implement, and have the gRPC implementation as the reference one? Or is this just proposing to expose the gRPC implementation under a separate namespace, and leave that question for later? Best, David On 10/4/19, Rohit Gupta wrote: > Hi, > > At Dremio we are using gRPC for JobsService. One of the APIs relies on > Arrow Flight. We want access to the Flight service so we can bind it to the > same managed channel as the rest of JobsService (& not have a completely > separate server). > > The approach would be to create a new module within the same package > (org.apache.arrow.flight) and have 2 classes FlightGrpcServer & > FlightGrpcClient that expose the client & server, and also make > FlightClient ctor package-private. > > Please let us know if you have questions or concerns. > > Best, > Rohit >
[jira] [Created] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
Neal Richardson created ARROW-6793: -- Summary: [R] Arrow C++ binary packaging for Linux Key: ARROW-6793 URL: https://issues.apache.org/jira/browse/ARROW-6793 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 1.0.0 Our current installation experience on Linux isn't ideal. Unless you've already installed the Arrow C++ library, when you install the R package, you get a shell that tells you to install the C++ library. That was a useful approach to allow us to get the package on CRAN, which makes it easy for macOS and Windows users to install, but it doesn't improve the installation experience for Linux users. This is an impediment to adoption of arrow not only by users but also by package maintainers who might want to depend on arrow. macOS and Windows have a better experience because at installation time, the configure scripts download and statically link a prebuilt C++ library. CRAN bundles the whole thing up and delivers that as a binary R package. Python wheels do a similar thing: they're binaries that contain all external dependencies. And there are pyarrow wheels for Linux. This suggests that we could do something similar for R: build a generic Linux binary of the C++ library and download it in the R package configure script at install time. I experimented with using the Arrow C++ binaries included in the Python wheels in R. See discussion at the end of ARROW-5956. This worked on macOS (not useful for R, but it proved the concept) and almost worked on Linux, but it turned out that the "manylinux2010" standard is too archaic to work with contemporary Rcpp. Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, just with slightly more modern compiler/settings. Publish that C++ binary package to bintray. Then download it in the R configure script if a local/system package isn't found. 
Once we have a basic version working, test against various distros on [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere and/or ensure the current fallback behavior when we encounter a distro that this doesn't work for. If necessary, we can make multiple flavors of this C++ binary for debian, centos, etc.
[Proposal]: Expose Flight gRPC for Dremio use case (Java)
Hi, At Dremio we are using gRPC for JobsService. One of the APIs relies on Arrow Flight. We want access to the Flight service so we can bind it to the same managed channel as the rest of JobsService (and not have a completely separate server). The approach would be to create a new module within the same package (org.apache.arrow.flight) with two classes, FlightGrpcServer and FlightGrpcClient, that expose the server and client, and also to make the FlightClient constructor package-private. Please let us know if you have questions or concerns. Best, Rohit
Re: Should Arrow adopt C++14 / 17?
On 2019/10/04 17:05:00, Antoine Pitrou wrote: > > On 04/10/2019 at 19:01, Zhuo Peng wrote: > > > > backports are cool for internal use, but probably not so if a public API > > accepts it? (because you vendor the headers in (i.e. namespace, symbol > > names unchanged), they might clash with headers that a client uses). > > This is true unfortunately. > > >>> And btw, was -std=gnu++11 an intentional choice? what gnu extensions does > >>> the library rely on? > >> > >> None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 added? > > https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 > > > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html > > Right, so this is a CMake decision. I think we require only plain C++11 > (but we may enable additional features on some compilers, provided > there's a fallback). Extensions can be disabled through: set(CMAKE_CXX_EXTENSIONS OFF) https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_EXTENSIONS.html Is that something more desirable than the current state? > > Regards > > Antoine. >
Re: Should Arrow adopt C++14 / 17?
On 04/10/2019 at 19:01, Zhuo Peng wrote: > > backports are cool for internal use, but probably not so if a public API > accepts it? (because you vendor the headers in (i.e. namespace, symbol names > unchanged), they might clash with headers that a client uses). This is true unfortunately. >>> And btw, was -std=gnu++11 an intentional choice? what gnu extensions does >>> the library rely on? >> >> None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 added? > https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 > > https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html Right, so this is a CMake decision. I think we require only plain C++11 (but we may enable additional features on some compilers, provided there's a fallback). Regards Antoine.
Re: Should Arrow adopt C++14 / 17?
C++14 isn't very interesting. C++17 is, but it's probably too young given the diversity of platform and toolchain requirements that constrain us. Regards Antoine. On 04/10/2019 at 18:13, Neal Richardson wrote: > We do have to care about more than just conda and Python. For R, for > example, C++14 support is limited: > https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-C_002b_002b14-code > > That's not to say that we can't try (if C++ devs wanted to, and I > can't speak for them), but it might be time-consuming (or worse) to > figure out what features are supported on all of the platforms and > languages that Arrow C++ intends to work for. That said, hopefully our > CI coverage is sufficient to let someone try it out and see what > breaks. > > Neal > > On Fri, Oct 4, 2019 at 9:05 AM Zhuo Peng wrote: >> >> Dear Arrow maintainers, >> >> Sorry if this was raised before. I did search the mailing list but "C++" >> matched too many results.. >> >> With manylinux1 (GCC4.8) being sunset, both Conda and Pypa are providing a >> modern enough toolchain (Conda Forge - GCC7; Pypa manylinux2010 docker - >> devtoolset-8(GCC8)). And full C++17 support has been included in GCC7 [1]. I >> wonder what are the concerns of adopting a newer standard? >> >> C++14 might not bring a whole lot of interesting features, but C++17 brings: >> >> std::string_view >> std::optional >> std::variant (the newly added Result class is based on some form of variant >> implementation I suppose?) >> >> and many syntax sugar.. (like emplace_back() returning back(), so you can do >> RETURN_NOT_OK(CreateArray(my_array_sp_vector.emplace_back( >> >> And btw, was -std=gnu++11 an intentional choice? what gnu extensions does >> the library rely on? >> >> [1] https://gcc.gnu.org/projects/cxx-status.html >>
Re: Should Arrow adopt C++14 / 17?
On 2019/10/04 16:53:59, Antoine Pitrou wrote: > > On 04/10/2019 at 18:05, Zhuo Peng wrote: > > Dear Arrow maintainers, > > > > Sorry if this was raised before. I did search the mailing list but "C++" > > matched too many results.. > > > > With manylinux1 (GCC4.8) being sunset, both Conda and Pypa are providing a > > modern enough toolchain (Conda Forge - GCC7; Pypa manylinux2010 docker - > > devtoolset-8(GCC8)). And full C++17 support has been included in GCC7 [1]. > > I wonder what are the concerns of adopting a newer standard? > > > > C++14 might not bring a whole lot of interesting features, but C++17 brings: > > > > std::string_view > > std::optional > > std::variant (the newly added Result class is based on some form of variant > > implementation I suppose?) > > We already have `string_view` and `variant` backports. We could > reasonably add an `optional` backport. > Backports are cool for internal use, but probably not so if a public API accepts them? (Because you vendor the headers in (i.e. namespace and symbol names unchanged), they might clash with headers that a client uses.) > > And btw, was -std=gnu++11 an intentional choice? what gnu extensions does > > the library rely on? > > None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 added? https://github.com/apache/arrow/blob/3129e3ed90219ecfffe2a25ce5820eec8cc947d0/cpp/cmake_modules/SetupCxxFlags.cmake#L33 https://cmake.org/cmake/help/v3.1/prop_tgt/CXX_STANDARD.html > > Regards > > Antoine. >
Re: Should Arrow adopt C++14 / 17?
On 04/10/2019 at 18:05, Zhuo Peng wrote: > Dear Arrow maintainers, > > Sorry if this was raised before. I did search the mailing list but "C++" > matched too many results.. > > With manylinux1 (GCC4.8) being sunset, both Conda and Pypa are providing a > modern enough toolchain (Conda Forge - GCC7; Pypa manylinux2010 docker - > devtoolset-8(GCC8)). And full C++17 support has been included in GCC7 [1]. I > wonder what are the concerns of adopting a newer standard? > > C++14 might not bring a whole lot of interesting features, but C++17 brings: > > std::string_view > std::optional > std::variant (the newly added Result class is based on some form of variant > implementation I suppose?) We already have `string_view` and `variant` backports. We could reasonably add an `optional` backport. > And btw, was -std=gnu++11 an intentional choice? what gnu extensions does the > library rely on? None, AFAIK. Arrow compiles on MSVC fine. Where is -std=gnu++11 added? Regards Antoine.
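On the `-std=gnu++11` question: when a target uses CMake's `CXX_STANDARD`, it is the `CXX_EXTENSIONS` target property (which CMake turns on by default for GCC) that selects `-std=gnu++NN` over `-std=c++NN`. A quick way to check which dialect a translation unit actually received is the `__STRICT_ANSI__` macro, which GCC and Clang define under a strict `-std=c++NN` flag but not under `-std=gnu++NN`. The helper name `DialectName` is hypothetical.

```cpp
#include <string>

// Reports which C++ dialect family this translation unit was compiled with.
// GCC and Clang define __STRICT_ANSI__ for -std=c++NN (ISO mode) but not for
// -std=gnu++NN (GNU extensions enabled). MSVC has no GNU mode at all.
inline std::string DialectName() {
#if defined(_MSC_VER)
  return "msvc";
#elif defined(__STRICT_ANSI__)
  return "iso";
#else
  return "gnu";
#endif
}
```

Compiling the same file with `-std=c++11` and with `-std=gnu++11` should yield different results, which makes it easy to confirm that turning `CXX_EXTENSIONS` off does not break code that silently relied on GNU extensions.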
Re: Should Arrow adopt C++14 / 17?
We do have to care about more than just conda and Python. For R, for example, C++14 support is limited: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-C_002b_002b14-code That's not to say that we can't try (if C++ devs wanted to, and I can't speak for them), but it might be time-consuming (or worse) to figure out what features are supported on all of the platforms and languages that Arrow C++ intends to work for. That said, hopefully our CI coverage is sufficient to let someone try it out and see what breaks. Neal On Fri, Oct 4, 2019 at 9:05 AM Zhuo Peng wrote: > > Dear Arrow maintainers, > > Sorry if this was raised before. I did search the mailing list but "C++" > matched too many results.. > > With manylinux1 (GCC4.8) being sunset, both Conda and Pypa are providing a > modern enough toolchain (Conda Forge - GCC7; Pypa manylinux2010 docker - > devtoolset-8(GCC8)). And full C++17 support has been included in GCC7 [1]. I > wonder what are the concerns of adopting a newer standard? > > C++14 might not bring a whole lot of interesting features, but C++17 brings: > > std::string_view > std::optional > std::variant (the newly added Result class is based on some form of variant > implementation I suppose?) > > and many syntax sugar.. (like emplace_back() returning back(), so you can do > RETURN_NOT_OK(CreateArray(my_array_sp_vector.emplace_back( > > And btw, was -std=gnu++11 an intentional choice? what gnu extensions does the > library rely on? > > [1] https://gcc.gnu.org/projects/cxx-status.html >
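One way to find out what each toolchain really provides, rather than relying on compiler support tables, is to compile a small probe in every CI job. The sketch below assumes the standard feature-test macro `__cpp_lib_optional` and MSVC's `_MSVC_LANG` (MSVC only reports the real language level in `__cplusplus` when built with `/Zc:__cplusplus`). The `MYLIB_CPLUSPLUS` macro, `FeatureReport` struct and `Probe` function are hypothetical names.

```cpp
// Effective language level; MSVC exposes the real value in _MSVC_LANG.
#if defined(_MSVC_LANG)
#define MYLIB_CPLUSPLUS _MSVC_LANG
#else
#define MYLIB_CPLUSPLUS __cplusplus
#endif

// Only include <optional> where the standard says it exists; including it
// it defines the feature-test macro __cpp_lib_optional on conforming libraries.
#if MYLIB_CPLUSPLUS >= 201703L
#include <optional>
#endif

// Compile-time facts about this toolchain, surfaced at runtime for CI logs.
struct FeatureReport {
  long language_level;
  bool has_std_optional;
};

inline FeatureReport Probe() {
  FeatureReport r{static_cast<long>(MYLIB_CPLUSPLUS), false};
#if defined(__cpp_lib_optional)
  r.has_std_optional = true;
#endif
  return r;
}
```

Printing `Probe()` from each CI configuration would give a concrete per-platform answer to "which features are supported", instead of an estimate.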
[jira] [Created] (ARROW-6792) [R] Explore roxygen2 R6 class documentation
Neal Richardson created ARROW-6792: -- Summary: [R] Explore roxygen2 R6 class documentation Key: ARROW-6792 URL: https://issues.apache.org/jira/browse/ARROW-6792 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Fix For: 1.0.0 roxygen2 version 7.0 adds support for documenting R6 classes, rather than the ad hoc approach we've had to take without it: [https://github.com/r-lib/roxygen2/blob/master/vignettes/rd.Rmd#L203] Try it out and see how we like it, and consider refactoring the docs to use it everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Should Arrow adopt C++14 / 17?
Dear Arrow maintainers, Sorry if this was raised before. I did search the mailing list, but "C++" matched too many results. With manylinux1 (GCC 4.8) being sunset, both Conda and PyPA are providing a modern enough toolchain (Conda Forge: GCC 7; PyPA manylinux2010 docker: devtoolset-8 (GCC 8)). And full C++17 support has been included in GCC 7 [1]. I wonder what the concerns are about adopting a newer standard? C++14 might not bring a whole lot of interesting features, but C++17 brings: std::string_view, std::optional, std::variant (the newly added Result class is based on some form of variant implementation, I suppose?) and a lot of syntactic sugar (like emplace_back() returning back(), so you can do RETURN_NOT_OK(CreateArray(my_array_sp_vector.emplace_back( And btw, was -std=gnu++11 an intentional choice? What GNU extensions does the library rely on? [1] https://gcc.gnu.org/projects/cxx-status.html
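For readers unfamiliar with the `emplace_back()` point: since C++17, `std::vector::emplace_back` returns a reference to the inserted element instead of `void`, so an out-parameter slot can be appended and filled in one expression. The sketch below mocks `Status`, `RETURN_NOT_OK`, `Array`, `CreateArray` and `BuildAll` as hypothetical stand-ins that only mirror the out-parameter style; they are not Arrow's real declarations. It requires `-std=c++17`.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a Status type; not Arrow's real class.
struct Status {
  bool ok_ = true;
  static Status OK() { return Status(); }
  static Status Invalid() {
    Status s;
    s.ok_ = false;
    return s;
  }
  bool ok() const { return ok_; }
};

// Hypothetical early-return macro in the usual style.
#define RETURN_NOT_OK(expr)                             \
  do {                                                  \
    Status status_internal = (expr);                    \
    if (!status_internal.ok()) return status_internal;  \
  } while (false)

struct Array {
  int length = 0;
};

// Hypothetical factory that fills an out-parameter slot.
Status CreateArray(int length, std::shared_ptr<Array>* out) {
  if (length < 0) return Status::Invalid();
  *out = std::make_shared<Array>();
  (*out)->length = length;
  return Status::OK();
}

Status BuildAll(const std::vector<int>& lengths,
                std::vector<std::shared_ptr<Array>>* out) {
  for (int len : lengths) {
    // C++17: emplace_back() returns a reference to the new (empty) element,
    // so its address can be passed straight to the factory.
    RETURN_NOT_OK(CreateArray(len, &out->emplace_back()));
  }
  return Status::OK();
}
```

Note that a failed call leaves an empty slot at the end of the vector (here a null `shared_ptr`); real code may want to pop it before returning.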
Re: [VOTE] Release Apache Arrow 0.15.0 - RC2
We have 5 binding +1 votes and 2 non-binding +1 votes so far. The 72 hours have passed, so we can close the release vote. Sadly I won't be available for the rest of the day, so I will only be able to close the vote and start working on the post-release tasks from tomorrow. @Wes if you have bandwidth, feel free to close the vote sooner. On Thu, Oct 3, 2019 at 1:14 AM Bryan Cutler wrote: > Accidentally sent too soon. The ORC build error I got was probably just an > env issue for me, but here it is in case anyone else had the same issue: > > In file included from > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep/c++/src/wrap/orc-proto-wrapper.cc:44:0: > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc:970 > :13: > error: > ‘dynamic_init_dummy_orc_5fproto_2eproto’ > defined but not used [-Werror=unused-variable] > static bool dynamic_init_dummy_orc_5fproto_2eproto = []() { > AddDescriptors_orc_5fproto_2eproto(); return true; }(); > ^ > cc1plus: all warnings being treated as errors > make[5]: *** [c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o] Error > 1 > make[5]: *** Waiting for unfinished jobs > make[4]: *** [c++/src/CMakeFiles/orc.dir/all] Error 2 > make[3]: *** [all] Error 2 > > [ 29%] Performing build step for 'orc_ep' > CMake Error at > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-RELEASE.cmake:16 > (message): > Command failed: 2 > >'make' > > See also > > > > /tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-*.log > > CMakeFiles/orc_ep.dir/build.make:111: recipe for target > 'orc_ep-prefix/src/orc_ep-stamp/orc_ep-build' failed > make[2]: *** [orc_ep-prefix/src/orc_ep-stamp/orc_ep-build] Error 1 > CMakeFiles/Makefile2:1248: recipe for target
'CMakeFiles/orc_ep.dir/all' > failed > make[1]: *** [CMakeFiles/orc_ep.dir/all] Error 2 > > On Wed, Oct 2, 2019 at 4:12 PM Bryan Cutler wrote: > > > +1 (non-binding) > > > > I ran the following on Ubuntu 16.04 4.15.0-64-generic: > > > dev/release/verify-release-candidate.sh binaries 0.15.0 2 > > > ARROW_CUDA=OFF \ > > TEST_DEFAULT=0 \ > > TEST_SOURCE=1 \ > > TEST_CPP=1 \ > > TEST_PYTHON=1 \ > > TEST_JAVA=1 \ > > TEST_INTEGRATION=1 \ > > dev/release/verify-release-candidate.sh source 0.15.0 2 > > > > For source verification I set INTEGRATION_TEST_ARGS="--enable-js=0 > > --enable-go=0" > > > > When attempting source verification with defaults, I got the below error > > when building the ORC adapter. It looks like just a warning that is being > > treated as error and seems to be only in > > > > On Wed, Oct 2, 2019 at 7:53 AM Andy Grove wrote: > > > >> +1 (binding) > >> > >> On Mon, Sep 30, 2019 at 11:57 PM Krisztián Szűcs < > >> szucs.kriszt...@gmail.com> > >> wrote: > >> > >> > Hi, > >> > > >> > I would like to propose the following release candidate (RC2) of > Apache > >> > Arrow version 0.15.0. This is a release consisting of 697 > >> > resolved JIRA issues[1]. > >> > > >> > This release candidate is based on commit: > >> > 40d468e162e88e1761b1e80b3ead060f0be927ee [2] > >> > > >> > The source release rc2 is hosted at [3]. > >> > The binary artifacts are hosted at [4][5][6][7]. > >> > The changelog is located at [8]. > >> > > >> > Please download, verify checksums and signatures, run the unit tests, > >> > and vote on the release. See [9] for how to validate a release > >> candidate. > >> > > >> > The vote will be open for at least 72 hours. > >> > > >> > [ ] +1 Release this as Apache Arrow 0.15.0 > >> > [ ] +0 > >> > [ ] -1 Do not release this as Apache Arrow 0.15.0 because...
> >> > > >> > [1]: > >> > > >> > > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.0 > >> > [2]: > >> > > >> > > >> > https://github.com/apache/arrow/tree/40d468e162e88e1761b1e80b3ead060f0be927ee > >> > [3]: > >> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.0-rc2 > >> > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.0-rc2 > >> > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.0-rc2 > >> > [6]: https://bintray.com/apache/arrow/python-rc/0.15.0-rc2 > >> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.0-rc2 > >> > [8]: > >> > > >> > > >> > https://github.com/apache/arrow/blob/40d468e162e88e1761b1e80b3ead060f0be927ee/CHANGELOG.md > >> > [9]: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates > >> > > >> > > >
Re: [DISCUSS][Java] Design of the algorithm module
Hi Micah, I agree with 1. I think that, as end users, what they would really want is a query/data processing engine. I am not sure how easy to use or relevant the algorithms will be in the absence of an engine. For example, most of these operators would need to be pipelined and to handle memory management, distribution, etc. So bundling this along with an engine makes a lot more sense, and the interfaces required might be a bit different for that too. Thx. On Thu, Oct 3, 2019 at 10:27 AM Micah Kornfield wrote: > Hi Liya Fan, > Thanks again for writing this up. I think it provides a road-map for > intended features. I commented on the document but I wanted to raise a few > high-level concerns here as well to get more feedback from the community. > > 1. It isn't clear to me who the users of this will be. My perception > is that in the Java ecosystem there aren't use-cases for the algorithms > outside of specific compute engines. I'm not super involved in open-source > Java these days so I would love to hear others' opinions. For instance, I'm > not sure if Dremio would switch to using these algorithms instead of the > ones they've already open-sourced [1] and Apache Spark I believe is only > using Arrow for interfacing with Python (they similarly have their own > compute pipeline). I think you mentioned in the past that these are being > used internally on an engine that your company is working on, but if that > is the only consumer it makes me wonder if the algorithm development might > be better served as part of that engine. > > 2. If we do move forward with this, we also need a plan for how to > optimize the algorithms to avoid virtual calls. There are two high-level > approaches: template-based and (byte)code-generation-based. Neither is > applicable in all situations, but it would be good to come to a consensus on when > (and when not) to use each.
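The thread is about Java, where specialization would come from source templates or (byte)code generation, but the trade-off behind point 2 can be seen in a small language-neutral example. The C++ sketch below contrasts an insertion sort that calls its comparator through a virtual interface with one that takes the comparator as a template parameter, which typically lets the compiler inline the comparison. Insertion sort and the `Comparator`/`Ascending` names are illustrative choices, not anything proposed in the design document.

```cpp
#include <cstddef>
#include <vector>

// Approach 1: runtime polymorphism. Each comparison is a virtual call.
struct Comparator {
  virtual ~Comparator() = default;
  virtual bool Less(int a, int b) const = 0;
};

struct Ascending : Comparator {
  bool Less(int a, int b) const override { return a < b; }
};

void InsertionSortVirtual(std::vector<int>& v, const Comparator& cmp) {
  for (std::size_t i = 1; i < v.size(); ++i) {
    int key = v[i];
    std::size_t j = i;
    while (j > 0 && cmp.Less(key, v[j - 1])) {
      v[j] = v[j - 1];  // shift larger elements one slot to the right
      --j;
    }
    v[j] = key;
  }
}

// Approach 2: compile-time specialization. The comparator's type is known
// when the template is instantiated, so no virtual dispatch is involved.
template <typename LessFn>
void InsertionSortTemplate(std::vector<int>& v, LessFn less) {
  for (std::size_t i = 1; i < v.size(); ++i) {
    int key = v[i];
    std::size_t j = i;
    while (j > 0 && less(key, v[j - 1])) {
      v[j] = v[j - 1];
      --j;
    }
    v[j] = key;
  }
}
```

The cost of the template approach is one compiled copy per comparator type, roughly the same code-size trade-off that generated Java classes would face.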
> > Thanks, > Micah > > [1] > > https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/sabot/op/sort/external > > On Tue, Sep 24, 2019 at 6:48 AM Fan Liya wrote: > > > Hi Micah, > > > > Thanks for your effort and precious time. > > Looking forward to receiving more valuable feedback from you. > > > > Best, > > Liya Fan > > > > On Tue, Sep 24, 2019 at 2:12 PM Micah Kornfield > > wrote: > > > >> Hi Liya Fan, > >> I started reviewing but haven't gotten all the way through it. I will > try > >> to leave more comments over the next few days. > >> > >> Thanks again for the write-up I think it will help frame a productive > >> conversation. > >> > >> -Micah > >> > >> On Tue, Sep 17, 2019 at 1:47 AM Fan Liya wrote: > >> > >>> Hi Micah, > >>> > >>> Thanks for your kind reminder. Comments are enabled now. > >>> > >>> Best, > >>> Liya Fan > >>> > >>> On Tue, Sep 17, 2019 at 12:45 PM Micah Kornfield < > emkornfi...@gmail.com> > >>> wrote: > >>> > Hi Liya Fan, > Thank you for this writeup, it doesn't look like comments are enabled > on > the document. Could you allow for them? > > Thanks, > Micah > > On Sat, Sep 14, 2019 at 6:57 AM Fan Liya > wrote: > > > Dear all, > > > > We have prepared a document for discussing the requirements, design > and > > implementation issues for the algorithm module of Java: > > > > > > > > https://docs.google.com/document/d/17nqHWS7gs0vARfeDAcUEbhKMOYHnCtA46TOY_Nls69s/edit?usp=sharing > > > > So far, we have finished the initial draft for sort, search and > dictionary > > encoding algorithms. Discussions for more algorithms may be added in > the > > future. This document will keep evolving to reflect the latest > discussion > > results in the community and the latest code changes. > > > > Please give your valuable feedback. > > > > Best, > > Liya Fan > > > > >>> >
[jira] [Created] (ARROW-6791) Memory Leak
George Prichard created ARROW-6791: -- Summary: Memory Leak Key: ARROW-6791 URL: https://issues.apache.org/jira/browse/ARROW-6791 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.14.1, 0.14.0 Environment: Ubuntu 18.04, 32GB RAM, conda-forge installation Reporter: George Prichard Memory leak with large string columns crashes the program. This only seems to affect 0.14.x - it works fine for me in 0.13.0. It might be related to earlier similar issues? e.g. [https://github.com/apache/arrow/issues/2624] Below is a reprex which works in earlier versions, but crashes on read (writing is fine) in this one. The real-life version of the data is full of URLs as the strings. Weirdly it crashes my 32GB Ubuntu 18.04, but runs (if very slowly for the read) on my 16GB MacBook. Thanks so much for the excellent tools!
{code:python}
import pandas as pd

n_rows = int(1e6)
n_cols = 10
col_length = 100

df = pd.DataFrame()
for i in range(n_cols):
    df[f'col_{i}'] = pd.util.testing.rands_array(col_length, n_rows)
print('Generated df', df.shape)

filename = 'tmp.parquet'
print('Writing parquet')
df.to_parquet(filename)
print('Reading parquet')
pd.read_parquet(filename)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)