[DISCUSS][Java] Builders for java classes
As part of a PR Ji Liu has made to help populate data for test cases [1], the question came up of whether we should provide more builder classes in Java for ValueVectors. The proposed implementation would wrap the existing Writer classes. Do people think this would be a valuable addition to the Java library? I imagine it would be one builder per ValueVector type.

The main benefit I see is that it would make the library slightly easier to use for newcomers, though it might not be the most efficient way to populate vectors. A straw-man interface is listed below.

Thoughts?

Thanks,
Micah

class IntVectorBuilder {
  public IntVectorBuilder(BufferAllocator allocator);
  IntVectorBuilder add(int value);
  IntVectorBuilder addAll(int[] values);
  IntVectorBuilder addNull();
  // handles null values in array
  IntVectorBuilder addAll(Integer... values);
  IntVectorBuilder addAll(List<Integer> values);
  IntVector build(String name);
}
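To make the call pattern of the straw-man concrete, here is a minimal sketch of the fluent builder shape. It is not the proposed implementation: IntColumnBuilder is a hypothetical stand-in that collects values into a plain java.util.List (with null entries marking null slots) so that it runs without Arrow on the classpath. A real IntVectorBuilder would presumably take a BufferAllocator and write into an IntVector in build(String name) instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the proposed IntVectorBuilder; not part of Arrow. */
final class IntColumnBuilder {
    // A null entry represents a null slot in the column.
    private final List<Integer> values = new ArrayList<>();

    IntColumnBuilder add(int value) {
        values.add(value);
        return this;
    }

    IntColumnBuilder addNull() {
        values.add(null);
        return this;
    }

    IntColumnBuilder addAll(int[] vs) {
        for (int v : vs) {
            values.add(v);
        }
        return this;
    }

    // Boxed varargs overload: individual elements may be null.
    IntColumnBuilder addAll(Integer... vs) {
        values.addAll(Arrays.asList(vs));
        return this;
    }

    IntColumnBuilder addAll(List<Integer> vs) {
        values.addAll(vs);
        return this;
    }

    // Assumption: returns an immutable snapshot rather than an Arrow IntVector.
    List<Integer> build() {
        return Collections.unmodifiableList(new ArrayList<>(values));
    }

    public static void main(String[] args) {
        List<Integer> column = new IntColumnBuilder()
            .add(1)
            .addNull()
            .addAll(new int[] {2, 3})
            .addAll(4, null)
            .build();
        System.out.println(column);
    }
}
```

One design point the sketch surfaces: with both an int[] and an Integer... overload, a bare addAll(null) call would be ambiguous, so callers would need addNull() for a single null.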
[jira] [Created] (ARROW-6983) [C++] Threaded task group crashes sometimes
Neal Richardson created ARROW-6983: -- Summary: [C++] Threaded task group crashes sometimes Key: ARROW-6983 URL: https://issues.apache.org/jira/browse/ARROW-6983 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Neal Richardson Assignee: Antoine Pitrou Fix For: 0.15.1 You can give this a more descriptive title :) See discussion on ARROW-6977. https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding
Hello,

As discussed on [1], I've proposed a PR [2] that clarifies the following:

1. It is not required that all dictionary batches occur at the beginning of the IPC stream format (if the first record batch has an all-null dictionary-encoded column, the null column's dictionary might not be sent until later in the stream).
2. A second dictionary batch for the same ID that is not a "delta batch" in an IPC stream indicates the dictionary should be replaced.
3. The file format can only contain 1 "NON-delta" dictionary batch and multiple "delta" dictionary batches.
4. An enum is added to dictionary metadata for possible future changes in what format dictionary batches can be sent (the most likely would be an array Map). An enum is needed as a placeholder to allow for forward compatibility past the release 1.0.0.

If accepted, there will be work in all implementations to make sure that they cover the edge cases highlighted, and additional integration testing will be needed.

Please vote whether to accept these additions. The vote will be open for at least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah

[1] https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585
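The stream-side behaviour in points 1 and 2 of the proposal above can be illustrated with a small reader-side sketch. This is only an illustration under stated assumptions: DictionaryTracker, onDictionaryBatch, lookup and size are hypothetical names, dictionaries are modelled as lists of strings, and treating a delta for an ID that has not been seen yet as starting from an empty dictionary is a choice of this sketch, not something the proposal states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader-side dictionary state; names are illustrative, not Arrow API. */
final class DictionaryTracker {
    private final Map<Long, List<String>> dictionaries = new HashMap<>();

    /** Apply one dictionary batch: a delta appends, a non-delta replaces. */
    void onDictionaryBatch(long id, List<String> values, boolean isDelta) {
        if (isDelta) {
            // Assumption: a delta for an unseen ID starts from an empty dictionary.
            dictionaries.computeIfAbsent(id, k -> new ArrayList<>()).addAll(values);
        } else {
            dictionaries.put(id, new ArrayList<>(values));
        }
    }

    /** Resolve an index; fails if no dictionary has arrived yet for this ID. */
    String lookup(long id, int index) {
        List<String> dict = dictionaries.get(id);
        if (dict == null) {
            throw new IllegalStateException("no dictionary received yet for id " + id);
        }
        return dict.get(index);
    }

    /** Number of entries currently known for this ID (0 if none received). */
    int size(long id) {
        List<String> dict = dictionaries.get(id);
        return dict == null ? 0 : dict.size();
    }
}
```

Because state is only consulted on lookup, a record batch whose dictionary-encoded column is entirely null can be read before its dictionary batch arrives, which is the case point 1 describes.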
Re: [DISCUSS] Result vs Status
OK, it sounds like people want Result (at least in some circumstances). Any thoughts on migrating old APIs and what to do for new APIs going forward?

A very rough approximation [1] yields the following counts by module:

853 arrow
17 gandiva
25 parquet
50 plasma

[1] grep -r Status cpp/src/* |grep ".h:" | grep "\\*" |grep -v Accept |sed s/:.*// | cut -f3 -d/ |sort

Thanks,
Micah

On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:

> As mentioned, Result is an improvement for function which returns a
> single value, e.g. Make/Factory-like. My vote goes Result for such
> case. For multiple return types, we have std::tuple like Antoine
> proposed.
>
> François
>
> On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou wrote:
> >
> >
> > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
> > > I'm definitely uncomfortable with the idea of deprecating Status.
> > >
> > > We have a few kinds of functions that can fail:
> > >
> > > 1. Functions with no "out" arguments
> > > 2. Functions with one out argument
> > > 3. Functions with multiple out arguments
> > >
> > > IMHO functions in category 2 are the best candidates for utilizing
> > > Status. In some cases, Case 3 may be more usable Result-based, but it
> > > can also create more work (or confusion) on the part of the developer,
> > > either
> > >
> > > * The T in Result has to be a struct-like value that transports
> > > multiple pieces of data
> >
> > The T can be a std::tuple though, so you need not necessarily define a
> > dedicated struct type for a single API's return value.
> >
> > > Can't say I'm thrilled about having Result or similar for Case
> > > 1-type functions (if I'm understanding what would be the solution
> > > there).
> >
> > Agreed.
> >
> > Regards
> >
> > Antoine.
>
Re: [DISCUSS][Java] Design of the algorithm module
> To save the effort, or invest it to higher priority issues, we plan to:
> 1. We will stop providing "additional algorithms", unless they are
> explictly required.

This sounds reasonable; we can also evaluate on a case-by-case basis how widely applicable some are.

> 2. For existing addition algorithms in our code base, we will stop
> improving them.

OK, I'm a little afraid of bit-rot here, but we can see how things go.

Cheers,
Micah

On Tue, Oct 22, 2019 at 7:09 PM Fan Liya wrote:

> Hi Micah,
>
> Thank you for reading through my previous email.
>
> > Is the conversation about rejecting the changes in Flink something you
> can link to? I found [1] which seems to allow for Arrow, in what seem like
> reasonable places, just not inside the core planner (and even that is a
> possibility with a proper PoC). However, I don't think the algorithms
> proposed here are directly related to those discussions.
>
> There is a short discussion [1] in the ML. Please note that our proposal
> is not officially "rejected". It is just ignored silently (in fact, this
> makes no difference to us). We have had some conferences/discussions with
> the Flink commiters and founders, it seems they like ideas, but no progress
> has been made so far, because the change is too large and too risky. The
> other issue you have indicated [2] represents another (earlier) attempt to
> incorporate Arrow to Flink. However, that issue has no progress either.
>
> > I don't agree with this conclusion. Apache Drill, where most of the
> Java code came from has been around for longer period of time. Also, even
> without Arrow being around, columnar vs row based DB engines, is design
> decision that has nothing to do with existing open source projects. Does
> Flink use another open source library for its row representation?
>
> I think you mean that, row vs. columnar representations and open source
> project selection are two independent issues. I agree with you.
> Flink has its own implementation for row store, although I think they
> should use Arrow directly (if it were available earlier), as columnar store
> is the mainstream.
>
> > I think this circles back around to my original points:
> > 1. Which users are we expecting to use the algorithms package that
> aren't directly related to data transport in Java (i.e. additional
> algorithms)? In many cases the algorithms seem like they would be query
> engine specific. I haven't seen much evidence that there are users of the
> Java code base that need all these algorithms.
> > 2. Contributions to any project consume resources and peoples' time.
> If there is only going to be one user of the code it might not belong in
> Arrow "proper" due to these hurdles.
>
> I agree with you that contributing code consumes lots of effort, and we
> should only provide general algorithms.
>
> To save the effort, or invest it to higher priority issues, we plan to:
> 1. We will stop providing "additional algorithms", unless they are
> explictly required.
> 2. For existing addition algorithms in our code base, we will stop
> improving them.
>
> Thanks again for your effort in reviewing algorithms and all the good
> review comments.
>
> Best,
> Liya Fan
>
>
> [1] http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/browser
> [2] https://issues.apache.org/jira/browse/FLINK-10929
>
> On Sun, Oct 20, 2019 at 12:05 PM Micah Kornfield
> wrote:
>
>> Hi Liya Fan,
>> Is the conversation about rejecting the changes in Flink something you
>> can link to? I found [1] which seems to allow for Arrow, in what seem like
>> reasonable places, just not inside the core planner (and even that is a
>> possibility with a proper PoC). However, I don't think the algorithms
>> proposed here are directly related to those discussions.
>>
>>> I think the lesson learned is that, we should provide some features
>>> proactively (at least the general features), and make them good enough.
>>> Apache Flink was started around 2015, and Arrow's Java project was started
>>> in 2016. If Arrow were made available earlier, maybe Flink would have
>>> chosen it in the first place.
>>
>>
>> I don't agree with this conclusion. Apache Drill, where most of the Java
>> code came from has been around for longer period of time. Also, even
>> without Arrow being around, columnar vs row based DB engines, is design
>> decision that has nothing to do with existing open source projects. Does
>> Flink use another open source library for its row representation?
>>
>>> When a users needs a algorithm, it may be already too late. AFAIK, most
>>> users will choose to implement one by themselves, rather than openning a
>>> JIRA in the community. It takes a long time to provide a PR, review the
>>> code, merge the code, and wait for the next release.
>>
>>
>> I think this circles back around to my original points:
>> 1. Which users are we expecting to use the algorithms package that
>> aren't directly related to data transport in Java (i.e.
Re: [C++] The quest for zero-dependency builds
I'll add that I don't think we will actually be switching anytime soon. Bazel does have some advantages, at least over our current CMake system, in terms of developer productivity (users can target smaller components with unit tests, which avoids re-linking). I've started on a prototype and hope to have something to share in the next few days, so we can evaluate whether it is reasonable to have the two live side-by-side in the short term.

On Wed, Oct 23, 2019 at 4:11 PM Wes McKinney wrote:

> On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn wrote:
> >
> > Dev's
> >
> > I would request to be as conservative as possible in choosing (keeping) a build system.
> >
> > For developers, packagers and even end-users for some languages the build system is just
> > another dependency. Even if cmake is not ideal, it has become quite ubiquitous which is a huge plus.
> >
> > Maybe it is possible to come up with a way of expressing the dependency relations in cmake in
> > a way that makes maintaining them easier. Otherwise it is maybe possible to generate them from
> > a (simple) description file?
>
> There do seem to be parts of our CMake build system that contain
> boilerplate (particularly some of the platform-specific export
> defines) that might be better auto-generated in some way, so this is
> something it would be worth looking more at.
>
> FWIW, some Google projects I have seen offer CMake as a build option
> but the CMake files are mostly auto-generated from another build
> configuration.
>
> >
> > Cheers,
> > Maarten.
> >
> >
> > > On Oct 19, 2019, at 11:22 PM, Micah Kornfield wrote:
> > >
> > >>
> > >> Perhaps meson is also worth exploring?
> > >
> > >
> > > It could be, if someone else wants to take a look we can, compare what
> > > things look at in each. Recently, Bazel build rules seem like they would be
> > > useful for some work projects I've been dealing with, so I plan on focusing
> > > my exploration there.
> > > > > > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou > wrote: > > > > > >> > > >> Perhaps meson is also worth exploring? > > >> > > >> > > >> Le 15/10/2019 à 23:06, Micah Kornfield a écrit : > > >>> Hi Wes, > > >>> I agree on both accounts that it won't be a done in the short term, > and > > >> it > > >>> makes sense to tackle in incrementally. Like I said I don't have > much > > >>> bandwidth at the moment but might be able to re-arrange a few things > on > > >> my > > >>> plate. I think some people have asked on the mailing list how they > might > > >>> be able to help, this might be one area that doesn't require a lot of > > >>> in-depth knowledge of C++ at least for a proof of concept. I'll try > to > > >>> open up some JIRAs soon. > > >>> > > >>> Thanks, > > >>> Micah > > >>> > > >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney > > >> wrote: > > >>> > > hi Micah, > > > > Definitely Bazel is worth exploring, but we must be realistic about > > the amount of energy (several hundred hours or more) that's been > > invested in the build system we have now. So a new build system will > > be a large endeavor, but hopefully can make things simpler. > > > > Aside from the requirements gathering process, if it is felt that > > Bazel is a possible path forward in the future, it may be good to > try > > to break up the work into more tractable pieces. For example, a > first > > step would be to set up Bazel configurations to build the project's > > thirdparty toolchain. 
Since we're reliant in ExternalProject in > CMake > > to do a lot of heavy lifting there for us, I imagine this (taking > care > > of what ThirdpartyToolchain.cmake does not) will take up a lot of > the > > energy > > > > - Wes > > > > On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield < > emkornfi...@gmail.com> > > wrote: > > > > > >> > > >> > > >> This might be taking the thread on more of a tangent, but maybe we > > should > > > start collecting requirements for the C++ build system in general > and > > >> see > > > if there might be better solution that can address some of these > > concerns? > > > In particular, Bazel at least on the surface seems like it might > be a > > > better fit for some of the use cases discussed here. I know this > is a > > big > > > project (and I currently don't have much bandwidth for it) but I > think > > >> if > > > CMake is lacking in these areas it might be worth at least > exploring > > > instead of going down the path of building our own meta-build > system on > > top > > > of CMake. > > > > > > Requirements that I think we are targeting: > > > 1. Be able to provide an out of box build system that requires as > > >> close > > to > > > zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD > > >> minimal" > > > works on any C++ developers desktop without additional >
Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0
It was happening from time to time, but now it is pretty consistent. I'm working on to fix the deployments by running the crossbow artifact uploading script. On Thu, Oct 24, 2019 at 1:16 AM Wes McKinney wrote: > Any clues why the macOS wheel uploads keep flaking out? > > On Wed, Oct 23, 2019 at 7:56 AM Crossbow wrote: > > > > > > Arrow Build Report for Job nightly-2019-10-23-0 > > > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0 > > > > Failed Tasks: > > - docker-clang-format: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format > > - docker-r-sanitizer: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer > > - wheel-osx-cp36m: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m > > - wheel-osx-cp37m: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m > > > > Succeeded Tasks: > > - centos-6: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6 > > - centos-7: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7 > > - conda-linux-gcc-py27: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27 > > - conda-linux-gcc-py36: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36 > > - conda-linux-gcc-py37: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37 > > - conda-osx-clang-py27: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27 > > - conda-osx-clang-py36: > > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36 > > - conda-osx-clang-py37: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37 > > - conda-win-vs2015-py36: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36 > > - conda-win-vs2015-py37: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37 > > - debian-buster: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster > > - debian-stretch: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch > > - docker-c_glib: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib > > - docker-cpp-cmake32: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32 > > - docker-cpp-release: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release > > - docker-cpp-static-only: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only > > - docker-cpp: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp > > - docker-dask-integration: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration > > - docker-docs: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs > > - docker-go: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go > > - docker-hdfs-integration: > > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration > > - docker-iwyu: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu > > - docker-java: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java > > - docker-js: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js > > - docker-lint: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint > > - docker-pandas-master: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master > > - docker-python-2.7-nopandas: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas > > - docker-python-2.7: > > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7 > > -
Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0
Any clues why the macOS wheel uploads keep flaking out? On Wed, Oct 23, 2019 at 7:56 AM Crossbow wrote: > > > Arrow Build Report for Job nightly-2019-10-23-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0 > > Failed Tasks: > - docker-clang-format: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format > - docker-r-sanitizer: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer > - wheel-osx-cp36m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m > - wheel-osx-cp37m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m > > Succeeded Tasks: > - centos-6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6 > - centos-7: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7 > - conda-linux-gcc-py27: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27 > - conda-linux-gcc-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36 > - conda-linux-gcc-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37 > - conda-osx-clang-py27: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27 > - conda-osx-clang-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36 > - conda-osx-clang-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37 > - conda-win-vs2015-py36: > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36 > - conda-win-vs2015-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37 > - debian-buster: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster > - debian-stretch: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch > - docker-c_glib: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib > - docker-cpp-cmake32: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32 > - docker-cpp-release: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release > - docker-cpp-static-only: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only > - docker-cpp: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp > - docker-dask-integration: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration > - docker-docs: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs > - docker-go: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go > - docker-hdfs-integration: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration > - docker-iwyu: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu > - docker-java: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java > - docker-js: > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js > - docker-lint: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint > - docker-pandas-master: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master > - docker-python-2.7-nopandas: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas > - docker-python-2.7: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7 > - docker-python-3.6-nopandas: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6-nopandas > - docker-python-3.6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6 > - docker-python-3.7: > URL: >
Re: [C++] The quest for zero-dependency builds
On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn wrote: > > Dev's > > I would request to be as conservative as possible in choosing (keeping) a > build system. > > For developers, packagers and even end-users for some languages the build > system is just > another dependency. Even if cmake is not ideal, it has become quite > ubiquitous which is a huge plus. > > Maybe it is possible to come up with a way of expressing the dependency > relations in cmake in > a way that makes maintaining them easier. Otherwise it is maybe possible to > generate them from > a (simple) description file? There do seem to be parts of our CMake build system that contain boilerplate (particularly some of the platform-specific export defines) that might be better auto-generated in some way, so this is something it would be worth looking more at. FWIW, some Google projects I have seen offer CMake as a build option but the CMake files are mostly auto-generated from another build configuration. > > Cheers, > Maarten. > > > > On Oct 19, 2019, at 11:22 PM, Micah Kornfield wrote: > > > >> > >> Perhaps meson is also worth exploring? > > > > > > It could be, if someone else wants to take a look we can, compare what > > things look at in each. Recently, Bazel build rules seem like they would be > > useful for some work projects I've been dealing with, so I plan on focusing > > my exploration there. > > > > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou wrote: > > > >> > >> Perhaps meson is also worth exploring? > >> > >> > >> Le 15/10/2019 à 23:06, Micah Kornfield a écrit : > >>> Hi Wes, > >>> I agree on both accounts that it won't be a done in the short term, and > >> it > >>> makes sense to tackle in incrementally. Like I said I don't have much > >>> bandwidth at the moment but might be able to re-arrange a few things on > >> my > >>> plate. 
I think some people have asked on the mailing list how they might > >>> be able to help, this might be one area that doesn't require a lot of > >>> in-depth knowledge of C++ at least for a proof of concept. I'll try to > >>> open up some JIRAs soon. > >>> > >>> Thanks, > >>> Micah > >>> > >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney > >> wrote: > >>> > hi Micah, > > Definitely Bazel is worth exploring, but we must be realistic about > the amount of energy (several hundred hours or more) that's been > invested in the build system we have now. So a new build system will > be a large endeavor, but hopefully can make things simpler. > > Aside from the requirements gathering process, if it is felt that > Bazel is a possible path forward in the future, it may be good to try > to break up the work into more tractable pieces. For example, a first > step would be to set up Bazel configurations to build the project's > thirdparty toolchain. Since we're reliant in ExternalProject in CMake > to do a lot of heavy lifting there for us, I imagine this (taking care > of what ThirdpartyToolchain.cmake does not) will take up a lot of the > energy > > - Wes > > On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield > wrote: > > > >> > >> > >> This might be taking the thread on more of a tangent, but maybe we > should > > start collecting requirements for the C++ build system in general and > >> see > > if there might be better solution that can address some of these > concerns? > > In particular, Bazel at least on the surface seems like it might be a > > better fit for some of the use cases discussed here. I know this is a > big > > project (and I currently don't have much bandwidth for it) but I think > >> if > > CMake is lacking in these areas it might be worth at least exploring > > instead of going down the path of building our own meta-build system on > top > > of CMake. > > > > Requirements that I think we are targeting: > > 1. 
Be able to provide an out of box build system that requires as > >> close > to > > zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD > >> minimal" > > works on any C++ developers desktop without additional requirements) > > 2. The build system should limit configuration knobs in favor of > >> implied > > dependencies (e.g. "$BUILD python" automatically builds "compute", > > "filesystem", "ipc") > > 3. The build system should be configurable to use (and have the user > > specify) one of "System packages", "Conda packages" or source packages > for > > providing dependencies (and fallback options between the three). > > 4. The build system should be able to treat some dependencies as > optional > > (e.g. different compression libraries or allocators). > > 5. Easily allow developers to limit building unnecessary code for > >> their > > particular task at hand. > > 6. The build system must work across the following
[jira] [Created] (ARROW-6982) [R] Add bindings for compare and boolean kernels
Neal Richardson created ARROW-6982: -- Summary: [R] Add bindings for compare and boolean kernels Key: ARROW-6982 URL: https://issues.apache.org/jira/browse/ARROW-6982 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Assignee: Romain Francois Fix For: 1.0.0 See cpp/src/arrow/compute/kernels/compare.h and boolean.h. ARROW-6980 introduces an Expression class that works on Arrow Arrays, but to evaluate the expressions, it has to pull the data into R first. This would enable us to do the work in C++ and only pull in the result.
[jira] [Created] (ARROW-6981) [R] Implement HDFS file-system interface in R
Neal Richardson created ARROW-6981: -- Summary: [R] Implement HDFS file-system interface in R Key: ARROW-6981 URL: https://issues.apache.org/jira/browse/ARROW-6981 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson
[jira] [Created] (ARROW-6980) [R] dplyr backend for RecordBatch/Table
Neal Richardson created ARROW-6980: -- Summary: [R] dplyr backend for RecordBatch/Table Key: ARROW-6980 URL: https://issues.apache.org/jira/browse/ARROW-6980 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 1.0.0
[jira] [Created] (ARROW-6979) [R] Enable jemalloc in autobrew formula
Neal Richardson created ARROW-6979: -- Summary: [R] Enable jemalloc in autobrew formula Key: ARROW-6979 URL: https://issues.apache.org/jira/browse/ARROW-6979 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Fix For: 1.0.0 See https://github.com/apache/arrow/blob/59a6788c76330cf055bdbcbc7bdae7b0106c6656/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb#L47
[jira] [Created] (ARROW-6978) [R] Add bindings for sum and mean compute kernels
Neal Richardson created ARROW-6978: -- Summary: [R] Add bindings for sum and mean compute kernels Key: ARROW-6978 URL: https://issues.apache.org/jira/browse/ARROW-6978 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Neal Richardson Assignee: Romain Francois Fix For: 1.0.0
[jira] [Created] (ARROW-6975) [C++] Put make_unique in its own header
Antoine Pitrou created ARROW-6975: - Summary: [C++] Put make_unique in its own header Key: ARROW-6975 URL: https://issues.apache.org/jira/browse/ARROW-6975 Project: Apache Arrow Issue Type: Wish Components: C++ Reporter: Antoine Pitrou {{arrow/util/stl.h}} carries other stuff that is almost never necessary.
[jira] [Created] (ARROW-6974) [C++] Implement Cast kernel for time-likes with ArrayDataVisitor pattern
Joris Van den Bossche created ARROW-6974: Summary: [C++] Implement Cast kernel for time-likes with ArrayDataVisitor pattern Key: ARROW-6974 URL: https://issues.apache.org/jira/browse/ARROW-6974 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Joris Van den Bossche Currently, the casting for time-like data is done with the {{ShiftTime}} function. It _might_ be possible to simplify this with ArrayDataVisitor (to avoid looping / checking the bitmap).
[NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0
Arrow Build Report for Job nightly-2019-10-23-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0

Failed Tasks:
- docker-clang-format:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format
- docker-r-sanitizer:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer
- wheel-osx-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch
- docker-c_glib:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib
- docker-cpp-cmake32:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32
- docker-cpp-release:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release
- docker-cpp-static-only:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only
- docker-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp
- docker-dask-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration
- docker-docs:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs
- docker-go:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go
- docker-hdfs-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration
- docker-iwyu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu
- docker-java:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java
- docker-js:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js
- docker-lint:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint
- docker-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master
- docker-python-2.7-nopandas:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas
- docker-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7
- docker-python-3.6-nopandas:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6-nopandas
- docker-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6
- docker-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.7
- docker-r-conda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-conda
- docker-r:
  URL:
[jira] [Created] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit
Artem Alekseev created ARROW-6973: - Summary: [C++][ThreadPool] Use perfect forwarding in Submit Key: ARROW-6973 URL: https://issues.apache.org/jira/browse/ARROW-6973 Project: Apache Arrow Issue Type: Improvement Reporter: Artem Alekseev Assignee: Artem Alekseev
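The issue carries no description, so the following is only a rough sketch of the general technique named in its title, not Arrow's ThreadPool. `ToyPool`, `RunAll`, and `pending` are hypothetical names, and the "pool" here runs queued tasks on the calling thread to keep the example small. The point is the signature of `Submit`: taking `Function&&` and `Args&&...` and passing them on with `std::forward` lets rvalue arguments be moved into the stored task instead of copied.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical toy task queue (not Arrow's ThreadPool): tasks are stored and
// later executed on the calling thread by RunAll().
class ToyPool {
 public:
  // Perfect forwarding: lvalues are copied into the bound task, rvalues are
  // moved, so the caller's value category is preserved.
  template <typename Function, typename... Args>
  void Submit(Function&& func, Args&&... args) {
    tasks_.push(std::bind(std::forward<Function>(func),
                          std::forward<Args>(args)...));
  }

  // Runs and removes every queued task in FIFO order.
  void RunAll() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      assert(task && "queued task must be callable");
      task();
    }
  }

  std::size_t pending() const { return tasks_.size(); }

 private:
  std::queue<std::function<void()>> tasks_;
};
```

A caller might write `pool.Submit(fn, std::ref(result), 5)` and later `pool.RunAll()`. Because the task is stored in a `std::function`, this sketch still requires the callable and its arguments to be copy-constructible; a real pool would need a different task wrapper to accept move-only arguments.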