[DISCUSS][Java] Builders for java classes

2019-10-23 Thread Micah Kornfield
As part of a PR Ji Liu has made to help populate data for test cases [1], the
question came up of whether we should provide more builder classes in
Java for ValueVectors.  The proposed implementation would wrap the existing
Writer classes.

Do people think this would be a valuable addition to the Java library? I
imagine it would be one builder per ValueVector type.  The main benefit I see
is making the library slightly easier to use for newcomers, though it might
not be the most efficient approach.  A straw-man interface is listed below.

Thoughts?

Thanks,
Micah

class IntVectorBuilder {
  public IntVectorBuilder(BufferAllocator allocator);

  IntVectorBuilder add(int value);
  IntVectorBuilder addAll(int[] values);
  IntVectorBuilder addNull();
  // handles null values in array
  IntVectorBuilder addAll(Integer... values);
  IntVectorBuilder addAll(List<Integer> values);
  IntVector build(String name);
}
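
To make the straw-man concrete, here is a minimal sketch of the same fluent shape. It is only an illustration under stated assumptions: IntColumn and IntColumnBuilder are hypothetical stand-ins that use plain Java arrays, not Arrow's IntVector, BufferAllocator, or Writer classes, so that the example runs without the Arrow library. A real builder would presumably take an allocator and delegate to the existing writers instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical result type; a stand-in for Arrow's IntVector, not the real class. */
final class IntColumn {
  final String name;
  private final int[] data;
  private final boolean[] valid; // valid[i] == false marks a null slot

  IntColumn(String name, int[] data, boolean[] valid) {
    this.name = name;
    this.data = data;
    this.valid = valid;
  }

  int size() {
    return data.length;
  }

  boolean isNull(int index) {
    return !valid[index];
  }

  int get(int index) {
    if (!valid[index]) {
      throw new IllegalStateException("slot " + index + " is null");
    }
    return data[index];
  }
}

/** Hypothetical builder that mirrors the method names of the straw-man interface. */
final class IntColumnBuilder {
  // Values are buffered as boxed Integers; a null element marks a null slot.
  private final List<Integer> pending = new ArrayList<>();

  IntColumnBuilder add(int value) {
    pending.add(value);
    return this;
  }

  IntColumnBuilder addNull() {
    pending.add(null);
    return this;
  }

  IntColumnBuilder addAll(int[] values) {
    for (int v : values) {
      pending.add(v);
    }
    return this;
  }

  // Handles null values in the array.
  IntColumnBuilder addAll(Integer... values) {
    pending.addAll(Arrays.asList(values));
    return this;
  }

  IntColumnBuilder addAll(List<Integer> values) {
    pending.addAll(values);
    return this;
  }

  // Copies the buffered values into a data array plus a validity array.
  IntColumn build(String name) {
    int[] data = new int[pending.size()];
    boolean[] valid = new boolean[pending.size()];
    for (int i = 0; i < data.length; i++) {
      Integer v = pending.get(i);
      valid[i] = v != null;
      data[i] = v == null ? 0 : v;
    }
    return new IntColumn(name, data, valid);
  }
}
```

A test could then populate a column in one expression, for example new IntColumnBuilder().add(1).addNull().addAll(4, null, 6).build("ints"). Buffering boxed values is the simple-but-not-efficient trade-off mentioned above.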


[jira] [Created] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6983:
--

 Summary: [C++] Threaded task group crashes sometimes
 Key: ARROW-6983
 URL: https://issues.apache.org/jira/browse/ARROW-6983
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
Assignee: Antoine Pitrou
 Fix For: 0.15.1


You can give this a more descriptive title :)

See discussion on ARROW-6977. 
https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding

2019-10-23 Thread Micah Kornfield
Hello,
As discussed on [1], I've opened a PR [2] that proposes the following
clarifications and changes:

1.  It is not required that all dictionary batches occur at the beginning
of the IPC stream format (if the first record batch has an all-null
dictionary-encoded column, the null column's dictionary might not be sent
until later in the stream).

2.  A second dictionary batch for the same ID that is not a "delta batch"
in an IPC stream indicates the dictionary should be replaced.

3.  The file format can contain only 1 "NON-delta" dictionary batch and
multiple "delta" dictionary batches.

4.  An enum is added to the dictionary metadata for possible future changes
in the format in which dictionary batches can be sent (the most likely
would be an array Map).  An enum is needed as a placeholder to allow for
forward compatibility past release 1.0.0.

If accepted there will be work in all implementations to make sure that
they cover the edge cases highlighted and additional integration testing
will be needed.
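
As an illustration of points 1 and 2 only, the reader-side bookkeeping for a stream might look roughly like the sketch below. This is not Arrow's API: StreamDictionaryState, onDictionaryBatch, and lookup are hypothetical names, dictionaries are modeled as lists of strings, and treating a delta batch for an unknown ID as an error is an assumption of this sketch rather than something stated in the proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-stream dictionary state; not part of the Arrow library. */
final class StreamDictionaryState {
  private final Map<Long, List<String>> dictionaries = new HashMap<>();

  /**
   * Applies one dictionary batch. A delta batch appends to the existing
   * dictionary; a non-delta batch for an ID that was already seen replaces it.
   */
  void onDictionaryBatch(long id, List<String> values, boolean isDelta) {
    if (isDelta) {
      List<String> existing = dictionaries.get(id);
      if (existing == null) {
        // Assumption of this sketch: a delta needs a prior non-delta batch.
        throw new IllegalStateException("delta batch for unknown dictionary id " + id);
      }
      existing.addAll(values);
    } else {
      dictionaries.put(id, new ArrayList<>(values));
    }
  }

  /** Number of entries currently known for the ID; 0 if none was sent yet. */
  int size(long id) {
    List<String> dict = dictionaries.get(id);
    return dict == null ? 0 : dict.size();
  }

  /** Resolves a dictionary index; fails if no dictionary has arrived for the ID. */
  String lookup(long id, int index) {
    List<String> dict = dictionaries.get(id);
    if (dict == null) {
      throw new IllegalStateException("no dictionary received yet for id " + id);
    }
    return dict.get(index);
  }
}
```

Under point 1, a reader built this way would tolerate a record batch arriving before its dictionary as long as the column is all null, since no index ever needs to be resolved through lookup.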

Please vote whether to accept these additions. The vote will be open for at
least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah


[1]
https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585


Re: [DISCUSS] Result vs Status

2019-10-23 Thread Micah Kornfield
OK, it sounds like people want Result (at least in some circumstances).
Any thoughts on migrating old APIs and what to do for new APIs going
forward?

A very rough approximation [1] yields the following counts by module:

 853 arrow
  17 gandiva
  25 parquet
  50 plasma

[1] grep -r Status cpp/src/* | grep ".h:" | grep "\\*" | grep -v Accept |
sed 's/:.*//' | cut -f3 -d/ | sort | uniq -c


Thanks,

Micah



On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> As mentioned, Result is an improvement for functions which return a
> single value, e.g. Make/Factory-like. My vote goes to Result for such
> cases. For multiple return types, we have std::tuple like Antoine
> proposed.
>
> François
>
> On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou  wrote:
> >
> >
> > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
> > > I'm definitely uncomfortable with the idea of deprecating Status.
> > >
> > > We have a few kinds of functions that can fail:
> > >
> > > 1. Functions with no "out" arguments
> > > 2. Functions with one out argument
> > > 3. Functions with multiple out arguments
> > >
> > > IMHO functions in category 2 are the best candidates for utilizing
> > > Status. In some cases, Case 3 may be more usable Result-based, but it
> > > can also create more work (or confusion) on the part of the developer,
> > > either
> > >
> > > * The T in Result<T> has to be a struct-like value that transports
> > > multiple pieces of data
> >
> > The T can be a std::tuple though, so you need not necessarily define a
> > dedicated struct type for a single API's return value.
> >
> >  > Can't say I'm thrilled about having Result or similar for Case
> >  > 1-type functions (if I'm understanding what would be the solution
> >  > there).
> >
> > Agreed.
> >
> > Regards
> >
> > Antoine.
>
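
For readers who want to see the three categories side by side, here is a rough Java analogue of the pattern. The thread itself is about the C++ Status and Result types; the Status, Result, and FallibleFunctions classes below are hypothetical and written only for this illustration, with Map.Entry standing in for std::tuple.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical success-or-error marker, loosely modeled on the idea of Status. */
final class Status {
  private static final Status OK = new Status(null);
  private final String message; // null means success

  private Status(String message) {
    this.message = message;
  }

  static Status ok() {
    return OK;
  }

  static Status invalid(String message) {
    return new Status(message);
  }

  boolean isOk() {
    return message == null;
  }

  String message() {
    return message;
  }
}

/** Hypothetical value-or-error holder, loosely modeled on the idea of Result. */
final class Result<T> {
  private final T value;
  private final Status status;

  private Result(T value, Status status) {
    this.value = value;
    this.status = status;
  }

  static <T> Result<T> of(T value) {
    return new Result<>(value, Status.ok());
  }

  static <T> Result<T> error(Status status) {
    return new Result<>(null, status);
  }

  boolean isOk() {
    return status.isOk();
  }

  Status status() {
    return status;
  }

  T valueOrThrow() {
    if (!isOk()) {
      throw new IllegalStateException(status.message());
    }
    return value;
  }
}

final class FallibleFunctions {
  // Case 1: no "out" argument, so a bare Status is enough.
  static Status checkLength(int length) {
    return length >= 0 ? Status.ok() : Status.invalid("negative length");
  }

  // Case 2: one "out" argument becomes the T of the Result.
  static Result<Integer> parsePositive(String text) {
    try {
      int n = Integer.parseInt(text);
      if (n > 0) {
        return Result.of(n);
      }
      return Result.error(Status.invalid("not positive: " + text));
    } catch (NumberFormatException e) {
      return Result.error(Status.invalid("not a number: " + text));
    }
  }

  // Case 3: several "out" arguments travel together in one pair-like value.
  static Result<Map.Entry<Integer, Integer>> divMod(int a, int b) {
    if (b == 0) {
      return Result.error(Status.invalid("division by zero"));
    }
    Map.Entry<Integer, Integer> pair = new SimpleImmutableEntry<>(a / b, a % b);
    return Result.of(pair);
  }
}
```

The divMod case shows the trade-off raised above: the caller gets a single return value, but has to unpack a generic pair instead of reading named outputs.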


Re: [DISCUSS][Java] Design of the algorithm module

2019-10-23 Thread Micah Kornfield
> To save the effort, or invest it in higher priority issues, we plan to:
> 1. We will stop providing "additional algorithms", unless they are
> explicitly required.

This sounds reasonable; we can also evaluate on a case-by-case basis how
widely applicable some are.

> 2. For existing additional algorithms in our code base, we will stop
> improving them.

OK, I'm a little afraid of bit-rot here, but we can see how things go.

Cheers,
Micah


On Tue, Oct 22, 2019 at 7:09 PM Fan Liya  wrote:

> Hi Micah,
>
> Thank you for reading through my previous email.
>
> > Is the conversation about rejecting the changes in Flink something you
> can link to? I found [1] which seems to allow for Arrow, in what seem like
> reasonable places, just not inside the core planner (and even that is a
> possibility with a proper PoC).  However, I don't think the algorithms
> proposed here are directly related to those discussions.
>
> There is a short discussion [1] in the ML. Please note that our proposal
> is not officially "rejected". It is just ignored silently (in fact, this
> makes no difference to us). We have had some conferences/discussions with
> the Flink committers and founders; it seems they like the ideas, but no progress
> has been made so far, because the change is too large and too risky. The
> other issue you have indicated [2] represents another (earlier) attempt to
> incorporate Arrow into Flink. However, that issue has made no progress either.
>
> > I don't agree with this conclusion.  Apache Drill, where most of the
> Java code came from, has been around for a longer period of time.  Also, even
> without Arrow being around, columnar vs row-based DB engines is a design
> decision that has nothing to do with existing open source projects.  Does
> Flink use another open source library for its row representation?
>
> I think you mean that row vs. columnar representation and open source
> project selection are two independent issues. I agree with you.
> Flink has its own implementation for row store, although I think they
> should use Arrow directly (if it were available earlier), as columnar store
> is the mainstream.
>
> > I think this circles back around to my original points:
> >  1.  Which users are we expecting to use the algorithms package that
> aren't directly related to data transport in Java (i.e. additional
> algorithms)?  In many cases the algorithms seem like they would be query
> engine specific.  I haven't seen much evidence that there are users of the
> Java code base that need all these algorithms.
> >  2.  Contributions to any project consume resources and peoples' time.
> If there is only going to be one user of the code it might not belong in
> Arrow "proper" due to these hurdles.
>
> I agree with you that contributing code consumes lots of effort, and we
> should only provide general algorithms.
>
> To save the effort, or invest it in higher priority issues, we plan to:
> 1. We will stop providing "additional algorithms", unless they are
> explicitly required.
> 2. For existing additional algorithms in our code base, we will stop
> improving them.
>
> Thanks again for your effort in reviewing algorithms and all the good
> review comments.
>
> Best,
> Liya Fan
>
>
> [1] http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/browser
> [2] https://issues.apache.org/jira/browse/FLINK-10929
>
> On Sun, Oct 20, 2019 at 12:05 PM Micah Kornfield 
> wrote:
>
>> Hi Liya Fan,
>> Is the conversation about rejecting the changes in Flink something you
>> can link to? I found [1] which seems to allow for Arrow, in what seem like
>> reasonable places, just not inside the core planner (and even that is a
>> possibility with a proper PoC).  However, I don't think the algorithms
>> proposed here are directly related to those discussions.
>>
>> I think the lesson learned is that, we should provide some features
>>> proactively (at least the general features), and make them good enough.
>>> Apache Flink was started around 2015, and Arrow's Java project was started
>>> in 2016. If Arrow were made available earlier, maybe Flink would have
>>> chosen it in the first place.
>>
>>
>> I don't agree with this conclusion.  Apache Drill, where most of the Java
>> code came from, has been around for a longer period of time.  Also, even
>> without Arrow being around, columnar vs row-based DB engines is a design
>> decision that has nothing to do with existing open source projects.  Does
>> Flink use another open source library for its row representation?
>>
>> When a user needs an algorithm, it may be already too late. AFAIK, most
>>> users will choose to implement one by themselves, rather than opening a
>>> JIRA in the community. It takes a long time to provide a PR, review the
>>> code, merge the code, and wait for the next release.
>>
>>
>> I think this circles back around to my original points:
>>   1.  Which users are we expecting to use the algorithms package that
>> aren't directly related to data transport in Java (i.e. 

Re: [C++] The quest for zero-dependency builds

2019-10-23 Thread Micah Kornfield
I'll add that I don't think we will actually be switching anytime soon.  Bazel
does have some advantages, at least over our current CMake system, in terms
of developer productivity (users can target smaller components with unit
tests, which avoids relinking).  I've started on a prototype and hope to
have something to share in the next few days, so we can evaluate whether it is
reasonable to have the two live side-by-side in the short term.

On Wed, Oct 23, 2019 at 4:11 PM Wes McKinney  wrote:

> On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn 
> wrote:
> >
> > Dev's
> >
> > I would request to be as conservative as possible in choosing (keeping)
> a build system.
> >
> > For developers, packagers and even end-users for some languages the
> build system is just
> > another dependency. Even if cmake is not ideal, it has become quite
> ubiquitous which is a huge plus.
> >
> > Maybe it is possible to come up with a way of expressing the dependency
> relations in cmake in
> > a way that makes maintaining them easier. Otherwise it is maybe possible
> to generate them from
> > a (simple) description file?
>
> There do seem to be parts of our CMake build system that contain
> boilerplate (particularly some of the platform-specific export
> defines) that might be better auto-generated in some way, so this is
> something it would be worth looking more at.
>
> FWIW, some Google projects I have seen offer CMake as a build option
> but the CMake files are mostly auto-generated from another build
> configuration.
>
> >
> > Cheers,
> > Maarten.
> >
> >
> > > On Oct 19, 2019, at 11:22 PM, Micah Kornfield 
> wrote:
> > >
> > >>
> > >> Perhaps meson is also worth exploring?
> > >
> > >
> > > It could be; if someone else wants to take a look, we can compare what
> > > things look like in each. Recently, Bazel build rules seem like they
> would be
> > > useful for some work projects I've been dealing with, so I plan on
> focusing
> > > my exploration there.
> > >
> > > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou 
> wrote:
> > >
> > >>
> > >> Perhaps meson is also worth exploring?
> > >>
> > >>
> > >> Le 15/10/2019 à 23:06, Micah Kornfield a écrit :
> > >>> Hi Wes,
> > >>> I agree on both accounts that it won't be done in the short term,
> and
> > >> it
> > >>> makes sense to tackle it incrementally.  Like I said I don't have
> much
> > >>> bandwidth at the moment but might be able to re-arrange a few things
> on
> > >> my
> > >>> plate.  I think some people have asked on the mailing list how they
> might
> > >>> be able to help, this might be one area that doesn't require a lot of
> > >>> in-depth knowledge of C++ at least for a proof of concept.  I'll try
> to
> > >>> open up some JIRAs soon.
> > >>>
> > >>> Thanks,
> > >>> Micah
> > >>>
> > >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney 
> > >> wrote:
> > >>>
> >  hi Micah,
> > 
> >  Definitely Bazel is worth exploring, but we must be realistic about
> >  the amount of energy (several hundred hours or more) that's been
> >  invested in the build system we have now. So a new build system will
> >  be a large endeavor, but hopefully can make things simpler.
> > 
> >  Aside from the requirements gathering process, if it is felt that
> >  Bazel is a possible path forward in the future, it may be good to
> try
> >  to break up the work into more tractable pieces. For example, a
> first
> >  step would be to set up Bazel configurations to build the project's
> >  thirdparty toolchain. Since we're reliant on ExternalProject in
> CMake
> >  to do a lot of heavy lifting there for us, I imagine this (taking
> care
> >  of what ThirdpartyToolchain.cmake does not) will take up a lot of
> the
> >  energy
> > 
> >  - Wes
> > 
> >  On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <
> emkornfi...@gmail.com>
> >  wrote:
> > >
> > >>
> > >>
> > >> This might be taking the thread on more of a tangent, but maybe we
> >  should
> > > start collecting requirements for the C++ build system in general
> and
> > >> see
> > > if there might be better solution that can address some of these
> >  concerns?
> > > In particular, Bazel at least on the surface seems like it might
> be a
> > > better fit for some of the use cases discussed here.  I know this
> is a
> >  big
> > > project (and I currently don't have much bandwidth for it) but I
> think
> > >> if
> > > CMake is lacking in these areas it might be worth at least
> exploring
> > > instead of going down the path of building our own meta-build
> system on
> >  top
> > > of CMake.
> > >
> > > Requirements that I think we are targeting:
> > > 1.  Be able to provide an out of box build system that requires as
> > >> close
> >  to
> > > zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD
> > >> minimal"
> > > works on any C++ developers desktop without additional
> 

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0

2019-10-23 Thread Krisztián Szűcs
It was happening from time to time, but now it is pretty consistent.
I'm working on fixing the deployments by running the crossbow
artifact uploading script.

On Thu, Oct 24, 2019 at 1:16 AM Wes McKinney  wrote:

> Any clues why the macOS wheel uploads keep flaking out?
>
> On Wed, Oct 23, 2019 at 7:56 AM Crossbow  wrote:
> >
> >
> > Arrow Build Report for Job nightly-2019-10-23-0
> >
> > All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0
> >
> > Failed Tasks:
> > - docker-clang-format:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format
> > - docker-r-sanitizer:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer
> > - wheel-osx-cp36m:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m
> > - wheel-osx-cp37m:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m
> >
> > Succeeded Tasks:
> > - centos-6:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6
> > - centos-7:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7
> > - conda-linux-gcc-py27:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27
> > - conda-linux-gcc-py36:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36
> > - conda-linux-gcc-py37:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37
> > - conda-osx-clang-py27:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27
> > - conda-osx-clang-py36:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36
> > - conda-osx-clang-py37:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37
> > - conda-win-vs2015-py36:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36
> > - conda-win-vs2015-py37:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37
> > - debian-buster:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster
> > - debian-stretch:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch
> > - docker-c_glib:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib
> > - docker-cpp-cmake32:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32
> > - docker-cpp-release:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release
> > - docker-cpp-static-only:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only
> > - docker-cpp:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp
> > - docker-dask-integration:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration
> > - docker-docs:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs
> > - docker-go:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go
> > - docker-hdfs-integration:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration
> > - docker-iwyu:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu
> > - docker-java:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java
> > - docker-js:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js
> > - docker-lint:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint
> > - docker-pandas-master:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master
> > - docker-python-2.7-nopandas:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas
> > - docker-python-2.7:
> >   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7
> > - 

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0

2019-10-23 Thread Wes McKinney
Any clues why the macOS wheel uploads keep flaking out?

On Wed, Oct 23, 2019 at 7:56 AM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2019-10-23-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0
>
> Failed Tasks:
> - docker-clang-format:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format
> - docker-r-sanitizer:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer
> - wheel-osx-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m
> - wheel-osx-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m
>
> Succeeded Tasks:
> - centos-6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6
> - centos-7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37
> - debian-buster:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster
> - debian-stretch:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch
> - docker-c_glib:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib
> - docker-cpp-cmake32:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32
> - docker-cpp-release:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release
> - docker-cpp-static-only:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only
> - docker-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp
> - docker-dask-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration
> - docker-docs:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs
> - docker-go:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go
> - docker-hdfs-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration
> - docker-iwyu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu
> - docker-java:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java
> - docker-js:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js
> - docker-lint:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint
> - docker-pandas-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master
> - docker-python-2.7-nopandas:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas
> - docker-python-2.7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7
> - docker-python-3.6-nopandas:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6-nopandas
> - docker-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6
> - docker-python-3.7:
>   URL: 
> 

Re: [C++] The quest for zero-dependency builds

2019-10-23 Thread Wes McKinney
On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn  wrote:
>
> Dev's
>
> I would request to be as conservative as possible in choosing (keeping) a 
> build system.
>
> For developers, packagers and even end-users for some languages the build 
> system is just
> another dependency. Even if cmake is not ideal, it has become quite 
> ubiquitous which is a huge plus.
>
> Maybe it is possible to come up with a way of expressing the dependency 
> relations in cmake in
> a way that makes maintaining them easier. Otherwise it is maybe possible to 
> generate them from
> a (simple) description file?

There do seem to be parts of our CMake build system that contain
boilerplate (particularly some of the platform-specific export
defines) that might be better auto-generated in some way, so this is
something it would be worth looking more at.

FWIW, some Google projects I have seen offer CMake as a build option
but the CMake files are mostly auto-generated from another build
configuration.

>
> Cheers,
> Maarten.
>
>
> > On Oct 19, 2019, at 11:22 PM, Micah Kornfield  wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >
> >
> > It could be; if someone else wants to take a look, we can compare what
> > things look like in each. Recently, Bazel build rules seem like they would be
> > useful for some work projects I've been dealing with, so I plan on focusing
> > my exploration there.
> >
> > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou  wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >>
> >>
> >> Le 15/10/2019 à 23:06, Micah Kornfield a écrit :
> >>> Hi Wes,
> >>> I agree on both accounts that it won't be done in the short term, and
> >> it
> >>> makes sense to tackle it incrementally.  Like I said I don't have much
> >>> bandwidth at the moment but might be able to re-arrange a few things on
> >> my
> >>> plate.  I think some people have asked on the mailing list how they might
> >>> be able to help, this might be one area that doesn't require a lot of
> >>> in-depth knowledge of C++ at least for a proof of concept.  I'll try to
> >>> open up some JIRAs soon.
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney 
> >> wrote:
> >>>
>  hi Micah,
> 
>  Definitely Bazel is worth exploring, but we must be realistic about
>  the amount of energy (several hundred hours or more) that's been
>  invested in the build system we have now. So a new build system will
>  be a large endeavor, but hopefully can make things simpler.
> 
>  Aside from the requirements gathering process, if it is felt that
>  Bazel is a possible path forward in the future, it may be good to try
>  to break up the work into more tractable pieces. For example, a first
>  step would be to set up Bazel configurations to build the project's
>  thirdparty toolchain. Since we're reliant on ExternalProject in CMake
>  to do a lot of heavy lifting there for us, I imagine this (taking care
>  of what ThirdpartyToolchain.cmake does not) will take up a lot of the
>  energy
> 
>  - Wes
> 
>  On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield 
>  wrote:
> >
> >>
> >>
> >> This might be taking the thread on more of a tangent, but maybe we
>  should
> > start collecting requirements for the C++ build system in general and
> >> see
> > if there might be better solution that can address some of these
>  concerns?
> > In particular, Bazel at least on the surface seems like it might be a
> > better fit for some of the use cases discussed here.  I know this is a
>  big
> > project (and I currently don't have much bandwidth for it) but I think
> >> if
> > CMake is lacking in these areas it might be worth at least exploring
> > instead of going down the path of building our own meta-build system on
>  top
> > of CMake.
> >
> > Requirements that I think we are targeting:
> > 1.  Be able to provide an out of box build system that requires as
> >> close
>  to
> > zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD
> >> minimal"
> > works on any C++ developers desktop without additional requirements)
> > 2.  The build system should limit configuration knobs in favor of
> >> implied
> > dependencies (e.g. "$BUILD python" automatically builds "compute",
> > "filesystem", "ipc")
> > 3.  The build system should be configurable to use (and have the user
> > specify) one of "System packages", "Conda packages" or source packages
>  for
> > providing dependencies (and fallback options between the three).
> > 4.  The build system should be able to treat some dependencies as
>  optional
> > (e.g. different compression libraries or allocators).
> > 5.  Easily allow developers to limit building unnecessary code for
> >> their
> > particular task at hand.
> > 6.  The build system must work across the following

[jira] [Created] (ARROW-6982) [R] Add bindings for compare and boolean kernels

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6982:
--

 Summary: [R] Add bindings for compare and boolean kernels
 Key: ARROW-6982
 URL: https://issues.apache.org/jira/browse/ARROW-6982
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Romain Francois
 Fix For: 1.0.0


See cpp/src/arrow/compute/kernels/compare.h and boolean.h. ARROW-6980 
introduces an Expression class that works on Arrow Arrays, but to evaluate the 
expressions, it has to pull the data into R first. This would enable us to do 
the work in C++ and only pull in the result.





[jira] [Created] (ARROW-6981) [R] Implement HDFS file-system interface in R

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6981:
--

 Summary: [R] Implement HDFS file-system interface in R
 Key: ARROW-6981
 URL: https://issues.apache.org/jira/browse/ARROW-6981
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson








[jira] [Created] (ARROW-6980) [R] dplyr backend for RecordBatch/Table

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6980:
--

 Summary: [R] dplyr backend for RecordBatch/Table
 Key: ARROW-6980
 URL: https://issues.apache.org/jira/browse/ARROW-6980
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-6979) [R] Enable jemalloc in autobrew formula

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6979:
--

 Summary: [R] Enable jemalloc in autobrew formula
 Key: ARROW-6979
 URL: https://issues.apache.org/jira/browse/ARROW-6979
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See 
https://github.com/apache/arrow/blob/59a6788c76330cf055bdbcbc7bdae7b0106c6656/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb#L47





[jira] [Created] (ARROW-6978) [R] Add bindings for sum and mean compute kernels

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6978:
--

 Summary: [R] Add bindings for sum and mean compute kernels
 Key: ARROW-6978
 URL: https://issues.apache.org/jira/browse/ARROW-6978
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Romain Francois
 Fix For: 1.0.0








[jira] [Created] (ARROW-6975) [C++] Put make_unique in its own header

2019-10-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6975:
-

 Summary: [C++] Put make_unique in its own header
 Key: ARROW-6975
 URL: https://issues.apache.org/jira/browse/ARROW-6975
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


{{arrow/util/stl.h}} also carries other utilities that are almost never needed
by code that only wants {{make_unique}}.
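
For context, a header dedicated to make_unique can be very small. The sketch
below is only an illustration and not Arrow's actual code: the namespace
`sketch` and the include-guard name are hypothetical, and it assumes a C++11
toolchain where std::make_unique (added in C++14) is not available. Only the
single-object form is shown; the array forms are omitted.

```cpp
// Hypothetical standalone header; names are assumptions, not Arrow's API.
#ifndef SKETCH_MAKE_UNIQUE_H
#define SKETCH_MAKE_UNIQUE_H

#include <memory>
#include <string>
#include <utility>

namespace sketch {

// C++11 stand-in for std::make_unique: constructs a T from the forwarded
// arguments and hands ownership to a std::unique_ptr<T>.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

}  // namespace sketch

#endif  // SKETCH_MAKE_UNIQUE_H
```

A caller would then include only this header and write, for example,
`auto s = sketch::make_unique<std::string>("arrow");`.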





[jira] [Created] (ARROW-6974) [C++] Implement Cast kernel for time-likes with ArrayDataVisitor pattern

2019-10-23 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6974:


 Summary: [C++] Implement Cast kernel for time-likes with ArrayDataVisitor pattern
 Key: ARROW-6974
 URL: https://issues.apache.org/jira/browse/ARROW-6974
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Joris Van den Bossche


Currently, casting of time-like data is done with the {{ShiftTime}}
function. It _might_ be possible to simplify this with ArrayDataVisitor (to
avoid manual looping and bitmap checks).





[NIGHTLY] Arrow Build Report for Job nightly-2019-10-23-0

2019-10-23 Thread Crossbow


Arrow Build Report for Job nightly-2019-10-23-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0

Failed Tasks:
- docker-clang-format:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-clang-format
- docker-r-sanitizer:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-sanitizer
- wheel-osx-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-travis-wheel-osx-cp37m

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-centos-7
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-linux-gcc-py37
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-azure-debian-stretch
- docker-c_glib:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-c_glib
- docker-cpp-cmake32:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-cmake32
- docker-cpp-release:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-release
- docker-cpp-static-only:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp-static-only
- docker-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-cpp
- docker-dask-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-dask-integration
- docker-docs:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-docs
- docker-go:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-go
- docker-hdfs-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-hdfs-integration
- docker-iwyu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-iwyu
- docker-java:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-java
- docker-js:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-js
- docker-lint:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-lint
- docker-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-pandas-master
- docker-python-2.7-nopandas:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7-nopandas
- docker-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-2.7
- docker-python-3.6-nopandas:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6-nopandas
- docker-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.6
- docker-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-python-3.7
- docker-r-conda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-23-0-circle-docker-r-conda
- docker-r:
  URL: 

[jira] [Created] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit

2019-10-23 Thread Artem Alekseev (Jira)
Artem Alekseev created ARROW-6973:
-

 Summary: [C++][ThreadPool] Use perfect forwarding in Submit
 Key: ARROW-6973
 URL: https://issues.apache.org/jira/browse/ARROW-6973
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Artem Alekseev
Assignee: Artem Alekseev





