Re: [DISCUSS][Java] Builders for java classes

2019-10-24 Thread Fan Liya
Hi Micah,

IMO, we need an adapter from on-heap array to off-heap array.
This is useful because many third-party Java libraries populate data to an
on-heap array.

And I see this API in your design:

IntVectorBuilder addAll(int[] values);

So I am +1 for this.

Best,
Liya Fan

On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield 
wrote:

> As part of a PR Ji Liu has made to help populate data for test cases [1], the
> question came up of whether we should provide more builder classes in
> java for ValueVectors.  The proposed implementation would wrap the existing
> Writer classes.
>
> Do people think this would be a valuable addition to the java library? I
> imagine it would be a builder per ValueVectorType.  The main benefit I see
> to this is making the library potentially slightly easier to use for
> new-comers, but might not be the most efficient.  A straw-man interface is
> listed below.
>
> Thoughts?
>
> Thanks,
> Micah
>
> class IntVectorBuilder {
>   public IntVectorBuilder(BufferAllocator allocator);
>
>   IntVectorBuilder add(int value);
>   IntVectorBuilder addAll(int[] values);
>   IntVectorBuilder addNull();
>   // handles null values in array
>   IntVectorBuilder addAll(Integer... values);
>   IntVectorBuilder addAll(List<Integer> values);
>   IntVector build(String name);
> }
>


Re: [DISCUSS][Java] Builders for java classes

2019-10-24 Thread Ravindra Pindikura
On Thu, Oct 24, 2019 at 10:01 AM Micah Kornfield 
wrote:

> As part of a PR Ji Liu has made to help populate data for test cases [1], the
> question came up of whether we should provide more builder classes in
> java for ValueVectors.  The proposed implementation would wrap the existing
> Writer classes.
>
> Do people think this would be a valuable addition to the java library? I
> imagine it would be a builder per ValueVectorType.  The main benefit I see
> to this is making the library potentially slightly easier to use for
> new-comers, but might not be the most efficient.  A straw-man interface is
> listed below.
>
> Thoughts?
>

I can see that it makes writing tests easier and improves ease of use (esp.
handling the setSafe/setValueCount calls).

In Dremio, we mostly populate value vectors in one of three ways:

   - from arrow buffers (eg. read from parquet)
   - from other value vectors (eg. selection vector removal or transfers)
   - by directly populating the constituent arrow buffers (eg. gandiva)

so, we haven't had a need for explicit builders.



>
> Thanks,
> Micah
>
> class IntVectorBuilder {
>   public IntVectorBuilder(BufferAllocator allocator);
>
>   IntVectorBuilder add(int value);
>   IntVectorBuilder addAll(int[] values);
>   IntVectorBuilder addNull();
>   // handles null values in array
>   IntVectorBuilder addAll(Integer... values);
>   IntVectorBuilder addAll(List<Integer> values);
>   IntVector build(String name);
> }
>


-- 
Thanks and regards,
Ravindra.


[NIGHTLY] Arrow Build Report for Job nightly-2019-10-24-0

2019-10-24 Thread Crossbow


Arrow Build Report for Job nightly-2019-10-24-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0

Failed Tasks:
- docker-clang-format:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-clang-format
- docker-r-sanitizer:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-r-sanitizer
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-travis-wheel-osx-cp35m
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-travis-wheel-osx-cp37m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-centos-7
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-linux-gcc-py37
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-azure-debian-stretch
- docker-c_glib:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-c_glib
- docker-cpp-cmake32:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-cpp-cmake32
- docker-cpp-release:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-cpp-release
- docker-cpp-static-only:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-cpp-static-only
- docker-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-cpp
- docker-dask-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-dask-integration
- docker-docs:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-docs
- docker-go:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-go
- docker-hdfs-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-hdfs-integration
- docker-iwyu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-iwyu
- docker-java:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-java
- docker-js:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-js
- docker-lint:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-lint
- docker-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-pandas-master
- docker-python-2.7-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-python-2.7-nopandas
- docker-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-python-2.7
- docker-python-3.6-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-python-3.6-nopandas
- docker-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-python-3.6
- docker-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-docker-python-3.7
- docker-r-conda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-24-0-circle-

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding

2019-10-24 Thread Antoine Pitrou


Le 24/10/2019 à 04:39, Micah Kornfield a écrit :
> 
> 3.  Clarifies that the file format, can only contain 1 "NON-delta"
> dictionary batch and multiple "delta" dictionary batches.

This is a bit weird.  If the file format can carry delta dictionaries,
it means order is significant, so it may as well contain dictionary
redefinitions.

If the file format is meant to be truly readable in random order, then
it should also forbid delta dictionaries.

Regards

Antoine.
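
To make the ordering point concrete, here is a minimal toy model, not Arrow's IPC API: the class and method names (`DictionaryState`, `apply_batch`, `decode`) are hypothetical. It assumes only that a delta batch appends to the current dictionary while a non-delta batch replaces it, which is why a reader cannot resolve indices until the relevant delta has been seen.

```python
class DictionaryState:
    """Toy model of one dictionary as a reader sees it while scanning batches."""

    def __init__(self):
        self.values = []

    def apply_batch(self, values, is_delta):
        # A delta batch appends to the current dictionary;
        # a non-delta batch replaces it outright.
        if is_delta:
            self.values.extend(values)
        else:
            self.values = list(values)

    def decode(self, indices):
        # Indices only resolve against the dictionary batches applied so far.
        return [self.values[i] for i in indices]


# In-order read: the delta is applied before the batch that needs it.
state = DictionaryState()
state.apply_batch(["a", "b"], is_delta=False)
first = state.decode([0, 1, 0])
state.apply_batch(["c"], is_delta=True)
second = state.decode([2, 0])

# Out-of-order read: the second record batch is decoded without its delta.
fresh = DictionaryState()
fresh.apply_batch(["a", "b"], is_delta=False)
try:
    fresh.decode([2, 0])
    out_of_order_ok = True
except IndexError:
    out_of_order_ok = False
```

In this model a random-access reader would have to replay every delta that precedes a record batch, which is the same bookkeeping a redefinition would need.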


[jira] [Created] (ARROW-6984) Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread Sangeeth Keeriyadath (Jira)
Sangeeth Keeriyadath created ARROW-6984:
---

 Summary: Update LZ4 to 1.9.2 for CVE-2019-17543
 Key: ARROW-6984
 URL: https://issues.apache.org/jira/browse/ARROW-6984
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.15.0
Reporter: Sangeeth Keeriyadath
 Fix For: 0.15.1


There is a reported CVE stating that LZ4 before 1.9.2 has a heap-based buffer overflow 
in LZ4_write32 (more details here: 
[https://nvd.nist.gov/vuln/detail/CVE-2019-17543] ). I see that Apache Arrow 
uses version *v1.8.3* ( 
[https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]
 ).

We need to bump up the dependency version of LZ4 to *1.9.2* to get past the 
reported CVE. Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6985) Steadily increasing time to load file using read_parquet

2019-10-24 Thread Casey (Jira)
Casey created ARROW-6985:


 Summary: Steadily increasing time to load file using read_parquet
 Key: ARROW-6985
 URL: https://issues.apache.org/jira/browse/ARROW-6985
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.15.0, 0.14.0, 0.13.0
Reporter: Casey
 Fix For: 0.15.0, 0.14.0, 0.13.0


I've noticed that reading from parquet using pandas read_parquet function is 
taking steadily longer with each invocation. I've seen the other ticket about 
memory usage but I'm seeing no memory impact just steadily increasing read time 
until I restart the python session.

Below is some code to reproduce my results. I notice it's particularly bad on 
wide matrices, especially using pyarrow==0.15.0
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa
import pandas as pd
import os
import numpy as np
import time

file = "skinny_matrix.pq"

if not os.path.isfile(file):
    mat = np.zeros((6000, 26000))
    mat.ravel()[::100] = np.random.randn(60 * 26000)
    df = pd.DataFrame(mat.T)
    table = pa.Table.from_pandas(df)
    pq.write_table(table, file)

n_timings = 50
timings = np.empty(n_timings)
for i in range(n_timings):
    start = time.time()
    new_df = pd.read_parquet(file)
    end = time.time()
    timings[i] = end - start
{code}
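
One possible way to quantify "steadily longer" from the `timings` array collected above is a least-squares slope of read time against invocation index. This is only a sketch: the helper name `timing_trend` is hypothetical and not part of pandas or pyarrow, and the example data below is synthetic, not a measurement.

```python
def timing_trend(timings):
    """Return the least-squares slope of read time versus invocation index."""
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two timings to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(timings) / n
    # Covariance of (index, time) divided by variance of the index.
    cov = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(timings))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Synthetic stand-in for `timings`: each read is 10 ms slower than the last.
example = [0.5 + 0.01 * i for i in range(50)]
slope = timing_trend(example)
flat = timing_trend([1.0] * 10)
```

A slope near zero would suggest stable read times; a clearly positive slope (in seconds per invocation) would match the behaviour described in this report.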





Re: Possible Arrow 0.15.1 release

2019-10-24 Thread Krisztián Szűcs
I had to fix the OSX wheel build issues, once [1] is green I can start the
release procedure, although we have three new issues in the release [2].

ARROW-6983: We have a fix for this.
ARROW-6984: I'll bump LZ4's version.
ARROW-6977: should be resolved by ARROW-6983?

[1]: https://github.com/apache/arrow/pull/5726
[2]: https://issues.apache.org/jira/projects/ARROW/versions/12346358

On Wed, Oct 23, 2019, 3:01 AM Wes McKinney  wrote:

> I just removed ARROW-6895 from 0.15.1. You can cut an RC anytime now
>
> On Tue, Oct 22, 2019 at 7:40 PM Krisztián Szűcs
>  wrote:
> >
> > Cherry picked 1ae946c8bfebd31ceca8d54b66313d4aaa2f029c
> >
> > We have one issue left.
> >
> > On Wed, Oct 23, 2019 at 1:45 AM Antoine Pitrou 
> wrote:
> >
> > >
> > > https://github.com/apache/arrow/pull/5701 was merged, you may cherry
> > > pick it.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 22/10/2019 à 16:16, Krisztián Szűcs a écrit :
> > > > The packaging builds are passing without the following patches too:
> > > > - ARROW-6631: [C++] Do not build any compression libraries by
> default in
> > > > C++ build
> > > > - ARROW-6831: [R] Update R macOS/Windows builds for change in cmake
> > > > compression defaults
> > > > - ARROW-6855: [FlightRPC][C++][Python] Flight middleware for
> C++/Python
> > > > - ARROW-6864: [C++] Add compression-related compile definitions
> before
> > > > adding any unit tests
> > > >
> > > > So I've excluded these patches from PR for the maintenance branch
> [1].
> > > > Once the remaining PRs are ready, I'll cherry pick those changes and
> > > > the patch release is good to go.
> > > >
> > > > [1]: https://github.com/apache/arrow/pull/5708
> > > >
> > > > On Mon, Oct 21, 2019 at 8:20 PM Krisztián Szűcs <
> > > szucs.kriszt...@gmail.com>
> > > > wrote:
> > > >
> > > >> All the relevant packaging builds are passing for the current
> > > >> release branch: https://github.com/apache/arrow/pull/5708
> > > >>
> > > >> Although I've created another branch excluding the following
> > > >> patches [1]:
> > > >> - ARROW-6631: [C++] Do not build any compression libraries by
> default in
> > > >> C++ build
> > > >> - ARROW-6831: [R] Update R macOS/Windows builds for change in cmake
> > > >> compression defaults
> > > >> - ARROW-6855: [FlightRPC][C++][Python] Flight middleware for
> C++/Python
> > > >> - ARROW-6864: [C++] Add compression-related compile definitions
> before
> > > >> adding any unit tests
> > > >>
> > > >> Also submitted the packaging tasks for the new one, waiting
> > > >> for the results [2].
> > > >>
> > > >> [1]:
> https://gist.github.com/kszucs/dbe43f3d5ac3d1ba8865cf08785dc019
> > > >> [2]:
> https://github.com/ursa-labs/crossbow/branches/all?query=build-694
> > > >>
> > > >> On Mon, Oct 21, 2019 at 7:06 PM Wes McKinney 
> > > wrote:
> > > >>
> > > >>> If that patch is not included then I would guess a number of manual
> > > >>> changes
> > > >>> will be required to fix builds as the result of the Cython linking
> > > changes
> > > >>> and the libarrow_python_flight shared library split.
> > > >>>
> > > >>> On Mon, Oct 21, 2019, 9:45 AM Neal Richardson <
> > > >>> neal.p.richard...@gmail.com>
> > > >>> wrote:
> > > >>>
> > >  I'd like to propose that "ARROW-6631: [C++] Do not build any
> > >  compression libraries by default in C++ build" and all other
> > >  cmake-related changes (ARROW-6831, ARROW-6610), be excluded from
> the
> > >  patch release. They sound like (build) API changes, not bugfixes,
> and
> > >  I fear that including them will cause problems.
> > > 
> > >  Neal
> > > 
> > >  On Mon, Oct 21, 2019 at 9:41 AM Krisztián Szűcs
> > >   wrote:
> > > >
> > > > Because of https://github.com/apache/arrow/pull/5627
> > > > I can apply the exports manually if that is desired.
> > > >
> > > >
> > > >
> > > > On Mon, Oct 21, 2019 at 6:26 PM Antoine Pitrou <
> anto...@python.org>
> > >  wrote:
> > > >
> > > >>
> > > >> How did you do the cherry-picking?  I'm very surprised that you
> > > >>> needed
> > > >> to pick up the Flight additions.
> > > >>
> > > >> Regards
> > > >>
> > > >> Antoine.
> > > >>
> > > >>
> > > >> Le 21/10/2019 à 18:18, Krisztián Szűcs a écrit :
> > > >>> On Mon, Oct 21, 2019 at 5:28 PM Wes McKinney <
> wesmck...@gmail.com
> > > 
> > > >> wrote:
> > > >>>
> > >  Thanks Krisztian
> > > 
> > >  I'm going to finish ARROW-6910 soon so it can be merged. I
> don't
> > >  have
> > >  a patch yet for ARROW-6895 but hope to complete it today, but
> it
> > > >>> is
> > >  not a blocker. If you are able to prepare the maintenance
> > > >>> branch, it
> > >  probably makes sense to check that the binary package builds
> are
> > >  looking okay in the meantime
> > > 
> > > >>> I had to include
> https://issues.apache.org/jira/browse/ARROW-6855
> > > >>> and https://is

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding

2019-10-24 Thread Micah Kornfield
Hi Antoine,
There is a defined order for dictionaries in metadata.  What isn't well
defined is relative ordering between record batches and Delta dictionaries.

However, this point seems confusing. I can't think of a real-world use
case where it would be valuable enough to include, so I will remove Delta
dictionaries.

So let's cancel this vote and I'll start a new one after the update.

Thanks,
Micah

On Thursday, October 24, 2019, Antoine Pitrou  wrote:

>
> Le 24/10/2019 à 04:39, Micah Kornfield a écrit :
> >
> > 3.  Clarifies that the file format, can only contain 1 "NON-delta"
> > dictionary batch and multiple "delta" dictionary batches.
>
> This is a bit weird.  If the file format can carry delta dictionaries,
> it means order is significant, so it may as well contain dictionary
> redefinitions.
>
> If the file format is meant to be truly readable in random order, then
> it should also forbid delta dictionaries.
>
> Regards
>
> Antoine.
>


Re: Possible Arrow 0.15.1 release

2019-10-24 Thread Antoine Pitrou


No, ARROW-6977 is something else.  I'll whip up a PR.


Le 24/10/2019 à 17:25, Krisztián Szűcs a écrit :
> I had to fix the OSX wheel build issues, once [1] is green I can start the
> release procedure, although we have three new issues in the release [2].
> 
> ARROW-6983: We have a fix for this.
> ARROW-6984: I'll bump LZ4's version.
> ARROW-6977: should be resolved by ARROW-6983?
> 
> [1]: https://github.com/apache/arrow/pull/5726
> [2]: https://issues.apache.org/jira/projects/ARROW/versions/12346358
> 
> On Wed, Oct 23, 2019, 3:01 AM Wes McKinney  wrote:
> 
>> I just removed ARROW-6895 from 0.15.1. You can cut an RC anytime now
>>
>> On Tue, Oct 22, 2019 at 7:40 PM Krisztián Szűcs
>>  wrote:
>>>
>>> Cherry picked 1ae946c8bfebd31ceca8d54b66313d4aaa2f029c
>>>
>>> We have one issue left.
>>>
>>> On Wed, Oct 23, 2019 at 1:45 AM Antoine Pitrou 
>> wrote:
>>>

 https://github.com/apache/arrow/pull/5701 was merged, you may cherry
 pick it.

 Regards

 Antoine.


 Le 22/10/2019 à 16:16, Krisztián Szűcs a écrit :
> The packaging builds are passing without the following patches too:
> - ARROW-6631: [C++] Do not build any compression libraries by
>> default in
> C++ build
> - ARROW-6831: [R] Update R macOS/Windows builds for change in cmake
> compression defaults
> - ARROW-6855: [FlightRPC][C++][Python] Flight middleware for
>> C++/Python
> - ARROW-6864: [C++] Add compression-related compile definitions
>> before
> adding any unit tests
>
> So I've excluded these patches from PR for the maintenance branch
>> [1].
> Once the remaining PRs are ready, I'll cherry pick those changes and
> the patch release is good to go.
>
> [1]: https://github.com/apache/arrow/pull/5708
>
> On Mon, Oct 21, 2019 at 8:20 PM Krisztián Szűcs <
 szucs.kriszt...@gmail.com>
> wrote:
>
>> All the relevant packaging builds are passing for the current
>> release branch: https://github.com/apache/arrow/pull/5708
>>
>> Although I've created another branch excluding the following
>> patches [1]:
>> - ARROW-6631: [C++] Do not build any compression libraries by
>> default in
>> C++ build
>> - ARROW-6831: [R] Update R macOS/Windows builds for change in cmake
>> compression defaults
>> - ARROW-6855: [FlightRPC][C++][Python] Flight middleware for
>> C++/Python
>> - ARROW-6864: [C++] Add compression-related compile definitions
>> before
>> adding any unit tests
>>
>> Also submitted the packaging tasks for the new one, waiting
>> for the results [2].
>>
>> [1]:
>> https://gist.github.com/kszucs/dbe43f3d5ac3d1ba8865cf08785dc019
>> [2]:
>> https://github.com/ursa-labs/crossbow/branches/all?query=build-694
>>
>> On Mon, Oct 21, 2019 at 7:06 PM Wes McKinney 
 wrote:
>>
>>> If that patch is not included then I would guess a number of manual
>>> changes
>>> will be required to fix builds as the result of the Cython linking
 changes
>>> and the libarrow_python_flight shared library split.
>>>
>>> On Mon, Oct 21, 2019, 9:45 AM Neal Richardson <
>>> neal.p.richard...@gmail.com>
>>> wrote:
>>>
 I'd like to propose that "ARROW-6631: [C++] Do not build any
 compression libraries by default in C++ build" and all other
 cmake-related changes (ARROW-6831, ARROW-6610), be excluded from
>> the
 patch release. They sound like (build) API changes, not bugfixes,
>> and
 I fear that including them will cause problems.

 Neal

 On Mon, Oct 21, 2019 at 9:41 AM Krisztián Szűcs
  wrote:
>
> Because of https://github.com/apache/arrow/pull/5627
> I can apply the exports manually if that is desired.
>
>
>
> On Mon, Oct 21, 2019 at 6:26 PM Antoine Pitrou <
>> anto...@python.org>
 wrote:
>
>>
>> How did you do the cherry-picking?  I'm very surprised that you
>>> needed
>> to pick up the Flight additions.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 21/10/2019 à 18:18, Krisztián Szűcs a écrit :
>>> On Mon, Oct 21, 2019 at 5:28 PM Wes McKinney <
>> wesmck...@gmail.com

>> wrote:
>>>
 Thanks Krisztian

 I'm going to finish ARROW-6910 soon so it can be merged. I
>> don't
 have
 a patch yet for ARROW-6895 but hope to complete it today, but
>> it
>>> is
 not a blocker. If you are able to prepare the maintenance
>>> branch, it
 probably makes sense to check that the binary package builds
>> are
 looking okay in the meantime

>>> I had to include
>> https://issues.apache.org/jira/browse/ARROW-6855
>>> and https://issues.apache.org/jira/browse/ARROW-6610 to keep
>> 

Re: [DISCUSS] Result vs Status

2019-10-24 Thread Omer F. Ozarslan
Hi,

I don't have much experience with custom clang-tidy plugins, but
this might be a good use case for such a plugin, from what I have read here
and there (frankly, this was a good excuse for me to have a look at
clang tooling as well). I wanted to make sure it isn't obviously overkill
before suggesting this: running a clang query that lists functions
returning `arrow::Status` and taking a pointer parameter named `out`
showed that there are 13947 such functions in `cpp/src/**/*.h`. [1]

I checked logs and it seemed legitimate to me, but please check it in
case I missed something. If that's the case, it might be tedious to do
this work manually.

[1]: https://gist.github.com/ozars/ecbb1b8acd4a57ba4721c1965f83f342
(Note that the log file is shown as truncated by github after ~30k
lines)

Best,
Omer



On Wed, Oct 23, 2019 at 9:23 PM Micah Kornfield  wrote:
>
> OK, it sounds like people want Result (at least in some circumstances).
> Any thoughts on migrating old APIs and what to do for new APIs going
> forward?
>
> A very rough approximation [1] yields the following counts by module:
>
>  853 arrow
>
>   17 gandiva
>
>   25 parquet
>
>   50 plasma
>
>
>
> [1] grep -r Status cpp/src/* |grep ".h:" | grep "\\*" |grep -v Accept |sed
> s/:.*// | cut -f3 -d/ |sort
>
>
> Thanks,
>
> Micah
>
>
>
> On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > As mentioned, Result is an improvement for function which returns a
> > single value, e.g. Make/Factory-like. My vote goes Result for such
> > case. For multiple return types, we have std::tuple like Antoine
> > proposed.
> >
> > François
> >
> > On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou  wrote:
> > >
> > >
> > > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
> > > > I'm definitely uncomfortable with the idea of deprecating Status.
> > > >
> > > > We have a few kinds of functions that can fail:
> > > >
> > > > 1. Functions with no "out" arguments
> > > > 2. Functions with one out argument
> > > > 3. Functions with multiple out arguments
> > > >
> > > > IMHO functions in category 2 are the best candidates for utilizing
> > > > Status. In some cases, Case 3 may be more usable Result-based, but it
> > > > can also create more work (or confusion) on the part of the developer,
> > > > either
> > > >
> > > > * The T in Result has to be a struct-like value that transports
> > > > multiple pieces of data
> > >
> > > The T can be a std::tuple though, so you need not necessarily define a
> > > dedicated struct type for a single API's return value.
> > >
> > >  > Can't say I'm thrilled about having Result or similar for Case
> > >  > 1-type functions (if I'm understanding what would be the solution
> > >  > there).
> > >
> > > Agreed.
> > >
> > > Regards
> > >
> > > Antoine.
> >


Re: [DISCUSS] Result vs Status

2019-10-24 Thread Omer F. Ozarslan
Forgot to mention: most of those declarations are longer than the line width,
and `out` is usually (always?) the last parameter, so it often lands on a
continuation line. That is probably why grep underestimates their number.

On Thu, Oct 24, 2019 at 4:33 PM Omer F. Ozarslan  wrote:
>
> Hi,
>
> I don't have much experience on customized clang-tidy plugins, but
> this might be a good use case for such a plugin from what I read here
> and there (frankly this was a good excuse for me to have a look at
> clang tooling as well). I wanted to ensure it isn't obviously overkill
> before this suggestion: Running a clang query which lists functions
> returning `arrow::Status` and taking a pointer parameter named `out`
> showed that there are 13947 such functions in `cpp/src/**/*.h`. [1]
>
> I checked logs and it seemed legitimate to me, but please check it in
> case I missed something. If that's the case, it might be tedious to do
> this work manually.
>
> [1]: https://gist.github.com/ozars/ecbb1b8acd4a57ba4721c1965f83f342
> (Note that the log file is shown as truncated by github after ~30k
> lines)
>
> Best,
> Omer
>
>
>
> On Wed, Oct 23, 2019 at 9:23 PM Micah Kornfield  wrote:
> >
> > OK, it sounds like people want Result (at least in some circumstances).
> > Any thoughts on migrating old APIs and what to do for new APIs going
> > forward?
> >
> > A very rough approximation [1] yields the following counts by module:
> >
> >  853 arrow
> >
> >   17 gandiva
> >
> >   25 parquet
> >
> >   50 plasma
> >
> >
> >
> > [1] grep -r Status cpp/src/* |grep ".h:" | grep "\\*" |grep -v Accept |sed
> > s/:.*// | cut -f3 -d/ |sort
> >
> >
> > Thanks,
> >
> > Micah
> >
> >
> >
> > On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > As mentioned, Result is an improvement for function which returns a
> > > single value, e.g. Make/Factory-like. My vote goes Result for such
> > > case. For multiple return types, we have std::tuple like Antoine
> > > proposed.
> > >
> > > François
> > >
> > > On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou  wrote:
> > > >
> > > >
> > > > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
> > > > > I'm definitely uncomfortable with the idea of deprecating Status.
> > > > >
> > > > > We have a few kinds of functions that can fail:
> > > > >
> > > > > 1. Functions with no "out" arguments
> > > > > 2. Functions with one out argument
> > > > > 3. Functions with multiple out arguments
> > > > >
> > > > > IMHO functions in category 2 are the best candidates for utilizing
> > > > > Status. In some cases, Case 3 may be more usable Result-based, but it
> > > > > can also create more work (or confusion) on the part of the developer,
> > > > > either
> > > > >
> > > > > * The T in Result has to be a struct-like value that transports
> > > > > multiple pieces of data
> > > >
> > > > The T can be a std::tuple though, so you need not necessarily define a
> > > > dedicated struct type for a single API's return value.
> > > >
> > > >  > Can't say I'm thrilled about having Result or similar for Case
> > > >  > 1-type functions (if I'm understanding what would be the solution
> > > >  > there).
> > > >
> > > > Agreed.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > >


Re: [DISCUSS] Result vs Status

2019-10-24 Thread Micah Kornfield
Hi Omer,
I think this is really cool.  It is quite possible it was underestimated (I
agree about line lengths), but I think the clang query is double counting
somehow.

For instance:

"grep -r Status *" only returns ~9000 results in total for me.

Similarly using grep for "FinishTyped" returns 18 results for me.
Searching through the log that you linked seems to return 450 (for "Status
FinishTyped").

It is quite possible that I'm doing something naive with grep.

Thanks,
Micah

On Thu, Oct 24, 2019 at 2:41 PM Omer F. Ozarslan  wrote:

> Forgot to mention most of those lines are longer than line width while
> out is usually (always?) last parameter, so probably that's why grep
> possibly underestimates their number.
>
> On Thu, Oct 24, 2019 at 4:33 PM Omer F. Ozarslan 
> wrote:
> >
> > Hi,
> >
> > I don't have much experience on customized clang-tidy plugins, but
> > this might be a good use case for such a plugin from what I read here
> > and there (frankly this was a good excuse for me to have a look at
> > clang tooling as well). I wanted to ensure it isn't obviously overkill
> > before this suggestion: Running a clang query which lists functions
> > returning `arrow::Status` and taking a pointer parameter named `out`
> > showed that there are 13947 such functions in `cpp/src/**/*.h`. [1]
> >
> > I checked logs and it seemed legitimate to me, but please check it in
> > case I missed something. If that's the case, it might be tedious to do
> > this work manually.
> >
> > [1]: https://gist.github.com/ozars/ecbb1b8acd4a57ba4721c1965f83f342
> > (Note that the log file is shown as truncated by github after ~30k
> > lines)
> >
> > Best,
> > Omer
> >
> >
> >
> > On Wed, Oct 23, 2019 at 9:23 PM Micah Kornfield 
> wrote:
> > >
> > > OK, it sounds like people want Result (at least in some
> circumstances).
> > > Any thoughts on migrating old APIs and what to do for new APIs going
> > > forward?
> > >
> > > A very rough approximation [1] yields the following counts by module:
> > >
> > >  853 arrow
> > >
> > >   17 gandiva
> > >
> > >   25 parquet
> > >
> > >   50 plasma
> > >
> > >
> > >
> > > [1] grep -r Status cpp/src/* |grep ".h:" | grep "\\*" |grep -v Accept
> |sed
> > > s/:.*// | cut -f3 -d/ |sort
> > >
> > >
> > > Thanks,
> > >
> > > Micah
> > >
> > >
> > >
> > > On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <
> > > fsaintjacq...@gmail.com> wrote:
> > >
> > > > As mentioned, Result is an improvement for function which returns
> a
> > > > single value, e.g. Make/Factory-like. My vote goes Result for such
> > > > case. For multiple return types, we have std::tuple like Antoine
> > > > proposed.
> > > >
> > > > François
> > > >
> > > > On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou 
> wrote:
> > > > >
> > > > >
> > > > > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
> > > > > > I'm definitely uncomfortable with the idea of deprecating Status.
> > > > > >
> > > > > > We have a few kinds of functions that can fail:
> > > > > >
> > > > > > 1. Functions with no "out" arguments
> > > > > > 2. Functions with one out argument
> > > > > > 3. Functions with multiple out arguments
> > > > > >
> > > > > > IMHO functions in category 2 are the best candidates for
> utilizing
> > > > > > Status. In some cases, Case 3 may be more usable Result-based,
> but it
> > > > > > can also create more work (or confusion) on the part of the
> developer,
> > > > > > either
> > > > > >
> > > > > > * The T in Result has to be a struct-like value that
> transports
> > > > > > multiple pieces of data
> > > > >
> > > > > The T can be a std::tuple though, so you need not necessarily
> define a
> > > > > dedicated struct type for a single API's return value.
> > > > >
> > > > >  > Can't say I'm thrilled about having Result or similar for
> Case
> > > > >  > 1-type functions (if I'm understanding what would be the
> solution
> > > > >  > there).
> > > > >
> > > > > Agreed.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > >
>


[jira] [Created] (ARROW-6986) [R] Add basic Expression class

2019-10-24 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6986:
--

 Summary: [R] Add basic Expression class
 Key: ARROW-6986
 URL: https://issues.apache.org/jira/browse/ARROW-6986
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


I started this as part of ARROW-6980 but it proved not necessary. This will be 
a foundation for ARROW-6982, in addition to being useful on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Result vs Status

2019-10-24 Thread Omer F. Ozarslan
Hi Micah,

You're right. It's quite possible that clang-query counted the same function
separately for each include in each file. (I was iterating over each file
separately, but providing all of them at once didn't change the result
either.)

It's cool and wrong, so not very useful apparently. :-)

Best,
Omer

On Thu, Oct 24, 2019 at 4:51 PM Micah Kornfield  wrote:
>
> Hi Omer,
> I think this is really cool.  It is quite possible it was underestimated (I 
> agree about line lengths), but I think the clang query is double counting 
> somehow.
>
> For instance:
>
> "grep -r Status *" only returns ~9000 results in total for me.
>
> Similarly using grep for "FinishTyped" returns 18 results for me.  Searching 
> through the log that you linked seems to return 450 (for "Status 
> FinishTyped").
>
> It is quite possible that I'm doing something naive with grep.
>
> Thanks,
> Micah
>
> On Thu, Oct 24, 2019 at 2:41 PM Omer F. Ozarslan  wrote:
>>
>> I forgot to mention that most of those declarations are longer than the
>> line width, while `out` is usually (always?) the last parameter, so that is
>> probably why grep underestimates their number.
>>
>> On Thu, Oct 24, 2019 at 4:33 PM Omer F. Ozarslan  wrote:
>> >
>> > Hi,
>> >
>> > I don't have much experience with custom clang-tidy plugins, but from
>> > what I've read here and there, this might be a good use case for one
>> > (frankly, this was also a good excuse for me to look at clang tooling).
>> > I wanted to make sure it isn't obviously overkill before suggesting it:
>> > running a clang query that lists functions returning `arrow::Status` and
>> > taking a pointer parameter named `out` showed that there are 13947 such
>> > functions in `cpp/src/**/*.h`. [1]
>> >
>> > I checked the logs and the count seemed legitimate to me, but please
>> > check it in case I missed something. If the count holds, it might be
>> > tedious to do this work manually.
>> >
>> > [1]: https://gist.github.com/ozars/ecbb1b8acd4a57ba4721c1965f83f342
>> > (Note that the log file is shown as truncated by github after ~30k
>> > lines)
>> >
>> > Best,
>> > Omer
>> >
>> >
>> >
>> > On Wed, Oct 23, 2019 at 9:23 PM Micah Kornfield wrote:
>> > >
>> > > OK, it sounds like people want Result (at least in some 
>> > > circumstances).
>> > > Any thoughts on migrating old APIs and what to do for new APIs going
>> > > forward?
>> > >
>> > > A very rough approximation [1] yields the following counts by module:
>> > >
>> > >  853 arrow
>> > >   17 gandiva
>> > >   25 parquet
>> > >   50 plasma
>> > >
>> > > [1] grep -r Status cpp/src/* |grep ".h:" | grep "\\*" |grep -v Accept |sed s/:.*// | cut -f3 -d/ |sort
>> > >
>> > >
>> > > Thanks,
>> > >
>> > > Micah
>> > >
>> > >
>> > >
>> > > On Sat, Oct 19, 2019 at 7:50 PM Francois Saint-Jacques <
>> > > fsaintjacq...@gmail.com> wrote:
>> > >
>> > > > As mentioned, Result is an improvement for functions that return a
>> > > > single value, e.g. Make/Factory-like ones. My vote goes to Result for
>> > > > such cases. For multiple return values, we have std::tuple, as Antoine
>> > > > proposed.
>> > > >
>> > > > François
>> > > >
>> > > > On Fri, Oct 18, 2019 at 9:19 PM Antoine Pitrou wrote:
>> > > > >
>> > > > >
>> > > > > Le 18/10/2019 à 20:58, Wes McKinney a écrit :
>> > > > > > I'm definitely uncomfortable with the idea of deprecating Status.
>> > > > > >
>> > > > > > We have a few kinds of functions that can fail:
>> > > > > >
>> > > > > > 1. Functions with no "out" arguments
>> > > > > > 2. Functions with one out argument
>> > > > > > 3. Functions with multiple out arguments
>> > > > > >
>> > > > > > IMHO functions in category 2 are the best candidates for utilizing
>> > > > > > Status. In some cases, Case 3 may be more usable Result-based, but it
>> > > > > > can also create more work (or confusion) on the part of the developer,
>> > > > > > either
>> > > > > >
>> > > > > > * The T in Result<T> has to be a struct-like value that transports
>> > > > > > multiple pieces of data
>> > > > >
>> > > > > The T can be a std::tuple though, so you need not necessarily define a
>> > > > > dedicated struct type for a single API's return value.
>> > > > >
>> > > > >  > Can't say I'm thrilled about having Result or similar for Case
>> > > > >  > 1-type functions (if I'm understanding what would be the solution
>> > > > >  > there).
>> > > > >
>> > > > > Agreed.
>> > > > >
>> > > > > Regards
>> > > > >
>> > > > > Antoine.
>> > > >
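
For anyone who wants to reproduce the rough per-module count quoted above without a shell, here is a minimal Python sketch of the same line-based approximation. The function name `count_status_pointer_lines` is made up for illustration, and the filtering only approximates the grep pipeline. Like grep, it only sees declarations where `Status` and `*` sit on the same line, so declarations wrapped across lines are missed.

```python
from collections import Counter
from pathlib import Path


def count_status_pointer_lines(src_root):
    """Count header lines mentioning Status and '*' per top-level module.

    Hypothetical helper: a line-based approximation, not a real parser.
    """
    counts = Counter()
    root = Path(src_root)
    for header in sorted(root.rglob("*.h")):
        # First path component below the root, e.g. "arrow" or "gandiva".
        module = header.relative_to(root).parts[0]
        text = header.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            # Same filters as the pipeline: Status, a pointer, no Accept.
            if "Status" in line and "*" in line and "Accept" not in line:
                counts[module] += 1
    return counts
```

Calling `count_status_pointer_lines("cpp/src")` from the repository root would return a `Counter` keyed by module name. Comparing it against the clang-query numbers is one way to see how much each approach over- or undercounts.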


[jira] [Created] (ARROW-6987) [CI] Travis OSX failing to install sdk headers

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6987:
-

 Summary: [CI] Travis OSX failing to install sdk headers
 Key: ARROW-6987
 URL: https://issues.apache.org/jira/browse/ARROW-6987
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


{code:java}
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
installer: Package name is macOS_SDK_headers_for_macOS_10.14
installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override.
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during .
{code}
See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345]





[jira] [Created] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6988:
-

 Summary: [CI][R] Buildbot's R Conda is failing
 Key: ARROW-6988
 URL: https://issues.apache.org/jira/browse/ARROW-6988
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:java}
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled && 
!is.null(test)) {skip_empty()}}, expectation = handle_expectation, 
skip = handle_skip, warning = handle_warning, message = handle_message, 
error = handle_error), error = handle_fatal, skip = function(e) {})
  26: test_code(NULL, exprs, env)
  27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
  28: force(code)
  29: with_reporter(reporter = reporter, start_end_reporter = 
start_end_reporter, {reporter$start_file(basename(path))
lister$start_file(basename(path))source_file(path, new.env(parent = 
env), chdir = TRUE, wrap = wrap)reporter$.end_context() 
   reporter$end_file()})
  30: FUN(X[[i]], ...)
  31: lapply(paths, test_file, env = env, reporter = current_reporter, 
start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
  32: force(code)
  33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE,  
   load_helpers = FALSE, wrap = wrap))
  34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
  35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
stop_on_warning, wrap = wrap)
  36: test_package_dir(package = package, test_path = test_path, filter = 
filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
stop_on_warning = stop_on_warning, wrap = wrap)
  37: test_check("arrow")
  An irrecoverable exception occurred. R is aborting now ...
  Segmentation fault (core dumped)
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: 1 ERROR, 1 WARNING, 2 NOTEs
See
  ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
for details.
 {code}
[https://ci.ursalabs.org/#/builders/95/builds/2386] 
[https://ci.ursalabs.org/#/builders/95]





[jira] [Created] (ARROW-6989) [Python][C++] Assert is triggered when decimal type inference occurs on a value with out of range precision

2019-10-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6989:
--

 Summary: [Python][C++] Assert is triggered when decimal type 
inference occurs on a value with out of range precision
 Key: ARROW-6989
 URL: https://issues.apache.org/jira/browse/ARROW-6989
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield


Example:
pa.array([decimal.Decimal(123.234)])
 
The problem is that inference.cc calls the direct constructor for decimal types 
instead of using Make.
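
A minimal sketch of why this example ends up with an out-of-range precision, using only the standard library. It assumes the 38-digit limit of a 128-bit decimal. `precision_of` is a hypothetical helper, not a pyarrow function, and it counts coefficient digits only, which may differ from how the inference code computes precision.

```python
from decimal import Decimal

# Assumption: 38 is the largest precision a 128-bit decimal can hold.
MAX_DECIMAL128_PRECISION = 38


def precision_of(value):
    """Return the number of coefficient digits stored in a Decimal."""
    return len(value.as_tuple().digits)


# Built from a float: carries the float's full binary expansion.
from_float = Decimal(123.234)
# Built from text: carries exactly the digits that were written.
from_text = Decimal("123.234")

print(precision_of(from_float) > MAX_DECIMAL128_PRECISION)  # True
print(precision_of(from_text))  # 6
```

Constructing the value from a string keeps the precision in range, so the assert only shows up for inputs like the float-constructed one above.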





[jira] [Created] (ARROW-6990) [C++] Support casting between decimal types with compatible precision/scales

2019-10-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6990:
--

 Summary: [C++] Support casting between decimal types with 
compatible precision/scales
 Key: ARROW-6990
 URL: https://issues.apache.org/jira/browse/ARROW-6990
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


This seems like a reasonable thing to support; it came up as a question on 
the user mailing list (via some Python code).
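
The issue does not define "compatible". One plausible reading, sketched below as an assumption rather than the intended design, is that a cast is safe when the target type keeps at least as many fractional digits and at least as many integer digits as the source. The function name `cast_is_lossless` is hypothetical.

```python
def cast_is_lossless(from_precision, from_scale, to_precision, to_scale):
    """Assumed rule: the target must hold every digit the source can hold."""
    # Fractional digits: the target scale must not be smaller.
    keeps_fraction = to_scale >= from_scale
    # Integer digits: precision minus scale must not shrink.
    keeps_integer = (to_precision - to_scale) >= (from_precision - from_scale)
    return keeps_fraction and keeps_integer


print(cast_is_lossless(10, 2, 12, 4))  # True
print(cast_is_lossless(10, 2, 10, 4))  # False
```

In the second call the target has room for only six integer digits where the source allows eight. Under this reading, such a cast would need either a per-value overflow check or to be rejected.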





[jira] [Created] (ARROW-6991) [Packaging][deb] Add support for Ubuntu 19.10

2019-10-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6991:
---

 Summary: [Packaging][deb] Add support for Ubuntu 19.10
 Key: ARROW-6991
 URL: https://issues.apache.org/jira/browse/ARROW-6991
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou





