[jira] [Created] (ARROW-3994) [C++] Remove ARROW_GANDIVA_BUILD_TESTS option

2018-12-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3994:
---

 Summary: [C++] Remove ARROW_GANDIVA_BUILD_TESTS option
 Key: ARROW-3994
 URL: https://issues.apache.org/jira/browse/ARROW-3994
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Gandiva
Reporter: Wes McKinney
 Fix For: 0.12.0


This option is not needed now that both the libraries and tests are tied to the 
same "gandiva" build target and label, so {{ninja gandiva && ctest -L gandiva}} 
will build and run only the relevant targets.

Follow-up to ARROW-3988.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3993) [JS] CI Jobs Failing

2018-12-10 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3993:


 Summary: [JS] CI Jobs Failing
 Key: ARROW-3993
 URL: https://issues.apache.org/jira/browse/ARROW-3993
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: JS-0.3.1
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: JS-0.4.0


JS jobs are failing with:
npm ERR! code ETARGET
npm ERR! notarget No matching version found for gulp@next
npm ERR! notarget In most cases you or one of your dependencies are requesting
npm ERR! notarget a package version that doesn't exist.
npm ERR! notarget 
npm ERR! notarget It was specified as a dependency of 'apache-arrow'
npm ERR! notarget 
npm ERR! A complete log of this run can be found in:
npm ERR! /home/travis/.npm/_logs/2018-12-10T22_33_26_272Z-debug.log
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh" failed and exited 
with 1.

Reported by [~wesmckinn] in 
https://github.com/apache/arrow/pull/3152#issuecomment-446020105



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3992) pyarrow compile from source issues on RedHat 7.4

2018-12-10 Thread David Lee (JIRA)
David Lee created ARROW-3992:


 Summary: pyarrow compile from source issues on RedHat 7.4
 Key: ARROW-3992
 URL: https://issues.apache.org/jira/browse/ARROW-3992
 Project: Apache Arrow
  Issue Type: Bug
Reporter: David Lee


Opening a ticket for: [https://github.com/apache/arrow/issues/2281] after 
running into the same problems with RedHat 7.4.

[https://arrow.apache.org/docs/python/development.html#development]

Additional steps taken:

Added double-conversion, glog and hypothesis:

{code:java}
conda create -y -q -n pyarrow-dev \
python=3.6 numpy six setuptools cython pandas pytest double-conversion \
cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib glog hypothesis \
gflags brotli jemalloc lz4-c zstd -c conda-forge
{code}
Added export LD_LIBRARY_PATH pointing to the conda environment's lib64 before 
running py.test pyarrow:

{code:java}
export LD_LIBRARY_PATH=/home/my_login/anaconda3/envs/pyarrow-dev/lib64
py.test pyarrow
{code}

Added extra symlinks with a period at the end to fix string concatenation 
issues. Running setup.py for the first time didn't need this, but running 
setup.py a second time would error out with:
{code:java}
CMake Error: File /home/my_login/anaconda3/envs/pyarrow-dev/lib64/libarrow.so. 
does not exist.
{code}
There is an extra period at the end of the expected {{*.so}} file names, so I 
had to make symlinks with the extra period:
{code:java}
ln -s libparquet.so.12.0.0 libparquet.so.
ln -s libplasma.so.12.0.0 libplasma.so.
ln -s libarrow.so.12.0.0 libarrow.so.
ln -s libarrow_python.so.12.0.0 libarrow_python.so.
{code}
Creating a wheel file using --with-plasma gives the following error:
{code:java}
error: [Errno 2] No such file or directory: 'release/plasma_store_server'
{code}
I had to create the wheel file without plasma.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3991) [gandiva] floating point division shouldn't cause errors

2018-12-10 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-3991:
-

 Summary: [gandiva] floating point division shouldn't cause errors
 Key: ARROW-3991
 URL: https://issues.apache.org/jira/browse/ARROW-3991
 Project: Apache Arrow
  Issue Type: Bug
  Components: Gandiva
Reporter: Pindikura Ravindra
Assignee: Pindikura Ravindra


For division, Gandiva explicitly checks whether the divisor is zero and raises 
an error.

This is correct for integer division. For floating point division, it should 
just return infinity.

https://www.gnu.org/software/libc/manual/html_node/Infinity-and-NaN.html#Infinity-and-NaN
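The requested behavior matches IEEE 754 semantics, which numpy makes easy to demonstrate (a sketch for illustration only, not Gandiva code):

```python
import numpy as np

# IEEE 754: float division by zero yields signed infinity (and 0/0 yields NaN),
# whereas integer division by zero has no representable result and is an error.
with np.errstate(divide="ignore", invalid="ignore"):
    num = np.array([1.0, -1.0, 0.0])
    den = np.zeros(3)
    res = num / den
print(res)  # inf, -inf, nan
```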



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3990) [Python] developer documentation is missing double-conversion dep

2018-12-10 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-3990:
-

 Summary: [Python] developer documentation is missing 
double-conversion dep
 Key: ARROW-3990
 URL: https://issues.apache.org/jira/browse/ARROW-3990
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Thread-safety of buffer allocators in Java

2018-12-10 Thread Li Jin
Thanks Jacques!

On Sun, Dec 9, 2018 at 6:47 AM Jacques Nadeau  wrote:

> Yes, that is true. Allocators, buffers and pretty much everything in between
> is thread safe. Locking around the memory backing a buffer itself needs to
> be managed in the app. You can review the doc here which talks more about
> the structure of the allocators:
>
>
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/README.md
>
>
> On Thu, Dec 6, 2018 at 10:08 PM Li Jin  wrote:
>
> > Hey folks,
> >
> > I am studying a bit of the buffer allocators code in the Java library. From
> > my understanding of the code, the buffer allocators are thread safe, i.e.,
> > you can use the same buffer allocator from multiple threads, as well as
> > create child allocators from the same allocator and use them in
> > different threads.
> >
> > Is my understanding correct? I also get this question from other users, so I
> > think it will be helpful to put that in the doc as well.
> >
> > Thanks,
> > Li
> >
>


[jira] [Created] (ARROW-3989) [RUST] CSV Reader Should Handle Case Sensitivity for Boolean Values

2018-12-10 Thread nevi_me (JIRA)
nevi_me created ARROW-3989:
--

 Summary: [RUST] CSV Reader Should Handle Case Sensitivity for 
Boolean Values
 Key: ARROW-3989
 URL: https://issues.apache.org/jira/browse/ARROW-3989
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.11.1
Reporter: nevi_me


Excel saves booleans in CSV in upper case, Pandas uses Proper case.

Our CSV reader doesn't recognise these variants (True, False, TRUE, FALSE). I 
noticed this while making boolean schema inference case-insensitive.

I would propose that we convert Boolean strings to lower-case before casting 
them to Rust's bool type. [~andygrove], what do you think?
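The proposed normalisation could look roughly like this (sketched in Python for brevity; the actual change would live in the Rust CSV reader, and the function name is illustrative):

```python
def parse_bool(token: str):
    """Case-insensitive boolean parse: lower-case the token before matching."""
    t = token.strip().lower()
    if t == "true":
        return True
    if t == "false":
        return False
    return None  # not a recognisable boolean

print([parse_bool(s) for s in ["True", "FALSE", "true", "yes"]])
# [True, False, True, None]
```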



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3988) [C++] Do not build unit tests by default in build system

2018-12-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3988:
---

 Summary: [C++] Do not build unit tests by default in build system
 Key: ARROW-3988
 URL: https://issues.apache.org/jira/browse/ARROW-3988
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


This is partly an RFC: given that many people are building the C++ library in 
order to develop or test other parts of Apache Arrow, it might make sense to 
opt in to building the unit tests rather than opting out. There are additional 
system requirements to build the unit tests, such as a newer version of CMake.

[~kou] suggested this originally here

https://github.com/apache/arrow/issues/3140#issuecomment-445694144

In principle, I'm in favor of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Analytics in Rust [was Re: Timeline for Arrow 0.12.0 release]

2018-12-10 Thread Wes McKinney
Changing the subject, as we've veered off topic.

On Mon, Dec 10, 2018 at 8:04 AM Andy Grove  wrote:
>
> Cool. I will continue to add primitive operations but I am now adding this
> in a separate source file to keep it separate from the core array code.
>
> I'm not sure how important it will be to support Rust data sources with
> Gandiva. I can see that each language should be able to construct the
> logical query plan to submit to Gandiva and let Gandiva handle execution.

Note: Gandiva isn't an execution engine. It generates compiled
function kernels given an expression tree. It depends on an execution
engine to invoke the kernels in a database runtime-type environment --
Dremio is doing so in production already IIUC.

It might be that Rust developers would choose someday to develop a
Rust-native query runtime, in which case the Gandiva JIT-compiling
could be used to generate custom kernels in a similar fashion to how
they're being used by Dremio in Java.

> I think the more interesting part is how do we support language-specific
> lambda functions as part of that logical query plan. Maybe it is possible
> to compile the lambda down to LLVM (I haven't started learning about LLVM
> in detail yet so this is wild speculation on my part).

Generally database systems define operator nodes for each type of
user-defined function, and the user code is invoked dynamically
similar to interpreted languages. Compiling to LLVM isn't possible in
generality.
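That interpreted-UDF approach can be sketched as follows (hypothetical names, not any particular engine's API):

```python
# An engine-side operator node that invokes user code dynamically per batch,
# rather than compiling the UDF into the query's generated code.
def udf_project(udf, batches):
    for batch in batches:
        yield [udf(v) for v in batch]

doubled = list(udf_project(lambda x: x * 2, [[1, 2], [3]]))
print(doubled)  # [[2, 4], [6]]
```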

> Another option is for Gandiva to support calling into shared libraries and 
> that maybe is
> simpler for languages that support building C-native shared libraries (Rust
> supports this with zero overhead).

These would be C UDFs. I'm familiar with Impala's UDF system, for example:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_udf.html

There you can declare a new function that is looked up in a shared
library using dlopen / dlsym
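The same dlopen / dlsym mechanism can be demonstrated from Python via ctypes (a sketch assuming a Unix-like system with libm available):

```python
import ctypes
import ctypes.util

# dlopen: load the shared library; dlsym: resolve the "cos" symbol by name.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0
```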

- Wes

>
> Andy.
>
>
>
>
> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney  wrote:
>
> > hi Andy,
> >
> > I can see an argument for having some basic native function kernel
> > support in Rust. One of the things that Gandiva has begun is a
> > Protobuf-based serialized representation of projection
> > and filter expressions. In the long run I would like to see a more
> > complete relational algebra / logical query plan that can be submitted
> > for execution. There are complexities, though, such as bridging
> > iteration of data sources written in Rust, say, with a query engine
> > written in C++. You would need to provide some kind of a callback
> > mechanism for the query engine to request the next chunk of a dataset
> > to be materialized.
> >
> > It will be interesting to see what contributors will be motivated
> > enough to build over the next few years. At the end of the day, Apache
> > projects are do-ocracies.
> >
> > - Wes
> > On Fri, Dec 7, 2018 at 6:22 AM Andy Grove  wrote:
> > >
> > > I've added one PR to the list (https://github.com/apache/arrow/pull/3119
> > )
> > > to update the project to use Rust 2018 Edition.
> > >
> > > I'm also considering removing one PR from the list and would like to get
> > > opinions here.
> > >
> > > I have a PR (https://github.com/apache/arrow/pull/3033) to add some
> > basic
> > > math and comparison operators to primitive arrays. These are baby steps
> > > towards implementing more query execution capabilities such as
> > projection,
> > > selection, etc but Chao made a good point that other Rust implementations
> > > don't have these kinds of capabilities and I am now wondering if this is a
> > > distraction. We already have Gandiva and the new efforts in Ursa Labs and
> > > it would probably make more sense to look at having Rust bindings for the
> > > query execution capabilities there rather than having a competing (and
> > less
> > > capable) implementation in Rust.
> > >
> > > Thoughts?
> > >
> > > Andy.
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 6, 2018 at 8:42 PM paddy horan 
> > wrote:
> > >
> > > > Other than Andy’s PR below I’m going to try and find time to work on
> > > > ARROW-3827, I’ll bump it to 0.13 if I can’t find the time early next week.
> > > > There is nothing else in the 0.12 backlog for Rust.  It would be nice
> > to
> > > > get the parquet merge in though.
> > > >
> > > >
> > > >
> > > > Paddy
> > > >
> > > >
> > > >
> > > > 
> > > > From: Andy Grove 
> > > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: Timeline for Arrow 0.12.0 release
> > > >
> > > > I have PRs pending for all the Rust issues that I want to get into
> > 0.12.0
> > > > and would appreciate some reviews so I can go ahead and merge:
> > > >
> > > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and
> > > > ARROW-3881
> > > > - add math and comparison operations to primitive arrays)
> > > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release
> > > > pro

Re: valid NaNs versus invalid NaNs?

2018-12-10 Thread Donald Foss
Alternately Rhys, what Wes said. :)

Donald E. Foss | @DonaldFoss 
Never Stop Learning!
-- __o
_`\<,_
---(_)/ (_)

> On Dec 10, 2018, at 11:23 AM, Donald Foss  wrote:
> 
> +1 on NaNs being an interop nightmare already, especially for those who work 
> with multiple coding languages at the same time.
> 
> Issues regarding NaNs may be found at 
> https://issues.apache.org/jira/browse/ARROW-2806?jql=text%20~%20%22NaN%22.
> The last issue I see was from July 2018, with Python, and marked resolved 17 
> July 2018. The description may be helpful.
> 
> Regards,
> 
> Donald E. Foss | @DonaldFoss 
> Never Stop Learning!
> -- __o
> _`\<,_
> ---(_)/ (_)
> 
>> On Dec 10, 2018, at 10:47 AM, Rhys Ulerich wrote:
>> 
>> 'Morning,
>> 
>> 
>> 
>> Regarding https://arrow.apache.org/docs/memory_layout.html, how should 
>> is_valid be interpreted for primitive types that have their own notions of 
>> is_valid?
>> 
>> 
>> 
>> Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
>> NaN) versus an "invalid NaN" (is valid 0 with float NaN)?  In RFC-ese, MUST 
>> individual NaNs be valid?  Or, MUST floats all be valid by omitting the 
>> validity bitset?
>> 
>> 
>> 
>> I ask because otherwise I can see a bunch of different systems interpreting 
>> this detail in many different ways.  That'd be an interop nightmare.  
>> Especially since understanding why NaNs sneak into large datasets is already 
>> quite a hassle.
>> 
>> 
>> 
>> Anyhow, it seems worth addressing this gap at the written specification 
>> level.
>> 
>> 
>> 
>> (Apologies if this has been discussed previously-- I've found no searchable 
>> mailing list archives under http://mail-archives.apache.org/mod_mbox/arrow-dev/ 
>> or https://cwiki.apache.org/confluence/display/ARROW.)
>> 
>> 
>> 
>> Thanks,
>> 
>> Rhys
> 



Re: valid NaNs versus invalid NaNs?

2018-12-10 Thread Donald Foss
+1 on NaNs being an interop nightmare already, especially for those who work 
with multiple coding languages at the same time.

Issues regarding NaNs may be found at 
https://issues.apache.org/jira/browse/ARROW-2806?jql=text%20~%20%22NaN%22.
The last issue I see was from July 2018, with Python, and marked resolved 17 
July 2018. The description may be helpful.

Regards,

Donald E. Foss | @DonaldFoss 
Never Stop Learning!
-- __o
_`\<,_
---(_)/ (_)

> On Dec 10, 2018, at 10:47 AM, Rhys Ulerich  wrote:
> 
> 'Morning,
> 
> 
> 
> Regarding https://arrow.apache.org/docs/memory_layout.html, how should 
> is_valid be interpreted for primitive types that have their own notions of 
> is_valid?
> 
> 
> 
> Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
> NaN) versus an "invalid NaN" (is valid 0 with float NaN)?  In RFC-ese, MUST 
> individual NaNs be valid?  Or, MUST floats all be valid by omitting the 
> validity bitset?
> 
> 
> 
> I ask because otherwise I can see a bunch of different systems interpreting 
> this detail in many different ways.  That'd be an interop nightmare.  
> Especially since understanding why NaNs sneak into large datasets is already 
> quite a hassle.
> 
> 
> 
> Anyhow, it seems worth addressing this gap at the written specification level.
> 
> 
> 
> (Apologies if this has been discussed previously-- I've found no searchable 
> mailing list archives under 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/ or 
> https://cwiki.apache.org/confluence/display/ARROW.)
> 
> 
> 
> Thanks,
> 
> Rhys



RE: valid NaNs versus invalid NaNs?

2018-12-10 Thread Rhys Ulerich
>> Anyhow, it seems worth addressing this gap at the written specification 
>> level.
> What would you suggest? We could add a statement to be explicit that no 
> special / sentinel values (which includes NaN) are recognized as null.

I like your suggestion, Wes.  Please consider making that amendment (or similar) 
in the next specification update.

Cheers,
Rhys


Re: valid NaNs versus invalid NaNs?

2018-12-10 Thread Wes McKinney
hi Rhys,

On Mon, Dec 10, 2018 at 9:53 AM Rhys Ulerich  wrote:
>
> 'Morning,
>
>
>
> Regarding https://arrow.apache.org/docs/memory_layout.html, how should 
> is_valid be interpreted for primitive types that have their own notions of 
> is_valid?
>
>
>
> Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
> NaN) versus an "invalid NaN" (is valid 0 with float NaN)?  In RFC-ese, MUST 
> individual NaNs be valid?  Or, MUST floats all be valid by omitting the 
> validity bitset?
>

In floating point types, NaN is a valid value. I think you're talking
about systems that use sentinel values to represent nulls. The Arrow
columnar format does not have any notion of sentinel values. So if you
want other Arrow systems to recognize your values as being null, then
you must construct the validity bitmap accordingly.
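That separation between values and the validity bitmap can be sketched in plain
Python (validity bits are least-significant-bit first, per the Arrow format; the
helper below is illustrative, not pyarrow code):

```python
import math

def is_valid(bitmap: bytes, i: int) -> bool:
    # Arrow validity bitmaps are LSB-ordered: bit i lives in byte i // 8.
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1

values = [1.0, math.nan, math.nan]
bitmap = bytes([0b00000011])  # slots 0 and 1 valid, slot 2 null
# Slot 1 is a *valid NaN* value; slot 2 is null regardless of its NaN payload.
print([is_valid(bitmap, i) for i in range(3)])  # [True, True, False]
```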

>
>
> I ask because otherwise I can see a bunch of different systems interpreting 
> this detail in many different ways.  That'd be an interop nightmare.  
> Especially since understanding why NaNs sneak into large datasets is already 
> quite a hassle.
>

It is up to applications to determine what NaN means. It would not be
appropriate for Arrow to assume anything, particularly since most
database systems (AFAIK) distinguish NaN and NULL.

For example, in Python interop, we recognize NaN as null when
converting to Arrow, but _only_ if the data originated from pandas:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/type_traits.h#L102

In [1]: import pyarrow as pa

In [2]: import numpy as np

In [3]: arr = np.array([1, np.nan])

In [4]: arr1 = pa.array(arr)

In [5]: arr2 = pa.array(arr, from_pandas=True)

In [6]: arr1
Out[6]:

[
  1,
  nan
]

In [7]: arr2
Out[7]:

[
  1,
  null
]

In [8]: arr1.null_count
Out[8]: 0

In [9]: arr2.null_count
Out[9]: 1

In R, NaN and NA are distinct:

https://github.com/apache/arrow/commit/3ab4a0f481211c5d115845519eb9398dc02e2e24#diff-4b43b0aee35624cd95b910189b3dc231

>
>
> Anyhow, it seems worth addressing this gap at the written specification level.
>

What would you suggest? We could add a statement to be explicit that
no special / sentinel values (which includes NaN) are recognized as
null.

- Wes

>
>
> (Apologies if this has been discussed previously-- I've found no searchable 
> mailing list archives under 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/ or 
> https://cwiki.apache.org/confluence/display/ARROW.)
>
>
>
> Thanks,
>
> Rhys


valid NaNs versus invalid NaNs?

2018-12-10 Thread Rhys Ulerich
'Morning,



Regarding https://arrow.apache.org/docs/memory_layout.html, how should is_valid 
be interpreted for primitive types that have their own notions of is_valid?



Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
NaN) versus an "invalid NaN" (is valid 0 with float NaN)?  In RFC-ese, MUST 
individual NaNs be valid?  Or, MUST floats all be valid by omitting the 
validity bitset?



I ask because otherwise I can see a bunch of different systems interpreting 
this detail in many different ways.  That'd be an interop nightmare.  
Especially since understanding why NaNs sneak into large datasets is already 
quite a hassle.



Anyhow, it seems worth addressing this gap at the written specification level.



(Apologies if this has been discussed previously-- I've found no searchable 
mailing list archives under http://mail-archives.apache.org/mod_mbox/arrow-dev/ 
or https://cwiki.apache.org/confluence/display/ARROW.)



Thanks,

Rhys


[jira] [Created] (ARROW-3987) Compact ArrowBuf to reduce its heap footprint

2018-12-10 Thread shyam narayan singh (JIRA)
shyam narayan singh created ARROW-3987:
--

 Summary: Compact ArrowBuf to reduce its heap footprint
 Key: ARROW-3987
 URL: https://issues.apache.org/jira/browse/ARROW-3987
 Project: Apache Arrow
  Issue Type: Task
Reporter: shyam narayan singh


This is to record numbers for a test done internally to compact the ArrowBuf 
heap footprint by moving variables (udle, refcnt, isempty) into the buffer 
ledger. It also moves the debug fields in the ledger (buffers, historical log) 
to a different class.

Running a test that does 2 allocations of arrow bufs, together with Ravindra's 
ongoing fix that does the slicing of arrow bufs, below are the results 
(without/with the above fix).

Without the above fix:
       Total Bytes: 13,028,109
       Total Classes: 2,367
       Total Instances: 193,471
       Classloaders: 77
       GC Roots: 1,641
       Number of Objects Pending for Finalization: 0

With the above fix:
       Total Bytes: 12,700,115
       Total Classes: 2,373
       Total Instances: 193,635
       Classloaders: 82
       GC Roots: 1,642
       Number of Objects Pending for Finalization: 0

ArrowBuf size decreased from 109 bytes to 88 bytes and BufferLedger size 
increased from 80 bytes to 89 bytes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Timeline for Arrow 0.12.0 release

2018-12-10 Thread Andy Grove
Cool. I will continue to add primitive operations but I am now adding this
in a separate source file to keep it separate from the core array code.

I'm not sure how important it will be to support Rust data sources with
Gandiva. I can see that each language should be able to construct the
logical query plan to submit to Gandiva and let Gandiva handle execution. I
think the more interesting part is how do we support language-specific
lambda functions as part of that logical query plan. Maybe it is possible
to compile the lambda down to LLVM (I haven't started learning about LLVM
in detail yet so this is wild speculation on my part). Another option is
for Gandiva to support calling into shared libraries and that maybe is
simpler for languages that support building C-native shared libraries (Rust
supports this with zero overhead).

Andy.




On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney  wrote:

> hi Andy,
>
> I can see an argument for having some basic native function kernel
> support in Rust. One of the things that Gandiva has begun is a
> Protobuf-based serialized representation of projection
> and filter expressions. In the long run I would like to see a more
> complete relational algebra / logical query plan that can be submitted
> for execution. There are complexities, though, such as bridging
> iteration of data sources written in Rust, say, with a query engine
> written in C++. You would need to provide some kind of a callback
> mechanism for the query engine to request the next chunk of a dataset
> to be materialized.
>
> It will be interesting to see what contributors will be motivated
> enough to build over the next few years. At the end of the day, Apache
> projects are do-ocracies.
>
> - Wes
> On Fri, Dec 7, 2018 at 6:22 AM Andy Grove  wrote:
> >
> > I've added one PR to the list (https://github.com/apache/arrow/pull/3119
> )
> > to update the project to use Rust 2018 Edition.
> >
> > I'm also considering removing one PR from the list and would like to get
> > opinions here.
> >
> > I have a PR (https://github.com/apache/arrow/pull/3033) to add some
> basic
> > math and comparison operators to primitive arrays. These are baby steps
> > towards implementing more query execution capabilities such as
> projection,
> > selection, etc but Chao made a good point that other Rust implementations
> > don't have these kinds of capabilities and I am now wondering if this is a
> > distraction. We already have Gandiva and the new efforts in Ursa Labs and
> > it would probably make more sense to look at having Rust bindings for the
> > query execution capabilities there rather than having a competing (and
> less
> > capable) implementation in Rust.
> >
> > Thoughts?
> >
> > Andy.
> >
> >
> >
> >
> >
> > On Thu, Dec 6, 2018 at 8:42 PM paddy horan 
> wrote:
> >
> > > Other than Andy’s PR below I’m going to try and find time to work on
> > > ARROW-3827, I’ll bump it to 0.13 if I can’t find the time early next week.
> > > There is nothing else in the 0.12 backlog for Rust.  It would be nice
> to
> > > get the parquet merge in though.
> > >
> > >
> > >
> > > Paddy
> > >
> > >
> > >
> > > 
> > > From: Andy Grove 
> > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > To: dev@arrow.apache.org
> > > Subject: Re: Timeline for Arrow 0.12.0 release
> > >
> > > I have PRs pending for all the Rust issues that I want to get into
> 0.12.0
> > > and would appreciate some reviews so I can go ahead and merge:
> > >
> > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and
> > > ARROW-3881
> > > - add math and comparison operations to primitive arrays)
> > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release
> > > process)
> > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > >
> > > With these in place I plan on writing a tutorial for reading a CSV
> file,
> > > performing some operations on primitive arrays and writing the output
> to a
> > > new CSV file.
> > >
> > > I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Tue, Dec 4, 2018 at 7:57 PM Andy Grove 
> wrote:
> > >
> > > > I'd love to tackle the three related issues for supporting simple
> > > > math/comparison operations on primitive arrays and casting primitive
> > > arrays
> > > > but since the change to use Rust specialization feature I'm a bit
> stuck
> > > and
> > > > need some assistance applying the math operations to the numeric
> types
> > > and
> > > > not the boolean primitives. I have added a comment to
> > > > https://github.com/apache/arrow/pull/3033 ... if I can get help
> solving
> > > > for this PR then I should be able to handle the others. I'll also do
> some
> > > > research and try and figure this out myself.
> > > >
> > > > Andy.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney 
> wrote:
> > > >
> > > >> Andy, Paddy, or other Rust deve

[jira] [Created] (ARROW-3986) [C++] Write prose documentation

2018-12-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3986:
-

 Summary: [C++] Write prose documentation
 Key: ARROW-3986
 URL: https://issues.apache.org/jira/browse/ARROW-3986
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Affects Versions: 0.11.1
Reporter: Antoine Pitrou


Now that the C++ docs are in the Sphinx doctree, we should write more 
comprehensive prose documentation for C++ programmers.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3984) [C++] Exit with error if user hits zstd ExternalProject path

2018-12-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3984:
---

 Summary: [C++] Exit with error if user hits zstd ExternalProject 
path
 Key: ARROW-3984
 URL: https://issues.apache.org/jira/browse/ARROW-3984
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


We should check the CMake version and exit with a more informative error if 
{{ARROW_WITH_ZSTD}} is on but the CMake version is too old.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3985) [C++] Pass -C option when compiling with ccache to avoid some warnings

2018-12-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3985:
---

 Summary: [C++] Pass -C option when compiling with ccache to avoid 
some warnings
 Key: ARROW-3985
 URL: https://issues.apache.org/jira/browse/ARROW-3985
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.13.0


ccache by default will eat comments, some of which are used by gcc to disable 
certain warnings, like case statements that fall through.

see https://github.com/apache/arrow/issues/3004



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3983) [Gandiva][Crossbow] Use static boost while packaging

2018-12-10 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-3983:
--

 Summary: [Gandiva][Crossbow] Use static boost while packaging
 Key: ARROW-3983
 URL: https://issues.apache.org/jira/browse/ARROW-3983
 Project: Apache Arrow
  Issue Type: Task
Reporter: Praveen Kumar Desabandu
Assignee: Praveen Kumar Desabandu


Gandiva picks up some transitive dependencies on Boost from Arrow. Since we are 
using the static version of arrow in the packaged gandiva library, it was 
assumed that we would also be using the static versions of boost.

This holds true on Linux, where there is no dependency on the shared arrow 
library, but on macOS there seems to be a dependency on shared boost libraries 
even for the static arrow library.

So we set {{ARROW_BOOST_USE_SHARED=OFF}} to force use of the static boost 
libraries while packaging Gandiva in Crossbow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3982) [C++] Allow "binary" input in simple JSON format

2018-12-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3982:
-

 Summary: [C++] Allow "binary" input in simple JSON format
 Key: ARROW-3982
 URL: https://issues.apache.org/jira/browse/ARROW-3982
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.11.1
Reporter: Antoine Pitrou


See review comment at 
https://github.com/apache/arrow/pull/3084#discussion_r240049276



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3981) [C++] Rename json.h

2018-12-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3981:
-

 Summary: [C++] Rename json.h
 Key: ARROW-3981
 URL: https://issues.apache.org/jira/browse/ARROW-3981
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.11.1
Reporter: Antoine Pitrou


This JSON format is mostly used for integration testing; it's not meant for 
outside consumption. Perhaps rename the header to {{json-integration.h}}?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3980) [C++] Fix CRTP use in json-simple.cc

2018-12-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3980:
-

 Summary: [C++] Fix CRTP use in json-simple.cc
 Key: ARROW-3980
 URL: https://issues.apache.org/jira/browse/ARROW-3980
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.11.1
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See review comment at 
https://github.com/apache/arrow/pull/3084#discussion_r240049157



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3979) [Gandiva] fix all valgrind reported errors

2018-12-10 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-3979:
-

 Summary: [Gandiva] fix all valgrind reported errors
 Key: ARROW-3979
 URL: https://issues.apache.org/jira/browse/ARROW-3979
 Project: Apache Arrow
  Issue Type: Bug
  Components: Gandiva
Reporter: Pindikura Ravindra
Assignee: Pindikura Ravindra


Travis reports lots of valgrind errors when running gandiva tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)