[jira] [Created] (ARROW-7919) [R] install_arrow() should conda install if appropriate

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7919:
--

 Summary: [R] install_arrow() should conda install if appropriate
 Key: ARROW-7919
 URL: https://issues.apache.org/jira/browse/ARROW-7919
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Like, check {{if (grepl("conda", R.Version()$platform))}} and if so then 
{{system("conda install ...")}}. Error if nightly == TRUE because we don't host 
conda nightlies yet.

This would help with issues like https://github.com/apache/arrow/issues/6448
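As an illustrative sketch of the dispatch logic described above (in Python rather than R, and with hypothetical function and channel names), the idea is: detect a conda-built R from the platform string, dispatch to conda, and error out for nightlies:

```python
def choose_install_command(platform: str, nightly: bool = False) -> str:
    """Hypothetical sketch: detect a conda environment from the platform
    string and, if found, install libarrow via conda instead of building
    from source. Names and the conda channel are illustrative."""
    if "conda" in platform:
        if nightly:
            # No conda nightlies are hosted yet, so this must be an error.
            raise ValueError("nightly builds are not available via conda")
        return "conda install -c conda-forge arrow-cpp"
    return "build libarrow from source (or download a binary)"

# A conda-built R reports a platform string containing "conda":
print(choose_install_command("x86_64-conda_cos6-linux-gnu"))
```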



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Integration testing

2020-02-21 Thread Neal Richardson
Thanks, I've marked that as not required for 1.0 (and added a legend to the
bottom of the table). Anything else that needs to be added or reclassified?

Neal

On Fri, Feb 21, 2020 at 12:19 AM Antoine Pitrou  wrote:

>
> Hi,
>
> I don't think float16 support is required for 1.0.
> On the C++ side at least, it will require integrating a dedicated
> library (probably in other languages as well).
>
> Regards
>
> Antoine.
>
>
> Le 21/02/2020 à 00:33, Neal Richardson a écrit :
> > Hi all,
> > To help us reach 1.0 with as complete and thoroughly tested
> implementations
> > of the Arrow format, I've surveyed our integration test suite and open
> > issues and collected information here:
> >
> https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347
> >
> > I'll happily grant edit privileges on the doc to anyone who requests.
> >
> > This replaces the content on
> >
> https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone
> ,
> > which was a bit stale. I carried over some notes from there as
> appropriate,
> > but most were no longer accurate. Hopefully this new document helps us
> > revise our understanding of what is implemented and makes clear what's
> left
> > to do.
> >
> > Most of the outstanding issues (at least for C++ and Java) are already
> > ticketed in Jira and marked as blockers for 1.0, but let me know if you
> see
> > something missing.
> >
> > Neal
> >
>


[jira] [Created] (ARROW-7918) [R] Improve instructions for conda users in installation vignette

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7918:
--

 Summary: [R] Improve instructions for conda users in installation 
vignette 
 Key: ARROW-7918
 URL: https://issues.apache.org/jira/browse/ARROW-7918
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7917) [CMake] FindPythonInterp should check for python3

2020-02-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7917:
-

 Summary: [CMake] FindPythonInterp should check for python3
 Key: ARROW-7917
 URL: https://issues.apache.org/jira/browse/ARROW-7917
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.16.0
Reporter: Francois Saint-Jacques


On Ubuntu 18.04 it will pick python2 by default.





[jira] [Created] (ARROW-7916) [C++][Dataset] Project IPC record batches to materialized fields

2020-02-21 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7916:
---

 Summary: [C++][Dataset] Project IPC record batches to materialized 
fields
 Key: ARROW-7916
 URL: https://issues.apache.org/jira/browse/ARROW-7916
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Dataset
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


If batches mmapped from disk are projected before post-filtering, unreferenced 
columns will never be accessed (so the memory map shouldn't do I/O on them).

At the same time, it'd probably be wise to explicitly document that batches 
yielded directly from fragments rather than from a Scanner will not be filtered 
or projected (so they will not match the fragment's schema and will include 
columns referenced by the filter even if they were not projected).





[jira] [Created] (ARROW-7915) [CI] [Python] Run tests with Python development mode enabled

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7915:
-

 Summary: [CI] [Python] Run tests with Python development mode 
enabled
 Key: ARROW-7915
 URL: https://issues.apache.org/jira/browse/ARROW-7915
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Reporter: Antoine Pitrou


Python's "development mode" enables a few runtime checks and warnings; see the 
docs for "{{-X dev}}": https://docs.python.org/3/using/cmdline.html#id5
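For illustration, development mode can be enabled with {{-X dev}} (or the PYTHONDEVMODE environment variable) and detected at runtime via {{sys.flags.dev_mode}} (Python 3.7+), which is presumably how a CI job would verify it is active:

```python
import subprocess
import sys

# Run a child interpreter in development mode and confirm the flag is set.
# -X dev enables extra runtime checks (more warnings, faulthandler, etc.).
out = subprocess.run(
    [sys.executable, "-X", "dev", "-c",
     "import sys; print(sys.flags.dev_mode)"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # True
```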





[jira] [Created] (ARROW-7914) Allow pandas datetime as index for feather

2020-02-21 Thread Samuel Jones (Jira)
Samuel Jones created ARROW-7914:
---

 Summary: Allow pandas datetime as index for feather
 Key: ARROW-7914
 URL: https://issues.apache.org/jira/browse/ARROW-7914
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Affects Versions: 0.15.1
 Environment: Windows, python 3.6.7,
Reporter: Samuel Jones
 Attachments: PEC fine course 1 grid 199001.csv, PEC fine course 1 grid 
199001.feather

Sorry in advance if I mess anything up. This is my first issue.

I have hourly data for 3 years using a Pandas datetime as the index. Pandas 
allows me to load/save .csv with the following code (only one month with 2 
variables shown):
{code:python}
# Write data to .csv
jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)

# Load data from .csv
jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0,
parse_dates=True)
{code}
Using .csv works, but is slow when I get to the full dataset of 26k+ rows and 
21.6k+ columns (and more columns may be coming if I have to add lags to my 
data). So, a more efficient load/save routine is very desirable. I was excited 
when I found feather, but the lost index is a no-go for my use.

Thanks for your consideration.
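Until Feather preserves indexes, a common workaround (sketched here with pandas only; the column names are illustrative, not from the attached file) is to demote the DatetimeIndex to a regular column before writing and restore it after reading:

```python
import pandas as pd

# Hourly data with a DatetimeIndex, as in the report (names illustrative).
df = pd.DataFrame(
    {"var1": range(24)},
    index=pd.date_range("1990-01-01", periods=24, freq="h"),
)
df.index.name = "time"

# Before df.to_feather(path): move the index into an ordinary column...
flat = df.reset_index()
# ...and after pd.read_feather(path): restore it as the index.
restored = flat.set_index("time")

assert restored.index.equals(df.index)
```

This round-trips the index through an extra column at the cost of one column of storage.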





[jira] [Created] (ARROW-7913) [C++][Python][R] C++ implementation of C data protocol

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7913:
--

 Summary: [C++][Python][R] C++ implementation of C data protocol
 Key: ARROW-7913
 URL: https://issues.apache.org/jira/browse/ARROW-7913
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python, R
Affects Versions: 1.0.0
Reporter: Neal Richardson
Assignee: Antoine Pitrou


See ARROW-7912





[jira] [Created] (ARROW-7912) [Format] C data interface

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7912:
--

 Summary: [Format] C data interface
 Key: ARROW-7912
 URL: https://issues.apache.org/jira/browse/ARROW-7912
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Affects Versions: 1.0.0
Reporter: Neal Richardson
Assignee: Antoine Pitrou


Apache Arrow is designed to be a universal in-memory format for the 
representation
of tabular ("columnar") data. However, some projects may face a difficult
choice between either depending on a fast-evolving project such as the
Arrow C++ library, or having to reimplement adapters for data interchange,
which may require significant, redundant development effort.

The Arrow C data interface defines a very small, stable set of C definitions
that can be easily *copied* in any project's source code and used for columnar
data interchange in the Arrow format.  For non-C/C++ languages and runtimes,
it should be almost as easy to translate the C definitions into the
corresponding C FFI declarations.

Applications and libraries can therefore work with Arrow memory without
necessarily using Arrow libraries or reinventing the wheel. Developers can
choose between tight integration
with the Arrow *software project* (benefitting from the growing array of
facilities exposed by e.g. the C++ or Java implementations of Apache Arrow,
but with the cost of a dependency) or minimal integration with the Arrow
*format* only.







[jira] [Created] (ARROW-7911) [C++] Gandiva tests crash when compiled with clang

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7911:
-

 Summary: [C++] Gandiva tests crash when compiled with clang
 Key: ARROW-7911
 URL: https://issues.apache.org/jira/browse/ARROW-7911
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Antoine Pitrou


Recently, Gandiva tests have started to crash when compiled with clang 7.0:
{code}
clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
{code}

The same crashes occur with clang 9.0:
{code}
clang version 9.0.0-2~ubuntu18.04.2 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
{code}

Tests run fine with gcc 7.4.0, though:
{code}
gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
{code}





[jira] [Created] (ARROW-7910) [C++] Provide function to query page size portably

2020-02-21 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7910:
---

 Summary: [C++] Provide function to query page size portably
 Key: ARROW-7910
 URL: https://issues.apache.org/jira/browse/ARROW-7910
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
 Fix For: 1.0.0


Page size is a useful default buffer size for buffered readers. Where should 
this property be attached? MemoryManager/Device?
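For reference, Python's stdlib already exposes the page size portably; a C++ equivalent would presumably wrap sysconf(_SC_PAGESIZE) on POSIX and GetSystemInfo on Windows. This sketch only illustrates the property being requested:

```python
import mmap

# mmap.PAGESIZE is the OS-reported memory page size; a reasonable
# default buffer size for buffered readers.
page = mmap.PAGESIZE
print(page)  # e.g. 4096 on most x86-64 systems
```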





[jira] [Created] (ARROW-7909) Example URL in documentation does not resolve correctly

2020-02-21 Thread Taeke (Jira)
Taeke created ARROW-7909:


 Summary: Example URL in documentation does not resolve correctly
 Key: ARROW-7909
 URL: https://issues.apache.org/jira/browse/ARROW-7909
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
 Environment: Red Hat Enterprise Linux Server 7.6 (Maipo) 
Reporter: Taeke


On the installation page for Arrow:
https://arrow.apache.org/install/

it says for *CentOS 6 and 7*:

{code:sh}
sudo yum install -y https://apache.bintray.com/arrow/centos/$(cut -d: -f5 
/etc/system-release-cpe)/apache-arrow-release-latest.rpm
{code}

That results in an invalid URL. The download is at:
https://apache.bintray.com/arrow/centos/7/apache-arrow-release-latest.rpm

not:
https://apache.bintray.com/arrow/centos/7.6/apache-arrow-release-latest.rpm
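The root cause is that field 5 of the CPE string on RHEL/CentOS 7.6 is the full version ("7.6"), while the repository path uses only the major version. A hedged sketch of the mismatch in Python (the actual fix would adjust the shell one-liner, e.g. by also cutting on "."):

```python
# Example CPE line from /etc/system-release-cpe on RHEL 7.6 (illustrative).
cpe = "cpe:/o:redhat:enterprise_linux:7.6:GA:server"

full_version = cpe.split(":")[4]    # "7.6" -- what `cut -d: -f5` yields
major = full_version.split(".")[0]  # "7"   -- what the URL actually needs

url = (
    "https://apache.bintray.com/arrow/centos/"
    f"{major}/apache-arrow-release-latest.rpm"
)
print(url)
```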







[jira] [Created] (ARROW-7908) Can't install R-library arrow without setting LIBARROW_DOWNLOAD=true

2020-02-21 Thread Taeke (Jira)
Taeke created ARROW-7908:


 Summary: Can't install R-library arrow without setting 
LIBARROW_DOWNLOAD=true
 Key: ARROW-7908
 URL: https://issues.apache.org/jira/browse/ARROW-7908
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 0.16.0
 Environment: Operating System: Red Hat Enterprise Linux Server 7.6 
(Maipo) 
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Kernel: Linux 3.10.0-957.35.2.el7.x86_64
Architecture: x86-64  
Reporter: Taeke
 Fix For: 0.16.0


Hi,

Installing arrow in R does not work intuitively on our server.
{code:r}
install.packages("arrow")
{code}
results in an error:
{code:sh}
Installing package into '/home//R/x86_64-redhat-linux-gnu-library/3.6'
(as 'lib' is unspecified)

trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz'
Content type 'application/x-gzip' length 216119 bytes (211 KB)
==
downloaded 211 KB

* installing *source* package 'arrow' ...
** package 'arrow' successfully unpacked and MD5 sums checked
** using staged installation
PKG_CFLAGS=-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include
  -DARROW_R_WITH_ARROW
PKG_LIBS=-L/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/lib
 -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 
-lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static -lboost_filesystem 
-lboost_regex -lboost_system -ljemalloc_pic
** libs
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG 
-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include  
-DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include 
 -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 
-mtune=generic  -c array.cpp -o array.o
In file included from array.cpp:18:0:
./arrow_types.h:201:31: fatal error: arrow/dataset/api.h: No such file or 
directory
{code}
It appears that the C++ code is not built. With arrow 0.16.0.1 things do work, 
because it tries to build the C++ code from source. With arrow 0.16.0.2 that is 
no longer the case. I could finish the installation by setting the environment 
variable LIBARROW_DOWNLOAD to 'true':
{code:java}
export LIBARROW_DOWNLOAD=true
{code}
That, apparently, triggers the build from source. I would have expected that I 
would not need to set this variable explicitly.

I found that [between 
versions|https://github.com/apache/arrow/commit/660d0e7cbaa1cfb51498299d445636fdd6a58420], 
the default value of LIBARROW_DOWNLOAD has changed:
{code:sh}
- download_ok <- locally_installing && !env_is("LIBARROW_DOWNLOAD", "false")
+ download_ok <- env_is("LIBARROW_DOWNLOAD", "true")
{code}
In our environment that variable was _not_ set, so download_ok was 
(accidentally?) false, the libraries were not downloaded, and the installation 
failed with the error above.
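To make the behavior change concrete, here is the before/after predicate from that diff translated into an illustrative Python sketch (env_is and the variable names mirror the R code; this is not the actual implementation):

```python
import os

def env_is(name: str, value: str) -> bool:
    """Mirror of the R helper: does the env var equal the given value?"""
    return os.environ.get(name, "") == value

# Old behavior: download unless explicitly disabled (opt-out).
def download_ok_old(locally_installing: bool = True) -> bool:
    return locally_installing and not env_is("LIBARROW_DOWNLOAD", "false")

# New behavior: download only if explicitly enabled (opt-in).
# With the variable unset, download_ok is now False, as the reporter saw.
def download_ok_new() -> bool:
    return env_is("LIBARROW_DOWNLOAD", "true")

os.environ.pop("LIBARROW_DOWNLOAD", None)  # simulate an unset variable
print(download_ok_old(), download_ok_new())  # True False
```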

 

I can't quite figure out the logic behind all this, but it would be nice if 
we'd be able to install the package without first having to set 
LIBARROW_DOWNLOAD.

 

Thank you for looking into this!





[jira] [Created] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts

2020-02-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7907:


 Summary: [Python] Conversion to pandas of empty table with 
timestamp type aborts
 Key: ARROW-7907
 URL: https://issues.apache.org/jira/browse/ARROW-7907
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche
 Fix For: 0.16.1


Creating an empty table:

{code}
In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})

In [2]: table['a']
Out[2]:
[
  []
]

In [3]: table.to_pandas()
Out[3]:
Empty DataFrame
Columns: [a]
Index: []
{code}

The above works, but the ChunkedArray still has 1 empty chunk. When filtering 
data, you can actually end up with no chunks, and that fails:


{code}
In [4]: table2 = table.slice(0, 0)

In [5]: table2['a']
Out[5]:
[

]

In [6]: table2.to_pandas()
../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct
ChunkedArray from empty vector and omitted type
...
Aborted (core dumped)
{code}

This seems to happen specifically for the timestamp type, and specifically with 
a non-ns unit (e.g. us as above, which is the default in Arrow).

I noticed this when reading a parquet file of the taxi dataset, where the 
filter I used resulted in an empty batch.





[jira] [Created] (ARROW-7906) Full functionality for ORC format

2020-02-21 Thread HAOFENG DENG (Jira)
HAOFENG DENG created ARROW-7906:
---

 Summary: Full functionality for ORC format
 Key: ARROW-7906
 URL: https://issues.apache.org/jira/browse/ARROW-7906
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Reporter: HAOFENG DENG


Just like the Parquet format, ORC has a big fan base in the big data area; it 
has better performance than Parquet in some use cases.

But there is a problem in Python: it does not have a standard write function.

It seems the ORC team itself maintains the standard C++ code 
([ORC-C++|https://github.com/apache/orc/tree/master/c%2B%2B]), so I think it 
will not take too much effort to integrate it into Arrow (C++) and build the 
hook for Python.






[jira] [Created] (ARROW-7905) [Go] Port the C++ Parquet implementation to Go

2020-02-21 Thread Nick Poorman (Jira)
Nick Poorman created ARROW-7905:
---

 Summary: [Go] Port the C++ Parquet implementation to Go
 Key: ARROW-7905
 URL: https://issues.apache.org/jira/browse/ARROW-7905
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Nick Poorman


I’m currently in the process of porting the C++ version of Parquet in the 
Apache Arrow project to Golang. Many projects and companies have been and are 
building their data lakes and persistence layer using Parquet. Apache Spark 
uses it heavily for persistence (including Databricks DeltaLake).

To me this is the missing component for people to truly begin using the Go 
implementation of Arrow with any existing data architectures.

If you have any interest in this project, give this post a like / bookmark it 
as it will keep me motivated to finish the port. Also, if you have specific use 
cases feel free to drop them in here so I can keep them in mind as I continue 
with the port.

Things with the code base are rather in flux at the moment as I figure out how 
to solve various nuances between the features of C++ and Go. As soon as I have 
a solid chunk of the port working, I’ll create a PR in the Apache Arrow project 
on Github and let everyone know in here.





[jira] [Created] (ARROW-7904) [C++] Decide about Field/Schema metadata printing parameters and how much to show by default

2020-02-21 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7904:
---

 Summary: [C++] Decide about Field/Schema metadata printing 
parameters and how much to show by default
 Key: ARROW-7904
 URL: https://issues.apache.org/jira/browse/ARROW-7904
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


See https://github.com/apache/arrow/pull/6472 for follow-up discussion of 
ARROW-7063





[NIGHTLY] Arrow Build Report for Job nightly-2020-02-21-0

2020-02-21 Thread Crossbow


Arrow Build Report for Job nightly-2020-02-21-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0

Failed Tasks:
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-7
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-gandiva-jar-trusty
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-macos-r-autobrew
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-2.7
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-turbodbc-master
- wheel-manylinux2014-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-wheel-manylinux2014-cp37m
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp35m
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp37m
- wheel-osx-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp38

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-6
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-8
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-debian-buster
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-homebrew-cpp
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-cpp
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-spark-master:
  URL: 

[RESULT] [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-21 Thread Antoine Pitrou


Hello,

The vote succeeds with 3 +1 (binding) and 2 +1 (non-binding).

I'll soon open a JIRA for the specification and the C++ implementation,
so that we can merge them in a timely fashion.

Regards

Antoine.



On Tue, 11 Feb 2020 20:06:33 +0100
Antoine Pitrou  wrote:
> Hello,
> 
> We have been discussing the creation of a minimalist C-based data
> interface for applications to exchange Arrow columnar data structures
> with each other. Some notable features of this interface include:
> 
> * A small amount of header-only C code can be copied independently into
> third-party libraries and downstream applications, no dependencies are
> needed even on Arrow C++ itself (notably, it is not required to use
> Flatbuffers, though there are trade-offs resulting from this).
> 
> * Low development investment (in other words: limited-scope use cases
> can be accomplished with little code), so as to enable C or C++
> libraries to export Arrow columnar data with minimal code.
> 
> * Data lifetime management hooks so as to properly handle non-trivial
> data sharing (for example passing Arrow columnar data to an async
> processing consumer).
> 
> This "C Data Interface" serves different use cases from the
> language-independent IPC protocol and trades away a number of features
> in the interest of minimalism / simplicity. It is not a replacement for
> the IPC protocol and will only be used to interchange in-process data at
> C or C++ call sites.
> 
> The PR providing the specification is here:
> https://github.com/apache/arrow/pull/5442
> 
> In particular, you can read the spec document here:
> https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst
> 
> A fairly comprehensive C++ implementation of this demonstrating its
> use is found here:
> https://github.com/apache/arrow/pull/5608
> 
> (note that other applications implementing the interface may choose to
> only support a few features and thus have far less code to write)
> 
> Please vote to adopt the SPECIFICATION (GitHub PR #5442).
> 
> This vote will be open for at least 72 hours
> 
> [ ] +1 Adopt C Data Interface specification
> [ ] +0
> [ ] -1 Do not adopt because...
> 
> Thank you
> 
> Regards
> 
> Antoine.
> 
> 
> (PS: yes, this is in large part a copy/paste of Wes's previous vote
> email :-))
> 





Re: Integration testing

2020-02-21 Thread Antoine Pitrou


Hi,

I don't think float16 support is required for 1.0.
On the C++ side at least, it will require integrating a dedicated
library (probably in other languages as well).

Regards

Antoine.


Le 21/02/2020 à 00:33, Neal Richardson a écrit :
> Hi all,
> To help us reach 1.0 with as complete and thoroughly tested implementations
> of the Arrow format, I've surveyed our integration test suite and open
> issues and collected information here:
> https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347
> 
> I'll happily grant edit privileges on the doc to anyone who requests.
> 
> This replaces the content on
> https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone,
> which was a bit stale. I carried over some notes from there as appropriate,
> but most were no longer accurate. Hopefully this new document helps us
> revise our understanding of what is implemented and makes clear what's left
> to do.
> 
> Most of the outstanding issues (at least for C++ and Java) are already
> ticketed in Jira and marked as blockers for 1.0, but let me know if you see
> something missing.
> 
> Neal
>