[jira] [Created] (ARROW-7919) [R] install_arrow() should conda install if appropriate
Neal Richardson created ARROW-7919: -- Summary: [R] install_arrow() should conda install if appropriate Key: ARROW-7919 URL: https://issues.apache.org/jira/browse/ARROW-7919 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 1.0.0 Like, check {{if (grepl("conda", R.Version()$platform))}} and if so then {{system("conda install ...")}}. Error if nightly == TRUE because we don't host conda nightlies yet. This would help with issues like https://github.com/apache/arrow/issues/6448 -- This message was sent by Atlassian Jira (v8.3.4#803005)
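The proposed check could be sketched as follows — a Python translation of the R logic quoted above, for illustration only (the platform strings are hypothetical examples of conda and non-conda R builds, and the function name is invented for this sketch):

```python
import re

def should_conda_install(platform, nightly=False):
    # Mirrors grepl("conda", R.Version()$platform) from the ticket.
    if not re.search("conda", platform):
        return False
    if nightly:
        # Error if nightly == TRUE, since conda nightlies are not hosted yet.
        raise ValueError("nightly builds are not available via conda")
    return True

print(should_conda_install("x86_64-conda_cos6-linux-gnu"))  # True
print(should_conda_install("x86_64-pc-linux-gnu"))          # False
```

When the check passes, install_arrow() would shell out to something like {{system("conda install ...")}} instead of building from source.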
Re: Integration testing
Thanks, I've marked that as not required for 1.0 (and added a legend to the bottom of the table). Anything else that needs to be added or reclassified? Neal On Fri, Feb 21, 2020 at 12:19 AM Antoine Pitrou wrote: > > Hi, > > I don't think float16 support is required for 1.0. > On the C++ side at least, it will require integrating a dedicated > library (probably in other languages as well). > > Regards > > Antoine. > > > Le 21/02/2020 à 00:33, Neal Richardson a écrit : > > Hi all, > > To help us reach 1.0 with as complete and thoroughly tested > implementations > > of the Arrow format, I've surveyed our integration test suite and open > > issues and collected information here: > > > https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 > > > > I'll happily grant edit privileges on the doc to anyone who requests. > > > > This replaces the content on > > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone > , > > which was a bit stale. I carried over some notes from there as > appropriate, > > but most were no longer accurate. Hopefully this new document helps us > > revise our understanding of what is implemented and makes clear what's > left > > to do. > > > > Most of the outstanding issues (at least for C++ and Java) are already > > ticketed in Jira and marked as blockers for 1.0, but let me know if you > see > > something missing. > > > > Neal > > >
[jira] [Created] (ARROW-7918) [R] Improve instructions for conda users in installation vignette
Neal Richardson created ARROW-7918: -- Summary: [R] Improve instructions for conda users in installation vignette Key: ARROW-7918 URL: https://issues.apache.org/jira/browse/ARROW-7918 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7917) [CMake] FindPythonInterp should check for python3
Francois Saint-Jacques created ARROW-7917: - Summary: [CMake] FindPythonInterp should check for python3 Key: ARROW-7917 URL: https://issues.apache.org/jira/browse/ARROW-7917 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.16.0 Reporter: Francois Saint-Jacques On Ubuntu 18.04 it will pick python2 by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7916) [C++][Dataset] Project IPC record batches to materialized fields
Ben Kietzman created ARROW-7916: --- Summary: [C++][Dataset] Project IPC record batches to materialized fields Key: ARROW-7916 URL: https://issues.apache.org/jira/browse/ARROW-7916 Project: Apache Arrow Issue Type: Improvement Components: C++, C++ - Dataset Affects Versions: 0.16.0 Reporter: Ben Kietzman Assignee: Ben Kietzman Fix For: 1.0.0 If batches mmaped from disk are projected before post filtering, unreferenced columns will never be accessed (so the memory map shouldn't do I/O on them). At the same time, it'd probably be wise to explicitly document that batches yielded directly from fragments rather than from a Scanner will not be filtered or projected (so they will not match the fragment's schema and will include columns referenced by the filter even if they were not projected). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7915) [CI] [Python] Run tests with Python development mode enabled
Antoine Pitrou created ARROW-7915: - Summary: [CI] [Python] Run tests with Python development mode enabled Key: ARROW-7915 URL: https://issues.apache.org/jira/browse/ARROW-7915 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, Python Reporter: Antoine Pitrou Python's "development mode" enables a few runtime checks and warnings; see the docs for "{{-X dev}}": https://docs.python.org/3/using/cmdline.html#id5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
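For illustration (not part of the proposed CI change): development mode can be verified from inside the interpreter via {{sys.flags.dev_mode}}, so a CI job can assert the flag actually took effect:

```python
import subprocess
import sys

# Launch a child interpreter with -X dev and report whether development
# mode is active; sys.flags.dev_mode requires Python 3.7+.
out = subprocess.run(
    [sys.executable, "-X", "dev", "-c", "import sys; print(sys.flags.dev_mode)"],
    capture_output=True, text=True,
)
print(out.stdout.strip())  # True
```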
[jira] [Created] (ARROW-7914) Allow pandas datetime as index for feather
Samuel Jones created ARROW-7914: --- Summary: Allow pandas datetime as index for feather Key: ARROW-7914 URL: https://issues.apache.org/jira/browse/ARROW-7914 Project: Apache Arrow Issue Type: New Feature Components: Python Affects Versions: 0.15.1 Environment: Windows, python 3.6.7 Reporter: Samuel Jones Attachments: PEC fine course 1 grid 199001.csv, PEC fine course 1 grid 199001.feather Sorry in advance if I mess anything up. This is my first issue. I have hourly data for 3 years using a Pandas datetime as the index. Pandas allows me to load/save .csv with the following code (only one month with 2 variables shown):
{code:python}
# Write data to .csv
jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)
# Load data from .csv
jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0, parse_dates=True)
{code}
Using .csv works, but it is slow when I get to the full dataset of 26k+ rows and 21.6k+ columns (and more columns may be coming if I have to add lags to my data). So a more efficient load/save routine is very desirable. I was excited when I found feather, but the lost index is a no-go for my use. Thanks for your consideration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7913) [C++][Python][R] C++ implementation of C data protocol
Neal Richardson created ARROW-7913: -- Summary: [C++][Python][R] C++ implementation of C data protocol Key: ARROW-7913 URL: https://issues.apache.org/jira/browse/ARROW-7913 Project: Apache Arrow Issue Type: Improvement Components: C++, Python, R Affects Versions: 1.0.0 Reporter: Neal Richardson Assignee: Antoine Pitrou See ARROW-7912 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7912) [Format] C data interface
Neal Richardson created ARROW-7912: -- Summary: [Format] C data interface Key: ARROW-7912 URL: https://issues.apache.org/jira/browse/ARROW-7912 Project: Apache Arrow Issue Type: Improvement Components: Format Affects Versions: 1.0.0 Reporter: Neal Richardson Assignee: Antoine Pitrou Apache Arrow is designed to be a universal in-memory format for the representation of tabular ("columnar") data. However, some projects may face a difficult choice between either depending on a fast-evolving project such as the Arrow C++ library, or having to reimplement adapters for data interchange, which may require significant, redundant development effort. The Arrow C data interface defines a very small, stable set of C definitions that can be easily *copied* in any project's source code and used for columnar data interchange in the Arrow format. For non-C/C++ languages and runtimes, it should be almost as easy to translate the C definitions into the corresponding C FFI declarations. Applications and libraries can therefore work with Arrow memory without necessarily using Arrow libraries or reinventing the wheel. Developers can choose between tight integration with the Arrow *software project* (benefitting from the growing array of facilities exposed by e.g. the C++ or Java implementations of Apache Arrow, but with the cost of a dependency) or minimal integration with the Arrow *format* only. -- This message was sent by Atlassian Jira (v8.3.4#803005)
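As a sketch of how small the interface is, the C struct can be mirrored in a non-C runtime via FFI, exactly as the paragraph above suggests. Below is a Python ctypes translation of the {{ArrowSchema}} struct from the spec PR (the field layout, the format string "i" for int32, and the nullable flag value 2 are copied from the spec document; treat them as assumptions if the PR evolves):

```python
import ctypes

class ArrowSchema(ctypes.Structure):
    """FFI mirror of the C struct ArrowSchema from the spec PR."""

# _fields_ is assigned after the class exists because the struct
# references itself (children / dictionary pointers).
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),   # binary in the spec; c_char_p is a simplification
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

# "i" = int32 format string; flags = 2 = ARROW_FLAG_NULLABLE per the spec.
s = ArrowSchema(format=b"i", name=b"ints", flags=2)
print(s.format, s.n_children)  # b'i' 0
```

A consumer receiving such a struct would read the fields, then call {{release}} when done; that is the whole lifetime-management contract.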
[jira] [Created] (ARROW-7911) [C++] Gandiva tests crash when compiled with clang
Antoine Pitrou created ARROW-7911: - Summary: [C++] Gandiva tests crash when compiled with clang Key: ARROW-7911 URL: https://issues.apache.org/jira/browse/ARROW-7911 Project: Apache Arrow Issue Type: Bug Components: C++ - Gandiva Reporter: Antoine Pitrou Recently, Gandiva tests have started to crash when compiled with clang 7.0: {code} clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final) Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin {code} The same crashes occur with clang 9.0: {code} clang version 9.0.0-2~ubuntu18.04.2 (tags/RELEASE_900/final) Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin {code} Tests run fine with gcc 7.4.0, though: {code} gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7910) [C++] Provide function to query page size portably
Ben Kietzman created ARROW-7910: --- Summary: [C++] Provide function to query page size portably Key: ARROW-7910 URL: https://issues.apache.org/jira/browse/ARROW-7910 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.16.0 Reporter: Ben Kietzman Fix For: 1.0.0 Page size is a useful default buffer size for buffered readers. Where should this property be attached? MemoryManager/Device? -- This message was sent by Atlassian Jira (v8.3.4#803005)
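For reference, most platforms already expose this: POSIX has sysconf(_SC_PAGE_SIZE) and Windows has GetSystemInfo, so the C++ helper would mostly be a portability wrapper. A Python sketch of the query (not Arrow code, just an illustration of the two routes):

```python
import mmap
import os

# Two ways to query the OS page size:
print(mmap.PAGESIZE)               # granularity used by mmap, available everywhere
print(os.sysconf("SC_PAGE_SIZE"))  # POSIX sysconf route (not available on Windows)
```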
[jira] [Created] (ARROW-7909) Example URL in documentation does not resolve correctly
Taeke created ARROW-7909: Summary: Example URL in documentation does not resolve correctly Key: ARROW-7909 URL: https://issues.apache.org/jira/browse/ARROW-7909 Project: Apache Arrow Issue Type: Bug Components: Documentation Environment: Red Hat Enterprise Linux Server 7.6 (Maipo) Reporter: Taeke On the installation page for Arrow: https://arrow.apache.org/install/ it says for *CentOS 6 and 7*: {code:sh} sudo yum install -y https://apache.bintray.com/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe)/apache-arrow-release-latest.rpm {code} That results in an invalid URL. The download is at: https://apache.bintray.com/arrow/centos/7/apache-arrow-release-latest.rpm not: https://apache.bintray.com/arrow/centos/7.6/apache-arrow-release-latest.rpm -- This message was sent by Atlassian Jira (v8.3.4#803005)
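The mismatch is easy to reproduce by hand. A short Python sketch of what `cut -d: -f5` extracts (the CPE string is a hypothetical RHEL 7.6 value; the actual contents of /etc/system-release-cpe vary by distribution):

```python
# Example CPE string as found on a RHEL 7.6 host (assumption for this sketch).
cpe = "cpe:/o:redhat:enterprise_linux:7.6:GA:server"

field5 = cpe.split(":")[4]      # what `cut -d: -f5` extracts
print(field5)                   # 7.6 -> builds the invalid .../centos/7.6/... URL
print(field5.split(".")[0])     # 7   -> the directory that actually exists
```

So the documented command would need to strip the minor version (or the docs would need a different field) for point-release systems.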
[jira] [Created] (ARROW-7908) Can't install R-library arrow without setting LIBARROW_DOWNLOAD=true
Taeke created ARROW-7908: Summary: Can't install R-library arrow without setting LIBARROW_DOWNLOAD=true Key: ARROW-7908 URL: https://issues.apache.org/jira/browse/ARROW-7908 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 0.16.0 Environment: Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo) CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server Kernel: Linux 3.10.0-957.35.2.el7.x86_64 Architecture: x86-64 Reporter: Taeke Fix For: 0.16.0 Hi, Installing arrow in R does not work intuitively on our server. {code:r} install.packages("arrow") {code} results in an error: {code:sh} Installing package into '/home//R/x86_64-redhat-linux-gnu-library/3.6' (as 'lib' is unspecified) trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz' Content type 'application/x-gzip' length 216119 bytes (211 KB) == downloaded 211 KB * installing *source* package 'arrow' ... ** package 'arrow' successfully unpacked and MD5 sums checked ** using staged installation PKG_CFLAGS=-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include -DARROW_R_WITH_ARROW PKG_LIBS=-L/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/lib -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic ** libs g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include -DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o In file included from array.cpp:18:0: ./arrow_types.h:201:31: fatal error: arrow/dataset/api.h: No such file or directory {code} It appears that the C++ code is not built. 
With arrow 0.16.0.1 things do work out, because it tries to build the C++ code from source. With arrow 0.16.0.2 that is no longer the case. I could finish the installation by setting the environment variable LIBARROW_DOWNLOAD to 'true': {code:sh} export LIBARROW_DOWNLOAD=true {code} That, apparently, triggers the build from source. I would have expected not to need to set this variable explicitly. I found that [between versions|https://github.com/apache/arrow/commit/660d0e7cbaa1cfb51498299d445636fdd6a58420], the default value of LIBARROW_DOWNLOAD changed: {code:sh} - download_ok <- locally_installing && !env_is("LIBARROW_DOWNLOAD", "false") + download_ok <- env_is("LIBARROW_DOWNLOAD", "true") {code} In our environment, that variable was _not_ set, resulting (accidentally?) in download_ok being false, and therefore the libraries were not installed, producing the error above. I can't quite figure out the logic behind all this, but it would be nice if we were able to install the package without first having to set LIBARROW_DOWNLOAD. Thank you for looking into this! -- This message was sent by Atlassian Jira (v8.3.4#803005)
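The behaviour change in that diff can be illustrated with a minimal Python stand-in for the R helper (env_is and the environment contents are simplified assumptions for this sketch; the old branch also required locally_installing, assumed TRUE here):

```python
def env_is(env, var, value):
    # Simplified stand-in for the R helper env_is(): compares an
    # environment variable against a value, treating unset as "".
    return env.get(var, "") == value

env = {}  # LIBARROW_DOWNLOAD unset, as on the reporter's server

# 0.16.0.1 default: download unless the variable is explicitly "false".
old_default = not env_is(env, "LIBARROW_DOWNLOAD", "false")
# 0.16.0.2 default: download only when the variable is explicitly "true".
new_default = env_is(env, "LIBARROW_DOWNLOAD", "true")
print(old_default, new_default)  # True False
```

With the variable unset, the old default downloaded/built libarrow and the new one silently skips it, which explains the missing-header failure above.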
[jira] [Created] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts
Joris Van den Bossche created ARROW-7907: Summary: [Python] Conversion to pandas of empty table with timestamp type aborts Key: ARROW-7907 URL: https://issues.apache.org/jira/browse/ARROW-7907 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche Fix For: 0.16.1 Creating an empty table: {code} In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))}) In [2]: table['a'] Out[2]: [ [] ] In [3]: table.to_pandas() Out[3]: Empty DataFrame Columns: [a] Index: [] {code} the above works. But the ChunkedArray still has 1 empty chunk. When filtering data, you can actually get no chunks, and this fails: {code} In [4]: table2 = table.slice(0, 0) In [5]: table2['a'] Out[5]: [ ] In [6]: table2.to_pandas() ../src/arrow/table.cc:48: Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type ... Aborted (core dumped) {code} and this seems to happen specifically for timestamp type, and specifically with non-ns unit (eg with us as above, which is the default in arrow). I noticed this when reading a parquet file of the taxi dataset, where the filter I used resulted in an empty batch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7906) Full functionality for ORC format
HAOFENG DENG created ARROW-7906: --- Summary: Full functionality for ORC format Key: ARROW-7906 URL: https://issues.apache.org/jira/browse/ARROW-7906 Project: Apache Arrow Issue Type: New Feature Components: C++, Python Reporter: HAOFENG DENG Just like the Parquet format, ORC has a large group of fans in the big data area, and it has better performance than Parquet in some use cases. The problem is that Python does not have a standard write function for ORC. Since the ORC team itself maintains the standard C++ code ([ORC-C++|https://github.com/apache/orc/tree/master/c%2B%2B]), I think it would not take too much effort to integrate it into Arrow (C++) and build the hook for Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7905) [Go] Port the C++ Parquet implementation to Go
Nick Poorman created ARROW-7905: --- Summary: [Go] Port the C++ Parquet implementation to Go Key: ARROW-7905 URL: https://issues.apache.org/jira/browse/ARROW-7905 Project: Apache Arrow Issue Type: New Feature Components: Go Reporter: Nick Poorman I’m currently in the process of porting the C++ version of Parquet in the Apache Arrow project to Golang. Many projects and companies have been and are building their data lakes and persistence layers using Parquet. Apache Spark uses it heavily for persistence (including Databricks DeltaLake). To me this is the missing component for people to truly begin using the Go implementation of Arrow with any existing data architectures. If you have any interest in this project, give this post a like / bookmark it, as it will keep me motivated to finish the port. Also, if you have specific use cases, feel free to drop them in here so I can keep them in mind as I continue with the port. Things with the code base are rather in flux at the moment as I figure out how to solve various nuances between the features of C++ and Go. As soon as I have a solid chunk of the port working, I’ll create a PR in the Apache Arrow project on GitHub and let everyone know here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7904) [C++] Decide about Field/Schema metadata printing parameters and how much to show by default
Wes McKinney created ARROW-7904: --- Summary: [C++] Decide about Field/Schema metadata printing parameters and how much to show by default Key: ARROW-7904 URL: https://issues.apache.org/jira/browse/ARROW-7904 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 See discussion in https://github.com/apache/arrow/pull/6472 for follow up discussions to ARROW-7063 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[NIGHTLY] Arrow Build Report for Job nightly-2020-02-21-0
Arrow Build Report for Job nightly-2020-02-21-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0 Failed Tasks: - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-7 - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-debian-stretch - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-gandiva-jar-trusty - macos-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-macos-r-autobrew - test-conda-python-2.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-2.7-pandas-latest - test-conda-python-2.7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-2.7 - test-conda-python-3.7-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-pandas-master - test-conda-python-3.7-turbodbc-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-turbodbc-latest - test-conda-python-3.7-turbodbc-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-turbodbc-master - wheel-manylinux2014-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-wheel-manylinux2014-cp37m - wheel-osx-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp35m - wheel-osx-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp36m - wheel-osx-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp37m - wheel-osx-cp38: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-wheel-osx-cp38 Succeeded Tasks: - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-6 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-centos-8 - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-conda-win-vs2015-py38 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-azure-debian-buster - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-gandiva-jar-osx - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-travis-homebrew-cpp - test-conda-cpp-valgrind: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-cpp-valgrind - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-cpp - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-dask-latest - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-21-0-circle-test-conda-python-3.7-pandas-latest - test-conda-python-3.7-spark-master: URL:
[RESULT] [VOTE] Adopt Arrow in-process C Data Interface specification
Hello, The vote succeeds with 3 +1 (binding) and 2 +1 (non-binding). I'll soon open a JIRA for the specification and the C++ implementation, so that we can merge those timely. Regards Antoine. On Tue, 11 Feb 2020 20:06:33 +0100 Antoine Pitrou wrote: > Hello, > > We have been discussing the creation of a minimalist C-based data > interface for applications to exchange Arrow columnar data structures > with each other. Some notable features of this interface include: > > * A small amount of header-only C code can be copied independently into > third-party libraries and downstream applications, no dependencies are > needed even on Arrow C++ itself (notably, it is not required to use > Flatbuffers, though there are trade-offs resulting from this). > > * Low development investment (in other words: limited-scope use cases > can be accomplished with little code), so as to enable C or C++ > libraries to export Arrow columnar data with minimal code. > > * Data lifetime management hooks so as to properly handle non-trivial > data sharing (for example passing Arrow columnar data to an async > processing consumer). > > This "C Data Interface" serves different use cases from the > language-independent IPC protocol and trades away a number of features > in the interest of minimalism / simplicity. It is not a replacement for > the IPC protocol and will only be used to interchange in-process data at > C or C++ call sites. 
> > The PR providing the specification is here: > https://github.com/apache/arrow/pull/5442 > > In particular, you can read the spec document here: > https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst > > A fairly comprehensive C++ implementation of this demonstrating its > use is found here: > https://github.com/apache/arrow/pull/5608 > > (note that other applications implementing the interface may choose to > only support a few features and thus have far less code to write) > > Please vote to adopt the SPECIFICATION (GitHub PR #5442). > > This vote will be open for at least 72 hours > > [ ] +1 Adopt C Data Interface specification > [ ] +0 > [ ] -1 Do not adopt because... > > Thank you > > Regards > > Antoine. > > > (PS: yes, this is in large part a copy/paste of Wes's previous vote > email :-)) >
Re: Integration testing
Hi, I don't think float16 support is required for 1.0. On the C++ side at least, it will require integrating a dedicated library (probably in other languages as well). Regards Antoine. Le 21/02/2020 à 00:33, Neal Richardson a écrit : > Hi all, > To help us reach 1.0 with as complete and thoroughly tested implementations > of the Arrow format, I've surveyed our integration test suite and open > issues and collected information here: > https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 > > I'll happily grant edit privileges on the doc to anyone who requests. > > This replaces the content on > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone, > which was a bit stale. I carried over some notes from there as appropriate, > but most were no longer accurate. Hopefully this new document helps us > revise our understanding of what is implemented and makes clear what's left > to do. > > Most of the outstanding issues (at least for C++ and Java) are already > ticketed in Jira and marked as blockers for 1.0, but let me know if you see > something missing. > > Neal >