[jira] [Created] (ARROW-8280) [C++] MinGW builds failing due to CARES-related toolchain issue
Wes McKinney created ARROW-8280: --- Summary: [C++] MinGW builds failing due to CARES-related toolchain issue Key: ARROW-8280 URL: https://issues.apache.org/jira/browse/ARROW-8280 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 This just started occurring today, I think. Example run: https://github.com/apache/arrow/pull/6774/checks?check_run_id=547420903 {code} CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files: CARES_INCLUDE_DIR {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
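Not a confirmed fix, but the usual way to unblock a NOTFOUND include variable while the toolchain issue is investigated is to pass it explicitly at configure time; the /mingw64 prefix below is an assumption about the MSYS2 layout used by the MinGW jobs:

```shell
# Hypothetical workaround: point CMake at the c-ares headers directly.
# The /mingw64 prefix is an assumption; adjust to the actual MSYS2 install.
cmake .. -DCARES_INCLUDE_DIR=/mingw64/include
```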
[jira] [Created] (ARROW-8279) [C++] Do not export symbols from Codec implementations, remove need for PIMPL pattern
Wes McKinney created ARROW-8279: --- Summary: [C++] Do not export symbols from Codec implementations, remove need for PIMPL pattern Key: ARROW-8279 URL: https://issues.apache.org/jira/browse/ARROW-8279 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 This is a small bit of code tidying that I noticed while reviewing the recent compression patch -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-03-30-1
In the conda tasks it looks like there might be an issue with cmake 3.17, or something else?

CMake Error at /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:164 (message):
  Could NOT find Python3 (missing: Python3_EXECUTABLE Interpreter NumPy)
  (found version "3.6")
Call Stack (most recent call first):
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPython/Support.cmake:2398 (find_package_handle_standard_args)
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPython3.cmake:311 (include)
  cmake_modules/FindPython3Alt.cmake:46 (find_package)
  src/arrow/python/CMakeLists.txt:22 (find_package)

FTR, the number of clicks to get to a build log on Azure Pipelines is kind of crazy

On Mon, Mar 30, 2020 at 7:24 PM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-03-30-1 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1 > > Failed Tasks: > - conda-linux-gcc-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py36 > - conda-linux-gcc-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py37 > - conda-linux-gcc-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py38 > - conda-osx-clang-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py36 > - conda-osx-clang-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py37 > - conda-osx-clang-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py38 > - conda-win-vs2015-py36: > URL: >
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py36 > - conda-win-vs2015-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py37 > - conda-win-vs2015-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py38 > - gandiva-jar-osx: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-osx > - gandiva-jar-trusty: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-trusty > - test-conda-cpp-valgrind: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp-valgrind > - test-conda-python-3.7-hdfs-2.9.2: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-hdfs-2.9.2 > - test-conda-python-3.7-pandas-master: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-master > - test-ubuntu-18.04-docs: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-ubuntu-18.04-docs > - wheel-manylinux1-cp35m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux1-cp35m > - wheel-manylinux2014-cp35m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux2014-cp35m > > Succeeded Tasks: > - centos-6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-6 > - centos-7: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-7 > - centos-8: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-8 > - debian-buster: > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-buster > - debian-stretch: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-stretch > - homebrew-cpp: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-homebrew-cpp > - macos-r-autobrew: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-macos-r-autobrew > - test-conda-cpp: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp > - test-conda-python-3.6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.6 > - test-conda-python-3.7-dask-latest: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-dask-latest > -
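For context on the failure above: the call at cmake_modules/FindPython3Alt.cmake:46 delegates to CMake's FindPython3 module, roughly like the fragment below (a paraphrase, not the actual file; the component list is inferred from the error message). With CMake 3.17 the interpreter sometimes has to be pinned explicitly when it lives in a build environment separate from the host prefix:

```cmake
# Paraphrase of the lookup FindPython3Alt.cmake performs (assumption).
find_package(Python3 COMPONENTS Interpreter NumPy)

# Possible workaround (untested): pin the interpreter explicitly, e.g.
#   cmake -DPython3_EXECUTABLE=$PREFIX/bin/python ...
```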
[NIGHTLY] Arrow Build Report for Job nightly-2020-03-30-1
Arrow Build Report for Job nightly-2020-03-30-1 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1 Failed Tasks: - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py38 - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-osx - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-trusty - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp-valgrind - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-pandas-master: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-master - test-ubuntu-18.04-docs: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-ubuntu-18.04-docs - wheel-manylinux1-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux1-cp35m - wheel-manylinux2014-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux2014-cp35m Succeeded Tasks: - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-6 - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-7 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-8 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-buster - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-stretch - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-homebrew-cpp - macos-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-macos-r-autobrew - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-dask-latest - test-conda-python-3.7-kartothek-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-kartothek-latest - 
test-conda-python-3.7-kartothek-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-kartothek-master - test-conda-python-3.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-latest - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-spark-master - test-conda-python-3.7-turbodbc-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-turbodbc-latest - test-conda-python-3.7-turbodbc-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-turbodbc-master - test-conda-python-3.7: URL:
The future of Parquet development for Arrow Rust?
hi folks, More than a year has passed since the Parquet Rust project joined forces with Apache Arrow. I raised this issue in the past, but the project still cannot write files originating from Arrow records. In my opinion, this creates sustainability and development-scalability problems for the ongoing development of the project. In particular, testing has to rely on binary files either pre-generated or generated by another library. This makes everything harder (testing, feature development, benchmarking, and so forth) and increases the chance of failing to cover edge cases. Looking back on over 4 years of C++ Parquet development, I doubt we could have gotten the project to where it is now without a writer implementation moving together with the reader. For example, we've had to deal with issues arising in very large files (e.g. BinaryArray overflows), and in many cases it would not be practical to store a pre-generated file exhibiting some of these problems. Of course, as a volunteer-driven effort no one can be forced to implement a writer, but since a good amount of time has passed I feel I need to raise awareness of the issue again to see if an effort might be mobilized, since this also impacts people who might come to rely on this code in production. Given the importance of Parquet in current times, having a rock-solid Parquet library will likely become essential to sustained adoption of the Arrow Rust project (it has certainly been very important for C++/Python/R adoption). best, Wes
[jira] [Created] (ARROW-8278) [C++] Simplify IPC tests by using BufferOutputStreams
Wes McKinney created ARROW-8278: --- Summary: [C++] Simplify IPC tests by using BufferOutputStreams Key: ARROW-8278 URL: https://issues.apache.org/jira/browse/ARROW-8278 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney The use of memory maps in these IPC tests adds developer overhead for each new test added. I'm not sure they add much. If we're concerned about zero-copy in reads, we can just as easily check for that with an in-memory buffer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8277) [Python] RecordBatch interface improvements
Zhuo Peng created ARROW-8277: Summary: [Python] RecordBatch interface improvements Key: ARROW-8277 URL: https://issues.apache.org/jira/browse/ARROW-8277 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Zhuo Peng Assignee: Zhuo Peng Currently __eq__, __repr__ of RecordBatch are not implemented. compute::Take also supports RecordBatch inputs but there's no python wrapper for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns
Joris Van den Bossche created ARROW-8276: Summary: [C++][Dataset] Scanning a Fragment does not take into account the partition columns Key: ARROW-8276 URL: https://issues.apache.org/jira/browse/ARROW-8276 Project: Apache Arrow Issue Type: Bug Components: C++, C++ - Dataset Reporter: Joris Van den Bossche Fix For: 0.17.0 Follow-up on ARROW-8061, the {{to_table}} method doesn't work for fragments created from a partitioned dataset. (will add a reproducer later) cc [~bkietz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8275) [Python][Docs] Review Feather + IPC file documentation per "Feather V2" changes
Wes McKinney created ARROW-8275: --- Summary: [Python][Docs] Review Feather + IPC file documentation per "Feather V2" changes Key: ARROW-8275 URL: https://issues.apache.org/jira/browse/ARROW-8275 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Python Reporter: Wes McKinney Fix For: 0.17.0 Bring documentation up to date with what's in master -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8274) [C++] Use LZ4 frame format for "LZ4" compression in IPC write
Wes McKinney created ARROW-8274: --- Summary: [C++] Use LZ4 frame format for "LZ4" compression in IPC write Key: ARROW-8274 URL: https://issues.apache.org/jira/browse/ARROW-8274 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 Currently the non-frame format is being used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Clarification regarding the `CDataInterface.rst`
On Mon, 30 Mar 2020 15:17:02 - Anish Biswas wrote: > Thanks! I'll probably build the Arrow Library from source. Thanks again! You should be able to get a nightly build using:

$ pip install -U --extra-index-url \
      https://pypi.fury.io/arrow-nightlies/ --pre pyarrow

Regards Antoine.
Re: Clarification regarding the `CDataInterface.rst`
Thanks! I'll probably build the Arrow Library from source. Thanks again! On 2020/03/30 14:49:35, Wes McKinney wrote: > The first release containing this functionality is the upcoming one 0.17.0. > In the meantime you can build from source or use the wheel build scripts in > python/manylinux1. We are working on nightlies for development and testing, > so someone may be able to point you to a nightly package > > On Mon, Mar 30, 2020, 9:28 AM Anish Biswas wrote: > > > I am extremely sorry for the late reply, I didn't get an email regarding > > your reply. Thanks for the links! This is exactly what I wanted. I tried > > doing the same `_import_from_c` in my code but it throws an error stating > > that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow > > 0.16.0. Is there a case of version mismatch here? > > > > On 2020/03/29 20:46:32, Wes McKinney wrote: > > > To add to this, take a look at the C interface functions in pyarrow > > > > > > Reconstruct pyarrow.DataType from C ArrowSchema > > > > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > > > > > Reconstruct pyarrow.Array from C ArrowArray > > > > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > > > > > The idea is that a single ArrowSchema may correspond to a sequence of > > > ArrowArray, so the data type (equivalently schema) is represented > > > separately from the array data. > > > > > > You can see examples of both of these in the unit tests (which use > > > cffi to create the C structs) > > > > > > > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > > > > > If you're having trouble getting things to work, it would be helpful > > > if you could show what data exactly you are putting into the C > > > structures and how it is not returning the expected result when > > > imported into pyarrow. 
> > > > > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > > > wrote: > > > > > > > > Hi Anish, > > > > You may be interested in how the Arrow R package uses the C interface > > to > > > > pass data to/from pyarrow. Both sides use the Arrow C++ library's > > > > implementation of the C interface. See > > > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow > > C++ > > > > implementation is in > > > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > > > > > > > Neal > > > > > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas < > > anishbiswas...@gmail.com> > > > > wrote: > > > > > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > > > ] > > > > > document for a few days now. So what I am trying is basically to use > > the C > > > > > interface with a minimum dependencies to produce blocks of bytes that > > > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > > > vice-versa: both directions). > > > > > > > > > > Here's what I already tried doing. > > > > > > > > > >- Created a C library that contains the two structs ArrowSchema > > and > > > > >ArrowArray and some functions to export an int64_t array as an > > Arrow > > > > > Array. > > > > >This is very similar to what the document did with int32_t arrays. > > > > >- Imported the C library in Python. Created an int64_t > > pyarrow.array. > > > > >Serialized it to read the bytes via Numpy and populated the C > > struct I > > > > >created using the C library function. > > > > > > > > > > What I expected was that the bytes would have some resemblance to > > each > > > > > other and that pyarrow would have some utility to pick up the > > ArrowArray > > > > > struct and treat it as an Arrow Array. But I couldn't get it to work. 
> > > > > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > > > ArrowSchema is > > > > > the only structure that differentiates different ArrowArray formats. > > > > > However, the fact that I am not using it anywhere with the ArrowArray > > > > > struct > > > > > or for that matter for any kind of initialization which tells the > > Arrow > > > > > library that "The next structure you will encounter would be of the > > kind > > > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > > > > > It would really help me out, if you could tell if I actually > > misinterpreted > > > > > the doc, or am I doing something wrong. Thanks! > > > > > > > > > > >
Re: Clarification regarding the `CDataInterface.rst`
The first release containing this functionality is the upcoming one 0.17.0. In the meantime you can build from source or use the wheel build scripts in python/manylinux1. We are working on nightlies for development and testing, so someone may be able to point you to a nightly package On Mon, Mar 30, 2020, 9:28 AM Anish Biswas wrote: > I am extremely sorry for the late reply, I didn't get an email regarding > your reply. Thanks for the links! This is exactly what I wanted. I tried > doing the same `_import_from_c` in my code but it throws an error stating > that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow > 0.16.0. Is there a case of version mismatch here? > > On 2020/03/29 20:46:32, Wes McKinney wrote: > > To add to this, take a look at the C interface functions in pyarrow > > > > Reconstruct pyarrow.DataType from C ArrowSchema > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > > > Reconstruct pyarrow.Array from C ArrowArray > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > > > The idea is that a single ArrowSchema may correspond to a sequence of > > ArrowArray, so the data type (equivalently schema) is represented > > separately from the array data. > > > > You can see examples of both of these in the unit tests (which use > > cffi to create the C structs) > > > > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > > > If you're having trouble getting things to work, it would be helpful > > if you could show what data exactly you are putting into the C > > structures and how it is not returning the expected result when > > imported into pyarrow. > > > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > > wrote: > > > > > > Hi Anish, > > > You may be interested in how the Arrow R package uses the C interface > to > > > pass data to/from pyarrow. 
Both sides use the Arrow C++ library's > > > implementation of the C interface. See > > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow > C++ > > > implementation is in > > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > > > > > Neal > > > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas < > anishbiswas...@gmail.com> > > > wrote: > > > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > > ] > > > > document for a few days now. So what I am trying is basically to use > the C > > > > interface with a minimum dependencies to produce blocks of bytes that > > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > > vice-versa: both directions). > > > > > > > > Here's what I already tried doing. > > > > > > > >- Created a C library that contains the two structs ArrowSchema > and > > > >ArrowArray and some functions to export an int64_t array as an > Arrow > > > > Array. > > > >This is very similar to what the document did with int32_t arrays. > > > >- Imported the C library in Python. Created an int64_t > pyarrow.array. > > > >Serialized it to read the bytes via Numpy and populated the C > struct I > > > >created using the C library function. > > > > > > > > What I expected was that the bytes would have some resemblance to > each > > > > other and that pyarrow would have some utility to pick up the > ArrowArray > > > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > > ArrowSchema is > > > > the only structure that differentiates different ArrowArray formats. 
> > > > However, the fact that I am not using it anywhere with the ArrowArray > > > > struct > > > > or for that matter for any kind of initialization which tells the > Arrow > > > > library that "The next structure you will encounter would be of the > kind > > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > > > It would really help me out, if you could tell if I actually > misinterpreted > > > > the doc, or am I doing something wrong. Thanks! > > > > > > >
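For anyone experimenting along with this thread without writing C: the two structs from CDataInterface.rst can be mirrored in pure Python with ctypes for inspection (a sketch of the layout as given in the spec; the example values are arbitrary):

```python
import ctypes

# Struct layouts mirroring docs/source/format/CDataInterface.rst.
class ArrowSchema(ctypes.Structure):
    pass

class ArrowArray(ctypes.Structure):
    pass

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),          # e.g. b"l" for int64
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),            # ARROW_FLAG_NULLABLE == 2
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.c_void_p)),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.c_void_p)),
    ("private_data", ctypes.c_void_p),
]

# A schema describing a nullable int64 column ("l" is the int64 format string).
schema = ArrowSchema(format=b"l", name=b"ints", flags=2, n_children=0)

# The matching array metadata: int64 arrays carry 2 buffers (validity + data).
arr = ArrowArray(length=3, null_count=0, offset=0, n_buffers=2, n_children=0)
```

This also illustrates the pairing the thread asks about: the ArrowSchema says what type the buffers hold, the ArrowArray only carries lengths and raw buffer pointers, and a consumer such as pyarrow's `_import_from_c` needs the addresses of both.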
Re: Clarification regarding the `CDataInterface.rst`
Hi Neal Richardson, I apologize for the late reply. The links are pretty helpful, thanks a ton! I went through them and this would be a very good starting point for a larger project that I am working on where my task is exactly this: conversions "to Arrow" and "from Arrow". On 2020/03/29 20:40:59, Neal Richardson wrote: > Hi Anish, > You may be interested in how the Arrow R package uses the C interface to > pass data to/from pyarrow. Both sides use the Arrow C++ library's > implementation of the C interface. See > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++ > implementation is in > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > Neal > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas > wrote: > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > ] > > document for a few days now. So what I am trying is basically to use the C > > interface with a minimum dependencies to produce blocks of bytes that > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > vice-versa: both directions). > > > > Here's what I already tried doing. > > > >- Created a C library that contains the two structs ArrowSchema and > >ArrowArray and some functions to export an int64_t array as an Arrow > > Array. > >This is very similar to what the document did with int32_t arrays. > >- Imported the C library in Python. Created an int64_t pyarrow.array. > >Serialized it to read the bytes via Numpy and populated the C struct I > >created using the C library function. > > > > What I expected was that the bytes would have some resemblance to each > > other and that pyarrow would have some utility to pick up the ArrowArray > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > I am also confused as to how do I use ArrowSchema properly. 
The > > ArrowSchema is > > the only structure that differentiates different ArrowArray formats. > > However, the fact that I am not using it anywhere with the ArrowArray > > struct > > or for that matter for any kind of initialization which tells the Arrow > > library that "The next structure you will encounter would be of the kind > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > It would really help me out, if you could tell if I actually misinterpreted > > the doc, or am I doing something wrong. Thanks! > > >
[jira] [Created] (ARROW-8273) Fail to convert an integer list (arrow) to pandas
Jonathan mercier created ARROW-8273: --- Summary: Fail to convert an integer list (arrow) to pandas Key: ARROW-8273 URL: https://issues.apache.org/jira/browse/ARROW-8273 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Jonathan mercier Dear, I have an arrow table and one of its columns is an arrow list (size=2). When I try to convert the table to pandas I get an ArrowNotImplementedError. Minimal case below:

{code:java}
from pyarrow import Schema, Table, int64, list_, schema, array

fields = [('foo', list_(int64(), 2),)]
sc = schema(fields)
foo_column = [[1, 2], [3, 4]]
columns = [foo_column]
a_table = Table.from_arrays(arrays=columns, schema=sc)
df = a_table.to_pandas()

---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
----> 1 df2 = a_table.to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    764 _check_data_column_metadata_consistency(all_columns)
    765 columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 766 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
    767
    768 axes = [columns, index]

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
   1099 columns = block_table.column_names
   1100 result = pa.lib.table_to_blocks(options, block_table, categories,
-> 1101                                 list(extension_columns.keys()))
   1102 return [_reconstruct_block(item, columns, extension_columns)
   1103         for item in result]

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type fixed_size_list[2] is known
{code}

Maybe I need to convert the python list (column) to an array? Thanks for your help -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Clarification regarding the `CDataInterface.rst`
I am extremely sorry for the late reply, I didn't get an email regarding your reply. Thanks for the links! This is exactly what I wanted. I tried doing the same `_import_from_c` in my code but it throws an error stating that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow 0.16.0. Is there a case of version mismatch here? On 2020/03/29 20:46:32, Wes McKinney wrote: > To add to this, take a look at the C interface functions in pyarrow > > Reconstruct pyarrow.DataType from C ArrowSchema > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > Reconstruct pyarrow.Array from C ArrowArray > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > The idea is that a single ArrowSchema may correspond to a sequence of > ArrowArray, so the data type (equivalently schema) is represented > separately from the array data. > > You can see examples of both of these in the unit tests (which use > cffi to create the C structs) > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > If you're having trouble getting things to work, it would be helpful > if you could show what data exactly you are putting into the C > structures and how it is not returning the expected result when > imported into pyarrow. > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > wrote: > > > > Hi Anish, > > You may be interested in how the Arrow R package uses the C interface to > > pass data to/from pyarrow. Both sides use the Arrow C++ library's > > implementation of the C interface. See > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++ > > implementation is in > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. 
> > > > Neal > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas > > wrote: > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > ] > > > document for a few days now. So what I am trying is basically to use the C > > > interface with a minimum dependencies to produce blocks of bytes that > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > vice-versa: both directions). > > > > > > Here's what I already tried doing. > > > > > >- Created a C library that contains the two structs ArrowSchema and > > >ArrowArray and some functions to export an int64_t array as an Arrow > > > Array. > > >This is very similar to what the document did with int32_t arrays. > > >- Imported the C library in Python. Created an int64_t pyarrow.array. > > >Serialized it to read the bytes via Numpy and populated the C struct I > > >created using the C library function. > > > > > > What I expected was that the bytes would have some resemblance to each > > > other and that pyarrow would have some utility to pick up the ArrowArray > > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > ArrowSchema is > > > the only structure that differentiates different ArrowArray formats. > > > However, the fact that I am not using it anywhere with the ArrowArray > > > struct > > > or for that matter for any kind of initialization which tells the Arrow > > > library that "The next structure you will encounter would be of the kind > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > It would really help me out, if you could tell if I actually > > > misinterpreted > > > the doc, or am I doing something wrong. Thanks! > > > >
[jira] [Created] (ARROW-8272) [CI][Python] Test failure on Ubuntu 16.04
Antoine Pitrou created ARROW-8272: - Summary: [CI][Python] Test failure on Ubuntu 16.04 Key: ARROW-8272 URL: https://issues.apache.org/jira/browse/ARROW-8272 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Python Reporter: Antoine Pitrou See https://github.com/pitrou/arrow/runs/545291564 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8271) [Packaging] Allow wheel upload failures to gemfury
Krisztian Szucs created ARROW-8271: -- Summary: [Packaging] Allow wheel upload failures to gemfury Key: ARROW-8271 URL: https://issues.apache.org/jira/browse/ARROW-8271 Project: Apache Arrow Issue Type: Improvement Components: Packaging, Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs If we run multiple nightly/scheduled jobs per day for the same arrow commit then gemfury's API will refuse the upload because of conflicting versions, see [build|https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=9053=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=b525c197-f769-5e52-d38a-e6301f5260f2=27]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8270) [Python][Flight] Flight server with TLS's certificate and key is not working
Ravindra Wagh created ARROW-8270: Summary: [Python][Flight] Flight server with TLS's certificate and key is not working Key: ARROW-8270 URL: https://issues.apache.org/jira/browse/ARROW-8270 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Affects Versions: 0.16.0 Reporter: Ravindra Wagh Assignee: Ravindra Wagh On starting the python server (arrow/python/examples/flight/server.py --host localhost --tls serv.crt serv.key), it gives the error below: {noformat} TypeError: __init__() got an unexpected keyword argument 'tls_cert_chain'{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)