[jira] [Created] (ARROW-8280) [C++] MinGW builds failing due to CARES-related toolchain issue
Wes McKinney created ARROW-8280: --- Summary: [C++] MinGW builds failing due to CARES-related toolchain issue Key: ARROW-8280 URL: https://issues.apache.org/jira/browse/ARROW-8280 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 This just started occurring today, I think. Example run: https://github.com/apache/arrow/pull/6774/checks?check_run_id=547420903 {code} CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files: CARES_INCLUDE_DIR {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
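Not a confirmed fix, but the usual way to unblock a NOTFOUND include variable while the toolchain issue is investigated is to pass it explicitly at configure time; the /mingw64 prefix below is an assumption about the MSYS2 layout used by the MinGW jobs:

```shell
# Hypothetical workaround: point CMake at the c-ares headers directly.
# The /mingw64 prefix is an assumption; adjust to the actual MSYS2 install.
cmake .. -DCARES_INCLUDE_DIR=/mingw64/include
```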
[jira] [Created] (ARROW-8279) [C++] Do not export symbols from Codec implementations, remove need for PIMPL pattern
Wes McKinney created ARROW-8279: --- Summary: [C++] Do not export symbols from Codec implementations, remove need for PIMPL pattern Key: ARROW-8279 URL: https://issues.apache.org/jira/browse/ARROW-8279 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 This is a small bit of code tidying that I noticed while reviewing the recent compression patch -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-03-30-1
In the conda tasks it looks like there might be an issue with cmake 3.17, or something else?

CMake Error at /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:164 (message):
  Could NOT find Python3 (missing: Python3_EXECUTABLE Interpreter NumPy)
  (found version "3.6")
Call Stack (most recent call first):
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPython/Support.cmake:2398 (find_package_handle_standard_args)
  /build/arrow-cpp_1585598942132/_build_env/share/cmake-3.17/Modules/FindPython3.cmake:311 (include)
  cmake_modules/FindPython3Alt.cmake:46 (find_package)
  src/arrow/python/CMakeLists.txt:22 (find_package)

FTR, the number of clicks to get to a build log on Azure Pipelines is kind of crazy

On Mon, Mar 30, 2020 at 7:24 PM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-03-30-1 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1 > > Failed Tasks: > - conda-linux-gcc-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py36 > - conda-linux-gcc-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py37 > - conda-linux-gcc-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py38 > - conda-osx-clang-py36: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py36 > - conda-osx-clang-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py37 > - conda-osx-clang-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py38 > - conda-win-vs2015-py36: > URL: >
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py36 > - conda-win-vs2015-py37: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py37 > - conda-win-vs2015-py38: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py38 > - gandiva-jar-osx: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-osx > - gandiva-jar-trusty: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-trusty > - test-conda-cpp-valgrind: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp-valgrind > - test-conda-python-3.7-hdfs-2.9.2: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-hdfs-2.9.2 > - test-conda-python-3.7-pandas-master: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-master > - test-ubuntu-18.04-docs: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-ubuntu-18.04-docs > - wheel-manylinux1-cp35m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux1-cp35m > - wheel-manylinux2014-cp35m: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux2014-cp35m > > Succeeded Tasks: > - centos-6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-6 > - centos-7: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-7 > - centos-8: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-8 > - debian-buster: > URL: > 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-buster > - debian-stretch: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-stretch > - homebrew-cpp: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-homebrew-cpp > - macos-r-autobrew: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-macos-r-autobrew > - test-conda-cpp: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp > - test-conda-python-3.6: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.6 > - test-conda-python-3.7-dask-latest: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-dask-latest > -
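For context on the failure above: the call at cmake_modules/FindPython3Alt.cmake:46 delegates to CMake's FindPython3 module, roughly like the fragment below (a paraphrase, not the actual file; the component list is inferred from the error message). With CMake 3.17 the interpreter sometimes has to be pinned explicitly when it lives in a build environment separate from the host prefix:

```cmake
# Paraphrase of the lookup FindPython3Alt.cmake performs (assumption).
find_package(Python3 COMPONENTS Interpreter NumPy)

# Possible workaround (untested): pin the interpreter explicitly, e.g.
#   cmake -DPython3_EXECUTABLE=$PREFIX/bin/python ...
```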
[NIGHTLY] Arrow Build Report for Job nightly-2020-03-30-1
Arrow Build Report for Job nightly-2020-03-30-1 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1 Failed Tasks: - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-conda-win-vs2015-py38 - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-osx - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-gandiva-jar-trusty - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp-valgrind - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-pandas-master: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-master - test-ubuntu-18.04-docs: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-ubuntu-18.04-docs - wheel-manylinux1-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux1-cp35m - wheel-manylinux2014-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-azure-wheel-manylinux2014-cp35m Succeeded Tasks: - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-6 - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-7 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-centos-8 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-buster - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-github-debian-stretch - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-homebrew-cpp - macos-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-travis-macos-r-autobrew - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-cpp - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-dask-latest - test-conda-python-3.7-kartothek-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-kartothek-latest - 
test-conda-python-3.7-kartothek-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-kartothek-master - test-conda-python-3.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-pandas-latest - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-spark-master - test-conda-python-3.7-turbodbc-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-turbodbc-latest - test-conda-python-3.7-turbodbc-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-30-1-circle-test-conda-python-3.7-turbodbc-master - test-conda-python-3.7: URL:
The future of Parquet development for Arrow Rust?
hi folks, More than a year has passed since the Parquet Rust project joined forces with Apache Arrow. I raised this issue in the past, but the project still cannot write files originating from Arrow records. In my opinion, this creates sustainability and development-scalability problems for the ongoing development of the project. In particular, testing has to rely on binary files either pre-generated or generated by another library. This makes everything harder (testing, feature development, benchmarking, and so forth) and increases the chance of failing to cover edge cases. Looking back on over 4 years of C++ Parquet development, I doubt we could have gotten the project to where it is now without a writer implementation moving together with the reader. For example, we've had to deal with issues arising in very large files (e.g. BinaryArray overflows), and in many cases it would not be practical to store a pre-generated file exhibiting some of these problems. Of course, as a volunteer-driven effort no one can be forced to implement a writer, but since a good amount of time has passed I feel I need to raise awareness of the issue again to see if an effort might be mobilized, since this also impacts people who might come to rely on this code in production. Given the importance of Parquet in current times, having a rock-solid Parquet library will likely become essential to sustained adoption of the Arrow Rust project (it has certainly been very important for C++/Python/R adoption). best, Wes
[jira] [Created] (ARROW-8278) [C++] Simplify IPC tests by using BufferOutputStreams
Wes McKinney created ARROW-8278: --- Summary: [C++] Simplify IPC tests by using BufferOutputStreams Key: ARROW-8278 URL: https://issues.apache.org/jira/browse/ARROW-8278 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney The use of memory maps in these IPC tests adds developer overhead for each new test added. I'm not sure they add much. If we're concerned about zero-copy in reads, we can just as easily check for that with an in-memory buffer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8277) [Python] RecordBatch interface improvements
Zhuo Peng created ARROW-8277: Summary: [Python] RecordBatch interface improvements Key: ARROW-8277 URL: https://issues.apache.org/jira/browse/ARROW-8277 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Zhuo Peng Assignee: Zhuo Peng Currently __eq__, __repr__ of RecordBatch are not implemented. compute::Take also supports RecordBatch inputs but there's no python wrapper for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns
Joris Van den Bossche created ARROW-8276: Summary: [C++][Dataset] Scanning a Fragment does not take into account the partition columns Key: ARROW-8276 URL: https://issues.apache.org/jira/browse/ARROW-8276 Project: Apache Arrow Issue Type: Bug Components: C++, C++ - Dataset Reporter: Joris Van den Bossche Fix For: 0.17.0 Follow-up on ARROW-8061, the {{to_table}} method doesn't work for fragments created from a partitioned dataset. (will add a reproducer later) cc [~bkietz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8275) [Python][Docs] Review Feather + IPC file documentation per "Feather V2" changes
Wes McKinney created ARROW-8275: --- Summary: [Python][Docs] Review Feather + IPC file documentation per "Feather V2" changes Key: ARROW-8275 URL: https://issues.apache.org/jira/browse/ARROW-8275 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Python Reporter: Wes McKinney Fix For: 0.17.0 Bring documentation up to date with what's in master -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8274) [C++] Use LZ4 frame format for "LZ4" compression in IPC write
Wes McKinney created ARROW-8274: --- Summary: [C++] Use LZ4 frame format for "LZ4" compression in IPC write Key: ARROW-8274 URL: https://issues.apache.org/jira/browse/ARROW-8274 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 Currently the non-frame format is being used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Clarification regarding the `CDataInterface.rst`
On Mon, 30 Mar 2020 15:17:02 - Anish Biswas wrote: > Thanks! I'll probably build the Arrow Library from source. Thanks again! You should be able to get a nightly build using:

$ pip install -U --extra-index-url \
      https://pypi.fury.io/arrow-nightlies/ --pre pyarrow

Regards Antoine.
Re: Clarification regarding the `CDataInterface.rst`
Thanks! I'll probably build the Arrow Library from source. Thanks again! On 2020/03/30 14:49:35, Wes McKinney wrote: > The first release containing this functionality is the upcoming one 0.17.0. > In the meantime you can build from source or use the wheel build scripts in > python/manylinux1. We are working on nightlies for development and testing, > so someone may be able to point you to a nightly package > > On Mon, Mar 30, 2020, 9:28 AM Anish Biswas wrote: > > > I am extremely sorry for the late reply, I didn't get an email regarding > > your reply. Thanks for the links! This is exactly what I wanted. I tried > > doing the same `_import_from_c` in my code but it throws an error stating > > that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow > > 0.16.0. Is there a case of version mismatch here? > > > > On 2020/03/29 20:46:32, Wes McKinney wrote: > > > To add to this, take a look at the C interface functions in pyarrow > > > > > > Reconstruct pyarrow.DataType from C ArrowSchema > > > > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > > > > > Reconstruct pyarrow.Array from C ArrowArray > > > > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > > > > > The idea is that a single ArrowSchema may correspond to a sequence of > > > ArrowArray, so the data type (equivalently schema) is represented > > > separately from the array data. > > > > > > You can see examples of both of these in the unit tests (which use > > > cffi to create the C structs) > > > > > > > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > > > > > If you're having trouble getting things to work, it would be helpful > > > if you could show what data exactly you are putting into the C > > > structures and how it is not returning the expected result when > > > imported into pyarrow. 
> > > > > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > > > wrote: > > > > > > > > Hi Anish, > > > > You may be interested in how the Arrow R package uses the C interface > > to > > > > pass data to/from pyarrow. Both sides use the Arrow C++ library's > > > > implementation of the C interface. See > > > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow > > C++ > > > > implementation is in > > > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > > > > > > > Neal > > > > > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas < > > anishbiswas...@gmail.com> > > > > wrote: > > > > > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > > > ] > > > > > document for a few days now. So what I am trying is basically to use > > the C > > > > > interface with a minimum dependencies to produce blocks of bytes that > > > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > > > vice-versa: both directions). > > > > > > > > > > Here's what I already tried doing. > > > > > > > > > >- Created a C library that contains the two structs ArrowSchema > > and > > > > >ArrowArray and some functions to export an int64_t array as an > > Arrow > > > > > Array. > > > > >This is very similar to what the document did with int32_t arrays. > > > > >- Imported the C library in Python. Created an int64_t > > pyarrow.array. > > > > >Serialized it to read the bytes via Numpy and populated the C > > struct I > > > > >created using the C library function. > > > > > > > > > > What I expected was that the bytes would have some resemblance to > > each > > > > > other and that pyarrow would have some utility to pick up the > > ArrowArray > > > > > struct and treat it as an Arrow Array. But I couldn't get it to work. 
> > > > > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > > > ArrowSchema is > > > > > the only structure that differentiates different ArrowArray formats. > > > > > However, the fact that I am not using it anywhere with the ArrowArray > > > > > struct > > > > > or for that matter for any kind of initialization which tells the > > Arrow > > > > > library that "The next structure you will encounter would be of the > > kind > > > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > > > > > It would really help me out, if you could tell if I actually > > misinterpreted > > > > > the doc, or am I doing something wrong. Thanks! > > > > > > > > > > >
Re: Clarification regarding the `CDataInterface.rst`
The first release containing this functionality is the upcoming one 0.17.0. In the meantime you can build from source or use the wheel build scripts in python/manylinux1. We are working on nightlies for development and testing, so someone may be able to point you to a nightly package On Mon, Mar 30, 2020, 9:28 AM Anish Biswas wrote: > I am extremely sorry for the late reply, I didn't get an email regarding > your reply. Thanks for the links! This is exactly what I wanted. I tried > doing the same `_import_from_c` in my code but it throws an error stating > that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow > 0.16.0. Is there a case of version mismatch here? > > On 2020/03/29 20:46:32, Wes McKinney wrote: > > To add to this, take a look at the C interface functions in pyarrow > > > > Reconstruct pyarrow.DataType from C ArrowSchema > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > > > Reconstruct pyarrow.Array from C ArrowArray > > > > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > > > The idea is that a single ArrowSchema may correspond to a sequence of > > ArrowArray, so the data type (equivalently schema) is represented > > separately from the array data. > > > > You can see examples of both of these in the unit tests (which use > > cffi to create the C structs) > > > > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > > > If you're having trouble getting things to work, it would be helpful > > if you could show what data exactly you are putting into the C > > structures and how it is not returning the expected result when > > imported into pyarrow. > > > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > > wrote: > > > > > > Hi Anish, > > > You may be interested in how the Arrow R package uses the C interface > to > > > pass data to/from pyarrow. 
Both sides use the Arrow C++ library's > > > implementation of the C interface. See > > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow > C++ > > > implementation is in > > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > > > > > Neal > > > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas < > anishbiswas...@gmail.com> > > > wrote: > > > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > > ] > > > > document for a few days now. So what I am trying is basically to use > the C > > > > interface with a minimum dependencies to produce blocks of bytes that > > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > > vice-versa: both directions). > > > > > > > > Here's what I already tried doing. > > > > > > > >- Created a C library that contains the two structs ArrowSchema > and > > > >ArrowArray and some functions to export an int64_t array as an > Arrow > > > > Array. > > > >This is very similar to what the document did with int32_t arrays. > > > >- Imported the C library in Python. Created an int64_t > pyarrow.array. > > > >Serialized it to read the bytes via Numpy and populated the C > struct I > > > >created using the C library function. > > > > > > > > What I expected was that the bytes would have some resemblance to > each > > > > other and that pyarrow would have some utility to pick up the > ArrowArray > > > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > > ArrowSchema is > > > > the only structure that differentiates different ArrowArray formats. 
> > > > However, the fact that I am not using it anywhere with the ArrowArray > > > > struct > > > > or for that matter for any kind of initialization which tells the > Arrow > > > > library that "The next structure you will encounter would be of the > kind > > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > > > It would really help me out, if you could tell if I actually > misinterpreted > > > > the doc, or am I doing something wrong. Thanks! > > > > > > >
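For anyone experimenting along with this thread without writing C: the two structs from CDataInterface.rst can be mirrored in pure Python with ctypes for inspection (a sketch of the layout as given in the spec; the example values are arbitrary):

```python
import ctypes

# Struct layouts mirroring docs/source/format/CDataInterface.rst.
class ArrowSchema(ctypes.Structure):
    pass

class ArrowArray(ctypes.Structure):
    pass

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),          # e.g. b"l" for int64
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),            # ARROW_FLAG_NULLABLE == 2
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.c_void_p)),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.c_void_p)),
    ("private_data", ctypes.c_void_p),
]

# A schema describing a nullable int64 column ("l" is the int64 format string).
schema = ArrowSchema(format=b"l", name=b"ints", flags=2, n_children=0)

# The matching array metadata: int64 arrays carry 2 buffers (validity + data).
arr = ArrowArray(length=3, null_count=0, offset=0, n_buffers=2, n_children=0)
```

This also illustrates the pairing the thread asks about: the ArrowSchema says what type the buffers hold, the ArrowArray only carries lengths and raw buffer pointers, and a consumer such as pyarrow's `_import_from_c` needs the addresses of both.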
Re: Clarification regarding the `CDataInterface.rst`
Hi Neal Richardson, I apologize for the late reply. The links are pretty helpful, thanks a ton! I went through them and this would be a very good starting point for a larger project that I am working on where my task is exactly this: conversions "to Arrow" and "from Arrow". On 2020/03/29 20:40:59, Neal Richardson wrote: > Hi Anish, > You may be interested in how the Arrow R package uses the C interface to > pass data to/from pyarrow. Both sides use the Arrow C++ library's > implementation of the C interface. See > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++ > implementation is in > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. > > Neal > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas > wrote: > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > ] > > document for a few days now. So what I am trying is basically to use the C > > interface with a minimum dependencies to produce blocks of bytes that > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > vice-versa: both directions). > > > > Here's what I already tried doing. > > > >- Created a C library that contains the two structs ArrowSchema and > >ArrowArray and some functions to export an int64_t array as an Arrow > > Array. > >This is very similar to what the document did with int32_t arrays. > >- Imported the C library in Python. Created an int64_t pyarrow.array. > >Serialized it to read the bytes via Numpy and populated the C struct I > >created using the C library function. > > > > What I expected was that the bytes would have some resemblance to each > > other and that pyarrow would have some utility to pick up the ArrowArray > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > I am also confused as to how do I use ArrowSchema properly. 
The > > ArrowSchema is > > the only structure that differentiates different ArrowArray formats. > > However, the fact that I am not using it anywhere with the ArrowArray > > struct > > or for that matter for any kind of initialization which tells the Arrow > > library that "The next structure you will encounter would be of the kind > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > It would really help me out, if you could tell if I actually misinterpreted > > the doc, or am I doing something wrong. Thanks! > > >
[jira] [Created] (ARROW-8273) Fail to convert an integer list (arrow) to pandas
Jonathan mercier created ARROW-8273: --- Summary: Fail to convert an integer list (arrow) to pandas Key: ARROW-8273 URL: https://issues.apache.org/jira/browse/ARROW-8273 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Jonathan mercier Dear, I have an arrow table and one of its columns is an arrow list (size=2). When I try to convert the table to pandas I get an ArrowNotImplementedError. Minimal case below:

{code:java}
from pyarrow import Schema, Table, int64, list_, schema, array

fields = [('foo', list_(int64(), 2),)]
sc = schema(fields)
foo_column = [[1, 2], [3, 4]]
columns = [foo_column]
a_table = Table.from_arrays(arrays=columns, schema=sc)
df = a_table.to_pandas()

---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
----> 1 df2 = a_table.to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    764 _check_data_column_metadata_consistency(all_columns)
    765 columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 766 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
    767
    768 axes = [columns, index]

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
   1099 columns = block_table.column_names
   1100 result = pa.lib.table_to_blocks(options, block_table, categories,
-> 1101                                 list(extension_columns.keys()))
   1102 return [_reconstruct_block(item, columns, extension_columns)
   1103         for item in result]

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/somewhere/foo/venv/lib64/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type fixed_size_list[2] is known
{code}

Maybe I need to convert the python list (column) to an array? Thanks for your help -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Clarification regarding the `CDataInterface.rst`
I am extremely sorry for the late reply, I didn't get an email regarding your reply. Thanks for the links! This is exactly what I wanted. I tried doing the same `_import_from_c` in my code but it throws an error stating that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow 0.16.0. Is there a case of version mismatch here? On 2020/03/29 20:46:32, Wes McKinney wrote: > To add to this, take a look at the C interface functions in pyarrow > > Reconstruct pyarrow.DataType from C ArrowSchema > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203 > > Reconstruct pyarrow.Array from C ArrowArray > > https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176 > > The idea is that a single ArrowSchema may correspond to a sequence of > ArrowArray, so the data type (equivalently schema) is represented > separately from the array data. > > You can see examples of both of these in the unit tests (which use > cffi to create the C structs) > > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py > > If you're having trouble getting things to work, it would be helpful > if you could show what data exactly you are putting into the C > structures and how it is not returning the expected result when > imported into pyarrow. > > On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson > wrote: > > > > Hi Anish, > > You may be interested in how the Arrow R package uses the C interface to > > pass data to/from pyarrow. Both sides use the Arrow C++ library's > > implementation of the C interface. See > > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and > > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++ > > implementation is in > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c. 
> > > > Neal > > > > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas > > wrote: > > > > > I have been trying to wrap my head around the[ CDataInterface.rst| > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst > > > ] > > > document for a few days now. So what I am trying is basically to use the C > > > interface with a minimum dependencies to produce blocks of bytes that > > > pyarrow can reconstruct and work on as a normal pyarrow array (and > > > vice-versa: both directions). > > > > > > Here's what I already tried doing. > > > > > >- Created a C library that contains the two structs ArrowSchema and > > >ArrowArray and some functions to export an int64_t array as an Arrow > > > Array. > > >This is very similar to what the document did with int32_t arrays. > > >- Imported the C library in Python. Created an int64_t pyarrow.array. > > >Serialized it to read the bytes via Numpy and populated the C struct I > > >created using the C library function. > > > > > > What I expected was that the bytes would have some resemblance to each > > > other and that pyarrow would have some utility to pick up the ArrowArray > > > struct and treat it as an Arrow Array. But I couldn't get it to work. > > > > > > I am also confused as to how do I use ArrowSchema properly. The > > > ArrowSchema is > > > the only structure that differentiates different ArrowArray formats. > > > However, the fact that I am not using it anywhere with the ArrowArray > > > struct > > > or for that matter for any kind of initialization which tells the Arrow > > > library that "The next structure you will encounter would be of the kind > > > that the ArrowSchema has provided you", doesn't seem correct to me. > > > > > > It would really help me out, if you could tell if I actually > > > misinterpreted > > > the doc, or am I doing something wrong. Thanks! > > > >
[jira] [Created] (ARROW-8272) [CI][Python] Test failure on Ubuntu 16.04
Antoine Pitrou created ARROW-8272: - Summary: [CI][Python] Test failure on Ubuntu 16.04 Key: ARROW-8272 URL: https://issues.apache.org/jira/browse/ARROW-8272 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Python Reporter: Antoine Pitrou See https://github.com/pitrou/arrow/runs/545291564 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8271) [Packaging] Allow wheel upload failures to gemfury
Krisztian Szucs created ARROW-8271: -- Summary: [Packaging] Allow wheel upload failures to gemfury Key: ARROW-8271 URL: https://issues.apache.org/jira/browse/ARROW-8271 Project: Apache Arrow Issue Type: Improvement Components: Packaging, Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs If we run multiple nightly/scheduled jobs per day for the same arrow commit then gemfury's API will refuse the upload because of conflicting versions, see [build|https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=9053=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=b525c197-f769-5e52-d38a-e6301f5260f2=27]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8270) [Python][Flight] Flight server with TLS's certificate and key is not working
Ravindra Wagh created ARROW-8270: Summary: [Python][Flight] Flight server with TLS's certificate and key is not working Key: ARROW-8270 URL: https://issues.apache.org/jira/browse/ARROW-8270 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Affects Versions: 0.16.0 Reporter: Ravindra Wagh Assignee: Ravindra Wagh On starting the python server (arrow/python/examples/flight/server.py --host localhost --tls serv.crt serv.key), it gives the error below: {noformat} TypeError: __init__() got an unexpected keyword argument 'tls_cert_chain'{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)