[jira] [Created] (ARROW-7856) to_pandas() Causing datetimes > pd.Timestamp.max to wrap around
Kevin Glasson created ARROW-7856:

Summary: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around
Key: ARROW-7856
URL: https://issues.apache.org/jira/browse/ARROW-7856
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
Python 3.7.3
In [3]: pa.__version__
Out[3]: '0.15.1'
In [4]: pd.__version__
Out[4]: '0.25.2'
Reporter: Kevin Glasson

When writing a dataframe containing `datetime.datetime` values in an object column, any datetime that is greater than pd.Timestamp.max or less than pd.Timestamp.min wraps around.

For reference, these are the timestamp min and max values.

{code:java}
In [43]: pd.Timestamp.max
Out[43]: Timestamp('2262-04-11 23:47:16.854775807')

In [44]: pd.Timestamp.min
Out[44]: Timestamp('1677-09-21 00:12:43.145225')
{code}

To reproduce the error using pandas:

{code:java}
In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})

In [50]: df
Out[50]:
                     A
0  2262-04-12 00:00:00

In [51]: df.to_parquet("datetimething.parquet")

In [52]: pd.read_parquet("datetimething.parquet")
Out[52]:
                              A
0 1677-09-21 00:25:26.290448384
{code}

I have narrowed it down far enough to see that it happens when converting a `pa.Table` using the `to_pandas()` method.

{code:java}
In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})

In [31]: tf = pa.Table.from_pandas(df)

In [32]: tf.columns
Out[32]:
[
[
[
2262-04-12 00:00:00.00
]
]
]

In [33]: tf.to_pandas()
Out[33]:
                              A
0 1677-09-21 00:25:26.290448384
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
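The value reported above is consistent with a signed 64-bit nanosecond counter overflowing. The following is only a minimal sketch of that arithmetic using the Python standard library; it does not call pandas or pyarrow, and the helper names (`to_ns`, `wrap_int64`, `from_ns`) are hypothetical, introduced purely for illustration.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def to_ns(dt):
    """Nanoseconds since the Unix epoch, as an unbounded Python int."""
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1_000

def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range, as two's-complement overflow would."""
    return (value + 2**63) % 2**64 - 2**63

def from_ns(ns):
    """Split nanoseconds into a datetime (microsecond precision) plus leftover nanoseconds."""
    micros, leftover_ns = divmod(ns, 1_000)
    return EPOCH + datetime.timedelta(microseconds=micros), leftover_ns

ns = to_ns(datetime.datetime(2262, 4, 12))   # one step past the int64 nanosecond range
wrapped = wrap_int64(ns)                     # what a 64-bit counter would hold instead
wrapped_dt, leftover_ns = from_ns(wrapped)

print(ns > 2**63 - 1)            # True: the value does not fit in int64 nanoseconds
print(wrapped_dt, leftover_ns)   # 1677-09-21 00:25:26.290448 384
```

The wrapped datetime plus the 384 leftover nanoseconds matches the `1677-09-21 00:25:26.290448384` shown in the report, which suggests (but does not prove) that the conversion casts to nanosecond resolution without a range check.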
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 On Thu, Feb 13, 2020 at 9:08 PM Fan Liya wrote: > > +1 (binding) > > On Thu, Feb 13, 2020 at 11:52 AM Wes McKinney wrote: > > > +1 (binding) > > > > On Tue, Feb 11, 2020 at 4:29 PM Antoine Pitrou wrote: > > > > > > > > > Ah, you're right, it's PR 6040: > > > https://github.com/apache/arrow/pull/6040 > > > > > > Similarly, the C++ implementation is at PR 6026: > > > https://github.com/apache/arrow/pull/6026 > > > > > > Regards > > > > > > Antoine. > > > > > > > > > Le 11/02/2020 à 23:17, Wes McKinney a écrit : > > > > hi Antoine, PR 5442 seems to no longer be the right one. Which open PR > > > > contains the specification now? > > > > > > > > On Tue, Feb 11, 2020 at 1:06 PM Antoine Pitrou > > wrote: > > > >> > > > >> > > > >> Hello, > > > >> > > > >> We have been discussing the creation of a minimalist C-based data > > > >> interface for applications to exchange Arrow columnar data structures > > > >> with each other. Some notable features of this interface include: > > > >> > > > >> * A small amount of header-only C code can be copied independently > > into > > > >> third-party libraries and downstream applications, no dependencies are > > > >> needed even on Arrow C++ itself (notably, it is not required to use > > > >> Flatbuffers, though there are trade-offs resulting from this). > > > >> > > > >> * Low development investment (in other words: limited-scope use cases > > > >> can be accomplished with little code), so as to enable C or C++ > > > >> libraries to export Arrow columnar data with minimal code. > > > >> > > > >> * Data lifetime management hooks so as to properly handle non-trivial > > > >> data sharing (for example passing Arrow columnar data to an async > > > >> processing consumer). > > > >> > > > >> This "C Data Interface" serves different use cases from the > > > >> language-independent IPC protocol and trades away a number of features > > > >> in the interest of minimalism / simplicity. 
It is not a replacement > > for > > > >> the IPC protocol and will only be used to interchange in-process data > > at > > > >> C or C++ call sites. > > > >> > > > >> The PR providing the specification is here: > > > >> https://github.com/apache/arrow/pull/5442 > > > >> > > > >> In particular, you can read the spec document here: > > > >> > > https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst > > > >> > > > >> A fairly comprehensive C++ implementation of this demonstrating its > > > >> use is found here: > > > >> https://github.com/apache/arrow/pull/5608 > > > >> > > > >> (note that other applications implementing the interface may choose to > > > >> only support a few features and thus have far less code to write) > > > >> > > > >> Please vote to adopt the SPECIFICATION (GitHub PR #5442). > > > >> > > > >> This vote will be open for at least 72 hours > > > >> > > > >> [ ] +1 Adopt C Data Interface specification > > > >> [ ] +0 > > > >> [ ] -1 Do not adopt because... > > > >> > > > >> Thank you > > > >> > > > >> Regards > > > >> > > > >> Antoine. > > > >> > > > >> > > > >> (PS: yes, this is in large part a copy/paste of Wes's previous vote > > > >> email :-)) > >
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (binding)

On Thu, Feb 13, 2020 at 11:52 AM Wes McKinney wrote:
> +1 (binding)
[jira] [Created] (ARROW-7854) [C++][Dataset] Option to memory map when reading IPC format
Joris Van den Bossche created ARROW-7854:

Summary: [C++][Dataset] Option to memory map when reading IPC format
Key: ARROW-7854
URL: https://issues.apache.org/jira/browse/ARROW-7854
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset
Reporter: Joris Van den Bossche

For the IPC format, it would be interesting to be able to memory map the IPC files.

cc [~fsaintjacques] [~bkietz]
[jira] [Created] (ARROW-7855) TypeError on mixed array values
Rob DiCiuccio created ARROW-7855:

Summary: TypeError on mixed array values
Key: ARROW-7855
URL: https://issues.apache.org/jira/browse/ARROW-7855
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1, 0.16.0
Reporter: Rob DiCiuccio

The following data structure passed to `pa.array` raises a generic `TypeError`:

{code:java}
import pyarrow as pa

pa.array([{'TestKey': [123456, 'foo']}])
{code}

{code:java}
Traceback (most recent call last):
  File "pyarrow_list_test.py", line 30, in
    pa_array = pa.array([{'TestKey': [123456, 'foo']}])
  File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
TypeError: an integer is required (got type str)
{code}

I understand there may be a way to overcome this by passing the `type` argument to `pa.array`, but the use case here is storing the results of a SQL query where the structure/type of the column is unknown. If Arrow is ultimately unable to handle this data structure without a predefined `type` passed to `pa.array`, can the exception at least use the PyArrow namespace (e.g. `pa.lib.ArrowTypeError` or `pa.lib.ArrowNotImplementedError`)?

Any other workaround suggestions are welcome.
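One possible workaround for input like the one in this report, sketched under the assumption that coercing mixed-type lists to strings is acceptable for the use case. `stringify_mixed_lists` is a hypothetical helper written for this note, not part of pyarrow, and it is deliberately plain Python so it runs without Arrow installed.

```python
def stringify_mixed_lists(value):
    """Recursively walk dicts and lists; any list whose non-null items
    have more than one Python type is converted to a list of str."""
    if isinstance(value, dict):
        return {key: stringify_mixed_lists(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [stringify_mixed_lists(item) for item in value]
        kinds = {type(item) for item in items if item is not None}
        if len(kinds) > 1:
            # Mixed types (e.g. int and str): fall back to strings, keep nulls.
            return [None if item is None else str(item) for item in items]
        return items
    return value

rows = [{'TestKey': [123456, 'foo']}]
cleaned = stringify_mixed_lists(rows)
print(cleaned)  # [{'TestKey': ['123456', 'foo']}]
```

The cleaned rows could then be handed to `pa.array`, which should see a homogeneous list of strings, though this has not been checked against 0.15.1 or 0.16.0. Note that the sketch also stringifies combinations such as int and float that Arrow might otherwise be able to unify.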
[jira] [Created] (ARROW-7853) [CI][Packaging] Add nightly test that pip-installs nightly wheels
Neal Richardson created ARROW-7853:

Summary: [CI][Packaging] Add nightly test that pip-installs nightly wheels
Key: ARROW-7853
URL: https://issues.apache.org/jira/browse/ARROW-7853
Project: Apache Arrow
Issue Type: New Feature
Components: Continuous Integration, Packaging, Python
Reporter: Neal Richardson
Assignee: Krisztian Szucs
Fix For: 1.0.0

This would catch issues with wheels that we only encountered during release verification.
[NIGHTLY] Arrow Build Report for Job nightly-2020-02-13-0
Arrow Build Report for Job nightly-2020-02-13-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0 Failed Tasks: - conda-linux-gcc-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-linux-gcc-py27 - conda-osx-clang-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-osx-clang-py27 - macos-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-travis-macos-r-autobrew - test-conda-python-2.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-2.7-pandas-latest - test-conda-python-2.7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-2.7 - test-conda-python-3.7-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.7-pandas-master - test-conda-python-3.7-turbodbc-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.7-turbodbc-latest - test-conda-python-3.7-turbodbc-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.7-turbodbc-master - test-r-rhub-debian-gcc-devel: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-test-r-rhub-debian-gcc-devel - test-ubuntu-18.04-docs: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-ubuntu-18.04-docs - wheel-manylinux2010-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-wheel-manylinux2010-cp36m - wheel-osx-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-travis-wheel-osx-cp35m Succeeded Tasks: - centos-6: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-centos-6 - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-centos-7 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-centos-8 - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-conda-win-vs2015-py38 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-debian-buster - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-azure-debian-stretch - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-travis-gandiva-jar-osx - gandiva-jar-trusty: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-travis-gandiva-jar-trusty - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-travis-homebrew-cpp - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-cpp-valgrind - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-cpp - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.7-dask-latest - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-13-0-circle-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-pandas-latest: URL:
[jira] [Created] (ARROW-7852) Update pyarrow numpy requirement
Stephanie Gott created ARROW-7852:

Summary: Update pyarrow numpy requirement
Key: ARROW-7852
URL: https://issues.apache.org/jira/browse/ARROW-7852
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.16.0
Reporter: Stephanie Gott

Using Python 3.7.5 and numpy 1.14.6, I am unable to import pyarrow 0.16.0 (see below for the error). Updating numpy to the most recent version fixes this, and I'm wondering if pyarrow needs to update its requirements.txt.

{code:java}
➜ ~ ipython
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.14.6'

In [3]: import pyarrow
---
ModuleNotFoundError Traceback (most recent call last)
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
---
ImportError Traceback (most recent call last)
 in
> 1 import pyarrow

~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in
 47 import pyarrow.compat as compat
 48
---> 49 from pyarrow.lib import cpu_count, set_cpu_count
 50 from pyarrow.lib import (null, bool_,
 51 int8, int16, int32, int64,

~/.local/lib/python3.7/site-packages/pyarrow/lib.pyx in init pyarrow.lib()

ImportError: numpy.core.multiarray failed to import

In [4]: import pyarrow
---
AttributeError Traceback (most recent call last)
 in
> 1 import pyarrow

~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in
 47 import pyarrow.compat as compat
 48
---> 49 from pyarrow.lib import cpu_count, set_cpu_count
 50 from pyarrow.lib import (null, bool_,
 51 int8, int16, int32, int64,

~/.local/lib/python3.7/site-packages/pyarrow/ipc.pxi in init pyarrow.lib()

AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'
{code}
[jira] [Created] (ARROW-7851) [Javascript] JS Documentation generation fails
Krisztian Szucs created ARROW-7851:

Summary: [Javascript] JS Documentation generation fails
Key: ARROW-7851
URL: https://issues.apache.org/jira/browse/ARROW-7851
Project: Apache Arrow
Issue Type: Task
Reporter: Krisztian Szucs

This just surfaced on GHA: https://github.com/apache/arrow/runs/443762627#step:5:11647

cc [~paultaylor]
[jira] [Created] (ARROW-7850) [Packaging][Python] Document how to install nightly built wheels
Krisztian Szucs created ARROW-7850:

Summary: [Packaging][Python] Document how to install nightly built wheels
Key: ARROW-7850
URL: https://issues.apache.org/jira/browse/ARROW-7850
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Krisztian Szucs
Fix For: 1.0.0

Follow-up work on https://github.com/apache/arrow/pull/6366#issue-371626256

As per the comment at https://github.com/apache/arrow/pull/6366#issuecomment-585750794

It would also be nice to resolve the version selection issue described in the comments above.
[jira] [Created] (ARROW-7849) [Packaging][Python] Remove the remaining py27 crossbow wheel tasks from the nightlies
Krisztian Szucs created ARROW-7849:

Summary: [Packaging][Python] Remove the remaining py27 crossbow wheel tasks from the nightlies
Key: ARROW-7849
URL: https://issues.apache.org/jira/browse/ARROW-7849
Project: Apache Arrow
Issue Type: Task
Components: Packaging, Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
Fix For: 1.0.0

The nightly tasks are referencing deleted py27 wheel tasks, so the nightly submission has failed: https://ci.ursalabs.org/#/builders/98/builds/536
Re: Arrow doesn't have a MapType
On Thu, 13 Feb 2020 13:58:13 +0800 Shawn Yang wrote:
> Thanks Wes. I was using 0.14 before. BTW, it seems the doc for data types
> wasn't fully updated. I'll submit a PR for this.

The PR is integrated. Thank you Shawn!

Regards

Antoine.
[jira] [Created] (ARROW-7848) Add doc for MapType
Shawn Yang created ARROW-7848:

Summary: Add doc for MapType
Key: ARROW-7848
URL: https://issues.apache.org/jira/browse/ARROW-7848
Project: Apache Arrow
Issue Type: Bug
Components: Documentation
Affects Versions: 0.15.1, 0.16.0, 0.15.0
Reporter: Shawn Yang
Fix For: 0.15.1, 0.16.0, 0.15.0

MapType was added in 0.15, but the documentation was not updated to cover it.
[jira] [Created] (ARROW-7847) [Web] Write a blog post about fuzzing
Antoine Pitrou created ARROW-7847:

Summary: [Web] Write a blog post about fuzzing
Key: ARROW-7847
URL: https://issues.apache.org/jira/browse/ARROW-7847
Project: Apache Arrow
Issue Type: Task
Components: Website
Reporter: Antoine Pitrou

At some point we should probably write a blog post about the current fuzzing setup. Perhaps when we have fixed all reported crashes :-)
[jira] [Created] (ARROW-7846) [Python][Dev] Remove last dependencies on six
Antoine Pitrou created ARROW-7846:

Summary: [Python][Dev] Remove last dependencies on six
Key: ARROW-7846
URL: https://issues.apache.org/jira/browse/ARROW-7846
Project: Apache Arrow
Issue Type: Task
Components: Developer Tools, Python
Reporter: Antoine Pitrou
Fix For: 1.0.0

Looks like {{six}} (the Python 2-3 compatibility library) is still being used and referenced in a couple of places, notably {{archery}}.