[jira] [Created] (ARROW-18433) Optimize aggregate functions to work with batches.
A. Coady created ARROW-18433: Summary: Optimize aggregate functions to work with batches. Key: ARROW-18433 URL: https://issues.apache.org/jira/browse/ARROW-18433 Project: Apache Arrow Issue Type: New Feature Components: C++, Python Affects Versions: 10.0.1 Reporter: A. Coady
Most compute functions work with the dataset API and don't load whole columns. But aggregate functions which are associative could also work batch-by-batch: `min`, `max`, `any`, `all`, `sum`, `product`. Even `unique` and `value_counts` qualify. A couple of implementation ideas:
* expand the dataset API to support expressions which return scalars
* add a `BatchedArray` type which is like a `ChunkedArray` but with lazy loading
-- This message was sent by Atlassian Jira (v8.20.10#820010)
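Associativity is what makes this feasible: each batch can be reduced independently and the partial results combined, so the whole column never has to be in memory. A minimal pure-Python sketch of the idea (illustrative only; this is not pyarrow's dataset API):

```python
from functools import reduce

def batched_aggregate(batches, reduce_batch, combine):
    """Aggregate an iterable of batches without loading the whole column.

    reduce_batch: reduces one batch to a partial result.
    combine: merges two partial results; must be associative.
    """
    partials = (reduce_batch(batch) for batch in batches)
    return reduce(combine, partials)

# Each inner list stands in for one record batch of a column.
batches = [[3, 1, 4], [1, 5], [9, 2, 6]]
total = batched_aggregate(batches, sum, lambda a, b: a + b)  # 31
minimum = batched_aggregate(batches, min, min)               # 1
```

`unique` and `value_counts` fit the same shape: the per-batch reduction yields a set or a counter, and the combine step unions or adds them.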
[jira] [Created] (ARROW-18432) [Python] Array constructor doesn't support arrow scalars.
A. Coady created ARROW-18432: Summary: [Python] Array constructor doesn't support arrow scalars. Key: ARROW-18432 URL: https://issues.apache.org/jira/browse/ARROW-18432 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 10.0.1 Reporter: A. Coady
{code:python}
pa.array([pa.scalar(0)])
ArrowInvalid: Could not convert <pyarrow.Int64Scalar: 0> with type pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an Arrow data type

pa.array([pa.scalar(0)], 'int64')
ArrowInvalid: Could not convert <pyarrow.Int64Scalar: 0> with type pyarrow.lib.Int64Scalar: tried to convert to int64
{code}
It seems odd that the array constructors don't recognize their own scalars. In practice, a list of scalars has to be converted with `.as_py()` just to be converted back, and that also loses the type information.
[jira] [Created] (ARROW-16277) [Python] No builds for macOS arm64.
A. Coady created ARROW-16277: Summary: [Python] No builds for macOS arm64. Key: ARROW-16277 URL: https://issues.apache.org/jira/browse/ARROW-16277 Project: Apache Arrow Issue Type: Task Components: Python Affects Versions: 8.0.0 Environment: macOS Reporter: A. Coady
Nightly builds no longer include a build for macOS arm64. The last one to do so was 8.0.0.dev312.
[jira] [Created] (ARROW-16015) Scanning batch size is limited to 65536 (2**16).
A. Coady created ARROW-16015: Summary: Scanning batch size is limited to 65536 (2**16). Key: ARROW-16015 URL: https://issues.apache.org/jira/browse/ARROW-16015 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 7.0.0, 8.0.0 Environment: macOS Reporter: A. Coady
[Scanning batches|https://arrow.apache.org/docs/python/dataset.html#iterative-out-of-core-or-streaming-reads] is documented to default to a batch size of 1,000,000. But the behavior is that batch size defaults to, and is limited to, 65536.
{code:python}
In []: dataset.count_rows()
Out[]: 538038292

In []: next(dataset.to_batches()).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**6)).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**4)).num_rows
Out[]: 1
{code}
[jira] [Created] (ARROW-15748) [Python] Round temporal options default unit is `day` but documented as `second`.
A. Coady created ARROW-15748: Summary: [Python] Round temporal options default unit is `day` but documented as `second`. Key: ARROW-15748 URL: https://issues.apache.org/jira/browse/ARROW-15748 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 8.0.0 Reporter: A. Coady
The [python documentation for round temporal options|https://arrow.apache.org/docs/dev/python/generated/pyarrow.compute.RoundTemporalOptions.html] says the default unit is `second`, but the [actual behavior|https://arrow.apache.org/docs/dev/cpp/api/compute.html#classarrow_1_1compute_1_1_round_temporal_options] is a default of `day`.
[jira] [Created] (ARROW-15736) [C++] Aggregate functions for min and max index.
A. Coady created ARROW-15736: Summary: [C++] Aggregate functions for min and max index. Key: ARROW-15736 URL: https://issues.apache.org/jira/browse/ARROW-15736 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: A. Coady
Numpy and Pandas both have `argmin` and `argmax`, for the common use case of finding values in parallel arrays which correspond to min or max values. Proposals:
* `min_max_index` for arrays
* `hash_min_max_index` for aggregations
* some ability to break ties:
** `min_max_index` for tables with multiple sort keys, similar to `sort_indices`
** `min_max_indices` for arrays to match all equal values
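For reference, the proposed semantics can be sketched in plain Python; `min_max_index` and `min_max_indices` are the names proposed above, not an existing pyarrow API:

```python
def min_max_index(values):
    """Indices of the first minimum and first maximum (argmin/argmax)."""
    min_i = max_i = 0
    for i, value in enumerate(values):
        if value < values[min_i]:
            min_i = i
        if value > values[max_i]:
            max_i = i
    return {'min': min_i, 'max': max_i}

def min_max_indices(values):
    """All indices matching the min and the max, to expose ties."""
    lo, hi = min(values), max(values)
    return {'min': [i for i, v in enumerate(values) if v == lo],
            'max': [i for i, v in enumerate(values) if v == hi]}
```

The plural variant shows one way ties could surface: `min_max_indices([3, 1, 4, 1])` reports both occurrences of the minimum.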
[jira] [Created] (ARROW-15735) [C++] Hash aggregate functions to return first and last value from a group.
A. Coady created ARROW-15735: Summary: [C++] Hash aggregate functions to return first and last value from a group. Key: ARROW-15735 URL: https://issues.apache.org/jira/browse/ARROW-15735 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: A. Coady
Follow-up to ARROW-13993, which implemented `hash_one` to select an arbitrary value, as the core engine lacks support for ordering. I think `first` and `last` will still be in demand though, based on pandas and SQL usage. It could be done without core changes by using `min_max` on an array of indices. For that reason, maybe it would be better as `hash_{first,last}_index`, suitable for use with `take`.
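The index-based workaround can be sketched in plain Python: group row indices by key, take the min and max index per group (playing the role of `min_max` on indices), then look up the corresponding values (the `take` step). `hash_first_last_index` is a hypothetical name for illustration:

```python
from collections import defaultdict

def hash_first_last_index(keys):
    """Map each key to the (first, last) row index where it occurs."""
    groups = defaultdict(list)
    for i, key in enumerate(keys):
        groups[key].append(i)
    # min/max over each group's indices stands in for `min_max` on indices.
    return {key: (min(indices), max(indices)) for key, indices in groups.items()}

keys = ['a', 'b', 'a', 'b', 'a']
values = [10, 20, 30, 40, 50]
# The `take` step: turn the index pairs back into first/last values.
firsts = {key: values[first] for key, (first, last) in hash_first_last_index(keys).items()}
lasts = {key: values[last] for key, (first, last) in hash_first_last_index(keys).items()}
```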
[jira] [Created] (ARROW-15412) [C++][Python] Slicing a table with no columns returns a table with incorrect length.
A. Coady created ARROW-15412: Summary: [C++][Python] Slicing a table with no columns returns a table with incorrect length. Key: ARROW-15412 URL: https://issues.apache.org/jira/browse/ARROW-15412 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 6.0.1 Reporter: A. Coady
Python `[:]` slicing works on tables with no columns, because the slice inputs are normalized. But the `slice` method is inconsistent.
{code:python}
In [1]: import pyarrow as pa

In [2]: table = pa.table({'col': range(3)})

In [3]: table.slice(1).num_rows
Out[3]: 2

In [4]: table.select([])[1:].num_rows
Out[4]: 2

In [5]: table.select([]).slice(1).num_rows
Out[5]: 3

In [6]: table.select([]).slice(1, 4).num_rows
Out[6]: 4
{code}
[jira] [Created] (ARROW-15318) [C++][Python] Regression reading partition keys of large batches.
A. Coady created ARROW-15318: Summary: [C++][Python] Regression reading partition keys of large batches. Key: ARROW-15318 URL: https://issues.apache.org/jira/browse/ARROW-15318 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 7.0.0 Reporter: A. Coady
In a partitioned dataset with chunks larger than the default 1Gi batch size, reading _only_ the partition keys hangs and consumes unbounded memory. The bug first appeared in nightly build `7.0.0.dev468`.
{code:python}
In [1]: import pyarrow as pa, pyarrow.parquet as pq, numpy as np

In [2]: pa.__version__
Out[2]: '7.0.0.dev468'

In [3]: table = pa.table({'key': pa.repeat(0, 2 ** 20 + 1), 'value': np.arange(2 ** 20 + 1)})

In [4]: pq.write_to_dataset(table[:2 ** 20], 'one', partition_cols=['key'])

In [5]: pq.write_to_dataset(table[:2 ** 20 + 1], 'two', partition_cols=['key'])

In [6]: pq.read_table('one', columns=['key'])['key'].num_chunks
Out[6]: 1

In [7]: pq.read_table('two', columns=['key', 'value'])['key'].num_chunks
Out[7]: 2

In [8]: pq.read_table('two', columns=['key'])['key'].num_chunks
zsh: killed ipython  # hangs; killed
{code}
[jira] [Created] (ARROW-15210) [Python] pyarrow.compute functions don't convert args with __arrow_array__.
A. Coady created ARROW-15210: Summary: [Python] pyarrow.compute functions don't convert args with __arrow_array__. Key: ARROW-15210 URL: https://issues.apache.org/jira/browse/ARROW-15210 Project: Apache Arrow Issue Type: New Feature Components: Python Affects Versions: 6.0.1 Reporter: A. Coady
When the compute functions pack args, lists and numpy arrays are converted into arrow arrays, but objects with `__arrow_array__` defined aren't. A TypeError is raised.
[jira] [Created] (ARROW-15202) Create pyarrow array using an object's `__array__` method.
A. Coady created ARROW-15202: Summary: Create pyarrow array using an object's `__array__` method. Key: ARROW-15202 URL: https://issues.apache.org/jira/browse/ARROW-15202 Project: Apache Arrow Issue Type: New Feature Components: Python Affects Versions: 6.0.1 Reporter: A. Coady
`pa.array` supports optimized creation from an object with the `__arrow_array__` method, or from a literal NumPy ndarray. But there's a performance gap if the input object has only an `__array__` method, as it isn't used. So the user has to know to call `np.asarray` first. And even if the original object could be extended to support `__arrow_array__`, it doesn't seem like a great workaround if all that method would do is call `pa.array(np.asarray(self))`.
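The requested fallback order can be sketched in plain Python: prefer `__arrow_array__`, then `__array__`, then the generic element-wise path. This illustrates the proposal only; it is not pyarrow's actual conversion logic, and `Wrapper` is a hypothetical class standing in for any NumPy-convertible object:

```python
def convert(obj, generic):
    """Convert obj using the richest protocol it supports."""
    if hasattr(obj, '__arrow_array__'):
        return obj.__arrow_array__()     # already Arrow-aware
    if hasattr(obj, '__array__'):
        return generic(obj.__array__())  # proposed NumPy fast path
    return generic(obj)                  # element-wise type inference

class Wrapper:
    """Hypothetical object exposing only __array__."""
    def __init__(self, data):
        self.data = data
    def __array__(self):
        return list(self.data)  # a real implementation would return an ndarray

arr = convert(Wrapper((1, 2, 3)), list)  # [1, 2, 3], via __array__
```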
[jira] [Created] (ARROW-15152) Implement a `hash_list` aggregate function.
A. Coady created ARROW-15152: Summary: Implement a `hash_list` aggregate function. Key: ARROW-15152 URL: https://issues.apache.org/jira/browse/ARROW-15152 Project: Apache Arrow Issue Type: New Feature Components: C++, Python Affects Versions: 6.0.1 Reporter: A. Coady
For more advanced aggregations, it's helpful to be able to gather the grouped values into a list array. Pandas and Polars both have that feature. And `hash_distinct` already aggregates to lists, so all the building blocks are there.
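What `hash_list` would compute can be sketched in plain Python: collect each group's values into a list, in encounter order. (`hash_list` is the name proposed here, not an existing function at this version.)

```python
from collections import defaultdict

def hash_list(keys, values):
    """Group values into lists by key, preserving encounter order."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return dict(groups)

grouped = hash_list(['a', 'b', 'a'], [1, 2, 3])  # {'a': [1, 3], 'b': [2]}
```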
[jira] [Created] (ARROW-14129) [C++] An empty dictionary array crashes on `unique` and `value_counts`.
A. Coady created ARROW-14129: Summary: [C++] An empty dictionary array crashes on `unique` and `value_counts`. Key: ARROW-14129 URL: https://issues.apache.org/jira/browse/ARROW-14129 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 5.0.0 Reporter: A. Coady
{code:python}
import pyarrow as pa

arr = pa.array(range(3)).dictionary_encode()
assert not arr[:0]
assert arr[:0].unique()  # Check failed: (data->dictionary) != (nullptr)
{code}
[jira] [Created] (ARROW-13688) [Python] Expose GroupBy function.
A. Coady created ARROW-13688: Summary: [Python] Expose GroupBy function. Key: ARROW-13688 URL: https://issues.apache.org/jira/browse/ARROW-13688 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 5.0.0 Reporter: A. Coady
The hash_aggregate functions are in `pyarrow.compute`, but they're not directly callable. It looks like GroupBy is unavailable in Python.
[jira] [Created] (ARROW-13522) Regression with compute `utf8_*trim` functions.
A. Coady created ARROW-13522: Summary: Regression with compute `utf8_*trim` functions. Key: ARROW-13522 URL: https://issues.apache.org/jira/browse/ARROW-13522 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 5.0.0 Reporter: A. Coady
{code:python}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["ab", "ac"])
assert pc.utf8_ltrim(arr, characters="a").to_pylist() == ["b", ""]
assert pc.utf8_rtrim(arr, characters="b").to_pylist() == ["a", "a"]
{code}
Seems to go awry after the first match.
[jira] [Created] (ARROW-12906) `fill_null` called with a null value seg faults on non fixed-sized types.
A. Coady created ARROW-12906: Summary: `fill_null` called with a null value seg faults on non fixed-sized types. Key: ARROW-12906 URL: https://issues.apache.org/jira/browse/ARROW-12906 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 4.0.0 Environment: macOS, ubuntu Reporter: A. Coady
{code:python}
import pyarrow as pa

assert pa.array([0]).fill_null(None)
pa.array([""]).fill_null(None)  # crash
{code}
[jira] [Created] (ARROW-12606) Quantile function failing on list scalars.
A. Coady created ARROW-12606: Summary: Quantile function failing on list scalars. Key: ARROW-12606 URL: https://issues.apache.org/jira/browse/ARROW-12606 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 4.0.0 Environment: macOS, ubuntu Reporter: A. Coady
It may sound unrelated, but this is the simplest example I've gotten to reproduce.
{code:python}
import pyarrow as pa, pyarrow.compute as pc

array = pa.array([[0], [1]])
first, second = [pc.quantile(scalar.values) for scalar in array]
assert first.to_pylist() == [0]
assert second.to_pylist() == [1], second  # 7.20576e+16
{code}
[jira] [Created] (ARROW-11694) Sorting a non-chunked array may seg fault.
A. Coady created ARROW-11694: Summary: Sorting a non-chunked array may seg fault. Key: ARROW-11694 URL: https://issues.apache.org/jira/browse/ARROW-11694 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 3.0.0 Environment: macOS, ubuntu Reporter: A. Coady
Sorting a non-chunked array has some sort of reference problem. Some operations on the resulting indices array crash.
{code:python}
import pyarrow as pa, pyarrow.compute as pc

array = pa.array(list("abcba"))
assert pc.sort_indices(pa.chunked_array([array])).take([0])
assert pc.array_sort_indices(pa.chunked_array([array])).take([0])
pc.sort_indices(array).take([0])        # crash
pc.array_sort_indices(array).take([0])  # crash
{code}
[jira] [Created] (ARROW-8911) An empty ChunkedArray created by `filter` can crash.
A. Coady created ARROW-8911: Summary: An empty ChunkedArray created by `filter` can crash. Key: ARROW-8911 URL: https://issues.apache.org/jira/browse/ARROW-8911 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.1 Environment: macOS, ubuntu Reporter: A. Coady
{code:python}
import pyarrow as pa

arr = pa.chunked_array([[1]])
empty = arr.filter(pa.array([False]))
print(empty)
print(empty[:])  # <- crash
{code}
[jira] [Resolved] (ARROW-8685) [Python] ImportError with NumPy<1.16.
[ https://issues.apache.org/jira/browse/ARROW-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Coady resolved ARROW-8685. Resolution: Not A Bug
Thanks. Confirmed that this is Python 3.8 only, and numpy 1.15 doesn't officially support 3.8. So nothing to fix.
[jira] [Updated] (ARROW-8685) [Python] ImportError with NumPy<1.16.
[ https://issues.apache.org/jira/browse/ARROW-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Coady updated ARROW-8685: Description:
{noformat}
# pip install 'numpy<1.16' pyarrow
...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{noformat}
Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to be >=1.16. This is related to ARROW-7852; users will still see an ImportError if an older NumPy was already installed.
[jira] [Created] (ARROW-8685) [Python] ImportError with NumPy<1.16.
A. Coady created ARROW-8685: Summary: [Python] ImportError with NumPy<1.16. Key: ARROW-8685 URL: https://issues.apache.org/jira/browse/ARROW-8685 Project: Apache Arrow Issue Type: Bug Components: Packaging, Python Affects Versions: 0.17.0 Reporter: A. Coady
{noformat}
# pip install 'numpy<1.16' pyarrow
...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{noformat}
Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to be >=1.16. This is related to ARROW-7852; users will still see an ImportError if an older NumPy was already installed.
[jira] [Comment Edited] (ARROW-7852) [Python] 0.16.0 wheels not compatible with older numpy
[ https://issues.apache.org/jira/browse/ARROW-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098512#comment-17098512 ] A. Coady edited comment on ARROW-7852 at 5/3/20, 5:57 PM:
This is still an issue with an existing numpy installed.
{code:java}
# pip install 'numpy<1.16' pyarrow
...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{code}
I think the setup `install_requires` needs to specify numpy>=1.16.
[jira] [Commented] (ARROW-7852) [Python] 0.16.0 wheels not compatible with older numpy
[ https://issues.apache.org/jira/browse/ARROW-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098512#comment-17098512 ] A. Coady commented on ARROW-7852:
This is still an issue with an existing numpy installed.
{code:java}
# pip install 'numpy<1.16' pyarrow
Collecting numpy<1.16
  Downloading numpy-1.15.4.zip (4.5 MB)
Collecting pyarrow
  Downloading pyarrow-0.17.0-cp38-cp38-manylinux2014_x86_64.whl (63.8 MB)
Building wheels for collected packages: numpy
  Building wheel for numpy (setup.py) ... done
  Created wheel for numpy: filename=numpy-1.15.4-cp38-cp38-linux_x86_64.whl size=13772718 sha256=cec36267c8aee27facc89dfba5910937d8e31b19eb4e3ee68beb9c3e936d7ee8
  Stored in directory: /root/.cache/pip/wheels/d6/69/4d/48915a531b781ba9f19dd1d5c3da7e46303e31b6ad5b726d84
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
root@d5515e090702:/# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{code}
I think the setup `install_requires` needs to specify numpy>=1.16.

> [Python] 0.16.0 wheels not compatible with older numpy
>
> Key: ARROW-7852
> URL: https://issues.apache.org/jira/browse/ARROW-7852
> Project: Apache Arrow
> Issue Type: Bug
> Components: Packaging, Python
> Affects Versions: 0.16.0
> Reporter: Stephanie Gott
> Assignee: Krisztian Szucs
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.17.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Using python 3.7.5 and numpy 1.14.6, I am unable to import pyarrow 0.16.0 (see below for error). Updating numpy to the most recent version fixes this, and I'm wondering if pyarrow needs to update its requirements.txt.
>
> {code:java}
> ➜ ~ ipython
> Python 3.7.5 (default, Nov 7 2019, 10:50:52)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
>
> In [1]: import numpy as np
> In [2]: np.__version__
> Out[2]: '1.14.6'
>
> In [3]: import pyarrow
> ---------------------------------------------------------------------------
> ModuleNotFoundError                       Traceback (most recent call last)
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/lib.pyx in init pyarrow.lib()
> ImportError: numpy.core.multiarray failed to import
>
> In [4]: import pyarrow
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/ipc.pxi in init pyarrow.lib()
> AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'
> {code}