[jira] [Created] (ARROW-18433) Optimize aggregate functions to work with batches.

2022-12-10 Thread A. Coady (Jira)
A. Coady created ARROW-18433:


 Summary: Optimize aggregate functions to work with batches.
 Key: ARROW-18433
 URL: https://issues.apache.org/jira/browse/ARROW-18433
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Affects Versions: 10.0.1
Reporter: A. Coady


Most compute functions work with the dataset API without requiring whole columns to be loaded. Aggregate functions that are associative could also work that way: `min`, `max`, `any`, `all`, `sum`, `product`, and even `unique` and `value_counts`.

A couple of implementation ideas:
 * expand the dataset api to support expressions which return scalars
 * add a `BatchedArray` type which is like a `ChunkedArray` but with lazy 
loading
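
For context, a minimal sketch of the batch-wise fold users currently have to write by hand (the dataset path and column name are hypothetical):
{code:python}
import pyarrow.compute as pc
import pyarrow.dataset as ds

dataset = ds.dataset('path/to/dataset')  # hypothetical dataset

# `sum` is associative, so it can be folded over batches without
# materializing the whole column; this is the pattern the issue asks to build in.
total = 0
for batch in dataset.to_batches(columns=['x']):  # hypothetical column name
    value = pc.sum(batch.column(0)).as_py()
    total += value or 0  # an empty/all-null batch sums to null
{code}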



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18432) [Python] Array constructor doesn't support arrow scalars.

2022-12-10 Thread A. Coady (Jira)
A. Coady created ARROW-18432:


 Summary: [Python] Array constructor doesn't support arrow scalars.
 Key: ARROW-18432
 URL: https://issues.apache.org/jira/browse/ARROW-18432
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 10.0.1
Reporter: A. Coady


{code:python}
pa.array([pa.scalar(0)])
ArrowInvalid: Could not convert  with type 
pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an 
Arrow data type

pa.array([pa.scalar(0)], 'int64')
ArrowInvalid: Could not convert  with type 
pyarrow.lib.Int64Scalar: tried to convert to int64{code}
It seems odd that the array constructors don't recognize their own scalars.

In practice, a list of scalars has to be converted with `.as_py()` just to be 
converted back, and that also loses the type information.
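
A minimal sketch of the current workaround, restating the type so it isn't lost:
{code:python}
import pyarrow as pa

scalars = [pa.scalar(0), pa.scalar(1)]

# Workaround today: unwrap to Python objects and pass the type back explicitly.
arr = pa.array([s.as_py() for s in scalars], type=scalars[0].type)
{code}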

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-16277) [Python] No builds for macOS arm64.

2022-04-21 Thread A. Coady (Jira)
A. Coady created ARROW-16277:


 Summary: [Python] No builds for macOS arm64.
 Key: ARROW-16277
 URL: https://issues.apache.org/jira/browse/ARROW-16277
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Affects Versions: 8.0.0
 Environment: macOS
Reporter: A. Coady


Nightly builds no longer include a build for macOS arm64. The last one to do so was 8.0.0.dev312.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16015) Scanning batch size is limited to 65536 (2**16).

2022-03-23 Thread A. Coady (Jira)
A. Coady created ARROW-16015:


 Summary: Scanning batch size is limited to 65536 (2**16).
 Key: ARROW-16015
 URL: https://issues.apache.org/jira/browse/ARROW-16015
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 7.0.0, 8.0.0
 Environment: macOS
Reporter: A. Coady


[Scanning batches|https://arrow.apache.org/docs/python/dataset.html#iterative-out-of-core-or-streaming-reads] is documented to default to a batch size of 1,000,000. But the actual behavior is that the batch size defaults to, and is capped at, 65536.
{code:python}
In []: dataset.count_rows()
Out[]: 538038292

In []: next(dataset.to_batches()).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**6)).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**4)).num_rows
Out[]: 1
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15748) [Python] Round temporal options default unit is `day` but documented as `second`.

2022-02-21 Thread A. Coady (Jira)
A. Coady created ARROW-15748:


 Summary: [Python] Round temporal options default unit is `day` but 
documented as `second`.
 Key: ARROW-15748
 URL: https://issues.apache.org/jira/browse/ARROW-15748
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: A. Coady


The [Python documentation for round temporal options|https://arrow.apache.org/docs/dev/python/generated/pyarrow.compute.RoundTemporalOptions.html] says the default unit is `second`, but the [actual behavior|https://arrow.apache.org/docs/dev/cpp/api/compute.html#classarrow_1_1compute_1_1_round_temporal_options] uses a default of `day`.
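
A minimal sketch of the mismatch, using the default options (the timestamp value is chosen only for illustration):
{code:python}
from datetime import datetime
import pyarrow as pa, pyarrow.compute as pc

ts = pa.array([datetime(2022, 2, 21, 13, 45, 30)])

# With default RoundTemporalOptions this rounds to a day boundary,
# not to the nearest second as the Python docs state.
print(pc.round_temporal(ts))
{code}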



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15736) [C++] Aggregate functions for min and max index.

2022-02-20 Thread A. Coady (Jira)
A. Coady created ARROW-15736:


 Summary: [C++] Aggregate functions for min and max index.
 Key: ARROW-15736
 URL: https://issues.apache.org/jira/browse/ARROW-15736
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: A. Coady


NumPy and Pandas both have `argmin` and `argmax`, for the common use case of finding the values in parallel arrays that correspond to the min or max values. Proposals:
 * `min_max_index` for arrays
 * `hash_min_max_index` for aggregations
 * some ability to break ties:
 ** `min_max_index` for tables with multiple sort keys, similar to 
`sort_indices`
 ** `min_max_indices` for arrays to match all equal values
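
For reference, a minimal sketch of how the single-array case can be emulated today with two passes (`min_max` followed by `index`, first match only), which a dedicated kernel would replace:
{code:python}
import pyarrow as pa, pyarrow.compute as pc

arr = pa.array([3, 1, 4, 1, 5])

# Two passes: compute the extrema, then search for their first positions.
mm = pc.min_max(arr).as_py()           # {'min': 1, 'max': 5}
min_index = pc.index(arr, mm['min'])   # 1
max_index = pc.index(arr, mm['max'])   # 4
{code}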



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15735) [C++] Hash aggregate functions to return first and last value from a group.

2022-02-20 Thread A. Coady (Jira)
A. Coady created ARROW-15735:


 Summary: [C++] Hash aggregate functions to return first and last 
value from a group.
 Key: ARROW-15735
 URL: https://issues.apache.org/jira/browse/ARROW-15735
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: A. Coady


Follow-up to ARROW-13993, which implemented `hash_one` to select an arbitrary value, because the core engine lacks support for ordering. I think `first` and `last` will still be in demand though, based on pandas and SQL usage.

It could be done without core changes by using `min_max` on an array of 
indices. For that reason, maybe it would be better as 
`hash_\{first,last}_index`, suitable for use with `take`.
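
A minimal sketch of the index-based approach described above, assuming `Table.group_by` and its `<column>_<aggregate>` output naming convention:
{code:python}
import pyarrow as pa

table = pa.table({'key': ['a', 'a', 'b'], 'value': [1, 2, 3]})

# Aggregate min_max over row indices, then `take` the corresponding values.
indexed = table.append_column('index', pa.array(range(len(table))))
grouped = indexed.group_by('key').aggregate([('index', 'min_max')])

# Assumes the aggregate column is named 'index_min_max' (a struct of min/max).
mm = grouped['index_min_max'].combine_chunks()
first_values = table['value'].take(mm.field('min'))  # first value per group
last_values = table['value'].take(mm.field('max'))   # last value per group
{code}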



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15412) [C++][Python] Slicing a table with no columns returns a table with incorrect length.

2022-01-21 Thread A. Coady (Jira)
A. Coady created ARROW-15412:


 Summary: [C++][Python] Slicing a table with no columns returns a 
table with incorrect length.
 Key: ARROW-15412
 URL: https://issues.apache.org/jira/browse/ARROW-15412
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 6.0.1
Reporter: A. Coady


Python `[:]` slicing works on tables with no columns, because the slice inputs 
are normalized. But the `slice` method is inconsistent.
{code:python}
In [1]: import pyarrow as pa

In [2]: table = pa.table({'col': range(3)})

In [3]: table.slice(1).num_rows
Out[3]: 2

In [4]: table.select([])[1:].num_rows
Out[4]: 2

In [5]: table.select([]).slice(1).num_rows
Out[5]: 3

In [6]: table.select([]).slice(1, 4).num_rows
Out[6]: 4
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15318) [C++][Python] Regression reading partition keys of large batches.

2022-01-12 Thread A. Coady (Jira)
A. Coady created ARROW-15318:


 Summary: [C++][Python] Regression reading partition keys of large 
batches.
 Key: ARROW-15318
 URL: https://issues.apache.org/jira/browse/ARROW-15318
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 7.0.0
Reporter: A. Coady


In a partitioned dataset with chunks larger than the default 1Mi (2**20) row batch size, reading _only_ the partition keys hangs and consumes unbounded memory. The bug first appeared in nightly build `7.0.0.dev468`.

{code:python}
In [1]: import pyarrow as pa, pyarrow.parquet as pq, numpy as np

In [2]: pa.__version__
Out[2]: '7.0.0.dev468'

In [3]: table = pa.table({'key': pa.repeat(0, 2 ** 20 + 1), 'value': 
np.arange(2 ** 20 + 1)})

In [4]: pq.write_to_dataset(table[:2 ** 20], 'one', partition_cols=['key'])

In [5]: pq.write_to_dataset(table[:2 ** 20 + 1], 'two', partition_cols=['key'])

In [6]: pq.read_table('one', columns=['key'])['key'].num_chunks
Out[6]: 1

In [7]: pq.read_table('two', columns=['key', 'value'])['key'].num_chunks
Out[7]: 2

In [8]: pq.read_table('two', columns=['key'])['key'].num_chunks
zsh: killed ipython  # hangs; killed
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15210) [Python] pyarrow.compute functions don't convert args with __arrow_array__.

2021-12-28 Thread A. Coady (Jira)
A. Coady created ARROW-15210:


 Summary: [Python] pyarrow.compute functions don't convert args 
with __arrow_array__.
 Key: ARROW-15210
 URL: https://issues.apache.org/jira/browse/ARROW-15210
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Affects Versions: 6.0.1
Reporter: A. Coady


When the compute functions pack their arguments, lists and NumPy arrays are converted into Arrow arrays, but objects that define `__arrow_array__` aren't; a TypeError is raised instead.
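
A minimal hypothetical reproducer (the `Wrapper` class is illustrative, not from the report):
{code:python}
import pyarrow as pa, pyarrow.compute as pc

class Wrapper:
    """Minimal object implementing the __arrow_array__ protocol."""
    def __init__(self, values):
        self.values = values

    def __arrow_array__(self, type=None):
        return pa.array(self.values, type=type)

wrapped = Wrapper([1, 2, 3])
assert pa.array(wrapped).to_pylist() == [1, 2, 3]  # pa.array honors the protocol
pc.sum(wrapped)  # reported to raise TypeError instead of converting
{code}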




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15202) Create pyarrow array using an object's `__array__` method.

2021-12-24 Thread A. Coady (Jira)
A. Coady created ARROW-15202:


 Summary: Create pyarrow array using an object's `__array__` method.
 Key: ARROW-15202
 URL: https://issues.apache.org/jira/browse/ARROW-15202
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Affects Versions: 6.0.1
Reporter: A. Coady


`pa.array` supports optimized creation from an object with the 
`__arrow_array__` method, or from a literal NumPy ndarray. But there's a 
performance gap if the input object has only an `__array__` method, as it isn't 
used.

 

So the user has to know to call `np.asarray` first. And even if the original object could be extended to support `__arrow_array__`, it doesn't seem like a great workaround if all that method would do is call `pa.array(np.asarray(self))`.
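
A minimal sketch of the gap, with a hypothetical container class that exposes only `__array__`:
{code:python}
import numpy as np
import pyarrow as pa

class Data:
    """Hypothetical container exposing only the __array__ protocol."""
    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None):
        return self._values.astype(dtype) if dtype else self._values

data = Data(range(10_000))
arr = pa.array(np.asarray(data))  # current workaround: go through NumPy explicitly
{code}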

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15152) Implement a `hash_list` aggregate function.

2021-12-19 Thread A. Coady (Jira)
A. Coady created ARROW-15152:


 Summary: Implement a `hash_list` aggregate function.
 Key: ARROW-15152
 URL: https://issues.apache.org/jira/browse/ARROW-15152
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Affects Versions: 6.0.1
Reporter: A. Coady


For more advanced aggregations, it's helpful to be able to gather the grouped 
values into a list array. Pandas and Polars both have that feature. And 
`hash_distinct` already aggregates to lists, so all the building blocks are 
there.
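
For comparison, the pandas equivalent of the requested `hash_list` aggregation:
{code:python}
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b'], 'value': [1, 2, 3]})
df.groupby('key')['value'].agg(list)
# key
# a    [1, 2]
# b       [3]
{code}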



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-14129) [C++] An empty dictionary array crashes on `unique` and `value_counts`.

2021-09-24 Thread A. Coady (Jira)
A. Coady created ARROW-14129:


 Summary: [C++] An empty dictionary array crashes on `unique` and `value_counts`.
 Key: ARROW-14129
 URL: https://issues.apache.org/jira/browse/ARROW-14129
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 5.0.0
Reporter: A. Coady


{code:python}
import pyarrow as pa
arr = pa.array(range(3)).dictionary_encode()
assert not arr[:0]
assert arr[:0].unique() # Check failed: (data->dictionary) != (nullptr) 
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13688) [Python] Expose GroupBy function.

2021-08-21 Thread A. Coady (Jira)
A. Coady created ARROW-13688:


 Summary: [Python] Expose GroupBy function.
 Key: ARROW-13688
 URL: https://issues.apache.org/jira/browse/ARROW-13688
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 5.0.0
Reporter: A. Coady


The hash_aggregate functions are registered in `pyarrow.compute`, but they're not directly callable. It looks like GroupBy is not exposed in Python.
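
A minimal sketch of the gap, using one of the registered kernel names (e.g. `hash_sum`):
{code:python}
import pyarrow.compute as pc

# The kernel is registered with the compute layer...
func = pc.get_function('hash_sum')
print(func.kind)  # 'hash_aggregate'

# ...but there is no public Python entry point that drives it: the grouping
# machinery that supplies group ids is only reachable from C++.
{code}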



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13522) Regression with compute `utf8_*trim` functions.

2021-08-01 Thread A. Coady (Jira)
A. Coady created ARROW-13522:


 Summary: Regression with compute `utf8_*trim` functions.
 Key: ARROW-13522
 URL: https://issues.apache.org/jira/browse/ARROW-13522
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 5.0.0
Reporter: A. Coady


{code:python}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["ab", "ac"])
assert pc.utf8_ltrim(arr, characters="a").to_pylist() == ["b", ""]
assert pc.utf8_rtrim(arr, characters="b").to_pylist() == ["a", "a"]
{code}

Seems to go awry after the first match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12906) `fill_null` called with a null value seg faults on non fixed-sized types.

2021-05-29 Thread A. Coady (Jira)
A. Coady created ARROW-12906:


 Summary: `fill_null` called with a null value seg faults on non 
fixed-sized types.
 Key: ARROW-12906
 URL: https://issues.apache.org/jira/browse/ARROW-12906
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 4.0.0
 Environment: macOS, ubuntu
Reporter: A. Coady


 
{code:python}
import pyarrow as pa

assert pa.array([0]).fill_null(None)
pa.array([""]).fill_null(None)  # crash
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12606) Quantile function failing on list scalars.

2021-04-29 Thread A. Coady (Jira)
A. Coady created ARROW-12606:


 Summary: Quantile function failing on list scalars.
 Key: ARROW-12606
 URL: https://issues.apache.org/jira/browse/ARROW-12606
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 4.0.0
 Environment: macOS, ubuntu
Reporter: A. Coady


The list scalars sound unrelated, but this is the simplest example I've found that reproduces the failure.
 
{code:python}
import pyarrow as pa, pyarrow.compute as pc

array = pa.array([[0], [1]])
first, second = [pc.quantile(scalar.values) for scalar in array]
assert first.to_pylist() == [0]
assert second.to_pylist() == [1], second  # 7.20576e+16
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11694) Sorting a non-chunked array may seg fault.

2021-02-18 Thread A. Coady (Jira)
A. Coady created ARROW-11694:


 Summary: Sorting a non-chunked array may seg fault.
 Key: ARROW-11694
 URL: https://issues.apache.org/jira/browse/ARROW-11694
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 3.0.0
 Environment: macOS, ubuntu
Reporter: A. Coady


Sorting a non-chunked array has some sort of reference problem. Some operations 
on the resulting indices array crash.
{code:python}
import pyarrow as pa, pyarrow.compute as pc

array = pa.array(list("abcba"))
assert pc.sort_indices(pa.chunked_array([array])).take([0])
assert pc.array_sort_indices(pa.chunked_array([array])).take([0])
pc.sort_indices(array).take([0])  # crash
pc.array_sort_indices(array).take([0])  # crash
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8911) An empty ChunkedArray created by `filter` can crash.

2020-05-23 Thread A. Coady (Jira)
A. Coady created ARROW-8911:
---

 Summary: An empty ChunkedArray created by `filter` can crash.
 Key: ARROW-8911
 URL: https://issues.apache.org/jira/browse/ARROW-8911
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
 Environment: macOS, ubuntu
Reporter: A. Coady


{code:python}
import pyarrow as pa
arr = pa.chunked_array([[1]])
empty = arr.filter(pa.array([False]))
print(empty)
print(empty[:]) # <- crash
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8685) [Python] ImportError with NumPy<1.16.

2020-05-04 Thread A. Coady (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Coady resolved ARROW-8685.
-
Resolution: Not A Bug

Thanks.  Confirmed that this is Python 3.8 only, and numpy 1.15 doesn't 
officially support 3.8.  So nothing to fix.

> [Python] ImportError with NumPy<1.16.
> -
>
> Key: ARROW-8685
> URL: https://issues.apache.org/jira/browse/ARROW-8685
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.17.0
>Reporter: A. Coady
>Priority: Major
>
> {noformat}
> # pip install 'numpy<1.16' pyarrow...
> Successfully built numpy
> Installing collected packages: numpy, pyarrow
> Successfully installed numpy-1.15.4 pyarrow-0.17.0
> # python -c 'import pyarrow'
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
>   File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
> ImportError: numpy.core.multiarray failed to import
> {noformat}
> Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to 
> be >=1.16.  
> This is related to ARROW-7852; users will still see an ImportError if an 
> older NumPy was already installed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8685) [Python] ImportError with NumPy<1.16.

2020-05-03 Thread A. Coady (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Coady updated ARROW-8685:

Description: 
{noformat}
# pip install 'numpy<1.16' pyarrow...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{noformat}




Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to be 
>=1.16.  

This is related to ARROW-7852; users will still see an ImportError if an older 
NumPy was already installed.

  was:
# pip install 'numpy<1.16' pyarrow...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to be 
>=1.16.  

This is related to ARROW-7852; users will still see an ImportError if an older 
NumPy was already installed.


> [Python] ImportError with NumPy<1.16.
> -
>
> Key: ARROW-8685
> URL: https://issues.apache.org/jira/browse/ARROW-8685
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.17.0
>Reporter: A. Coady
>Priority: Major
>
> {noformat}
> # pip install 'numpy<1.16' pyarrow...
> Successfully built numpy
> Installing collected packages: numpy, pyarrow
> Successfully installed numpy-1.15.4 pyarrow-0.17.0
> # python -c 'import pyarrow'
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
>   File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
> ImportError: numpy.core.multiarray failed to import
> {noformat}
> Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to 
> be >=1.16.  
> This is related to ARROW-7852; users will still see an ImportError if an 
> older NumPy was already installed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8685) [Python] ImportError with NumPy<1.16.

2020-05-03 Thread A. Coady (Jira)
A. Coady created ARROW-8685:
---

 Summary: [Python] ImportError with NumPy<1.16.
 Key: ARROW-8685
 URL: https://issues.apache.org/jira/browse/ARROW-8685
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Affects Versions: 0.17.0
Reporter: A. Coady


# pip install 'numpy<1.16' pyarrow...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
Arrow's setup.py requires numpy>=1.14, but the actual requirement appears to be 
>=1.16.  

This is related to ARROW-7852; users will still see an ImportError if an older 
NumPy was already installed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-7852) [Python] 0.16.0 wheels not compatible with older numpy

2020-05-03 Thread A. Coady (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098512#comment-17098512
 ] 

A. Coady edited comment on ARROW-7852 at 5/3/20, 5:57 PM:
--

This is still an issue with an existing numpy installed.
{code:java}
// # pip install 'numpy<1.16' pyarrow
...
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, 
in 
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{code}
I think the setup `install_requires` needs to specify numpy>=1.16.


was (Author: coady):
This is still an issue with an existing numpy installed.
{code:java}
// # pip install 'numpy<1.16' pyarrow
Collecting numpy<1.16
  Downloading numpy-1.15.4.zip (4.5 MB)
     || 4.5 MB 193 kB/s 
Collecting pyarrow
  Downloading pyarrow-0.17.0-cp38-cp38-manylinux2014_x86_64.whl (63.8 MB)
     || 63.8 MB 119 kB/s 
Building wheels for collected packages: numpy
  Building wheel for numpy (setup.py) ... done
  Created wheel for numpy: filename=numpy-1.15.4-cp38-cp38-linux_x86_64.whl 
size=13772718 
sha256=cec36267c8aee27facc89dfba5910937d8e31b19eb4e3ee68beb9c3e936d7ee8
  Stored in directory: 
/root/.cache/pip/wheels/d6/69/4d/48915a531b781ba9f19dd1d5c3da7e46303e31b6ad5b726d84
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
root@d5515e090702:/# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, 
in 
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{code}
I think the setup `install_requires` needs to specify numpy>=1.16.

> [Python] 0.16.0 wheels not compatible with older numpy
> --
>
> Key: ARROW-7852
> URL: https://issues.apache.org/jira/browse/ARROW-7852
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.16.0
>Reporter: Stephanie Gott
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Using python 3.7.5 and numpy 1.14.6, I am unable to import pyarrow 0.16.0 
> (see below for error). Updating numpy to the most recent version fixes this, 
> and I'm wondering if pyarrow needs to update its requirements.txt.
>  
> {code:java}
> ➜  ~ ipython
> Python 3.7.5 (default, Nov  7 2019, 10:50:52)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
> In [1]: import numpy as np
> In [2]: np.__version__
> Out[2]: '1.14.6'
> In [3]: import pyarrow
> ---------------------------------------------------------------------------
> ModuleNotFoundError                       Traceback (most recent call last)
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input-...> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/lib.pyx in init pyarrow.lib()
> ImportError: numpy.core.multiarray failed to import
> In [4]: import pyarrow
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-...> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/ipc.pxi in init pyarrow.lib()
> AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'
> {code}


[jira] [Commented] (ARROW-7852) [Python] 0.16.0 wheels not compatible with older numpy

2020-05-03 Thread A. Coady (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098512#comment-17098512
 ] 

A. Coady commented on ARROW-7852:
-

This is still an issue with an existing numpy installed.
{code:java}
// # pip install 'numpy<1.16' pyarrow
Collecting numpy<1.16
  Downloading numpy-1.15.4.zip (4.5 MB)
     || 4.5 MB 193 kB/s 
Collecting pyarrow
  Downloading pyarrow-0.17.0-cp38-cp38-manylinux2014_x86_64.whl (63.8 MB)
     || 63.8 MB 119 kB/s 
Building wheels for collected packages: numpy
  Building wheel for numpy (setup.py) ... done
  Created wheel for numpy: filename=numpy-1.15.4-cp38-cp38-linux_x86_64.whl 
size=13772718 
sha256=cec36267c8aee27facc89dfba5910937d8e31b19eb4e3ee68beb9c3e936d7ee8
  Stored in directory: 
/root/.cache/pip/wheels/d6/69/4d/48915a531b781ba9f19dd1d5c3da7e46303e31b6ad5b726d84
Successfully built numpy
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.15.4 pyarrow-0.17.0
root@d5515e090702:/# python -c 'import pyarrow'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 59, 
in 
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
{code}
I think the setup `install_requires` needs to specify numpy>=1.16.

> [Python] 0.16.0 wheels not compatible with older numpy
> --
>
> Key: ARROW-7852
> URL: https://issues.apache.org/jira/browse/ARROW-7852
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.16.0
>Reporter: Stephanie Gott
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Using python 3.7.5 and numpy 1.14.6, I am unable to import pyarrow 0.16.0 
> (see below for error). Updating numpy to the most recent version fixes this, 
> and I'm wondering if pyarrow needs to update its requirements.txt.
>  
> {code:java}
> ➜  ~ ipython
> Python 3.7.5 (default, Nov  7 2019, 10:50:52)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
> In [1]: import numpy as np
> In [2]: np.__version__
> Out[2]: '1.14.6'
> In [3]: import pyarrow
> ---------------------------------------------------------------------------
> ModuleNotFoundError                       Traceback (most recent call last)
> ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input-...> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/lib.pyx in init pyarrow.lib()
> ImportError: numpy.core.multiarray failed to import
> In [4]: import pyarrow
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-...> in <module>
> ----> 1 import pyarrow
> ~/.local/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ~/.local/lib/python3.7/site-packages/pyarrow/ipc.pxi in init pyarrow.lib()
> AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)