Fabian Höring created ARROW-5651:
------------------------------------

             Summary: [Python] Incorrect conversion from strided Numpy array when other type is specified
                 Key: ARROW-5651
                 URL: https://issues.apache.org/jira/browse/ARROW-5651
             Project: Apache Arrow
          Issue Type: Improvement
    Affects Versions: 0.12.0
            Reporter: Fabian Höring

In the example below, pyarrow returns wrong results for a strided numpy array when a different target type is specified:

{code}
>>> import pyarrow as pa
>>> import numpy as np
>>> import pandas as pd
>>> p_s = pd.Series(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>>> pa.array(p_s, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8453de8138>
[
  1,
  2,
  3,
  4
]
{code}

When the numpy array is first copied to a new (contiguous) location, it gives the expected output:

{code}
>>> import pyarrow as pa
>>> import numpy as np
>>> import pandas as pd
>>> p_s = pd.Series(np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2]))
>>> pa.array(p_s, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
[
  1,
  3,
  5,
  7
]
{code}

Looking at the [code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407], it seems that the stride in elements is computed from the target type instead of the source numpy type. In this case the stride is 8 bytes, which corresponds to 2 float32 elements, whereas the code divides by the size of the target type, which gives 1 float64 element. It therefore reads the source buffer element by element instead of every second element until it reaches the total number of elements.
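
The stride arithmetic described above can be reproduced with numpy alone. The sketch below is only a model of the suspected behaviour, not the actual C++ conversion code: the helper `read_with_step` and the variable names are hypothetical, and it assumes the converter walks the source buffer with a step of `byte_stride / itemsize`.

```python
import numpy as np

base = np.arange(0, 10, dtype=np.float32)
view = base[1:-1:2]  # strided view holding 1, 3, 5, 7

# Distance in bytes between two consecutive elements of the view.
byte_stride = view.strides[0]

# Step in elements when dividing by the source item size (float32, 4 bytes).
step_src = byte_stride // view.dtype.itemsize
# Step in elements when dividing by the target item size (float64, 8 bytes).
step_dst = byte_stride // np.dtype(np.float64).itemsize


def read_with_step(buffer, step, length):
    """Hypothetical helper: take `length` values from `buffer`, `step` slots apart."""
    return [float(buffer[i * step]) for i in range(length)]


# Contiguous float32 data starting at the first element of the view.
from_first = base[1:]

correct = read_with_step(from_first, step_src, len(view))
wrong = read_with_step(from_first, step_dst, len(view))

# Workaround from the report: copy to a contiguous array before converting.
contiguous = np.ascontiguousarray(view)

print(byte_stride, step_src, step_dst)  # 8 2 1
print(correct)                          # [1.0, 3.0, 5.0, 7.0]
print(wrong)                            # [1.0, 2.0, 3.0, 4.0]
print(contiguous.strides)               # (4,)
```

With the source item size the step is 2 and the expected values come back. With the target item size the step collapses to 1, which matches the wrong output shown in the report. After the copy the stride equals the float32 item size, so either divisor choice no longer skips or repeats source elements in this model only if the source item size is used; the copy simply removes the striding that triggers the problem.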