On Wed, Jul 23, 2014 at 9:34 PM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:
> On 23.07.2014 22:04, Robert Kern wrote:
>> On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor
>> <jtaylor.deb...@googlemail.com> wrote:
>>> On 23.07.2014 20:54, Robert Kern wrote:
>>>> On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
>>>> <jtaylor.deb...@googlemail.com> wrote:
>>>>> hi,
>>>>> it recently came to my attention that the default integer type in numpy
>>>>> on windows 64 bit is a 32 bit integer [0].
>>>>> This seems like quite a serious problem, as it means you can't use
>>>>> integers created from python integers (which become 32 bit) to index
>>>>> arrays larger than 2GB.
>>>>> For example np.product(array.shape), which will never overflow on linux
>>>>> and mac, can overflow on win64.
>>>>
>>>> Currently, on win64, we use Python long integer objects for `.shape`
>>>> and related attributes. I wonder if we could return numpy int64
>>>> scalars instead. Then np.product() (or anything else that consumes
>>>> these via np.asarray()) would infer the correct dtype for the result.
>>>
>>> this might be a less invasive alternative that could solve a lot of the
>>> incompatibilities, but it would probably also change np.arange(5) and
>>> similar functions to int64, which might change the dtype of a lot of
>>> arrays. The difference from just changing it everywhere might not be so
>>> large anymore.
>>
>> No, np.arange(5) would not change behavior given my suggestion, only
>> the type of the integer objects in ndarray.shape and related tuples.
>
> The elements of ndarray.shape are not numpy scalars but python objects, so
> they would always be converted back to 32 bit integers when given back to
> numpy.

That's what I'm suggesting that we change: make
`type(ndarray.shape[i])` be `np.intp` instead of `long`.
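
To make that concrete, here's a rough sketch of the difference from user
code's point of view (I haven't run this on an actual win64 box, so the
platform-specific comments are assumptions):

import numpy as np

a = np.zeros((3, 4))

# today: the shape elements are plain Python integers
# (ints on most platforms, longs on win64 under Python 2)
print(type(a.shape[0]))

# the suggestion: hand back pointer-sized numpy scalars instead, so that
# consumers like np.product() infer a result dtype that is large enough
shape_intp = tuple(np.intp(n) for n in a.shape)
print(type(shape_intp[0]))            # a numpy integer scalar (intp)
print(np.product(shape_intp).dtype)   # int64 on any 64-bit build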

However, I'm not sure that this is an issue with numpy 1.8.0 at least.
I can't reproduce the reported problem on Win64:

In [12]: import numpy as np

In [13]: from numpy.lib import stride_tricks

In [14]: import sys

In [15]: b = stride_tricks.as_strided(np.zeros(1), shape=(100000, 200000, 400000), strides=(0, 0, 0))

In [16]: b.shape
Out[16]: (100000L, 200000L, 400000L)

In [17]: np.product(b.shape)
Out[17]: 8000000000000000

In [18]: np.product(b.shape).dtype
Out[18]: dtype('int64')

In [19]: sys.maxint
Out[19]: 2147483647

In [20]: np.__version__
Out[20]: '1.8.0'

In [21]: np.array(b.shape)
Out[21]: array([100000, 200000, 400000], dtype=int64)


This is on Python 2.7, so maybe something got weird in the Python 3
version that Chris Gohlke tested?
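
For reference, the overflow Julian is worried about is easy to simulate on
any platform by forcing the accumulator to a 32-bit type, which is what a
32-bit default integer would give you. This is only a sketch of the failure
mode, not a reproduction of the actual win64 behaviour:

import numpy as np

shape = np.array([100000, 200000, 400000], dtype=np.int32)

# the true product, 8e15, needs more than 32 bits
print(np.product(shape, dtype=np.int64))    # 8000000000000000

# with a 32-bit accumulator the multiplication silently wraps around
# and yields a meaningless value
print(np.product(shape, dtype=np.int32))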

>>>>> I think this is a very dangerous platform difference and quite a large
>>>>> inconvenience for win64 users, so it would be good to fix this.
>>>>> This would be a very large change of API and probably also ABI.
>>>>
>>>> Yes. Not only would it be a very large change from the status quo, I
>>>> think it introduces *much greater* platform difference than what we
>>>> have currently. The assumption that the default integer object
>>>> corresponds to the platform C long, whatever that is, is pretty
>>>> heavily ingrained.
>>>
>>> This should only be a concern for the ABI, which can be solved by simply
>>> recompiling.
>>> In contrast, the API being different on win64 compared to all other
>>> platforms is something that needs source-level changes.
>>
>> No, the API is no different on win64 than other platforms. Why do you
>> think it is? The win64 platform is a weird platform in this respect,
>> having made a choice that other 64-bit platforms didn't, but numpy's
>> API treats it consistently. When we say that something is a C long,
>> it's a C long on all platforms.
>
> The API is different if you consider it from a python perspective.
> The default integer dtype should be sufficiently large to index into any
> numpy array; that's what I call the API here.

That's perhaps what you want, but numpy has never claimed to do this.
The numpy project deliberately chose (and documents) to make its
default integer type a C long, not a C size_t, to match Python's
default int.
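
In case the consequence of that choice isn't obvious, here's roughly what it
means in practice (the exact numbers depend on what the C compiler defines
long to be on your platform):

import numpy as np

# the default integer dtype follows the platform's C long
print(np.dtype(np.int_).itemsize * 8)    # 64 on 64-bit linux/mac,
                                         # 32 on win32, win64 and any
                                         # 32-bit platform

# so the same line of code produces different dtypes across platforms
print(np.arange(5).dtype)                # int64 on linux64/mac64,
                                         # int32 on win64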

> win64 behaves differently: you
> have to explicitly upcast your index to be able to address all memory.
> But API or ABI is just semantics here; what I actually mean is the
> difference between source changes and recompiling to deal with the issue.
> Of course there might be C code that needs more than recompiling, but it
> should not be that much; it would have to be already somewhat
> broken/restrictive code that uses numpy buffers without first checking
> which type they have.
>
> There can also be python code that might need source changes, e.g. code
> that memory-maps a binary written on win32 with np.int_, assuming np.int_
> is also 32 bit on win64; but this would already be broken on linux and mac
> now.

Anything that assumes that np.int_ is any particular fixed size is
always broken, naturally.
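
For what it's worth, the portable pattern (just a sketch, not official
guidance from the docs) is to use np.intp for anything index- or
size-related and an explicit fixed-width dtype for anything written to or
read from disk:

import numpy as np

a = np.zeros((1000, 1000))

# indices and sizes: np.intp is pointer-sized, so it can address any
# in-memory array on every platform
n_elements = np.product(np.asarray(a.shape, dtype=np.intp))

# data that crosses platform boundaries (files, memory maps, sockets):
# spell out the width explicitly instead of relying on np.int_
a.astype(np.int32).tofile('records.bin')            # hypothetical file name
back = np.fromfile('records.bin', dtype=np.int32)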

-- 
Robert Kern