Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-26 Thread Lars Buitinck
 Date: Fri, 25 Jul 2014 15:06:40 +0200
 From: Olivier Grisel olivier.gri...@ensta.org
 Subject: Re: [Numpy-discussion] change default integer from int32 to
 int64   on win64?
 To: Discussion of Numerical Python numpy-discussion@scipy.org
 Content-Type: text/plain; charset=UTF-8

 The dtype returned by np.where looks right (int64):

 >>> import platform
 >>> platform.architecture()
 ('64bit', 'WindowsPE')
 >>> import numpy as np
 >>> np.__version__
 '1.9.0b1'
 >>> a = np.zeros(10)
 >>> np.where(a == 0)
 (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),)

Strange. In [1] we had to cast the result of np.where because it was
an array of long. I ran through the NumPy code, and I couldn't find
the flaw, but neither could I find a point in the history where it was
fixed.

[1] 
https://github.com/scikit-learn/scikit-learn/commit/ebdeddbab1620c2473d04dc242d1e30684af9511
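A minimal sketch of the kind of cast that workaround applies (a hedged simplification, not the literal scikit-learn code):

```python
import numpy as np

# Cast np.where's index arrays to intp so that, even on a platform where
# the default integer is 32-bit (win64), the indices can address arrays
# larger than 2 GB. This mirrors the workaround described above; it is
# not the actual code from the referenced commit.
mask = np.zeros(5) == 0
idx, = np.where(mask)
idx = idx.astype(np.intp)  # a no-op where intp is already the result dtype
print(idx.dtype)
```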
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-26 Thread Robert Kern
On Sat, Jul 26, 2014 at 9:19 AM, Lars Buitinck larsm...@gmail.com wrote:
 Date: Fri, 25 Jul 2014 15:06:40 +0200
 From: Olivier Grisel olivier.gri...@ensta.org
 Subject: Re: [Numpy-discussion] change default integer from int32 to
 int64   on win64?
 To: Discussion of Numerical Python numpy-discussion@scipy.org
 Content-Type: text/plain; charset=UTF-8

 The dtype returned by np.where looks right (int64):

 >>> import platform
 >>> platform.architecture()
 ('64bit', 'WindowsPE')
 >>> import numpy as np
 >>> np.__version__
 '1.9.0b1'
 >>> a = np.zeros(10)
 >>> np.where(a == 0)
 (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),)

 Strange. In [1] we had to cast the result of np.where because it was
 an array of long. I ran through the NumPy code, and I couldn't find
 the flaw, but neither could I find a point in the history where it was
 fixed.

 [1] 
 https://github.com/scikit-learn/scikit-learn/commit/ebdeddbab1620c2473d04dc242d1e30684af9511

As far as I can tell, it's been that way essentially forever, before
numpy was numpy:

https://github.com/numpy/numpy/commit/8cb36a62#diff-88aedadb94e0ead6b434d55f81668471R645

-- 
Robert Kern


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-25 Thread Olivier Grisel
The dtype returned by np.where looks right (int64):

>>> import platform
>>> platform.architecture()
('64bit', 'WindowsPE')
>>> import numpy as np
>>> np.__version__
'1.9.0b1'
>>> a = np.zeros(10)
>>> np.where(a == 0)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),)

-- 
Olivier


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Robert Kern
On Thu, Jul 24, 2014 at 3:47 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 Julian Taylor jtaylor.deb...@googlemail.com wrote:

 The default integer dtype should be sufficiently large to index into any
 numpy array, that's what I call an API here. win64 behaves differently, you
 have to explicitly upcast your index to be able to index all memory.

 No, you don't have to manually upcast Python int to Python long.

 Python 2 will automatically create a Python long if you overflow a Python
 int.

 On Python 3 the Python int does not have a size limit.

Please reread the thread more carefully. That's not what this
discussion is about.

-- 
Robert Kern


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Lars Buitinck
Wed, 23 Jul 2014 22:13:33 +0100  Nathaniel Smith n...@pobox.com:
 On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern robert.k...@gmail.com wrote:
 That's perhaps what you want, but numpy has never claimed to do this.

... except in np.where, which promises to return indices but actually
returns arrays of longs and thus doesn't work with large arrays on
Windows.

I know this is a bug that can be fixed without changing the size of
np.int, but it goes to show that even core functionality in NumPy gets
it wrong.

 This is true, but it's not very compelling on its own -- "big as a
 pointer" is a much, much more useful property than "big as a long". The
 only real reason this made sense in the first place is the equivalence
 between Python int and C long, but even that is gone now with Python
 3. IMO at this point backcompat is really the only serious reason for
 keeping int32 as the default integer type on win64. But of course this
 is a pretty serious concern...

Hear, hear.

The C type long is only useful as an at-least-32-bit integer, but on
the platforms that NumPy targets, int is also at least that large. The
only real benefit of long is that it makes porting more interesting
</sarcasm>.

If you have intp and a bunch of explicitly-sized integer types, you
don't need an additional type that behaves like a long *except* for
backward compat.

The Go people got this right; they only have explicitly-sized integer
types and an int type the size of a pointer [1].

[1] http://golang.org/doc/go1.1#int


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Robert Kern
On Thu, Jul 24, 2014 at 10:39 AM, Lars Buitinck larsm...@gmail.com wrote:
 Wed, 23 Jul 2014 22:13:33 +0100  Nathaniel Smith n...@pobox.com:
 On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern robert.k...@gmail.com wrote:
 That's perhaps what you want, but numpy has never claimed to do this.

 ... except in np.where, which promises to return indices but actually
 returns arrays of longs and thus doesn't work with large arrays on
 Windows.

 I know this is a bug that can be fixed without changing the size of
 np.int, but it goes to show that even core functionality in NumPy gets
 it wrong.

Does it? I don't have my Windows VM available at the moment, but it
looks like PyArray_Nonzero() is correctly returning an intp array:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L2478

If it is incorrect somewhere else, please submit a bug report.
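For reference, a quick way to check what np.where actually returns (a sketch; the condition form delegates to nonzero()):

```python
import numpy as np

# np.where(condition) delegates to nonzero(), which builds intp
# (pointer-sized) index arrays, not arrays of the default int / C long.
idx, = np.where(np.zeros(10) == 0)
print(idx.dtype == np.intp)  # True
```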

-- 
Robert Kern


[Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Julian Taylor
hi,
it recently came to my attention that the default integer type in numpy
on windows 64 bit is a 32 bit integers [0].
This seems like a quite serious problem, as it means you can't use any
integers created from Python integers > 32 bit to index arrays larger
than 2GB.
For example, np.product(array.shape), which will never overflow on Linux
and Mac, can overflow on win64.
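The overflow is easy to simulate with explicit dtypes, since on most platforms the default integer is already 64-bit (a sketch, not actual win64 output):

```python
import numpy as np

# Emulate win64's 32-bit default integer by forcing an int32 accumulator:
# the element count of a hypothetical (1024, 1024, 1024, 4) array is
# exactly 2**32, which wraps to 0 in 32-bit arithmetic.
shape = (1024, 1024, 1024, 4)
print(np.prod(shape, dtype=np.int32))  # 0 after wraparound
print(np.prod(shape, dtype=np.int64))  # 4294967296, the true count
```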

I think this is a very dangerous platform difference and a quite large
inconvenience for win64 users so I think it would be good to fix this.
This would be a very large change of API and probably also ABI.
But as we also never officially released win64 binaries, we could change
it for from-source compilations and give win64 binary distributors the
option to keep the old ABI/API at their discretion.

Any thoughts on this from win64 users?

Cheers,
Julian Taylor

[0] https://github.com/astropy/astropy/pull/2697


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Robert Kern
On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 hi,
 it recently came to my attention that the default integer type in numpy
 on windows 64 bit is a 32 bit integers [0].
 This seems like a quite serious problem as it means you can't use any
 integers created from python integers > 32 bit to index arrays larger
 than 2GB.
 For example np.product(array.shape) which will never overflow on linux
 and mac, can overflow on win64.

Currently, on win64, we use Python long integer objects for `.shape`
and related attributes. I wonder if we could return numpy int64
scalars instead. Then np.product() (or anything else that consumes
these via np.asarray()) would infer the correct dtype for the result.
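A sketch of what that would buy: np.asarray() infers the dtype from numpy scalars, so a shape tuple of int64 scalars would round-trip at full width on any platform:

```python
import numpy as np

# If ndarray.shape contained np.int64 scalars instead of Python longs,
# anything funnelling the tuple through np.asarray() would get an int64
# array regardless of the size of the platform's C long.
shape_as_scalars = (np.int64(10), np.int64(20), np.int64(40))
a = np.asarray(shape_as_scalars)
print(a.dtype)  # int64
```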

 I think this is a very dangerous platform difference and a quite large
 inconvenience for win64 users so I think it would be good to fix this.
 This would be a very large change of API and probably also ABI.

Yes. Not only would it be a very large change from the status quo, I
think it introduces *much greater* platform difference than what we
have currently. The assumption that the default integer object
corresponds to the platform C long, whatever that is, is pretty
heavily ingrained.

 But as we also never officially released win64 binaries we could change
 it for from source compilations and give win64 binary distributors the
 option to keep the old ABI/API at their discretion.

That option would make the problem worse, not better.

-- 
Robert Kern


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Julian Taylor
On 23.07.2014 20:54, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 hi,
 it recently came to my attention that the default integer type in numpy
 on windows 64 bit is a 32 bit integers [0].
 This seems like a quite serious problem as it means you can't use any
 integers created from python integers > 32 bit to index arrays larger
 than 2GB.
 For example np.product(array.shape) which will never overflow on linux
 and mac, can overflow on win64.
 
 Currently, on win64, we use Python long integer objects for `.shape`
 and related attributes. I wonder if we could return numpy int64
 scalars instead. Then np.product() (or anything else that consumes
 these via np.asarray()) would infer the correct dtype for the result.

this might be a less invasive alternative that might solve a lot of the
incompatibilities, but it would probably also change np.arange(5) and
similar functions to int64, which might change the dtype of a lot of
arrays. The difference from just changing it everywhere might then not
be so large anymore.

 
 I think this is a very dangerous platform difference and a quite large
 inconvenience for win64 users so I think it would be good to fix this.
 This would be a very large change of API and probably also ABI.
 
 Yes. Not only would it be a very large change from the status quo, I
 think it introduces *much greater* platform difference than what we
 have currently. The assumption that the default integer object
 corresponds to the platform C long, whatever that is, is pretty
 heavily ingrained.

This should only be a concern for the ABI, which can be solved by simply
recompiling.
By contrast, the API being different on win64 compared to all other
platforms is something that needs source-level changes.

 
 But as we also never officially released win64 binaries we could change
 it for from source compilations and give win64 binary distributors the
 option to keep the old ABI/API at their discretion.
 
 That option would make the problem worse, not better.
 

maybe, I'm not familiar with the numpy win64 distribution landscape.
Is it not like Linux, where you have one distributor per workstation
setup that can update all its packages to a new ABI in one go?


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Robert Kern
On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 On 23.07.2014 20:54, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 hi,
 it recently came to my attention that the default integer type in numpy
 on windows 64 bit is a 32 bit integers [0].
 This seems like a quite serious problem as it means you can't use any
 integers created from python integers > 32 bit to index arrays larger
 than 2GB.
 For example np.product(array.shape) which will never overflow on linux
 and mac, can overflow on win64.

 Currently, on win64, we use Python long integer objects for `.shape`
 and related attributes. I wonder if we could return numpy int64
 scalars instead. Then np.product() (or anything else that consumes
 these via np.asarray()) would infer the correct dtype for the result.

 this might be a less invasive alternative that might solve a lot of the
 incompatibilities, but it would probably also change np.arange(5) and
 similar functions to int64 which might change the dtype of a lot of
 arrays. The difference to just changing it everywhere might not be so
 large anymore.

No, np.arange(5) would not change behavior given my suggestion, only
the type of the integer objects in ndarray.shape and related tuples.

 I think this is a very dangerous platform difference and a quite large
 inconvenience for win64 users so I think it would be good to fix this.
 This would be a very large change of API and probably also ABI.

 Yes. Not only would it be a very large change from the status quo, I
 think it introduces *much greater* platform difference than what we
 have currently. The assumption that the default integer object
 corresponds to the platform C long, whatever that is, is pretty
 heavily ingrained.

 This should be only a concern for the ABI which can be solved by simply
 recompiling.
 In comparison that the API is different on win64 compared to all other
 platforms is something that needs source level changes.

No, the API is no different on win64 than other platforms. Why do you
think it is? The win64 platform is a weird platform in this respect,
having made a choice that other 64-bit platforms didn't, but numpy's
API treats it consistently. When we say that something is a C long,
it's a C long on all platforms.

 But as we also never officially released win64 binaries we could change
 it for from source compilations and give win64 binary distributors the
 option to keep the old ABI/API at their discretion.

 That option would make the problem worse, not better.

 maybe, I'm not familiar with the numpy win64 distribution landscape.
 Is it not like linux where you have one distributor per workstation
 setup that can update all its packages to a new ABI on one go?

No. There tend to be multiple providers.

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Sebastian Berg
On Wed, 2014-07-23 at 21:50 +0200, Julian Taylor wrote:
 On 23.07.2014 20:54, Robert Kern wrote:
  On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
  hi,
  it recently came to my attention that the default integer type in numpy
  on windows 64 bit is a 32 bit integers [0].
  This seems like a quite serious problem as it means you can't use any
  integers created from python integers > 32 bit to index arrays larger
  than 2GB.
  For example np.product(array.shape) which will never overflow on linux
  and mac, can overflow on win64.
  
  Currently, on win64, we use Python long integer objects for `.shape`
  and related attributes. I wonder if we could return numpy int64
  scalars instead. Then np.product() (or anything else that consumes
  these via np.asarray()) would infer the correct dtype for the result.
 
 this might be a less invasive alternative that might solve a lot of the
 incompatibilities, but it would probably also change np.arange(5) and
 similar functions to int64 which might change the dtype of a lot of
 arrays. The difference to just changing it everywhere might not be so
 large anymore.
 

Aren't most such functions already using intp? Just guessing, but:

In [16]: np.arange(30, dtype=np.long).dtype.num
Out[16]: 9

In [17]: np.arange(30, dtype=np.intp).dtype.num
Out[17]: 7

In [18]: np.arange(30).dtype.num
Out[18]: 7

frankly, I am not sure what needs to change at all, except the normal
array creation and the sum promotion rule. I am probably naive here, but
what is the ABI change that is necessary for that?

I guess the problem you see is breaking code doing np.array([1,2,3]) and
then assuming in C that it is a long array?

- Sebastian

  
  I think this is a very dangerous platform difference and a quite large
  inconvenience for win64 users so I think it would be good to fix this.
  This would be a very large change of API and probably also ABI.
  
  Yes. Not only would it be a very large change from the status quo, I
  think it introduces *much greater* platform difference than what we
  have currently. The assumption that the default integer object
  corresponds to the platform C long, whatever that is, is pretty
  heavily ingrained.
 
 This should be only a concern for the ABI which can be solved by simply
 recompiling.
 In comparison that the API is different on win64 compared to all other
 platforms is something that needs source level changes.
 
  
  But as we also never officially released win64 binaries we could change
  it for from source compilations and give win64 binary distributors the
  option to keep the old ABI/API at their discretion.
  
  That option would make the problem worse, not better.
  
 
 maybe, I'm not familiar with the numpy win64 distribution landscape.
 Is it not like linux where you have one distributor per workstation
 setup that can update all its packages to a new ABI on one go?
 




Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Sebastian Berg
On Wed, 2014-07-23 at 22:06 +0200, Sebastian Berg wrote:
 On Wed, 2014-07-23 at 21:50 +0200, Julian Taylor wrote:
  On 23.07.2014 20:54, Robert Kern wrote:
   On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
   jtaylor.deb...@googlemail.com wrote:
   hi,
   it recently came to my attention that the default integer type in numpy
   on windows 64 bit is a 32 bit integers [0].
   This seems like a quite serious problem as it means you can't use any
   integers created from python integers > 32 bit to index arrays larger
   than 2GB.
   For example np.product(array.shape) which will never overflow on linux
   and mac, can overflow on win64.
   
   Currently, on win64, we use Python long integer objects for `.shape`
   and related attributes. I wonder if we could return numpy int64
   scalars instead. Then np.product() (or anything else that consumes
   these via np.asarray()) would infer the correct dtype for the result.
  
  this might be a less invasive alternative that might solve a lot of the
  incompatibilities, but it would probably also change np.arange(5) and
  similar functions to int64 which might change the dtype of a lot of
  arrays. The difference to just changing it everywhere might not be so
  large anymore.
  
 
 Aren't most such functions already using intp? Just guessing, but:
 
 In [16]: np.arange(30, dtype=np.long).dtype.num
 Out[16]: 9
 
 In [17]: np.arange(30, dtype=np.intp).dtype.num
 Out[17]: 7
 
 In [18]: np.arange(30).dtype.num
 Out[18]: 7
 

Oops, never mind that stuff, probably not... np.int_ is 7 too; this is
just the way intp is chosen.

 frankly, I am not sure what needs to change at all, except the normal
 array creation and the sum promotion rule. I am probably naive here, but
 what is the ABI change that is necessary for that?
 
 I guess the problem you see is breaking code doing np.array([1,2,3]) and
 then assuming in C that it is a long array?
 
 - Sebastian
 
   
   I think this is a very dangerous platform difference and a quite large
   inconvenience for win64 users so I think it would be good to fix this.
   This would be a very large change of API and probably also ABI.
   
   Yes. Not only would it be a very large change from the status quo, I
   think it introduces *much greater* platform difference than what we
   have currently. The assumption that the default integer object
   corresponds to the platform C long, whatever that is, is pretty
   heavily ingrained.
  
  This should be only a concern for the ABI which can be solved by simply
  recompiling.
  In comparison that the API is different on win64 compared to all other
  platforms is something that needs source level changes.
  
   
   But as we also never officially released win64 binaries we could change
   it for from source compilations and give win64 binary distributors the
   option to keep the old ABI/API at their discretion.
   
   That option would make the problem worse, not better.
   
  
  maybe, I'm not familiar with the numpy win64 distribution landscape.
  Is it not like linux where you have one distributor per workstation
  setup that can update all its packages to a new ABI on one go?
  
 
 
 




Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Julian Taylor
On 23.07.2014 22:04, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 On 23.07.2014 20:54, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 hi,
 it recently came to my attention that the default integer type in numpy
 on windows 64 bit is a 32 bit integers [0].
 This seems like a quite serious problem as it means you can't use any
 integers created from python integers > 32 bit to index arrays larger
 than 2GB.
 For example np.product(array.shape) which will never overflow on linux
 and mac, can overflow on win64.

 Currently, on win64, we use Python long integer objects for `.shape`
 and related attributes. I wonder if we could return numpy int64
 scalars instead. Then np.product() (or anything else that consumes
 these via np.asarray()) would infer the correct dtype for the result.

 this might be a less invasive alternative that might solve a lot of the
 incompatibilities, but it would probably also change np.arange(5) and
 similar functions to int64 which might change the dtype of a lot of
 arrays. The difference to just changing it everywhere might not be so
 large anymore.
 
 No, np.arange(5) would not change behavior given my suggestion, only
 the type of the integer objects in ndarray.shape and related tuples.

The elements of ndarray.shape are not numpy scalars but Python objects,
so they would always be converted back to 32-bit integers when given
back to numpy.

 
 I think this is a very dangerous platform difference and a quite large
 inconvenience for win64 users so I think it would be good to fix this.
 This would be a very large change of API and probably also ABI.

 Yes. Not only would it be a very large change from the status quo, I
 think it introduces *much greater* platform difference than what we
 have currently. The assumption that the default integer object
 corresponds to the platform C long, whatever that is, is pretty
 heavily ingrained.

 This should be only a concern for the ABI which can be solved by simply
 recompiling.
 In comparison that the API is different on win64 compared to all other
 platforms is something that needs source level changes.
 
 No, the API is no different on win64 than other platforms. Why do you
 think it is? The win64 platform is a weird platform in this respect,
 having made a choice that other 64-bit platforms didn't, but numpy's
 API treats it consistently. When we say that something is a C long,
 it's a C long on all platforms.

The API is different if you consider it from a Python perspective.
The default integer dtype should be sufficiently large to index into any
numpy array; that's what I call an API here. win64 behaves differently:
you have to explicitly upcast your index to be able to address all memory.
But API vs. ABI is just semantics here; what I actually mean is the
difference between source changes and recompiling to deal with the issue.
Of course there might be C code that needs more than recompiling, but it
should not be much; it would have to be already somewhat
broken/restrictive code that uses numpy buffers without first checking
which type they have.

There can also be Python code that might need source changes, e.g. code
using np.int_ to memory-map a binary written on win32 while assuming
np.int_ is also 32 bit on win64; but such code would already be broken
on Linux and Mac now.

 But as we also never officially released win64 binaries we could change
 it for from source compilations and give win64 binary distributors the
 option to keep the old ABI/API at their discretion.

 That option would make the problem worse, not better.

 maybe, I'm not familiar with the numpy win64 distribution landscape.
 Is it not like linux where you have one distributor per workstation
 setup that can update all its packages to a new ABI on one go?
 
 No. There tend to be multiple providers.
 



Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Robert Kern
On Wed, Jul 23, 2014 at 9:34 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 On 23.07.2014 22:04, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 On 23.07.2014 20:54, Robert Kern wrote:
 On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 hi,
 it recently came to my attention that the default integer type in numpy
 on windows 64 bit is a 32 bit integers [0].
 This seems like a quite serious problem as it means you can't use any
 integers created from python integers > 32 bit to index arrays larger
 than 2GB.
 For example np.product(array.shape) which will never overflow on linux
 and mac, can overflow on win64.

 Currently, on win64, we use Python long integer objects for `.shape`
 and related attributes. I wonder if we could return numpy int64
 scalars instead. Then np.product() (or anything else that consumes
 these via np.asarray()) would infer the correct dtype for the result.

 this might be a less invasive alternative that might solve a lot of the
 incompatibilities, but it would probably also change np.arange(5) and
 similar functions to int64 which might change the dtype of a lot of
 arrays. The difference to just changing it everywhere might not be so
 large anymore.

 No, np.arange(5) would not change behavior given my suggestion, only
 the type of the integer objects in ndarray.shape and related tuples.

 ndarray.shape are not numpy scalars but python objects, so they would
 always be converted back to 32 bit integers when given back to numpy.

That's what I'm suggesting that we change: make
`type(ndarray.shape[i])` be `np.intp` instead of `long`.

However, I'm not sure that this is an issue with numpy 1.8.0 at least.
I can't reproduce the reported problem on Win64:

In [12]: import numpy as np

In [13]: from numpy.lib import stride_tricks

In [14]: import sys

In [15]: b = stride_tricks.as_strided(np.zeros(1), shape=(10, 20, 40), strides=(0, 0, 0))

In [16]: b.shape
Out[16]: (10L, 20L, 40L)

In [17]: np.product(b.shape)
Out[17]: 8000

In [18]: np.product(b.shape).dtype
Out[18]: dtype('int64')

In [19]: sys.maxint
Out[19]: 2147483647

In [20]: np.__version__
Out[20]: '1.8.0'

In [21]: np.array(b.shape)
Out[21]: array([10, 20, 40], dtype=int64)


This is on Python 2.7, so maybe something got weird in the Python 3
version that Chris Gohlke tested?

 I think this is a very dangerous platform difference and a quite large
 inconvenience for win64 users so I think it would be good to fix this.
 This would be a very large change of API and probably also ABI.

 Yes. Not only would it be a very large change from the status quo, I
 think it introduces *much greater* platform difference than what we
 have currently. The assumption that the default integer object
 corresponds to the platform C long, whatever that is, is pretty
 heavily ingrained.

 This should be only a concern for the ABI which can be solved by simply
 recompiling.
 In comparison that the API is different on win64 compared to all other
 platforms is something that needs source level changes.

 No, the API is no different on win64 than other platforms. Why do you
 think it is? The win64 platform is a weird platform in this respect,
 having made a choice that other 64-bit platforms didn't, but numpy's
 API treats it consistently. When we say that something is a C long,
 it's a C long on all platforms.

 The API is different if you consider it from a python perspective.
 The default integer dtype should be sufficiently large to index into any
 numpy array, that's what I call an API here.

That's perhaps what you want, but numpy has never claimed to do this.
The numpy project deliberately chose (and is so documented) to make
its default integer type a C long, not a C size_t, to match Python's
default.

 win64 behaves different, you
 have to explicitly upcast your index to be able to index all memory.
 But API or ABI is just semantics here, what I actually mean is the
 difference of source changes vs recompiling to deal with the issue.
 Of course there might be C code that needs more than recompiling, but it
 should not be that much, it would have to be already somewhat
 broken/restrictive code that uses numpy buffers without first checking
 which type it has.

 There can also be python code that might need source changes e.g.
 np.int_ memory mapping a binary from win32 assuming np.int_ is also 32
 bit on win64, but this would be broken on linux and mac already now.

Anything that assumes that np.int_ is any particular fixed size is
always broken, naturally.

-- 
Robert Kern


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Robert Kern
On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern robert.k...@gmail.com wrote:

 That's what I'm suggesting that we change: make
 `type(ndarray.shape[i])` be `np.intp` instead of `long`.

 However, I'm not sure that this is an issue with numpy 1.8.0 at least.
 I can't reproduce the reported problem on Win64:

 In [12]: import numpy as np

 In [13]: from numpy.lib import stride_tricks

 In [14]: import sys

 In [15]: b = stride_tricks.as_strided(np.zeros(1), shape=(10, 20, 40), strides=(0, 0, 0))

 In [16]: b.shape
 Out[16]: (10L, 20L, 40L)

 In [17]: np.product(b.shape)
 Out[17]: 8000

 In [18]: np.product(b.shape).dtype
 Out[18]: dtype('int64')

 In [19]: sys.maxint
 Out[19]: 2147483647

 In [20]: np.__version__
 Out[20]: '1.8.0'

 In [21]: np.array(b.shape)
 Out[21]: array([10, 20, 40], dtype=int64)


 This is on Python 2.7, so maybe something got weird in the Python 3
 version that Chris Gohlke tested?

Ah yes, naturally. Because there is no separate `long` type in Python
3, np.asarray() can't use the Python type to decide which dtype to
build the array with. Returning np.intp objects in the tuple would
resolve the problem in much the same way it is currently resolved in
Python 2. This would also have the effect of unifying the API on all
platforms: currently, win64 is the only platform where the `.shape`
tuple and related attributes return Python longs instead of Python
ints.

-- 
Robert Kern


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Nathaniel Smith
On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern robert.k...@gmail.com wrote:
 That's perhaps what you want, but numpy has never claimed to do this.
 The numpy project deliberately chose (and is so documented) to make
 its default integer type a C long, not a C size_t, to match Python's
 default.

This is true, but it's not very compelling on its own -- "big as a
pointer" is a much, much more useful property than "big as a long". The
only real reason this made sense in the first place is the equivalence
between Python int and C long, but even that is gone now with Python
3. IMO at this point backcompat is really the only serious reason for
keeping int32 as the default integer type on win64. But of course this
is a pretty serious concern...

Julian: making the change experimentally and checking how badly scipy
and some similar libraries break might be a way to focus the
backcompat discussion more.
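The distinction under discussion is easy to inspect (a sketch; on a win64 build of this era the two itemsizes differ, while on 64-bit Linux and Mac they agree):

```python
import numpy as np

# np.int_ follows the C long; np.intp follows the pointer size. On a
# 64-bit Windows NumPy of this era, int_ is 4 bytes while intp is 8;
# on 64-bit Linux/Mac both are 8.
print(np.dtype(np.int_).itemsize, np.dtype(np.intp).itemsize)
```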

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-23 Thread Sturla Molden
Julian Taylor jtaylor.deb...@googlemail.com wrote:

 The default integer dtype should be sufficiently large to index into any
 numpy array, that's what I call an API here. win64 behaves differently, you
 have to explicitly upcast your index to be able to index all memory.

No, you don't have to manually upcast Python int to Python long.

Python 2 will automatically create a Python long if you overflow a Python
int.

On Python 3 the Python int does not have a size limit.
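Sturla's point, illustrated (plain Python, no NumPy involved):

```python
# Python-level integers never wrap: Python 2 silently promoted int to
# long on overflow, and Python 3 ints are arbitrary-precision.
n = 2**31 - 1   # the largest 32-bit signed value
print(n + 1)    # 2147483648 -- no wraparound
print(2**64)    # 18446744073709551616
```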


Sturla
