Re: [Numpy-discussion] Change in memmap behaviour

2012-07-04 Thread Nathaniel Smith
On Tue, Jul 3, 2012 at 4:08 PM, Nathaniel Smith  wrote:
> On Tue, Jul 3, 2012 at 10:35 AM, Thouis (Ray) Jones  wrote:
>> On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen  
>> wrote:
>>>
>>> On 2. juli 2012, at 22.40, Nathaniel Smith wrote:
>>>
 On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen  
 wrote:
> [snip]
>
>
>
> Your actual memory usage may not have increased as much as you think,
> since memmap objects don't necessarily take much memory -- it sounds
> like you're leaking virtual memory, but your resident set size
> shouldn't go up as much.
>
>
> As I understand it, memmap objects retain the contents of the memmap in
> memory after it has been read the first time (in a lazy manner). Thus, 
> when
> reading a slice of a 24GB file, only that part resides in memory. Our
> system
> reads a slice of a memmap, calculates something (say, the sum), and then
> deletes the memmap. It then loops through this for consecutive slices,
> retaining a low memory usage. Consider the following code:
>
> import numpy as np
> res = []
> vecLen = 3095677412
> for i in xrange(vecLen/10**8+1):
>     x = i * 10**8
>     y = min((i+1) * 10**8, vecLen)
>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>
> The memory usage of this code on a 24GB file (one value for each 
> nucleotide
> in the human DNA!) is 23g resident memory after the loop is finished (not
> 24g for some reason..).
>
> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
> loop.

 Your memory measurement tools are misleading you. The same memory is
 resident in both cases, just in one case your tools say it is
 operating system disk cache (and not attributed to your app), and in
 the other case that same memory, treated in the same way by the OS, is
 shown as part of your app's resident memory. Virtual memory is
 confusing...
>>>
>>> But the crucial difference is perhaps that the disk cache can be cleared by 
>>> the OS if needed, but not the application memory in the same way, which 
>>> must be swapped to disk? Or am I still confused?
>>>
>>> (snip)
>>>
>
> Great! Any idea on whether such a patch may be included in 1.7?

 Not really, if I or you or someone else gets inspired to take the time
 to write a patch soon then it will be, otherwise not...

 -N
>>>
>>> I have now tried to add a patch, in the way you proposed, but I may have 
>>> gotten it wrong..
>>>
>>> http://projects.scipy.org/numpy/ticket/2179
>>
>> I put this in a github repo, and added tests (author credit to Sveinung)
>> https://github.com/thouis/numpy/tree/mmap_children
>>
>> I'm not sure which branch to issue a PR against, though.
>
> Looks good to me, thanks to both of you!
>
> Obviously should be merged to master; beyond that I'm not sure. We
> definitely want it in 1.7, but I'm not sure if that's been branched
> yet or not. (Or rather, it has been branched, but then maybe it was
> unbranched again? Travis?) Since it was a 1.6 regression it'd make
> sense to cherrypick to the 1.6 branch too, just in case it gets
> another release.

Merged into master and maintenance/1.6.x, but not maintenance/1.7.x;
I'll let Ondrej or Travis figure that out...

-N


Re: [Numpy-discussion] Change in memmap behaviour

2012-07-03 Thread Nathaniel Smith
On Tue, Jul 3, 2012 at 10:35 AM, Thouis (Ray) Jones  wrote:
> On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen  
> wrote:
>>
>> On 2. juli 2012, at 22.40, Nathaniel Smith wrote:
>>
>>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen  
>>> wrote:
 [snip]



 Your actual memory usage may not have increased as much as you think,
 since memmap objects don't necessarily take much memory -- it sounds
 like you're leaking virtual memory, but your resident set size
 shouldn't go up as much.


 As I understand it, memmap objects retain the contents of the memmap in
 memory after it has been read the first time (in a lazy manner). Thus, when
 reading a slice of a 24GB file, only that part resides in memory. Our
 system
 reads a slice of a memmap, calculates something (say, the sum), and then
 deletes the memmap. It then loops through this for consecutive slices,
 retaining a low memory usage. Consider the following code:

 import numpy as np
 res = []
 vecLen = 3095677412
 for i in xrange(vecLen/10**8+1):
     x = i * 10**8
     y = min((i+1) * 10**8, vecLen)
     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())

 The memory usage of this code on a 24GB file (one value for each nucleotide
 in the human DNA!) is 23g resident memory after the loop is finished (not
 24g for some reason..).

 Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
 loop.
>>>
>>> Your memory measurement tools are misleading you. The same memory is
>>> resident in both cases, just in one case your tools say it is
>>> operating system disk cache (and not attributed to your app), and in
>>> the other case that same memory, treated in the same way by the OS, is
>>> shown as part of your app's resident memory. Virtual memory is
>>> confusing...
>>
>> But the crucial difference is perhaps that the disk cache can be cleared by 
>> the OS if needed, but not the application memory in the same way, which must 
>> be swapped to disk? Or am I still confused?
>>
>> (snip)
>>

 Great! Any idea on whether such a patch may be included in 1.7?
>>>
>>> Not really, if I or you or someone else gets inspired to take the time
>>> to write a patch soon then it will be, otherwise not...
>>>
>>> -N
>>
>> I have now tried to add a patch, in the way you proposed, but I may have 
>> gotten it wrong..
>>
>> http://projects.scipy.org/numpy/ticket/2179
>
> I put this in a github repo, and added tests (author credit to Sveinung)
> https://github.com/thouis/numpy/tree/mmap_children
>
> I'm not sure which branch to issue a PR against, though.

Looks good to me, thanks to both of you!

Obviously should be merged to master; beyond that I'm not sure. We
definitely want it in 1.7, but I'm not sure if that's been branched
yet or not. (Or rather, it has been branched, but then maybe it was
unbranched again? Travis?) Since it was a 1.6 regression it'd make
sense to cherrypick to the 1.6 branch too, just in case it gets
another release.

-n


Re: [Numpy-discussion] Change in memmap behaviour

2012-07-03 Thread Thouis (Ray) Jones
On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen  wrote:
>
> On 2. juli 2012, at 22.40, Nathaniel Smith wrote:
>
>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen  
>> wrote:
>>> [snip]
>>>
>>>
>>>
>>> Your actual memory usage may not have increased as much as you think,
>>> since memmap objects don't necessarily take much memory -- it sounds
>>> like you're leaking virtual memory, but your resident set size
>>> shouldn't go up as much.
>>>
>>>
>>> As I understand it, memmap objects retain the contents of the memmap in
>>> memory after it has been read the first time (in a lazy manner). Thus, when
>>> reading a slice of a 24GB file, only that part resides in memory. Our system
>>> reads a slice of a memmap, calculates something (say, the sum), and then
>>> deletes the memmap. It then loops through this for consecutive slices,
>>> retaining a low memory usage. Consider the following code:
>>>
>>> import numpy as np
>>> res = []
>>> vecLen = 3095677412
>>> for i in xrange(vecLen/10**8+1):
>>>     x = i * 10**8
>>>     y = min((i+1) * 10**8, vecLen)
>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>
>>> The memory usage of this code on a 24GB file (one value for each nucleotide
>>> in the human DNA!) is 23g resident memory after the loop is finished (not
>>> 24g for some reason..).
>>>
>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>>> loop.
>>
>> Your memory measurement tools are misleading you. The same memory is
>> resident in both cases, just in one case your tools say it is
>> operating system disk cache (and not attributed to your app), and in
>> the other case that same memory, treated in the same way by the OS, is
>> shown as part of your app's resident memory. Virtual memory is
>> confusing...
>
> But the crucial difference is perhaps that the disk cache can be cleared by 
> the OS if needed, but not the application memory in the same way, which must 
> be swapped to disk? Or am I still confused?
>
> (snip)
>
>>>
>>> Great! Any idea on whether such a patch may be included in 1.7?
>>
>> Not really, if I or you or someone else gets inspired to take the time
>> to write a patch soon then it will be, otherwise not...
>>
>> -N
>
> I have now tried to add a patch, in the way you proposed, but I may have 
> gotten it wrong..
>
> http://projects.scipy.org/numpy/ticket/2179

I put this in a github repo, and added tests (author credit to Sveinung)
https://github.com/thouis/numpy/tree/mmap_children

I'm not sure which branch to issue a PR against, though.


Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen

On 2. juli 2012, at 22.40, Nathaniel Smith wrote:

> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen  wrote:
>> [snip]
>> 
>> 
>> 
>> Your actual memory usage may not have increased as much as you think,
>> since memmap objects don't necessarily take much memory -- it sounds
>> like you're leaking virtual memory, but your resident set size
>> shouldn't go up as much.
>> 
>> 
>> As I understand it, memmap objects retain the contents of the memmap in
>> memory after it has been read the first time (in a lazy manner). Thus, when
>> reading a slice of a 24GB file, only that part resides in memory. Our system
>> reads a slice of a memmap, calculates something (say, the sum), and then
>> deletes the memmap. It then loops through this for consecutive slices,
>> retaining a low memory usage. Consider the following code:
>> 
>> import numpy as np
>> res = []
>> vecLen = 3095677412
>> for i in xrange(vecLen/10**8+1):
>>     x = i * 10**8
>>     y = min((i+1) * 10**8, vecLen)
>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>> 
>> The memory usage of this code on a 24GB file (one value for each nucleotide
>> in the human DNA!) is 23g resident memory after the loop is finished (not
>> 24g for some reason..).
>> 
>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>> loop.
> 
> Your memory measurement tools are misleading you. The same memory is
> resident in both cases, just in one case your tools say it is
> operating system disk cache (and not attributed to your app), and in
> the other case that same memory, treated in the same way by the OS, is
> shown as part of your app's resident memory. Virtual memory is
> confusing...

But the crucial difference is perhaps that the disk cache can be cleared by the 
OS if needed, but not the application memory in the same way, which must be 
swapped to disk? Or am I still confused?

(snip)

>> 
>> Great! Any idea on whether such a patch may be included in 1.7?
> 
> Not really, if I or you or someone else gets inspired to take the time
> to write a patch soon then it will be, otherwise not...
> 
> -N

I have now tried to add a patch, in the way you proposed, but I may have gotten 
it wrong..

http://projects.scipy.org/numpy/ticket/2179

Sveinung


Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen  wrote:
> [snip]
>
>
>
> Your actual memory usage may not have increased as much as you think,
> since memmap objects don't necessarily take much memory -- it sounds
> like you're leaking virtual memory, but your resident set size
> shouldn't go up as much.
>
>
> As I understand it, memmap objects retain the contents of the memmap in
> memory after it has been read the first time (in a lazy manner). Thus, when
> reading a slice of a 24GB file, only that part resides in memory. Our system
> reads a slice of a memmap, calculates something (say, the sum), and then
> deletes the memmap. It then loops through this for consecutive slices,
> retaining a low memory usage. Consider the following code:
>
> import numpy as np
> res = []
> vecLen = 3095677412
> for i in xrange(vecLen/10**8+1):
>     x = i * 10**8
>     y = min((i+1) * 10**8, vecLen)
>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>
> The memory usage of this code on a 24GB file (one value for each nucleotide
> in the human DNA!) is 23g resident memory after the loop is finished (not
> 24g for some reason..).
>
> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
> loop.

Your memory measurement tools are misleading you. The same memory is
resident in both cases, just in one case your tools say it is
operating system disk cache (and not attributed to your app), and in
the other case that same memory, treated in the same way by the OS, is
shown as part of your app's resident memory. Virtual memory is
confusing...

> That said, this is clearly a bug, and it's even worse than you mention
> -- *all* operations on memmap arrays are holding onto references to
> the original mmap object, regardless of whether they share any memory:
>
>   >>> a = np.memmap("/etc/passwd", np.uint8, "r")
>   # arithmetic
>   >>> (a + 10)._mmap is a._mmap
>   True
>   # fancy indexing (doesn't return a view!)
>   >>> a[[1, 2, 3]]._mmap is a._mmap
>   True
>   >>> a.sum()._mmap is a._mmap
>   True
> Really, only slicing should be returning a np.memmap object at all.
> Unfortunately, it is currently impossible to create an ndarray
> subclass that returns base-class ndarrays from any operations --
> __array_finalize__() has no way to do this. And this is the third
> ndarray subclass in a row that I've looked at that wanted to be able
> to do this, so I guess maybe it's something we should implement...
>
> In the short term, the numpy-upstream fix is to change
> numpy.core.memmap:memmap.__array_finalize__ so that it only copies
> over the ._mmap attribute of its parent if np.may_share_memory(self,
> parent) is True. Patches gratefully accepted ;-)
>
>
> Great! Any idea on whether such a patch may be included in 1.7?

Not really, if I or you or someone else gets inspired to take the time
to write a patch soon then it will be, otherwise not...

-N


Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen
[snip]

> 
> Your actual memory usage may not have increased as much as you think,
> since memmap objects don't necessarily take much memory -- it sounds
> like you're leaking virtual memory, but your resident set size
> shouldn't go up as much.

As I understand it, memmap objects retain the contents of the memmap in memory 
after it has been read the first time (in a lazy manner). Thus, when reading a 
slice of a 24GB file, only that part resides in memory. Our system reads a 
slice of a memmap, calculates something (say, the sum), and then deletes the 
memmap. It then loops through this for consecutive slices, retaining a low 
memory usage. Consider the following code:

import numpy as np
res = []
vecLen = 3095677412
for i in xrange(vecLen/10**8+1):
    x = i * 10**8
    y = min((i+1) * 10**8, vecLen)
    res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())

The memory usage of this code on a 24GB file (one value for each nucleotide in 
the human DNA!) is 23g resident memory after the loop is finished (not 24g for 
some reason..).

Running the same code on 1.5.1rc1 gives a resident memory of 23m after the loop.

> 
> That said, this is clearly a bug, and it's even worse than you mention
> -- *all* operations on memmap arrays are holding onto references to
> the original mmap object, regardless of whether they share any memory:
 a = np.memmap("/etc/passwd", np.uint8, "r")
>  # arithmetic
 (a + 10)._mmap is a._mmap
>  True
>  # fancy indexing (doesn't return a view!)
 a[[1, 2, 3]]._mmap is a._mmap
>  True
 a.sum()._mmap is a._mmap
>  True
> Really, only slicing should be returning a np.memmap object at all.
> Unfortunately, it is currently impossible to create an ndarray
> subclass that returns base-class ndarrays from any operations --
> __array_finalize__() has no way to do this. And this is the third
> ndarray subclass in a row that I've looked at that wanted to be able
> to do this, so I guess maybe it's something we should implement...
> 
> In the short term, the numpy-upstream fix is to change
> numpy.core.memmap:memmap.__array_finalize__ so that it only copies
> over the ._mmap attribute of its parent if np.may_share_memory(self,
> parent) is True. Patches gratefully accepted ;-)

Great! Any idea on whether such a patch may be included in 1.7?

> 
> In the short term, you have a few options for hacky workarounds. You
> could monkeypatch the above fix into the memmap class. You could
> manually assign None to the _mmap attribute of offending arrays (being
> careful only to do this to arrays where you know it is safe!). And for
> reduction operations like sum() in particular, what you have right now
> is not actually a scalar object -- it is a 0-dimensional array that
> holds a single scalar. You can pull this scalar out by calling .item()
> on the array, and then throw away the array itself -- the scalar won't
> have any _mmap attribute.
>  def scalarify(scalar_or_0d_array):
>      if isinstance(scalar_or_0d_array, np.ndarray):
>          return scalar_or_0d_array.item()
>      else:
>          return scalar_or_0d_array
>  # works on both numpy 1.5 and numpy 1.6:
>  total = scalarify(a.sum())

Thank you for this! However, such a solution would have to be scattered 
throughout the code (probably over 100 places), and I would rather not do that. 
I guess the abovementioned patch would be the best solution. I do not have 
experience in the numpy core code, so I am also eagerly awaiting such a patch!

Sveinung

--
Sveinung Gundersen
PhD Student, Bioinformatics, Dept. of Tumor Biology, Inst. for Cancer Research, 
The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway
E-mail: sveinung.gunder...@medisin.uio.no, Phone: +47 93 00 94 54




Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 3:53 PM, Sveinung Gundersen  wrote:
> Hi,
>
> We are developing a large project for genome analysis
> (http://hyperbrowser.uio.no), where we use memmap vectors as the basic data
> structure for storage. The stored data are accessed in slices and used as a
> basis for calculations. As the stored data may be large (up to 24 GB), the
> memory footprint is important.
>
> We experienced a problem with 64-bit addressing for the function concatenate
> (using quite old numpy version 1.5.1rc), and have thus updated the version
> of numpy to 1.7.0.dev-651ef74, where the problem has been fixed. We have,
> however, experienced another problem connected to a change in memmap
> behaviour. This change seems to have come with the 1.6 release.
>
> Before (1.5.1rc1):
>
> >>> import platform; print platform.python_version()
> 2.7.0
> >>> import numpy as np
> >>> np.version.version
> '1.5.1rc1'
> >>> a = np.memmap('testmemmap', 'int32', 'w+', shape=20)
> >>> a[:] = 2
> >>> a[0:2]
> memmap([2, 2], dtype=int32)
> >>> a[0:2]._mmap
> <mmap.mmap object at 0x...>
> >>> a.sum()
> 40
> >>> a.sum()._mmap
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'numpy.int64' object has no attribute '_mmap'
>
> After (1.6.2):
>
> >>> import platform; print platform.python_version()
> 2.7.0
> >>> import numpy as np
> >>> np.version.version
> '1.6.2'
> >>> a = np.memmap('testmemmap', 'int32', 'w+', shape=20)
> >>> a[:] = 2
> >>> a[0:2]
> memmap([2, 2], dtype=int32)
> >>> a[0:2]._mmap
> <mmap.mmap object at 0x...>
> >>> a.sum()
> memmap(40)
> >>> a.sum()._mmap
> <mmap.mmap object at 0x...>
>
> The problem is then that calculations on memmap objects that produce
> scalar results previously returned a numpy scalar, with no reference to the
> memmap object. We could then just keep the result, and mark the memmap for
> garbage collection. Now, the memory usage of the system has increased
> dramatically, as we no longer have this option.

Your actual memory usage may not have increased as much as you think,
since memmap objects don't necessarily take much memory -- it sounds
like you're leaking virtual memory, but your resident set size
shouldn't go up as much.
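
(For anyone who wants to check that directly: the resident set size of
the current process can be read with the standard-library resource
module -- a quick sketch, Linux units assumed:)

  import resource

  # Peak resident set size of the current process.  On Linux ru_maxrss
  # is reported in kilobytes; on OS X it is in bytes.
  usage = resource.getrusage(resource.RUSAGE_SELF)
  print "peak RSS: %d kB" % usage.ru_maxrss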

That said, this is clearly a bug, and it's even worse than you mention
-- *all* operations on memmap arrays are holding onto references to
the original mmap object, regardless of whether they share any memory:
  >>> a = np.memmap("/etc/passwd", np.uint8, "r")
  # arithmetic
  >>> (a + 10)._mmap is a._mmap
  True
  # fancy indexing (doesn't return a view!)
  >>> a[[1, 2, 3]]._mmap is a._mmap
  True
  >>> a.sum()._mmap is a._mmap
  True
Really, only slicing should be returning a np.memmap object at all.
Unfortunately, it is currently impossible to create an ndarray
subclass that returns base-class ndarrays from any operations --
__array_finalize__() has no way to do this. And this is the third
ndarray subclass in a row that I've looked at that wanted to be able
to do this, so I guess maybe it's something we should implement...

In the short term, the numpy-upstream fix is to change
numpy.core.memmap:memmap.__array_finalize__ so that it only copies
over the ._mmap attribute of its parent if np.may_share_memory(self,
parent) is True. Patches gratefully accepted ;-)
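
(For concreteness, a rough sketch of the kind of __array_finalize__
change meant here -- untested, the function name is made up, and it is
not necessarily the patch that ends up in numpy:)

  import numpy as np

  def _patched_array_finalize(self, obj):
      # Keep a reference to the parent's mmap only when the new array
      # actually shares memory with it (i.e. it is a view); otherwise
      # drop the reference so the file mapping can be released.
      if hasattr(obj, '_mmap') and np.may_share_memory(self, obj):
          self._mmap = obj._mmap
      else:
          self._mmap = None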

In the short term, you have a few options for hacky workarounds. You
could monkeypatch the above fix into the memmap class. You could
manually assign None to the _mmap attribute of offending arrays (being
careful only to do this to arrays where you know it is safe!). And for
reduction operations like sum() in particular, what you have right now
is not actually a scalar object -- it is a 0-dimensional array that
holds a single scalar. You can pull this scalar out by calling .item()
on the array, and then throw away the array itself -- the scalar won't
have any _mmap attribute.
  def scalarify(scalar_or_0d_array):
      if isinstance(scalar_or_0d_array, np.ndarray):
          return scalar_or_0d_array.item()
      else:
          return scalar_or_0d_array
  # works on both numpy 1.5 and numpy 1.6:
  total = scalarify(a.sum())
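
(And to make that concrete for a chunked-sum loop like the one discussed
in this thread -- a sketch only, reusing the scalarify() helper above and
assuming the same 'val.float64' file:)

  import numpy as np

  res = []
  vecLen = 3095677412
  for i in xrange(vecLen / 10**8 + 1):
      x = i * 10**8
      y = min((i + 1) * 10**8, vecLen)
      chunk = np.memmap('val.float64', dtype='float64')[x:y]
      # .item() inside scalarify() strips the 0-d memmap wrapper, so
      # nothing in res keeps a reference to the underlying mmap.
      res.append(scalarify(chunk.sum()))
      del chunk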

-N


[Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen
Hi,

We are developing a large project for genome analysis 
(http://hyperbrowser.uio.no), where we use memmap vectors as the basic data 
structure for storage. The stored data are accessed in slices and used as a 
basis for calculations. As the stored data may be large (up to 24 GB), the 
memory footprint is important. 

We experienced a problem with 64-bit addressing for the function concatenate 
(using quite old numpy version 1.5.1rc), and have thus updated the version of 
numpy to 1.7.0.dev-651ef74, where the problem has been fixed. We have, however, 
experienced another problem connected to a change in memmap behaviour. This 
change seems to have come with the 1.6 release.

Before (1.5.1rc1):

>>> import platform; print platform.python_version()
2.7.0
>>> import numpy as np
>>> np.version.version
'1.5.1rc1'
>>> a = np.memmap('testmemmap', 'int32', 'w+', shape=20)
>>> a[:] = 2
>>> a[0:2]
memmap([2, 2], dtype=int32)
>>> a[0:2]._mmap
<mmap.mmap object at 0x...>
>>> a.sum()
40
>>> a.sum()._mmap
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'numpy.int64' object has no attribute '_mmap'

After (1.6.2):
>>> import platform; print platform.python_version()
2.7.0
>>> import numpy as np
>>> np.version.version
'1.6.2'
>>> a = np.memmap('testmemmap', 'int32', 'w+', shape=20)
>>> a[:] = 2
>>> a[0:2]
memmap([2, 2], dtype=int32)
>>> a[0:2]._mmap
<mmap.mmap object at 0x...>
>>> a.sum()
memmap(40)
>>> a.sum()._mmap
<mmap.mmap object at 0x...>

The problem is then that calculations on memmap objects that produce 
scalar results previously returned a numpy scalar, with no reference to the 
memmap object. We could then just keep the result, and mark the memmap for 
garbage collection. Now, the memory usage of the system has increased 
dramatically, as we no longer have this option.

So, the question is twofold:

1) What is the reason behind this change? It makes sense to keep the reference 
to the mmap when slicing, but keeping a reference from a scalar value to the 
mmap does not seem very useful. Is it possible to return to the old solution?
2) If not, do you have any advice on how we can retain the old solution without 
rewriting the system? We could cast the results of all functions on the memmap, 
but these are scattered throughout the system and would probably cause much 
headache. So we would rather implement a general solution, for instance 
wrapping the memmap object somehow. Do you have any ideas?

Connected to this is the rather puzzling fact that the 'new' memmap scalar 
object has an __iter__ method, but no length. Shouldn't the __iter__ method be 
removed, since its presence signals that the object is iterable?

Before (1.5.1rc1):
>>> a[0:2].__iter__()

>>> len(a[0:2])
2
>>> a.sum().__iter__
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'numpy.int64' object has no attribute '__iter__'
>>> len(a.sum())
Traceback (most recent call last):
  File "", line 1, in 
TypeError: object of type 'numpy.int64' has no len()

After (1.6.2):
>>> a[0:2].__iter__()

>>> len(a[0:2])  
2
>>> a.sum().__iter__

>>> len(a.sum())
Traceback (most recent call last):
  File "", line 1, in 
TypeError: len() of unsized object
>>> [x for x in a.sum()]
Traceback (most recent call last):
  File "", line 1, in 
TypeError: iteration over a 0-d array

Regards,
Sveinung Gundersen

