On Tue, Jul 3, 2012 at 4:08 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Tue, Jul 3, 2012 at 10:35 AM, Thouis (Ray) Jones <tho...@gmail.com> wrote:
>> On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>>>
>>> On 2 July 2012, at 22:40, Nathaniel Smith wrote:
>>>
>>>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>>>>> [snip]
>>>>>
>>>>> Your actual memory usage may not have increased as much as you think,
>>>>> since memmap objects don't necessarily take much memory -- it sounds
>>>>> like you're leaking virtual memory, but your resident set size
>>>>> shouldn't go up as much.
>>>>>
>>>>> As I understand it, memmap objects retain the contents of the memmap in
>>>>> memory after it has been read the first time (in a lazy manner). Thus, when
>>>>> reading a slice of a 24GB file, only that part resides in memory. Our system
>>>>> reads a slice of a memmap, calculates something (say, the sum), and then
>>>>> deletes the memmap. It then loops through this for consecutive slices,
>>>>> retaining a low memory usage. Consider the following code:
>>>>>
>>>>> import numpy as np
>>>>> res = []
>>>>> vecLen = 3095677412
>>>>> for i in xrange(vecLen/10**8+1):
>>>>>     x = i * 10**8
>>>>>     y = min((i+1) * 10**8, vecLen)
>>>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>>>
>>>>> The memory usage of this code on a 24GB file (one value for each nucleotide
>>>>> in the human DNA!) is 23g resident memory after the loop is finished (not
>>>>> 24g for some reason..).
>>>>>
>>>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>>>>> loop.
>>>>
>>>> Your memory measurement tools are misleading you. The same memory is
>>>> resident in both cases, just in one case your tools say it is
>>>> operating system disk cache (and not attributed to your app), and in
>>>> the other case that same memory, treated in the same way by the OS, is
>>>> shown as part of your app's resident memory. Virtual memory is
>>>> confusing...
>>>
>>> But perhaps the crucial difference is that the OS can drop the disk cache
>>> when it needs the memory, whereas application memory cannot be dropped in
>>> the same way and must be swapped to disk? Or am I still confused?
>>>
>>> (snip)
>>>
>>>>>
>>>>> Great! Any idea on whether such a patch may be included in 1.7?
>>>>
>>>> Not really, if I or you or someone else gets inspired to take the time
>>>> to write a patch soon then it will be, otherwise not...
>>>>
>>>> -N
>>>
>>> I have now tried to add a patch, in the way you proposed, but I may have 
>>> gotten it wrong..
>>>
>>> http://projects.scipy.org/numpy/ticket/2179
>>
>> I put this in a GitHub repo and added tests (author credit to Sveinung):
>> https://github.com/thouis/numpy/tree/mmap_children
>>
>> I'm not sure which branch to issue a pull request against, though.
>
> Looks good to me, thanks to both of you!
>
> Obviously should be merged to master; beyond that I'm not sure. We
> definitely want it in 1.7, but I'm not sure if that's been branched
> yet or not. (Or rather, it has been branched, but then maybe it was
> unbranched again? Travis?) Since it was a 1.6 regression it'd make
> sense to cherrypick to the 1.6 branch too, just in case it gets
> another release.

Merged into master and maintenance/1.6.x, but not maintenance/1.7.x;
I'll let Ondrej or Travis figure that out...

-N
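
For readers who want to try the slice-and-sum loop quoted above, here is a
minimal sketch in current Python. It is not the patch discussed in this
thread, and it does not reproduce the regression. The names chunked_sums,
chunk_len and demo_path are hypothetical, and the small temporary file only
stands in for the 24GB 'val.float64' file. Unlike the quoted loop, it opens
the memmap once and slices it repeatedly, which is an assumption about what
is convenient rather than something the thread prescribes.

```python
import os
import tempfile

import numpy as np


def chunked_sums(path, chunk_len, dtype="float64"):
    """Sum a flat binary file chunk by chunk through a read-only memmap."""
    # Mapping the file does not read it; pages are loaded lazily on access.
    data = np.memmap(path, dtype=dtype, mode="r")
    sums = []
    for start in range(0, data.shape[0], chunk_len):
        stop = min(start + chunk_len, data.shape[0])
        # Only the pages backing [start:stop] need to be touched here.
        sums.append(float(data[start:stop].sum()))
    # Drop our reference to the memmap object before returning.
    del data
    return sums


# Hypothetical stand-in for the large file from the thread.
demo_path = os.path.join(tempfile.mkdtemp(), "val.float64")
np.arange(1000, dtype="float64").tofile(demo_path)
demo_sums = chunked_sums(demo_path, chunk_len=300)
```

With a real multi-gigabyte file, chunk_len would be something like 10**8,
as in the quoted loop. How the touched pages are reported (as resident
memory of the process or as OS disk cache) depends on the platform and the
measurement tool, as discussed above.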
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
