On Tue, Jul 3, 2012 at 4:08 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Tue, Jul 3, 2012 at 10:35 AM, Thouis (Ray) Jones <tho...@gmail.com> wrote:
>> On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>>>
>>> On 2 July 2012, at 22:40, Nathaniel Smith wrote:
>>>
>>>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>>>>> [snip]
>>>>>
>>>>> Your actual memory usage may not have increased as much as you think,
>>>>> since memmap objects don't necessarily take much memory -- it sounds
>>>>> like you're leaking virtual memory, but your resident set size
>>>>> shouldn't go up as much.
>>>>>
>>>>> As I understand it, memmap objects retain the contents of the memmap in
>>>>> memory after it has been read the first time (in a lazy manner). Thus,
>>>>> when reading a slice of a 24GB file, only that part resides in memory.
>>>>> Our system reads a slice of a memmap, calculates something (say, the
>>>>> sum), and then deletes the memmap. It then loops through this for
>>>>> consecutive slices, retaining a low memory usage. Consider the
>>>>> following code:
>>>>>
>>>>> import numpy as np
>>>>> res = []
>>>>> vecLen = 3095677412
>>>>> for i in xrange(vecLen/10**8+1):
>>>>>     x = i * 10**8
>>>>>     y = min((i+1) * 10**8, vecLen)
>>>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>>>
>>>>> The memory usage of this code on a 24GB file (one value for each
>>>>> nucleotide in the human DNA!) is 23g resident memory after the loop is
>>>>> finished (not 24g for some reason..).
>>>>>
>>>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after
>>>>> the loop.
>>>>
>>>> Your memory measurement tools are misleading you.
>>>> The same memory is resident in both cases, just in one case your tools
>>>> say it is operating system disk cache (and not attributed to your app),
>>>> and in the other case that same memory, treated in the same way by the
>>>> OS, is shown as part of your app's resident memory. Virtual memory is
>>>> confusing...
>>>
>>> But the crucial difference is perhaps that the disk cache can be cleared by
>>> the OS if needed, but not the application memory in the same way, which
>>> must be swapped to disk? Or am I still confused?
>>>
>>> (snip)
>>>
>>>>>
>>>>> Great! Any idea on whether such a patch may be included in 1.7?
>>>>
>>>> Not really, if I or you or someone else gets inspired to take the time
>>>> to write a patch soon then it will be, otherwise not...
>>>>
>>>> -N
>>>
>>> I have now tried to add a patch, in the way you proposed, but I may have
>>> gotten it wrong..
>>>
>>> http://projects.scipy.org/numpy/ticket/2179
>>
>> I put this in a github repo, and added tests (author credit to Sveinung)
>> https://github.com/thouis/numpy/tree/mmap_children
>>
>> I'm not sure which branch to issue a PR request against, though.
>
> Looks good to me, thanks to both of you!
>
> Obviously should be merged to master; beyond that I'm not sure. We
> definitely want it in 1.7, but I'm not sure if that's been branched
> yet or not. (Or rather, it has been branched, but then maybe it was
> unbranched again? Travis?) Since it was a 1.6 regression it'd make
> sense to cherrypick to the 1.6 branch too, just in case it gets
> another release.
Merged into master and maintenance/1.6.x, but not maintenance/1.7.x; I'll
let Ondrej or Travis figure that out...

-N
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
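For anyone adapting the loop quoted in this thread: it is Python 2 (xrange, integer /) and reopens the memmap on every iteration. Below is a minimal Python 3 sketch that opens a single memmap and stores each partial sum as a plain Python float. It assumes, which the thread does not spell out, that the memory growth came from the per-slice results keeping the mapping alive; converting each sum to float means the result list cannot hold such a reference. The function chunked_sums and its parameters are hypothetical; only the file name val.float64 comes from the thread.

```python
import os
import tempfile

import numpy as np


def chunked_sums(path, n_values, chunk=10**8, dtype="float64"):
    """Hypothetical helper: sum consecutive slices of a raw binary file.

    Uses one memmap for the whole file and converts each partial sum to a
    plain Python float, so the returned list holds no NumPy objects that
    could refer back to the mapping.
    """
    mm = np.memmap(path, dtype=dtype, mode="r", shape=(n_values,))
    sums = []
    # range(0, n, chunk) yields no trailing empty slice when n is a multiple
    # of chunk, unlike xrange(n/chunk + 1) in the quoted loop.
    for start in range(0, n_values, chunk):
        stop = min(start + chunk, n_values)
        sums.append(float(mm[start:stop].sum()))
    return sums  # `mm` goes out of scope here; nothing in `sums` refers to it


if __name__ == "__main__":
    # Small stand-in for the 24GB 'val.float64' file from the thread.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "val.float64")
        np.arange(1000, dtype="float64").tofile(path)
        print(chunked_sums(path, 1000, chunk=300))
```

Opening the memmap once is not required for correctness, but it avoids creating a fresh mapping of the whole 24GB file for every 10**8-element slice.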
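The fix itself lives in the ticket and branch linked in this thread and is not reproduced in the message. As a hedged illustration of the distinction the branch name mmap_children appears to point at (which objects derived from a memmap actually need to keep the file mapped), the sketch below uses np.may_share_memory to tell a slice, whose data lives in the mapping, from a reduction such as .sum(), which is a new allocation. The names needs_parent_mapping and demo are hypothetical, and this is not claimed to be how the merged patch works.

```python
import os
import tempfile

import numpy as np


def needs_parent_mapping(child, parent):
    """Hypothetical helper: might `child` still read memory owned by `parent`?

    np.may_share_memory only compares memory bounds, so it can return True
    for arrays that do not actually overlap, but never False for arrays
    that do. That errs on the side of keeping the mapping alive.
    """
    return bool(np.may_share_memory(child, parent))


def demo(path):
    np.arange(100, dtype="float64").tofile(path)
    mm = np.memmap(path, dtype="float64", mode="r")
    view = mm[10:20]         # a slice: its data lives in the mapped file
    total = mm[10:20].sum()  # a reduction: a freshly allocated result
    answer = (needs_parent_mapping(view, mm), needs_parent_mapping(total, mm))
    del view, total, mm      # release every reference to the mapping
    return answer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "small.float64")))  # (True, False)
```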