On 09/25/2012 12:21 AM, Junkshops wrote:
>> Just curious; which is it, two million lines, or half a million bytes?
<snip>
>
> Sorry, that should've been a 500Mb, 2M line file.
>
>> which machine is 2gb, the Windows machine, or the VM?
> VM. Winders is 4gb.
>
>> ...but I would point out that just because
>> you free up the memory from the Python doesn't mean it gets released
>> back to the system. The C runtime manages its own heap, and is pretty
>> persistent about hanging onto memory once obtained. It's not normally a
>> problem, since most small blocks are reused. But it can get
>> fragmented. And I have no idea how well Virtual Box maps the Linux
>> memory map into the Windows one.
> Right, I understand that - but what's confusing me is that, given the
> memory use is (I assume) monotonically increasing, the code should never
> use more than what's reported by heapy once all the data is loaded into
> memory, given that memory released by the code to the Python runtime is
> reused. To the best of my ability to tell I'm not storing anything I
> shouldn't, so the only thing I can think of is that all the object
> creation and destruction, for some reason, is preventing reuse of
> memory. I'm at a bit of a loss regarding what to try next.

I'm not familiar with heapy, but perhaps it's missing something there. I'm a bit surprised you aren't beyond the 2 GB limit just from the structures you describe for the file. You do realize that each object carries quite a few bytes of overhead, so it's not surprising to need several times the size of a file to store that file in an organized way. I also wonder whether heapy has been written to take into account the larger size of pointers in a 64-bit build.

Perhaps one way to save space would be to use a long to store those md5 values. You'd have to measure it, but I suspect it'd help (at the cost of lots of extra hexlify-type calls).

Another thing is to make sure that the md5 object used in your two maps is the same object, and not just one with the same value. Two equal but distinct objects each take up their own memory.

--
DaveA
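
P.S. To make the last two suggestions concrete, here is a minimal sketch, assuming Python 3 (where int plays the role of the old long). The helper names md5_as_int, int_as_hex and canonical, the sample line, and the two dict names are all made up for illustration; they are not taken from your code. The sizes it prints depend on the Python version and build, so treat it as a way to measure, not as a promise of savings.

```python
import hashlib
import sys


def md5_as_int(data):
    """Return the MD5 digest of `data` (bytes) as a single integer."""
    return int(hashlib.md5(data).hexdigest(), 16)


def int_as_hex(value):
    """Convert the integer form back to the usual 32-character hex string."""
    return format(value, "032x")


def canonical(key, pool):
    """Return the one shared instance of `key`, remembering it on first sight."""
    return pool.setdefault(key, key)


line = b"example record\n"  # hypothetical stand-in for one line of the file
hex_digest = hashlib.md5(line).hexdigest()
int_digest = md5_as_int(line)

# Compare the per-object cost of the two representations on this build.
print("hex str:", sys.getsizeof(hex_digest), "bytes")
print("int:    ", sys.getsizeof(int_digest), "bytes")

# Two maps keyed by the same digest. Each call to md5_as_int builds a new
# int object, so pass every key through one pool to make both dicts
# reference a single object instead of two equal copies.
pool = {}
by_digest = {}   # hypothetical first map
seen_count = {}  # hypothetical second map

k1 = canonical(md5_as_int(line), pool)
k2 = canonical(md5_as_int(line), pool)
by_digest[k1] = line
seen_count[k2] = 1

print("same object in both maps:", k1 is k2)
```

The pool costs one extra dict entry per distinct digest, so it only pays off if the same digest really is stored in more than one place. If one of your maps already holds every key, you can look the key up there and reuse that object instead of keeping a separate pool.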