Martin MOKREJŠ wrote:
> Hi,
> could someone tell me what all does and what all doesn't copy
> references in python. I have found my script after reaching some
> state and taking say 600MB, pushes its internal dictionaries
> to hard disk. The for loop consumes another 300MB (as gathered
> by vmstat) to push the data to dictionaries, then releases
> a little bit less than 300MB and the program starts to fill up
> its internal dictionaries again; when "full" it will do the
> flush again ...
Almost anything you do copies references.

>
> The point here is, that this code takes a lot of extra memory.
> I believe it's the references problem, and I remember complaints
> of friends facing the same problem. I'm a newbie, yes, but don't
> have this problem with Perl. OK, I want to improve my Python
> knowledge ... :-))
>
> <long code extract snipped>
>
> The above routine doesn't release the memory back when it
> exits.

That's probably because there isn't any memory it can reasonably be expected to release. What memory would *you* expect it to release? The member variables are all still accessible as member variables until you run your loop at the end to clear them, so there is no way Python could release them.

Some hints:

When posting code, try to post complete examples which actually work. I don't know what type the self._dict_on_diskXX variables are supposed to be. It makes a big difference whether they are dictionaries (so you are trying to hold everything in memory at one time) or shelve.Shelf objects, which would store the values on disc in a reasonably efficient manner.

Even if they are Shelf objects, I see no reason here why you have to process everything at once. Write a simple function which processes one tmpdict object into one dict_on_disk object and then closes the dict_on_disk object. If you want to compare results later, do that by reopening the dict_on_disk objects once you have deleted all the tmpdicts.

Extract everything you want to do into a class which has at most one tmpdict and one dict_on_disk. That way your code will be a lot easier to read.

Make your code more legible by using fewer underscores.

What on earth is the point of an explicit call to __add__? If Guido had meant us to use __add__ he wouldn't have created '+'.

What is the purpose of dict_on_disk? Is it for humans to read the data? If not, then don't store everything as a string.
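To make the string-versus-tuple point concrete, here is a small sketch using plain dictionaries. It assumes a hypothetical four-field record (count, a, b, expected); the keys, the space-separated string format and the helper names update_as_string/update_as_tuple are made up for illustration, since the original code was snipped.

```python
# Hypothetical record layout: (count, a, b, expected).

def update_as_string(store, key, increment):
    """String storage: every update must parse and re-format the record."""
    if key in store:
        fields = store[key].split()                        # parse the text back apart
        count, a, b, expected = (int(f) for f in fields)   # cast each field to int
    else:
        count, a, b, expected = 0, 0, 0, 0
    store[key] = "%d %d %d %d" % (count + increment, a, b, expected)

def update_as_tuple(store, key, increment):
    """Tuple storage: the values stay integers, so no parsing is needed."""
    count, a, b, expected = store.get(key, (0, 0, 0, 0))
    store[key] = (count + increment, a, b, expected)

strings, tuples = {}, {}
for key, inc in [("key1", 3), ("key1", 2), ("key2", 1)]:
    update_as_string(strings, key, inc)
    update_as_tuple(tuples, key, inc)

print(strings["key1"])  # 5 0 0 0
print(tuples["key1"])   # (5, 0, 0, 0)
```

Both versions end up with the same numbers, but the tuple version skips the split/int round trip on every update and keeps the other three fields without any extra handling.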
Far better to just store a tuple of your values; then you don't have to use split or cast the strings to integers. If you do want humans to read some final output, produce that separately from the working data files.

You split out 4 values from dict_on_disk and set three of them to 0. Is that really what you meant, or should you be preserving the previous values?

Here is some (untested) code which might help you:

    import shelve

    def push_to_disc(data, filename):
        database = shelve.open(filename)
        try:
            for key in data:
                if key in database:
                    # Existing record: add the new count, keep the other fields.
                    count, a, b, expected = database[key]
                    database[key] = count+data[key], a, b, expected
                else:
                    # New record: the other fields start at 0.
                    database[key] = data[key], 0, 0, 0
        finally:
            database.close()
        # Empty the in-memory dictionary once its contents are on disc.
        data.clear()

Call that once for each input dictionary and your data will be written out to a disc file and the internal dictionary cleared without any great spike in memory use.

-- 
http://mail.python.org/mailman/listinfo/python-list
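To try push_to_disc end to end, the sketch below repeats the function (written for Python 3, so the membership test is `key in database` rather than has_key) and flushes two batches into a shelf in a temporary directory. The file name, keys and counts are hypothetical stand-ins for the real tmpdicts.

```python
import os
import shelve
import tempfile

def push_to_disc(data, filename):
    """Merge the counts in `data` into the shelf at `filename`, then empty `data`."""
    database = shelve.open(filename)
    try:
        for key in data:
            if key in database:
                count, a, b, expected = database[key]
                database[key] = count + data[key], a, b, expected
            else:
                database[key] = data[key], 0, 0, 0
    finally:
        database.close()
    data.clear()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts")   # hypothetical file name
    # Two hypothetical batches standing in for successive tmpdicts.
    batch1 = {"key1": 3, "key2": 1}
    batch2 = {"key1": 2, "key3": 4}
    for batch in (batch1, batch2):
        push_to_disc(batch, path)        # each batch is merged, then emptied
    # Reopen the shelf to read back the merged records.
    database = shelve.open(path)
    try:
        result = dict(database)
    finally:
        database.close()

print(sorted(result.items()))
# [('key1', (5, 0, 0, 0)), ('key2', (1, 0, 0, 0)), ('key3', (4, 0, 0, 0))]
```

Each call opens the shelf, merges one batch, closes it and empties the batch, so only one input dictionary has to live in memory at a time.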