Would you try the pull request in this issue? https://bugs.python.org/issue36694
I'm not sure this issue is related to your problem, because I don't know
anything about your data.

Regards,

On Sun, Dec 1, 2019 at 10:14 AM José María Mateos <ch...@rinzewind.org> wrote:
>
> Hi,
>
> I just asked this question on the IRC channel but didn't manage to get a
> response, though some people replied with suggestions that expanded this
> question a bit.
>
> I have a program that has to read some pickle files, perform some
> operations on them, and then return. The pickle objects I am reading all
> have the same structure, which consists of a single list with two
> elements: the first one is a long list, the second one is a numpy
> object.
>
> I found out that, after calling that function, the memory taken by the
> Python executable (monitored using htop -- the entire thing runs on
> Python 3.6 on an Ubuntu 16.04, pretty standard conda installation with a
> few packages installed directly using `conda install`) increases in
> proportion to the size of the pickle object being read. My intuition is
> that that memory should be freed upon exiting.
>
> Does pickle keep a cache of objects in memory after they have been
> returned? I thought that could be the answer, but then someone suggested
> to measure the time it takes to load the objects. This is a script I
> wrote to test this; nothing(filepath) just loads the pickle file,
> doesn't do anything with the output and returns how long it took to
> perform the load operation.
>
> ---
> import glob
> import pickle
> import timeit
> import os
> import psutil
>
> def nothing(filepath):
>     # Load the pickle, discard the result, return the elapsed seconds.
>     start = timeit.default_timer()
>     with open(filepath, 'rb') as f:
>         _ = pickle.load(f)
>     return timeit.default_timer() - start
>
> if __name__ == "__main__":
>
>     filelist = glob.glob('/tmp/test/*.pk')
>
>     for i, filepath in enumerate(filelist):
>         print("Size of file {}: {}".format(i, os.path.getsize(filepath)))
>         print("First call:", nothing(filepath))
>         print("Second call:", nothing(filepath))
>         # Resident set size of this process, in bytes.
>         print("Memory usage:", psutil.Process(os.getpid()).memory_info().rss)
>         print()
> ---
>
> This is the output of the second time the script was run, to avoid any
> effects of potential IO caches:
>
> ---
> Size of file 0: 11280531
> First call: 0.1466723980847746
> Second call: 0.10044755204580724
> Memory usage: 49418240
>
> Size of file 1: 8955825
> First call: 0.07904054620303214
> Second call: 0.07996074995025992
> Memory usage: 49831936
>
> Size of file 2: 43727266
> First call: 0.37741047400049865
> Second call: 0.38176894187927246
> Memory usage: 49758208
>
> Size of file 3: 31122090
> First call: 0.271301960805431
> Second call: 0.27462846506386995
> Memory usage: 49991680
>
> Size of file 4: 634456686
> First call: 5.526095286011696
> Second call: 5.558765463065356
> Memory usage: 539324416
>
> Size of file 5: 3349952658
> First call: 29.50982437795028
> Second call: 29.461691531119868
> Memory usage: 3443597312
>
> Size of file 6: 9384929
> First call: 0.0826977719552815
> Second call: 0.08362263604067266
> Memory usage: 3443597312
>
> Size of file 7: 422137
> First call: 0.0057482069823890924
> Second call: 0.005949910031631589
> Memory usage: 3443597312
>
> Size of file 8: 409458799
> First call: 3.562588643981144
> Second call: 3.6001368327997625
> Memory usage: 3441451008
>
> Size of file 9: 44843816
> First call: 0.39132978999987245
> Second call: 0.398518088972196
> Memory usage: 3441451008
> ---
>
> Notice that memory usage increases noticeably, especially on files 4 and
> 5, the biggest ones, and doesn't come down as I would expect it to. But
> the loading time is constant, so I think I can disregard any pickle
> caching mechanisms.
>
> So I guess now my question is: can anyone give me any pointers as to why
> this is happening? Any help is appreciated.
>
> Thanks,
>
> --
> José María (Chema) Mateos || https://rinzewind.org/
> --
> https://mail.python.org/mailman/listinfo/python-list

--
Inada Naoki <songofaca...@gmail.com>
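
One way to narrow down the question in the quoted message is to compare the bytes Python itself still has allocated with the RSS figure the script prints. The following is only a minimal sketch, assuming the standard-library tracemalloc module is acceptable for the test; the function name traced_bytes_around_load and the stand-in data are made up for illustration and are not part of the original script.

```python
import gc
import os
import pickle
import tempfile
import tracemalloc


def traced_bytes_around_load(filepath):
    """Return (before, during, after) Python-level allocated bytes for one load.

    tracemalloc must already be started by the caller.
    """
    gc.collect()
    before, _ = tracemalloc.get_traced_memory()
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    during, _ = tracemalloc.get_traced_memory()
    del obj        # drop the only reference to the loaded data
    gc.collect()   # also clear any reference cycles
    after, _ = tracemalloc.get_traced_memory()
    return before, during, after


if __name__ == "__main__":
    tracemalloc.start()
    # Hypothetical stand-in data: a long list plus a second, large element.
    payload = [list(range(200000)), b"x" * 1000000]
    with tempfile.NamedTemporaryFile(suffix=".pk", delete=False) as tmp:
        pickle.dump(payload, tmp)
    del payload
    try:
        before, during, after = traced_bytes_around_load(tmp.name)
        print("before:", before, "during:", during, "after:", after)
    finally:
        os.remove(tmp.name)
        tracemalloc.stop()
```

If "after" falls back close to "before" while the RSS reported by psutil stays high, the loaded objects have been released and the memory is being held below the Python object level (by the allocator) rather than by pickle. That would be a hypothesis to check against the real files, not a confirmed diagnosis.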