[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Change by Paul Ellenbogen:

Removed file: https://bugs.python.org/file48278/dump.py

___ Python tracker <https://bugs.python.org/issue36694> ___
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Change by Paul Ellenbogen:

Removed file: https://bugs.python.org/file48281/dump.py

___ Python tracker <https://bugs.python.org/issue36694> ___
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Change by Paul Ellenbogen:

Added file: https://bugs.python.org/file48282/dump.py

___ Python tracker <https://bugs.python.org/issue36694> ___
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Paul Ellenbogen added the comment:

Good point. I have created a new version of dump.py that uses random() instead. Float reuse explains the getsizeof difference, but there is still a significant difference in memory usage. This makes sense to me, because the original code in which I saw this issue is more analogous to random().

--
Added file: https://bugs.python.org/file48281/dump.py

___ Python tracker <https://bugs.python.org/issue36694> ___
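[Editor's note: a minimal sketch of what the revised dump.py presumably does; the attached file is not reproduced here, and the field names, list size, and output path are assumptions. In the actual reproduction the shared namedtuple presumably lives in the attached common.py.]

import pickle
import random
from collections import namedtuple

# Hypothetical field names; presumably defined in common.py in the real repro
# so that load.py can import the same class when unpickling.
Point = namedtuple("Point", ["x", "y", "z"])

# random() gives every namedtuple distinct float objects, so reuse of a single
# float constant can no longer explain the getsizeof difference, while the
# memory-usage gap between dumping and loading remains visible.
data = [Point(random.random(), random.random(), random.random())
        for _ in range(1_000_000)]  # list size is an assumption

input("data built; check memory usage (e.g. in htop), then press Enter to pickle")
with open("data.pickle", "wb") as f:
    pickle.dump(data, f)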
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Change by Paul Ellenbogen:

Added file: https://bugs.python.org/file48280/common.py

___ Python tracker <https://bugs.python.org/issue36694> ___
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
New submission from Paul Ellenbogen:

Python encounters significant memory fragmentation when unpickling many small objects. I have attached two scripts that I believe demonstrate the issue.

When you run "dump.py" it generates a large list of namedtuples, then writes that list to a file using pickle. Before it does so, it pauses for user input; before exiting the script you can view the memory usage in htop or with whatever your preferred method is. The "load.py" script loads the file written by dump.py. After loading the data, it waits for user input. The memory usage at the point where each script is waiting for user input is (more than) twice as high in the "load" case as in the "dump" case.

The small objects in the list I am storing hold 3 values, and I have tested three alternative representations: tuple, namedtuple, and a custom class. The namedtuple and the custom class both show the memory use/fragmentation issue; the built-in tuple type does not. Using optimize in pickletools doesn't seem to make a difference.

Matthew Cowles from the Python help list had some good suggestions, and found that the object sizes themselves, as observed by sys.getsizeof, were different before and after pickling. Perhaps this is something other than memory fragmentation, or something in addition to memory fragmentation.

Although the high-water mark is similar for both scripts, the pickling script settles down to a considerably smaller memory footprint. I would still consider the long-run memory waste of unpickling a bug. For example, in my use case I will run one instance of the equivalent of the pickling script, then run many, many instances of the script that unpickles.

These scripts were run with Python 3.6.7 (GCC 8.2.0) on Ubuntu 18.10.

--
components: Library (Lib)
files: dump.py
messages: 340615
nosy: Ellenbogen, alexandre.vassalotti
priority: normal
severity: normal
status: open
title: Excessive memory use or memory fragmentation when unpickling many small objects
type: resource usage
versions: Python 3.6
Added file: https://bugs.python.org/file48278/dump.py

___ Python tracker <https://bugs.python.org/issue36694> ___
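[Editor's note: a minimal sketch of what load.py presumably does; the attached file is not reproduced here, and the field names and input path are assumptions. The shared namedtuple presumably lives in the attached common.py; it is defined inline here only to keep the sketch self-contained.]

import pickle
from collections import namedtuple

# Must match the class used by dump.py; hypothetical field names.
Point = namedtuple("Point", ["x", "y", "z"])

with open("data.pickle", "rb") as f:
    data = pickle.load(f)

# Pause so the resident memory of this process can be compared (e.g. in htop)
# with the memory of dump.py at the equivalent point.
input("data loaded; check memory usage, then press Enter to exit")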
[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects
Change by Paul Ellenbogen:

Added file: https://bugs.python.org/file48279/load.py

___ Python tracker <https://bugs.python.org/issue36694> ___
[issue26773] Shelve works inconsistently when carried over to child processes
Paul Ellenbogen added the comment:

I think this behavior is due to the underlying dbm module. The same code using dbm rather than shelve also throws KeyErrors:

from multiprocessing import Process
import dbm

db = dbm.open("example.dbm", "c")
for i in range(100):
    db[str(i)] = str(i ** 2)

def parallel():
    for i in range(100):
        print(db[str(i)])

a = Process(target=parallel)
b = Process(target=parallel)
a.start()
b.start()
a.join()
b.join()

--

___ Python tracker <http://bugs.python.org/issue26773> ___
[issue26773] Shelve works inconsistently when carried over to child processes
New submission from Paul Ellenbogen:

If a shelf is opened and the process is then forked, sometimes the shelf will appear to work in the child, and other times it will throw a KeyError. I suspect the order of element access may trigger the issue. I have included a Python script that will exhibit the error; it may need to be run a few times.

If shelve is not meant to be inherited by a child process in this way, it should consistently throw an error (probably not a KeyError) on any use, including the first. That way it can be caught in the child, and the shelf can potentially be reopened there.

A current workaround is to find all places where a process may fork and reopen any shelves in the child process after the fork (see the sketch after this message). This may work for most smaller scripts, but it could become tedious in more complex applications that fork in multiple places and open shelves in multiple places.

---

Running

#!/usr/bin/env python3
import multiprocessing
import platform
import sys
print(sys.version)
print(multiprocessing.cpu_count())
print(platform.platform())

outputs:

3.4.3+ (default, Oct 14 2015, 16:03:50) [GCC 5.2.1 20151010]
8
Linux-4.2.0-34-generic-x86_64-with-Ubuntu-15.10-wily

--
components: Interpreter Core
files: shelve_process.py
messages: 263522
nosy: Paul Ellenbogen
priority: normal
severity: normal
status: open
title: Shelve works inconsistently when carried over to child processes
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file42475/shelve_process.py

___ Python tracker <http://bugs.python.org/issue26773> ___
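[Editor's note: a minimal sketch of the reopen-after-fork workaround described above; it is not the attached shelve_process.py, and the shelf filename and stored values are assumptions.]

#!/usr/bin/env python3
import shelve
from multiprocessing import Process

FILENAME = "example.shelf"  # hypothetical filename

def fill():
    with shelve.open(FILENAME, "c") as db:
        for i in range(100):
            db[str(i)] = i ** 2

def child():
    # Workaround: do not reuse the parent's shelf object across the fork;
    # open a fresh shelf inside the child process instead.
    with shelve.open(FILENAME, "r") as db:
        for i in range(100):
            print(db[str(i)])

if __name__ == "__main__":
    fill()
    a = Process(target=child)
    b = Process(target=child)
    a.start()
    b.start()
    a.join()
    b.join()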