[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Change by Paul Ellenbogen:


Removed file: https://bugs.python.org/file48278/dump.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Change by Paul Ellenbogen:


Removed file: https://bugs.python.org/file48281/dump.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Change by Paul Ellenbogen:


Added file: https://bugs.python.org/file48282/dump.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Paul Ellenbogen added the comment:

Good point. I have created a new version of dump.py that uses random() instead. 
Float reuse explains the getsizeof difference, but there is still a significant 
memory usage difference. This makes sense to me because the original code in 
which I saw this issue is more analogous to random().
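
To illustrate the float-reuse effect (a minimal sketch, not taken from the 
attached script): CPython's pickle does not memoize floats, so any identity 
sharing among floats in the original list is lost across a round trip, and 
each reference comes back as a separate float object:

import pickle

x = 1.5
pairs = [(x, x) for _ in range(3)]
print(all(a is b for a, b in pairs))     # True: one shared float object

restored = pickle.loads(pickle.dumps(pairs))
print(all(a is b for a, b in restored))  # False: each reference was
                                         # deserialized as a new float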

--
Added file: https://bugs.python.org/file48281/dump.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Change by Paul Ellenbogen:


Added file: https://bugs.python.org/file48280/common.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


New submission from Paul Ellenbogen:

Python encounters significant memory fragmentation when unpickling many small 
objects.

I have attached two scripts that I believe demonstrate the issue. When you run 
"dump.py" it generates a large list of namedtuples, then writes that list to a 
file using pickle. Before it does so, it pauses for user input. Before exiting 
the script, you can view the memory usage in htop or with whatever method you 
prefer.

The "load.py" script loads the file written by dump.py. After it finishes 
loading the data, it waits for user input. The memory usage at the point where 
the script is waiting for user input is (more than) twice as much in the 
"load" case as in the "dump" case.
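
For reference, the two scripts are roughly the following (a sketch 
reconstructed from the description above; the attachments are not inlined in 
this archive, so the type name, field names, and list size are guesses):

# dump.py (sketch)
import pickle
from collections import namedtuple
from random import random

Record = namedtuple("Record", ["a", "b", "c"])

data = [Record(random(), random(), random()) for _ in range(10 ** 7)]
input("Check memory usage (e.g. in htop), then press Enter to write the pickle...")
with open("data.pickle", "wb") as f:
    pickle.dump(data, f)

# load.py (sketch)
import pickle

with open("data.pickle", "rb") as f:
    data = pickle.load(f)
input("Check memory usage again, then press Enter to exit...")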

The small objects in the list I am storing have 3 values, and I have tested 
three alternative representations: tuple, namedtuple, and a custom class. The 
namedtuple and the custom class both show the memory use/fragmentation issue; 
the built-in tuple type does not. Running the pickle through 
pickletools.optimize() doesn't seem to make a difference.
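
The structural difference between the two cases is visible in the pickle 
streams themselves (an added illustration, not from the attachments): a plain 
tuple is rebuilt directly from a TUPLE3 opcode, while a namedtuple is 
reconstructed by calling its class through NEWOBJ, i.e. via Record.__new__:

import pickle
import pickletools
from collections import namedtuple

Record = namedtuple("Record", ["a", "b", "c"])

pickletools.dis(pickle.dumps((1.0, 2.0, 3.0)))        # ...BINFLOAT x3, TUPLE3, STOP
pickletools.dis(pickle.dumps(Record(1.0, 2.0, 3.0)))  # ...GLOBAL, TUPLE3, NEWOBJ, STOP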

Matthew Cowles from the Python help list had some good suggestions and found 
that the object sizes themselves, as observed by sys.getsizeof, were different 
before and after pickling. Perhaps this is something other than memory 
fragmentation, or something in addition to it.

Although the high-water mark is similar for both scripts, the pickling script 
settles down to a noticeably smaller memory footprint. I would still consider 
the long-run memory waste of unpickling a bug. For example, in my use case I 
will run one instance of the equivalent of the pickling script, then run many 
instances of the script that unpickles.


These scripts were run with Python 3.6.7 (GCC 8.2.0) on Ubuntu 18.10.

--
components: Library (Lib)
files: dump.py
messages: 340615
nosy: Ellenbogen, alexandre.vassalotti
priority: normal
severity: normal
status: open
title: Excessive memory use or memory fragmentation when unpickling many small objects
type: resource usage
versions: Python 3.6
Added file: https://bugs.python.org/file48278/dump.py



[issue36694] Excessive memory use or memory fragmentation when unpickling many small objects

2019-04-21 Thread Paul Ellenbogen


Change by Paul Ellenbogen:


Added file: https://bugs.python.org/file48279/load.py



[issue26773] Shelve works inconsistently when carried over to child processes

2016-04-17 Thread Paul Ellenbogen

Paul Ellenbogen added the comment:

I think this behavior comes from the underlying dbm module. The same code 
using dbm directly, rather than shelve, also throws KeyErrors:

from multiprocessing import Process
import dbm

# Populate the database in the parent process and keep the handle open.
db = dbm.open("example.dbm", "c")
for i in range(100):
    db[str(i)] = str(i ** 2)


def parallel():
    # Each child inherits the parent's already-open dbm handle across the fork.
    for i in range(100):
        print(db[str(i)])


a = Process(target=parallel)
b = Process(target=parallel)
a.start()
b.start()
a.join()
b.join()



[issue26773] Shelve works inconsistently when carried over to child processes

2016-04-15 Thread Paul Ellenbogen

New submission from Paul Ellenbogen:

If a shelf is opened and the process is then forked, sometimes the shelf will 
appear to work in the child, and other times it will throw a KeyError. I 
suspect the order of element access may trigger the issue. I have included a 
Python script that exhibits the error. It may need to be run a few times.

If a shelf is not meant to be inherited by the child process in this way, it 
should consistently throw an error (probably not a KeyError) on any use, 
including the first. That way the error can be caught in the child, and the 
shelf can potentially be reopened there.

A current workaround is to find every place where the process may fork and 
reopen any shelves in the child process after the fork (see the sketch below). 
This may work for most smaller scripts, but it could become tedious in more 
complex applications that fork in multiple places and open shelves in multiple 
places.
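
A minimal sketch of that workaround (the file name and access pattern here are 
illustrative assumptions, not from the attached script):

import shelve
from multiprocessing import Process


def child():
    # Open a fresh shelf inside the child instead of inheriting the
    # parent's already-open handle across the fork.
    with shelve.open("example.db") as db:
        for i in range(100):
            print(db[str(i)])


if __name__ == "__main__":
    # Write in the parent, then close the shelf before forking.
    with shelve.open("example.db") as db:
        for i in range(100):
            db[str(i)] = i ** 2

    procs = [Process(target=child) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()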

---

Running

#!/usr/bin/env python3

import multiprocessing
import platform
import sys

print(sys.version)
print(multiprocessing.cpu_count())
print(platform.platform())


outputs:
3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010]
8
Linux-4.2.0-34-generic-x86_64-with-Ubuntu-15.10-wily

--
components: Interpreter Core
files: shelve_process.py
messages: 263522
nosy: Paul Ellenbogen
priority: normal
severity: normal
status: open
title: Shelve works inconsistently when carried over to child processes
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file42475/shelve_process.py
