New submission from Antoine Pitrou <[email protected]>:
On an 8GB RAM box (more than 6GB free), serializing many small objects can eat
all the memory, even though the end result would only take around 600MB on a UCS2 build:
$ LANG=C time opt/python -c "import json; l = [1] * (100*1024*1024); encoded = json.dumps(l)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/cpython/opt/Lib/json/__init__.py", line 224, in dumps
    return _default_encoder.encode(obj)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 188, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 246, in iterencode
    return _iterencode(o, 0)
MemoryError
Command exited with non-zero status 1
11.25user 2.43system 0:13.72elapsed 99%CPU (0avgtext+0avgdata 27820320maxresident)k
2920inputs+0outputs (12major+1261388minor)pagefaults 0swaps
I suppose the encoder internally builds a large list of very small unicode
objects and only joins them at the end. We could probably join the pieces in
chunks instead, so as to avoid this behaviour; see the sketch below.
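As a rough illustration of the idea (not a patch, and only exercising the pure-Python
iterencode() path rather than the C accelerator that dumps() takes through _one_shot),
a hypothetical caller-side helper could consume iterencode() and join the pieces in
fixed-size batches, so that only a bounded number of tiny strings is alive at once:

    import io
    import json

    def dumps_chunked(obj, chunk_size=1000):
        # Hypothetical helper: join the encoder's output in fixed-size
        # batches instead of accumulating one huge list of tiny strings.
        encoder = json.JSONEncoder()
        buf = io.StringIO()
        batch = []
        for piece in encoder.iterencode(obj):
            batch.append(piece)
            if len(batch) >= chunk_size:
                buf.write(''.join(batch))
                del batch[:]
        buf.write(''.join(batch))
        return buf.getvalue()

The real fix would presumably do something equivalent inside the encoder itself.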
----------
messages: 142338
nosy: ezio.melotti, pitrou, rhettinger
priority: normal
severity: normal
status: open
title: JSON-serializing a large container takes too much memory
type: resource usage
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12778>
_______________________________________