[issue15504] pickle/cPickle saves invalid/incomplete data
New submission from Philipp Lies:

I just stumbled upon a very serious bug in cPickle: it stores only part of the data passed to it, without any warning or error:

    # creating an 8GB long random data string
    import os
    import cPickle
    random_string = os.urandom(int(1.1*2**33))
    print len(random_string)
    fout = open('test.pickle', 'wb')
    cPickle.dump(random_string, fout, 2)
    fout.close()
    fin = open('test.pickle', 'rb')
    random_string2 = cPickle.load(fin)
    print len(random_string2)
    print random_string == random_string2

The loaded string is significantly shorter, meaning that some of the data got lost while storing the string. This is a serious issue. However, when I use pickle, writing fails with

    error: 'i' format requires -2147483648 <= number <= 2147483647

so I guess pickle is not able to handle large data. Therefore cPickle should either throw an error as well, or pickle/cPickle should be patched to handle larger data.

Code to reproduce the error using numpy (that's how I stumbled upon it):

    import numpy as np
    import cPickle as pickle
    A = np.random.randn(1080,1920,553)
    fout = open('test.pickle', 'wb')
    pickle.dump(A, fout, 2)
    fout.close()
    fin = open('test.pickle', 'rb')
    B = pickle.load(fin)

Here, numpy detects that the amount of data is wrong and throws an error. However, this is still serious, because saving does not lead to an error, so the user expects that the data are safely stored.

I guess this might be related to http://bugs.python.org/issue13555 which is still open.

Python 2.7.3 on latest Ubuntu with numpy 1.6.2, 64bit architecture, 128GB RAM

--
messages: 166906
nosy: Philipp.Lies
priority: normal
severity: normal
status: open
title: pickle/cPickle saves invalid/incomplete data
type: crash
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15504
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
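The error quoted from pickle is the one the struct module raises when a value does not fit the signed 4-byte 'i' format, which suggests that the string length is being packed into 4 bytes. A minimal sketch of that limit follows. The helper names fits_int32_length and low_32_bits are hypothetical, and the 32-bit truncation shown at the end is only an assumption about how a too-large length could end up shorter; the report does not confirm that this is what cPickle does.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value the signed 4-byte 'i' format accepts


def fits_int32_length(n):
    """Return True if the length n can be packed as a signed 4-byte integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


def low_32_bits(n):
    """Illustrative only (assumption): what remains of n if cut down to 32 bits."""
    return n & 0xFFFFFFFF


size = int(1.1 * 2**33)  # string length used in the report

print(fits_int32_length(INT32_MAX))  # True
print(fits_int32_length(size))       # False
print(low_32_bits(size))             # far smaller than size
```

A check like fits_int32_length could be run on len(data) before dumping with protocol 2, so that oversized strings are rejected up front instead of being written incompletely.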
[issue13555] cPickle MemoryError when loading large file (while pickle works)
Philipp Lies p...@bethgelab.org added the comment:

a) it's 122GB free RAM (out of 128GB total RAM)
b) when I convert the numpy array to a list it works. So it seems to be a problem with cPickle and numpy at/from a certain array size
c)
$ /usr/bin/time -v python test_np.py
Traceback (most recent call last):
  File "test_np.py", line 12, in <module>
    A2 = cPickle.load(f2)
MemoryError
Command exited with non-zero status 1
    Command being timed: "python test_np.py"
    User time (seconds): 73.72
    System time (seconds): 4.56
    Percent of CPU this job got: 87%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:29.52
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 7402448
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 726827
    Voluntary context switches: 41043
    Involuntary context switches: 7793
    Swaps: 0
    File system inputs: 3368
    File system outputs: 2180744
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 1

hth

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13555
___
[issue13555] cPickle MemoryError when loading large file (while pickle works)
Philipp Lies p...@bethgelab.org added the comment:

Well, replace cPickle with pickle and it works. So if there is a memory allocation problem, cPickle should be able to handle it, especially since it should be completely compatible with pickle.
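The workaround described here (use pickle when cPickle fails) could be wrapped roughly as follows. This is only a sketch: load_with_fallback and always_out_of_memory are hypothetical names, and the failing loader merely simulates the MemoryError from the report so that the example runs without a large file.

```python
import os
import pickle
import tempfile


def load_with_fallback(path, loaders):
    """Try each load function in order; move on when one raises MemoryError."""
    if not loaders:
        raise ValueError("at least one loader is required")
    last_error = None
    for load in loaders:
        try:
            with open(path, "rb") as f:
                return load(f)
        except MemoryError as exc:
            last_error = exc  # remember the failure and try the next loader
    raise last_error


def always_out_of_memory(f):
    """Hypothetical stand-in for a loader that fails with MemoryError."""
    raise MemoryError("simulated failure")


# Small demonstration with a temporary file instead of a multi-GB array.
data = {"shape": (1080, 1920, 553)}
fd, path = tempfile.mkstemp(suffix=".pickle")
with os.fdopen(fd, "wb") as f:
    pickle.dump(data, f, 2)
try:
    restored = load_with_fallback(path, [always_out_of_memory, pickle.load])
finally:
    os.remove(path)

print(restored == data)  # True
```

On Python 2.7 the loader list would be [cPickle.load, pickle.load]. This only hides the symptom on the loading side; it does not address the cause of the MemoryError.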