On 18.04.2014 18:29, Valentin Haenel wrote:
> Hi,
>
> * Valentin Haenel <valen...@haenel.co> [2014-04-17]:
>> * Valentin Haenel <valen...@haenel.co> [2014-04-17]:
>>> * Julian Taylor <jtaylor.deb...@googlemail.com> [2014-04-17]:
>>>> On 17.04.2014 21:30, onefire wrote:
>>>>> Thanks for the suggestion. I did profile the program before, just not
>>>>> using Python.
>>>>
>>>> one problem of npz is that the zipfile module does not support streaming
>>>> data in (or if it does now we aren't using it).
>>>> So numpy writes the file uncompressed to disk and then zips it which is
>>>> horrible for performance and disk usage.
>>>
>>> As a workaround may also be possible to write the temporary NPY files to
>>> cStringIO instances and then use ``ZipFile.writestr`` with the
>>> ``getvalue()`` of the cStringIO object. However that approach may
>>> require some memory. In python 2.7, for each array: one copy inside the
>>> cStringIO instance and then another copy of when calling getvalue on the
>>> cString, I believe.
>>
>> There is a proof-of-concept implementation here:
>>
>> https://github.com/esc/numpy/compare/feature;npz_no_temp_file
>
> Anybody interested in me fixing this up (unit tests, API, etc..) for
> inclusion?
>
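For reference, the in-memory workaround described in the quoted text could look roughly like the sketch below. It is written for Python 3, with io.BytesIO standing in for cStringIO, and uses only the standard library. The function name and the `entries` mapping (archive name to a callable that writes bytes to a file object) are hypothetical; in numpy the callable would presumably wrap format.write_array.

```python
import io
import zipfile


def write_entries_in_memory(zip_target, entries):
    """Add each entry to a zip archive without a temporary file on disk.

    `entries` maps archive names to callables taking a writable binary
    file object (a hypothetical interface, for illustration only).
    """
    with zipfile.ZipFile(zip_target, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zipf:
        for name, write_payload in entries.items():
            buf = io.BytesIO()
            # In numpy this step would be something like
            # format.write_array(buf, np.asanyarray(val)).
            write_payload(buf)
            # The whole serialized payload is held in memory here,
            # which is the cost of avoiding the temporary file.
            zipf.writestr(name, buf.getvalue())


if __name__ == "__main__":
    target = io.BytesIO()
    write_entries_in_memory(target, {"a.npy": lambda f: f.write(b"abc")})
    with zipfile.ZipFile(target) as z:
        print(z.namelist())
```

The trade-off is the one named in the quote: no disk traffic, but each array is fully buffered in memory before it is compressed.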
I wonder if it would be better to use a FIFO instead, to avoid doubling the memory. Windows probably doesn't have them (at least not exposed via Python), but one can put a platform check in front. Attached is a proof of concept without proper error handling (which is unfortunately the tricky part).
From 472b4c0a44804b65d0774147010ec7a931a1c52d Mon Sep 17 00:00:00 2001
From: Julian Taylor <jtaylor.deb...@googlemail.com>
Date: Thu, 17 Apr 2014 23:01:47 +0200
Subject: [PATCH] use a pipe for savez

---
 numpy/lib/npyio.py | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py
index 98b4b6e..baafa9d 100644
--- a/numpy/lib/npyio.py
+++ b/numpy/lib/npyio.py
@@ -585,22 +585,19 @@ def _savez(file, args, kwds, compress):
     zipf = zipfile_factory(file, mode="w", compression=compression)
 
     # Stage arrays in a temporary file on disk, before writing to zip.
-    fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy')
-    os.close(fd)
-    try:
+    import threading
+    with tempfile.TemporaryDirectory() as td:
+        fifoname = os.path.join(td, "fifo")
+        os.mkfifo(fifoname)
         for key, val in namedict.items():
             fname = key + '.npy'
-            fid = open(tmpfile, 'wb')
-            try:
-                format.write_array(fid, np.asanyarray(val))
-                fid.close()
-                fid = None
-                zipf.write(tmpfile, arcname=fname)
-            finally:
-                if fid:
-                    fid.close()
-    finally:
-        os.remove(tmpfile)
+            def mywrite(pipe, val):
+                with open(pipe, "wb") as wpipe:
+                    format.write_array(wpipe, np.asanyarray(val))
+            t = threading.Thread(target=mywrite, args=(fifoname, val))
+            t.start()
+            zipf.write(fifoname, arcname=fname)
+            t.join()
 
     zipf.close()
 
-- 
1.9.1
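
To try the idea outside of numpy, here is a standalone sketch of the same pattern using only the standard library, with the platform check added. The names `stream_entries`, `_feed` and the `entries` mapping (archive name to a callable that writes bytes to a file object) are made up for illustration and are not part of the patch. Falling back to an in-memory buffer when no FIFO can be created is my assumption about what the platform check would do. Like the patch, it does not handle errors raised in the writer thread.

```python
import io
import os
import tempfile
import threading
import zipfile


def _feed(fifoname, write_payload):
    # Opening the FIFO for writing blocks until the zip side opens it
    # for reading, so this has to run in its own thread.
    with open(fifoname, "wb") as wpipe:
        write_payload(wpipe)


def stream_entries(zip_target, entries):
    """Stream each entry through a named pipe into a zip archive.

    `entries` maps archive names to callables taking a writable binary
    file object (hypothetical interface; numpy would use
    format.write_array here).
    """
    with zipfile.ZipFile(zip_target, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zipf, \
            tempfile.TemporaryDirectory() as td:
        fifoname = os.path.join(td, "fifo")
        try:
            os.mkfifo(fifoname)  # not available on Windows
            have_fifo = True
        except (AttributeError, OSError):
            have_fifo = False
        for name, write_payload in entries.items():
            if not have_fifo:
                # Assumed fallback: buffer the entry in memory instead.
                buf = io.BytesIO()
                write_payload(buf)
                zipf.writestr(name, buf.getvalue())
                continue
            t = threading.Thread(target=_feed, args=(fifoname, write_payload))
            t.start()
            # Reads from the FIFO until the writer closes its end.
            zipf.write(fifoname, arcname=name)
            t.join()


if __name__ == "__main__":
    target = io.BytesIO()
    stream_entries(target, {"a.npy": lambda f: f.write(b"x" * 200000)})
    with zipfile.ZipFile(target) as z:
        print(z.namelist())
```

If the writer callable raises, the exception stays in the thread and the archive silently ends up with a truncated entry, which is the error-handling gap mentioned above.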