On 18.04.2014 18:29, Valentin Haenel wrote:
> Hi,
> 
> * Valentin Haenel <valen...@haenel.co> [2014-04-17]:
>> * Valentin Haenel <valen...@haenel.co> [2014-04-17]:
>>> * Julian Taylor <jtaylor.deb...@googlemail.com> [2014-04-17]:
>>>> On 17.04.2014 21:30, onefire wrote:
>>>>> Thanks for the suggestion. I did profile the program before, just not
>>>>> using Python.
>>>>
>>>> one problem of npz is that the zipfile module does not support streaming
>>>> data in (or if it does now we aren't using it).
>>>> So numpy writes the file uncompressed to disk and then zips it which is
>>>> horrible for performance and disk usage.
>>>
>>> As a workaround it may also be possible to write the temporary NPY files to
>>> cStringIO instances and then use ``ZipFile.writestr`` with the
>>> ``getvalue()`` of the cStringIO object. However, that approach may
>>> require extra memory. In Python 2.7, for each array: one copy inside the
>>> cStringIO instance and then another copy when calling getvalue on the
>>> cStringIO object, I believe.
>>
>> There is a proof-of-concept implementation here:
>>
>> https://github.com/esc/numpy/compare/feature;npz_no_temp_file
> 
> Anybody interested in me fixing this up (unit tests, API, etc.) for
> inclusion?
> 
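
For reference, the in-memory workaround quoted above could look roughly
like this on Python 3. This is only a sketch: io.BytesIO stands in for
cStringIO, and the function name savez_in_memory is made up for
illustration, it is not part of numpy.

```python
import io
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def savez_in_memory(file, **arrays):
    """Write arrays to an npz-style archive without a temporary file on disk.

    Hypothetical helper for illustration only, not a NumPy API.
    """
    with zipfile.ZipFile(file, mode="w", compression=zipfile.ZIP_DEFLATED) as zipf:
        for key, val in arrays.items():
            buf = io.BytesIO()
            # First copy: the array serialised in NPY format inside the buffer.
            npy_format.write_array(buf, np.asanyarray(val))
            # Second copy: getvalue() returns the buffer contents as bytes.
            zipf.writestr(key + ".npy", buf.getvalue())
```

The two copies per array mentioned above are marked in the comments; that
is the memory cost being traded for not touching the disk.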

I wonder if it would be better to use a FIFO instead, to avoid the memory
doubling. Windows probably doesn't have them (exposed via Python), but one
can put a platform check in front.
Attached is a proof of concept without proper error handling (which is
unfortunately the tricky part).
From 472b4c0a44804b65d0774147010ec7a931a1c52d Mon Sep 17 00:00:00 2001
From: Julian Taylor <jtaylor.deb...@googlemail.com>
Date: Thu, 17 Apr 2014 23:01:47 +0200
Subject: [PATCH] use a pipe for savez

---
 numpy/lib/npyio.py | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py
index 98b4b6e..baafa9d 100644
--- a/numpy/lib/npyio.py
+++ b/numpy/lib/npyio.py
@@ -585,22 +585,19 @@ def _savez(file, args, kwds, compress):
     zipf = zipfile_factory(file, mode="w", compression=compression)
 
     # Stage arrays in a temporary file on disk, before writing to zip.
-    fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy')
-    os.close(fd)
-    try:
+    import threading
+    with tempfile.TemporaryDirectory() as td:
+        fifoname = os.path.join(td, "fifo")
+        os.mkfifo(fifoname)
         for key, val in namedict.items():
             fname = key + '.npy'
-            fid = open(tmpfile, 'wb')
-            try:
-                format.write_array(fid, np.asanyarray(val))
-                fid.close()
-                fid = None
-                zipf.write(tmpfile, arcname=fname)
-            finally:
-                if fid:
-                    fid.close()
-    finally:
-        os.remove(tmpfile)
+            def mywrite(pipe, val):
+                with open(pipe, "wb") as wpipe:
+                    format.write_array(wpipe, np.asanyarray(val))
+            t = threading.Thread(target=mywrite, args=(fifoname, val))
+            t.start()
+            zipf.write(fifoname, arcname=fname)
+            t.join()
 
     zipf.close()
 
-- 
1.9.1
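
Stripped of the numpy specifics, the mechanism in the patch can be tried
on its own. A minimal sketch, assuming a POSIX system and a Python 3
zipfile whose ZipFile.write reads the source until EOF; zip_from_fifo and
its chunks argument are hypothetical names, the payload is plain bytes
rather than an array, and like the patch it has no real error handling
(if the producer fails before opening the FIFO, the reader blocks).

```python
import os
import tempfile
import threading
import zipfile


def zip_from_fifo(zip_path, arcname, chunks):
    """Stream an iterable of bytes chunks into one zip member through a FIFO.

    Hypothetical helper for illustration only.
    """
    # Platform check: os.mkfifo does not exist on Windows.
    if not hasattr(os, "mkfifo"):
        raise NotImplementedError("os.mkfifo is not available on this platform")

    def producer(fifoname):
        # Opening for writing blocks until the reader side opens the FIFO.
        with open(fifoname, "wb") as wpipe:
            for chunk in chunks:
                wpipe.write(chunk)

    with tempfile.TemporaryDirectory() as td:
        fifoname = os.path.join(td, "fifo")
        os.mkfifo(fifoname)
        with zipfile.ZipFile(zip_path, mode="w",
                             compression=zipfile.ZIP_DEFLATED) as zipf:
            t = threading.Thread(target=producer, args=(fifoname,))
            t.start()
            # Reads from the FIFO until the producer closes its end (EOF).
            zipf.write(fifoname, arcname=arcname)
            t.join()
```

Something like zip_from_fifo("out.zip", "data.bin", [b"abc", b"def"])
should then produce an archive with a single member, without the payload
ever being held in memory as a whole.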

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
