Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
Great! Thanks for all your answers!

I actually have the files created as .npy (appending a new array each time). I know it's weird, and it's not the format's intended use, but for whatever reason I ended up doing it this way, and there is no turning back now. Fortunately, I am able to read the files correctly, so weird as it is, at least it works. Repeating the tests would be very time-consuming, so I'll just try the different options mentioned for the upcoming tests.

Anyway, I think this is quite a common situation. Tests that run for a long time, produce results at very different times (not necessarily huge amounts of data; it could be just a single float or array), and are repeated many times make it absolutely necessary to have numpy-ish functions or a file type to APPEND freshly produced data each time it becomes available. Having to load a .npz file, add the new data, and save it again wastes resources unnecessarily. Having a single file for each run of the test, though possible, complicates the post-processing for me and increases the time needed to copy the files (many small files tend to take longer to copy than one bigger file).

Why not just a modified .npy file type/function with a header indicating that it hosts more than one array?

Cheers!

On Tue, Jun 29, 2010 at 12:43 AM, Friedrich Romstedt friedrichromst...@gmail.com wrote:
> 2010/6/28 Keith Goodman kwgood...@gmail.com:
>> How about using h5py? It's not part of numpy but it gives you a
>> dictionary-like interface to your archive:
>
> Yeaa, or PyTables (is that equivalent)? It's also a hdf (or whatever, I
> don't recall precisely) interface. There were [ANN]s on the list about
> PyTables.
>
> Friedrich

--
Rubén Salvador
PhD student @ Centro de Electrónica Industrial (CEI)
http://www.cei.upm.es
Blog: http://aesatcei.wordpress.com
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
Ruben Salvador wrote:
> [clip]
> Why not just a modified .npy filetype/function with a header indicating
> it's hosting more than one array?

Well, at our lab we are collecting images and saving them into HDF5 files. Since the files are self-describing, this is quite convenient. You can decide whether you want the images as individual arrays or stacked into a bigger one, because you can see which it is when you open the file. You can keep adding items at any time because HDF5 does not force you to specify the final size of the array, and you can access it like any numpy array without loading the whole array into memory or being limited by the memory of a 32-bit machine. I am currently working on a 100 GB array on a 32-bit machine without problems.

Really, I would give HDF5 a try. In our case we are using h5py, but the latest release candidate of PyTables seems to have the same numpy-like functionality.

Armando
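To make the "keep adding items at any time" point concrete, here is a minimal sketch of appending one result row per run to a resizable HDF5 dataset with h5py. The helper name append_row, the dataset name "results", and the temporary file path are made up for illustration; it assumes every result is a 1-D array of the same length.

```python
import os
import tempfile

import h5py
import numpy as np


def append_row(path, row, name="results"):
    """Append one 1-D result to a resizable dataset (hypothetical helper)."""
    row = np.asarray(row, dtype="f8")
    with h5py.File(path, "a") as f:  # "a": read/write, create if missing
        if name not in f:
            # maxshape=(None, n) leaves the first axis unbounded,
            # so the dataset can grow one row at a time.
            f.create_dataset(name, shape=(0, row.size),
                             maxshape=(None, row.size), dtype="f8")
        dset = f[name]
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)  # grow by one row
        dset[n] = row               # write only the new row


# Usage: three separate "runs", each adding one row to the same file.
path = os.path.join(tempfile.mkdtemp(), "runs.hdf5")
for i in range(3):
    append_row(path, [i, i + 0.5])

with h5py.File(path, "r") as f:
    data = f["results"][:]
print(data.shape)
```

Only the new row is written on each call; the existing rows are not loaded or rewritten, which is the property asked for earlier in the thread.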
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
Sorry, I had no access during the last few days.

Thanks for the answer Friedrich. I had already checked numpy.savez, but unfortunately I cannot make use of it. I don't have all the data that needs to be saved at the same time... it is produced each time I run a test.

Thanks anyway! Any other idea why this is happening? Is it expected behavior?

On Thu, Jun 24, 2010 at 7:30 PM, Friedrich Romstedt friedrichromst...@gmail.com wrote:
> 2010/6/23 Ruben Salvador rsalvador...@gmail.com:
>> Therefore, is this a bug? Shouldn't EOFError be raised instead of
>> IOError? Or am I misunderstanding something? If this is not a bug, how
>> can I detect the EOF to stop reading (I expect a way for this to work
>> without tweaking the code with saving first in the file the number of
>> dumps done)?
>
> Maybe you can make use of numpy.savez,
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy-savez .
>
> Friedrich

--
Rubén Salvador
PhD student @ Centro de Electrónica Industrial (CEI)
http://www.cei.upm.es
Blog: http://aesatcei.wordpress.com
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
On Wed, 2010-06-23 at 12:46 +0200, Ruben Salvador wrote:
> [clip]
> how can I detect the EOF to stop reading

    r = f.read(1)
    if not r:
        break  # EOF
    else:
        f.seek(-1, 1)

--
Pauli Virtanen
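For reference, the peek-and-rewind check above can be dropped into a complete reading loop roughly as follows. This is only a sketch: the helper name load_all and the temporary file are invented for the example, and it assumes the file contains nothing but arrays written back to back with numpy.save.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Return every array stored back to back in `path` (hypothetical helper)."""
    arrays = []
    with open(path, "rb") as f:
        while True:
            # Peek one byte: an empty read means a clean end of file.
            if not f.read(1):
                break
            f.seek(-1, os.SEEK_CUR)  # step back so np.load sees the full header
            arrays.append(np.load(f))
    return arrays


# Usage: append three arrays the way the original post does, then read them back.
path = os.path.join(tempfile.mkdtemp(), "runs.npy")
for i in range(3):
    with open(path, "ab") as f:
        np.save(f, np.arange(3) + 10 * i)

stacked = np.vstack(load_all(path))
print(stacked.shape)
```

Because the end of the file is detected before numpy.load is called, no exception has to be caught at all, so genuine I/O errors are not masked.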
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
2010/6/28 Ruben Salvador rsalvador...@gmail.com:
> Thanks for the answer Friedrich, I had already checked numpy.savez, but
> unfortunately I cannot make use of it. I don't have all the data needed
> to be saved at the same time...it is produced each time I run a test.

Yes, I thought of something like:

    all_data = numpy.load('file.npz')
    all_data[new_key] = new_data
    numpy.savez('file.npz', **all_data)

Will this work?

Friedrich
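A note on the load/modify/save idea above: as far as I know, the object numpy.load returns for a .npz file is a read-only archive view rather than a dict, so assigning a new key to it directly is unlikely to work. A hedged variant copies the arrays into a plain dict first. The helper name add_to_npz and the keys "run0"/"run1" are made up for this sketch.

```python
import os
import tempfile

import numpy as np


def add_to_npz(path, key, value):
    """Rewrite the archive at `path` with one extra array (hypothetical helper)."""
    data = {}
    if os.path.exists(path):
        with np.load(path) as archive:
            # Copy into a plain dict; the loaded archive itself is read-only.
            data = {name: archive[name] for name in archive.files}
    data[key] = value
    # The whole archive is written again, old arrays included.
    np.savez(path, **data)


# Usage: two "runs", each adding one array under its own key.
path = os.path.join(tempfile.mkdtemp(), "file.npz")
add_to_npz(path, "run0", np.array([1, 2, 3]))
add_to_npz(path, "run1", np.array([4.5]))

with np.load(path) as archive:
    names = sorted(archive.files)
print(names)
```

This works, but every call reads and rewrites all previously stored arrays, which is exactly the cost the original poster wants to avoid.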
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
On 28 June 2010 10:52, Ruben Salvador rsalvador...@gmail.com wrote:
> Sorry I had no access during these days.
>
> Thanks for the answer Friedrich, I had already checked numpy.savez, but
> unfortunately I cannot make use of it. I don't have all the data needed
> to be saved at the same time...it is produced each time I run a test.

I think people are uncomfortable because .npy files are not designed to contain more than one array. It's a bit like concatenating a whole lot of .PNG files together - while a good decoder could pick them apart again, it's a highly misleading file since the headers do not contain any information about all the other files. npy files are similarly self-describing, and so concatenating them is a peculiar sort of thing to do.

Why not simply save a separate file each time, so that you have a directory full of files? Or, if you must have just one file, use np.savez (loading the old one each time, then saving the expanded object). Come to think of it, it's possible to append files to an existing zip file without rewriting the whole thing. Does numpy.savez allow this mode?

That said, good exception hygiene argues that np.load should throw EOFErrors rather than the more generic IOErrors, but I don't know how difficult this would be to achieve.

Anne
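On the zip-append idea: I am not aware of an append mode in numpy.savez itself, but since a .npz file is a zip archive whose members are ordinary .npy payloads, the standard zipfile module can add a member without rewriting the rest. The following is only a sketch under that assumption; the helper name append_to_npz and the keys are invented, and it assumes each key is used only once.

```python
import io
import os
import tempfile
import zipfile

import numpy as np


def append_to_npz(path, key, value):
    """Add one array to a .npz archive without rewriting existing members.

    Hypothetical helper; assumes `key` is not already present in the archive.
    """
    buf = io.BytesIO()
    np.save(buf, value)  # serialize as a normal .npy payload
    # Mode "a" creates the archive if needed and appends otherwise.
    with zipfile.ZipFile(path, mode="a") as zf:
        zf.writestr(key + ".npy", buf.getvalue())


# Usage: each call stands for one test run adding its result.
path = os.path.join(tempfile.mkdtemp(), "runs.npz")
append_to_npz(path, "run0", np.arange(3))
append_to_npz(path, "run1", np.array([0.5]))

with np.load(path) as archive:
    loaded = {name: archive[name] for name in archive.files}
print(sorted(loaded))
```

The result stays readable with a plain numpy.load call, and each array keeps its own self-describing header, which addresses the concern about concatenated .npy files.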
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
On Mon, 2010-06-28 at 15:48 -0400, Anne Archibald wrote:
> [clip]
> That said, good exception hygiene argues that np.load should throw
> EOFErrors rather than the more generic IOErrors, but I don't know how
> difficult this would be to achieve.

np.load is unhygienic in any case, since it tries to unpickle the data if it doesn't see the .npy magic header.

--
Pauli Virtanen
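One way to sidestep the unpickling fallback described above is to check for the .npy magic string before handing the file to np.load. The sketch below assumes a seekable binary file object; the helper name load_next is made up, and the magic bytes are the ones defined by the .npy format.

```python
import io

import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # magic string that starts every .npy payload


def load_next(f):
    """Load the next .npy record from f, or return None at a clean EOF.

    Hypothetical helper: raises ValueError instead of letting np.load
    fall back to unpickling when the next bytes are not a .npy header.
    """
    head = f.read(len(NPY_MAGIC))
    if not head:
        return None  # nothing left: clean end of file
    f.seek(-len(head), 1)  # rewind exactly what was peeked
    if head != NPY_MAGIC:
        raise ValueError("next record is not a .npy array")
    return np.load(f)


# Usage: two arrays saved back to back into an in-memory file.
buf = io.BytesIO()
np.save(buf, np.array([1.0, 2.0]))
np.save(buf, np.array([3.0]))
buf.seek(0)

records = []
while True:
    arr = load_next(buf)
    if arr is None:
        break
    records.append(arr)
print(len(records))
```

With this split, "end of file" and "corrupt or foreign data" are reported differently, so the caller does not need a broad except clause.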
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
On Wed, Jun 23, 2010 at 3:46 AM, Ruben Salvador rsalvador...@gmail.com wrote:
> Hi there,
>
> I have a .npy file built by successively adding results from different
> test runs of an algorithm. Each time it's run, I save a numpy.array
> using numpy.save as follows:
>
>     fn = 'file.npy'
>     f = open(fn, 'a+b')
>     np.save(f, arr)
>     f.close()

How about using h5py? It's not part of numpy but it gives you a dictionary-like interface to your archive:

    >>> import h5py
    >>> io = h5py.File('/tmp/data.hdf5')
    >>> arr1 = np.array([1, 2, 3])
    >>> arr2 = np.array([4, 5, 6])
    >>> arr3 = np.array([7, 8, 9])
    >>> io['arr1'] = arr1
    >>> io['arr2'] = arr2
    >>> io['arr3'] = arr3
    >>> io.keys()
    ['arr1', 'arr2', 'arr3']
    >>> io['arr1'][:]
    array([1, 2, 3])

You can also load part of an array (useful when the array is large):

    >>> io['arr1'][-1]
    3
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
2010/6/28 Keith Goodman kwgood...@gmail.com:
> How about using h5py? It's not part of numpy but it gives you a
> dictionary-like interface to your archive:

Yeah, or PyTables (is that equivalent)? It's also an HDF (or whatever, I don't recall precisely) interface. There were [ANN]s on the list about PyTables.

Friedrich
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
2010/6/23 Ruben Salvador rsalvador...@gmail.com:
> Therefore, is this a bug? Shouldn't EOFError be raised instead of
> IOError? Or am I misunderstanding something? If this is not a bug, how
> can I detect the EOF to stop reading (I expect a way for this to work
> without tweaking the code with saving first in the file the number of
> dumps done)?

Maybe you can make use of numpy.savez, http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy-savez .

Friedrich
[Numpy-discussion] numpy.load raising IOError but EOFError expected
Hi there,

I have a .npy file built by successively adding results from different test runs of an algorithm. Each time it's run, I save a numpy.array using numpy.save as follows:

    fn = 'file.npy'
    f = open(fn, 'a+b')
    np.save(f, arr)
    f.close()

When I try to read the file with the following code, for a file containing 3 array saves (a is there to show when the error arises):

    f = open(fn, 'rb')
    arr = np.load(f)
    a = 0
    try:
        while True:
            print a
            a += 1
            arr = np.vstack((arr, np.load(f)))
    except EOFError:
        pass
    f.close()

I get the following output:

    0
    1
    2
    Traceback (most recent call last):
      File "./proc_stat.py", line 32, in <module>
        arr = np.vstack((arr, np.load(f)))
      File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 201, in load
        "Failed to interpret file %s as a pickle" % repr(file)
    IOError: Failed to interpret file <open file '/home/rsalvador/trabajo/research/phd/devel/evowv/retest/avgfit_random_sigma_normal_Flp_20G.npy', mode 'rb' at 0x174ca08> as a pickle

Using IOError in the except clause makes the code work, but this way I am masking other possible sources of error. I have tried with a file containing 3 dumps from numpy.save (that is, 3 saved arrays). As shown, the error is raised when trying to read a fourth time (since EOFError is not raised).

Therefore, is this a bug? Shouldn't EOFError be raised instead of IOError? Or am I misunderstanding something? If this is not a bug, how can I detect the EOF to stop reading? (I would expect a way to do this without changing the code to first save the number of dumps in the file.)

Thanks in advance!

--
Rubén Salvador
PhD student @ Centro de Electrónica Industrial (CEI)
http://www.cei.upm.es
Blog: http://aesatcei.wordpress.com
Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected
Sorry, I forgot to include version info:

    Python 2.5.5
    Numpy version: 1:1.3.0-3+b1 (current Debian testing)

On Wed, Jun 23, 2010 at 12:46 PM, Ruben Salvador rsalvador...@gmail.com wrote:
> [clip]

--
Rubén Salvador
PhD student @ Centro de Electrónica Industrial (CEI)
http://www.cei.upm.es
Blog: http://aesatcei.wordpress.com