Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-07-01 Thread Ruben Salvador
Great! Thanks for all your answers!

I actually have the files created as .npy (appending a new array each time).
I know it's weird, and it's not the format's intended use. But, for whatever
reason, I ended up doing it that way. No turning back now.

Fortunately, I am able to read the files correctly, so weird as it is, at
least it works. Repeating the tests would be very time consuming. I'll just
try the different options mentioned for the upcoming tests.

Anyway, I think this is quite a common situation. Tests run for a long
time and produce results at very different times (not necessarily huge
amounts of data; it could be just a single float or array), and they are
repeated many times. That makes it absolutely necessary to have numpy-ish
functions or a file type to APPEND freshly produced data each time it
becomes available. Having to load a .npz file, add the new data and save
again wastes resources unnecessarily. Having a single file for each run of
the test, though possible, complicates post-processing for me and increases
the time needed to copy the files (many small files tend to take longer to
copy than one bigger file). Why not just a modified .npy file type/function
with a header indicating that it hosts more than one array?

Cheers!

On Tue, Jun 29, 2010 at 12:43 AM, Friedrich Romstedt 
friedrichromst...@gmail.com wrote:

 2010/6/28 Keith Goodman kwgood...@gmail.com:
  How about using h5py? It's not part of numpy but it gives you a
  dictionary-like interface to your archive:

 Yeah, or PyTables (is that equivalent)?  It's also an HDF (or whatever,
 I don't recall precisely) interface.

 There were [ANN]s on the list about PyTables.

 Friedrich
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Rubén Salvador
PhD student @ Centro de Electrónica Industrial (CEI)
http://www.cei.upm.es
Blog: http://aesatcei.wordpress.com


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-07-01 Thread V. Armando Solé
Ruben Salvador wrote:
 Great! Thanks for all your answers!

 I actually have the files created as .npy (appending a new array each
 time). I know it's weird, and it's not its intended use. But, for
 whatever reason, I came to use that. No turning back now.

 Fortunately, I am able to read the files correctly, so being weird 
 also, at least, it works. Repeating the tests would be very time 
 consuming. I'll just try the different options mentioned for the 
 following tests. 

 Anyway, I think this is a quite common situation. Tests running for a 
 long time, producing results at very different times (not
 necessarily huge amounts of data of results, it could be just a single 
 float, or array), and repeating these tests a lot of times, makes it 
 absolutely necessary to have numpyish functions/filetype to APPEND 
 these freshly-new produced data each time it is available. Having to 
 load a .npz file, adding the new data and saving again is wasting 
 unnecessary resources. Having a single file for each run of the test,
 though possible, for me, complicates the post-processing section, 
 while increasing the time to copy these files (many small files tend 
 to take longer to copy than one single bigger file). Why not just a 
 modified .npy filetype/function with a header indicating it's hosting 
 more than one array?


Well, at our lab we are collecting images and saving them into HDF5
files. Since the files are self-describing, this is quite convenient. You
can decide whether you want the images as individual arrays or stacked into
a bigger one, because you know the layout when you open the file. You can
keep adding items at any time because HDF5 does not force you to specify
the final size of the array, and you can access it like any numpy array
without loading the whole array into memory or being limited by the memory
of a 32-bit machine. I am currently working on a 100 GB array on a 32-bit
machine without problems.

Really, I would give HDF5 a try. In our case we are using h5py, but the
latest release candidate of PyTables seems to have the same numpy-like
functionality.
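
A minimal sketch of the "keep adding items" pattern with h5py, assuming a resizable dataset is acceptable. The file name, the dataset name 'images' and the 4x4 image size are made up for illustration and are not taken from the lab's setup:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'images.h5')

with h5py.File(path, 'a') as f:
    # maxshape=(None, 4, 4) leaves the first axis unlimited so the dataset
    # can grow; resizable datasets must be chunked (chunks=True picks a size).
    dset = f.create_dataset('images', shape=(0, 4, 4), maxshape=(None, 4, 4),
                            dtype='f8', chunks=True)
    for i in range(3):
        dset.resize(dset.shape[0] + 1, axis=0)  # make room for one more image
        dset[-1] = np.full((4, 4), float(i))    # write only the new slot

with h5py.File(path, 'r') as f:
    shape = f['images'].shape
    last = f['images'][-1]  # reads just this slice, not the whole dataset
    print(shape)
```

Growing along one unlimited axis is only one possible layout; storing each image as its own named dataset would work as well.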

Armando



Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Ruben Salvador
Sorry, I had no access over the last few days.

Thanks for the answer, Friedrich. I had already checked numpy.savez, but
unfortunately I cannot make use of it. I don't have all the data that needs
to be saved at the same time; it is produced each time I run a test.

Thanks anyway!

Any other ideas why this is happening? Is it expected behavior?

On Thu, Jun 24, 2010 at 7:30 PM, Friedrich Romstedt 
friedrichromst...@gmail.com wrote:

 2010/6/23 Ruben Salvador rsalvador...@gmail.com:
  Therefore, is this a bug? Shouldn't EOFError be raised instead of
  IOError? Or am I misunderstanding something? If this is not a bug, how
  can I detect the EOF to stop reading (I expect a way for this to work
  without tweaking the code with saving first in the file the number of
  dumps done)?

 Maybe you can make use of numpy.savez:

 http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy-savez

 Friedrich


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Pauli Virtanen
On Wed, 2010-06-23 at 12:46 +0200, Ruben Salvador wrote:
[clip]
 how can I detect the EOF to stop reading

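# inside the read loop, before each np.load(f):
r = f.read(1)
if not r:
    break  # EOF
else:
    f.seek(-1, 1)  # step back one byte so the next load sees the full header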
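
For context, here is a sketch of how that one-byte peek could sit inside a complete read loop. The helper name load_all and the temporary demo file are hypothetical and only serve the illustration:

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Return every array stored back to back in the file at `path`."""
    arrays = []
    with open(path, 'rb') as f:
        while True:
            # Peek one byte: an empty result means end of file.
            if not f.read(1):
                break
            f.seek(-1, 1)  # step back so np.load sees the whole header
            arrays.append(np.load(f))
    return arrays


# Demo: append three arrays to one file, the way the original post does.
path = os.path.join(tempfile.mkdtemp(), 'file.npy')
for i in range(3):
    with open(path, 'ab') as f:
        np.save(f, np.full((1, 4), i))

stacked = np.vstack(load_all(path))
print(stacked.shape)
```

Because the loop stops on the explicit EOF check, no exception from np.load needs to be caught, so genuine read errors are not masked.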

-- 
Pauli Virtanen



Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Friedrich Romstedt
2010/6/28 Ruben Salvador rsalvador...@gmail.com:
 Thanks for the answer Friedrich, I had already checked numpy.savez, but
 unfortunately I cannot make use of it. I don't have all the data needed to
 be saved at the same time...it is produced each time I run a test.

Yes, I thought of something like:

all_data = numpy.load('file.npz')
all_data[new_key] = new_data
numpy.savez('file.npz', **all_data)

Will this work?
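
One caveat with the snippet above: the object returned by numpy.load for an .npz archive does not support item assignment, so the arrays have to be copied into a plain dict first. A sketch of that variant follows; the helper name add_to_npz and the keys 'run0'/'run1' are made up for illustration:

```python
import os
import tempfile

import numpy as np


def add_to_npz(path, key, data):
    """Rewrite the archive at `path` with one extra array stored under `key`."""
    all_data = {}
    if os.path.exists(path):
        # Copy into a plain dict; the loaded archive object is read-only.
        with np.load(path) as archive:
            all_data = {name: archive[name] for name in archive.files}
    all_data[key] = data
    np.savez(path, **all_data)


path = os.path.join(tempfile.mkdtemp(), 'file.npz')
add_to_npz(path, 'run0', np.arange(3))
add_to_npz(path, 'run1', np.arange(3) * 10)

with np.load(path) as archive:
    names = sorted(archive.files)
    run1 = archive['run1']
    print(names)
```

This still reads and rewrites the whole archive on every call, so the cost grows with the amount of data already stored.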

Friedrich


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Anne Archibald
On 28 June 2010 10:52, Ruben Salvador rsalvador...@gmail.com wrote:
 Sorry I had no access during these days.

 Thanks for the answer Friedrich, I had already checked numpy.savez, but
 unfortunately I cannot make use of it. I don't have all the data needed to
 be saved at the same time...it is produced each time I run a test.

I think people are uncomfortable because .npy files are not designed
to contain more than one array. It's a bit like concatenating a whole
lot of .PNG files together - while a good decoder could pick them
apart again, it's a highly misleading file since the headers do not
contain any information about all the other files. npy files are
similarly self-describing, and so concatenating them is a peculiar
sort of thing to do. Why not simply save a separate file each time, so
that you have a directory full of files? Or, if you must have just one
file, use np.savez (loading the old one each time then saving the
expanded object).

Come to think of it, it's possible to append files to an existing
zip file without rewriting the whole thing. Does numpy.savez allow
this mode?
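
As far as I know, numpy.savez always writes a fresh archive, but the standard zipfile module can add members to an existing one. The sketch below assumes that an .npz file is simply a zip archive whose members are named key + '.npy'; the helper name append_to_npz and the keys are hypothetical:

```python
import io
import os
import tempfile
import zipfile

import numpy as np


def append_to_npz(path, key, data):
    """Add one array to a zip archive without rewriting existing members."""
    buf = io.BytesIO()
    np.save(buf, data)  # serialize the array in .npy format in memory
    # Mode 'a' appends a member (and creates the archive if it is missing).
    with zipfile.ZipFile(path, mode='a') as zf:
        zf.writestr(key + '.npy', buf.getvalue())


path = os.path.join(tempfile.mkdtemp(), 'results.npz')
append_to_npz(path, 'run0', np.array([1.0, 2.0]))
append_to_npz(path, 'run1', np.array([3.0]))

with np.load(path) as archive:
    names = sorted(archive.files)
    run0 = archive['run0']
    print(names)
```

Reusing an existing key would add a second member with the same name rather than replace the first, so each append needs a distinct key.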

That said, good exception hygiene argues that np.load should throw
EOFErrors rather than the more generic IOErrors, but I don't know how
difficult this would be to achieve.


Anne



Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Pauli Virtanen
On Mon, 2010-06-28 at 15:48 -0400, Anne Archibald wrote:
[clip]
 That said, good exception hygiene argues that np.load should throw
 EOFErrors rather than the more generic IOErrors, but I don't know how
 difficult this would be to achieve.

np.load is in any case unhygienic, since it tries to unpickle the data if
it doesn't see the .npy magic header.
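
One way to sidestep the pickle fallback is to check for the magic header before calling np.load. The sketch below assumes the six-byte prefix b'\x93NUMPY' that starts every .npy file; the helper name next_is_npy is made up for illustration. Newer NumPy releases also accept allow_pickle=False in np.load, which may be the simpler option where available.

```python
import io

import numpy as np

NPY_MAGIC = b'\x93NUMPY'  # first six bytes of every .npy file


def next_is_npy(f):
    """Report whether the stream `f` is positioned at an .npy header."""
    start = f.tell()
    prefix = f.read(len(NPY_MAGIC))
    f.seek(start)  # restore the position so a later np.load starts cleanly
    return prefix == NPY_MAGIC


buf = io.BytesIO()
np.save(buf, np.arange(3))
buf.seek(0)

print(next_is_npy(buf))                   # stream starts with a saved array
print(next_is_npy(io.BytesIO(b'junk')))   # arbitrary bytes, no header
```

The check leaves the stream position untouched, so it can be called right before each load in a loop.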

-- 
Pauli Virtanen



Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Keith Goodman
On Wed, Jun 23, 2010 at 3:46 AM, Ruben Salvador rsalvador...@gmail.com wrote:
 Hi there,

 I have a .npy file built by successively adding results from different test
 runs of an algorithm. Each time it's run, I save a numpy.array using
 numpy.save as follows:

 fn = 'file.npy'
 f = open(fn, 'a+b')
 np.save(f, arr)
 f.close()

How about using h5py? It's not part of numpy but it gives you a
dictionary-like interface to your archive:

 >>> import numpy as np
 >>> import h5py
 >>> io = h5py.File('/tmp/data.hdf5', 'a')  # open read/write, create if missing
 >>> arr1 = np.array([1, 2, 3])
 >>> arr2 = np.array([4, 5, 6])
 >>> arr3 = np.array([7, 8, 9])
 >>> io['arr1'] = arr1
 >>> io['arr2'] = arr2
 >>> io['arr3'] = arr3
 >>> list(io.keys())
 ['arr1', 'arr2', 'arr3']
 >>> io['arr1'][:]
 array([1, 2, 3])

You can also load part of an array (useful when the array is large):

 >>> io['arr1'][-1]
 3


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Friedrich Romstedt
2010/6/28 Keith Goodman kwgood...@gmail.com:
 How about using h5py? It's not part of numpy but it gives you a
 dictionary-like interface to your archive:

Yeah, or PyTables (is that equivalent)?  It's also an HDF (or whatever,
I don't recall precisely) interface.

There were [ANN]s on the list about PyTables.

Friedrich


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-24 Thread Friedrich Romstedt
2010/6/23 Ruben Salvador rsalvador...@gmail.com:
 Therefore, is this a bug? Shouldn't EOFError be raised instead of IOError?
 Or am I misunderstanding something? If this is not a bug, how can I detect
 the EOF to stop reading (I expect a way for this to work without tweaking
 the code with saving first in the file the number of dumps done)?

Maybe you can make use of numpy.savez:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy-savez

Friedrich


[Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-23 Thread Ruben Salvador
Hi there,

I have a .npy file built by successively adding results from different test
runs of an algorithm. Each time it's run, I save a numpy.array using
numpy.save as follows:

fn = 'file.npy'
f = open(fn, 'a+b')
np.save(f, arr)
f.close()

When I try to read a file containing 3 saved arrays with the following code
(the counter a is there to show when the error arises):

f = open(fn, 'rb')
arr = np.load(f)
a = 0
try:
    while True:
        print a
        a += 1
        arr = np.vstack((arr, np.load(f)))
except EOFError:
    pass
f.close()

I get the following output:

0
1
2
Traceback (most recent call last):
  File "./proc_stat.py", line 32, in <module>
    arr = np.vstack((arr, np.load(f)))
  File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 201, in load
    "Failed to interpret file %s as a pickle" % repr(file)
IOError: Failed to interpret file <open file
'/home/rsalvador/trabajo/research/phd/devel/evowv/retest/avgfit_random_sigma_normal_Flp_20G.npy',
mode 'rb' at 0x174ca08> as a pickle

Using IOError in the except makes the code work, but this way I am masking
other possible sources of error.

I have tried with a file containing 3 dumps from numpy.save (that is, 3
saved arrays). As shown, the error is raised on the fourth read attempt,
where IOError is raised instead of the EOFError I expected.

Therefore, is this a bug? Shouldn't EOFError be raised instead of IOError?
Or am I misunderstanding something? If this is not a bug, how can I detect
the EOF to stop reading (I would expect a way to do this without having to
first store the number of dumps in the file)?

Thanks in advance!



Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-23 Thread Ruben Salvador
Sorry, I forgot to include version info:
Python 2.5.5
Numpy version: 1:1.3.0-3+b1 (current Debian testing)

