Hi Tom,

On 09/12/2011 14:12, Tom Diethe wrote:
> I have files stored using Matlab's sparse format (HDF5, csc I
> believe), and I'm trying to use Pytables to operate on them directly,
> but haven't succeeded yet. Using h5py I can do the following:
> 
> # Method 1: uses h5py (WORKS)
> f1 = h5py.File(fname)
> data = f1['M']['data']
> ir = f1['M']['ir']
> jc = f1['M']['jc']
> M = scipy.sparse.csc_matrix( (data,ir,jc) )
> 
> but if I try to do the equivalent in Pytables:
> 
> # Method 2: uses pyTables (DOESN'T WORK)
> f2 = tables.openFile(fname)
> data = f2.root.M.data
> ir = f2.root.M.ir
> jc = f2.root.M.jc
> M = scipy.sparse.csc_matrix( (data,ir,jc) )
> 

If my understanding is correct, you need to actually read the data from
disk into memory before calling csc_matrix:

> data = f2.root.M.data[...]
> ir = f2.root.M.ir[...]
> jc = f2.root.M.jc[...]

Please note that f2.root.M.data is a PyTables object, not a NumPy array:

In [23]: f2.root.M.data
Out[23]:
/M/data (Array(20,)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
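
The difference between the node and the array it holds can be sketched end to
end as follows. This is only a minimal sketch under stated assumptions: the
file "sparse_demo.h5" and its contents are hypothetical stand-ins written by
the script itself (not a real Matlab file), the shape is passed explicitly
because it is not stored in the three arrays, and the PyTables 3.x spelling
open_file is used where the original message uses the older openFile.

```python
import os
import tempfile

import numpy as np
import scipy.sparse
import tables

# Hypothetical stand-in for the Matlab file: a tiny matrix stored under
# /M/data, /M/ir and /M/jc, mirroring the layout used in this thread.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])
ref = scipy.sparse.csc_matrix(dense)

path = os.path.join(tempfile.mkdtemp(), "sparse_demo.h5")
with tables.open_file(path, mode="w") as f:
    group = f.create_group(f.root, "M")
    f.create_array(group, "data", ref.data)     # stored non-zero values
    f.create_array(group, "ir", ref.indices)    # row index of each value
    f.create_array(group, "jc", ref.indptr)     # column start offsets

with tables.open_file(path, mode="r") as f:
    # The node itself is a PyTables Array, not a NumPy array.
    node_is_ndarray = isinstance(f.root.M.data, np.ndarray)
    # Indexing with [...] reads the whole node from disk into NumPy arrays.
    data = f.root.M.data[...]
    ir = f.root.M.ir[...]
    jc = f.root.M.jc[...]

# Shape is given explicitly here (an assumption of this sketch).
M = scipy.sparse.csc_matrix((data, ir, jc), shape=dense.shape)
```

Once the three arrays are in memory as NumPy arrays, the csc_matrix constructor
can work out the dtype and the call goes through.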



> this fails (after a long wait) with the error:
> 
> TypeError                                 Traceback (most recent call last)
> 
> /home/tdiethe/BMJ/<ipython console> in <module>()
> 
> /usr/lib/python2.6/dist-packages/scipy/sparse/compressed.pyc in
> __init__(self, arg1, shape, dtype, copy, dims, nzmax)
>      56                     self.indices = np.array(indices, copy=copy)
>      57                     self.indptr  = np.array(indptr, copy=copy)
> ---> 58                     self.data    = np.array(data, copy=copy,
> dtype=getdtype(dtype, data))
>      59                 else:
>      60                     raise ValueError, "unrecognized %s_matrix
> constructor usage" %\
> 
> /usr/lib/python2.6/dist-packages/scipy/sparse/sputils.pyc in
> getdtype(dtype, a, default)
>      69                 canCast = False
>      70             else:
> ---> 71                 raise TypeError, "could not interpret data type"
>      72     else:
>      73         newdtype = np.dtype(dtype)
> 
> TypeError: could not interpret data type
> 
> 
> I have two questions:
> 
> - how do I load this file in
> - do I need to perform the conversion to a scipy sparse matrix in
> order to be able to perform operations on it, or can I perform
> operations directly on the disk files (matrix multiplication etc)?

I'm not sure I understand the second question, but I guess the answer is
yes, you do need the conversion, unless it is a trivial element-by-element
operation.
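
To illustrate why element-by-element operations are the easy case, here is a
rough pure-Python sketch of the CSC layout (data, ir, jc). The helper names
csc_scale and csc_matvec are hypothetical, made up for this example, and are
not part of PyTables or SciPy. Scaling touches only the stored values, while a
matrix-vector product has to walk the column pointers and row indices as well,
which is the bookkeeping scipy.sparse does for you.

```python
def csc_scale(data, factor):
    """Element-wise scaling: only the stored values change."""
    return [factor * v for v in data]


def csc_matvec(data, ir, jc, n_rows, x):
    """Compute y = M @ x for a CSC matrix given as (data, ir, jc)."""
    y = [0.0] * n_rows
    # Column `col` owns the stored entries jc[col] .. jc[col + 1] - 1.
    for col in range(len(jc) - 1):
        for k in range(jc[col], jc[col + 1]):
            # ir[k] is the row of the k-th stored value.
            y[ir[k]] += data[k] * x[col]
    return y


# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSC form.
data = [1.0, 3.0, 2.0]
ir = [0, 1, 0]
jc = [0, 1, 2, 3]

scaled = csc_scale(data, 2.0)
y = csc_matvec(data, ir, jc, 2, [1.0, 1.0, 1.0])
```

The scaled values could be written back to the data array on their own, with ir
and jc left untouched. Anything that needs the sparsity structure is simpler
after loading the arrays and building the scipy sparse matrix.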


Best regards

-- 
Antonio Valentino

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
