Oh, good point. HBase seems like a good fit for huge sparse matrices, since each cell stores only:

- The non-zero value
- The index for its row and column

However, it's not a good fit for dense matrices. IMO, we can't store a huge dense matrix in HBase: when I stored a 5000 * 5000 double matrix with row/column/time indexes in HBase, 15~16 GB was used on each node (replica = 3).

So, I made two implementations. We should survey the data structures, and there is also a difference in algorithms/benefits between dense and sparse:

- The blocking algorithm only works for dense matrices, and stores all values.
- A sparse matrix stores only the non-zero values (storage-efficient), but if sparsity is low, manipulations will have some overhead from irregular access over the network.

I've started working on the documentation -- http://wiki.apache.org/hama/Architecture -- please also review this.

On Wed, Mar 18, 2009 at 8:24 PM, Samuel Guo <[email protected]> wrote:
> Hi all,
>
> It seems that DenseVector and SparseVector both use *MapWritable* as the
> container of vector data. And the methods' implementations of DenseVector &
> SparseVector are similar, so why do we need two copies of the code?
>
> There are the same issues in DenseMatrix and SparseMatrix.
>
> Regards,
> Samuel

--
Best Regards, Edward J. Yoon
[email protected]
http://blog.udanax.org
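P.S. A rough back-of-envelope sketch of why per-cell storage inflates a dense matrix. This is my own estimate, not a measured Hama/HBase figure: the 100-byte per-cell overhead (row key + column qualifier + timestamp alongside the 8-byte value) is an assumption for illustration.

```java
// Back-of-envelope storage estimate for a 5000 x 5000 dense double matrix
// stored one cell per HBase entry. The per-cell overhead is an assumption.
public class DenseMatrixStorageEstimate {
    public static void main(String[] args) {
        long n = 5000;                    // matrix dimension from the thread
        long cells = n * n;               // 25,000,000 cells
        long rawBytes = cells * 8;        // raw doubles: 200,000,000 bytes (~200 MB)

        // Assumed bytes stored per cell once row key, column qualifier,
        // timestamp, and value are all kept per entry (a guess, ~100 bytes).
        long perCellStored = 100;
        long perReplica = cells * perCellStored;  // ~2.5 GB per replica
        long total = perReplica * 3;              // replica = 3 -> ~7.5 GB

        System.out.println("raw matrix bytes:        " + rawBytes);
        System.out.println("stored bytes (3 copies): " + total);
    }
}
```

Even with this conservative guess, the stored size is over 10x the raw 200 MB of doubles, which is consistent in direction with the 15~16 GB per node I observed.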
