Have you experimented with, for instance, row number as id, value as binary serialized vector?

On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan <gkhn...@gmail.com> wrote:

> 2 options:
>
> 1- row index as the row key, column index as column identifier, and
> value as value
> 2- row index and column index combined as the row key, and value in a
> column called "value"
>
> Row indices are kept in a member variable in memory, to make iteration
> fast.
>
> On Wed, May 8, 2013 at 12:11 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > How did you store the matrix in HBase?
> >
> > On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > For taking large matrices as input and persisting large models (like
> > > factor models), I created an HBase-backed version of Mahout matrix.
> > >
> > > It allows random access to cells and rows as well as assignment, and
> > > iteration over rows. viewRow returns a view, and lazy loads actual
> > > data if a get is actually invoked.
> > >
> > > I plan to add a VectorInputFormat on top of it, too.
> > >
> > > The code that we need to have for our algorithms is tested, but
> > > there are still parts of it that are not.
> > >
> > > I am going to speak about this at HBaseCon, and I wanted to let you
> > > know that it can be contributed after some refactoring.
> > >
> > > Is there any interest?
> > >
> > > --
> > > Gokhan
>
> --
> Gokhan
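
For readers comparing the layouts discussed in this thread, here is a rough sketch of what each one might put into a single HBase cell. It is plain Python with no HBase client, and it is not the code from the HBase-backed matrix described above. The function names, the "vector" qualifier, the 4-byte big-endian index encoding and the dense length-prefixed vector format are all assumptions made for illustration; a real implementation would more likely use Mahout's own vector serialization. Big-endian is chosen because HBase orders row keys byte by byte, so non-negative indices then sort numerically.

```python
import struct


def cell_per_column_put(row, col, value):
    """Option 1: row index is the row key, column index is the qualifier."""
    row_key = struct.pack(">i", row)      # 4-byte big-endian row index
    qualifier = struct.pack(">i", col)    # 4-byte big-endian column index
    return row_key, qualifier, struct.pack(">d", value)


def cell_per_row_put(row, col, value):
    """Option 2: row and column index combined into the row key,
    with the entry stored under a fixed qualifier called "value"."""
    row_key = struct.pack(">ii", row, col)
    return row_key, b"value", struct.pack(">d", value)


def serialized_row_put(row, vector):
    """Suggested alternative: row number as id, whole row as one binary value.

    The payload is a hypothetical dense encoding: an int32 length
    followed by that many float64 entries.
    """
    row_key = struct.pack(">i", row)
    payload = struct.pack(">i", len(vector))
    payload += struct.pack(">%dd" % len(vector), *vector)
    return row_key, b"vector", payload   # "vector" is an assumed qualifier name


def decode_serialized_row(payload):
    """Inverse of the payload encoding used in serialized_row_put."""
    (n,) = struct.unpack_from(">i", payload, 0)
    return list(struct.unpack_from(">%dd" % n, payload, 4))
```

With options 1 and 2 every matrix entry is its own cell, so reading a row means reading many cells (option 1) or scanning a key range (option 2). With the serialized-vector layout a row is fetched with one get, at the cost of rewriting the whole payload to change a single entry.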