Have you experimented with, for instance, using the row number as the row
key and storing the whole row as a binary-serialized vector in the value?
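
To make the suggestion concrete, here is a minimal sketch of that layout,
assuming a hand-rolled sparse encoding. The class and method names
(RowVectorCodec, rowKey, encode, decodeDense) are hypothetical, and this is
not Mahout's own vector serialization or the code discussed in this thread.
It only shows the shape of one key and one value per matrix row:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper: one key and one serialized value per matrix row.
final class RowVectorCodec {
    private RowVectorCodec() {}

    // Row key: the row number as 8 big-endian bytes, so that non-negative
    // row numbers sort in numeric order when compared byte by byte.
    static byte[] rowKey(long row) {
        return ByteBuffer.allocate(Long.BYTES).putLong(row).array();
    }

    // Value: entry count, then (column index, value) pairs for the
    // non-zero entries of the row.
    static byte[] encode(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeDouble(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a value back into a dense row with the given number of columns.
    static double[] decodeDense(byte[] data, int cardinality) {
        double[] row = new double[cardinality];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                row[index] = in.readDouble();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return row;
    }
}
```

With this layout a whole row comes back from a single get, at the cost of
rewriting the full value whenever one cell of the row is assigned.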

On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan <gkhn...@gmail.com> wrote:

> 2 options:
>
> 1- row index as the row key, column index as column identifier, and value
> as value
> 2- row index and column index combined as the row key, and value in a
> column called "value"
>
> Row indices are kept in a member variable in memory, to make iteration
> fast.
>
>
>
> On Wed, May 8, 2013 at 12:11 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > How did you store the matrix in HBase?
> >
> >
> > On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > For taking large matrices as input and persisting large models (like
> > > factor models), I created an HBase-backed version of Mahout matrix.
> > >
> > > It allows random access to cells and rows as well as assignment, and
> > > iteration over rows. viewRow returns a view, and lazily loads the actual
> > > data only when a get is invoked.
> > >
> > > I plan to add a VectorInputFormat on top of it, too.
> > >
> > > The code that we need to have for our algorithms is tested, but there
> > > are still parts of it that are not.
> > >
> > > I am going to speak about this at HBaseCon, and I wanted to let you
> > > know that it can be contributed after some refactoring.
> > >
> > > Is there any interest?
> > >
> > > --
> > > Gokhan
> > >
> >
>
>
>
> --
> Gokhan
>
