Re: HBase backed matrices

Gokhan Capan Tue, 07 May 2013 15:29:39 -0700

So if rows are small, a blob is probably better; and if they get larger, I can
make blocks of blobs. I will experiment with this.
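
For illustration, a minimal sketch of the two layouts mentioned above: one
blob per row, and a row split into fixed-width blocks of blobs. It is plain
Java with no HBase client calls, and it is not code from this thread. The
class name RowBlobSketch, the BLOCK_SIZE constant, the length-prefixed
encoding of doubles, and the (row index, block index) key layout are all
assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowBlobSketch {
    // Hypothetical block width: number of columns stored in one blob
    // once a row is split into blocks.
    static final int BLOCK_SIZE = 4;

    // Serialize values as a 4-byte count followed by 8-byte doubles.
    static byte[] encode(double[] values) {
        ByteBuffer buf =
            ByteBuffer.allocate(Integer.BYTES + values.length * Double.BYTES);
        buf.putInt(values.length);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    // Inverse of encode: read the count, then that many doubles.
    static double[] decode(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        double[] values = new double[buf.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    // Assumed key layout: big-endian row index, then block index, so the
    // blocks of one row sort next to each other (non-negative indices only).
    static byte[] blockKey(int row, int block) {
        return ByteBuffer.allocate(2 * Integer.BYTES)
            .putInt(row).putInt(block).array();
    }

    // Split one row into blobs of at most BLOCK_SIZE columns each.
    static List<byte[]> encodeBlocks(double[] rowValues) {
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < rowValues.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, rowValues.length);
            blocks.add(encode(Arrays.copyOfRange(rowValues, start, end)));
        }
        return blocks;
    }

    // Reading one cell only needs to decode the block that holds it.
    static double getCell(List<byte[]> blocks, int column) {
        double[] block = decode(blocks.get(column / BLOCK_SIZE));
        return block[column % BLOCK_SIZE];
    }

    public static void main(String[] args) {
        double[] row = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        List<byte[]> blocks = encodeBlocks(row);
        System.out.println("blocks: " + blocks.size());
        System.out.println("cell 5: " + getCell(blocks, 5));
    }
}
```

With a single blob per row, setting one cell means rewriting the whole blob.
With blocks, only the block that holds the cell is rewritten, at the cost of
more keys per row.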



On Wed, May 8, 2013 at 1:06 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> It really depends on your access patterns.
>
> Blob storage of rows will be much faster for scans and will take much less
> space.
>
> Column storage of values may or may not make things faster, but it is
> conceptually nicer to not have to update so much.  In practice, I am not
> convinced that you will notice the difference except for really big rows.
>
> Remember that you don't have to commit to a single choice.  You could use a
> rolled up representation most of the time and then break the rollups into
> regions as they get bigger.
>
>
> On Tue, May 7, 2013 at 2:32 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
> > Nope,
> >
> > I simply thought that would make accessing and setting individual cells
> > more difficult.
> >
> > Should I? Do you think it would perform better? And I would like to hear
> > if you have other design choices in mind.
> >
> >
> > On Wed, May 8, 2013 at 12:22 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Have you experimented with, for instance, row number as id, value as
> > > binary serialized vector?
> > >
> > >
> > >
> > >
> > > > On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan <gkhn...@gmail.com>
> > > > wrote:
> > >
> > > > 2 options:
> > > >
> > > > > 1- row index as the row key, column index as column identifier, and
> > > > > value as value
> > > > > 2- row index and column index combined as the row key, and value in a
> > > > > column called "value"
> > > >
> > > > > Row indices are kept in a member variable in memory, to make
> > > > > iteration fast.
> > > >
> > > >
> > > >
> > > > On Wed, May 8, 2013 at 12:11 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > > wrote:
> > > >
> > > > > How did you store the matrix in HBase?
> > > > >
> > > > >
> > > > > > On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan <gkhn...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > For taking large matrices as input and persisting large models
> > > > > > > (like factor models), I created an HBase-backed version of Mahout
> > > > > > > matrix.
> > > > > >
> > > > > > > It allows random access to cells and rows as well as assignment,
> > > > > > > and iteration over rows. viewRow returns a view, and lazy loads
> > > > > > > actual data if a get is actually invoked.
> > > > > >
> > > > > > I plan to add a VectorInputFormat on top of it, too.
> > > > > >
> > > > > > > The code that we need to have for our algorithms is tested, but
> > > > > > > there are still parts of it that are not.
> > > > > >
> > > > > > > I am going to speak about this at HBaseCon, and I wanted to let
> > > > > > > you know that it can be contributed after some refactoring.
> > > > > >
> > > > > > Is there any interest?
> > > > > >
> > > > > > --
> > > > > > Gokhan
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Gokhan
> > > >
> > >
> >
> >
> >
> > --
> > Gokhan
> >
>



-- 
Gokhan
