Nope, I simply thought that would make accessing and setting individual cells more difficult.
Should I? Do you think it would perform better?

I would also like to hear about any other design choices you have in mind.

On Wed, May 8, 2013 at 12:22 AM, Ted Dunning <[email protected]> wrote:

> Have you experimented with, for instance, row number as id, value as binary
> serialized vector?
>
> On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan <[email protected]> wrote:
>
> > 2 options:
> >
> > 1- row index as the row key, column index as column identifier, and value
> > as value
> > 2- row index and column index combined as the row key, and value in a
> > column called "value"
> >
> > Row indices are kept in a member variable in memory, to make iteration
> > fast.
> >
> > On Wed, May 8, 2013 at 12:11 AM, Ted Dunning <[email protected]> wrote:
> >
> > > How did you store the matrix in HBase?
> > >
> > > On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > For taking large matrices as input and persisting large models (like
> > > > factor models), I created an HBase-backed version of Mahout matrix.
> > > >
> > > > It allows random access to cells and rows as well as assignment, and
> > > > iteration over rows. viewRow returns a view, and lazy loads actual
> > > > data if a get is actually invoked.
> > > >
> > > > I plan to add a VectorInputFormat on top of it, too.
> > > >
> > > > The code that we need to have for our algorithms is tested, but there
> > > > are still parts of it that are not.
> > > >
> > > > I am going to speak about this at HBaseCon, and I wanted to let you
> > > > know that it can be contributed after some refactoring.
> > > >
> > > > Is there any interest?
> > > >
> > > > --
> > > > Gokhan
> >
> > --
> > Gokhan

--
Gokhan
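
For reference, the three layouts discussed in this thread (column per cell, combined row/column key, and one serialized vector per row) could be sketched roughly as below. This is only an illustration using the Java standard library: the class and method names are hypothetical, the fixed-width big-endian encoding is an assumption, and none of it is taken from the actual HBase-backed matrix code or from the Mahout or HBase APIs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Mahout or HBase.
final class MatrixLayouts {
    private MatrixLayouts() {}

    // Option 1: the row index alone is the row key (4 big-endian bytes).
    // The column index would be encoded the same way and used as the column identifier.
    static byte[] indexKey(int index) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(index).array();
    }

    // Option 2: row index followed by column index form one 8-byte row key;
    // the cell value would then live in a single column called "value".
    static byte[] cellKey(int row, int column) {
        return ByteBuffer.allocate(2 * Integer.BYTES).putInt(row).putInt(column).array();
    }

    // Suggested alternative: store a whole matrix row as one binary value.
    // Assumed format: element count, then each element as an 8-byte double.
    static byte[] serializeRow(double[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + values.length * Double.BYTES);
        buffer.putInt(values.length);
        for (double v : values) {
            buffer.putDouble(v);
        }
        return buffer.array();
    }

    // Inverse of serializeRow.
    static double[] deserializeRow(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        double[] values = new double[buffer.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("option 1 key bytes: " + Arrays.toString(indexKey(3)));
        System.out.println("option 2 key bytes: " + Arrays.toString(cellKey(3, 7)));
        double[] row = {1.0, 0.0, 2.5};
        System.out.println("round trip: " + Arrays.toString(deserializeRow(serializeRow(row))));
    }
}
```

With the serialized-vector layout, updating one cell means reading the row, changing one element, and writing the whole row back, which is the extra cost for individual cell access mentioned at the top of this message. In exchange, fetching a full row is a single read. With big-endian keys, non-negative indices also sort in numeric order under byte-wise comparison.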
