So if rows are small, a blob is probably better, and if they get larger I can split them into blocks of blobs. I will experiment with this.
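
A minimal sketch of the blob and blocks-of-blobs idea, in plain Python with no HBase client involved. The function names (encode_row, decode_row, split_into_blocks), the (int32 column index, float64 value) entry layout, and the block size of 512 are all assumptions made for illustration; none of this is taken from the actual Mahout or HBase code discussed in this thread.

```python
import struct

# Assumed layout (hypothetical): each non-zero entry is stored as a
# big-endian (int32 column index, float64 value) pair, 12 bytes per entry.
ENTRY = struct.Struct(">id")


def encode_row(entries):
    """Serialize {column_index: value} into one blob, sorted by column index."""
    return b"".join(ENTRY.pack(col, val) for col, val in sorted(entries.items()))


def decode_row(blob):
    """Inverse of encode_row: rebuild the {column_index: value} mapping."""
    return {col: val for col, val in ENTRY.iter_unpack(blob)}


def split_into_blocks(entries, block_size):
    """Group a large row into several blobs keyed by block number.

    The block number is col // block_size. Each blob could live under its
    own column qualifier, so updating one cell rewrites only one block.
    """
    blocks = {}
    for col, val in entries.items():
        blocks.setdefault(col // block_size, {})[col] = val
    return {block: encode_row(cells) for block, cells in sorted(blocks.items())}


row = {0: 1.5, 7: -2.0, 1000: 3.25}
blob = encode_row(row)                         # whole row as a single blob
blocks = split_into_blocks(row, block_size=512)  # same row as blocks of blobs
print(len(blob), sorted(blocks))               # prints: 36 [0, 1]
```

Under these assumptions, reading or setting a single cell means decoding and re-encoding one block rather than the whole row, so the block size trades update cost against the number of stored values per row.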

On Wed, May 8, 2013 at 1:06 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> It really depends on your access patterns.
>
> Blob storage of rows will be much faster for scans and will take much less
> space.
>
> Column storage of values may or may not make things faster, but it is
> conceptually nicer to not have to update so much. In practice, I am not
> convinced that you will notice the difference except for really big rows.
>
> Remember that you don't have to commit to a single choice. You could use a
> rolled up representation most of the time and then break the rollups in to
> regions as they get bigger.
>
>
> On Tue, May 7, 2013 at 2:32 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
> > Nope,
> >
> > I simply thought that would make accessing and setting individual cells
> > more difficult.
> >
> > Should I? Do you think it would perform better? And I would want to hear
> > if you have more design choices in your mind.
> >
> >
> > On Wed, May 8, 2013 at 12:22 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Have you experimented with, for instance, row number as id, value as
> > > binary serialized vector?
> > >
> > >
> > > On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> > >
> > > > 2 options:
> > > >
> > > > 1- row index as the row key, column index as column identifier, and
> > > > value as value
> > > > 2- row index and column index combined as the row key, and value in a
> > > > column called "value"
> > > >
> > > > Row indices are kept in a member variable in memory, to make
> > > > iteration fast.
> > > >
> > > >
> > > > On Wed, May 8, 2013 at 12:11 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > > wrote:
> > > >
> > > > > How did you store the matrix in HBase?
> > > > >
> > > > >
> > > > > On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan <gkhn...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > For taking large matrices as input and persisting large models
> > > > > > (like factor models), I created an HBase-backed version of
> > > > > > Mahout matrix.
> > > > > >
> > > > > > It allows random access to cells and rows as well as assignment,
> > > > > > and iteration over rows. viewRow returns a view, and lazy loads
> > > > > > actual data if a get is actually invoked.
> > > > > >
> > > > > > I plan to add a VectorInputFormat on top of it, too.
> > > > > >
> > > > > > The code that we need to have for our algorithms is tested, but
> > > > > > there are still parts of it that are not.
> > > > > >
> > > > > > I am going to speak about this at HBaseCon, and I wanted to let
> > > > > > you know that it can be contributed after some refactoring.
> > > > > >
> > > > > > Is there any interest?
> > > > > >
> > > > > > --
> > > > > > Gokhan
> > > > >
> > > >
> > > > --
> > > > Gokhan
> > >
> >
> > --
> > Gokhan
>

--
Gokhan