Jeff,

Glad to hear you are looking at Mahout.
Practically speaking, it probably isn't feasible to have an hbase column per matrix column. That makes storage of matrix data in hbase somewhat less compelling, although clearly still very useful for some applications.

As Grant pointed out, Mahout is trying to stay pretty agnostic relative to data storage methods. Some people need to read matrices from Lucene indexes, others from files, still others from hbase. We need to support all of those options.

Your suggestion about making sure that Taste supports hbase is a good one.

On Mon, Nov 16, 2009 at 12:54 AM, Jeff Zhang <[email protected]> wrote:

> Then we can store them as one hbase row:
> A: {title:love=>1,
> content:I=>1,content:love=>1,content:this=>1,content:game=>1}
>
> Using hbase, it will be very easy for us to compute the similarity between
> documents.
> And another advantage of hbase compared to raw text data is that it's
> semi-structured. And I think it will be easy for programming if we use
> hbase rather than the raw data.

--
Ted Dunning, CTO
DeepDyve
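
For concreteness, here is a rough sketch of the similarity computation Jeff describes, treating each document as one sparse row keyed by "family:qualifier". It is plain Python with no hbase client involved; the function name `cosine_similarity` and the second document `doc_b` are made up for illustration, and cosine similarity is just one reasonable choice of measure, not something the thread settles on.

```python
import math


def cosine_similarity(row_a, row_b):
    """Cosine similarity between two sparse rows (dicts of column -> count)."""
    # Only columns present in both rows contribute to the dot product.
    shared = row_a.keys() & row_b.keys()
    dot = sum(row_a[col] * row_b[col] for col in shared)
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty row is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Row "A" from the quoted message, as a column -> count mapping.
doc_a = {
    "title:love": 1,
    "content:I": 1,
    "content:love": 1,
    "content:this": 1,
    "content:game": 1,
}

# Hypothetical second document, invented for this example.
doc_b = {
    "title:game": 1,
    "content:I": 1,
    "content:love": 1,
    "content:chess": 1,
}

similarity = cosine_similarity(doc_a, doc_b)
```

Since the rows are just sparse maps, the same function would work whether they were read from hbase, a Lucene index, or a file, which fits the storage-agnostic approach described above.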
