I think that, in the current state of Mahout, one can't make a sweeping statement that every algorithm supports a single universal Mahout format.
Many algorithms work with the DistributedRowMatrix format, which is a sequence file of <Writable, VectorWritable> pairs. This is probably the most widely supported format in Mahout for batch-style training and transformations. Many algorithms also expose a Java-level API. SGD-based regressions also support web services for online-style learning.

I am not sure what the status of integration with HBase is, but I think any support for it is currently quite scarce. I don't think there has been any effort to integrate with Cassandra. Since most algorithms work either with matrices or with streams of feature vectors, any effort to integrate with either HBase or Cassandra would probably require some standardization of how the content is interpreted, similar to what was done for DistributedRowMatrix sequence files. AFAIK there is no such effort.

*I think* the thinking is that one can integrate with any kind of outside sample media, but the effort to vectorize it is outside of Mahout's scope (perhaps it could be more of a 'contributed' scope). Usually it is very easy to vectorize data, and there are helpers available to do that, which are described in much detail in the book "Mahout in Action".

On Wed, May 4, 2011 at 7:14 PM, hustnn <[email protected]> wrote:
> Is there some examples shows how mahout integrate with mysql,hbase ,
> cassandra and hadoop, it means how to gain input and output data.
>
> Do I need to implement some inputformat and outputformat for the specific
> db?
>
> Thanks.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/integrate-with-mysql-hadoop-hbase-and-cassandra-tp2901764p2901764.html
> Sent from the Mahout Developer List mailing list archive at Nabble.com.
>
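
P.S. To make the "vectorize it yourself" step a bit more concrete, here is a minimal sketch in plain Java of turning one record (say, a row read from MySQL or HBase) into a sparse feature vector. Everything in it is an assumption for illustration: the class name RecordVectorizer, the vectorize method, and the simple hashing scheme are made up and are not Mahout API. In a real job you would probably use the helpers the book describes, copy the index/weight pairs into a Mahout vector, and write them out as <Writable, VectorWritable> pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; NOT part of Mahout. */
class RecordVectorizer {

    /**
     * Turns one record (column name -> raw string value) into a sparse
     * vector of the given cardinality, represented as index -> weight.
     */
    public static SortedMap<Integer, Double> vectorize(Map<String, String> record,
                                                       int cardinality) {
        SortedMap<Integer, Double> vector = new TreeMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            String name = field.getKey();
            String value = field.getValue();
            if (value == null) {
                continue; // missing values contribute nothing
            }
            Double numeric = tryParse(value);
            if (numeric != null) {
                // Numeric column: one slot per column name, weight = the value.
                vector.merge(slot(name, cardinality), numeric, Double::sum);
            } else {
                // Categorical column: one slot per (name, value) pair, weight = 1.
                vector.merge(slot(name + "=" + value, cardinality), 1.0, Double::sum);
            }
        }
        return vector;
    }

    /** Maps a feature key to an index in [0, cardinality) by hashing. */
    private static int slot(String key, int cardinality) {
        return Math.floorMod(key.hashCode(), cardinality);
    }

    /** Returns the parsed number, or null if the string is not numeric. */
    private static Double tryParse(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("country", "DE");
        row.put("age", "42");
        // Prints the sparse vector as {index=weight, ...}.
        System.out.println(vectorize(row, 1000));
    }
}
```

Hashing keeps the vector size fixed no matter how many distinct column values show up, at the cost of occasional collisions (colliding features simply have their weights added together here). Whether that trade-off is acceptable depends on the data and the algorithm.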
