Re: integrate with mysql,hadoop,hbase and cassandra

Dmitriy Lyubimov Wed, 04 May 2011 23:03:44 -0700

I think at the current state of Mahout one can't make a sweeping
statement that all algorithms support a single universal Mahout
format.

Many algorithms work with the distributed row matrix format, which is
a sequence file of <Writable, VectorWritable> pairs. This is probably
the most widely supported format in Mahout for batch-style training
and transformations.

Many algorithms also expose a Java-level API.

SGD-based regressions also support web services for online style learning.

I am not sure what the status of integration with HBase is, but I
think any support for it is currently quite scarce.

I don't think there was any effort to integrate with Cassandra.

Since most algorithms work either with matrices or with streams of
feature vectors, any effort to integrate with either HBase or
Cassandra would probably require some standardization of how the
stored content is interpreted, similar to what was done for
distributed row matrix sequence files. AFAIK there is no such effort.

*I think* the thinking is that one can integrate with any kind of
outside sample media but the effort to vectorize that is outside of
Mahout's scope (perhaps it can be more like 'contributed' scope).
Usually it is very easy to vectorize data and there are helpers
available to do that, which are described in much detail in the book
"Mahout in Action".
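
To give a rough idea of what "vectorizing" a record from an outside
store could look like, here is a minimal sketch in plain Java. It is
not Mahout's API: the class name RowVectorizer, the vectorize method
and the hashing scheme are all made up for illustration, and the row
is assumed to have already been read from the database into a Map. A
real integration would more likely use the helpers mentioned above
and wrap the result in a VectorWritable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): turns one record, e.g. a
 * row fetched from MySQL, HBase or Cassandra, into a fixed-width
 * feature vector by hashing feature names into slots.
 */
final class RowVectorizer {
    private final int cardinality;

    RowVectorizer(int cardinality) {
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive");
        }
        this.cardinality = cardinality;
    }

    /**
     * Numeric columns add their value at the slot hashed from the
     * column name. Any other non-null column adds 1.0 at the slot
     * hashed from "name=value". Null columns are skipped.
     */
    double[] vectorize(Map<String, Object> row) {
        double[] vector = new double[cardinality];
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value == null) {
                continue;
            }
            if (value instanceof Number) {
                vector[slot(column.getKey())] += ((Number) value).doubleValue();
            } else {
                vector[slot(column.getKey() + "=" + value)] += 1.0;
            }
        }
        return vector;
    }

    // Map a feature name to an index in [0, cardinality).
    private int slot(String feature) {
        return Math.floorMod(feature.hashCode(), cardinality);
    }

    public static void main(String[] args) {
        // Made-up sample row; in practice this would come from a DB query.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("age", 30);
        row.put("country", "DE");
        System.out.println(Arrays.toString(new RowVectorizer(16).vectorize(row)));
    }
}
```

Different features may hash to the same slot; that is the usual
trade-off of hashed encodings, and a larger cardinality makes it less
likely.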

On Wed, May 4, 2011 at 7:14 PM, hustnn <[email protected]> wrote:
> Are there any examples showing how Mahout integrates with MySQL, HBase,
> Cassandra and Hadoop, that is, how to read input and write output data?
>
> Do I need to implement an InputFormat and OutputFormat for the specific
> db?
>
> Thanks.
