On 9/8/2010 6:58 AM, Kevin Smith wrote:
Ken -

Riak contains its own implementation of a MapReduce API. This API is written in Erlang 
and C and shares no code with Hadoop. We have, on occasion, described our API as 
"Hadoop-like" to help people understand the differences between our 
implementation and others. For example:

* CouchDB's MapReduce incrementally builds a B-tree through the execution of 
views. Riak doesn't cache or store MapReduce results. Each job executes "from 
scratch" and returns its results to the caller. It is up to the caller to cache 
or store these results if needed.

* Hadoop MapReduce jobs use HDFS or HDFS-adapted resources. Riak MapReduce jobs 
use data stored in a Riak cluster.

* Riak MapReduce jobs take advantage of their execution environment. Map 
functions exploit data locality. This means map functions are sent to the node 
hosting the required data rather than streaming the data to a central 
coordinating node. The most recent version of Riak also includes improvements 
that have significantly boosted the efficiency of mapping over entire buckets 
of data.

* Riak supports writing MapReduce functions in two languages: Erlang and 
JavaScript. Erlang is the fastest in terms of raw speed, but JavaScript runs a 
close second and is easier for most people to use.
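
To make the data-locality and "from scratch" points above concrete, here is a 
rough toy model in Python. It is not Riak code and uses no Riak API: the 
CLUSTER dictionary, the node names, and the run_job, map_phase_on_node, and 
double functions are all made up for illustration. It only shows the shape of 
the idea: the map function travels to each node, runs against that node's 
local data, and only the mapped results come back to be reduced.

```python
# Hypothetical toy cluster: node name -> {key: value} held on that node.
# None of these names come from Riak; they exist only for this sketch.
CLUSTER = {
    "node_a": {"k1": 3, "k2": 5},
    "node_b": {"k3": 7},
    "node_c": {"k4": 1, "k5": 9},
}


def map_phase_on_node(local_data, map_fn):
    """Run map_fn against one node's local data and return only the results."""
    results = []
    for key, value in local_data.items():
        results.extend(map_fn(key, value))
    return results


def run_job(cluster, map_fn, reduce_fn):
    """Send map_fn to every node, then reduce the collected results.

    Nothing is cached between calls: every job recomputes from the stored data.
    """
    mapped = []
    for local_data in cluster.values():
        mapped.extend(map_phase_on_node(local_data, map_fn))
    return reduce_fn(mapped)


def double(key, value):
    """Example map function: emit each stored value doubled."""
    return [value * 2]


total = run_job(CLUSTER, double, sum)
```

Calling run_job a second time repeats all of the work, so a caller that needs 
the same result again would have to keep `total` somewhere itself.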

Depending on what your jobs are doing and which language they use, Erlang or 
JavaScript, there are a few tunable parameters we can tweak to improve 
performance. More information about what your job is doing, how your cluster is 
set up, and the kinds of performance you're seeing would help us debug the 
situation.

Are there any 'high level' tools to integrate Riak data into other systems, like the Hive JDBC driver for Hadoop and Pentaho's project to do reporting on top of that?

--
  Les Mikesell
    [email protected]



_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
