dynamically loading C++ mapper/reducer classes in map/reduce jobs
------------------------------------------------------------------
Key: HADOOP-913
URL: https://issues.apache.org/jira/browse/HADOOP-913
Project: Hadoop
Issue Type: New Feature
Reporter: Runping Qi
It is highly desirable for the current map/reduce framework to be able to call
functions written in C++ (or other languages).
I am proposing a generic extension to the current framework to achieve the
above goal.
The extension is an application-level solution, similar in spirit to
HadoopStreaming, and thus has no impact on Hadoop core.
It will maintain the native map/reduce execution model.
The basic idea is to use sockets/RPC to cross the language barrier.
In particular, we can implement a generic mapper/reducer class in Java as a
proxy for calling functions in other languages.
The configure function of the class will create a process that will open a
user-specified shared library and act as an RPC server.
The map function of the class will just invoke an RPC call with the key/value
pair. Such an RPC call is expected to return a list of key/value pairs. The map
function can then emit the outputs.
Below is a sketch of the generic class:
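public class MapRedCPPAdapter implements Mapper, Reducer {
    String sharedLibraryName;
    RPCProxy theServer;
    ...
    // Start the helper process that loads the user-specified shared library.
    public void configure(JobConf job) {
        sharedLibraryName = job.get("shared.lib.name");
        theServer = createServer(sharedLibraryName);
    }
    // Shut down the helper process when the task finishes.
    public void close() {
        theServer.stop();
    }
    // Forward one key/value pair and emit whatever pairs come back.
    public void map(key, value, output, reporter) {
        ArrayList pairs = invokeRemoteMap(theServer, key, value);
        emit(pairs);
    }
    // Forward one key with all its values and emit whatever pairs come back.
    public void reduce(key, values, output, reporter) {
        ArrayList pairs = invokeRemoteReduce(theServer, key, values);
        emit(pairs);
    }
}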
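
To make the map-side round trip concrete (send one key/value pair, get back a
list of key/value pairs), here is a minimal, self-contained sketch. Everything
in it is an assumption made for illustration and not part of the proposal: the
wire format (two UTF strings out, a count followed by that many UTF string
pairs back), the class name RpcRoundTrip, and the word-splitting "map"
function. A Java thread on a loopback socket stands in for the external process
that would load the shared library.

```java
import java.io.*;
import java.net.*;
import java.util.*;

class RpcRoundTrip {

    // Client side: send one key/value pair, read back a list of pairs.
    // Hypothetical wire format: writeUTF(key), writeUTF(value) ->
    // readInt(count), then count times readUTF(key), readUTF(value).
    static List<Map.Entry<String, String>> invokeRemoteMap(
            DataInputStream in, DataOutputStream out,
            String key, String value) throws IOException {
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush();
        int count = in.readInt();
        List<Map.Entry<String, String>> pairs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String k = in.readUTF();
            String v = in.readUTF();
            pairs.add(new AbstractMap.SimpleEntry<>(k, v));
        }
        return pairs;
    }

    // Stand-in for the native server process: accepts one connection and
    // answers each request with (word, "1") for every word in the value.
    static Thread startStandInServer(ServerSocket listener) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept();
                 DataInputStream in = new DataInputStream(
                         new BufferedInputStream(s.getInputStream()));
                 DataOutputStream out = new DataOutputStream(
                         new BufferedOutputStream(s.getOutputStream()))) {
                while (true) {
                    try {
                        in.readUTF(); // the key is unused by this toy map
                    } catch (EOFException eof) {
                        break; // client closed the connection
                    }
                    String value = in.readUTF().trim();
                    String[] words =
                            value.isEmpty() ? new String[0] : value.split("\\s+");
                    out.writeInt(words.length);
                    for (String w : words) {
                        out.writeUTF(w);
                        out.writeUTF("1");
                    }
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Starts the stand-in server, performs one round trip, and cleans up.
    static List<Map.Entry<String, String>> demo(String key, String value)
            throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            startStandInServer(listener);
            try (Socket s = new Socket(loopback, listener.getLocalPort());
                 DataInputStream in = new DataInputStream(
                         new BufferedInputStream(s.getInputStream()));
                 DataOutputStream out = new DataOutputStream(
                         new BufferedOutputStream(s.getOutputStream()))) {
                return invokeRemoteMap(in, out, key, value);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("0", "hello hadoop hello"));
    }
}
```

In the proposed adapter, the connection would be opened once in configure and
reused for every map call, so the per-record cost is one request/response
exchange rather than a new connection.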
The cons of this approach are the overhead associated with
RPC calls and the creation of an additional process per mapper/reducer task.
The pros are that the extension is clean, generic, and simple. It is applicable
to other foreign languages too.