Yes, you can write some glue in Spark to call these. Some functions to look at:
- SparkContext.hadoopRDD lets you create an input RDD from an existing JobConf configured by Hadoop (including InputFormat, paths, etc.)
- RDD.mapPartitions lets you operate on all the values in one partition (block) at a time, similar to how Mappers in MapReduce work
- PairRDDFunctions.reduceByKey and groupByKey can be used for aggregation
- RDD.pipe() can be used to call out to a script or binary, like Hadoop Streaming

A fair number of people have been running both Java and Hadoop Streaming apps like this. (A rough sketch of how these pieces fit together appears after the quoted message below.)

Matei

On Jun 4, 2014, at 1:08 PM, Wei Tan <w...@us.ibm.com> wrote:

> Hello,
>
> I am trying to use Spark in the following scenario:
>
> I have code written in Hadoop and am now trying to migrate to Spark. The mappers and reducers are fairly complex, so I wonder if I can reuse the map() functions I already wrote in Hadoop (Java), and use Spark to chain them, mixing the Java map() functions with Spark operators?
>
> A related question: can I use binaries as operators, like Hadoop Streaming?
>
> Thanks!
> Wei
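For the record, here is a minimal sketch (not from the original thread) of how the pieces above could be wired together in Scala. The input/output paths, the word-count-style map logic, and the legacy_reducer.sh script are placeholder assumptions standing in for existing Hadoop code; only the Spark calls (hadoopRDD, mapPartitions, reduceByKey, pipe) are the real APIs being discussed.

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
    import org.apache.spark.{SparkConf, SparkContext}

    object HadoopGlueSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hadoop-glue"))

        // 1. Build a JobConf the same way the existing Hadoop job does,
        //    then turn it into an RDD with SparkContext.hadoopRDD.
        val jobConf = new JobConf()
        FileInputFormat.addInputPath(jobConf, new Path("/user/wei/input")) // placeholder path
        val records = sc.hadoopRDD(jobConf, classOf[TextInputFormat],
          classOf[LongWritable], classOf[Text])

        // 2. mapPartitions sees one partition (one input split) at a time,
        //    so existing per-record map() logic can be called inside it.
        //    Here the "mapper" is a stand-in that emits (word, 1) pairs.
        val mapped = records.mapPartitions { iter =>
          iter.flatMap { case (_, line) =>
            line.toString.split("\\s+").map(word => (word, 1))
          }
        }

        // 3. reduceByKey plays the role of the reduce side.
        val counts = mapped.reduceByKey(_ + _)

        // 4. pipe() streams each element through an external script or binary
        //    over stdin/stdout, much like Hadoop Streaming.
        val piped = counts
          .map { case (word, count) => s"$word\t$count" }
          .pipe("/path/to/legacy_reducer.sh") // placeholder script

        piped.saveAsTextFile("/user/wei/output") // placeholder path
        sc.stop()
      }
    }

The mapPartitions step is where existing Java map() code would typically be invoked, since it gives you an iterator over a whole split rather than one record at a time; the pipe() step covers the Hadoop Streaming-style case where the mapper or reducer is an arbitrary binary.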