Thanks to both of you for your input. Looks like I'll play with the mapPartitions function to start porting MapReduce algorithms to Spark.
On Wed, Jul 30, 2014 at 1:23 PM, Sean Owen <so...@cloudera.com> wrote:
> Really, the analog of a Mapper is not map(), but mapPartitions(). Instead
> of:
>
> rdd.map(yourFunction)
>
> ... you can run setup code before mapping a bunch of records, and
> after, like so:
>
> rdd.mapPartitions { partition =>
>   // Some setup code here
>   partition.map(yourFunction)
>   // Some cleanup code here
> }
>
> You couldn't share state across Mappers, or Mappers and Reducers in
> Hadoop. (At least there was no direct way.) Same here. But you can
> maintain state across many map calls.
>
> On Wed, Jul 30, 2014 at 6:07 PM, Kevin <kevin.macksa...@gmail.com> wrote:
> > Hi,
> >
> > Is it possible to maintain state inside a Spark map function? With Hadoop
> > MapReduce, Mappers and Reducers are classes that can have their own state
> > using instance variables. Can this be done with Spark? Are there any
> > examples?
> >
> > Most examples I have seen do a simple operation on the value passed into
> > the map function and then pass it along to the reduce function.
> >
> > Thanks in advance.
> >
> > -Kevin
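For anyone following along, here is a minimal sketch of the setup/per-record/cleanup pattern that mapPartitions enables, written as plain Python with no Spark dependency (the `map_partitions` helper below is hypothetical, just simulating how Spark hands each partition to your function as an iterator). One caveat worth keeping in mind with the real Scala API: iterators are lazy, so code written after `partition.map(...)` runs before the records are actually consumed; doing the cleanup inside the iteration, as below, avoids that pitfall.

```python
def map_partitions(partitions, func):
    """Simulate Spark's mapPartitions: call func once per partition,
    passing it an iterator over that partition's records."""
    for partition in partitions:
        yield from func(iter(partition))

def your_function(partition):
    # Setup code: runs once per partition, not once per record.
    state = {"records_seen": 0}  # stand-in for e.g. opening a DB connection
    for record in partition:
        state["records_seen"] += 1  # state persists across map calls
        yield record * 2            # per-record work, reusing the setup
    # Cleanup code: runs after the last record of the partition.
    state.clear()                   # stand-in for e.g. closing the connection

# Two partitions of two records each.
result = list(map_partitions([[1, 2], [3, 4]], your_function))
print(result)  # -> [2, 4, 6, 8]
```

The point of the pattern is that setup and cleanup cost is paid once per partition rather than once per record, which is the same economy Hadoop's Mapper setup()/cleanup() methods provide.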