Thanks to both of you for your input. Looks like I'll play with the
mapPartitions function to start porting MapReduce algorithms to Spark.
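
For the archives, here is roughly the shape I have in mind for the
port. This is just an untested sketch: ExpensiveParser, parseRecord,
and the HDFS paths are hypothetical placeholders for whatever
per-record work a real Mapper would do.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical stand-in for a helper that is expensive to build,
    // the kind of thing Mapper.setup() would construct once
    class ExpensiveParser {
      def parseRecord(line: String): String = line.trim.toLowerCase
    }

    object MapperPort {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("mapper-port"))
        val lines = sc.textFile("hdfs:///input/path")

        val parsed = lines.mapPartitions { partition =>
          // Runs once per partition, like Mapper.setup()
          val parser = new ExpensiveParser()
          // Runs once per record, like Mapper.map()
          partition.map(parser.parseRecord)
        }

        parsed.saveAsTextFile("hdfs:///output/path")
        sc.stop()
      }
    }

A nice side effect: because the parser is created inside
mapPartitions, it is built on the executor and never has to be
serialized from the driver.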


On Wed, Jul 30, 2014 at 1:23 PM, Sean Owen <so...@cloudera.com> wrote:

> Really, the analog of a Mapper is not map(), but mapPartitions(). Instead
> of:
>
> rdd.map(yourFunction)
>
> ... you can run setup code before mapping a bunch of records, and
> cleanup code after, like so:
>
> rdd.mapPartitions { partition =>
>    // Some setup code here (runs once per partition)
>    partition.map(yourFunction)
>    // Some cleanup code here (careful: map is lazy, so code placed
>    // here runs before the records are actually processed)
> }
>
> You couldn't share state across Mappers, or between Mappers and
> Reducers, in Hadoop (at least not directly). The same is true here,
> but you can maintain state across many map calls within a partition.
>
> On Wed, Jul 30, 2014 at 6:07 PM, Kevin <kevin.macksa...@gmail.com> wrote:
> > Hi,
> >
> > Is it possible to maintain state inside a Spark map function? With Hadoop
> > MapReduce, Mappers and Reducers are classes that can have their own state
> > using instance variables. Can this be done with Spark? Are there any
> > examples?
> >
> > Most examples I have seen do a simple operation on the value passed
> > into the map function and then pass it along to the reduce function.
> >
> > Thanks in advance.
> >
> > -Kevin
>
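
To round out the thread for the archives: an untested sketch of
Sean's second point, keeping state across map calls. It numbers the
records within each partition with a counter that lives as long as
the partition's iterator, much like an instance variable on a Mapper.
The object and app names are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionState {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("partition-state"))
        val rdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

        val numbered = rdd.mapPartitionsWithIndex { (partitionId, partition) =>
          // State shared by every map call within this partition
          var count = 0L
          partition.map { record =>
            count += 1
            (partitionId, count, record)
          }
        }

        numbered.collect().foreach(println)
        sc.stop()
      }
    }

This works because the mapped iterator is consumed on the same
executor that created the counter, so the state never has to cross
machines.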
