I've seen two cases most commonly. The first is when I need to create some processing object to handle each record. If creating that object is expensive, creating one per record becomes prohibitive. So instead, we use mapPartitions, create one object per partition, and use it on each record in that partition.
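A minimal sketch of the first case, written in plain Python so it runs without a cluster. `ExpensiveParser` and `map_partitions` are hypothetical names: the class stands in for whatever costly object you need, and `map_partitions` is only a local stand-in for Spark's `mapPartitions`, which likewise hands your function an iterator over one partition's records and expects an iterable back. In PySpark you would pass `process_partition` to `rdd.mapPartitions(...)` instead.

```python
class ExpensiveParser:
    """Hypothetical stand-in for a costly-to-build object (e.g. a loaded model
    or a connection). Counts how many times it has been constructed."""
    instances = 0

    def __init__(self):
        ExpensiveParser.instances += 1
        self.scale = 10  # pretend this setup was slow

    def process(self, record):
        return record * self.scale


def process_partition(records):
    # One parser per partition, reused for every record in that partition.
    parser = ExpensiveParser()
    for record in records:
        yield parser.process(record)


def map_partitions(partitions, func):
    # Hypothetical local stand-in for RDD.mapPartitions: func receives an
    # iterator over one partition and returns an iterable of results.
    return [list(func(iter(part))) for part in partitions]


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
result = map_partitions(partitions, process_partition)
print(result)                     # [[10, 20, 30], [40, 50], [60, 70, 80, 90]]
print(ExpensiveParser.instances)  # 3: one per partition, not one per record
```

With a plain `map`, the equivalent would construct a new `ExpensiveParser` for each of the 9 records.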
The other is summarizing data. I've often found it much more efficient to use a mutable form of the summary object, update it with each record in a partition, and then reduce those per-partition results, than to create a summary object per record and reduce that much larger set of summary objects. Again, it saves a lot of object creation.

On Mon, Mar 24, 2014 at 8:57 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Dear all,
>
> Sorry for asking such a basic question, but can someone explain when one
> should use mapPartitions instead of map?
>
> Thanks
>
> Jaonary

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
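The second case in the reply (one mutable summary per partition, then reducing the per-partition results) can be sketched the same way. `Stats` and `summarize_partition` are hypothetical names, and the list comprehension plus `functools.reduce` only imitates locally what `rdd.mapPartitions(summarize_partition).reduce(lambda a, b: a.merge(b))` would do in PySpark. Spark's `aggregate` operation is built around the same per-partition-then-combine idea.

```python
from functools import reduce


class Stats:
    """Hypothetical mutable summary: count, sum, min and max of numeric records."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None

    def add(self, value):
        # Updates in place, so no new summary object is created per record.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        return self

    def merge(self, other):
        # Combines two per-partition summaries; empty summaries are skipped.
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)
        return self


def summarize_partition(records):
    # Exactly one summary object per partition, mutated for each record.
    stats = Stats()
    for value in records:
        stats.add(value)
    yield stats


partitions = [[3, 1, 4], [1, 5], [9, 2, 6]]
per_partition = [s for part in partitions for s in summarize_partition(iter(part))]
overall = reduce(lambda a, b: a.merge(b), per_partition)
print(overall.count, overall.total, overall.minimum, overall.maximum)  # 8 31 1 9
```

Here only 3 summary objects are created and reduced, rather than 8 (one per record).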