Can two spark applications share rdd?

2014-03-15 Thread 林武康
Hi, I am new to Spark. The question below may seem foolish, but I would really appreciate some advice: loading data from disk to generate an RDD is very costly in my applications, so I hope to generate it once and cache it in memory, so that any other Spark application can refer to this RDD. Can this

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
Yep. Regarding flatMap, an implicit parameter might work, as in Scala's Future for instance: https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246 Dunno, still waiting for some insights from the team ^^ andy On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot

Spark join for skewed dataset

2014-03-15 Thread Debasish Das
Hi, if the join keys are skewed, is there a specific optimized join available in Spark for such use cases? I saw that a similar feature is supported in both Scalding and Hive, and I am testing skewjoinWithSmaller on one of the skewed datasets...