Re: split a RDD by percentage

2014-09-12 Thread pankaj.arora
You can use mapPartitions to achieve this.

    // Split each partition into 10 equal parts, with each part having a number as its id.
    val splittedRDD = self.mapPartitions((itr) => {
      // Iterate over this iterator and break it into 10 parts.
      val iterators = Array.fill(10)(ArrayBuffer[T]())
      var i = 0
      for(tuple
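The message above is cut off in the archive, so the rest of the author's code is not available. As a rough sketch of the idea it describes (tagging the elements of one partition with a part id from 0 to 9), the per-partition logic could look something like the following in plain Scala. The names `SplitSketch` and `splitIntoParts` are hypothetical, the round-robin assignment is an assumption, and no Spark dependency is used so the logic can be run on its own.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSketch {
  // Hypothetical helper (not from the original post): distributes the elements
  // of one partition's iterator round-robin over `parts` buffers, then emits
  // each element paired with the id of the buffer it landed in.
  def splitIntoParts[T](itr: Iterator[T], parts: Int): Iterator[(Int, T)] = {
    val buckets = Array.fill(parts)(ArrayBuffer.empty[T])
    var i = 0
    for (elem <- itr) {
      buckets(i % parts) += elem
      i += 1
    }
    buckets.iterator.zipWithIndex.flatMap { case (buf, id) =>
      buf.iterator.map(e => (id, e))
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for one partition's data.
    val tagged = splitIntoParts((1 to 20).iterator, 10).toList
    // Keeping ids 0, 1 and 2 selects roughly 30% of the elements.
    val firstThirty = tagged.collect { case (id, e) if id < 3 => e }
    println(firstThirty)
  }
}
```

With Spark, a function like this would presumably be passed to `mapPartitions`, and a percentage split obtained by filtering on the id (for example, ids 0 to 2 for about 30%). Note that this version buffers a whole partition in memory; tagging each element lazily with `zipWithIndex` would avoid that if the grouping by part is not needed.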

Re: Use Case of mutable RDD - any ideas around will help.

2014-09-12 Thread pankaj.arora
Hi Patrick, what if all the data has to be kept in cache all the time? If applying union results in a new RDD, then caching it would keep both the older RDD and the new one in memory, hence duplicating data. Below is what I understood from your comment. sqlContext.cacheTable(existingRDD) // caches