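You can use mapPartitions to achieve this.

    import scala.collection.mutable.ArrayBuffer

    // Split each partition into 10 roughly equal parts, each part tagged with a number as its id
    val splittedRDD = self.mapPartitions { itr =>
      // Iterate over this partition's iterator and deal its elements into 10 buffers
      val iterators = Array.fill(10)(ArrayBuffer.empty[T])
      var i = 0
      for (tuple <- itr) {
        iterators(i % 10) += tuple
        i += 1
      }
      // Pair each buffer with its id; mapPartitions has to return an Iterator
      iterators.zipWithIndex.map { case (part, id) => (id, part) }.iterator
    }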

    // Filter the tagged RDD once per part id and flatMap the buffers to get an array of RDDs
    val rddArray = (0 until 10).toArray.map(i => splittedRDD.filter(_._1 == i).flatMap(_._2))

I did not write this code in an IDE, so it may need small modifications before it works.
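
The per-partition step can be tried without a cluster. Below is a minimal sketch in plain Scala of the round-robin bucketing that the function passed to mapPartitions performs. The names `RoundRobinSplit` and `bucketize` are made up for illustration and are not part of Spark.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not a Spark API: mirrors the body of the mapPartitions function.
object RoundRobinSplit {
  // Deal the elements of one iterator into `parts` buffers in round-robin order
  // and tag each buffer with its index.
  def bucketize[T](itr: Iterator[T], parts: Int): Iterator[(Int, ArrayBuffer[T])] = {
    val buckets = Vector.fill(parts)(ArrayBuffer.empty[T])
    var i = 0
    for (elem <- itr) {
      buckets(i % parts) += elem
      i += 1
    }
    buckets.iterator.zipWithIndex.map { case (buf, idx) => (idx, buf) }
  }

  def main(args: Array[String]): Unit = {
    // 25 elements dealt into 10 buffers: some hold 3 elements, the rest hold 2.
    val tagged = bucketize((1 to 25).iterator, 10).toList
    tagged.foreach { case (idx, buf) => println(s"$idx -> ${buf.mkString(",")}") }
  }
}
```

Two things to keep in mind with this approach: the parts are only roughly equal (within one partition their sizes differ by at most one element), and each partition is buffered in memory in full before anything is emitted.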