You can't create a new RDD directly by selecting a few elements. rdd.take(n), takeSample, etc. are actions: they trigger execution of your pipeline and return a local collection to the driver rather than an RDD. You can, however, do something like this, I guess:
// take(10) is an action: it returns the first 10 elements to the driver as an Array
val sample_data = rdd.take(10)
// Turn that local Array back into a (small) RDD
val sample_rdd = sc.parallelize(sample_data)

Thanks
Best Regards

On Thu, Oct 29, 2015 at 10:45 AM, 张志强(旺轩) <zzq98...@alibaba-inc.com> wrote:
> How do I get a NEW RDD that has a number of elements that I specify?
> sample()? It has no number parameter. takeSample()? It returns a list.
>
> Help, please.
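
If you would rather not pull the elements back to the driver, one possible alternative is to index the RDD and filter on the index. This is only a rough sketch: firstN and the FirstNAsRdd object are hypothetical names of my own (not part of Spark), and the local[2] master and the 1 to 1000 test data are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object FirstNAsRdd {
  // Hypothetical helper (not a Spark API): keeps the first n elements as an RDD
  // without collecting them to the driver.
  def firstN[T: ClassTag](rdd: RDD[T], n: Long): RDD[T] =
    rdd.zipWithIndex()                      // pair each element with its Long index
      .filter { case (_, idx) => idx < n }  // keep indices 0 until n
      .map { case (elem, _) => elem }       // drop the index again

  def main(args: Array[String]): Unit = {
    // Assumed local setup, for illustration only
    val conf = new SparkConf().setAppName("first-n-as-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000, numSlices = 4)
      val sampleRdd = firstN(rdd, 10)
      println(sampleRdd.count())            // prints 10
    } finally {
      sc.stop()
    }
  }
}
```

Note that zipWithIndex itself runs a Spark job when the RDD has more than one partition, so this is not free either. If you want a random sample of exactly n elements rather than the first n, sc.parallelize(rdd.takeSample(withReplacement = false, num = 10)) should also work; takeSample returns an Array on the driver, like take.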