Apparently, there is another way to do it. You can try creating a PartitionPruningRDD and pass a partition filter function to it. This RDD will do the same thing that I suggested in my mail and you will not have to create a new RDD.
Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Wed, Apr 6, 2016 at 5:35 PM, Sun, Rui <rui....@intel.com> wrote: > Maybe you can try SparkContext.submitJob: > > *def **submitJob**[T, U, R](rdd: RDD > <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html>[T], > processPartition: > (Iterator[T]) **⇒** U, partitions: Seq[Int], resultHandler: (Int, U) **⇒** > Unit, resultFunc: > **⇒** R): SimpleFutureAction > <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SimpleFutureAction.html>[R]* > > > > > > *From:* Hemant Bhanawat [mailto:hemant9...@gmail.com] > *Sent:* Wednesday, April 6, 2016 7:16 PM > *To:* Andrei <faithlessfri...@gmail.com> > *Cc:* user <user@spark.apache.org> > *Subject:* Re: How to process one partition at a time? > > > > Instead of doing it in compute, you could rather override getPartitions > method of your RDD and return only the target partitions. This way tasks > for only target partitions will be created. Currently in your case, tasks > for all the partitions are getting created. > > I hope it helps. I would like to hear if you take some other approach. > > > Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> > > www.snappydata.io > > > > On Wed, Apr 6, 2016 at 3:49 PM, Andrei <faithlessfri...@gmail.com> wrote: > > I'm writing a kind of sampler which in most cases will require only 1 > partition, sometimes 2 and very rarely more. So it doesn't make sense to > process all partitions in parallel. What is the easiest way to limit > computations to one partition only? > > > > So far the best idea I came to is to create a custom partition whose > `compute` method looks something like: > > > > def compute(split: Partition, context: TaskContext) = { > > if (split.index == targetPartition) { > > // do computation > > } else { > > // return empty iterator > > } > > } > > > > > > But it's quite ugly and I'm unlikely to be the first person with such a > need. Is there easier way to do it? > > > > > > >