Thanks Sean.

But if the number of partitions of an RDD is fixed beforehand, it is not
flexible to run the same program on different data sets. For the first
stage the partition count can be derived from the input data set, but for
intermediate stages that is not possible. Users have to write their own
policy to repartition or coalesce based on the data set size.
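
To make the kind of policy I mean concrete, here is a minimal sketch,
assuming PySpark and a size estimate in bytes that the caller supplies.
The names target_partitions and resize, the 128 MB per-partition target,
and the min/max bounds are all made up for illustration; Spark does not
provide them. Only getNumPartitions(), coalesce() and repartition() are
real RDD methods.

```python
import math


def target_partitions(size_bytes, bytes_per_partition=128 * 1024 * 1024,
                      min_partitions=1, max_partitions=10000):
    """Pick a partition count so each partition holds roughly
    bytes_per_partition bytes (hypothetical policy, not a Spark API)."""
    wanted = math.ceil(size_bytes / bytes_per_partition)
    # Clamp to the configured bounds so tiny or huge inputs stay sane.
    return max(min_partitions, min(max_partitions, wanted))


def resize(rdd, size_bytes):
    """Return an RDD whose partition count suits the estimated size.

    size_bytes is an estimate the caller must provide; this sketch does
    not measure the RDD itself.
    """
    current = rdd.getNumPartitions()
    wanted = target_partitions(size_bytes)
    if wanted < current:
        # coalesce() merges partitions and avoids a shuffle by default.
        return rdd.coalesce(wanted)
    if wanted > current:
        # repartition() always shuffles the data.
        return rdd.repartition(wanted)
    return rdd
```

The caller still has to estimate the size of each intermediate result,
which is exactly the part that differs from one data set to another.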


On Tue, Mar 3, 2015 at 6:29 PM, Sean Owen <so...@cloudera.com> wrote:

> An RDD has a certain fixed number of partitions, yes. You can't change
> an RDD. You can repartition() or coalesce() an RDD to make a new one
> with a different number of partitions, possibly requiring a shuffle.
>
> On Tue, Mar 3, 2015 at 10:21 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> > I mean, is it possible to change the partition number at runtime? Thanks
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
