It's not true in general that 100 different partition keys go to 100
different partitions. It depends on the partitioner, and it wouldn't be true
in the case of a default HashPartitioner, where distinct keys can hash to
the same partition. But yes, you'd expect a reasonably even distribution.
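
To illustrate, here is a minimal sketch in plain Scala (no Spark needed) of
the hash-then-modulo idea behind a hash partitioner. The names HashBuckets,
bucket and occupiedPartitions, and the sample keys, are made up for this
example. It uses the key's hashCode, which is an assumption for illustration
only: Spark's DataFrame repartitioning uses its own internal hash function,
so the real numbers will differ.

```scala
object HashBuckets {
  // Map a key to a partition index: hash code modulo the partition count,
  // adjusted so the result is never negative.
  def bucket(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Count how many distinct partitions a set of keys actually lands in.
  def occupiedPartitions(keys: Seq[Any], numPartitions: Int): Int =
    keys.map(bucket(_, numPartitions)).distinct.size

  def main(args: Array[String]): Unit = {
    val numPartitions = 100
    // 100 distinct, hypothetical (date, company_id) combinations.
    val keys = for {
      day     <- 1 to 10
      company <- 1 to 10
    } yield (f"2016-11-$day%02d", company)

    val occupied = occupiedPartitions(keys, numPartitions)
    println(s"${keys.size} distinct keys landed in $occupied of $numPartitions partitions")
  }
}
```

Whenever two keys hash to the same index, the number of occupied partitions
is smaller than the number of distinct keys, which is why a one-to-one
mapping is not guaranteed.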

What happens in all cases depends on the partitioner. I haven't tested it
(you should just try it) but I would assume that switching the columns
could result in a different assignment to partitions. You would not in
general want to be sensitive to the exact partitioning unless you were
using your own custom partitioning.
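
If you want to try it, something along these lines should work. It is an
untested sketch that assumes Spark 2.x or later on the classpath and a local
session. The object name RepartitionOrderCheck, the run method and the
generated data are made up for the example. It tags each row with
spark_partition_id() under both column orders and counts the rows whose
partition differs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, spark_partition_id}

object RepartitionOrderCheck {
  // Returns (partitions used with order A, rows whose partition differs in order B).
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-order-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: 400 distinct (date, company_id) combinations.
    val df = (for (d <- 1 to 20; c <- 1 to 20) yield (f"2016-11-$d%02d", c))
      .toDF("date", "company_id")

    val numPartitions = 100
    // Same columns, two different orders; record the partition of every row.
    val a = df.repartition(numPartitions, col("date"), col("company_id"))
      .withColumn("pid_a", spark_partition_id())
    val b = df.repartition(numPartitions, col("company_id"), col("date"))
      .withColumn("pid_b", spark_partition_id())

    val used  = a.select("pid_a").distinct().count()
    val moved = a.join(b, Seq("date", "company_id"))
      .filter(col("pid_a") =!= col("pid_b"))
      .count()

    spark.stop()
    (used, moved)
  }

  def main(args: Array[String]): Unit = {
    val (used, moved) = run()
    println(s"partitions used: $used, rows assigned differently after swapping columns: $moved")
  }
}
```

A non-zero second number would confirm that the assignment depends on the
column order, which is one more reason not to rely on the exact layout.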

On Thu, Nov 17, 2016 at 7:41 PM Cesar <ces...@gmail.com> wrote:

>
> I am using the following lines to repartition a data frame by multiple columns:
>
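> import org.apache.spark.sql.Column
>
> val partitionColumns = Seq("date", "company_id").map(x => new Column(x))
> val numPartitions = 100
>
> // repartition takes Column varargs, so the Seq must be expanded with `: _*`
> val dfRepartitioned = df.repartition(numPartitions, partitionColumns: _*)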
>
> I understand that if the number of combinations of date and company_id is
> at most 100, each combination will go to a different partition.
>
> My question is: what happens when the number of combinations is larger than
> 100? Does repartition change its behavior if I switch the column order in
> the definition of the partitionColumns variable?
>
> Thanks
> --
> Cesar Flores
>
