I'm confused about this. The comment  on the function seems to indicate
that there is absolutely no shuffle or network IO but it also states that
it assigns an even number of parent partitions to each final partition
group. I'm having trouble seeing how this can be guaranteed without some
data passing around nodes.

For instance, lets saying I have 5 machines and 10 partitions but the way
the partitions are layed out is machines 1, 2, and 3 each have 3 partitions
while machine 4 only has 1 partition and machine 5 has none. Am I to assume
that coalesce(4, false) will the 3 partitions on nodes 1, 2, and 3 each to
1 partition while node 4 will just remain 1 partition?

Thanks.

Reply via email to