RDD-level partitioning information is not used to decide when to shuffle
for queries planned using Catalyst, since we have better information about
distribution from the query plan itself. Instead, look at the logic in
EnsureRequirements
<https://github.com/apache/spark/blob/06f0df6df204c4722ff8a6bf909abaa32a715c41/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala#L272>.
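The core idea is roughly this: each physical operator declares a required distribution for each of its children, and an Exchange (shuffle) is inserted only when a child's output partitioning does not already satisfy that requirement. Here is a minimal sketch of that check; the class and method names below are simplified stand-ins, not Spark's actual API:

```scala
// Simplified model of the EnsureRequirements idea (not Spark's real classes):
// an operator requires a Distribution; a child reports a Partitioning; we
// shuffle only when the partitioning does not satisfy the requirement.

sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
case class ClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Partitioning {
  def satisfies(required: Distribution): Boolean
}
case object UnknownPartitioning extends Partitioning {
  def satisfies(required: Distribution): Boolean =
    required == UnspecifiedDistribution
}
case class HashPartitioning(keys: Seq[String]) extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    case UnspecifiedDistribution => true
    // Hash keys must all appear among the required clustering keys.
    case ClusteredDistribution(reqKeys) =>
      keys.nonEmpty && keys.forall(reqKeys.contains)
  }
}

// Decide whether a shuffle must be inserted above one child.
def needsShuffle(childOutput: Partitioning, required: Distribution): Boolean =
  !childOutput.satisfies(required)
```

So, for example, a child already hash-partitioned on the join keys is left alone, while a child with unknown partitioning gets an Exchange inserted above it.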

We don't yet reason about equivalence classes for attributes when deciding
whether a given partitioning is valid, but #10844
<https://github.com/apache/spark/pull/10844> is a start at building that
infrastructure.
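To see why equivalence classes matter, consider a join condition like a = b: after the join, the two columns are interchangeable for partitioning purposes, but a purely syntactic check cannot see that and will insert an unnecessary shuffle. A toy illustration, with hypothetical names that are not Spark's API:

```scala
// Toy illustration of attribute-equivalence-aware partitioning checks.
// All names here are hypothetical, not Spark's actual classes.

case class ClusteredDist(keys: Set[String])
case class HashPart(keys: Set[String])

// Syntactic check: partitioning keys must literally appear among the
// required clustering keys.
def satisfiesSyntactic(p: HashPart, d: ClusteredDist): Boolean =
  p.keys.nonEmpty && p.keys.subsetOf(d.keys)

// Equivalence-aware check: map every key to a canonical representative of
// its equivalence class first, then compare.
def satisfiesWithEquiv(p: HashPart, d: ClusteredDist,
                       canon: Map[String, String]): Boolean = {
  def c(k: String): String = canon.getOrElse(k, k)
  val pk = p.keys.map(c)
  pk.nonEmpty && pk.subsetOf(d.keys.map(c))
}
```

With canon = Map("b" -> "a") (recording that a = b), data hash-partitioned on b fails the syntactic check against a requirement clustered on a, so it would be re-shuffled, while the equivalence-aware check recognizes the partitioning as valid.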
