[ https://issues.apache.org/jira/browse/SPARK-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680550#comment-14680550 ]
Apache Spark commented on SPARK-9785:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8074

> HashPartitioning compatibility should consider expression ordering
> ------------------------------------------------------------------
>
>                 Key: SPARK-9785
>                 URL: https://issues.apache.org/jira/browse/SPARK-9785
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> HashPartitioning compatibility is defined w.r.t the _set_ of expressions, but
> in other contexts the ordering of those expressions matters. This is
> illustrated by the following regression test:
> {code}
>   test("HashPartitioning compatibility") {
>     val expressions = Seq(Literal(2), Literal(3))
>     // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
>     // created with different orderings of those expressions:
>     val partitioningA = HashPartitioning(expressions, 100)
>     val partitioningB = HashPartitioning(expressions.reverse, 100)
>     // These partitionings are not considered equal:
>     assert(partitioningA != partitioningB)
>     // However, they both satisfy the same clustered distribution:
>     val distribution = ClusteredDistribution(expressions)
>     assert(partitioningA.satisfies(distribution))
>     assert(partitioningB.satisfies(distribution))
>     // Both partitionings are compatible with and guarantee each other:
>     assert(partitioningA.compatibleWith(partitioningB))
>     assert(partitioningB.compatibleWith(partitioningA))
>     assert(partitioningA.guarantees(partitioningB))
>     assert(partitioningB.guarantees(partitioningA))
>     // Given all of this, we would expect these partitionings to compute the same hashcode for
>     // any given row:
>     def computeHashCode(partitioning: HashPartitioning): Int = {
>       val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
>       hashExprProj.apply(InternalRow.empty).hashCode()
>     }
>     assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
>   }
> {code}
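The ordering concern in the quoted description can be reproduced without any Spark classes. The following is only a minimal sketch under stated assumptions: `OrderSensitiveHashing` and `partitionId` are hypothetical names, and `java.util.Arrays.hashCode` stands in for whatever row hash Spark actually uses. It shows that an order-sensitive hash over the same _set_ of key values can send a row to different partitions depending on the order in which the keys are evaluated.

```scala
object OrderSensitiveHashing {
  // Hypothetical stand-in for a hash partitioner (not Spark's implementation):
  // hash the key values in the given order, then map to a non-negative partition id.
  def partitionId(keys: Seq[Int], numPartitions: Int): Int = {
    // Arrays.hashCode computes 31 * h + element over the array, so order matters.
    val h = java.util.Arrays.hashCode(keys.toArray)
    val m = h % numPartitions
    if (m < 0) m + numPartitions else m
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq(2, 3)
    // Same set of key values, evaluated in opposite orders:
    println(partitionId(keys, 100))          // prints 26
    println(partitionId(keys.reverse, 100))  // prints 56
  }
}
```

With these inputs the two orderings land in partitions 26 and 56, which is the kind of mismatch the final assertion in the regression test is checking for.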