Josh Rosen created SPARK-9785:
---------------------------------

             Summary: HashPartitioning guarantees / compatibleWith violate those methods' contracts
                 Key: SPARK-9785
                 URL: https://issues.apache.org/jira/browse/SPARK-9785
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Blocker

HashPartitioning compatibility is defined with respect to the _set_ of expressions, but in other contexts the ordering of those expressions matters. This is illustrated by the following regression test:

{code}
  test("HashPartitioning compatibility") {
    val expressions = Seq(Literal(2), Literal(3))
    // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
    // created with different orderings of those expressions:
    val partitioningA = HashPartitioning(expressions, 100)
    val partitioningB = HashPartitioning(expressions.reverse, 100)
    // These partitionings are not considered equal:
    assert(partitioningA != partitioningB)
    // However, they both satisfy the same clustered distribution:
    val distribution = ClusteredDistribution(expressions)
    assert(partitioningA.satisfies(distribution))
    assert(partitioningB.satisfies(distribution))
    // Both partitionings are compatible with and guarantee each other:
    assert(partitioningA.compatibleWith(partitioningB))
    assert(partitioningB.compatibleWith(partitioningA))
    assert(partitioningA.guarantees(partitioningB))
    assert(partitioningB.guarantees(partitioningA))
    // Given all of this, we would expect these partitionings to compute the same hashcode for
    // any given row:
    def computeHashCode(partitioning: HashPartitioning): Int = {
      val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
      hashExprProj.apply(InternalRow.empty).hashCode()
    }
    assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
  }
{code}
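
To make the ordering problem concrete without a Spark build, here is a minimal standalone sketch. It is a toy model under stated assumptions: `orderedHash` and `partitionId` are hypothetical helpers rather than Spark APIs, and the 31-based fold is only a stand-in for whatever row hashing Spark actually performs. It shows that two orderings of the same values are equal as sets yet can land in different partitions once an order-sensitive hash is applied.

```scala
// Toy model only: `orderedHash` and `partitionId` are hypothetical helpers,
// not Spark APIs, and the 31-based fold is a stand-in for Spark's row hashing.
object HashOrderingSketch {
  // Order-sensitive hash over already-evaluated expression values.
  def orderedHash(values: Seq[Int]): Int =
    values.foldLeft(1)((h, v) => 31 * h + v)

  // Map a hash to a partition index in [0, numPartitions).
  def partitionId(values: Seq[Int], numPartitions: Int): Int = {
    val mod = orderedHash(values) % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  def main(args: Array[String]): Unit = {
    val evaluated = Seq(2, 3)          // stands in for Literal(2), Literal(3)
    val reversed  = evaluated.reverse  // same set of values, different ordering

    // A set-based comparison sees no difference between the two orderings...
    println(evaluated.toSet == reversed.toSet) // prints "true"
    // ...but the order-sensitive hash sends the same row to different partitions.
    println(partitionId(evaluated, 100))       // prints "26"
    println(partitionId(reversed, 100))        // prints "56"
  }
}
```

In this model, any check that compares only the set of expressions would treat the two partitionings as interchangeable even though they place the same row differently, which is the kind of mismatch the regression test above is probing.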