[jira] [Assigned] (SPARK-9785) HashPartitioning compatibility should consider expression ordering
[ https://issues.apache.org/jira/browse/SPARK-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9785:
-----------------------------------

Assignee: Apache Spark (was: Josh Rosen)

HashPartitioning compatibility should consider expression ordering
------------------------------------------------------------------

Key: SPARK-9785
URL: https://issues.apache.org/jira/browse/SPARK-9785
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: Josh Rosen
Assignee: Apache Spark
Priority: Blocker

HashPartitioning compatibility is defined w.r.t. the _set_ of expressions, but in other contexts the ordering of those expressions matters. This is illustrated by the following regression test:

{code}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{InterpretedMutableProjection, Literal}
import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashPartitioning}

test("HashPartitioning compatibility") {
  val expressions = Seq(Literal(2), Literal(3))
  // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
  // created with different orderings of those expressions:
  val partitioningA = HashPartitioning(expressions, 100)
  val partitioningB = HashPartitioning(expressions.reverse, 100)
  // These partitionings are not considered equal:
  assert(partitioningA != partitioningB)
  // However, they both satisfy the same clustered distribution:
  val distribution = ClusteredDistribution(expressions)
  assert(partitioningA.satisfies(distribution))
  assert(partitioningB.satisfies(distribution))
  // Both partitionings are compatible with and guarantee each other:
  assert(partitioningA.compatibleWith(partitioningB))
  assert(partitioningB.compatibleWith(partitioningA))
  assert(partitioningA.guarantees(partitioningB))
  assert(partitioningB.guarantees(partitioningA))
  // Given all of this, we would expect these partitionings to compute the same hashcode for
  // any given row:
  def computeHashCode(partitioning: HashPartitioning): Int = {
    val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
    hashExprProj.apply(InternalRow.empty).hashCode()
  }
  assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
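The ordering problem described in this issue can be illustrated without Spark internals. The following is a minimal Scala sketch under stated assumptions: `SimpleHashPartitioning`, `partitionFor`, `compatibleBySet`, and `compatibleByOrder` are hypothetical names, and the `31 * h + v` hash is an assumed stand-in, not the hash Spark actually computes. It shows that two partitionings over the same set of key columns pass a set-based compatibility check, yet send the same row to different partitions because the hash is order-sensitive; an order-aware check rejects the pair.

```scala
// Hypothetical, simplified model of hash partitioning (not Spark's classes):
// a partitioning is an ordered list of key columns plus a partition count.
final case class SimpleHashPartitioning(columns: Seq[String], numPartitions: Int) {

  // Assumed order-sensitive hash (31 * h + v), folded over the key values
  // in the order given by `columns`, then mapped to a non-negative partition id.
  def partitionFor(row: Map[String, Int]): Int = {
    val h = columns.foldLeft(1)((acc, c) => 31 * acc + row(c))
    ((h % numPartitions) + numPartitions) % numPartitions
  }

  // Set-based check, analogous to the behaviour the issue reports.
  def compatibleBySet(other: SimpleHashPartitioning): Boolean =
    columns.toSet == other.columns.toSet && numPartitions == other.numPartitions

  // Order-aware check: same key columns in the same order.
  def compatibleByOrder(other: SimpleHashPartitioning): Boolean =
    columns == other.columns && numPartitions == other.numPartitions
}

object OrderingDemo {
  def main(args: Array[String]): Unit = {
    val a = SimpleHashPartitioning(Seq("x", "y"), 100)
    val b = SimpleHashPartitioning(Seq("y", "x"), 100)
    val row = Map("x" -> 2, "y" -> 3)
    println(a.compatibleBySet(b))   // true
    println(a.compatibleByOrder(b)) // false
    println(a.partitionFor(row))    // 26
    println(b.partitionFor(row))    // 56
  }
}
```

Under this assumed hash the row (x = 2, y = 3) lands in partition 26 for `a` but partition 56 for `b`, so treating the two partitionings as interchangeable on the strength of the set-based check would route matching rows to different partitions.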
[jira] [Assigned] (SPARK-9785) HashPartitioning compatibility should consider expression ordering
[ https://issues.apache.org/jira/browse/SPARK-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9785:
-----------------------------------

Assignee: Josh Rosen (was: Apache Spark)