[GitHub] spark pull request #21926: [SPARK-24972][SQL] PivotFirst could not handle pi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21926 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21926: [SPARK-24972][SQL] PivotFirst could not handle pi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21926#discussion_r206406546 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -529,6 +529,10 @@ class Analyzer( || (p.groupByExprsOpt.isDefined && !p.groupByExprsOpt.get.forall(_.resolved)) || !p.pivotColumn.resolved || !p.pivotValues.forall(_.resolved) => p case Pivot(groupByExprsOpt, pivotColumn, pivotValues, aggregates, child) => +if (!RowOrdering.isOrderable(pivotColumn.dataType)) { + throw new AnalysisException( +s"Invalid pivot column '${pivotColumn}'. Pivot columns must be comparable.") --- End diff -- To the other reviewers, this is consistent with the requirements of group-by columns. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21926: [SPARK-24972][SQL] PivotFirst could not handle pi...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/21926#discussion_r206354004 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -574,10 +578,14 @@ class Analyzer( // Since evaluating |pivotValues| if statements for each input row can get slow this is an // alternate plan that instead uses two steps of aggregation. val namedAggExps: Seq[NamedExpression] = aggregates.map(a => Alias(a, a.sql)()) - val bigGroup = groupByExprs ++ pivotColumn.references + val namedPivotCol = pivotColumn match { --- End diff -- This is to revert the original walk-around aimed to avoid the PivotFirst issue. Now that we have PivotFirst working alright for complex types, we can revert it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21926: [SPARK-24972][SQL] PivotFirst could not handle pi...
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/21926 [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types ## What changes were proposed in this pull request? When the pivot column is of a complex type, the eval() result will be an UnsafeRow, while the keys of the HashMap for column value matching is a GenericInternalRow. As a result, there will be no match and the result will always be empty. So for a pivot column of complex-types, we should: 1) If the complex-type is not comparable (orderable), throw an Exception. It cannot be a pivot column. 2) Otherwise, if it goes through the `PivotFirst` code path, `PivotFirst` should use a TreeMap instead of HashMap for such columns. ## How was this patch tested? Added UT. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maryannxue/spark pivot_followup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21926 commit b41a45cb22bd3d49e75711950bcbc3d409bc544a Author: maryannxue Date: 2018-07-30T23:15:40Z [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org