GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/14648

    [SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr

    ## What changes were proposed in this pull request?
    
    A TreeNodeException is thrown when executing the following minimal
    example in Spark 2.0.
    
        import org.apache.spark.sql.functions.{expr, lit}
        import spark.implicits._
        case class test (x: Int, q: Int)
    
        val d = Seq(1).toDF("x")
        // Both variants add a foldable column "q" and fail the same way.
        d.withColumn("q", lit(0)).as[test].groupByKey(_.x).flatMapGroups{case (x, iter) => List[Int]()}.show
        d.withColumn("q", expr("0")).as[test].groupByKey(_.x).flatMapGroups{case (x, iter) => List[Int]()}.show
    
    The problem is in `FoldablePropagation`. The rule calls
    `transformExpressions` on the `LogicalPlan`. The query above contains a
    `MapGroups` operator, which has a parameter
    `dataAttributes: Seq[Attribute]`. `FoldablePropagation` transforms one of
    the attributes in `dataAttributes` into an `Alias(Literal(0), _)`. `Alias`
    is not an `Attribute`, which causes the error.
    
    We can't easily detect this kind of type inconsistency while transforming
    expressions. A direct fix is to skip `FoldablePropagation` on object
    operators, as they should not contain such expressions.
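
    The failure mode and the proposed skip can be illustrated with a minimal,
    self-contained sketch. Everything in it is a toy stand-in and an
    assumption made for illustration: `ToyMapGroups`, `propagateFoldable`,
    `optimize`, and the `skipObjectOperators` flag are hypothetical names,
    not Catalyst's real classes or the code in this patch.

    ```scala
    object FoldableSketch {
      // Toy expression hierarchy (hypothetical, not Catalyst's real types).
      sealed trait Expression
      final case class Literal(value: Int) extends Expression
      final case class Attribute(name: String) extends Expression
      final case class Alias(child: Expression, name: String) extends Expression

      // Toy operator that, like MapGroups, requires its columns to be Attributes.
      final case class ToyMapGroups(dataAttributes: Seq[Attribute]) {
        // Untyped rewrite: the rule may return any Expression, and the result
        // is cast back to Attribute when the operator is rebuilt.
        def transformExpressions(
            rule: PartialFunction[Expression, Expression]): ToyMapGroups = {
          val rewritten: Seq[Expression] =
            dataAttributes.map(a => if (rule.isDefinedAt(a)) rule(a) else a)
          // Throws ClassCastException if the rule produced a non-Attribute.
          ToyMapGroups(rewritten.map(_.asInstanceOf[Attribute]))
        }
      }

      // Toy foldable propagation: replace attribute "q" with an aliased literal.
      val propagateFoldable: PartialFunction[Expression, Expression] = {
        case Attribute("q") => Alias(Literal(0), "q")
      }

      // With skipObjectOperators set, the rule leaves the operator untouched.
      def optimize(op: ToyMapGroups, skipObjectOperators: Boolean): ToyMapGroups =
        if (skipObjectOperators) op else op.transformExpressions(propagateFoldable)

      def main(args: Array[String]): Unit = {
        val op = ToyMapGroups(Seq(Attribute("x"), Attribute("q")))
        // Applying the rule breaks the Seq[Attribute] contract.
        println(scala.util.Try(optimize(op, skipObjectOperators = false)).isFailure)
        // Skipping the operator keeps it unchanged.
        println(optimize(op, skipObjectOperators = true) == op)
      }
    }
    ```

    In the sketch the rewrite itself is well-typed at the `Expression` level,
    and the mismatch only surfaces when the operator is rebuilt, which is why
    skipping the operator is simpler than checking types inside the rule.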
    
    ## How was this patch tested?
    
    Jenkins tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 flat-mapping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14648
    
----
commit 2e0d7d6d6ccf896e363db68b76030adf9ea9e691
Author: Liang-Chi Hsieh <sim...@tw.ibm.com>
Date:   2016-08-15T15:26:29Z

    Don't do foldablePropagate on object operators.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
