[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

gatorsmile Thu, 19 Jul 2018 16:50:17 -0700

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/21821


    [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWriter

    ## What changes were proposed in this pull request?
    ```Scala
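    import org.apache.spark.sql.functions.{lit, udf}
    import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

    val udf1 = udf((x: Int, y: Int) => x + y)
    val df = spark.range(0, 3).toDF("a")
      .withColumn("b", udf1($"a", udf1($"a", lit(10))))
    df.cache()                 // mark the DataFrame for caching
    df.write.saveAsTable("t")  // the write is expected to use the cache, but does not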
    ```
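    In the example above, the cache is not used because the plan being
    written does not match the cached plan. This is a regression caused by
    the changes we made in AnalysisBarrier: not all Analyzer rules are
    idempotent, so a plan that is analyzed again can differ from the plan
    that was cached. This PR adds an AnalysisBarrier to DataFrameWriter.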
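
    To illustrate why a non-idempotent rule can defeat a plan-equality
    cache lookup, here is a minimal, self-contained sketch. It is a toy
    model only: `Plan`, `Scan`, `Project`, `addAlias`, `addAliasOnce`, and
    `cacheHit` are hypothetical names invented for this example and are not
    Spark's Catalyst classes or Analyzer rules.

    ```Scala
    // Toy model only: hypothetical types, not Spark's Catalyst classes.
    sealed trait Plan
    case class Scan(table: String) extends Plan
    case class Project(exprs: List[String], child: Plan) extends Plan

    object IdempotenceSketch {
      // Non-idempotent rule: every application wraps each expression again.
      def addAlias(plan: Plan): Plan = plan match {
        case Project(exprs, child) =>
          Project(exprs.map(e => s"alias($e)"), addAlias(child))
        case other => other
      }

      // Idempotent variant: expressions that are already wrapped are left alone.
      def addAliasOnce(plan: Plan): Plan = plan match {
        case Project(exprs, child) =>
          val wrapped = exprs.map(e => if (e.startsWith("alias(")) e else s"alias($e)")
          Project(wrapped, addAliasOnce(child))
        case other => other
      }

      // Stand-in for a cache lookup: a hit requires structural equality.
      def cacheHit(cached: Plan, candidate: Plan): Boolean = cached == candidate

      def main(args: Array[String]): Unit = {
        val raw = Project(List("udf(a)"), Scan("t"))

        val cached = addAlias(raw)         // analyzed once, then cached
        val reanalyzed = addAlias(cached)  // the same plan analyzed a second time
        println(cacheHit(cached, reanalyzed))  // false: the plans no longer match

        val cachedOnce = addAliasOnce(raw)
        println(cacheHit(cachedOnce, addAliasOnce(cachedOnce)))  // true
      }
    }
    ```

    In this sketch the second pass through `addAlias` changes the plan, so
    the equality check fails. Shielding an already analyzed plan from a
    second pass avoids that mismatch, which is the role a barrier plays in
    this toy model.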
    
    ## How was this patch tested?
    Added a test.

    While testing, I also found a bug in the DSV1 (Data Source V1) write
    path. It is not a regression, so I opened a separate JIRA for it:
    https://issues.apache.org/jira/browse/SPARK-24869

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark testMaster22

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21821.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21821
    
----
commit 23ec09fc3bbedd2f34c594daf461cebd9c0295a6
Author: Xiao Li <gatorsmile@...>
Date:   2018-07-19T23:38:44Z

    fix

----

