[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21821


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21821#discussion_r204248020
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -891,8 +891,9 @@ object DDLUtils {
    * Throws exception if outputPath tries to overwrite inputpath.
    */
   def verifyNotReadPath(query: LogicalPlan, outputPath: Path) : Unit = {
-    val inputPaths = query.collect {
-      case LogicalRelation(r: HadoopFsRelation, _, _, _) => r.location.rootPaths
+    val inputPaths = EliminateBarriers(query).collect {
--- End diff --

AnalysisBarrier is a leaf node, so a traversal such as `collect` does not 
descend into the plan it wraps. That is one of the reasons why it can easily 
break other code.
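
To illustrate the point, here is a minimal sketch using a toy plan tree. It does not use Spark's classes: `Plan`, `Relation`, `Project`, `Barrier`, `eliminateBarriers` and `inputPaths` are hypothetical stand-ins, and the path `/data/t` is made up. It only shows why a `collect` that follows children finds nothing behind a node that reports no children, and why unwrapping the barriers first makes the relation visible again.

```scala
// Toy model only: none of these names are Spark's real classes.
object BarrierSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    // Pre-order traversal that only follows `children`.
    def collect[B](pf: PartialFunction[Plan, B]): Seq[B] =
      pf.lift(this).toList ++ children.flatMap(_.collect(pf))
  }
  case class Relation(path: String) extends Plan { val children: Seq[Plan] = Nil }
  case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  // The barrier wraps a plan but reports no children, so it behaves as a leaf.
  case class Barrier(wrapped: Plan) extends Plan { val children: Seq[Plan] = Nil }

  // Hypothetical stand-in for barrier elimination: unwrap every barrier.
  def eliminateBarriers(p: Plan): Plan = p match {
    case Barrier(w)  => eliminateBarriers(w)
    case Project(c)  => Project(eliminateBarriers(c))
    case r: Relation => r
  }

  // Collect the paths of all relations reachable through `children`.
  def inputPaths(p: Plan): Seq[String] = p.collect { case Relation(path) => path }

  val query: Plan = Project(Barrier(Project(Relation("/data/t"))))
}
```

In this sketch, `BarrierSketch.inputPaths(BarrierSketch.query)` is empty because the traversal stops at the barrier, while calling it on `BarrierSketch.eliminateBarriers(BarrierSketch.query)` finds the relation.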





[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21821#discussion_r203905148
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -254,7 +254,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       val writer = ws.createWriter(jobId, df.logicalPlan.schema, mode, options)
       if (writer.isPresent) {
         runCommand(df.sparkSession, "save") {
-          WriteToDataSourceV2(writer.get(), df.logicalPlan)
+          WriteToDataSourceV2(writer.get(), df.planWithBarrier)
--- End diff --

This change is not needed but it is safe to have. 





[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-19 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/21821

[SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWriter

## What changes were proposed in this pull request?
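```Scala
  // Assumes a SparkSession named `spark` is in scope.
  import org.apache.spark.sql.functions.{lit, udf}
  import spark.implicits._

  val udf1 = udf({(x: Int, y: Int) => x + y})
  val df = spark.range(0, 3).toDF("a")
    .withColumn("b", udf1($"a", udf1($"a", lit(10))))
  df.cache()
  df.write.saveAsTable("t")
```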
The cache is not used because the plan being written does not match the 
cached plan. This is a regression caused by the changes we made for 
AnalysisBarrier, since not all of the Analyzer rules are idempotent.
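
As a rough illustration of why a non-idempotent rule defeats a cache lookup keyed on the plan, here is a toy sketch. It is not Spark code: `Expr`, `Col`, `Udf`, `NullCheck`, `analyze` and the `cache` map are all hypothetical, and the wrapping rule is invented for the example. Analyzing an already analyzed plan a second time yields a different tree, so an equality-based lookup misses.

```scala
// Toy model only: none of these names are Spark's real classes or rules.
object CacheMissSketch {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Udf(args: Seq[Expr]) extends Expr
  // Hypothetical wrapper that the rule below adds around UDF arguments.
  case class NullCheck(child: Expr) extends Expr

  // Not idempotent: every pass wraps each UDF argument again.
  def analyze(e: Expr): Expr = e match {
    case Udf(args)    => Udf(args.map(a => NullCheck(analyze(a))))
    case NullCheck(c) => NullCheck(analyze(c))
    case c: Col       => c
  }

  // The plan as it looks after one analysis pass; this is what gets cached.
  val analyzedOnce: Expr = analyze(Udf(Seq(Col("a"), Udf(Seq(Col("a"))))))
  val cache: Map[Expr, String] = Map(analyzedOnce -> "cached data")

  // Re-analyzing the already analyzed plan changes it.
  val analyzedTwice: Expr = analyze(analyzedOnce)
}
```

Here `CacheMissSketch.cache.get(CacheMissSketch.analyzedOnce)` hits, while the lookup with `analyzedTwice` misses. Keeping the analyzed plan from being analyzed a second time avoids the mismatch in this toy model.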

## How was this patch tested?
Added a test. 

I also found a bug in the DSV1 write path. Since it is not a regression, I 
opened a separate JIRA: https://issues.apache.org/jira/browse/SPARK-24869

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark testMaster22

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21821.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21821


commit 23ec09fc3bbedd2f34c594daf461cebd9c0295a6
Author: Xiao Li 
Date:   2018-07-19T23:38:44Z

fix



