[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-12 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/22941


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230622708
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       sql("INSERT INTO TABLE test_table SELECT 2, null")
     }
   }
+
+  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
--- End diff --

It works. Do we need to fix this plan issue?


---




[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230609046
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       sql("INSERT INTO TABLE test_table SELECT 2, null")
     }
   }
+
+  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
--- End diff --

You can move this test to CachedTableSuite.scala and use its helper 
functions to verify whether the cache is used. 

See the example. 
```scala
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
val rddId = rddIdOf("test_view")
assert(!isMaterialized(rddId))
sql("INSERT INTO TABLE test_table SELECT * FROM test_view")
assert(isMaterialized(rddId))
```
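
For readers following along: `rddIdOf` and `isMaterialized` are test helpers defined in CachedTableSuite. Roughly, as a sketch from memory rather than a verbatim copy of the Spark source, they look like this:

```scala
// Sketch only: approximate shape of the CachedTableSuite helpers,
// assuming the usual SharedSQLContext test fixture (`spark`, `sparkContext`).
def rddIdOf(tableName: String): Int = {
  val plan = spark.table(tableName).queryExecution.sparkPlan
  plan.collect {
    // The cached RDD id comes from the in-memory relation backing the scan.
    case InMemoryTableScanExec(_, _, relation) =>
      relation.cachedColumnBuffers.id
  }.headOption.getOrElse(fail(s"Table $tableName is not cached\n" + plan))
}

def isMaterialized(rddId: Int): Boolean = {
  // A cached RDD counts as materialized once its first block
  // actually exists in the block manager.
  val blockId = RDDBlockId(rddId, 0)
  val maybeBlock = sparkContext.env.blockManager.get(blockId)
  maybeBlock.foreach(_ => sparkContext.env.blockManager.releaseLock(blockId))
  maybeBlock.nonEmpty
}
```

So the suggested test asserts the cache is lazy (`!isMaterialized` before the insert) and that the INSERT actually triggered materialization of the cached view.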




---




[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230608937
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoDataSourceCommand.scala ---
@@ -30,14 +30,13 @@ import org.apache.spark.sql.sources.InsertableRelation
 case class InsertIntoDataSourceCommand(
     logicalRelation: LogicalRelation,
     query: LogicalPlan,
-    overwrite: Boolean)
-  extends RunnableCommand {
+    overwrite: Boolean,
+    outputColumnNames: Seq[String])
+  extends DataWritingCommand {
 
-  override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
-
-  override def run(sparkSession: SparkSession): Seq[Row] = {
+  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
     val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
-    val data = Dataset.ofRows(sparkSession, query)
--- End diff --

This will use the cached data, although the plan does not show that the 
cached data is used. 


---




[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22941

[SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does not use Cached Data

## What changes were proposed in this pull request?

```scala
spark.sql("""
  CREATE TABLE jdbcTable
  USING org.apache.spark.sql.jdbc
  OPTIONS (
url "jdbc:mysql://localhost:3306/test",
dbtable "test.InsertIntoDataSourceCommand",
user "hive",
password "hive"
  )""")

spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
spark.sql("INSERT INTO TABLE jdbcTable SELECT * FROM test_view").explain
```

Before this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand
   +- InsertIntoDataSourceCommand
         +- Project
            +- SubqueryAlias
               +- Range (0, 2, step=1, splits=Some(8))
```

After this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand InsertIntoDataSourceCommand Relation[id#8L] JDBCRelation(test.InsertIntoDataSourceCommand) [numPartitions=1], false, [id]
+- *(1) InMemoryTableScan [id#0L]
      +- InMemoryRelation [id#0L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Range (0, 2, step=1, splits=8)
```

## How was this patch tested?

Unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25936

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22941


commit 2968b2c34f42f6b0bcb5e373a400377abfd09e86
Author: Yuming Wang 
Date:   2018-11-04T10:36:20Z

Fix InsertIntoDataSourceCommand does not use Cached Data




---
