[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22688


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224341077
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala
 ---
@@ -116,7 +116,6 @@ class SimpleWritableDataSource extends DataSourceV2
   schema: StructType,
   mode: SaveMode,
   options: DataSourceOptions): Optional[BatchWriteSupport] = {
-assert(DataType.equalsStructurally(schema.asNullable, 
this.schema.asNullable))
--- End diff --

Yea .. but it's in test code and just sanity check.. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224337926
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala
 ---
@@ -116,7 +116,6 @@ class SimpleWritableDataSource extends DataSourceV2
   schema: StructType,
   mode: SaveMode,
   options: DataSourceOptions): Optional[BatchWriteSupport] = {
-assert(DataType.equalsStructurally(schema.asNullable, 
this.schema.asNullable))
--- End diff --

For modes other than Append, I think we still need this assert, don't we?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224326923
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select('id as 'i, -'id as 'j)
+  try {
+df.write.format(cls.getName).option("path", 
path).mode("error").save()
+df.write.format(cls.getName).option("path", 
path).mode("overwrite").save()
+df.write.format(cls.getName).option("path", 
path).mode("ignore").save()
+  } catch {
+case e: SchemaReadAttemptException => fail("Schema read was 
attempted.", e)
+  }
--- End diff --

Yup


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224326576
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select('id as 'i, -'id as 'j)
+  try {
+df.write.format(cls.getName).option("path", 
path).mode("error").save()
+df.write.format(cls.getName).option("path", 
path).mode("overwrite").save()
+df.write.format(cls.getName).option("path", 
path).mode("ignore").save()
+  } catch {
+case e: SchemaReadAttemptException => fail("Schema read was 
attempted.", e)
+  }
--- End diff --

To validate new code path [line 
250](https://github.com/apache/spark/pull/22688/files#diff-94fbd986b04087223f53697d4b6cab24R250),
 could you add `intercept[SchemaReadAttemptException]` and do `append`, too?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224316297
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select('id as 'i, -'id as 'j)
--- End diff --

The write path looks requiring two columns:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224316130
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select($"id", $"id")
--- End diff --

The write path looks requiring two columns:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224153758
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -190,12 +190,13 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
 
   test("simple writable data source") {
 // TODO: java implementation.
+val writeOnlySource = classOf[SimpleWriteOnlyDataSource]
--- End diff --

 can we create a new test case?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22688

[SPARK-25700][SQL] Creates ReadSupport in only Append Mode in Data Source 
V2 write path

## What changes were proposed in this pull request?

Alternative for https://github.com/apache/spark/pull/22686: In other save 
modes, we don't need to make a readsupport and read schema.

## How was this patch tested?

Unit test and manual tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark append-revert-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22688.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22688


commit a445e163675eb330eade07f1cbdd88b99caab117
Author: hyukjinkwon 
Date:   2018-10-10T10:38:55Z

Creates ReadSupport in only Append Mode in Data Source V2 write path




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org