[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20705


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r174637563
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
 .doc("The default data source to use in input/output.")
 .stringConf
-.createWithDefault("parquet")
+.createWithDefault("orc")
--- End diff --

Yep. It's back now, @gatorsmile.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r174019305
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
 .doc("The default data source to use in input/output.")
 .stringConf
-.createWithDefault("parquet")
+.createWithDefault("orc")
--- End diff --

Can you change it back?


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r173358166
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -852,52 +846,52 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
   (from to to).map(i => i -> s"str$i").toDF("c1", "c2")
 }
 
-withTable("insertParquet") {
-  createDF(0, 9).write.format("parquet").saveAsTable("insertParquet")
+withTable("t") {
+  createDF(0, 9).write.saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
 (6 to 9).map(i => Row(i, s"str$i")))
 
   intercept[AnalysisException] {
-createDF(10, 19).write.format("parquet").saveAsTable("insertParquet")
+createDF(10, 19).write.saveAsTable("t")
   }
 
-  createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+  createDF(10, 19).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
 (6 to 19).map(i => Row(i, s"str$i")))
 
-  createDF(20, 29).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+  createDF(20, 29).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 25"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 25"),
 (6 to 24).map(i => Row(i, s"str$i")))
 
   intercept[AnalysisException] {
-createDF(30, 39).write.saveAsTable("insertParquet")
+createDF(30, 39).write.saveAsTable("t")
   }
 
-  createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("insertParquet")
+  createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 35"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 35"),
 (6 to 34).map(i => Row(i, s"str$i")))
 
-  createDF(40, 49).write.mode(SaveMode.Append).insertInto("insertParquet")
+  createDF(40, 49).write.mode(SaveMode.Append).insertInto("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 45"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 45"),
 (6 to 44).map(i => Row(i, s"str$i")))
 
-  createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("insertParquet")
+  createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 51 AND p.c1 < 55"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 51 AND p.c1 < 55"),
 (52 to 54).map(i => Row(i, s"str$i")))
-  createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("insertParquet")
+  createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p"),
+sql("SELECT p.c1, c2 FROM t p"),
 (50 to 59).map(i => Row(i, s"str$i")))
 
-  createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("insertParquet")
+  createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p"),
+sql("SELECT p.c1, c2 FROM t p"),
 (70 to 79).map(i => Row(i, s"str$i")))
--- End diff --

That is because this PR minimally changed only the test cases that were causing failures. We cannot generalize all test cases across all modules in one huge one-shot PR; that would make it difficult to backport other commits. The main goal of this PR is to improve testability for new data sources.

For example, `SPARK-8156:create table to specific database by 'use dbname'` writes to parquet, but reads back with SQL, not via `read.parquet`. So, it doesn't fail.
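
A minimal sketch of that pattern (hypothetical table name; assumes the suite's `withTable`, `sql`, and `checkAnswer` helpers are in scope): a test that writes with an explicit format but reads back only through SQL does not depend on `spark.sql.sources.default`, so changing the default does not break it.

```scala
// Hedged sketch, not the actual SPARK-8156 test: write with a pinned format,
// read back via SQL only, so the default data source setting is irrelevant.
withTable("fixed_format_tbl") {
  spark.range(3).write.format("parquet").saveAsTable("fixed_format_tbl")
  checkAnswer(sql("SELECT count(*) FROM fixed_format_tbl"), Row(3L))
}
```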


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-08 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r173331220
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -591,7 +591,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
   }
 
   test("Pre insert nullability check (ArrayType)") {
-withTable("arrayInParquet") {
+withTable("array") {
--- End diff --

It would be good, maybe in a future cleanup, to replace all these repeated string literals (e.g., "array" 7 times, "map" 7 times) with a variable name.
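
A small sketch of what that cleanup might look like (hypothetical variable and table contents; not part of the PR):

```scala
// Hedged sketch of the suggested cleanup: bind the table name once instead of
// repeating the string literal throughout the test body.
val tbl = "array"
withTable(tbl) {
  spark.range(1).selectExpr("array(id) AS a").write.saveAsTable(tbl)
  checkAnswer(sql(s"SELECT size(a) FROM $tbl"), Row(1))
}
```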


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-08 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r173332327
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -852,52 +846,52 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
   (from to to).map(i => i -> s"str$i").toDF("c1", "c2")
 }
 
-withTable("insertParquet") {
-  createDF(0, 9).write.format("parquet").saveAsTable("insertParquet")
+withTable("t") {
+  createDF(0, 9).write.saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
 (6 to 9).map(i => Row(i, s"str$i")))
 
   intercept[AnalysisException] {
-createDF(10, 19).write.format("parquet").saveAsTable("insertParquet")
+createDF(10, 19).write.saveAsTable("t")
   }
 
-  createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+  createDF(10, 19).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
 (6 to 19).map(i => Row(i, s"str$i")))
 
-  createDF(20, 29).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+  createDF(20, 29).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 25"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 25"),
 (6 to 24).map(i => Row(i, s"str$i")))
 
   intercept[AnalysisException] {
-createDF(30, 39).write.saveAsTable("insertParquet")
+createDF(30, 39).write.saveAsTable("t")
   }
 
-  createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("insertParquet")
+  createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 35"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 35"),
 (6 to 34).map(i => Row(i, s"str$i")))
 
-  createDF(40, 49).write.mode(SaveMode.Append).insertInto("insertParquet")
+  createDF(40, 49).write.mode(SaveMode.Append).insertInto("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 45"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 45"),
 (6 to 44).map(i => Row(i, s"str$i")))
 
-  createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("insertParquet")
+  createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 51 AND p.c1 < 55"),
+sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 51 AND p.c1 < 55"),
 (52 to 54).map(i => Row(i, s"str$i")))
-  createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("insertParquet")
+  createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p"),
+sql("SELECT p.c1, c2 FROM t p"),
 (50 to 59).map(i => Row(i, s"str$i")))
 
-  createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("insertParquet")
+  createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("t")
   checkAnswer(
-sql("SELECT p.c1, c2 FROM insertParquet p"),
+sql("SELECT p.c1, c2 FROM t p"),
 (70 to 79).map(i => Row(i, s"str$i")))
--- End diff --

Curious about why the test named "SPARK-8156:create table to specific database by 'use dbname'" still has a hard-coded format of parquet. Is it testing functionality that is orthogonal to the format, maybe?

I changed the hard-coded format to json, orc, and csv, and each time the test passed.

Similarly with
  Suite: org.apache.spark.sql.sources.SaveLoadSuite
  Test: SPARK-23459: Improve error message when specified unknown column in partition columns
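
A hedged sketch of that kind of quick check (hypothetical table name; not part of the PR): swap the format and re-run a write/read round trip to see whether the test logic is actually format-sensitive.

```scala
// If the round trip passes for several formats, the test is likely orthogonal
// to the storage format and may not need a hard-coded provider.
Seq("json", "orc", "csv").foreach { fmt =>
  withTable("fmt_check") {
    spark.range(3).selectExpr("id", "CAST(id AS STRING) AS s")
      .write.format(fmt).saveAsTable("fmt_check")
    checkAnswer(sql("SELECT count(*) FROM fmt_check"), Row(3L))
  }
}
```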


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r173281238
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2476,7 +2477,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 withTempDir { dir =>
   val parquetDir = new File(dir, "parquet").getCanonicalPath
   spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
--- End diff --

Thank you for the review, @bersprockets.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-08 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r173275865
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2476,7 +2477,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 withTempDir { dir =>
   val parquetDir = new File(dir, "parquet").getCanonicalPath
   spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)

Since the data format may not be parquet, maybe the directory name should 
be more generic, like dataDir.
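
A minimal sketch of the suggested rename (directory name is illustrative; assumes the suite's `withTempDir` and `spark.implicits._` are in scope, as in the original test):

```scala
import java.io.File

withTempDir { dir =>
  // A format-neutral directory name, since the write below follows whatever
  // spark.sql.sources.default happens to be.
  val dataDir = new File(dir, "data").getCanonicalPath
  spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(dataDir)
}
```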


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172024213
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

So far, the purpose of this PR is **setting the default once in `SQLConf.scala`** in order to test a new data source and find its limitations, instead of **touching every data suite**.

BTW, `spark.sql.sources.default=parquet` doesn't help this existing code because the SQL has a fixed string `USING parquet`.
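
For reference, the shape of the change in the diff above is roughly the following (a sketch; the DDL now follows whatever `spark.sql.sources.default` is set to):

```scala
// Interpolate the configured default data source into the DDL instead of
// hard-coding "USING parquet".
val provider = spark.sessionState.conf.defaultDataSourceName
withTable("tbl") {
  sql(s"CREATE TABLE tbl(i INT, j STRING) USING $provider")
  sql("INSERT INTO tbl VALUES (1, 'a')")
  checkAnswer(sql("SELECT i, j FROM tbl"), Row(1, "a"))
}
```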


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172024043
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

This is `SQLQuerySuite`. The test case is correctly testing its **purpose**. Every data source has its own capabilities and limitations. Your example is only the `text` data source's limitation of supporting a single-column schema, isn't it? The other sources (`csv`/`json`/`orc`/`parquet`) will pass this specific test.
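
To make that limitation concrete, a hedged illustration (hypothetical output paths; not part of the PR): the `text` source accepts only a single string column, so a two-column write fails while the other built-in sources succeed.

```scala
// Hedged illustration of the text data source limitation mentioned above.
val df = spark.range(3).selectExpr("CAST(id AS INT) AS i", "CAST(id AS STRING) AS j")

// parquet/orc/json/csv can all persist a two-column schema.
Seq("parquet", "orc", "json", "csv").foreach { fmt =>
  df.write.format(fmt).mode("overwrite").save(s"/tmp/two_cols_$fmt")
}

// text cannot: it supports only a single string column, so this write is expected to fail.
intercept[AnalysisException] {
  df.write.format("text").mode("overwrite").save("/tmp/two_cols_text")
}
```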


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172023909
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
 :param options: all other string options
 
+>>> spark.conf.set("spark.sql.sources.default", "parquet")
--- End diff --

Yep. That was my first commit [here](https://github.com/apache/spark/pull/20705#discussion-diff-171729865R150). I'll roll this back.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172022980
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

Hm .. how about just explicitly setting `spark.sql.sources.default` to `parquet` in all such places rather than relying on the default? If it's set to, for example, `text`, this test fails. I think it's a bit odd that a test depends on a default value.
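
A sketch of that alternative (assuming the usual `SQLTestUtils.withSQLConf` helper): pin the default explicitly around the test body so it no longer depends on the build-time default.

```scala
// Pin spark.sql.sources.default for this test instead of relying on whatever
// the global default happens to be.
withSQLConf(SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> "parquet") {
  withTable("tbl") {
    spark.range(3).write.saveAsTable("tbl")  // uses the pinned default (parquet)
    checkAnswer(sql("SELECT count(*) FROM tbl"), Row(3L))
  }
}
```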


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172022384
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
 :param options: all other string options
 
+>>> spark.conf.set("spark.sql.sources.default", "parquet")
--- End diff --

Can we just call `format('parquet')` like the doctest for JSON below?


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172007674
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
@@ -57,6 +57,16 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
   val timeZone = TimeZone.getDefault()
   val timeZoneId = timeZone.getID
 
+  protected override def beforeAll(): Unit = {
+super.beforeAll()
+spark.conf.set(SQLConf.DEFAULT_DATA_SOURCE_NAME.key, "parquet")
--- End diff --

Since this is `ParquetPartitionDiscoverySuite`, the test cases' assumption 
is legitimate.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172007654
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
 :param options: all other string options
 
+>>> spark.conf.set("spark.sql.sources.default", "parquet")
 >>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True,
--- End diff --

The built-in test data is `parquet`.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r171729865
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,8 +147,8 @@ def load(self, path=None, format=None, schema=None, **options):
or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
 :param options: all other string options
 
->>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True,
-... opt2=1, opt3='str')
+>>> df = spark.read.format("parquet").load('python/test_support/sql/parquet_partitioned',
+... opt1=True, opt2=1, opt3='str')
--- End diff --

Unlike the other changes, this one differs slightly from the original semantics.
As another approach, we can keep the original `spark.read.load` and instead add the following:
```python
spark.conf.set("spark.sql.sources.default", "parquet")
```


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r171668693
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -516,24 +516,19 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   test("CTAS with default fileformat") {
 val table = "ctas1"
 val ctas = s"CREATE TABLE IF NOT EXISTS $table SELECT key k, value FROM src"
-withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
-  withSQLConf("hive.default.fileformat" -> "textfile") {
+Seq("orc", "parquet").foreach { dataSourceFormat =>
+  withSQLConf(
+SQLConf.CONVERT_CTAS.key -> "true",
+SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> dataSourceFormat,
+"hive.default.fileformat" -> "textfile") {
 withTable(table) {
   sql(ctas)
-  // We should use parquet here as that is the default datasource fileformat. The default
-  // datasource file format is controlled by `spark.sql.sources.default` configuration.
+  // The default datasource file format is controlled by `spark.sql.sources.default`.
   // This testcase verifies that setting `hive.default.fileformat` has no impact on
   // the target table's fileformat in case of CTAS.
-  assert(sessionState.conf.defaultDataSourceName === "parquet")
-  checkRelation(tableName = table, isDataSourceTable = true, format = "parquet")
+  checkRelation(tableName = table, isDataSourceTable = true, format = dataSourceFormat)
 }
   }
--- End diff --

Previously, the combination of `spark.sql.sources.default=orc` with `hive.default.fileformat=textfile` was not tested properly.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r171668051
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
@@ -739,15 +739,15 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
 withTempPath { dir =>
   df.write.format("parquet").partitionBy(partitionColumns.map(_.name): _*).save(dir.toString)
   val fields = schema.map(f => Column(f.name).cast(f.dataType))
-  checkAnswer(spark.read.load(dir.toString).select(fields: _*), row)
+  checkAnswer(spark.read.parquet(dir.toString).select(fields: _*), row)
--- End diff --

Since this is `ParquetPartitionDiscoverySuite`, `spark.read.parquet` is more appropriate here than the generic `spark.read.load`.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r171657406
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
 .doc("The default data source to use in input/output.")
 .stringConf
-.createWithDefault("parquet")
+.createWithDefault("orc")
--- End diff --

This is for testing purposes during review only.


---




[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20705

[SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

## What changes were proposed in this pull request?

Currently, some tests assume that `spark.sql.sources.default=parquet`. Today that assumption is correct, but it makes it difficult to test a new data source format. This PR makes the test suites more robust and makes it easy to test new data sources. As an example, the PR uses `spark.sql.sources.default=orc` during review. This PR also aims to test the new native ORC data source with the full existing Apache Spark test coverage.
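
As a rough illustration of that reviewing approach (a sketch using the existing `withSQLConf` test helper; the PR itself flips the default in `SQLConf.scala` only temporarily during review), a default-following write then exercises the configured source:

```scala
// Point the default at ORC so that writes which follow the default exercise
// the new data source instead of parquet.
withSQLConf(SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> "orc") {
  withTable("t") {
    spark.range(5).write.saveAsTable("t")  // written with the configured default (orc)
    checkAnswer(sql("SELECT count(*) FROM t"), Row(5L))
  }
}
```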

## How was this patch tested?

Pass the Jenkins with updated tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23553

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20705


commit 427b6f01a34938487f162ddf0a38d9217bfb4ece
Author: Dongjoon Hyun 
Date:   2018-01-11T05:01:21Z

[SPARK-23553][TESTS] Tests should not assume the default value of 
`spark.sql.sources.default`




---
