[spark] branch branch-3.0 updated: [SPARK-32810][SQL][TESTS][FOLLOWUP][3.0] Check path globbing in JSON/CSV datasources v1 and v2
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 837843b  [SPARK-32810][SQL][TESTS][FOLLOWUP][3.0] Check path globbing in JSON/CSV datasources v1 and v2
837843b is described below

commit 837843bea40eec842c782e3c719f8d81024d8a06
Author: Max Gekk
AuthorDate: Wed Sep 9 21:16:16 2020 +0900

    [SPARK-32810][SQL][TESTS][FOLLOWUP][3.0] Check path globbing in JSON/CSV datasources v1 and v2

    ### What changes were proposed in this pull request?
    In the PR, I propose to move the test `SPARK-32810: CSV and JSON data sources should be able
    to read files with escaped glob metacharacter in the paths` from `DataFrameReaderWriterSuite`
    to `CSVSuite` and to `JsonSuite`. This will allow running the same test in
    `CSVv1Suite`/`CSVv2Suite` and in `JsonV1Suite`/`JsonV2Suite`.

    ### Why are the changes needed?
    To improve test coverage by checking the JSON/CSV datasources v1 and v2.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    By running the affected test suites:
    ```
    $ build/sbt "sql/test:testOnly org.apache.spark.sql.execution.datasources.csv.*"
    $ build/sbt "sql/test:testOnly org.apache.spark.sql.execution.datasources.json.*"
    ```

    Closes #29690 from MaxGekk/globbing-paths-when-inferring-schema-dsv2-3.0.

    Authored-by: Max Gekk
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/datasources/csv/CSVSuite.scala   | 13 +++++++++++++
 .../sql/execution/datasources/json/JsonSuite.scala | 13 +++++++++++++
 .../sql/test/DataFrameReaderWriterSuite.scala      | 23 -----------------------
 3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index 9ba2cab..4e93ea3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -2365,6 +2365,19 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa
       assert(df.count() == 3)
     }
   }
+
+  test("SPARK-32810: CSV data source should be able to read files with " +
+    "escaped glob metacharacter in the paths") {
+    withTempDir { dir =>
+      val basePath = dir.getCanonicalPath
+      // test CSV writer / reader without specifying schema
+      val csvTableName = "[abc]"
+      spark.range(3).coalesce(1).write.csv(s"$basePath/$csvTableName")
+      val readback = spark.read
+        .csv(s"$basePath/${"""(\[|\]|\{|\})""".r.replaceAllIn(csvTableName, """\\$1""")}")
+      assert(readback.collect sameElements Array(Row("0"), Row("1"), Row("2")))
+    }
+  }
 }

 class CSVv1Suite extends CSVSuite {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
index 38b5e77..8eb5432 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -2706,6 +2706,19 @@ abstract class JsonSuite extends QueryTest with SharedSparkSession with TestJson
       checkAnswer(json, Row(null))
     }
   }
+
+  test("SPARK-32810: JSON data source should be able to read files with " +
+    "escaped glob metacharacter in the paths") {
+    withTempDir { dir =>
+      val basePath = dir.getCanonicalPath
+      // test JSON writer / reader without specifying schema
+      val jsonTableName = "{def}"
+      spark.range(3).coalesce(1).write.json(s"$basePath/$jsonTableName")
+      val readback = spark.read
+        .json(s"$basePath/${"""(\[|\]|\{|\})""".r.replaceAllIn(jsonTableName, """\\$1""")}")
+      assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
+    }
+  }
 }

 class JsonV1Suite extends JsonSuite {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index f48f445..c7ca012 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -1105,27 +1105,4 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSparkSession with
       }
     }
   }
-
-  test("SPARK-32810: CSV and JSON data sources should be able to read files with " +
-    "escaped glob metacharacter in the paths") {
-    def escape(str: String): String = {
-      """(\[|\]|\{|\})""".r.replaceAllIn(str,
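The core of the moved tests is the path-escaping idiom visible in the diff: glob metacharacters such as `[`, `]`, `{`, `}` are interpreted by Hadoop path globbing when Spark resolves input paths, so a directory literally named `[abc]` or `{def}` must be escaped before it is read back. Below is a minimal, self-contained Scala sketch of that idiom, assuming a local SparkSession and a temporary directory; the object name, app name, and the `escapeGlobs` helper are illustrative and not part of the commit.

```scala
import org.apache.spark.sql.SparkSession

object GlobEscapeExample {
  // Prefix each glob metacharacter [, ], {, } with a backslash,
  // mirroring the regex used in the CSVSuite/JsonSuite tests above.
  def escapeGlobs(path: String): String =
    """(\[|\]|\{|\})""".r.replaceAllIn(path, """\\$1""")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local session for illustration only
      .appName("glob-escape-sketch")
      .getOrCreate()

    val basePath = java.nio.file.Files.createTempDirectory("spark-glob").toString
    val tableName = "[abc]"          // directory name that looks like a glob pattern

    // Write three rows into the glob-looking directory ...
    spark.range(3).coalesce(1).write.csv(s"$basePath/$tableName")
    // ... then read it back through an escaped path so the name is treated literally.
    val readback = spark.read.csv(s"$basePath/${escapeGlobs(tableName)}")
    readback.show()                  // expected rows: "0", "1", "2"

    spark.stop()
  }
}
```

Without the escaping, a path ending in `[abc]` would be treated as a character-class glob matching single-character directories named `a`, `b`, or `c`, so the read would not find the directory that was just written.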