[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
cloud-fan commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466165358 ## File path: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ## @@ -826,27 +826,6 @@ class FileBasedDataSourceSuite extends QueryTest } } - test("File table location should include both values of option `path` and `paths`") { Review comment: It's for file source v2 only, I think it's OK to drop it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
cloud-fan commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466165046 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -229,7 +229,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { */ def load(path: String): DataFrame = { // force invocation of `load(...varargs...)` -option("path", path).load(Seq.empty: _*) Review comment: To keep the behavior, I think we should still generate the `path` option when only one path is given in the parameters. We need to update L257. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
cloud-fan commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466164524 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ## @@ -224,17 +224,6 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSparkSession with assert(LastOptions.parameters("opt3") == "3") } - test("SPARK-32364: path argument of load function should override all existing options") { Review comment: We can just call `load` without parameters, this test is to verify that the later option overwrites the earlier ones. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
cloud-fan commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r464203078 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { "read files of Hive data source directly.") } +val updatedPaths = if (paths.length == 1) { Review comment: If we are worried about silent result changing, we can fail if there are `path` option and `load` is called with path parameters. The error message should ask users to either remove the `path` options, we put it into the `load` parameters. ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { "read files of Hive data source directly.") } +val updatedPaths = if (paths.length == 1) { Review comment: If we are worried about silent result changing, we can fail if there are `path` option and `load` is called with path parameters. The error message should ask users to either remove the `path` options, or put it into the `load` parameters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
cloud-fan commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r464202756 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { "read files of Hive data source directly.") } +val updatedPaths = if (paths.length == 1) { Review comment: I think the most intuitive behavior is to drop the `path` option if `load` is called with path parameters, no matter it's one or more paths. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org