[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466634683 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -229,7 +229,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { */ def load(path: String): DataFrame = { // force invocation of `load(...varargs...)` -option("path", path).load(Seq.empty: _*) Review comment: I pushed a new commit that doesn't change the existing code except for the check. Please let me know what you think. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466634683 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -229,7 +229,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { */ def load(path: String): DataFrame = { // force invocation of `load(...varargs...)` -option("path", path).load(Seq.empty: _*) Review comment: I pushed a new commit that doesn't change existing code except for the check. Please let me know what you think. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466623120 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ## @@ -224,17 +224,6 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSparkSession with assert(LastOptions.parameters("opt3") == "3") } - test("SPARK-32364: path argument of load function should override all existing options") { Review comment: OK. I put this back and renamed the test. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466622862 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -229,7 +229,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { */ def load(path: String): DataFrame = { // force invocation of `load(...varargs...)` -option("path", path).load(Seq.empty: _*) Review comment: If we update L257, it will only handle the case when `DataSource.lookupDataSourceV2` returns `Some`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466125984 ## File path: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ## @@ -826,27 +826,6 @@ class FileBasedDataSourceSuite extends QueryTest } } - test("File table location should include both values of option `path` and `paths`") { Review comment: @cloud-fan This is a test that's failing. Looks like this was a supported scenario and is it OK to deprecate this behavior? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r466122116 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ## @@ -224,17 +224,6 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSparkSession with assert(LastOptions.parameters("opt3") == "3") } - test("SPARK-32364: path argument of load function should override all existing options") { Review comment: This is no longer relevant, so I am removing it. But I am keeping the case insensitivity check in the new test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs
imback82 commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r464203736 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { "read files of Hive data source directly.") } +val updatedPaths = if (paths.length == 1) { Review comment: +1 for your suggestion This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org