Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20933#discussion_r178317653

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -187,6 +189,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
             "read files of Hive data source directly.")
         }

    +    // SPARK-23817 Since datasource V2 didn't support reading multiple files yet,
    +    // ORC V2 is only used when loading single file path.
    +    val allPaths = CaseInsensitiveMap(extraOptions.toMap).get("path") ++ paths
    +    val orcV2 = OrcDataSourceV2.satisfy(sparkSession, source, allPaths.toSeq)
    +    if (orcV2.isDefined) {
    +      option("path", allPaths.head)
    +      source = orcV2.get
    +    }
    --- End diff --

    What about bucketed reads? Will they need a similar change here, or is that lack of support handled elsewhere? (Or am I misunderstanding something about that part of the description - I'm not super familiar with the ORC source)
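For context on the diff above: `allPaths` concatenates the optional `"path"` entry from the reader options with the explicit `paths` passed to `load(...)`, relying on Scala's `Option ++ Seq` semantics. A minimal illustrative sketch of that behavior (the literal paths below are made up for the example; only `Option ++ Seq` itself comes from the diff):

```scala
// Illustrative only: how the diff's `Option ++ Seq` collection behaves.
val pathOption: Option[String] = Some("/data/a.orc") // the "path" reader option, if set
val paths: Seq[String] = Seq("/data/b.orc")          // paths passed to load(...)

// Option is iterable, so ++ yields the option's value (if any)
// followed by the explicit paths, in that order.
val allPaths = pathOption ++ paths
// allPaths.toSeq == Seq("/data/a.orc", "/data/b.orc")
// allPaths.head is therefore the "path" option's value when it is set.
```

This ordering is why the diff can use `allPaths.head` as the single path handed to the V2 source when exactly one path is in play.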