GitHub user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53545637

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // Here we try to detect the data source by file extension when `format()` was not called.
    +    // The auto-detection is based on the given paths and recognizes glob patterns as well,
    +    // but it does not recursively check sub-paths even if the given paths are directories.
    +    // The detection proceeds in the following steps:
    +    //
    +    // 1. Check `provider` and use it if it is not `null`.
    +    // 2. If `provider` is not given, try to detect the source type by extension;
    +    //    at this point, detection succeeds only if all the given paths share the same extension.
    +    // 3. If detection fails, use the data source configured by `spark.sql.sources.default`.
    +    //
    +    val paths = {
    --- End diff --

    Note that I'd move this detection code into a separate class, so we can unit test it.
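The three detection steps described in the quoted comment could be sketched roughly as below. This is a minimal, hypothetical illustration of the idea (names such as `SourceDetection`, `resolveProvider`, and the extension table are invented for this sketch and are not Spark's actual API):

```scala
// Hypothetical sketch of extension-based data source detection.
// Assumption: all names here are illustrative, not part of Spark.
object SourceDetection {

  // Illustrative subset of known file extensions mapped to source short names.
  private val extensionToSource: Map[String, String] = Map(
    "parquet" -> "parquet",
    "json"    -> "json",
    "csv"     -> "csv",
    "orc"     -> "orc")

  // Extract the extension of the last path segment, so a glob like
  // "dir/*.json" is also recognized by its extension.
  private def extensionOf(path: String): Option[String] = {
    val fileName = path.split('/').last
    val dotIdx = fileName.lastIndexOf('.')
    if (dotIdx > 0) Some(fileName.substring(dotIdx + 1)) else None
  }

  /**
   * Mirrors the steps in the review comment:
   * 1. Use `provider` if it was given (i.e. `format()` was called).
   * 2. Otherwise detect by extension, but only when every path agrees.
   * 3. Otherwise fall back to the configured default source.
   */
  def resolveProvider(
      provider: Option[String],
      paths: Seq[String],
      defaultSource: String): String = {
    provider.getOrElse {
      val distinctExts = paths.flatMap(extensionOf).distinct
      distinctExts match {
        case Seq(ext) => extensionToSource.getOrElse(ext, defaultSource)
        case _        => defaultSource // no paths, no extension, or mixed extensions
      }
    }
  }
}
```

Factoring the logic into a small object like this is also what the reviewer suggests: with no `SparkSession` or filesystem dependency, each step can be unit-tested in isolation.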