GitHub user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53545637
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // Here, it tries to find the data source by file extension if `format()` is not called.
    +    // The auto-detection is based on the given paths; it recognizes glob patterns as well,
    +    // but it does not recursively check sub-paths even if the given paths are directories.
    +    // The source detection goes through the following steps:
    +    //
    +    //   1. Check `provider` and use it if it is not `null`.
    +    //   2. If `provider` is not given, try to detect the source type by extension.
    +    //      At this point, it detects only if all the given paths have the same extension.
    +    //   3. If it fails to detect, use the data source set by `spark.sql.sources.default`.
    +    //
    +    val paths = {
    --- End diff --
    
    Note that I'd move this detection code into a separate class, so we can unit test it.
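
    As a rough illustration of what that separate class could look like, here is a minimal Scala sketch. Everything in it is hypothetical rather than Spark's actual API: the `SourceDetector` name, the extension map, and the `detect` signature are invented for the example, and glob/directory resolution is ignored.

        object SourceDetector {

          // Known extension -> source name mapping; hypothetical and easily extended.
          private val extensionToSource = Map(
            "json" -> "json",
            "csv" -> "csv",
            "parquet" -> "parquet",
            "orc" -> "orc")

          // Extension of a path's final segment, if any (glob resolution is ignored here).
          private def extensionOf(path: String): Option[String] = {
            val name = path.split('/').last
            val dot = name.lastIndexOf('.')
            if (dot >= 0 && dot < name.length - 1) Some(name.substring(dot + 1)) else None
          }

          // 1. Use `provider` when it is given.
          // 2. Otherwise detect by extension, but only when every path agrees on one.
          // 3. Otherwise fall back to the configured default source.
          def detect(
              provider: Option[String],
              paths: Seq[String],
              defaultSource: String): String = {
            provider.getOrElse {
              paths.flatMap(extensionOf).distinct match {
                case Seq(ext) => extensionToSource.getOrElse(ext, defaultSource)
                case _ => defaultSource
              }
            }
          }
        }

    A standalone object like this is trivial to unit test: for example, `detect(None, Seq("a.json", "b.json"), "parquet")` returns `"json"`, while mixed or unknown extensions fall back to the default, matching steps 2 and 3 above.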


