[ https://issues.apache.org/jira/browse/SPARK-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu resolved SPARK-10185.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.6.0

Issue resolved by pull request 8416
[https://github.com/apache/spark/pull/8416]

> Spark SQL does not handle comma-separated paths on Hadoop FileSystem
> --------------------------------------------------------------------
>
>                 Key: SPARK-10185
>                 URL: https://issues.apache.org/jira/browse/SPARK-10185
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: koert kuipers
>             Fix For: 1.6.0
>
>
> Spark SQL uses a Map[String, String] for data source settings. As a
> consequence, the only way to pass in multiple paths (something that Hadoop
> file input formats support) is to pass in a comma-separated list. For
> example:
> sqlContext.read.format("json").load("dir1,dir2")
> or
> sqlContext.read.format("json").option("path", "dir1,dir2").load()
> However, in this case ResolvedDataSource does not handle the comma-delimited
> paths correctly for a HadoopFsRelationProvider. It treats the multiple
> comma-delimited paths as a single path.
> For example, if I pass in "dir1,dir2" for the path, it will make dir1
> qualified but ignore dir2 (presumably because it simply treats dir2 as part
> of dir1). If globs are involved, it always returns an empty array of paths
> (because a glob with a comma in it doesn't match anything).
> I think it's important to handle commas as a way to pass in multiple paths,
> since the framework does not provide an alternative. In some cases, like
> Parquet, the code simply bypasses ResolvedDataSource to support multiple
> paths, but to me this is a workaround that should be discouraged.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
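The gist of the bug above is that the single `path` string must be split on commas before each entry is qualified, rather than being handed to the filesystem as one path. A minimal standalone sketch of that splitting step (plain Java, not the actual Spark code; the class and method names here are hypothetical illustrations, not Spark APIs):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: split a comma-separated path option such as
// "dir1,dir2" into individual paths before qualifying each one,
// instead of treating the whole string as a single path.
public class CommaSeparatedPaths {

    // Returns the individual path entries, trimming whitespace and
    // dropping empty segments.
    static List<String> splitPaths(String pathOption) {
        return Arrays.stream(pathOption.split(","))
                     .map(String::trim)
                     .filter(s -> !s.isEmpty())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each entry would then be qualified separately, so "dir2" is
        // no longer swallowed as a suffix of "dir1".
        System.out.println(splitPaths("dir1,dir2")); // [dir1, dir2]
        System.out.println(splitPaths("dir1"));      // [dir1]
    }
}
```

Splitting first also restores glob handling, since each glob pattern is then matched on its own rather than as one pattern containing a literal comma.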