[ https://issues.apache.org/jira/browse/SPARK-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu resolved SPARK-10185.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.6.0

Issue resolved by pull request 8416
[https://github.com/apache/spark/pull/8416]

> Spark SQL does not handle comma-separated paths on Hadoop FileSystem
> --------------------------------------------------------------------
>
>                 Key: SPARK-10185
>                 URL: https://issues.apache.org/jira/browse/SPARK-10185
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: koert kuipers
>             Fix For: 1.6.0
>
>
> Spark SQL uses a Map[String, String] for data source settings. As a
> consequence, the only way to pass in multiple paths (something that Hadoop
> file input formats support) is to pass in a comma-separated list. For
> example:
> sqlContext.read.format("json").load("dir1,dir2")
> or
> sqlContext.read.format("json").option("path", "dir1,dir2").load()
> However, in this case ResolvedDataSource does not handle the comma-delimited
> paths correctly for a HadoopFsRelationProvider. It treats the multiple
> comma-delimited paths as a single path.
> For example, if I pass in "dir1,dir2" for the path, it will make dir1
> qualified but ignore dir2 (presumably because it simply treats dir2 as part
> of dir1). If globs are involved, it always returns an empty array of paths
> (because a glob with a comma in it doesn't match anything).
> I think it's important to handle commas as a way to pass in multiple paths,
> since the framework does not provide an alternative. In some cases, like
> Parquet, the code simply bypasses ResolvedDataSource to support multiple
> paths, but to me this is a workaround that should be discouraged.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
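The gist of the bug above is that the single `path` string must be split on commas before each entry is qualified, rather than being handed to the filesystem as one path. A minimal standalone sketch of that splitting step (plain Java, not the actual Spark code; the class and method names here are hypothetical illustrations, not Spark APIs):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: split a comma-separated path option such as
// "dir1,dir2" into individual paths before qualifying each one,
// instead of treating the whole string as a single path.
public class CommaSeparatedPaths {

    // Returns the individual path entries, trimming whitespace and
    // dropping empty segments.
    static List<String> splitPaths(String pathOption) {
        return Arrays.stream(pathOption.split(","))
                     .map(String::trim)
                     .filter(s -> !s.isEmpty())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each entry would then be qualified separately, so "dir2" is
        // no longer swallowed as a suffix of "dir1".
        System.out.println(splitPaths("dir1,dir2")); // [dir1, dir2]
        System.out.println(splitPaths("dir1"));      // [dir1]
    }
}
```

Splitting first also restores glob handling, since each glob pattern is then matched on its own rather than as one pattern containing a literal comma.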