[ https://issues.apache.org/jira/browse/SPARK-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianshi Huang resolved SPARK-6533.
----------------------------------
    Resolution: Won't Fix

Don't use sqlc.parquetFile(...), use sqlc.load(..., "parquet") instead, or the latest Reader/Writer API.

Jianshi

> Allow using wildcard and other file pattern in Parquet DataSource
> -----------------------------------------------------------------
>
>                 Key: SPARK-6533
>                 URL: https://issues.apache.org/jira/browse/SPARK-6533
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1
>            Reporter: Jianshi Huang
>            Priority: Critical
>
> By default, spark.sql.parquet.useDataSourceApi is set to true, and loading
> parquet files using a file pattern throws errors.
>
> *\*Wildcard*
> {noformat}
> scala> val qp = sqlContext.parquetFile("hdfs://.../source=live/date=2014-06-0*")
> 15/03/25 08:43:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/03/25 08:43:59 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> java.io.FileNotFoundException: File does not exist: hdfs://.../source=live/date=2014-06-0*
>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:276)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:267)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:267)
>         at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:388)
>         at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
> {noformat}
>
> And
>
> *\[abc\]*
> {noformat}
> val qp = sqlContext.parquetFile("hdfs://.../source=live/date=2014-06-0[12]")
> java.lang.IllegalArgumentException: Illegal character in path at index 74: hdfs://.../source=live/date=2014-06-0[12]
>         at java.net.URI.create(URI.java:859)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:268)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:267)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:267)
>         at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:388)
>         at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
>         ... 49 elided
> Caused by: java.net.URISyntaxException: Illegal character in path at index 74: hdfs://.../source=live/date=2014-06-0[12]
>         at java.net.URI$Parser.fail(URI.java:2829)
>         at java.net.URI$Parser.checkChars(URI.java:3002)
>         at java.net.URI$Parser.parseHierarchical(URI.java:3086)
>         at java.net.URI$Parser.parse(URI.java:3034)
>         at java.net.URI.<init>(URI.java:595)
>         at java.net.URI.create(URI.java:857)
> {noformat}
>
> If spark.sql.parquet.useDataSourceApi is disabled we lose partition discovery, schema evolution, etc., but being able to specify a file pattern is also very important to applications.
> Please add this important feature.
> Jianshi

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
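[Editor's note] The second stack trace can be reproduced with the JDK alone: java.net.URI forbids '[' and ']' in a path, so a character-class glob fails URI parsing before any filesystem globbing can run. A minimal sketch, assuming nothing beyond the standard library (the host name "namenode" is a placeholder, not the reporter's elided path):

```scala
// The glob character class '[12]' is not a legal URI path character,
// so URI.create throws IllegalArgumentException (wrapping
// URISyntaxException), exactly as in the stack trace above.
val pattern = "hdfs://namenode/source=live/date=2014-06-0[12]"
val rejected =
  try { java.net.URI.create(pattern); false }
  catch { case _: IllegalArgumentException => true }
println(s"URI.create rejected the glob pattern: $rejected")
```

This is why the old parquetFile code path, which built a URI from the raw string, could never support such patterns without first expanding them through the Hadoop FileSystem glob API.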
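[Editor's note] The alternatives named in the resolution, sketched as a spark-shell fragment (not runnable standalone; it assumes an active sqlContext, and the paths are placeholders). sqlc.load(..., "parquet") is the Spark 1.3 form; sqlContext.read is the Reader API available from Spark 1.4:

```scala
// Spark 1.3.x form named in the resolution:
val df0 = sqlContext.load("hdfs://namenode/source=live/date=2014-06-0*", "parquet")

// Spark 1.4+ Reader API; glob patterns in the path are expanded
// by the Hadoop FileSystem when input files are listed:
val df1 = sqlContext.read.format("parquet").load("hdfs://namenode/source=live/date=2014-06-0*")
val df2 = sqlContext.read.parquet("hdfs://namenode/source=live/date=2014-06-0[12]")
```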