[SQL] Wildcards in SQLContext.parquetFile?

Yana Kadiyska Wed, 03 Dec 2014 09:27:08 -0800

Hi folks,

I'm wondering if someone has successfully used wildcards with a parquetFile
call?


I saw this thread, which makes me think the answer is no:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCACA1tWLjcF-NtXj=pqpqm3xk4aj0jitxjhmdqbojj_ojybo...@mail.gmail.com%3E

I have a set of Parquet files that are partitioned by key. I'd like to
issue a query that reads in a subset of the files, based on a directory
wildcard (the real wildcard will be a little more specific than *, but this
is enough to show the issue).

This call works fine:

scala> sc.textFile("hdfs:///warehouse/hive/*/*/*.parquet").first
res4: String = PAR1????? L??????? ?\??????? ,????????????
,????????????????a??aL????????0?x????????U???e??



but this one doesn't:

scala> val parquetFile =
sqlContext.parquetFile("hdfs:///warehouse/hive/*/*/*.parquet").first
java.io.FileNotFoundException: File
hdfs://cdh4-14822-nn/warehouse/hive/*/*/*.parquet does not exist
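
One possible workaround, sketched here under assumptions and not a confirmed fix: expand the wildcard on the driver with Hadoop's FileSystem.globStatus, load each matching file with parquetFile, and union the results. The helper name parquetGlob is hypothetical, and the sketch assumes the Spark 1.x SchemaRDD API (parquetFile taking a single path, SchemaRDD.unionAll) and that all matched files share one schema.

```scala
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical helper (not part of Spark): expands a glob pattern on the
// driver and unions the Parquet files it matches into one SchemaRDD.
def parquetGlob(sc: SparkContext, sqlContext: SQLContext, pattern: String): SchemaRDD = {
  val globPath = new Path(pattern)
  // Resolve the file system that owns the path (HDFS in the example above).
  val fs = globPath.getFileSystem(sc.hadoopConfiguration)
  // globStatus can return null when nothing matches, so guard against that.
  val matches = Option(fs.globStatus(globPath))
    .getOrElse(Array.empty[FileStatus])
    .map(_.getPath.toString)
  require(matches.nonEmpty, s"No files match $pattern")
  // Load each file separately, then combine; assumes identical schemas.
  matches.map(path => sqlContext.parquetFile(path)).reduce(_ unionAll _)
}

// Usage, with the same pattern as above:
// val parquetFile = parquetGlob(sc, sqlContext, "hdfs:///warehouse/hive/*/*/*.parquet")
```

Building one SchemaRDD per file may be slow when the wildcard matches many files, so matching at the directory level (if each directory holds a complete Parquet data set) could be a better fit.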
