Hi folks, has anyone successfully used wildcards in a parquetFile call?
I saw this thread, and it makes me think the answer is no: http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCACA1tWLjcF-NtXj=pqpqm3xk4aj0jitxjhmdqbojj_ojybo...@mail.gmail.com%3E

I have a set of Parquet files that are partitioned by key. I'd like to issue a query that reads in a subset of the files, selected by a directory wildcard (the real wildcard will be a little more specific than *, but this shows the issue).

This call works fine:

scala> sc.textFile("hdfs:///warehouse/hive/*/*/*.parquet").first
res4: String = PAR1... (raw binary Parquet bytes)

but this one doesn't:

scala> val parquetFile = sqlContext.parquetFile("hdfs:///warehouse/hive/*/*/*.parquet").first
java.io.FileNotFoundException: File hdfs://cdh4-14822-nn/warehouse/hive/*/*/*.parquet does not exist
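The error message suggests parquetFile takes the path literally instead of expanding the glob the way textFile does. One possible workaround, sketched here under assumptions, is to expand the pattern yourself and hand parquetFile only concrete paths. Against HDFS that expansion step would normally use Hadoop's FileSystem.globStatus. To keep the sketch self-contained, it instead uses the JDK's glob PathMatcher over a list of candidate path strings. The GlobExpand object, the expand helper, and the sample paths (k=1, d=2, part-0.parquet) are hypothetical and only for illustration.

```scala
import java.nio.file.{FileSystems, Paths}

// Hypothetical helper: expands a glob client-side so that a reader which
// does not understand wildcards only ever receives concrete file paths.
object GlobExpand {
  def expand(pattern: String, candidates: Seq[String]): Seq[String] = {
    // "glob:" syntax: '*' matches within a single path component only,
    // so each '*' in "/warehouse/hive/*/*/*.parquet" stands for exactly one level.
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + pattern)
    candidates.filter(p => matcher.matches(Paths.get(p)))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical listing of files under the warehouse directory.
    val files = Seq(
      "/warehouse/hive/k=1/d=2/part-0.parquet", // matches the pattern
      "/warehouse/hive/k=1/part-0.parquet",     // one directory level too shallow
      "/warehouse/hive/k=1/d=2/_SUCCESS"        // wrong file extension
    )
    expand("/warehouse/hive/*/*/*.parquet", files).foreach(println)
  }
}
```

The resulting paths would then be passed to parquetFile. Whether it accepts several paths in a single call depends on the Spark release, so check the SQLContext API docs for the version in use before relying on that.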