Hi, I have a program that loads a single Avro file using Spark SQL, queries it, transforms it, and then outputs the data. The file is loaded with:

    val records = sqlContext.avroFile(filePath)
    records.registerTempTable("data")
    ...

Now I want to run it over tens of thousands of Avro files (all with schemas that contain the fields I'm interested in). Is it possible to load multiple Avro files recursively from a top-level directory using wildcards? All my Avro files are stored under s3://my-bucket/avros/*/DATE/*.avro, and I want to run my task across all of them.

If that's not possible, is there some way to load multiple Avro files into the same table/RDD so the whole dataset can be processed? In that case I'd supply the path to each file explicitly, but I *really* don't want to have to do that.

Thanks
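For concreteness, here's a rough sketch of what I'm hoping will work. The glob pattern and the union-based fallback are only illustrations: I don't know whether avroFile accepts Hadoop-style globs, DATE stands in for the real date directories, and the file names in the fallback are made up.

    import com.databricks.spark.avro._

    // What I'd like: point avroFile at a glob and get one table over everything.
    // (Illustrative only; I'm not sure glob patterns are supported here.)
    val records = sqlContext.avroFile("s3://my-bucket/avros/*/DATE/*.avro")
    records.registerTempTable("data")

    // Fallback I'd rather avoid: load each file explicitly and union them.
    // (These paths are placeholders; the real list would have tens of thousands.)
    val paths = Seq(
      "s3://my-bucket/avros/foo/DATE/part-00000.avro",
      "s3://my-bucket/avros/bar/DATE/part-00001.avro")
    val combined = paths.map(p => sqlContext.avroFile(p)).reduce(_.unionAll(_))
    combined.registerTempTable("data")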