Hey folks, I am trying to load a directory of Avro files in spark-shell like this:
    import com.databricks.spark.avro._

    val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()

This works fine, but when more files are uploaded to that directory, running these two lines again yields the same count. I suspect there is some metadata caching in HadoopRDD, so the new files are ignored. Does anyone know why this happens? Is there a way to force a reload of the whole directory without restarting spark-shell?

Thanks,
W
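
P.S. To make the question concrete, below is a sketch of the kind of workaround I am hoping exists. The unpersist()/clearCache() calls are just my guess at how to drop the cached data; I don't know whether they also invalidate whatever file listing HadoopRDD holds on to.

    import com.databricks.spark.avro._

    // Initial load: cache and count the Avro files in the directory.
    val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()

    // ... more files are uploaded to hdfs://path/to/dir/ ...

    // Guess: drop the cached blocks and any SQL-level cache entries,
    // then build a fresh DataFrame over the same glob.
    data.unpersist()
    sqlContext.clearCache()
    val fresh = sqlContext.avroFile("hdfs://path/to/dir/*")
    fresh.count()   // hoping this picks up the new files

If there is a cleaner way to invalidate the directory listing, I'd be glad to hear it.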