Hey folks,

I am trying to load a directory of avro files like this in spark-shell:

import com.databricks.spark.avro._   // implicit that adds avroFile (spark-avro package)

val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache
data.count

This works fine the first time, but when more files are uploaded to that
directory, re-running these two lines yields the same count. I suspect
there is some metadata caching in HadoopRDD, so the new files are ignored.

Does anyone know why this is happening? Is there a way to force a reload
of the whole directory without restarting spark-shell?

Thanks.
W
