You can use rdd.unpersist(). It's documented in the Spark programming guide under the "Removing Data" section.
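A minimal sketch of that suggestion in spark-shell, assuming Spark 1.x with the spark-avro package (which provides the sqlContext.avroFile call used in the original question); the HDFS path is the placeholder from the question:

    // Enable sqlContext.avroFile (spark-avro package)
    import com.databricks.spark.avro._

    // First load: read the directory and cache it
    var data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()

    // After new files land in the directory: drop the cached copy,
    // then build a fresh plan so the directory listing is redone
    data.unpersist()
    data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()   // should now reflect the newly uploaded files

Note that simply calling count() again on the old cached reference will keep returning the stale result; you need to unpersist and re-create the DataFrame so the file listing is re-evaluated.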
Ayan

On 21 Apr 2015 13:16, "Wei Wei" <vivie...@gmail.com> wrote:
> Hey folks,
>
> I am trying to load a directory of avro files like this in spark-shell:
>
>     val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache
>     data.count
>
> This works fine, but when more files are uploaded to that directory,
> running these two lines again yields the same result. I suspect there
> is some metadata caching in HadoopRDD, thus new files are ignored.
>
> Does anyone know why this is happening? Is there a way to force reload
> the whole directory without restarting spark-shell?
>
> Thanks.
> W