You can use rdd.unpersist(). It's documented in the Spark programming guide under the "Removing Data" section.
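A minimal sketch of that suggestion in spark-shell, assuming Spark 1.x with the spark-avro package (which provides the sqlContext.avroFile call used in the original question); the HDFS path is the placeholder from the question:

    // Enable sqlContext.avroFile (spark-avro package)
    import com.databricks.spark.avro._

    // First load: read the directory and cache it
    var data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()

    // After new files land in the directory: drop the cached copy,
    // then build a fresh plan so the directory listing is redone
    data.unpersist()
    data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
    data.count()   // should now reflect the newly uploaded files

Note that simply calling count() again on the old cached reference will keep returning the stale result; you need to unpersist and re-create the DataFrame so the file listing is re-evaluated.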
Ayan

On 21 Apr 2015 13:16, "Wei Wei" <vivie...@gmail.com> wrote:
> Hey folks,
>
> I am trying to load a directory of avro files like this in spark-shell:
>
>     val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache
>     data.count
>
> This works fine, but when more files are uploaded to that directory,
> running these two lines again yields the same result. I suspect there
> is some metadata caching in HadoopRDD, thus new files are ignored.
>
> Does anyone know why this is happening? Is there a way to force reload
> the whole directory without restarting spark-shell?
>
> Thanks.
> W