On Mon, May 12, 2014 at 12:14 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
That API is something the HDFS administrator uses outside of any application
to tell HDFS to cache certain files or directories. But once you’ve done
that, any existing HDFS client accesses them directly from the cache.
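For reference, the administrator-side workflow Matei describes uses the `hdfs cacheadmin` CLI that ships with HDFS 2.3's centralized cache management. A minimal sketch follows; the pool name and path are illustrative, and it assumes a running HDFS 2.3+ cluster:

```shell
# Illustrative pool/path names; run as the HDFS administrator.
# Centralized caching also requires dfs.datanode.max.locked.memory
# to be set in hdfs-site.xml so DataNodes can pin blocks in memory.

# 1. Create a cache pool to group cache directives.
hdfs cacheadmin -addPool hot-data

# 2. Tell HDFS to cache a directory (or file) into that pool.
hdfs cacheadmin -addDirective -path /data/hot -pool hot-data -replication 1

# 3. Inspect what is cached and how many bytes are pinned.
hdfs cacheadmin -listDirectives -stats
```

Once a directive is in place, any HDFS client, Spark included, reads the cached replicas without application changes.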
Great to know that! Thank you, Matei.
Best regards,
-chanwit
--
Chanwit Kaewkasi
linkedin.com/in/chanwit
Yes, Spark goes through the standard HDFS client and will automatically benefit
from this.
Matei
On May 8, 2014, at 4:43 AM, Chanwit Kaewkasi chan...@gmail.com wrote:
Hi all,
Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via
sc.textFile() and other HDFS-related APIs?
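On the Spark side nothing special is needed: reading a cached path is the ordinary `sc.textFile()` call, since Spark goes through the standard HDFS client. A hypothetical sketch, assuming Spark 0.9 and an HDFS path that an administrator has already cached:

```shell
# Illustrative path; assumes an HDFS 2.3 cluster where an administrator
# has already added a cache directive covering /data/hot.
spark-shell <<'EOF'
// Plain sc.textFile — the standard HDFS client serves cached blocks transparently.
val lines = sc.textFile("hdfs:///data/hot/events.log")
println(lines.count())
EOF
```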
Is that true? I believe the API Chanwit is talking about requires
explicitly asking for files to be cached in HDFS.
Spark automatically benefits from the kernel's page cache (i.e. if
some block is in the kernel's page cache, it will be read more
quickly). But the explicit HDFS cache is a different mechanism, which
requires files to be cached explicitly.