Spark to utilize HDFS's mmap caching

2014-05-15 Thread Chanwit Kaewkasi
Hi all, Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via sc.textFile() and other HDFS-related APIs? http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html Best regards, -chanwit -- Chanwit Kaewkasi linkedin.com/in/chanwit

Re: Spark to utilize HDFS's mmap caching

2014-05-13 Thread Marcelo Vanzin
On Mon, May 12, 2014 at 12:14 PM, Matei Zaharia matei.zaha...@gmail.com wrote: That API is something the HDFS administrator uses outside of any application to tell HDFS to cache certain files or directories. But once you’ve done that, any existing HDFS client accesses them directly from the

Re: Spark to utilize HDFS's mmap caching

2014-05-13 Thread Chanwit Kaewkasi
Great to know that! Thank you, Matei. Best regards, -chanwit -- Chanwit Kaewkasi linkedin.com/in/chanwit On Tue, May 13, 2014 at 2:14 AM, Matei Zaharia matei.zaha...@gmail.com wrote: That API is something the HDFS administrator uses outside of any application to tell HDFS to cache certain

Re: Spark to utilize HDFS's mmap caching

2014-05-12 Thread Matei Zaharia
Yes, Spark goes through the standard HDFS client and will automatically benefit from this. Matei On May 8, 2014, at 4:43 AM, Chanwit Kaewkasi chan...@gmail.com wrote: Hi all, Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via sc.textFile() and other HDFS-related APIs?

Re: Spark to utilize HDFS's mmap caching

2014-05-12 Thread Marcelo Vanzin
Is that true? I believe that API Chanwit is talking about requires explicitly asking for files to be cached in HDFS. Spark automatically benefits from the kernel's page cache (i.e. if some block is in the kernel's page cache, it will be read more quickly). But the explicit HDFS cache is a