On Mon, May 12, 2014 at 12:14 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> That API is something the HDFS administrator uses outside of any application
> to tell HDFS to cache certain files or directories. But once you've done
> that, any existing HDFS client accesses them directly from the cache.
Ah, yeah, sure. What I meant is that Spark itself will not, AFAIK, use
that facility for adding files to the cache or anything like that. But
yes, it does benefit from things already cached.

> On May 12, 2014, at 11:10 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Is that true? I believe that API Chanwit is talking about requires
>> explicitly asking for files to be cached in HDFS.
>>
>> Spark automatically benefits from the kernel's page cache (i.e. if
>> some block is in the kernel's page cache, it will be read more
>> quickly). But the explicit HDFS cache is a different thing; Spark
>> applications that want to use it would have to explicitly call the
>> respective HDFS APIs.
>>
>> On Sun, May 11, 2014 at 11:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>> Yes, Spark goes through the standard HDFS client and will automatically
>>> benefit from this.
>>>
>>> Matei
>>>
>>> On May 8, 2014, at 4:43 AM, Chanwit Kaewkasi <chan...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via
>>>> sc.textFile() and other HDFS-related APIs?
>>>>
>>>> http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
>>>>
>>>> Best regards,
>>>>
>>>> -chanwit
>>>>
>>>> --
>>>> Chanwit Kaewkasi
>>>> linkedin.com/in/chanwit
>>>
>>
>> --
>> Marcelo

--
Marcelo
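For anyone following along: the "outside of any application" step Matei describes is done with the `hdfs cacheadmin` CLI. A minimal sketch, assuming a running HDFS 2.3+ cluster with caching enabled; the pool name "spark-hot" and path "/data/hot" here are placeholders, not anything from the thread:

```shell
# Create a cache pool, then pin a directory into the DataNodes'
# off-heap cache by adding a cache directive to that pool.
hdfs cacheadmin -addPool spark-hot
hdfs cacheadmin -addDirective -path /data/hot -pool spark-hot -replication 2

# Check which directives exist and how many bytes are actually cached.
hdfs cacheadmin -listDirectives -pool spark-hot
```

Once a directive is in place, any standard HDFS client reading those paths, including a Spark job doing sc.textFile("/data/hot/..."), benefits automatically, with no Spark-side API calls, which matches what Matei and Marcelo conclude above.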