thanks a lot, Xinh, that's very helpful for me.

On Thu, Mar 3, 2016 at 12:54 AM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> Hi Charles,
>
> You can set the RDD name before using it. Just do this before caching:
> (Scala) myRdd.setName("Charles RDD")
> (Python) myRdd.setName('Charles RDD')
> Reference: PySpark doc:
> http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD
>
> Fraction cached is the percentage of an RDD's partitions that are cached.
> From the code:
> (rdd.numCachedPartitions * 100.0 / rdd.numPartitions)
> Code is here:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StoragePage.scala
> Fraction cached will be less than 100% if there isn't enough room for all
> cached RDDs to fit in the cache. If that's a problem, you may want to
> increase your in-memory cache size, or cache off-heap or to disk.
>
> Xinh
>
> On Wed, Mar 2, 2016 at 1:48 AM, charles li <charles.up...@gmail.com> wrote:
>
>> hi there, I feel a little confused about the *cache* in Spark.
>>
>> first, is there any way to *customize the cached RDD name*? It's not
>> convenient when looking at the Storage page: the RDD Name column only
>> shows the kind of RDD, and I'd like it to show a custom name instead,
>> something like 'rdd 1', 'rdd of map', 'rdd of groupby' and so on.
>>
>> second, can someone tell me what exactly '*Fraction Cached*' means
>> under the hood?
>>
>> great thanks
>>
>> --
>> *--------------------------------------*
>> a spark lover, a quant, a developer and a good man.
>>
>> http://github.com/litaotao
>>

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
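[Editor's note: the "Fraction Cached" formula Xinh quotes from StoragePage.scala can be mirrored in plain Python. This is an illustrative sketch only, not Spark API; `fraction_cached` and its arguments are hypothetical names standing in for `rdd.numCachedPartitions` and `rdd.numPartitions`.]

```python
# Sketch: the Spark UI's "Fraction Cached" computation, per StoragePage.scala:
#   rdd.numCachedPartitions * 100.0 / rdd.numPartitions
# fraction_cached is a hypothetical helper for illustration, not a Spark API.

def fraction_cached(num_cached_partitions: int, num_partitions: int) -> float:
    """Percentage of an RDD's partitions currently held in the cache."""
    return num_cached_partitions * 100.0 / num_partitions

# If only 3 of an RDD's 4 partitions fit in the cache, the Storage page
# would report 75% for that RDD.
print(fraction_cached(3, 4))   # 75.0
print(fraction_cached(4, 4))   # 100.0
```

Anything below 100% means some partitions were evicted or never cached, which is the cue Xinh mentions to enlarge the cache or persist off-heap/to disk.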