thanks a lot, Xinh, that's very helpful for me.

On Thu, Mar 3, 2016 at 12:54 AM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> Hi Charles,
>
> You can set the RDD name before using it. Just do this before caching:
> (Scala) myRdd.setName("Charles RDD")
> (Python) myRdd.setName('Charles RDD')
> Reference: PySpark doc:
> http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD
>
> Fraction cached is the percentage of an RDD's partitions that are cached.
> From the code:
> (rdd.numCachedPartitions * 100.0 / rdd.numPartitions)
> Code is here:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/storage/StoragePage.scala
> Fraction cached will be less than 100% if there isn't enough room for all
> cached RDDs to fit in the cache. If that's a problem, you may want to
> increase your in-memory cache size, or cache off-heap or to disk.
>
> Xinh
>
> On Wed, Mar 2, 2016 at 1:48 AM, charles li <charles.up...@gmail.com> wrote:
>
>> hi there, I feel a little confused about the *cache* in Spark.
>>
>> first, is there any way to *customize the cached RDD name*? It's not
>> convenient when looking at the Storage page: the RDD Name column only
>> shows the kind of RDD, and I'd like it to show a custom name instead,
>> something like 'rdd 1', 'rdd of map', 'rdd of groupby' and so on.
>>
>> second, can someone tell me what exactly '*Fraction Cached*' means
>> under the hood?
>>
>> great thanks
>>
>> --
>> *--------------------------------------*
>> a spark lover, a quant, a developer and a good man.
>>
>> http://github.com/litaotao
>>

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
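[Editor's note: the "Fraction Cached" formula Xinh quotes from StoragePage.scala can be mirrored in plain Python. This is an illustrative sketch only, not Spark API; `fraction_cached` and its arguments are hypothetical names standing in for `rdd.numCachedPartitions` and `rdd.numPartitions`.]

```python
# Sketch: the Spark UI's "Fraction Cached" computation, per StoragePage.scala:
#   rdd.numCachedPartitions * 100.0 / rdd.numPartitions
# fraction_cached is a hypothetical helper for illustration, not a Spark API.

def fraction_cached(num_cached_partitions: int, num_partitions: int) -> float:
    """Percentage of an RDD's partitions currently held in the cache."""
    return num_cached_partitions * 100.0 / num_partitions

# If only 3 of an RDD's 4 partitions fit in the cache, the Storage page
# would report 75% for that RDD.
print(fraction_cached(3, 4))   # 75.0
print(fraction_cached(4, 4))   # 100.0
```

Anything below 100% means some partitions were evicted or never cached, which is the cue Xinh mentions to enlarge the cache or persist off-heap/to disk.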