[ 
https://issues.apache.org/jira/browse/SPARK-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606157#comment-14606157
 ] 

Sean Owen commented on SPARK-6830:
----------------------------------

Is this valid? For example, consider an RDD backed by a file that is still being 
written to. count() would return a larger value each time it is called, and 
memoizing the result would change that behavior. Of course, caching the RDD would 
also mean the count was then fixed, but these are semantically different.
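
To make the concern concrete, here is a minimal sketch in plain Python rather than Spark. GrowingSource and MemoizedCount are hypothetical stand-ins invented for illustration: one plays the role of an RDD over a file that keeps growing, the other plays the role of the proposed memoization. Neither is part of any Spark or SparkR API.

```python
class GrowingSource:
    """Hypothetical stand-in for an RDD backed by a file that is still being written."""

    def __init__(self):
        self._lines = []
        self.jobs_run = 0  # how many times the expensive count was recomputed

    def append(self, line):
        self._lines.append(line)

    def count(self):
        # Each call recomputes the count, like triggering a job.
        self.jobs_run += 1
        return len(self._lines)


class MemoizedCount:
    """Hypothetical wrapper that remembers the first count it sees."""

    def __init__(self, source):
        self._source = source
        self._cached = None

    def count(self):
        # Only the first call reaches the underlying source.
        if self._cached is None:
            self._cached = self._source.count()
        return self._cached


source = GrowingSource()
source.append("a")
memo = MemoizedCount(source)

first_live = source.count()   # recomputed from the source
first_memo = memo.count()     # recomputed once, then remembered

source.append("b")            # the underlying data grows

second_live = source.count()  # reflects the new line
second_memo = memo.count()    # still the remembered value
```

In this sketch the memoized wrapper saves a recomputation on the second call, but it also keeps reporting the old value after the source has grown, which is the change in behavior in question.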

> Memoize frequently queried vals in RDD, such as numPartitions, count etc.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-6830
>                 URL: https://issues.apache.org/jira/browse/SPARK-6830
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>            Priority: Minor
>              Labels: Starter
>
> We should memoize frequently queried vals in RDD, such as numPartitions, 
> count etc.
> While using SparkR in RStudio, the `count` function seems to be called 
> frequently by the IDE. I think this is to show some stats about variables in 
> the workspace, but it is not great in SparkR because we trigger a job every 
> time count is called.
> Memoization would help in this case, but we should also see if there is some 
> better way to interact with RStudio.


