[ https://issues.apache.org/jira/browse/SPARK-43408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Faiz Halde updated SPARK-43408:
-------------------------------
    Description: 
Does caching benefit a Spark job with only a single action in it? IIRC, Spark already optimizes shuffles by persisting their output to disk.

I am unable to find a counter-example where caching would benefit a job with a single action. In every case I can think of, the persisted shuffle output acts as a good-enough caching mechanism in itself.

FWIW, I am talking specifically in the context of the DataFrame API. The only StorageLevel allowed in my case would be DISK_ONLY, i.e. I am not looking to speed things up by caching data in memory.


  was:
Does caching benefit a Spark job with only a single action in it? IIRC, Spark already optimizes shuffles by persisting their output to disk.

I am unable to find a counter-example where caching would benefit a job with a single action. In every case I can think of, the persisted shuffle output acts as a good-enough caching mechanism in itself.

FWIW, I am talking specifically in the context of the DataFrame API.


> Spark caching in the context of a single job
> --------------------------------------------
>
>                 Key: SPARK-43408
>                 URL: https://issues.apache.org/jira/browse/SPARK-43408
>             Project: Spark
>          Issue Type: Question
>          Components: Shuffle
>    Affects Versions: 3.3.1
>            Reporter: Faiz Halde
>            Priority: Trivial
>
> Does caching benefit a Spark job with only a single action in it? IIRC, Spark
> already optimizes shuffles by persisting their output to disk.
>
> I am unable to find a counter-example where caching would benefit a job with
> a single action. In every case I can think of, the persisted shuffle output
> acts as a good-enough caching mechanism in itself.
>
> FWIW, I am talking specifically in the context of the DataFrame API. The only
> StorageLevel allowed in my case would be DISK_ONLY, i.e. I am not looking to
> speed things up by caching data in memory.
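For concreteness, a minimal sketch of the scenario being asked about: a single-action DataFrame job in which a shuffled intermediate is reused in two branches, with the DISK_ONLY persist call left as the point in question. The input path, column names, and output path are purely illustrative, and this is not a claim about whether the persist helps.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.storage.StorageLevel

object SingleActionCachingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("single-action-caching-sketch")
      .getOrCreate()

    // Hypothetical input; any DataFrame with a key column would do.
    val events = spark.read.parquet("/tmp/events")

    // A shuffled intermediate: the groupBy introduces an exchange whose
    // output is written to the executors' local disks.
    val perKey = events.groupBy("key").agg(count("*").as("cnt"))

    // The intermediate is reused twice inside the SAME action below.
    // The question in this ticket is whether the second use recomputes
    // the aggregation or simply re-reads the existing shuffle output,
    // i.e. whether the following line buys anything:
    // perKey.persist(StorageLevel.DISK_ONLY)

    val top    = perKey.orderBy(desc("cnt")).limit(10)
    val bottom = perKey.orderBy(asc("cnt")).limit(10)

    // Single action for the whole job.
    top.union(bottom).write.mode("overwrite").parquet("/tmp/out")

    spark.stop()
  }
}
{code}

Comparing the physical plans (e.g. via explain() or the SQL tab in the UI) with and without the persist call should show whether the second branch re-reads the existing exchange or recomputes the aggregation.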