Hello,
It is my understanding that shuffle files are written to disk and that they
act as checkpoints.
I wonder if this is true only within a job, or across jobs. Please note
that I use the words job and stage carefully here.
1. Can a shuffle created during JobN be used to skip many stages from
Ah, for #3, maybe this is what rdd.checkpoint does!
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
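For reference, here is a minimal sketch of how rdd.checkpoint might be used. It assumes a local master and a hypothetical checkpoint directory (/tmp/spark-checkpoints); the object name and the sample data are made up for illustration, so treat it as a starting point rather than a recommended setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, just for experimenting.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical path; on a cluster this should point at reliable
      // storage such as HDFS.
      sc.setCheckpointDir("/tmp/spark-checkpoints")

      val counts = sc.parallelize(Seq("a", "b", "a", "c"))
        .map(word => (word, 1))
        .reduceByKey(_ + _) // this step introduces a shuffle

      counts.cache()      // keeps the data around so writing the checkpoint does not recompute it
      counts.checkpoint() // only marks the RDD; must happen before the first action on it

      counts.count()      // first job: computes the RDD and writes the checkpoint files

      println(counts.isCheckpointed) // whether the checkpoint has been materialized
      println(counts.toDebugString)  // lineage as Spark sees it after checkpointing

      counts.collect()    // a later job on the same RDD
    } finally {
      sc.stop()
    }
  }
}
```

Note that checkpoint() has to be called before any job has run on the RDD, and the API docs recommend persisting the RDD first so that saving it does not trigger a recomputation.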
Thomas
On Mon, Jun 29, 2015 at 7:12 PM, Thomas Gerber thomas.ger...@radius.com
wrote:
Hello,
It is my understanding that shuffle are written on disk and
stages in the job UI. They are
periodically cleaned up based on available space of the configured
spark.local.dir paths.
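A small sketch for checking this behaviour yourself: two actions on the same shuffled RDD inside one application, which should show up as two jobs in the UI. It assumes a local master and a hypothetical scratch path (/tmp/spark-local); the object name and data are illustrative only. The scratch directory is set through the spark.local.dir property, which cluster managers may override.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleReuseSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-reuse-sketch")
      .setMaster("local[2]")                      // assumption: local experiment
      .set("spark.local.dir", "/tmp/spark-local") // hypothetical scratch path for shuffle files
    val sc = new SparkContext(conf)
    try {
      val summed = sc.parallelize(1 to 1000)
        .map(i => (i % 10, i))
        .reduceByKey(_ + _) // shuffle boundary: map-side output is written to local disk

      summed.count()   // job 1: runs the map stage and the reduce stage
      summed.collect() // job 2: same RDD object, so the existing shuffle output can be reused
    } finally {
      sc.stop()
    }
  }
}
```

While the application is running, the stages view of the second job in the web UI is the place to check whether the map stage was run again or skipped.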
From: Thomas Gerber
Date: Monday, June 29, 2015 at 10:12 PM
To: user
Subject: Shuffle files lifecycle
Hello,
It is my understanding that shuffle are written