You should actually be able to get to the underlying filesystem from your
SparkContext:
String originalFs = sparkContext.hadoopConfiguration().get("fs.defaultFS");
and then you could just use that:
String checkpointPath = String.format("%s/%s/", originalFs, checkpointDirectory);
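A self-contained sketch of that concatenation, in case it helps. Since a live SparkContext isn't available here, the defaultFs value is hard-coded as an assumption standing in for sparkContext.hadoopConfiguration().get("fs.defaultFS"):

```java
public class CheckpointPathExample {
    // Joins the cluster's default filesystem URI with a relative checkpoint
    // directory. Trims any trailing slash on the filesystem URI first, so
    // "hdfs://nn:8020/" and "hdfs://nn:8020" produce the same result.
    static String checkpointPath(String defaultFs, String checkpointDirectory) {
        String base = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        return String.format("%s/%s/", base, checkpointDirectory);
    }

    public static void main(String[] args) {
        // "hdfs://namenode:8020" is a placeholder; in a real job you would
        // read fs.defaultFS from the Hadoop configuration instead.
        System.out.println(checkpointPath("hdfs://namenode:8020", "tmp/checkpoints"));
        // prints hdfs://namenode:8020/tmp/checkpoints/
    }
}
```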
Still haven't found a --conf option.
Regarding a temporary HDFS checkpoint directory: it looks like, when using
--master yarn, spark-submit sets a SPARK_YARN_STAGING_DIR environment
variable. Thus, one could do the following when creating a SparkSession:
val checkpointPath = new
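That line got cut off above; presumably it builds a path under the staging directory. A hedged, Spark-free sketch of the idea (the fallback value and helper name are assumptions, not Spark APIs):

```java
public class StagingCheckpointDir {
    // Sketch (assumption): place checkpoints under the YARN staging
    // directory, which YARN cleans up when the application finishes --
    // matching the "don't need it beyond the job" requirement.
    static String stagingCheckpointDir(String stagingDir) {
        return stagingDir + "/checkpoints";
    }

    public static void main(String[] args) {
        // SPARK_YARN_STAGING_DIR is only set under --master yarn; fall back
        // to a placeholder so the example runs outside a cluster.
        String staging = System.getenv().getOrDefault(
                "SPARK_YARN_STAGING_DIR",
                "hdfs://namenode:8020/user/alice/.sparkStaging/app_1");
        System.out.println(stagingCheckpointDir(staging));
    }
}
```

In a real job you would then pass the resulting path to sparkContext.setCheckpointDir.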
Hi,
I need to set a checkpoint directory as I'm starting to use GraphFrames.
(Also, occasionally my regular DataFrame lineages get too long so it'd be
nice to use checkpointing to squash the lineage.)
I don't actually need this checkpointed data to live beyond the life of the
job, however. I'm