Re: Temp checkpoint directory for EMR (S3 or HDFS)

2017-05-30 Thread Asher Krim
You should actually be able to get to the underlying filesystem from your SparkContext: String originalFs = sparkContext.hadoopConfiguration().get("fs.defaultFS"); and then you could just use that: String checkpointPath = String.format("%s/%s/", originalFs, checkpointDirectory);
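
One thing to watch with the String.format approach above: if checkpointDirectory already starts with a slash, the result contains a double slash after the filesystem URI. Below is a minimal sketch of a slash-safe join. The class and method names (CheckpointPaths, checkpointPath) are hypothetical and not part of Spark. The default filesystem value is passed in as a plain string, so the sketch runs without a cluster; in a real job it would come from sparkContext.hadoopConfiguration().get("fs.defaultFS") as described above.

```java
public class CheckpointPaths {

    // Hypothetical helper (not a Spark API): joins the default filesystem URI
    // and a checkpoint directory with exactly one slash between them and a
    // single trailing slash.
    public static String checkpointPath(String defaultFs, String checkpointDirectory) {
        // Drop trailing slashes from the filesystem URI, e.g. "hdfs://nn:8020/".
        String base = defaultFs.replaceAll("/+$", "");
        // Drop leading and trailing slashes from the directory part.
        String dir = checkpointDirectory.replaceAll("^/+", "").replaceAll("/+$", "");
        return base + "/" + dir + "/";
    }

    public static void main(String[] args) {
        // The host name here is a placeholder for illustration only.
        System.out.println(checkpointPath("hdfs://namenode:8020", "/tmp/checkpoints"));
        // prints "hdfs://namenode:8020/tmp/checkpoints/"
    }
}
```

The resulting string could then be handed to sparkContext.setCheckpointDir(...).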

Re: Temp checkpoint directory for EMR (S3 or HDFS)

2017-05-30 Thread Everett Anderson
Still haven't found a --conf option. Regarding a temporary HDFS checkpoint directory, it looks like when using --master yarn, spark-submit supplies a SPARK_YARN_STAGING_DIR environment variable. Thus, one could do the following when creating a SparkSession: val checkpointPath = new
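
The message above is cut off in this listing, so the original code is not reproduced here. As a rough sketch of the idea only, assuming SPARK_YARN_STAGING_DIR is visible to the driver as the message describes: read the variable and fall back to another location when it is absent (for example when not running on YARN). The class name, method name, and the "checkpoints" subdirectory are hypothetical choices for illustration. The environment is passed in as a map so the logic can be exercised without a cluster.

```java
import java.util.Map;

public class StagingCheckpoint {

    // Hypothetical helper: picks a checkpoint directory under the YARN staging
    // directory when SPARK_YARN_STAGING_DIR is set, otherwise returns the fallback.
    public static String resolve(Map<String, String> env, String fallback) {
        String staging = env.get("SPARK_YARN_STAGING_DIR");
        if (staging == null || staging.isEmpty()) {
            return fallback;
        }
        // "checkpoints" is an assumed subdirectory name, not something Spark defines.
        return staging.replaceAll("/+$", "") + "/checkpoints";
    }

    public static void main(String[] args) {
        // In a real driver the map would be System.getenv().
        System.out.println(resolve(System.getenv(), "/tmp/checkpoints"));
    }
}
```

Since the staging directory is normally cleaned up when the application finishes, checkpoints placed under it would not outlive the job, which matches the goal stated in the original question.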

Temp checkpoint directory for EMR (S3 or HDFS)

2017-05-26 Thread Everett Anderson
Hi, I need to set a checkpoint directory as I'm starting to use GraphFrames. (Also, occasionally my regular DataFrame lineages get too long so it'd be nice to use checkpointing to squash the lineage.) I don't actually need this checkpointed data to live beyond the life of the job, however. I'm