Hi Gary,

Could you create a Spark JIRA ticket for this so that it doesn't fall through the cracks? Thanks!
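In the meantime, a possible workaround (untested; this assumes the /tmp/hadoop-root/s3 path is coming from the Hadoop S3 filesystem's buffer directory, whose default is ${hadoop.tmp.dir}/s3, rather than from Spark itself) would be to point fs.s3.buffer.dir at the big ephemeral disk before saving:

    // Untested sketch: spark.local.dir does not control the S3 output buffer,
    // which is governed by the Hadoop property fs.s3.buffer.dir. The mount
    // point below is just an example path for the ephemeral partition.
    import org.apache.hadoop.io.compress.GzipCodec

    sc.hadoopConfiguration.set("fs.s3.buffer.dir", "/mnt/ephemeral/s3")
    csvData.saveAsTextFile(
      "s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
      classOf[GzipCodec])

If that keeps the intermediate files off the root volume, it would confirm the buffer-directory theory; either way, please note what you find on the ticket.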
On Thu, Nov 6, 2014 at 2:10 PM, Gary Malouf <malouf.g...@gmail.com> wrote:
> We have some data that we are exporting from our HDFS cluster to S3 with
> some help from Spark. The final RDD command we run is:
>
>     csvData.saveAsTextFile(
>       "s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
>       classOf[GzipCodec])
>
> We have our 'spark.local.dir' set to our large ephemeral partition on
> each slave (on EC2), but with compression on, an intermediate format
> seems to be written to /tmp/hadoop-root/s3. Is this a bug in Spark, or
> are we missing a configuration property?
>
> It's a problem for us because the root disks on EC2 XLs are small (~5 GB).