Hi Gary,

Could you create a Spark JIRA ticket for this so that it doesn't fall through the cracks? Thanks!
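In the meantime, a possible workaround (untested; this assumes the /tmp/hadoop-root/s3 path is coming from the Hadoop S3 filesystem's buffer directory, whose default is ${hadoop.tmp.dir}/s3, rather than from Spark itself) would be to point fs.s3.buffer.dir at the big ephemeral disk before saving:

    // Untested sketch: spark.local.dir does not control the S3 output buffer,
    // which is governed by the Hadoop property fs.s3.buffer.dir. The mount
    // point below is just an example path for the ephemeral partition.
    import org.apache.hadoop.io.compress.GzipCodec

    sc.hadoopConfiguration.set("fs.s3.buffer.dir", "/mnt/ephemeral/s3")
    csvData.saveAsTextFile(
      "s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
      classOf[GzipCodec])

If that keeps the intermediate files off the root volume, it would confirm the buffer-directory theory; either way, please note what you find on the ticket.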
On Thu, Nov 6, 2014 at 2:10 PM, Gary Malouf <malouf.g...@gmail.com> wrote:
> We have some data that we are exporting from our HDFS cluster to S3 with
> some help from Spark. The final RDD command we run is:
>
>     csvData.saveAsTextFile(
>       "s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
>       classOf[GzipCodec])
>
> We have our 'spark.local.dir' set to our large ephemeral partition on
> each slave (on EC2), but with compression on, an intermediate format
> seems to be written to /tmp/hadoop-root/s3. Is this a bug in Spark, or
> are we missing a configuration property?
>
> It's a problem for us because the root disks on EC2 XLs are small (~5 GB).