SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
Does anyone know if I can save an RDD as a text file to a pre-created directory 
in an S3 bucket?

I have a directory created in S3 bucket: //nexgen-software/dev

When I tried to save a RDD as text file in this directory:
rdd.saveAsTextFile("s3n://nexgen-software/dev/output");


I got the following exception at runtime:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: 
org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/dev' - 
ResponseCode=403, ResponseMessage=Forbidden


I have verified that /dev has write permission. However, if I grant the bucket 
//nexgen-software write permission, I don't get the exception, but the output is 
not created under dev. Rather, a separate /dev/output directory is created 
in the bucket (//nexgen-software). Is this how saveAsTextFile 
behaves in S3? Is there any way I can have the output created under a predefined 
directory?


Thanks in advance.





Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Nick Pentreath
Your output folder is specified as

rdd.saveAsTextFile("s3n://nexgen-software/dev/output");

So it will try to write to /dev/output, which is as expected. If you create
the directory /dev/output up front in your bucket and try to save to
that (empty) directory, what is the behaviour?
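To make the path semantics concrete, here is a minimal Java sketch (an illustration, not Spark or Hadoop code) that splits the s3n:// URI from the question into its bucket and object-key prefix using java.net.URI. It assumes the URI carries no embedded credentials; the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: splits an s3n:// URI into bucket name and object-key prefix.
// Assumes no credentials are embedded in the URI authority.
public class S3PathSketch {

    // The bucket is the URI authority, e.g. "nexgen-software".
    static String bucket(String uri) {
        return URI.create(uri).getAuthority();
    }

    // The key prefix is the URI path without its leading slash, e.g. "dev/output".
    static String keyPrefix(String uri) {
        String path = URI.create(uri).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String out = "s3n://nexgen-software/dev/output";
        System.out.println(bucket(out));    // nexgen-software
        System.out.println(keyPrefix(out)); // dev/output
    }
}
```

Because S3 has a flat key space, the "directory" dev/output is simply the key prefix that the written files will share.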








Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
When Spark saves an RDD to a text file, the output directory must not exist beforehand. 
Spark creates the directory and writes the data to part- files under it. 
In my use case, I created a directory dev in the bucket (//nexgen-software/dev). 
I expected Spark to create output directly under dev, with the part- files under output. 
Instead, it gave me an exception, because I only granted write permission on dev, not on 
the bucket. If I open up write permission on the bucket, it works, but it does not create 
the output directory under dev; rather, it creates another dev/output directory under the 
bucket. I just want to know whether it is possible to have the output directory created 
under the dev directory I created up front.
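The must-not-exist rule described above can be illustrated with a toy Java model. This is a hypothetical sketch, not the actual Hadoop or Spark check: it assumes the bucket is a flat set of object keys, that a "directory" is only a key prefix ending in /, and that the pre-created folder appears as a placeholder key dev/ (an assumption about how it was created). Under these assumptions an existing parent dev does not block writing to dev/output, while a pre-created dev/output does.

```java
import java.util.Set;

// Toy model only (not Hadoop or Spark code): a bucket is a flat set of object keys,
// and a "directory" is nothing more than a key prefix ending in "/".
public class OutputCheckSketch {

    // True if any key (including an empty "folder" placeholder) lives under dir + "/".
    static boolean dirExists(Set<String> keys, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the rule from the thread: the output directory must not exist before saving.
    static void checkOutput(Set<String> keys, String outputDir) {
        if (dirExists(keys, outputDir)) {
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket contents: "dev/" stands in for the pre-created folder.
        Set<String> bucket = Set.of("dev/");
        checkOutput(bucket, "dev/output"); // passes: only the parent "dev" exists
        System.out.println("dev/output is free");

        // Pre-creating dev/output as well makes the check fail.
        Set<String> preCreated = Set.of("dev/", "dev/output/");
        try {
            checkOutput(preCreated, "dev/output");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Output directory dev/output already exists
        }
    }
}
```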







Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Ashish Rangole
By default, the files will be created under the path provided as the
argument to saveAsTextFile. This argument is treated as a folder in the
bucket, and the actual files are created in it with the naming convention
part-n, where n is the index of the output partition (e.g. part-00000).
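The naming convention above can be sketched in Java as follows. This is a hypothetical helper, not Spark's own code; it only assumes one file per partition, named with a five-digit zero-padded partition index under the output path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the part files expected under an output path,
// one per partition, named part-00000, part-00001, ...
public class PartFileNames {

    static List<String> partFiles(String outputDir, int numPartitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            // %05d zero-pads the partition index to five digits.
            names.add(outputDir + "/" + String.format("part-%05d", i));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : partFiles("s3n://nexgen-software/dev/output", 3)) {
            System.out.println(name);
        }
    }
}
```

For a three-partition RDD saved to s3n://nexgen-software/dev/output, this prints s3n://nexgen-software/dev/output/part-00000 through s3n://nexgen-software/dev/output/part-00002.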
