SaveAsTextFile to S3 bucket
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket?

I have a directory created in the S3 bucket: //nexgen-software/dev

When I tried to save an RDD as a text file in this directory:

rdd.saveAsTextFile("s3n://nexgen-software/dev/output");

I got the following exception at runtime:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/dev' - ResponseCode=403, ResponseMessage=Forbidden

I have verified that /dev has write permission. However, if I grant write permission on the bucket //nexgen-software, I don't get the exception. But the output is not created under dev. Rather, a different /dev/output directory is created in the bucket (//nexgen-software). Is this how saveAsTextFile behaves in S3? Is there any way I can have the output created under a pre-defined directory?

Thanks in advance.
Re: SaveAsTextFile to S3 bucket
Your output path is specified as rdd.saveAsTextFile("s3n://nexgen-software/dev/output"); so it will try to write to /dev/output, which is as expected. If you create the directory /dev/output upfront in your bucket and try to save to that (empty) directory, what is the behaviour?

On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin kevin.c...@neustar.biz wrote:
> Does anyone know if I can save a RDD as a text file to a pre-created directory in S3 bucket? [...]
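To make the point about the output path concrete, here is a minimal sketch of how a URI like the one above breaks down into a bucket and a key prefix. The helper name split_s3_uri is hypothetical and is not part of Spark or Hadoop; it only uses Python's standard urllib.parse to show that "dev/output" is simply the part of the path after the bucket name.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split an s3/s3n/s3a URI into (bucket, key_prefix).

    Hypothetical helper for illustration only; not a Spark or Hadoop API.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The authority component is the bucket name; the rest is the key prefix.
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = split_s3_uri("s3n://nexgen-software/dev/output")
print(bucket)  # nexgen-software
print(prefix)  # dev/output
```

Seen this way, anything written by the job would get keys that start with dev/output/ inside the nexgen-software bucket.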
Re: SaveAsTextFile to S3 bucket
When Spark saves an RDD to a text file, the output directory must not exist upfront. Spark creates the directory and writes the data to files prefixed with part- under that directory. In my use case, I created a directory dev in the bucket (//nexgen-software/dev). I expected Spark to create output directly under dev, with the part- files under output. But it threw an exception because I had only granted write permission on dev, not on the bucket. When I opened up write permission on the bucket, it worked. But it did not create the output directory under dev; rather, it created another dev/output directory under the bucket. I just want to know whether it is possible to have the output directory created under the dev directory I created upfront.

From: Nick Pentreath nick.pentre...@gmail.com
Date: Monday, January 26, 2015 9:15 PM
To: user@spark.apache.org
Subject: Re: SaveAsTextFile to S3 bucket

> Your output folder specifies rdd.saveAsTextFile(s3n://nexgen-software/dev/output); So it will try to write to /dev/output which is as expected. [...]
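As a rough illustration of the "output directory must not exist upfront" rule, the sketch below models a bucket as a flat set of object keys, which is how S3 stores objects (a "directory" is only a shared key prefix). The function names and the sample keys are hypothetical, and this is a simplified model, not the actual check performed by Hadoop's S3 connector, which issues HEAD and list requests against the service.

```python
def path_exists(keys, path):
    """Return True if `path` names an object or is a prefix of any key."""
    path = path.strip("/")
    return any(k == path or k.startswith(path + "/") for k in keys)


def check_output_path(keys, output_path):
    """Mimic the rule that the output directory must not already exist."""
    if path_exists(keys, output_path):
        raise FileExistsError(f"Output directory {output_path} already exists")


# Hypothetical bucket contents: only a zero-byte "folder" marker for dev.
existing_keys = {"dev/"}

check_output_path(existing_keys, "dev/output")  # passes: nothing under dev/output yet

# After a job has written one part file, a second save to the same path fails.
existing_keys.add("dev/output/part-00000")
try:
    check_output_path(existing_keys, "dev/output")
except FileExistsError as err:
    print(err)  # Output directory dev/output already exists
```

Under this model, a key such as dev/output/part-00000 shares the dev/ prefix with the pre-created dev marker, so whether it appears "inside" that directory can depend on how a given tool renders prefixes.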
Re: SaveAsTextFile to S3 bucket
By default, the files are created under the path passed as the argument to saveAsTextFile. This argument is treated as a folder in the bucket, and the actual files are created inside it with the naming convention part-n, where n is the zero-padded index of the output partition (for example, part-00000).

On Mon, Jan 26, 2015 at 9:15 PM, Nick Pentreath nick.pentre...@gmail.com wrote:
> Your output folder specifies rdd.saveAsTextFile(s3n://nexgen-software/dev/output); So it will try to write to /dev/output which is as expected. [...]
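The naming convention described above can be sketched in a few lines. The helper part_file_keys is hypothetical and only reproduces the part-00000 style names that saveAsTextFile output typically carries; it does not talk to S3 or Spark, and the partition count of 3 is an arbitrary example.

```python
def part_file_keys(output_prefix, num_partitions):
    """List the object keys expected for an output with `num_partitions` partitions.

    Hypothetical helper: assumes one file per partition, named part-NNNNN
    with a zero-padded, zero-based partition index.
    """
    prefix = output_prefix.strip("/")
    return [f"{prefix}/part-{i:05d}" for i in range(num_partitions)]


for key in part_file_keys("dev/output", 3):
    print(key)
# dev/output/part-00000
# dev/output/part-00001
# dev/output/part-00002
```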