Hi Jonhy,

Which master are you passing to spark-submit?
I've had this problem before: unlike the CLI and boto3, Spark was running in YARN distributed mode (--master yarn), so the keys were not copied to the executors' nodes. I had to submit my Spark job as follows:

    $ spark-submit --master yarn-client --conf "spark.executor.extraJavaOptions=-Daws.accessKeyId=ACCESSKEY -Daws.secretKey=SECRETKEY" ....

I hope this helps,
Amjad

On Tue, Mar 7, 2017 at 4:21 PM, Jonhy Stack <so.jo...@gmail.com> wrote:
> In order to access my S3 bucket I have exported my creds:
>
>     export AWS_SECRET_ACCESS_KEY=
>     export AWS_ACCESS_KEY_ID=
>
> I can verify that everything works by doing
>
>     aws s3 ls mybucket
>
> I can also verify with boto3 that it works in Python:
>
>     resource = boto3.resource("s3", region_name="us-east-1")
>     resource.Object("mybucket", "text/text.py") \
>         .put(Body=open("text.py", "rb"), ContentType="text/x-py")
>
> This works and I can see the file in the bucket.
>
> However, when I do this with Spark:
>
>     spark_context = SparkContext()
>     sql_context = SQLContext(spark_context)
>     spark_context.textFile("s3://mybucket/my/path/*")
>
> I get a nice
>
>     Caused by: org.jets3t.service.S3ServiceException: Service Error
>     Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error
>     Message: <?xml version="1.0"
>     encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The
>     AWS Access Key Id you provided does not exist in our
>     records.</Message><AWSAccessKeyId>[MY_ACCESS_KEY]</AWSAccessKeyId><RequestId>XXXXX</RequestId><HostId>xxxxxxx</HostId></Error>
>
> This is how I submit the job locally:
>
>     spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
>
> Why does it work with the command line + boto3, but Spark is choking?
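To avoid pasting keys into the shell history, one option is to build the --conf value programmatically from the same environment variables the AWS CLI and boto3 already read. A minimal sketch (the helper name `executor_key_conf` is mine, not from the thread; the `-Daws.accessKeyId` / `-Daws.secretKey` property names are the ones from the spark-submit command above):

```python
import os

def executor_key_conf(env=None):
    """Build the spark.executor.extraJavaOptions value from the standard
    AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, so
    the same keys the CLI and boto3 use reach each executor's JVM.
    Raises KeyError if either variable is not set.
    """
    if env is None:
        env = os.environ
    access = env["AWS_ACCESS_KEY_ID"]
    secret = env["AWS_SECRET_ACCESS_KEY"]
    # These -D system properties are what the jets3t-backed s3:// connector
    # is being fed in the spark-submit command above.
    return ("spark.executor.extraJavaOptions="
            f"-Daws.accessKeyId={access} -Daws.secretKey={secret}")

if __name__ == "__main__":
    # Prints the full submit command; quote the --conf value in the shell.
    print('spark-submit --master yarn-client '
          f'--conf "{executor_key_conf()}" test.py')
```

Note that the keys still end up visible in the executors' process arguments; for anything beyond a quick test, a credentials provider on the cluster side is the safer route.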