In order to access my S3 bucket I have exported my credentials:

    export AWS_SECRET_ACCESS_KEY=
    export AWS_ACCESS_KEY_ID=
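
Just to rule out the shell environment itself, here is a quick sanity check I can run (a minimal sketch; these are the standard variable names that boto3 and the AWS CLI read):

    import os

    # Both variables should be inherited by any process launched from this
    # shell -- boto3, the AWS CLI, and spark-submit all see the same env.
    print(os.environ.get("AWS_ACCESS_KEY_ID"))
    print("AWS_SECRET_ACCESS_KEY" in os.environ)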

I can verify that everything works by running:

    aws s3 ls mybucket

I can also verify with boto3 that it works in Python:

    import boto3

    resource = boto3.resource("s3", region_name="us-east-1")
    resource.Object("mybucket", "text/text.py") \
        .put(Body=open("text.py", "rb"), ContentType="text/x-py")

This works and I can see the file in the bucket.
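
As a further sanity check (a small sketch, nothing beyond standard boto3), I can print which access key boto3 actually resolved and compare it against the key echoed back in the Spark error below:

    import boto3

    # boto3 resolves credentials through its environment/config chain;
    # creds.access_key should match the exported AWS_ACCESS_KEY_ID.
    creds = boto3.Session().get_credentials()
    print(creds.access_key)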

However, when I do the same with Spark:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    spark_context = SparkContext()
    sql_context = SQLContext(spark_context)
    spark_context.textFile("s3://mybucket/my/path/*")

I get a nice

    Caused by: org.jets3t.service.S3ServiceException: Service Error Message.
    -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message:
    <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code>
    <Message>The AWS Access Key Id you provided does not exist in our
    records.</Message><AWSAccessKeyId>[MY_ACCESS_KEY]</AWSAccessKeyId>
    <RequestId>XXXXX</RequestId><HostId>xxxxxxx</HostId></Error>

This is how I submit the job locally:

    spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
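
For completeness, one workaround I'm aware of is handing the keys to Hadoop explicitly and reading through the s3a:// scheme from hadoop-aws instead of the jets3t-backed s3://. This is only a sketch, and the property names assume the Hadoop 2.7.x s3a connector:

    import os
    from pyspark import SparkContext

    spark_context = SparkContext()
    # Pass the same environment credentials to Hadoop explicitly and read
    # via s3a://, which is backed by hadoop-aws rather than jets3t.
    hadoop_conf = spark_context._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
    hadoop_conf.set("fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
    spark_context.textFile("s3a://mybucket/my/path/*")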

Why does it work with the command line and boto3, but Spark chokes?
