Hi,

If I remember correctly, Spark cannot use the IAM role credentials to access S3. It first uses the id/key pair in the environment; if that is not set, it uses the values in core-site.xml. So an IAM role is not useful for Spark. The same problem occurs if you want to use the distcp command in Hadoop.

Do you use curl http://169.254.169.254/latest/meta-data/iam/... to get the "temporary" credentials? If so, they cannot be used directly by Spark. For more information, take a look at http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
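To make the limitation concrete, here is a minimal Scala sketch of what the instance metadata service hands back (the role name is a hypothetical placeholder). The old s3n connector only reads fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey and has no property for the session token, so the temporary key pair alone is rejected by S3; this is consistent with the "does not exist in our records" error quoted below.

    import scala.io.Source

    object MetadataCredentials {
      def main(args: Array[String]): Unit = {
        // "my-ec2-role" is a placeholder; use the role attached to the instance.
        val url = "http://169.254.169.254/latest/meta-data/iam/security-credentials/my-ec2-role"

        // The metadata service returns a JSON document with AccessKeyId,
        // SecretAccessKey, Token and Expiration fields.
        val json = Source.fromURL(url).mkString

        def field(name: String): Option[String] =
          ("\"" + name + "\"\\s*:\\s*\"([^\"]+)\"").r
            .findFirstMatchIn(json).map(_.group(1))

        val accessKey = field("AccessKeyId")
        val secretKey = field("SecretAccessKey")
        val token     = field("Token")

        // s3n exposes fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey but no
        // property for the token, and an STS key pair used without its token
        // is rejected by S3: "The AWS Access Key Id you provided does not
        // exist in our records."
        println(s"accessKey=$accessKey token=${token.map(_.take(8))}...")
      }
    }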
sranga wrote:
> Thanks for the pointers.
> I verified that the access key-id/secret used are valid. However, the
> secret may contain "/" at times. The issues I am facing are as follows:
>
> - The EC2 instances are set up with an IAMRole and don't have a static
>   key-id/secret
> - All of the EC2 instances have access to S3 based on this role (I used
>   s3ls and s3cp commands to verify this)
> - I can get a "temporary" access key-id/secret based on the IAMRole, but
>   they generally expire in an hour
> - If Spark is not able to use the IAMRole credentials, I may have to
>   generate a static key-id/secret. This may or may not be possible in the
>   environment I am in (from a policy perspective)
>
> - Ranga
>
> On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@...> wrote:
>
>> Hi,
>> keep in mind that you're going to have a bad time if your secret key
>> contains a "/"
>> This is due to an old and stupid hadoop bug:
>> https://issues.apache.org/jira/browse/HADOOP-3733
>>
>> Best way is to regenerate the key so it does not include a "/"
>>
>> /Raf
>>
>> Akhil Das wrote:
>>
>> Try the following:
>>
>> 1. Set the access key and secret key in the sparkContext:
>>
>>     sparkContext.set("AWS_ACCESS_KEY_ID", yourAccessKey)
>>     sparkContext.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
>>
>> 2. Set the access key and secret key in the environment before starting
>> your application:
>>
>>     export AWS_ACCESS_KEY_ID=<your access>
>>     export AWS_SECRET_ACCESS_KEY=<your secret>
>>
>> 3. Set the access key and secret key inside the hadoop configuration:
>>
>>     val hadoopConf = sparkContext.hadoopConfiguration
>>     hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>     hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>>     hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>
>> 4. You can also try:
>>
>>     val lines = sparkContext.textFile("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@...> wrote:
>>
>>> Hi
>>>
>>> I am trying to access files/buckets in S3 and encountering a permissions
>>> issue. The buckets are configured to authenticate using an IAMRole
>>> provider.
>>> I have set the KeyId and Secret using environment variables
>>> (AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am still unable
>>> to access the S3 buckets.
>>>
>>> Before setting the access key and secret, the error was:
>>> "java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>> Key must be specified as the username or password (respectively) of a s3n
>>> URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
>>> properties (respectively)."
>>>
>>> After setting the access key and secret, the error is: "The AWS Access
>>> Key Id you provided does not exist in our records."
>>>
>>> The id/secret being set are the right values. This makes me believe that
>>> something else ("token", etc.) needs to be set as well.
>>> Any help is appreciated.
>>>
>>> - Ranga
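One more note on the "/" issue Rafal mentions above: besides regenerating the key, passing static credentials through the Hadoop configuration instead of embedding them in the s3n:// URL keeps the slash away from the URL parser that HADOOP-3733 describes. Here is a minimal sketch with placeholder bucket and key values; regenerating the key, as suggested, is still the safer fix:

    import org.apache.spark.{SparkConf, SparkContext}

    object S3nConfigCredentials {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("s3n-example"))

        // Placeholder values; a secret containing "/" breaks the
        // s3n://key:secret@bucket/path URL form (HADOOP-3733) but is passed
        // through untouched when set as configuration properties.
        val accessKey = sys.env.getOrElse("AWS_ACCESS_KEY_ID", "AKIA...")
        val secretKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", "abc/def...")

        // These are the property names named in the s3n error message above.
        val hadoopConf = sc.hadoopConfiguration
        hadoopConf.set("fs.s3n.awsAccessKeyId", accessKey)
        hadoopConf.set("fs.s3n.awsSecretAccessKey", secretKey)

        // No credentials in the URL, so the "/" never reaches the URL parser.
        println(sc.textFile("s3n://your-bucket/path/").count())
      }
    }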