One related question. Could I specify the "com.amazonaws.services.s3.AmazonS3Client" implementation for the "fs.s3.impl" parameter? Let me try that and update this thread with my findings.
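For reference, a minimal, untested sketch of that experiment, reusing the hadoopConf setup from the suggestions quoted below (the bucket and path are placeholders). One caveat: Hadoop instantiates the class named by fs.s3.impl as an org.apache.hadoop.fs.FileSystem, and com.amazonaws.services.s3.AmazonS3Client is the raw AWS SDK client rather than a FileSystem, so this setting is likely to be rejected when the filesystem is loaded:

    // Untested sketch of the fs.s3.impl experiment described above.
    // Caveat: Hadoop expects an org.apache.hadoop.fs.FileSystem subclass
    // here; AmazonS3Client is the raw AWS SDK client, not a FileSystem,
    // so this will likely fail with a cast error at load time.
    val hadoopConf = sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3.impl", "com.amazonaws.services.s3.AmazonS3Client")
    val lines = sparkContext.textFile("s3://yourBucket/path/")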
On Tue, Oct 14, 2014 at 10:48 AM, Ranga <sra...@gmail.com> wrote:

> Thanks for the input.
> Yes, I did use the "temporary" access credentials provided by the IAM
> role (also detailed in the link you provided). The session token needs
> to be specified, and I was looking for a way to set that in the header
> (which doesn't seem possible).
> Looks like a static key/secret is the only option.
>
> On Tue, Oct 14, 2014 at 10:32 AM, Gen <gen.tan...@gmail.com> wrote:
>
>> Hi,
>>
>> If I remember correctly, Spark cannot use IAM role credentials to
>> access S3. It first uses the id/key set in the environment; if those
>> are null, it uses the values in the core-site.xml file. So an IAM role
>> is not useful for Spark. The same problem happens if you want to use
>> the distcp command in Hadoop.
>>
>> Did you use curl http://169.254.169.254/latest/meta-data/iam/... to
>> get the "temporary" access credentials? If yes, they cannot be used
>> directly by Spark; for more information, take a look at
>> http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
>>
>> sranga wrote:
>>
>>> Thanks for the pointers.
>>> I verified that the access key-id/secret used are valid. However, the
>>> secret may contain "/" at times. The issues I am facing are as
>>> follows:
>>>
>>> - The EC2 instances are set up with an IAMRole () and don't have a
>>>   static key-id/secret
>>> - All of the EC2 instances have access to S3 based on this role
>>>   (I used the s3ls and s3cp commands to verify this)
>>> - I can get a "temporary" access key-id/secret based on the IAMRole,
>>>   but they generally expire in an hour
>>> - If Spark is not able to use the IAMRole credentials, I may have to
>>>   generate a static key-id/secret. This may or may not be possible
>>>   in the environment I am in (from a policy perspective)
>>>
>>> - Ranga
>>>
>>> On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@...> wrote:
>>>
>>>> Hi,
>>>> Keep in mind that you're going to have a bad time if your secret key
>>>> contains a "/". This is due to an old and stupid Hadoop bug:
>>>> https://issues.apache.org/jira/browse/HADOOP-3733
>>>>
>>>> The best way is to regenerate the key so it does not include a "/".
>>>>
>>>> /Raf
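If regenerating the key, as Rafal suggests above, is not possible, one workaround that has been suggested for HADOOP-3733 is to URL-encode the secret before embedding it in an s3n:// URL, since it is the URI parsing that trips over the "/". A minimal sketch with placeholder credentials and bucket; this is reported to help only on some Hadoop versions, so regenerating the key remains the more reliable fix:

    // Workaround sketch: URL-encode a secret key that contains "/".
    // yourAccessKey, yourSecretKey and yourBucket are placeholders.
    // URLEncoder turns "/" into "%2F" so the s3n URI parses cleanly.
    import java.net.URLEncoder
    val encodedSecret = URLEncoder.encode(yourSecretKey, "UTF-8")
    val lines = sparkContext.textFile(
      s"s3n://$yourAccessKey:$encodedSecret@yourBucket/path/")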
>>>> Akhil Das wrote:
>>>>
>>>> Try the following:
>>>>
>>>> 1. Set the access key and secret key in the sparkContext:
>>>>
>>>>     sparkContext.set("AWS_ACCESS_KEY_ID", yourAccessKey)
>>>>     sparkContext.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
>>>>
>>>> 2. Set the access key and secret key in the environment before
>>>> starting your application:
>>>>
>>>>     export AWS_ACCESS_KEY_ID=<your access>
>>>>     export AWS_SECRET_ACCESS_KEY=<your secret>
>>>>
>>>> 3. Set the access key and secret key inside the Hadoop configuration:
>>>>
>>>>     val hadoopConf = sparkContext.hadoopConfiguration
>>>>     hadoopConf.set("fs.s3.impl",
>>>>       "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>>>     hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>>>>     hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>>>
>>>> 4. You can also try:
>>>>
>>>>     val lines = sparkContext.textFile(
>>>>       "s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@...> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am trying to access files/buckets in S3 and encountering a
>>>>> permissions issue. The buckets are configured to authenticate using
>>>>> an IAMRole provider.
>>>>> I have set the KeyId and Secret using environment variables
>>>>> (AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am still
>>>>> unable to access the S3 buckets.
>>>>>
>>>>> Before setting the access key and secret, the error was:
>>>>> "java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>> Access Key must be specified as the username or password
>>>>> (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
>>>>> or fs.s3n.awsSecretAccessKey properties (respectively)."
>>>>>
>>>>> After setting the access key and secret, the error is: "The AWS
>>>>> Access Key Id you provided does not exist in our records."
>>>>>
>>>>> The id/secret being set are the right values. This makes me believe
>>>>> that something else ("token", etc.) needs to be set as well.
>>>>> Any help is appreciated.
>>>>>
>>>>> - Ranga
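A closing note on the temporary credentials discussed above: the metadata lookup Gen mentions can also be done from Scala rather than curl. A minimal sketch, assuming it runs on an EC2 instance with a single IAM role attached; the endpoint and the JSON fields are as documented for the EC2 instance metadata service:

    // Fetch the IAM role's temporary credentials from the EC2 instance
    // metadata service (the same data as Gen's curl command above).
    // Assumes this runs on an instance with exactly one role attached.
    import scala.io.Source
    val base =
      "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
    val roleName = Source.fromURL(base).mkString.trim // attached role name
    val credsJson = Source.fromURL(base + roleName).mkString
    println(credsJson) // JSON: AccessKeyId, SecretAccessKey, Token, Expiration
    // As discussed in this thread, s3n exposes no property for the session
    // Token, so these expiring credentials cannot be supplied through the
    // fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey settings alone.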