I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of its new instance-profile feature, which helps greatly with this as well. Without an instance profile, we got it working by copying a .aws/credentials file up to each node; we could easily automate that through the templates.
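For reference, the credentials file is just the standard AWS format; a minimal sketch with placeholder values (not real keys) looks like:

    # ~/.aws/credentials - placeholder values; substitute real keys,
    # or skip the file entirely and use an instance profile
    [default]
    aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX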
I don't need any additional libraries. We just need to change the
core-site.xml.

-Christian

On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for sharing this, Christian.
>
> What build of Spark are you using? If I understand correctly, if you are
> using Spark built against Hadoop 2.6+, then additional configs alone won't
> help, because additional libraries also need to be installed
> <https://issues.apache.org/jira/browse/SPARK-7481>.
>
> Nick
>
> On Thu, Nov 5, 2015 at 11:25 AM Christian <engr...@gmail.com> wrote:
>
>> We ended up reading from and writing to S3 a ton in our Spark jobs.
>> For this to work, we ended up having to add s3, s3n, and s3a key/secret
>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>
>> I thought I'd share what we did, since it might be worth adding these
>> settings to the Spark conf for out-of-the-box S3 functionality.
>>
>> We created:
>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>
>> We changed the contents from the original, adding in the following:
>>
>> <property>
>>   <name>fs.file.impl</name>
>>   <value>org.apache.hadoop.fs.LocalFileSystem</value>
>> </property>
>>
>> <property>
>>   <name>fs.hdfs.impl</name>
>>   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3.impl</name>
>>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3.awsAccessKeyId</name>
>>   <value>{{aws_access_key_id}}</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3.awsSecretAccessKey</name>
>>   <value>{{aws_secret_access_key}}</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3n.awsAccessKeyId</name>
>>   <value>{{aws_access_key_id}}</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3n.awsSecretAccessKey</name>
>>   <value>{{aws_secret_access_key}}</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3a.awsAccessKeyId</name>
>>   <value>{{aws_access_key_id}}</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3a.awsSecretAccessKey</name>
>>   <value>{{aws_secret_access_key}}</value>
>> </property>
>>
>> This change makes Spark on EC2 work out of the box for us. It took us
>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>> version 2.
>>
>> Best Regards,
>> Christian
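One caveat on the fs.s3a.* entries above: the s3a connector in hadoop-aws 2.6+ reads its credentials from fs.s3a.access.key and fs.s3a.secret.key rather than the awsAccessKeyId/awsSecretAccessKey style names used by s3 and s3n, so a sketch of the equivalent s3a properties would be:

    <!-- s3a credential property names as defined in hadoop-aws 2.6+;
         the {{...}} values are the same spark-ec2 template variables as above -->
    <property>
      <name>fs.s3a.access.key</name>
      <value>{{aws_access_key_id}}</value>
    </property>

    <property>
      <name>fs.s3a.secret.key</name>
      <value>{{aws_secret_access_key}}</value>
    </property>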
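On Nick's point about Hadoop 2.6+ builds: the libraries SPARK-7481 refers to are the hadoop-aws module and the AWS SDK it depends on. One way to pull them in at launch time, as a sketch:

    # fetches hadoop-aws and its AWS SDK dependency from Maven Central;
    # 2.6.0 is an assumed version - it should match the cluster's Hadoop build
    spark-shell --packages org.apache.hadoop:hadoop-aws:2.6.0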
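For completeness, the kind of reading and writing we do a ton of looks like the following sketch (the bucket and paths are hypothetical; sc is the usual SparkContext from spark-shell):

    // Hypothetical bucket and paths. s3n:// URIs resolve through the
    // NativeS3FileSystem configured above and pick up the fs.s3n.* credentials.
    val logs = sc.textFile("s3n://my-bucket/input/logs/*.gz")
    logs.filter(_.contains("ERROR")).saveAsTextFile("s3n://my-bucket/output/errors")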