I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because its new
instance-profile feature greatly helps with this as well. Without the
instance profile, we got it working by copying a .aws/credentials file up
to each node, which we could easily automate through the templates.

We don't need any additional libraries; we just need to change
core-site.xml.
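
For anyone who would rather not bake keys into core-site.xml, something
like this at runtime should work too. This is just a rough sketch from the
spark-shell; the key values are fake placeholders, and as far as I can
tell, s3a on newer Hadoop releases reads fs.s3a.access.key /
fs.s3a.secret.key rather than the awsAccessKeyId-style names:

  // Runtime alternative to templating core-site.xml: set the S3
  // credentials directly on the SparkContext's Hadoop configuration.
  // "AKIA..." and "SECRET..." are fake placeholder values.
  val hadoopConf = sc.hadoopConfiguration
  hadoopConf.set("fs.s3n.awsAccessKeyId", "AKIA...")
  hadoopConf.set("fs.s3n.awsSecretAccessKey", "SECRET...")
  // On newer Hadoop versions, s3a uses differently named keys:
  hadoopConf.set("fs.s3a.access.key", "AKIA...")
  hadoopConf.set("fs.s3a.secret.key", "SECRET...")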

-Christian

On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for sharing this, Christian.
>
> What build of Spark are you using? If I understand correctly, if you are
> using Spark built against Hadoop 2.6+ then additional configs alone won't
> help because additional libraries also need to be installed
> <https://issues.apache.org/jira/browse/SPARK-7481>.
>
> Nick
>
> On Thu, Nov 5, 2015 at 11:25 AM Christian <engr...@gmail.com> wrote:
>
>> We ended up reading and writing to S3 a ton in our Spark jobs.
>> For this to work, we had to add key/secret pairs for s3, s3n, and s3a,
>> and we also had to set fs.hdfs.impl.
>>
>> I thought I'd share what we did, since it might be worth adding these
>> settings to the Spark conf for out-of-the-box S3 functionality.
>>
>> We created:
>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>
>> We changed the contents from the original, adding in the following:
>>
>>   <property>
>>     <name>fs.file.impl</name>
>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.hdfs.impl</name>
>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3n.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3n.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3a.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3a.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>> This change makes Spark on EC2 work out of the box for us. It took us
>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>> version 2.
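>>
>> As a quick sanity check once the cluster is up, something like this in
>> the spark-shell should exercise the s3n config (the bucket name and
>> paths below are made-up placeholders, not anything from our setup):
>>
>>   // Read from and write back to S3 over s3n; the credentials come
>>   // from the core-site.xml template above.
>>   // "my-bucket" and both paths are hypothetical placeholders.
>>   val lines = sc.textFile("s3n://my-bucket/input/part-00000")
>>   println(lines.count())
>>   lines.saveAsTextFile("s3n://my-bucket/output/copy")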
>>
>> Best Regards,
>> Christian
>>
>
