Spark 1.5.1-hadoop1

On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> > I am using both 1.4.1 and 1.5.1.
>
> That's the Spark version. I'm wondering what version of Hadoop your Spark is built against.
>
> For example, when you download Spark <http://spark.apache.org/downloads.html> you have to select from a number of packages (under "Choose a package type"), and each is built against a different version of Hadoop. When Spark is built against Hadoop 2.6+, from my understanding, you need to install additional libraries <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>
> I'm confirming that this is what is happening in your case.
>
> Nick
>
> On Thu, Nov 5, 2015 at 12:17 PM Christian <engr...@gmail.com> wrote:
>
>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of the new feature for instance-profile, which greatly helps with this as well. Without the instance-profile, we got it working by copying a .aws/credentials file up to each node. We could easily automate that through the templates.
>>
>> I don't need any additional libraries. We just need to change the core-site.xml.
>>
>> -Christian
>>
>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Thanks for sharing this, Christian.
>>>
>>> What build of Spark are you using? If I understand correctly, if you are using Spark built against Hadoop 2.6+ then additional configs alone won't help, because additional libraries also need to be installed <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <engr...@gmail.com> wrote:
>>>
>>>> We ended up reading and writing to S3 a ton in our Spark jobs. For this to work, we ended up having to add the s3, s3n, and s3a key/secret pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>
>>>> I thought maybe I'd share what we did; it might be worth adding these to the Spark conf for out-of-the-box functionality with S3.
>>>>
>>>> We created:
>>>>
>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>
>>>> We changed the contents from the original, adding in the following:
>>>>
>>>> <property>
>>>>   <name>fs.file.impl</name>
>>>>   <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.hdfs.impl</name>
>>>>   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3.impl</name>
>>>>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3.awsAccessKeyId</name>
>>>>   <value>{{aws_access_key_id}}</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3.awsSecretAccessKey</name>
>>>>   <value>{{aws_secret_access_key}}</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3n.awsAccessKeyId</name>
>>>>   <value>{{aws_access_key_id}}</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3n.awsSecretAccessKey</name>
>>>>   <value>{{aws_secret_access_key}}</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3a.awsAccessKeyId</name>
>>>>   <value>{{aws_access_key_id}}</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>fs.s3a.awsSecretAccessKey</name>
>>>>   <value>{{aws_secret_access_key}}</value>
>>>> </property>
>>>>
>>>> This change makes Spark on EC2 work out of the box for us. It took us several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop version 2.
>>>>
>>>> Best Regards,
>>>> Christian
>>>
>>
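With the core-site.xml entries above in place, jobs on the cluster can address S3 directly by URI, with no credentials in the job itself. A minimal sketch from spark-shell, assuming the s3n:// scheme configured above; the bucket and paths are placeholders, not part of Christian's setup:

    // Read text files from S3; the s3n key/secret come from core-site.xml.
    val logs = sc.textFile("s3n://my-bucket/input/*.log")

    // Do some work and write the result back to S3.
    val errors = logs.filter(line => line.contains("ERROR"))
    errors.saveAsTextFile("s3n://my-bucket/output/errors")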
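For ad-hoc testing it is also possible to set the same keys at runtime on the SparkContext's Hadoop configuration instead of editing core-site.xml. This is only a sketch of an alternative, not what Christian's templates do; the key names mirror the XML above and the values are placeholders:

    // Runtime equivalent of the s3n entries in core-site.xml.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<your-access-key-id>")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<your-secret-access-key>")

As Nick notes above, on builds against Hadoop 2.6+ the S3 filesystem classes ship in a separate hadoop-aws module (see SPARK-7481), so setting the keys alone is not sufficient there.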