Recommended change to core-site.xml template

2015-11-05 Thread Christian
We ended up reading and writing to S3 a ton in our Spark jobs. For this to work, we had to add s3a and s3 key/secret pairs, and we also had to add fs.hdfs.impl to get these things to work. I thought I'd share what we did, since it might be worth adding these to the spark conf for out
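A minimal sketch of the kind of core-site.xml entries being described (property names follow the Hadoop s3/s3a filesystem connectors of that era; the key values are placeholders, and the exact set used in the thread is not fully shown):

```xml
<!-- Hypothetical core-site.xml additions for S3 access; values are placeholders. -->
<configuration>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```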

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
Thanks for sharing this, Christian. What build of Spark are you using? If I understand correctly, if you are using Spark built against Hadoop 2.6+ then additional configs alone won't help because additional libraries also need to be installed.
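For context, the additional libraries in question are the hadoop-aws and AWS SDK jars, which were split out of the main Hadoop artifact starting with Hadoop 2.6. One commonly cited workaround was pulling them in at launch time; a sketch (the versions shown are illustrative, not the ones the thread settled on):

```shell
# Hypothetical invocation: fetch the S3 filesystem jars at startup
# via --packages. Versions must match the Hadoop build in use.
spark-shell --packages org.apache.hadoop:hadoop-aws:2.6.0,com.amazonaws:aws-java-sdk:1.7.4
```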

Re: Recommended change to core-site.xml template

2015-11-05 Thread Shivaram Venkataraman
Thanks for investigating this. The right place to add these is the core-site.xml template we have at https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml and/or

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
> I am using both 1.4.1 and 1.5.1.

That's the Spark version. I'm wondering what version of Hadoop your Spark is built against. For example, when you download Spark you have to select from a number of packages (under "Choose a package type"), and each is

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of the new feature for instance-profile which greatly helps with this as well. Without the instance-profile, we got it working by copying a .aws/credentials file up to each node. We could easily automate that through the templates.
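The credentials file being copied to each node is the standard AWS shared-credentials format; a sketch with placeholder values (the instance-profile approach in Spark 1.5.1 avoids distributing this file at all):

```ini
# ~/.aws/credentials -- placeholder values; not needed when an
# instance profile supplies credentials via instance metadata.
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```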

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
I created the cluster with the following: --hadoop-major-version=2 --spark-version=1.4.1 from: spark-1.5.1-bin-hadoop1 Are you saying there might be different behavior if I download spark-1.5.1-hadoop-2.6 and create my cluster?
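Putting the flags mentioned above into a full launch command, a hypothetical spark-ec2 invocation might look like this (cluster name, key pair, and identity file are placeholders; the flags are the ones quoted in the thread):

```shell
# Hypothetical spark-ec2 launch; only --hadoop-major-version and
# --spark-version are taken from the thread, the rest are placeholders.
./spark-ec2 \
  --key-pair=my-key \
  --identity-file=my-key.pem \
  --hadoop-major-version=2 \
  --spark-version=1.4.1 \
  launch my-cluster
```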

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Spark 1.5.1-hadoop1

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Even with the changes I mentioned above?

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Oh right. I forgot about the libraries being removed.

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
I might be mistaken, but yes, even with the changes you mentioned you will not be able to access S3 if Spark is built against Hadoop 2.6+ unless you install additional libraries. The issue is explained in SPARK-7481 and SPARK-7442.

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you cannot access S3, unfortunately.