I created the cluster with the following flags:

--hadoop-major-version=2
--spark-version=1.4.1

using the spark-1.5.1-bin-hadoop1 package.

Are you saying there might be different behavior if I download
spark-1.5.1-bin-hadoop2.6 and create my cluster from that instead?

On Thu, Nov 5, 2015 at 1:28 PM, Christian <engr...@gmail.com> wrote:

> Spark 1.5.1-hadoop1
>
> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> > I am using both 1.4.1 and 1.5.1.
>>
>> That's the Spark version. I'm wondering what version of Hadoop your Spark
>> is built against.
>>
>> For example, when you download Spark
>> <http://spark.apache.org/downloads.html> you have to select from a
>> number of packages (under "Choose a package type"), and each is built
>> against a different version of Hadoop. When Spark is built against Hadoop
>> 2.6+, from my understanding, you need to install additional libraries
>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
>> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
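>>
>> If you do end up on a Hadoop 2.6+ build, a minimal sketch of pulling
>> those classes in at launch time (the artifact version here is an
>> assumption; it should match your Hadoop build):
>>
>>   // Launch with the hadoop-aws module on the classpath, for example:
>>   //   spark/bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.6.0
>>   // Then read from S3 as usual (bucket and path are placeholders):
>>   sc.textFile("s3n://some-bucket/some/path").count()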
>>
>> I'm trying to confirm whether that's what is happening in your case.
>>
>> Nick
>>
>> On Thu, Nov 5, 2015 at 12:17 PM Christian <engr...@gmail.com> wrote:
>>
>>> I am using both 1.4.1 and 1.5.1. In the end, we went with 1.5.1 because
>>> of its new instance-profile feature, which helps greatly with this as
>>> well. Without an instance profile, we got it working by copying a
>>> .aws/credentials file up to each node, which we could easily automate
>>> through the templates.
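>>>
>>> For reference, that file is just the standard AWS credentials format
>>> (values elided here):
>>>
>>>   [default]
>>>   aws_access_key_id = ...
>>>   aws_secret_access_key = ...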
>>>
>>> We don't need any additional libraries; we just need to change
>>> core-site.xml.
>>>
>>> -Christian
>>>
>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> Thanks for sharing this, Christian.
>>>>
>>>> What build of Spark are you using? If I understand correctly, if you
>>>> are using Spark built against Hadoop 2.6+ then additional configs alone
>>>> won't help because additional libraries also need to be installed
>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <engr...@gmail.com> wrote:
>>>>
>>>>> We ended up reading and writing to S3 a ton in our Spark jobs. To make
>>>>> this work, we had to add s3, s3n, and s3a key/secret pairs. We also had
>>>>> to set fs.hdfs.impl to get everything working.
>>>>>
>>>>> I thought I'd share what we did, since it might be worth adding these
>>>>> to the Spark conf for out-of-the-box S3 functionality.
>>>>>
>>>>> We created:
>>>>>
>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>
>>>>> We changed the contents from the original, adding the following:
>>>>>
>>>>>   <property>
>>>>>     <name>fs.file.impl</name>
>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.hdfs.impl</name>
>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.impl</name>
>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>>   <!-- Note: s3a uses different credential property names than s3/s3n. -->
>>>>>   <property>
>>>>>     <name>fs.s3a.access.key</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3a.secret.key</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>> This change makes Spark on EC2 work with S3 out of the box for us. It
>>>>> took us several days to figure out. It works for 1.4.1 and 1.5.1 on
>>>>> Hadoop major version 2.
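>>>>>
>>>>> A quick sanity check from spark-shell looks like this (bucket and
>>>>> prefix are placeholders):
>>>>>
>>>>>   // With the core-site.xml above in place, S3 reads should work
>>>>>   // without any extra credential setup:
>>>>>   sc.textFile("s3n://some-bucket/some/prefix/*").count()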
>>>>>
>>>>> Best Regards,
>>>>> Christian
>>>>>
>>>>
>>>
>
