Thanks, Kapil - this works :-)  I can now run the SparkPi example
successfully.

root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Pi is roughly 3.14318
root@ip-172-31-60-53:~#

I'm now trying to run the same example with the spark-submit '--master'
option set to either 'yarn-cluster' or 'yarn-client', but I keep getting
the same error:

root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

But on my spark-master/0 machine there is no /etc/hadoop/conf directory,
so what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
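(For what it's worth, on a box with a regular Hadoop install I would have
guessed they should point at the directory holding core-site.xml and
yarn-site.xml, something like:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR

but that's just an assumption on my part - that directory doesn't exist
on spark-master/0, so I don't know what the equivalent location is under
this charm.)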
Do I need to add a juju relation between spark-master and ...
yarn-hdfs-master, to make them aware of each other?
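(Again, purely a guess on my part - I don't know the actual relation
names these charms expose - but I was imagining something like:

juju add-relation spark-master yarn-hdfs-master

assuming the two charms provide compatible interfaces.)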
Thanks for any help,

Ken


On 28 January 2015 at 19:32, Kapil Thangavelu <
kapil.thangav...@canonical.com> wrote:

>
> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ke...@theasi.co> wrote:
>
>> Hi Sam/Amir,
>>
>> I've been able to 'juju ssh spark-master/0' and I successfully ran the
>> two simple examples for pyspark and spark-shell,
>>
>> ./bin/pyspark
>> >>> sc.parallelize(range(1000)).count()
>> 1000
>>
>> ./bin/spark-shell
>> scala> sc.parallelize(1 to 1000).count()
>> 1000
>>
>> Now I want to run some of the spark examples in the spark-examples*.jar
>> file, which I have on my local machine. How do I copy the jar file from
>> my local machine to the AWS machine?
>>
>> I have tried 'scp' and 'juju scp' from the local command-line but both
>> fail (below),
>>
>> root@adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>> lost connection
>> root@adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>
>> Any ideas?
>>
>
> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
>
>>
>> Ken
>>
>> On 28 January 2015 at 17:29, Samuel Cozannet <
>> samuel.cozan...@canonical.com> wrote:
>>
>>> Glad it worked!
>>>
>>> I'll make a merge request to the upstream so that it works natively
>>> from the store asap.
>>>
>>> Thanks for catching that!
>>>
>>> Best,
>>> Samuel
>>>
>>> --
>>> Samuel Cozannet
>>> Cloud, Big Data and IoT Strategy Team
>>> Business Development - Cloud and ISV Ecosystem
>>> Changing the Future of Cloud
>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
>>> Juju <https://jujucharms.com>
>>> samuel.cozan...@canonical.com
>>> mob: +33 616 702 389
>>> skype: samnco
>>> Twitter: @SaMnCo_23
>>>
>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ke...@theasi.co> wrote:
>>>
>>>> Hi Sam (and Maarten),
>>>>
>>>> Cloning Spark 1.2.0 from github seems to have worked!
>>>> I can install the Spark examples afterwards.
>>>>
>>>> Thanks for all your help!
>>>>
>>>> Yes - Andrew and Angie both say 'hi' :-)
>>>>
>>>> Best Regards,
>>>>
>>>> Ken
>>>>
>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>> samuel.cozan...@canonical.com> wrote:
>>>>
>>>>> Hey Ken,
>>>>>
>>>>> So I had a closer look at your Spark problem and found out what went
>>>>> wrong.
>>>>>
>>>>> The charm available on the charmstore is trying to download Spark
>>>>> 1.0.2, and the versions available on the Apache website are 1.1.0,
>>>>> 1.1.1 and 1.2.0.
>>>>>
>>>>> There is another version of the charm available on GitHub that will
>>>>> actually deploy 1.2.0.
>>>>>
>>>>> 1. On your computer, create the folders below and cd into them:
>>>>>
>>>>> cd ~
>>>>> mkdir charms
>>>>> mkdir charms/trusty
>>>>> cd charms/trusty
>>>>>
>>>>> 2. Branch the Spark charm:
>>>>>
>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>
>>>>> 3. Deploy Spark from the local repository:
>>>>>
>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>
>>>>> Worked on AWS for me just minutes ago. Let me know how it goes for
>>>>> you. Note that this version of the charm does NOT install the Spark
>>>>> examples. The files are present though, so you'll find them in
>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>
>>>>> Hope that helps...
>>>>> Let me know if it works for you!
>>>>>
>>>>> Best,
>>>>> Sam
>>>>>
>>>>> --
>>>>> Samuel Cozannet
>>>>> Cloud, Big Data and IoT Strategy Team
>>>>> Business Development - Cloud and ISV Ecosystem
>>>>> Changing the Future of Cloud
>>>>> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
>>>>> Juju <https://jujucharms.com>
>>>>> samuel.cozan...@canonical.com
>>>>> mob: +33 616 702 389
>>>>> skype: samnco
>>>>> Twitter: @SaMnCo_23
>>>>>
>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I'm completely new to juju so any help is appreciated.
>>>>>>
>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>
>>>>>> I've managed to install the 'data-analytics-with-sql-like' bundle
>>>>>> (using this command):
>>>>>>
>>>>>> juju quickstart bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>
>>>>>> This is very impressive, and gives me virtually everything that I
>>>>>> want (hadoop, hive, etc) - but I also need Spark.
>>>>>>
>>>>>> The Spark charm (http://manage.jujucharms.com/~asanjar/trusty/spark)
>>>>>> and bundle (http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster)
>>>>>> however do not seem stable or available, and I can't figure out how
>>>>>> to install them.
>>>>>>
>>>>>> Should I just download and install the Spark tar-ball on the nodes
>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Ken
--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju