Hey Ken,

Yes, you need to create the relation between the two entities so they
know about each other.

Looking at the list of hooks for the charm
<https://github.com/Archethought/spark-charm/tree/master/hooks>, you can see
there are two hooks related to YARN/Hadoop: namenode-relation-changed
<https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
and resourcemanager-relation-changed
<https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed>.
Looking deeper into the code, you'll notice they both reference a function in
bdutils.py called setHadoopEnvVar(), which, judging by its name, should
set HADOOP_CONF_DIR.
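To make the idea concrete, here is a rough Python sketch of what a helper
like setHadoopEnvVar() might do. This is my guess at the mechanism, not the
actual code from bdutils.py; the /etc/hadoop/conf path and the
/etc/environment persistence are assumptions.

```python
import os

# Hypothetical sketch of a setHadoopEnvVar()-style helper; the real
# implementation lives in the charm's bdutils.py and may differ.
HADOOP_CONF_DIR = "/etc/hadoop/conf"  # conventional location, an assumption

def set_hadoop_env_var(env_file="/etc/environment"):
    """Export HADOOP_CONF_DIR for this process and persist it for new shells."""
    os.environ["HADOOP_CONF_DIR"] = HADOOP_CONF_DIR
    # Appending to /etc/environment (root required) makes the variable
    # visible to later login shells, which is what spark-submit checks.
    with open(env_file, "a") as f:
        f.write('HADOOP_CONF_DIR="%s"\n' % HADOOP_CONF_DIR)
```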

There are two relations (namenode and resourcemanager), so add both of them.
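Something along these lines should wire them up. Note the endpoint names
below are inferred from the charm's hook file names, and yarn-hdfs-master
is the service name from your deployment; check `juju status` and adjust
both to match what you actually have:

```shell
# Endpoint names inferred from the hook names (namenode-relation-changed,
# resourcemanager-relation-changed); service names may differ in your
# environment -- verify with `juju status` before running.
juju add-relation spark-master:namenode yarn-hdfs-master
juju add-relation spark-master:resourcemanager yarn-hdfs-master
```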

Note that I haven't tested this myself, but I expect it will fix the
problem. If it doesn't, please come back to us...

Thanks!
Sam



--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozan...@canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ke...@theasi.co> wrote:

>
> Thanks, Kapil - this works :-)
>
> I can now run the SparkPi example successfully.
> root@ip-172-31-60-53:~# spark-submit --class
> org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Pi is roughly 3.14318
>
> root@ip-172-31-60-53:~#
>
> I'm now trying to run the same example with the spark-submit '--master'
> option set to either 'yarn-cluster' or 'yarn-client'
> but I keep getting the same error :
>
> root@ip-172-31-60-53:~# spark-submit --class
> org.apache.spark.examples.SparkPi     --master yarn-client
> --num-executors 3     --driver-memory 1g     --executor-memory 1g
> --executor-cores 1     --queue thequeue     lib/spark-examples*.jar     10
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> Exception in thread "main" java.lang.Exception: When running with master
> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
> environment.
>
> But on my spark-master/0 machine there is no /etc/hadoop/conf directory.
> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be ?
> Do I need to add a juju relation between spark-master and ...
> yarn-hdfs-master to make them aware of each other ?
>
> Thanks for any help,
>
> Ken
>
>
>
>
>
> On 28 January 2015 at 19:32, Kapil Thangavelu <
> kapil.thangav...@canonical.com> wrote:
>
>>
>>
>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ke...@theasi.co> wrote:
>>
>>>
>>> Hi Sam/Amir,
>>>
>>>     I've been able to 'juju ssh spark-master/0' and I successfully ran
>>> the two
>>> simple examples for pyspark and spark-shell,
>>>
>>>     ./bin/pyspark
>>>     >>> sc.parallelize(range(1000)).count()
>>>     1000
>>>
>>>     ./bin/spark-shell
>>>      scala> sc.parallelize(1 to 1000).count()
>>>     1000
>>>
>>>
>>> Now I want to run some of the spark examples in the spark-examples*.jar
>>> file, which I have on my local machine. How do I copy the jar file from
>>> my local machine to the AWS machine ?
>>>
>>> I have tried 'scp' and 'juju scp' from the local command-line but both
>>> fail (below),
>>>
>>> root@adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>> ubuntu@ip-172-31-59:/tmp
>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>>> lost connection
>>> root@adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>> ubuntu@ip-172-31-59:/tmp
>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>
>>> Any ideas ?
>>>
>>
>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
>>
>>>
>>> Ken
>>>
>>>
>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>> samuel.cozan...@canonical.com> wrote:
>>>
>>>> Glad it worked!
>>>>
>>>> I'll make a merge request upstream so that it works natively
>>>> from the charm store ASAP.
>>>>
>>>> Thanks for catching that!
>>>> Samuel
>>>>
>>>>
>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>
>>>>>
>>>>> Hi Sam (and Maarten),
>>>>>
>>>>>     Cloning Spark 1.2.0 from github seems to have worked!
>>>>>     I can install the Spark examples afterwards.
>>>>>
>>>>>     Thanks for all your help!
>>>>>
>>>>>     Yes - Andrew and Angie both say 'hi'  :-)
>>>>>
>>>>>     Best Regards,
>>>>>
>>>>> Ken
>>>>>
>>>>>
>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>
>>>>>> Hey Ken,
>>>>>>
>>>>>> So I had a closer look at your Spark problem and found out what went
>>>>>> wrong.
>>>>>>
>>>>>> The charm available on the charm store tries to download Spark
>>>>>> 1.0.2, but the versions available on the Apache website are 1.1.0, 1.1.1,
>>>>>> and 1.2.0.
>>>>>>
>>>>>> There is another version of the charm, available on GitHub, that
>>>>>> will actually deploy 1.2.0:
>>>>>>
>>>>>> 1. On your computer, create the folders below and change into them:
>>>>>>
>>>>>> cd ~
>>>>>> mkdir charms
>>>>>> mkdir charms/trusty
>>>>>> cd charms/trusty
>>>>>>
>>>>>> 2. Clone the Spark charm:
>>>>>>
>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>
>>>>>> 3. Deploy Spark from the local repository:
>>>>>>
>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>>
>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes for
>>>>>> you. Note that this version of the charm does NOT install the Spark
>>>>>> examples. The files are present though, so you'll find them in
>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>>
>>>>>> Hope that helps...
>>>>>> Let me know if it works for you!
>>>>>>
>>>>>> Best,
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ke...@theasi.co>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I'm completely new to juju so any help is appreciated.
>>>>>>>
>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>
>>>>>>> I've managed to install the 'data-analytics-with-sql-like' bundle
>>>>>>> (using this command)
>>>>>>>
>>>>>>>     juju quickstart
>>>>>>> bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>
>>>>>>> This is very impressive, and gives me virtually everything that I
>>>>>>> want
>>>>>>> (hadoop, hive, etc) - but I also need Spark.
>>>>>>>
>>>>>>> The Spark charm (http://manage.jujucharms.com/~asanjar/trusty/spark
>>>>>>> )
>>>>>>> and bundle (
>>>>>>> http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster)
>>>>>>> however do not seem stable or available and I can't figure out how
>>>>>>> to install them.
>>>>>>>
>>>>>>> Should I just download and install the Spark tar-ball on the nodes
>>>>>>> in my AWS cluster, or is there a better way to do this ?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Ken
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Juju mailing list
>>>>>>> Juju@lists.ubuntu.com
>>>>>>> Modify settings or unsubscribe at:
>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
