Ok - Sam, I'll try this and let you know.

Thanks again for all your help,

Best Regards,

Ken



On 30 January 2015 at 18:09, Samuel Cozannet <samuel.cozan...@canonical.com>
wrote:

> I'll have a look asap, but probably not before Tuesday.
>
> This may be a "my gut tells me" suggestion but, if you have the time, try to
> collocate YARN and Spark; that will guarantee you have YARN_CONF_DIR
> set. I am 90% sure it will fix your problem.
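>
> A minimal sketch of the collocation, untested (the machine number is
> hypothetical - substitute the one yarn-hdfs-master shows in juju status):
>
> # place a spark unit on the machine already running YARN
> juju add-unit spark-master --to 3
>
> That way Spark lands on a box that already has the Hadoop/YARN config on
> disk.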
>
> YARN itself will not eat many resources, so you should be all right, and it
> may allow you to move forward instead of being stuck.
>
> Best,
> Sam
>
> Best,
> Samuel
>
> --
> Samuel Cozannet
> Cloud, Big Data and IoT Strategy Team
> Business Development - Cloud and ISV Ecosystem
> Changing the Future of Cloud
> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
> Juju <https://jujucharms.com>
> samuel.cozan...@canonical.com
> mob: +33 616 702 389
> skype: samnco
> Twitter: @SaMnCo_23
>
> On Fri, Jan 30, 2015 at 7:01 PM, Ken Williams <ke...@theasi.co> wrote:
>
>> Hi Sam,
>>
>>     Attached is my bundles.yaml file.
>>
>>     Also, there is no file 'directories.sh' on my spark-master/0 machine
>> (see below),
>>
>> ubuntu@ip-172-31-54-245:~$ ls -l /etc/profile.d/
>> total 12
>> -rw-r--r-- 1 root root 1559 Jul 29  2014 Z97-byobu.sh
>> -rwxr-xr-x 1 root root 2691 Oct  6 13:19 Z99-cloud-locale-test.sh
>> -rw-r--r-- 1 root root  663 Apr  7  2014 bash_completion.sh
>> ubuntu@ip-172-31-54-245:~$
>>
>>
>>     Many thanks again for your help,
>>
>> Ken
>>
>>
>> On 30 January 2015 at 15:45, Samuel Cozannet <
>> samuel.cozan...@canonical.com> wrote:
>>
>>> Hey,
>>>
>>> Can you send the bundle you're using? In the GUI, bottom right, the
>>> "export" button should give you a bundles.yaml file. Please send that to
>>> me, so I can bootstrap the same environment you are playing with.
>>>
>>> Also:
>>> * can you let me know if you have a file /etc/profile.d/directories.sh?
>>> * if yes, can you source it from your command line, then run the spark
>>> command again, and let me know? A rough sketch follows below.
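>>>
>>> Something like this, assuming the file exists and exports the Hadoop
>>> variables (untested):
>>>
>>> # load the variables into the current shell, then retry the submit
>>> source /etc/profile.d/directories.sh
>>> echo "$HADOOP_CONF_DIR" "$YARN_CONF_DIR"
>>> spark-submit --class org.apache.spark.examples.SparkPi \
>>>     --master yarn-client lib/spark-examples*.jar 10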
>>>
>>> Thx,
>>> Sam
>>>
>>> On Fri, Jan 30, 2015 at 3:46 PM, Ken Williams <ke...@theasi.co> wrote:
>>>
>>>> Ok - I have been able to add the relation using this,
>>>>
>>>>     juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>>>
>>>> But I still cannot see a /etc/hadoop/conf directory on the spark-master
>>>> machine, so I still get the same error about HADOOP_CONF_DIR and
>>>> YARN_CONF_DIR (below):
>>>>
>>>>
>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>>> Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>>> at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:177)
>>>> at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:81)
>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> root@ip-172-31-60-53:~#
>>>>
>>>> Should there be a /etc/hadoop/conf directory?
>>>>
>>>> Thanks for any help,
>>>>
>>>> Ken
>>>>
>>>>
>>>> On 30 January 2015 at 12:59, Samuel Cozannet <
>>>> samuel.cozan...@canonical.com> wrote:
>>>>
>>>>> Have you tried without ':master':
>>>>>
>>>>> juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>>>>
>>>>> I think the Spark master consumes the relation but doesn't have to
>>>>> expose its 'master' relation here.
>>>>>
>>>>> Rule of thumb: when a relation is unambiguous on one of its ends, there
>>>>> is no requirement to specify the endpoint when adding it. For example:
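>>>>>
>>>>> # the canonical demo, assuming the standard wordpress and mysql charms:
>>>>> # both forms should be equivalent, since neither endpoint is ambiguous
>>>>> juju add-relation wordpress mysql
>>>>> juju add-relation wordpress:db mysql:db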
>>>>>
>>>>> Another option if this doesn't work is to use the GUI to create the
>>>>> relation. It will give you a dropdown of available relationships between
>>>>> entities.
>>>>>
>>>>> Let me know how it goes,
>>>>> Thx,
>>>>> Sam
>>>>>
>>>>> On Fri, Jan 30, 2015 at 1:09 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>>     I understand what you are saying, but when I try to add the 2
>>>>>> relations I get this error:
>>>>>>
>>>>>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:resourcemanager spark-master:master
>>>>>> ERROR no relations found
>>>>>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:namenode spark-master:master
>>>>>> ERROR no relations found
>>>>>>
>>>>>>   Am I adding the relations right?
>>>>>>
>>>>>>   Attached is my 'juju status' file.
>>>>>>
>>>>>>   Thanks for all your help,
>>>>>>
>>>>>> Ken
>>>>>>
>>>>>> On 30 January 2015 at 11:16, Samuel Cozannet <
>>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>>
>>>>>>> Hey Ken,
>>>>>>>
>>>>>>> Yes, you need to create the relationship between the 2 entities so
>>>>>>> they know about each other.
>>>>>>>
>>>>>>> Looking at the list of hooks for the charm
>>>>>>> <https://github.com/Archethought/spark-charm/tree/master/hooks> you
>>>>>>> can see there are 2 hooks named namenode-relation-changed
>>>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
>>>>>>>  and resourcemanager-relation-changed
>>>>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed>
>>>>>>>  which
>>>>>>> are related to YARN/Hadoop.
>>>>>>> Looking deeper into the code, you'll notice they reference a function
>>>>>>> found in bdutils.py called "setHadoopEnvVar()", which, based on its
>>>>>>> name, should set HADOOP_CONF_DIR.
>>>>>>>
>>>>>>> There are 2 relations, so add both of them.
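>>>>>>>
>>>>>>> Once both are added, a quick (untested) sanity check on the unit:
>>>>>>>
>>>>>>> # login shell so /etc/profile.d is sourced; vars set by the hooks
>>>>>>> juju ssh spark-master/0 'bash -lc env' | grep -E 'HADOOP|YARN'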
>>>>>>>
>>>>>>> Note that I didn't test this myself, but I expect this should fix
>>>>>>> the problem. If it doesn't, please come back to us...
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Sam
>>>>>>>
>>>>>>> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ke...@theasi.co>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, Kapil - this works :-)
>>>>>>>>
>>>>>>>> I can now run the SparkPi example successfully.
>>>>>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>>>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>>>>>>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>>> Pi is roughly 3.14318
>>>>>>>> root@ip-172-31-60-53:~#
>>>>>>>>
>>>>>>>> I'm now trying to run the same example with the spark-submit
>>>>>>>> '--master' option set to either 'yarn-cluster' or 'yarn-client'
>>>>>>>> but I keep getting the same error :
>>>>>>>>
>>>>>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>>>>>>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>>>>>>> Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>>>>>>>
>>>>>>>> But on my spark-master/0 machine there is no /etc/hadoop/conf directory.
>>>>>>>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>>>>>>>> Do I need to add a juju relation between spark-master and
>>>>>>>> yarn-hdfs-master to make them aware of each other?
>>>>>>>>
>>>>>>>> Thanks for any help,
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>
>>>>>>>> On 28 January 2015 at 19:32, Kapil Thangavelu <
>>>>>>>> kapil.thangav...@canonical.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ke...@theasi.co>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Sam/Amir,
>>>>>>>>>>
>>>>>>>>>>     I've been able to 'juju ssh spark-master/0' and I
>>>>>>>>>> successfully ran the two
>>>>>>>>>> simple examples for pyspark and spark-shell,
>>>>>>>>>>
>>>>>>>>>>     ./bin/pyspark
>>>>>>>>>>     >>> sc.parallelize(range(1000)).count()
>>>>>>>>>>     1000
>>>>>>>>>>
>>>>>>>>>>     ./bin/spark-shell
>>>>>>>>>>      scala> sc.parallelize(1 to 1000).count()
>>>>>>>>>>     1000
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Now I want to run some of the Spark examples in the
>>>>>>>>>> spark-examples*.jar file, which I have on my local machine. How do I
>>>>>>>>>> copy the jar file from my local machine to the AWS machine?
>>>>>>>>>>
>>>>>>>>>> I have tried 'scp' and 'juju scp' from the local command-line but
>>>>>>>>>> both fail (below),
>>>>>>>>>>
>>>>>>>>>> root@adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>>>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>>>>>>>>>> lost connection
>>>>>>>>>> root@adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>>>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>>>>>>>
>>>>>>>>>> Any ideas ?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
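>>>>>>>>>
>>>>>>>>> juju resolves the unit name to the right host and key, so the raw EC2
>>>>>>>>> hostname isn't needed. Something like this should then verify the copy:
>>>>>>>>>
>>>>>>>>> juju ssh spark-master/0 'ls -l /tmp/spark-examples*.jar'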
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ken
>>>>>>>>>>
>>>>>>>>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>>>>>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Glad it worked!
>>>>>>>>>>>
>>>>>>>>>>> I'll make a merge request to the upstream so that it works
>>>>>>>>>>> natively from the store asap.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for catching that!
>>>>>>>>>>> Samuel
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ke...@theasi.co>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam (and Maarten),
>>>>>>>>>>>>
>>>>>>>>>>>>     Cloning Spark 1.2.0 from github seems to have worked!
>>>>>>>>>>>>     I can install the Spark examples afterwards.
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks for all your help!
>>>>>>>>>>>>
>>>>>>>>>>>>     Yes - Andrew and Angie both say 'hi'  :-)
>>>>>>>>>>>>
>>>>>>>>>>>>     Best Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Ken
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>>>>>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Ken,
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I had a closer look at your Spark problem and found out
>>>>>>>>>>>>> what went wrong.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The charm available on the charmstore is trying to download
>>>>>>>>>>>>> Spark 1.0.2, and the versions available on the Apache website are 
>>>>>>>>>>>>> 1.1.0,
>>>>>>>>>>>>> 1.1.1 and 1.2.0.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is another version of the charm available on GitHub that
>>>>>>>>>>>>> will actually deploy 1.2.0.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. On your computer, create the folders below and cd into them:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cd ~
>>>>>>>>>>>>> mkdir charms
>>>>>>>>>>>>> mkdir charms/trusty
>>>>>>>>>>>>> cd charms/trusty
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Clone the Spark charm:
>>>>>>>>>>>>>
>>>>>>>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. Deploy Spark from the local repository:
>>>>>>>>>>>>>
>>>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>>>>>>>> juju add-relation spark-master:master spark-slave:slave
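>>>>>>>>>>>>>
>>>>>>>>>>>>> Once the units settle, juju status should list the master/slave
>>>>>>>>>>>>> relation, e.g. (service name filtering assumed):
>>>>>>>>>>>>>
>>>>>>>>>>>>> juju status spark-master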
>>>>>>>>>>>>>
>>>>>>>>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes
>>>>>>>>>>>>> for you. Note that this version of the charm does NOT install the 
>>>>>>>>>>>>> Spark
>>>>>>>>>>>>> examples. The files are present though, so you'll find them in
>>>>>>>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope that helps...
>>>>>>>>>>>>> Let me know if it works for you!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ke...@theasi.co
>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm completely new to Juju, so any help is appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've managed to install the 'data-analytics-with-sql-like'
>>>>>>>>>>>>>> bundle
>>>>>>>>>>>>>> (using this command)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     juju quickstart
>>>>>>>>>>>>>> bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is very impressive, and gives me virtually everything
>>>>>>>>>>>>>> that I want (Hadoop, Hive, etc.) - but I also need Spark.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Spark charm (
>>>>>>>>>>>>>> http://manage.jujucharms.com/~asanjar/trusty/spark)
>>>>>>>>>>>>>> and bundle (
>>>>>>>>>>>>>> http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>> however do not seem stable or available, and I can't figure out
>>>>>>>>>>>>>> how to install them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should I just download and install the Spark tar-ball on the
>>>>>>>>>>>>>> nodes
>>>>>>>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju
