Ok - I have been able to add the relation using this:

    juju add-relation yarn-hdfs-master:resourcemanager spark-master
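For what it's worth, this is the manual workaround I would try if the relation
never creates the config - the paths here are my guesses, not something I've
verified against the charm:

```shell
# Guessed workaround, not verified: copy the Hadoop client config across
# from the YARN master, then point Spark at it before running spark-submit.
#
#   (first copy /etc/hadoop/conf from a yarn-hdfs-master unit onto
#    spark-master/0, e.g. with juju scp)
#
# Then, on spark-master/0:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
```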
But I still cannot see an /etc/hadoop/conf directory on the spark-master
machine, so I still get the same error about HADOOP_CONF_DIR and
YARN_CONF_DIR (below):

root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:177)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:81)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
root@ip-172-31-60-53:~#

Should there be an /etc/hadoop/conf directory?

Thanks for any help,

Ken

On 30 January 2015 at 12:59, Samuel Cozannet <samuel.cozan...@canonical.com> wrote:

> Have you tried without ':master':
>
>     juju add-relation yarn-hdfs-master:resourcemanager spark-master
>
> I think Spark master consumes the relationship but doesn't have to expose
> its master relationship.
>
> Rule of thumb: when a relation is unambiguous on one of its ends, there
> is no requirement to specify it when adding it.
>
> Another option, if this doesn't work, is to use the GUI to create the
> relation. It will give you a dropdown of available relationships between
> entities.
> Let me know how it goes,
> Thx,
> Sam
>
> Best,
> Samuel
>
> --
> Samuel Cozannet
> Cloud, Big Data and IoT Strategy Team
> Business Development - Cloud and ISV Ecosystem
> Changing the Future of Cloud
> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
> Juju <https://jujucharms.com>
> samuel.cozan...@canonical.com
> mob: +33 616 702 389
> skype: samnco
> Twitter: @SaMnCo_23
>
> On Fri, Jan 30, 2015 at 1:09 PM, Ken Williams <ke...@theasi.co> wrote:
>
>> Hi Sam,
>>
>> I understand what you are saying, but when I try to add the two
>> relations I get this error:
>>
>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:resourcemanager spark-master:master
>> ERROR no relations found
>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:namenode spark-master:master
>> ERROR no relations found
>>
>> Am I adding the relations right?
>>
>> Attached is my 'juju status' file.
>>
>> Thanks for all your help,
>>
>> Ken
>>
>> On 30 January 2015 at 11:16, Samuel Cozannet <samuel.cozan...@canonical.com> wrote:
>>
>>> Hey Ken,
>>>
>>> Yes, you need to create the relationship between the 2 entities so they
>>> know about each other.
>>>
>>> Looking at the list of hooks for the charm
>>> <https://github.com/Archethought/spark-charm/tree/master/hooks> you can
>>> see there are 2 hooks named namenode-relation-changed
>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
>>> and resourcemanager-relation-changed
>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed>
>>> which are related to YARN/Hadoop.
>>> Looking deeper in the code, you'll notice they reference a function
>>> found in bdutils.py called "setHadoopEnvVar()", which based on its name
>>> should set the HADOOP_CONF_DIR.
>>>
>>> There are 2 relations, so add both of them.
>>>
>>> Note that I didn't test this myself, but I expect this should fix the
>>> problem.
>>> If it doesn't, please come back to us...
>>>
>>> Thanks!
>>> Sam
>>>
>>> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ke...@theasi.co> wrote:
>>>
>>>> Thanks, Kapil - this works :-)
>>>>
>>>> I can now run the SparkPi example successfully:
>>>>
>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> Pi is roughly 3.14318
>>>> root@ip-172-31-60-53:~#
>>>>
>>>> I'm now trying to run the same example with the spark-submit '--master'
>>>> option set to either 'yarn-cluster' or 'yarn-client',
>>>> but I keep getting the same error:
>>>>
>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
>>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>>> Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>>>
>>>> But on my spark-master/0 machine there is no /etc/hadoop/conf directory.
>>>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>>>> Do I need to add a juju relation between spark-master and ...
>>>> yarn-hdfs-master to make them aware of each other?
>>>>
>>>> Thanks for any help,
>>>>
>>>> Ken
>>>>
>>>> On 28 January 2015 at 19:32, Kapil Thangavelu <kapil.thangav...@canonical.com> wrote:
>>>>
>>>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>>
>>>>>> Hi Sam/Amir,
>>>>>>
>>>>>> I've been able to 'juju ssh spark-master/0' and I successfully ran
>>>>>> the two simple examples for pyspark and spark-shell:
>>>>>>
>>>>>> ./bin/pyspark
>>>>>> >>> sc.parallelize(range(1000)).count()
>>>>>> 1000
>>>>>>
>>>>>> ./bin/spark-shell
>>>>>> scala> sc.parallelize(1 to 1000).count()
>>>>>> 1000
>>>>>>
>>>>>> Now I want to run some of the spark examples in the spark-examples*.jar
>>>>>> file, which I have on my local machine. How do I copy the jar file
>>>>>> from my local machine to the AWS machine?
>>>>>>
>>>>>> I have tried 'scp' and 'juju scp' from the local command-line but
>>>>>> both fail (below):
>>>>>>
>>>>>> root@adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>>>>>> lost connection
>>>>>> root@adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>>>
>>>>>> Any ideas?
>>>>>
>>>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
>>>>>
>>>>>> Ken
>>>>>>
>>>>>> On 28 January 2015 at 17:29, Samuel Cozannet <samuel.cozan...@canonical.com> wrote:
>>>>>>
>>>>>>> Glad it worked!
>>>>>>>
>>>>>>> I'll make a merge request to the upstream so that it works natively
>>>>>>> from the store asap.
>>>>>>>
>>>>>>> Thanks for catching that!
>>>>>>> Samuel
>>>>>>>
>>>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>>>>
>>>>>>>> Hi Sam (and Maarten),
>>>>>>>>
>>>>>>>> Cloning Spark 1.2.0 from github seems to have worked!
>>>>>>>> I can install the Spark examples afterwards.
>>>>>>>>
>>>>>>>> Thanks for all your help!
>>>>>>>>
>>>>>>>> Yes - Andrew and Angie both say 'hi' :-)
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>
>>>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <samuel.cozan...@canonical.com> wrote:
>>>>>>>>
>>>>>>>>> Hey Ken,
>>>>>>>>>
>>>>>>>>> So I had a closer look at your Spark problem and found out what
>>>>>>>>> went wrong.
>>>>>>>>>
>>>>>>>>> The charm available on the charmstore is trying to download Spark
>>>>>>>>> 1.0.2, and the versions available on the Apache website are 1.1.0,
>>>>>>>>> 1.1.1 and 1.2.0.
>>>>>>>>>
>>>>>>>>> There is another version of the charm available on GitHub that
>>>>>>>>> will actually deploy 1.2.0:
>>>>>>>>>
>>>>>>>>> 1. On your computer, create the below folders & get there:
>>>>>>>>>
>>>>>>>>> cd ~
>>>>>>>>> mkdir charms
>>>>>>>>> mkdir charms/trusty
>>>>>>>>> cd charms/trusty
>>>>>>>>>
>>>>>>>>> 2. Branch the Spark charm:
>>>>>>>>>
>>>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>>>
>>>>>>>>> 3.
Deploy Spark from the local repository:
>>>>>>>>>
>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>>>> juju add-relation spark-master:master spark-slave:slave
>>>>>>>>>
>>>>>>>>> Worked on AWS for me just minutes ago. Let me know how it goes for
>>>>>>>>> you. Note that this version of the charm does NOT install the Spark
>>>>>>>>> examples. The files are present though, so you'll find them in
>>>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
>>>>>>>>>
>>>>>>>>> Hope that helps...
>>>>>>>>> Let me know if it works for you!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ke...@theasi.co> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> I'm completely new to juju, so any help is appreciated.
>>>>>>>>>>
>>>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>>>
>>>>>>>>>> I've managed to install the 'data-analytics-with-sql-like' bundle
>>>>>>>>>> (using this command):
>>>>>>>>>>
>>>>>>>>>> juju quickstart bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>>>
>>>>>>>>>> This is very impressive, and gives me virtually everything that I
>>>>>>>>>> want (hadoop, hive, etc) - but I also need Spark.
>>>>>>>>>>
>>>>>>>>>> The Spark charm (http://manage.jujucharms.com/~asanjar/trusty/spark)
>>>>>>>>>> and bundle (http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster)
>>>>>>>>>> however do not seem stable or available, and I can't figure out
>>>>>>>>>> how to install them.
>>>>>>>>>>
>>>>>>>>>> Should I just download and install the Spark tar-ball on the nodes
>>>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Ken
--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju