Hey,

can you send me the bundle you're using? In the GUI (bottom right), the
"export" button should give you a bundles.yaml file. Please send it to me
so I can bootstrap the same environment you are working with.
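
(On my side I can then stand the same thing up with, for instance, "juju
quickstart ./bundles.yaml", since quickstart accepts a local bundle file.)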

Also:
* can you let me know if you have the file /etc/profile.d/directories.sh?
* if yes, can you source it from your command line, run the spark command
again, and let me know the result? (See the snippet below.)
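
For reference, something like this should do it (assuming a bash shell;
the spark-submit line is just your earlier command, with flags trimmed):

    # check whether the file exists
    ls -l /etc/profile.d/directories.sh
    # if it does, load it into the current shell, then retry
    . /etc/profile.d/directories.sh
    spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client lib/spark-examples*.jar 10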

Thx,
Sam

Best,
Samuel

--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozan...@canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Fri, Jan 30, 2015 at 3:46 PM, Ken Williams <ke...@theasi.co> wrote:

> OK, I have been able to add the relation using this:
>
>     juju add-relation yarn-hdfs-master:resourcemanager spark-master
>
> But I still cannot see a /etc/hadoop/conf directory on the spark-master
> machine, so I still get the same error about HADOOP_CONF_DIR and
> YARN_CONF_DIR (below):
>
>
> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi \
>     --master yarn-client --num-executors 3 --driver-memory 1g \
>     --executor-memory 1g --executor-cores 1 --queue thequeue \
>     lib/spark-examples*.jar 10
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> Exception in thread "main" java.lang.Exception: When running with master
> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
> environment.
> at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:177)
> at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:81)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> root@ip-172-31-60-53:~#
>
> Should there be a /etc/hadoop/conf directory?
>
> Thanks for any help,
>
> Ken
>
>
> On 30 January 2015 at 12:59, Samuel Cozannet <
> samuel.cozan...@canonical.com> wrote:
>
>> Have you tried without ':master':
>>
>> juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>
>> I think the Spark master consumes the relation, so it doesn't need to
>> name its own endpoint here.
>>
>> Rule of thumb: when a relation is unambiguous on one of its ends, there
>> is no need to specify that end when adding it, as in the example below.
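>>
>> For example (assuming the Spark charm names its end of this relation
>> 'resourcemanager', matching its hook name), these two commands would be
>> equivalent:
>>
>>     # fully qualified on both ends
>>     juju add-relation yarn-hdfs-master:resourcemanager spark-master:resourcemanager
>>     # the spark-master end is unambiguous, so it can be omitted
>>     juju add-relation yarn-hdfs-master:resourcemanager spark-master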
>>
>> Another option if this doesn't work is to use the GUI to create the
>> relation. It will give you a dropdown of available relationships between
>> entities.
>>
>> Let me know how it goes,
>> Thx,
>> Sam
>>
>>
>> Best,
>> Samuel
>>
>> --
>> Samuel Cozannet
>> Cloud, Big Data and IoT Strategy Team
>> Business Development - Cloud and ISV Ecosystem
>> Changing the Future of Cloud
>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
>> Juju <https://jujucharms.com>
>> samuel.cozan...@canonical.com
>> mob: +33 616 702 389
>> skype: samnco
>> Twitter: @SaMnCo_23
>>
>> On Fri, Jan 30, 2015 at 1:09 PM, Ken Williams <ke...@theasi.co> wrote:
>>
>>> Hi Sam,
>>>
>>>     I understand what you are saying, but when I try to add the 2
>>> relations, I get this error:
>>>
>>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:resourcemanager spark-master:master
>>> ERROR no relations found
>>> root@adminuser-VirtualBox:~# juju add-relation yarn-hdfs-master:namenode spark-master:master
>>> ERROR no relations found
>>>
>>>   Am I adding the relations correctly?
>>>
>>>   Attached is my 'juju status' file.
>>>
>>>   Thanks for all your help,
>>>
>>> Ken
>>>
>>>
>>>
>>>
>>>
>>> On 30 January 2015 at 11:16, Samuel Cozannet <
>>> samuel.cozan...@canonical.com> wrote:
>>>
>>>> Hey Ken,
>>>>
>>>> Yes, you need to create the relationship between the 2 entities so they
>>>> know about each other.
>>>>
>>>> Looking at the list of hooks for the charm
>>>> <https://github.com/Archethought/spark-charm/tree/master/hooks>, you can
>>>> see there are 2 hooks related to YARN/Hadoop: namenode-relation-changed
>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/namenode-relation-changed>
>>>> and resourcemanager-relation-changed
>>>> <https://github.com/Archethought/spark-charm/blob/master/hooks/resourcemanager-relation-changed>.
>>>> Looking deeper in the code, you'll notice they reference a function in
>>>> bdutils.py called "setHadoopEnvVar()", which, based on its name, should
>>>> set HADOOP_CONF_DIR.
>>>>
>>>> There are 2 relations, so add both of them, along the lines below.
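>>>>
>>>> Based on those hook names, the two commands should look something like
>>>> this:
>>>>
>>>>     # YARN side
>>>>     juju add-relation yarn-hdfs-master:resourcemanager spark-master
>>>>     # HDFS side
>>>>     juju add-relation yarn-hdfs-master:namenode spark-master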
>>>>
>>>> Note that I didn't test this myself, but I expect this should fix the
>>>> problem. If it doesn't, please come back to us...
>>>>
>>>> Thanks!
>>>> Sam
>>>>
>>>>
>>>> Best,
>>>> Samuel
>>>>
>>>> --
>>>> Samuel Cozannet
>>>> Cloud, Big Data and IoT Strategy Team
>>>> Business Development - Cloud and ISV Ecosystem
>>>> Changing the Future of Cloud
>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> /
>>>> Juju <https://jujucharms.com>
>>>> samuel.cozan...@canonical.com
>>>> mob: +33 616 702 389
>>>> skype: samnco
>>>> Twitter: @SaMnCo_23
>>>>
>>>> On Fri, Jan 30, 2015 at 11:51 AM, Ken Williams <ke...@theasi.co> wrote:
>>>>
>>>>>
>>>>> Thanks, Kapil - this works :-)
>>>>>
>>>>> I can now run the SparkPi example successfully.
>>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi \
>>>>>     /tmp/spark-examples-1.2.0-hadoop2.4.0.jar
>>>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>>>> classpath
>>>>> 15/01/30 10:29:33 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> Pi is roughly 3.14318
>>>>>
>>>>> root@ip-172-31-60-53:~#
>>>>>
>>>>> I'm now trying to run the same example with the spark-submit
>>>>> '--master' option set to either 'yarn-cluster' or 'yarn-client',
>>>>> but I keep getting the same error:
>>>>>
>>>>> root@ip-172-31-60-53:~# spark-submit --class org.apache.spark.examples.SparkPi \
>>>>>     --master yarn-client --num-executors 3 --driver-memory 1g \
>>>>>     --executor-memory 1g --executor-cores 1 --queue thequeue \
>>>>>     lib/spark-examples*.jar 10
>>>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>>>> classpath
>>>>> Exception in thread "main" java.lang.Exception: When running with master
>>>>> 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
>>>>> environment.
>>>>>
>>>>> But on my spark-master/0 machine there is no /etc/hadoop/conf directory.
>>>>> So what should the HADOOP_CONF_DIR or YARN_CONF_DIR value be?
>>>>> Do I need to add a juju relation between spark-master and
>>>>> yarn-hdfs-master to make them aware of each other?
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Ken
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 28 January 2015 at 19:32, Kapil Thangavelu <
>>>>> kapil.thangav...@canonical.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 28, 2015 at 1:54 PM, Ken Williams <ke...@theasi.co>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Sam/Amir,
>>>>>>>
>>>>>>>     I've been able to 'juju ssh spark-master/0' and I successfully
>>>>>>> ran the two simple examples for pyspark and spark-shell:
>>>>>>>
>>>>>>>     ./bin/pyspark
>>>>>>>     >>> sc.parallelize(range(1000)).count()
>>>>>>>     1000
>>>>>>>
>>>>>>>     ./bin/spark-shell
>>>>>>>      scala> sc.parallelize(1 to 1000).count()
>>>>>>>     1000
>>>>>>>
>>>>>>>
>>>>>>> Now I want to run some of the Spark examples in the
>>>>>>> spark-examples*.jar file, which I have on my local machine. How do I
>>>>>>> copy the jar file from my local machine to the AWS machine?
>>>>>>>
>>>>>>> I have tried 'scp' and 'juju scp' from the local command line, but
>>>>>>> both fail (below):
>>>>>>>
>>>>>>> root@adminuser:~# scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>>> ssh: Could not resolve hostname ip-172-31-59: Name or service not known
>>>>>>> lost connection
>>>>>>> root@adminuser:~# juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar ubuntu@ip-172-31-59:/tmp
>>>>>>> ERROR exit status 1 (nc: getaddrinfo: Name or service not known)
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>
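>>>>>> juju scp resolves the unit name to the machine's address and SSH key,
>>>>>> so no raw EC2 hostname is needed: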
>>>>>> juju scp /tmp/spark-examples-1.2.0-hadoop2.4.0.jar spark-master/0:/tmp
>>>>>>
>>>>>>>
>>>>>>> Ken
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 28 January 2015 at 17:29, Samuel Cozannet <
>>>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>>>
>>>>>>>> Glad it worked!
>>>>>>>>
>>>>>>>> I'll make a merge request upstream so that it works natively from
>>>>>>>> the store ASAP.
>>>>>>>>
>>>>>>>> Thanks for catching that!
>>>>>>>> Samuel
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Samuel
>>>>>>>>
>>>>>>>> --
>>>>>>>> Samuel Cozannet
>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>> Changing the Future of Cloud
>>>>>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD
>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>> samuel.cozan...@canonical.com
>>>>>>>> mob: +33 616 702 389
>>>>>>>> skype: samnco
>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>
>>>>>>>> On Wed, Jan 28, 2015 at 6:15 PM, Ken Williams <ke...@theasi.co>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Sam (and Maarten),
>>>>>>>>>
>>>>>>>>>     Cloning Spark 1.2.0 from GitHub seems to have worked!
>>>>>>>>>     I can install the Spark examples afterwards.
>>>>>>>>>
>>>>>>>>>     Thanks for all your help!
>>>>>>>>>
>>>>>>>>>     Yes - Andrew and Angie both say 'hi'  :-)
>>>>>>>>>
>>>>>>>>>     Best Regards,
>>>>>>>>>
>>>>>>>>> Ken
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 January 2015 at 16:43, Samuel Cozannet <
>>>>>>>>> samuel.cozan...@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ken,
>>>>>>>>>>
>>>>>>>>>> So I had a closer look at your Spark problem and found out what
>>>>>>>>>> went wrong.
>>>>>>>>>>
>>>>>>>>>> The charm available in the charm store is trying to download Spark
>>>>>>>>>> 1.0.2, while the versions available on the Apache website are 1.1.0,
>>>>>>>>>> 1.1.1 and 1.2.0.
>>>>>>>>>>
>>>>>>>>>> There is another version of the charm available on GitHub that will
>>>>>>>>>> actually deploy 1.2.0:
>>>>>>>>>>
>>>>>>>>>> 1. On your computer, create the folders below and change into them:
>>>>>>>>>>
>>>>>>>>>> cd ~
>>>>>>>>>> mkdir charms
>>>>>>>>>> mkdir charms/trusty
>>>>>>>>>> cd charms/trusty
>>>>>>>>>>
>>>>>>>>>> 2. Clone the Spark charm:
>>>>>>>>>>
>>>>>>>>>> git clone https://github.com/Archethought/spark-charm spark
>>>>>>>>>>
>>>>>>>>>> 3. Deploy Spark from the local repository:
>>>>>>>>>>
>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-master
>>>>>>>>>> juju deploy --repository=~/charms local:trusty/spark spark-slave
>>>>>>>>>> juju add-relation spark-master:master spark-slave:slave
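>>>>>>>>>>
>>>>>>>>>> As a quick sanity check (my addition, not part of the original
>>>>>>>>>> steps), something like this should show the units coming up:
>>>>>>>>>>
>>>>>>>>>>     juju status spark-master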
>>>>>>>>>>
>>>>>>>>>> This worked on AWS for me just minutes ago. Let me know how it goes
>>>>>>>>>> for you. Note that this version of the charm does NOT install the
>>>>>>>>>> Spark examples. The files are present though, so you'll find them in
>>>>>>>>>> /var/lib/juju/agents/unit-spark-master-0/charm/files/archive
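>>>>>>>>>>
>>>>>>>>>> To see exactly what shipped with the charm, you can list that
>>>>>>>>>> directory on the unit, e.g.:
>>>>>>>>>>
>>>>>>>>>>     juju ssh spark-master/0 ls /var/lib/juju/agents/unit-spark-master-0/charm/files/archive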
>>>>>>>>>>
>>>>>>>>>> Hope that helps...
>>>>>>>>>> Let me know if it works for you!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Samuel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Samuel Cozannet
>>>>>>>>>> Cloud, Big Data and IoT Strategy Team
>>>>>>>>>> Business Development - Cloud and ISV Ecosystem
>>>>>>>>>> Changing the Future of Cloud
>>>>>>>>>> Ubuntu <http://ubuntu.com>  / Canonical UK LTD
>>>>>>>>>> <http://canonical.com> / Juju <https://jujucharms.com>
>>>>>>>>>> samuel.cozan...@canonical.com
>>>>>>>>>> mob: +33 616 702 389
>>>>>>>>>> skype: samnco
>>>>>>>>>> Twitter: @SaMnCo_23
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 28, 2015 at 4:44 PM, Ken Williams <ke...@theasi.co>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> I'm completely new to juju, so any help is appreciated.
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to create a hadoop/analytics-type platform.
>>>>>>>>>>>
>>>>>>>>>>> I've managed to install the 'data-analytics-with-sql-like' bundle
>>>>>>>>>>> using this command:
>>>>>>>>>>>
>>>>>>>>>>>     juju quickstart bundle:data-analytics-with-sql-like/data-analytics-with-sql-like
>>>>>>>>>>>
>>>>>>>>>>> This is very impressive and gives me virtually everything that I
>>>>>>>>>>> want (hadoop, hive, etc.), but I also need Spark.
>>>>>>>>>>>
>>>>>>>>>>> The Spark charm
>>>>>>>>>>> (http://manage.jujucharms.com/~asanjar/trusty/spark) and bundle
>>>>>>>>>>> (http://manage.jujucharms.com/bundle/~asanjar/spark/spark-cluster),
>>>>>>>>>>> however, do not seem stable or available, and I can't figure out how
>>>>>>>>>>> to install them.
>>>>>>>>>>>
>>>>>>>>>>> Should I just download and install the Spark tarball on the nodes
>>>>>>>>>>> in my AWS cluster, or is there a better way to do this?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> Ken
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju
