OK, that is good news. So, briefly, how do you kick off spark-submit for each
application (or set up its SparkConf) in terms of memory/resource allocation?

Now what is the output of

/usr/bin/free
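
For instance, on a small box it might look something like this (the figures
below are purely illustrative, not your machine's actual output):

$ /usr/bin/free -m
             total       used       free     shared    buffers     cached
Mem:          4096       3200        896        100        150       1200
-/+ buffers/cache:       1850       2246
Swap:         2047          0       2047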



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 May 2016 at 18:12, sujeet jog <sujeet....@gmail.com> wrote:

> Yes Mich,
> They are currently emitting their results in parallel at
> http://localhost:4040 and http://localhost:4041, and I can also see the
> monitoring from these URLs.
>
>
> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> OK, they are submitted, but is the latter one (14302) actually doing anything?
>>
>> Can you check it with jmonitor or the logs it creates?
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 28 May 2016 at 18:03, sujeet jog <sujeet....@gmail.com> wrote:
>>
>>> Thanks Ted,
>>>
>>> Thanks Mich, yes, I see that I can run two applications by submitting
>>> them this way, presumably with the Driver + Executor running in a single
>>> JVM (in-process Spark).
>>>
>>> I am wondering whether this can be used in production systems. The reason
>>> I am considering local mode instead of standalone cluster mode is purely
>>> CPU/MEM resources, i.e. I currently do not have the liberty to run 1 Driver
>>> and 1 Executor per application (this is running on an embedded network
>>> switch).
>>>
>>>
>>> jps output
>>> [root@fos-elastic02 ~]# jps
>>> 14258 SparkSubmit
>>> 14503 Jps
>>> 14302 SparkSubmit
>>> ,
>>>
>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> OK, so you want to run all of this in local mode; in other words,
>>>> something like below:
>>>>
>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>                 --master local[2] \
>>>>                 --driver-memory 2G \
>>>>                 --num-executors=1 \
>>>>                 --executor-memory=2G \
>>>>                 --executor-cores=2 \
>>>>                 <your application jar or Python file>
>>>>
>>>>
>>>> I am not sure it will work for multiple drivers (one app/JVM each). The
>>>> only way you can find out is to try running two apps simultaneously. You
>>>> have a number of tools:
>>>>
>>>>    1. use jps to see the apps and their PIDs
>>>>    2. use jmonitor to see memory/CPU/heap usage for each spark-submit
>>>>    job (a rough sketch of such a test follows below)
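>>>>
>>>> For example, a minimal sketch of that test, assuming two placeholder
>>>> applications app1.py and app2.py (substitute your own jobs and log paths):
>>>>
>>>> # start both applications in the background, each in local mode
>>>> ${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 2G app1.py > app1.log 2>&1 &
>>>> ${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 2G app2.py > app2.log 2>&1 &
>>>>
>>>> # each SparkSubmit entry here is one driver JVM (which, in local mode,
>>>> # also runs the executor threads)
>>>> jps
>>>>
>>>> # the first application's web UI binds to port 4040; the second falls
>>>> # back to 4041 (check app2.log to confirm which port it actually took)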
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Sujeet:
>>>>>
>>>>> Please also see:
>>>>>
>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>
>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Hi Sujeet,
>>>>>>
>>>>>> If you have a single machine then it is Spark standalone mode.
>>>>>>
>>>>>> In standalone cluster mode Spark allocates resources based on cores.
>>>>>> By default, an application will grab all the cores in the cluster.
>>>>>>
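>>>>>> (If that default ever becomes a problem, one common way, purely as an
>>>>>> illustration, is to cap the cores any one application may take, either in
>>>>>> conf/spark-defaults.conf or per submission:)
>>>>>>
>>>>>> # conf/spark-defaults.conf -- limit each application to 2 cores
>>>>>> spark.cores.max    2
>>>>>>
>>>>>> # or on the command line (standalone/Mesos masters)
>>>>>> ${SPARK_HOME}/bin/spark-submit --total-executor-cores 2 <other options> <your app>
>>>>>>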
>>>>>> You only have one worker, which lives within the driver JVM process
>>>>>> that you start when you launch the application with spark-shell or
>>>>>> spark-submit on the host where the cluster manager is running.
>>>>>>
>>>>>> The Driver runs on the same host that the cluster manager is running
>>>>>> on. The Driver asks the Cluster Manager for resources to run tasks. The
>>>>>> worker is tasked with creating the executor (in this case there is only
>>>>>> one executor) for the Driver, and the Executor runs tasks for the Driver.
>>>>>> Only one executor can be allocated on each worker per application, and in
>>>>>> your case you only have the one worker, hence one executor.
>>>>>>
>>>>>> The minimum you will need is 2-4 GB of RAM and two cores; well, that is
>>>>>> my experience. Yes, you can submit more than one spark-submit (i.e. more
>>>>>> than one driver), but they may queue up behind the running one if there
>>>>>> are not enough resources.
>>>>>>
>>>>>>
>>>>>> You pointed out that you will be running a few applications in parallel
>>>>>> on the same host. The likelihood is that you are using a VM for this
>>>>>> purpose, so the best option is to try running the first one and check the
>>>>>> Web UI on port 4040 to see the progress of that job. If you then start the
>>>>>> next JVM and it is working, it will be using port 4041, and so forth.
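>>>>>>
>>>>>> (A quick, illustrative way to confirm which UI ports are actually in use;
>>>>>> the exact commands depend on what is installed on the box:)
>>>>>>
>>>>>> # list listening TCP ports in the Spark UI range
>>>>>> netstat -ltnp 2>/dev/null | grep ':404[0-9]'
>>>>>>
>>>>>> # or probe the UIs directly (an HTTP 200 means the UI is up)
>>>>>> curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4040
>>>>>> curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4041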
>>>>>>
>>>>>>
>>>>>> In actual fact try the command "free" to see how much free memory you
>>>>>> have.
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>
>>>>>>> I have 3 applications which I would like to run independently on a
>>>>>>> single machine, and I need to run the drivers on that same machine.
>>>>>>>
>>>>>>> The amount of resources I have is also limited: roughly 4-5 GB of RAM
>>>>>>> and 3-4 cores.
>>>>>>>
>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>
>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>
>>>>>>> The issue here is that I would need 6 JVMs running in parallel, for
>>>>>>> which I do not have sufficient CPU/MEM resources.
>>>>>>>
>>>>>>>
>>>>>>> Hence I was looking more towards local deployment mode, and would like
>>>>>>> to know if anybody is running local mode, where the Driver + Executor
>>>>>>> run in a single JVM, in production.
>>>>>>>
>>>>>>> Are there any inherent issues, upfront, with using local mode for
>>>>>>> production systems?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
