btw, here's a handy Spark Config Generator by Ewan Higgs in Gent, Belgium:

code: https://github.com/ehiggs/spark-config-gen
demo: http://ehiggs.github.io/spark-config-gen/

my recent tweet on this: https://twitter.com/cfregly/status/736631633927753729
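(The snippet below is only my illustration of the kind of spark-submit settings a tool like this suggests -- the flags are standard Spark options, but the values and the app name are made up, not the generator's literal output:)

    spark-submit \
      --num-executors 5 \        # sized from total cluster cores
      --executor-cores 5 \       # cores per executor JVM
      --executor-memory 4G \     # heap per executor
      --driver-memory 4G \       # heap for the driver
      your_app.py                # placeholder application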
On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> hang on. Free is telling me you have 8GB of memory. I was under the
> impression that you had 4GB of RAM :)
>
> So with no apps running you have 3.99GB free, ~4GB.
> The 1st app takes 428MB of memory and the second 425MB, so pretty lean apps.
>
> For comparison, the apps that I run take 2-3GB each, but your mileage may
> vary. If you still have free memory while running these small apps, with
> no sudden spikes in memory/CPU usage, then as long as they run and finish
> within SLA you should be OK in whichever environment you run them. Maybe
> your apps do not require that amount of memory.
>
> I don't think there is a clear-cut answer on whether to avoid local mode
> in prod. Others may have different opinions on this.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 28 May 2016 at 18:37, sujeet jog <sujeet....@gmail.com> wrote:
>
>> I ran these from multiple bash shells for now; a multi-threaded python
>> script would probably do as well. The memory and resource allocations
>> match the submitted parameters.
>>
>> *Before running any applications:*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4066296    3992272      10172     141368    1549520
>> -/+ buffers/cache:    2375408    5683160
>> Swap:      8290300     108672    8181628
>>
>> *Only 1 app:*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4494488    3564080      10172     141392    1549948
>> -/+ buffers/cache:    2803148    5255420
>> Swap:      8290300     108672    8181628
>>
>> *Ran the single app twice in parallel (memory used doubled, as expected):*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4919532    3139036      10172     141444    1550376
>> -/+ buffers/cache:    3227712    4830856
>> Swap:      8290300     108672    8181628
>>
>> I'm curious to know whether local mode is used in real deployments where
>> resources are scarce.
>>
>> Thanks,
>> Sujeet
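>>
>> For reference, a minimal bash sketch of that parallel launch (the app
>> names, the 512M value, and the wait-based orchestration are assumptions
>> for the sketch, not my exact commands):
>>
>>     #!/bin/bash
>>     # Launch two lean local-mode apps in the background; in local mode
>>     # the driver and executor share one JVM, so --driver-memory caps
>>     # the whole app.
>>     for app in app1.py app2.py; do
>>         "${SPARK_HOME}"/bin/spark-submit \
>>             --master local[1] \
>>             --driver-memory 512M \
>>             "$app" &
>>     done
>>     wait            # block until both submits finish
>>     /usr/bin/free   # then re-check memory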
>>
>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> OK, that is good news. So, briefly, how do you kick off spark-submit
>>> for each one (or SparkConf), in terms of memory/resource allocations?
>>>
>>> Now, what is the output of
>>>
>>> /usr/bin/free
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 28 May 2016 at 18:12, sujeet jog <sujeet....@gmail.com> wrote:
>>>
>>>> Yes Mich, they are currently emitting their results in parallel, and I
>>>> can also see the monitoring at http://localhost:4040 and
>>>> http://localhost:4041.
>>>>
>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> OK, they are submitted, but is the latter one (14302) actually doing
>>>>> anything?
>>>>>
>>>>> Can you check it with jmonitor or in the logs it creates?
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>> On 28 May 2016 at 18:03, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ted.
>>>>>>
>>>>>> Thanks Mich. Yes, I see that I can run two applications by submitting
>>>>>> them this way, presumably with Driver + Executor running in a single
>>>>>> JVM: in-process Spark.
>>>>>>
>>>>>> I'm wondering if this can be used in production systems. The reason I
>>>>>> am considering local mode instead of standalone cluster mode is purely
>>>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>>>>>> Driver and 1 Executor per application (this runs on an embedded
>>>>>> network switch).
>>>>>>
>>>>>> jps output:
>>>>>>
>>>>>> [root@fos-elastic02 ~]# jps
>>>>>> 14258 SparkSubmit
>>>>>> 14503 Jps
>>>>>> 14302 SparkSubmit
>>>>>>
>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> OK, so you want to run all this in local mode. In other words,
>>>>>>> something like below:
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>     --master local[2] \
>>>>>>>     --driver-memory 2G \
>>>>>>>     --num-executors=1 \
>>>>>>>     --executor-memory=2G \
>>>>>>>     --executor-cores=2 \
>>>>>>>
>>>>>>> I am not sure it will work for multiple drivers (one app/JVM each).
>>>>>>> The only way to find out is to try running two apps simultaneously.
>>>>>>> You have a number of tools:
>>>>>>>
>>>>>>>    1. use jps to see the apps and their PIDs
>>>>>>>    2. use jmonitor to see memory/CPU/heap usage for each
>>>>>>>    spark-submit job
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
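>>>>>>>
>>>>>>> A side note on those flags: since in local mode the executor runs
>>>>>>> inside the driver JVM (as discussed above), the executor-side flags
>>>>>>> (--num-executors, --executor-memory, --executor-cores) are, as far
>>>>>>> as I know, aimed at cluster managers and have little effect here;
>>>>>>> --driver-memory and the local[N] thread count are the effective
>>>>>>> knobs. A leaner submit for a resource-starved box might look like
>>>>>>> this (the app name and values are illustrative):
>>>>>>>
>>>>>>>     ${SPARK_HOME}/bin/spark-submit \
>>>>>>>         --master local[2] \               # 2 worker threads, one JVM
>>>>>>>         --driver-memory 2G \              # caps the whole app
>>>>>>>         --conf spark.ui.port=4041 \       # avoid clashing with app on 4040
>>>>>>>         your_app.py                       # placeholder application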
>>>>>>>
>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sujeet:
>>>>>>>>
>>>>>>>> Please also see:
>>>>>>>>
>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>
>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Sujeet,
>>>>>>>>>
>>>>>>>>> If you have a single machine then it is Spark standalone mode.
>>>>>>>>>
>>>>>>>>> In standalone cluster mode Spark allocates resources based on
>>>>>>>>> cores. By default, an application will grab all the cores in the
>>>>>>>>> cluster.
>>>>>>>>>
>>>>>>>>> You only have one worker, which lives within the driver JVM
>>>>>>>>> process that you start when you launch the application with
>>>>>>>>> spark-shell or spark-submit on the host where the cluster manager
>>>>>>>>> is running.
>>>>>>>>>
>>>>>>>>> The Driver runs on the same host as the cluster manager. The
>>>>>>>>> Driver requests resources from the Cluster Manager to run tasks.
>>>>>>>>> The worker is tasked to create the executor (in this case there is
>>>>>>>>> only one executor) for the Driver, and the Executor runs tasks for
>>>>>>>>> the Driver. Only one executor can be allocated on each worker per
>>>>>>>>> application, and in your case you only have one worker.
>>>>>>>>>
>>>>>>>>> The minimum you will need is 2-4GB of RAM and two cores; well,
>>>>>>>>> that is my experience. Yes, you can submit more than one
>>>>>>>>> spark-submit (i.e. more than one driver), but they may queue up
>>>>>>>>> behind the running one if there are not enough resources.
>>>>>>>>>
>>>>>>>>> You pointed out that you will be running a few applications in
>>>>>>>>> parallel on the same host. The likelihood is that you are using a
>>>>>>>>> VM for this purpose, and the best option is to try running the
>>>>>>>>> first one and check the web GUI on port 4040 to see the progress
>>>>>>>>> of that job. If you then start the next JVM, assuming it is
>>>>>>>>> working, it will use port 4041, and so forth.
>>>>>>>>>
>>>>>>>>> In actual fact, try the command "free" to see how much free memory
>>>>>>>>> you have.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>>>
>>>>>>>>>> I have 3 applications which I would like to run independently on
>>>>>>>>>> a single machine, and I need to run the drivers on the same
>>>>>>>>>> machine.
>>>>>>>>>>
>>>>>>>>>> The amount of resources I have is also limited: 4-5GB RAM and
>>>>>>>>>> 3-4 cores.
>>>>>>>>>>
>>>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>>>
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>>
>>>>>>>>>> The issue here is that I would require 6 JVMs running in
>>>>>>>>>> parallel, for which I do not have sufficient CPU/MEM resources.
>>>>>>>>>>
>>>>>>>>>> Hence I was looking more towards local deployment mode, and would
>>>>>>>>>> like to know if anybody is using local mode, where Driver +
>>>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>>>
>>>>>>>>>> Are there any inherent issues upfront with using local mode for
>>>>>>>>>> production-grade systems?

--
Chris Fregly
Research Scientist @ Flux Capacitor AI
"Bringing AI Back to the Future!"
San Francisco, CA
http://fluxcapacitor.ai