btw, here's a handy Spark Config Generator by Ewan Higgs in Gent, Belgium:

code: https://github.com/ehiggs/spark-config-gen
demo: http://ehiggs.github.io/spark-config-gen/

my recent tweet on this: https://twitter.com/cfregly/status/736631633927753729
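(The snippet below is only my illustration of the kind of spark-submit settings a tool like this suggests -- the flags are standard Spark options, but the values and the app name are made up, not the generator's literal output:)

    spark-submit \
      --num-executors 5 \        # sized from total cluster cores
      --executor-cores 5 \       # cores per executor JVM
      --executor-memory 4G \     # heap per executor
      --driver-memory 4G \       # heap for the driver
      your_app.py                # placeholder application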
On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> hang on. Free is telling me you have 8GB of memory. I was under the
> impression that you had 4GB of RAM :)
>
> So with no apps running you have 3.99GB free, ~4GB.
> The 1st app takes 428MB of memory and the second 425MB, so pretty lean apps.
>
> For comparison, the apps that I run take 2-3GB each, but your mileage may
> vary. If you still have free memory while running these small apps, with
> no sudden spikes in memory/CPU usage, then as long as they run and finish
> within SLA you should be OK in whichever environment you run them. Maybe
> your apps do not require that amount of memory.
>
> I don't think there is a clear-cut answer on whether to avoid local mode
> in prod. Others may have different opinions on this.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 28 May 2016 at 18:37, sujeet jog <sujeet....@gmail.com> wrote:
>
>> I ran these from multiple bash shells for now; a multi-threaded python
>> script would probably do as well. The memory and resource allocations
>> match the submitted parameters.
>>
>> *Before running any applications:*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4066296    3992272      10172     141368    1549520
>> -/+ buffers/cache:    2375408    5683160
>> Swap:      8290300     108672    8181628
>>
>> *Only 1 app:*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4494488    3564080      10172     141392    1549948
>> -/+ buffers/cache:    2803148    5255420
>> Swap:      8290300     108672    8181628
>>
>> *Ran the single app twice in parallel (memory used doubled, as expected):*
>>
>> [root@fos-elastic02 ~]# /usr/bin/free
>>              total       used       free     shared    buffers     cached
>> Mem:       8058568    4919532    3139036      10172     141444    1550376
>> -/+ buffers/cache:    3227712    4830856
>> Swap:      8290300     108672    8181628
>>
>> I'm curious to know whether local mode is used in real deployments where
>> resources are scarce.
>>
>> Thanks,
>> Sujeet
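>>
>> For reference, a minimal bash sketch of that parallel launch (the app
>> names, the 512M value, and the wait-based orchestration are assumptions
>> for the sketch, not my exact commands):
>>
>>     #!/bin/bash
>>     # Launch two lean local-mode apps in the background; in local mode
>>     # the driver and executor share one JVM, so --driver-memory caps
>>     # the whole app.
>>     for app in app1.py app2.py; do
>>         "${SPARK_HOME}"/bin/spark-submit \
>>             --master local[1] \
>>             --driver-memory 512M \
>>             "$app" &
>>     done
>>     wait            # block until both submits finish
>>     /usr/bin/free   # then re-check memory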
>>
>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> OK, that is good news. So, briefly, how do you kick off spark-submit
>>> for each one (or SparkConf), in terms of memory/resource allocations?
>>>
>>> Now, what is the output of
>>>
>>> /usr/bin/free
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 28 May 2016 at 18:12, sujeet jog <sujeet....@gmail.com> wrote:
>>>
>>>> Yes Mich, they are currently emitting their results in parallel, and I
>>>> can also see the monitoring at http://localhost:4040 and
>>>> http://localhost:4041.
>>>>
>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> OK, they are submitted, but is the latter one (14302) actually doing
>>>>> anything?
>>>>>
>>>>> Can you check it with jmonitor or in the logs it creates?
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>> On 28 May 2016 at 18:03, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ted.
>>>>>>
>>>>>> Thanks Mich. Yes, I see that I can run two applications by submitting
>>>>>> them this way, presumably with Driver + Executor running in a single
>>>>>> JVM: in-process Spark.
>>>>>>
>>>>>> I'm wondering if this can be used in production systems. The reason I
>>>>>> am considering local mode instead of standalone cluster mode is purely
>>>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>>>>>> Driver and 1 Executor per application (this runs on an embedded
>>>>>> network switch).
>>>>>>
>>>>>> jps output:
>>>>>>
>>>>>> [root@fos-elastic02 ~]# jps
>>>>>> 14258 SparkSubmit
>>>>>> 14503 Jps
>>>>>> 14302 SparkSubmit
>>>>>>
>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> OK, so you want to run all this in local mode. In other words,
>>>>>>> something like below:
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>     --master local[2] \
>>>>>>>     --driver-memory 2G \
>>>>>>>     --num-executors=1 \
>>>>>>>     --executor-memory=2G \
>>>>>>>     --executor-cores=2 \
>>>>>>>
>>>>>>> I am not sure it will work for multiple drivers (one app/JVM each).
>>>>>>> The only way to find out is to try running two apps simultaneously.
>>>>>>> You have a number of tools:
>>>>>>>
>>>>>>>    1. use jps to see the apps and their PIDs
>>>>>>>    2. use jmonitor to see memory/CPU/heap usage for each
>>>>>>>    spark-submit job
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
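>>>>>>>
>>>>>>> A side note on those flags: since in local mode the executor runs
>>>>>>> inside the driver JVM (as discussed above), the executor-side flags
>>>>>>> (--num-executors, --executor-memory, --executor-cores) are, as far
>>>>>>> as I know, aimed at cluster managers and have little effect here;
>>>>>>> --driver-memory and the local[N] thread count are the effective
>>>>>>> knobs. A leaner submit for a resource-starved box might look like
>>>>>>> this (the app name and values are illustrative):
>>>>>>>
>>>>>>>     ${SPARK_HOME}/bin/spark-submit \
>>>>>>>         --master local[2] \               # 2 worker threads, one JVM
>>>>>>>         --driver-memory 2G \              # caps the whole app
>>>>>>>         --conf spark.ui.port=4041 \       # avoid clashing with app on 4040
>>>>>>>         your_app.py                       # placeholder application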
>>>>>>>
>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sujeet:
>>>>>>>>
>>>>>>>> Please also see:
>>>>>>>>
>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>
>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Sujeet,
>>>>>>>>>
>>>>>>>>> If you have a single machine then it is Spark standalone mode.
>>>>>>>>>
>>>>>>>>> In standalone cluster mode Spark allocates resources based on
>>>>>>>>> cores. By default, an application will grab all the cores in the
>>>>>>>>> cluster.
>>>>>>>>>
>>>>>>>>> You only have one worker, which lives within the driver JVM
>>>>>>>>> process that you start when you launch the application with
>>>>>>>>> spark-shell or spark-submit on the host where the cluster manager
>>>>>>>>> is running.
>>>>>>>>>
>>>>>>>>> The Driver runs on the same host as the cluster manager. The
>>>>>>>>> Driver requests resources from the Cluster Manager to run tasks.
>>>>>>>>> The worker is tasked to create the executor (in this case there is
>>>>>>>>> only one executor) for the Driver, and the Executor runs tasks for
>>>>>>>>> the Driver. Only one executor can be allocated on each worker per
>>>>>>>>> application, and in your case you only have one worker.
>>>>>>>>>
>>>>>>>>> The minimum you will need is 2-4GB of RAM and two cores; well,
>>>>>>>>> that is my experience. Yes, you can submit more than one
>>>>>>>>> spark-submit (i.e. more than one driver), but they may queue up
>>>>>>>>> behind the running one if there are not enough resources.
>>>>>>>>>
>>>>>>>>> You pointed out that you will be running a few applications in
>>>>>>>>> parallel on the same host. The likelihood is that you are using a
>>>>>>>>> VM for this purpose, and the best option is to try running the
>>>>>>>>> first one and check the web GUI on port 4040 to see the progress
>>>>>>>>> of that job. If you then start the next JVM, assuming it is
>>>>>>>>> working, it will use port 4041, and so forth.
>>>>>>>>>
>>>>>>>>> In actual fact, try the command "free" to see how much free memory
>>>>>>>>> you have.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>>>
>>>>>>>>>> I have 3 applications which I would like to run independently on
>>>>>>>>>> a single machine, and I need to run the drivers on the same
>>>>>>>>>> machine.
>>>>>>>>>>
>>>>>>>>>> The amount of resources I have is also limited: 4-5GB RAM and
>>>>>>>>>> 3-4 cores.
>>>>>>>>>>
>>>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>>>
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>>
>>>>>>>>>> The issue here is that I would require 6 JVMs running in
>>>>>>>>>> parallel, for which I do not have sufficient CPU/MEM resources.
>>>>>>>>>>
>>>>>>>>>> Hence I was looking more towards local deployment mode, and would
>>>>>>>>>> like to know if anybody is using local mode, where Driver +
>>>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>>>
>>>>>>>>>> Are there any inherent issues upfront with using local mode for
>>>>>>>>>> production-grade systems?

--
Chris Fregly
Research Scientist @ Flux Capacitor AI
"Bringing AI Back to the Future!"
San Francisco, CA
http://fluxcapacitor.ai