Great, thanks.

On Sun, May 29, 2016 at 12:38 AM, Chris Fregly <ch...@fregly.com> wrote:
> btw, here's a handy Spark Config Generator by Ewan Higgs in Gent, Belgium:
>
> code: https://github.com/ehiggs/spark-config-gen
>
> demo: http://ehiggs.github.io/spark-config-gen/
>
> my recent tweet on this:
> https://twitter.com/cfregly/status/736631633927753729
>
> On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hang on. free is telling me you have 8GB of memory. I was under the
>> impression that you had 4GB of RAM :)
>>
>> So with no app you have 3.99GB free, ~4GB.
>> The 1st app takes 428MB of memory and the second 425MB, so pretty lean apps.
>>
>> The apps that I run take 2-3GB each, but your mileage may vary. If you
>> still end up with free memory while running these small apps, with no
>> sudden spike in memory/CPU usage, then as long as they run and finish
>> within SLA you should be OK in whichever environment you run. Maybe your
>> apps do not require that amount of memory.
>>
>> I don't think there is a clear-cut answer that local mode should NOT be
>> used in prod. Others may have different opinions on this.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 28 May 2016 at 18:37, sujeet jog <sujeet....@gmail.com> wrote:
>>
>>> Ran these from multiple bash shells for now; probably a multi-threaded
>>> Python script would do. Memory and resource allocations are seen as
>>> submitted parameters.
>>>
>>> Before running any applications:
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4066296    3992272      10172     141368    1549520
>>> -/+ buffers/cache:    2375408    5683160
>>> Swap:      8290300     108672    8181628
>>>
>>> Only 1 app:
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4494488    3564080      10172     141392    1549948
>>> -/+ buffers/cache:    2803148    5255420
>>> Swap:      8290300     108672    8181628
>>>
>>> Ran the single app twice in parallel (memory used roughly doubled, as expected):
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4919532    3139036      10172     141444    1550376
>>> -/+ buffers/cache:    3227712    4830856
>>> Swap:      8290300     108672    8181628
>>>
>>> Curious to know if local mode is used in real deployments where there is
>>> a scarcity of resources.
>>>
>>> Thanks,
>>> Sujeet
>>>
>>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> OK, that is good news. So briefly, how do you kick off spark-submit for
>>>> each (or SparkConf), in terms of memory/resource allocations?
>>>>
>>>> Now, what is the output of
>>>>
>>>> /usr/bin/free
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 28 May 2016 at 18:12, sujeet jog <sujeet....@gmail.com> wrote:
>>>>
>>>>> Yes Mich,
>>>>> They are currently emitting the results in parallel on
>>>>> http://localhost:4040 & http://localhost:4041; I also see the
>>>>> monitoring from these URLs.
>>>>>
>>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> OK, they are submitted, but is the latter one (14302) doing anything?
>>>>>> Can you check it with jmonitor or the logs created?
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> On 28 May 2016 at 18:03, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Ted,
>>>>>>>
>>>>>>> Thanks Mich. Yes, I see that I can run two applications by submitting
>>>>>>> these, probably Driver + Executor running in a single JVM:
>>>>>>> in-process Spark.
>>>>>>>
>>>>>>> Wondering if this can be used in production systems. The reason for
>>>>>>> me considering local mode instead of standalone cluster mode is
>>>>>>> purely CPU/MEM resources, i.e. I currently do not have the liberty
>>>>>>> to use 1 Driver & 1 Executor per application (running on an embedded
>>>>>>> network switch).
>>>>>>>
>>>>>>> jps output:
>>>>>>>
>>>>>>> [root@fos-elastic02 ~]# jps
>>>>>>> 14258 SparkSubmit
>>>>>>> 14503 Jps
>>>>>>> 14302 SparkSubmit
>>>>>>>
>>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OK, so you want to run all this in local mode. In other words,
>>>>>>>> something like below:
>>>>>>>>
>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>     --master local[2] \
>>>>>>>>     --driver-memory 2G \
>>>>>>>>     --num-executors=1 \
>>>>>>>>     --executor-memory=2G \
>>>>>>>>     --executor-cores=2
>>>>>>>>
>>>>>>>> I am not sure it will work for multiple drivers (one app/JVM each).
>>>>>>>> The only way to find out is to try running two apps simultaneously.
>>>>>>>> You have a number of tools:
>>>>>>>>
>>>>>>>> 1. use jps to see the apps and their PIDs
>>>>>>>> 2. use jmonitor to see memory/CPU/heap usage for each spark-submit job
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sujeet:
>>>>>>>>>
>>>>>>>>> Please also see:
>>>>>>>>>
>>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>>
>>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sujeet,
>>>>>>>>>>
>>>>>>>>>> If you have a single machine then it is Spark standalone mode.
>>>>>>>>>>
>>>>>>>>>> In standalone cluster mode Spark allocates resources based on
>>>>>>>>>> cores. By default, an application will grab all the cores in the
>>>>>>>>>> cluster.
>>>>>>>>>>
>>>>>>>>>> You only have one worker, which lives within the driver JVM
>>>>>>>>>> process that you start when you launch the application with
>>>>>>>>>> spark-shell or spark-submit on the host where the cluster manager
>>>>>>>>>> is running.
>>>>>>>>>>
>>>>>>>>>> The Driver runs on the same host as the cluster manager and
>>>>>>>>>> requests resources from the Cluster Manager to run tasks. The
>>>>>>>>>> worker is tasked to create the executor (in this case there is
>>>>>>>>>> only one) for the Driver, and the Executor runs tasks for the
>>>>>>>>>> Driver. Only one executor can be allocated on each worker per
>>>>>>>>>> application, and in your case you only have one worker.
>>>>>>>>>>
>>>>>>>>>> The minimum you will need is 2-4G of RAM and two cores; well,
>>>>>>>>>> that is my experience.
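The 428MB/425MB per-app figures quoted above fall straight out of the two `free` snapshots posted earlier in the thread (the difference in the "used" column of the Mem: line). A minimal sketch of that arithmetic; the snapshot strings are copied from the thread:

```python
# Recover the per-app memory footprint from two `free` snapshots by
# diffing the "used" field (KB) of the Mem: line.
def mem_used_kb(free_output):
    """Return the 'used' field (in KB) from the Mem: line of `free` output."""
    for line in free_output.splitlines():
        if line.strip().startswith("Mem:"):
            return int(line.split()[2])  # fields: total, used, free, ...
    raise ValueError("no Mem: line found")

# Mem: lines copied from the snapshots quoted in this thread.
BEFORE   = "Mem:       8058568    4066296    3992272      10172     141368    1549520"
ONE_APP  = "Mem:       8058568    4494488    3564080      10172     141392    1549948"
TWO_APPS = "Mem:       8058568    4919532    3139036      10172     141444    1550376"

first_app_kb  = mem_used_kb(ONE_APP) - mem_used_kb(BEFORE)
second_app_kb = mem_used_kb(TWO_APPS) - mem_used_kb(ONE_APP)
# Divide by 1000 to get the decimal-MB figures quoted in the thread.
print(first_app_kb // 1000, second_app_kb // 1000)  # 428 425
```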
>>>>>>>>>> You can submit more than one spark-submit (i.e. driver), but the
>>>>>>>>>> later ones may queue up behind the running one if there are not
>>>>>>>>>> enough resources.
>>>>>>>>>>
>>>>>>>>>> You pointed out that you will be running a few applications in
>>>>>>>>>> parallel on the same host. The likelihood is that you are using a
>>>>>>>>>> VM for this purpose, and the best option is to try running the
>>>>>>>>>> first one and check the Web GUI on port 4040 to see the progress
>>>>>>>>>> of that job. If you then start the next JVM, assuming it is
>>>>>>>>>> working, it will use port 4041, and so forth.
>>>>>>>>>>
>>>>>>>>>> In actual fact, try the command "free" to see how much free
>>>>>>>>>> memory you have.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>>>>
>>>>>>>>>>> I have 3 applications which I would like to run independently on
>>>>>>>>>>> a single machine; I need to run the drivers on the same machine.
>>>>>>>>>>>
>>>>>>>>>>> The amount of resources I have is also limited: 4-5GB RAM and
>>>>>>>>>>> 3-4 cores.
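A back-of-envelope check of how many such apps fit in that budget can be sketched as follows. The 2-3GB per-app figure is Mich's estimate from this thread; the 1GB OS headroom and the function name are illustrative assumptions:

```python
# Rough capacity check: how many local-mode apps (Driver + Executor in a
# single JVM) fit in a given RAM budget, leaving some headroom for the OS
# and page cache? The headroom default is an assumption, not from the thread.
def apps_that_fit(total_ram_gb, per_app_gb, os_headroom_gb=1.0):
    usable = max(total_ram_gb - os_headroom_gb, 0.0)
    return int(usable // per_app_gb)

print(apps_that_fit(5.0, 2.0))  # optimistic case (5GB RAM, 2GB/app): 2
print(apps_that_fit(4.0, 3.0))  # pessimistic case (4GB RAM, 3GB/app): 1
```

Either way, three 2-3GB apps do not fit comfortably in 4-5GB, which is what motivates the local-mode question below.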
>>>>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>>>>
>>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>>>>
>>>>>>>>>>> The issue here is that I will require 6 JVMs running in parallel,
>>>>>>>>>>> for which I do not have sufficient CPU/MEM resources.
>>>>>>>>>>>
>>>>>>>>>>> Hence I was looking more towards local mode deployment, and would
>>>>>>>>>>> like to know if anybody is using local mode, where Driver +
>>>>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>>>>
>>>>>>>>>>> Are there any inherent issues upfront with using local mode for
>>>>>>>>>>> production systems?
>
> --
> *Chris Fregly*
> Research Scientist @ Flux Capacitor AI
> "Bringing AI Back to the Future!"
> San Francisco, CA
> http://fluxcapacitor.ai
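The "multi-threaded Python script" Sujeet floats earlier in the thread, for launching several local-mode jobs in parallel, could be sketched roughly as below. The launcher command and app script names are placeholders (not from the thread), and the demo swaps in a stub launcher so the sketch runs even without Spark installed:

```python
# Sketch: launch several local-mode spark-submit jobs in parallel from one
# Python script, one thread per job. Launcher and app paths are placeholders.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def submit(app_path, launcher=("spark-submit",)):
    """Launch one in-process (local-mode) driver JVM and wait for it."""
    cmd = [*launcher, "--master", "local[2]", "--driver-memory", "2G", app_path]
    return subprocess.run(cmd).returncode

# Demo with a stub launcher: `python -c` ignores the trailing spark-submit
# arguments (they land in sys.argv) and exits 0 immediately.
stub = (sys.executable, "-c", "import sys; sys.exit(0)")
apps = ["app1.py", "app2.py", "app3.py"]  # placeholder app scripts

with ThreadPoolExecutor(max_workers=len(apps)) as pool:
    codes = list(pool.map(lambda app: submit(app, stub), apps))
print(codes)  # [0, 0, 0] when all jobs exit cleanly
```

Each job started this way gets its own SparkSubmit JVM (as in the jps output above) and its own UI port, 4040, 4041, and so on.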