Ran these from multiple bash shells for now; a multi-threaded Python script would probably do as well. Memory and resource allocations are passed as spark-submit parameters.
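The "multi-threaded Python script" idea could be sketched roughly as below. The spark-submit install path, the app script names, and the memory settings are illustrative assumptions, not taken from the thread:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SPARK_SUBMIT = "/usr/local/spark/bin/spark-submit"  # hypothetical install path

def build_submit_cmd(app_path, driver_mem="2G", executor_mem="2G", cores=2):
    """Build a local-mode spark-submit command; memory/cores are per-app parameters."""
    return [
        SPARK_SUBMIT,
        "--master", f"local[{cores}]",
        "--driver-memory", driver_mem,
        "--executor-memory", executor_mem,
        app_path,
    ]

def run_app(app_path):
    # Each call blocks its thread until that spark-submit process exits.
    return subprocess.run(build_submit_cmd(app_path)).returncode

if __name__ == "__main__":
    apps = ["app1.py", "app2.py", "app3.py"]  # hypothetical application scripts
    # One thread per application; each thread just babysits a child JVM.
    with ThreadPoolExecutor(max_workers=len(apps)) as pool:
        print(list(pool.map(run_app, apps)))
```

Since each spark-submit is a separate OS process, threads are sufficient here; they only wait on child processes rather than doing CPU work.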
*say before running any applications:*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568   *4066296*   3992272      10172     141368    1549520
-/+ buffers/cache:    2375408    5683160
Swap:      8290300     108672    8181628

*only 1 app:*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568   *4494488*   3564080      10172     141392    1549948
-/+ buffers/cache:    2803148    5255420
Swap:      8290300     108672    8181628

*ran the single app twice in parallel (used memory roughly doubled, as expected):*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568   *4919532*   3139036      10172     141444    1550376
-/+ buffers/cache:    3227712    4830856
Swap:      8290300     108672    8181628

Curious to know whether local mode is used in real deployments where resources are scarce.

Thanks,
Sujeet

On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> OK, that is good news. So briefly, how do you kick off spark-submit for
> each (or sparkConf), in terms of memory/resource allocations?
>
> Now, what is the output of
>
> /usr/bin/free
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 28 May 2016 at 18:12, sujeet jog <sujeet....@gmail.com> wrote:
>
>> Yes Mich,
>> they are currently emitting the results in parallel at
>> http://localhost:4040 and http://localhost:4041; I also see the
>> monitoring from these URLs.
>>
>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> OK, they are submitted, but is the latter one (14302) doing anything?
>>> Can you check it with jmonitor or in the logs created?
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 28 May 2016 at 18:03, sujeet jog <sujeet....@gmail.com> wrote:
>>>
>>>> Thanks Ted,
>>>>
>>>> Thanks Mich. Yes, I see that I can run two applications by submitting
>>>> these, probably with Driver + Executor running in a single JVM
>>>> (in-process Spark).
>>>>
>>>> I am wondering if this can be used in production systems. The reason I
>>>> am considering local instead of standalone cluster mode is purely
>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>>>> Driver and 1 Executor per application (running on an embedded network
>>>> switch).
>>>>
>>>> jps output:
>>>> [root@fos-elastic02 ~]# jps
>>>> 14258 SparkSubmit
>>>> 14503 Jps
>>>> 14302 SparkSubmit
>>>>
>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> OK, so you want to run all this in local mode, in other words
>>>>> something like below:
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>     --master local[2] \
>>>>>     --driver-memory 2G \
>>>>>     --num-executors=1 \
>>>>>     --executor-memory=2G \
>>>>>     --executor-cores=2 \
>>>>>
>>>>> I am not sure it will work for multiple drivers (app/JVM). The only
>>>>> way to find out is to try running two apps simultaneously. You have a
>>>>> number of tools:
>>>>>
>>>>> 1. use jps to see the apps and PIDs
>>>>> 2. use jmonitor to see memory/cpu/heap usage for each spark-submit job
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> Sujeet:
>>>>>>
>>>>>> Please also see:
>>>>>>
>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>
>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sujeet,
>>>>>>>
>>>>>>> If you have a single machine, then it is Spark standalone mode.
>>>>>>>
>>>>>>> In standalone cluster mode, Spark allocates resources based on
>>>>>>> cores. By default, an application will grab all the cores in the
>>>>>>> cluster.
>>>>>>>
>>>>>>> You only have one worker, which lives within the driver JVM process
>>>>>>> that you start when you launch the application with spark-shell or
>>>>>>> spark-submit on the host where the cluster manager is running.
>>>>>>>
>>>>>>> The Driver runs on the same host as the cluster manager. The Driver
>>>>>>> requests resources from the Cluster Manager to run tasks. The worker
>>>>>>> is tasked with creating the executor (in this case there is only one
>>>>>>> executor) for the Driver. The Executor runs tasks for the Driver.
>>>>>>> Only one executor can be allocated on each worker per application.
>>>>>>> In your case you only have
>>>>>>>
>>>>>>> The minimum you will need is 2-4 GB of RAM and two cores; well, that
>>>>>>> is my experience. Yes, you can submit more than one spark-submit
>>>>>>> (the driver), but they may queue up behind the running one if there
>>>>>>> are not enough resources.
>>>>>>> You pointed out that you will be running a few applications in
>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>> for this purpose, and the best option is to try running the first
>>>>>>> one. Check the Web GUI on port 4040 to see the progress of this job.
>>>>>>> If you then start the next JVM, assuming it is working, it will use
>>>>>>> port 4041, and so forth.
>>>>>>>
>>>>>>> In actual fact, try the command "free" to see how much free memory
>>>>>>> you have.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>
>>>>>>>> I have 3 applications which I would like to run independently on a
>>>>>>>> single machine; I need to run the drivers on the same machine.
>>>>>>>>
>>>>>>>> The amount of resources I have is also limited: 4-5 GB RAM and
>>>>>>>> 3-4 cores.
>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>
>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>> 1 Driver JVM, 1 worker node (1 executor)
>>>>>>>>
>>>>>>>> The issue here is that I will require 6 JVMs running in parallel,
>>>>>>>> for which I do not have sufficient CPU/MEM resources.
>>>>>>>>
>>>>>>>> Hence I was looking more towards local deployment mode, and would
>>>>>>>> like to know if anybody is using local mode, where Driver +
>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>
>>>>>>>> Are there any inherent issues upfront in using local mode for
>>>>>>>> production systems?
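The before/after comparison of `/usr/bin/free` done by hand at the top of the thread can be automated. A minimal sketch, using the sample figures pasted above and assuming the older procps output layout with a `-/+ buffers/cache` line (as on the fos-elastic02 host):

```python
# Compute the per-application memory cost from two /usr/bin/free snapshots.
# The sample values below are copied from the thread.

BEFORE = """\
             total       used       free     shared    buffers     cached
Mem:       8058568    4066296    3992272      10172     141368    1549520
-/+ buffers/cache:    2375408    5683160
Swap:      8290300     108672    8181628
"""

AFTER_ONE_APP = """\
             total       used       free     shared    buffers     cached
Mem:       8058568    4494488    3564080      10172     141392    1549948
-/+ buffers/cache:    2803148    5255420
Swap:      8290300     108672    8181628
"""

def used_minus_buffers(free_output):
    """Return the 'used' figure from the '-/+ buffers/cache' line, in KiB."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            return int(line.split()[2])
    raise ValueError("no '-/+ buffers/cache' line found")

delta = used_minus_buffers(AFTER_ONE_APP) - used_minus_buffers(BEFORE)
print(f"one local-mode app costs roughly {delta / 1024:.0f} MiB")
# prints: one local-mode app costs roughly 418 MiB
```

Using the `-/+ buffers/cache` row rather than the raw `Mem:` row excludes the page cache, so the delta reflects memory actually consumed by the new JVM.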