Re: Launching multiple spark jobs within a main spark job.
Thanks Liang, Vadim and everyone for your inputs!

With this clarity, I have tried client mode for both the main and the sub spark jobs. Every main spark job and its corresponding threaded spark jobs show up on the YARN applications list, and the jobs are getting executed properly. I now need to test with cluster mode at both levels, which requires setting up spark-submit and a few configurations properly on all data nodes in the cluster. I will share updates as I execute and analyze further.

My remaining concern is how to throttle the launching of multiple jobs based on the YARN cluster's available capacity. This exercise will be similar to a break-point analysis of the cluster. The problem is that we will not know the file sizes until we read them into memory, and since Spark's memory mechanics are subtle and fragile, we need to be very sure we avoid OOM (out-of-memory) issues. I am not sure whether there is a process that can poll the ResourceManager's information and tell whether further jobs can be submitted to YARN.

On Thu, Dec 22, 2016 at 7:26 AM, Liang-Chi Hsieh wrote:
> If you run the main driver and other Spark jobs in client mode, you can
> make sure they (I meant all the drivers) are running at the same node. [...]
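On the throttling question, one option (a rough sketch only) is to poll the ResourceManager's REST API before each submission and gate on the free memory it reports. The RM address, the 8 GB threshold, the crude JSON handling, and the reliance on the availableMB/appsPending fields documented for the RM REST API are all assumptions to verify against your Hadoop version.

import scala.io.Source

// Rough sketch: ask the YARN ResourceManager how much memory is free before
// launching another child job. Host, threshold and parsing are assumptions.
object YarnCapacityGate {

  private val metricsUrl = "http://rm-host:8088/ws/v1/cluster/metrics" // hypothetical RM web address

  // Crudely pull a numeric field such as availableMB or appsPending out of the metrics JSON.
  private def metric(json: String, field: String): Long = {
    val pattern = ("\"" + field + "\"\\s*:\\s*(\\d+)").r
    pattern.findFirstMatchIn(json).map(_.group(1).toLong).getOrElse(0L)
  }

  def canSubmit(minFreeMb: Long = 8192): Boolean = {
    val src = Source.fromURL(metricsUrl)
    val json = try src.mkString finally src.close()
    metric(json, "availableMB") >= minFreeMb && metric(json, "appsPending") == 0
  }

  def main(args: Array[String]): Unit =
    println(if (canSubmit()) "OK to launch another job" else "Cluster busy, hold off")
}

A launcher loop could call canSubmit() before each SparkLauncher invocation and sleep/retry otherwise; it does not remove the OOM risk from unknown file sizes, but it at least ties submissions to what the ResourceManager reports as free.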
Re: Launching multiple spark jobs within a main spark job.
If you run the main driver and the other Spark jobs in client mode, you can make sure they (I meant all the drivers) are running on the same node. Of course, all the drivers then consume resources on that same node.

If you run the main driver in client mode but the other Spark jobs in cluster mode, the drivers of those Spark jobs will be launched on other nodes in the cluster. It should work too; it is the same as running one Spark app in client mode and several others in cluster mode.

If you run your main driver in cluster mode, and run the other Spark jobs in cluster mode too, you may need Spark properly installed on all nodes in the cluster, because those Spark jobs will be launched from whichever node the main driver is running on.

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
RE: Launching multiple spark jobs within a main spark job.
I am not familiar with any problem with that. In any case, if you run a Spark application you would have multiple jobs anyway, so it makes sense that this is not a problem.

Thanks,
David

From: Naveen [mailto:hadoopst...@gmail.com]
Sent: Wednesday, December 21, 2016 9:18 AM
To: dev@spark.apache.org; u...@spark.apache.org
Subject: Launching multiple spark jobs within a main spark job.

Hi Team,

Is it OK to spawn multiple spark jobs within a main spark job? My main spark job's driver, which was launched on the YARN cluster, will do some preprocessing and, based on it, needs to launch multiple spark jobs on the YARN cluster. I am not sure if this is the right pattern.

Please share your thoughts.
Sample code I have is as below for better understanding:
-

object MainSparkJob {

  main(...) {

    val sc = new SparkContext(..)

    // Fetch from Hive using HiveContext
    // Fetch from HBase

    // spawning multiple Futures
    val future1 = Future {
      val sparkjob = SparkLauncher(...).launch; sparkjob.waitFor
    }

    // Similarly, future2 to futureN

    future1.onComplete { ... }
  }

} // end of MainSparkJob
--
Re: Launching multiple spark jobs within a main spark job.
Is there any reason you need a context on the application launching the jobs? You can use SparkLauncher in a normal app and just listen for state transitions.

On Wed, 21 Dec 2016, 11:44 Naveen wrote:
> Thanks for your responses.
> Let me give more details of how I am trying to launch jobs. [...]
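A minimal sketch of that suggestion, using the launcher API's SparkAppHandle.Listener (available since Spark 1.6); the jar path and main class below are placeholders:

import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchAndListen {
  def main(args: Array[String]): Unit = {
    val done = new CountDownLatch(1)

    // startApplication returns a handle instead of a raw Process,
    // so the launching app can react to the child's state transitions.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit = {
          println(s"child ${h.getAppId} -> ${h.getState}")
          if (h.getState.isFinal) done.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    done.await() // block until the child application reaches a terminal state
    println(s"final state: ${handle.getState}")
  }
}

Because the handle carries the application id and state, the launching app needs no SparkContext of its own; it simply reacts as the child application moves through its lifecycle.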
Re: Launching multiple spark jobs within a main spark job.
Thanks Liang! I get your point. It would mean that when launching the spark jobs, the mode needs to be specified as client for all of them. However, my concern is whether the memory of the driver that launches the spark jobs will be used up entirely by the Futures (and their SparkContexts), or whether these spawned SparkContexts will get different nodes/executors from the resource manager?

On Wed, Dec 21, 2016 at 6:43 PM, Naveen wrote:
> Hi Sebastian,
>
> Yes, for fetching the details from Hive and HBase, I would want to use
> Spark's HiveContext etc. [...]
Re: Launching multiple spark jobs within a main spark job.
Hi Sebastian,

Yes, for fetching the details from Hive and HBase, I would want to use Spark's HiveContext etc. However, based on your point, I might have to check whether a JDBC-based driver connection could be used to do the same.

The main reason for this is to avoid a client-server architecture design.

If we go with a normal Scala app without creating a SparkContext, as per your suggestion, then:
1. It becomes a client program running on a single node of the cluster, and any repeated invocation through an external scheduler will always run from that same node.
2. Having the client program on a single data node might create a hotspot for that node, which could become a bottleneck since all invocations would create JVMs on that node itself.
3. With the above, we would lose Spark on YARN's feature of dynamically placing the driver on any available data node through RM and NM coordination. Invoking a spark job in cluster mode on YARN helps distribute the multiple (main) applications uniformly across the cluster.

Thanks, and please let me know your views.

On Wed, Dec 21, 2016 at 5:43 PM, Sebastian Piu wrote:
> Is there any reason you need a context on the application launching the
> jobs? You can use SparkLauncher in a normal app and just listen for state
> transitions [...]
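On the JDBC idea mentioned above, here is a minimal sketch of fetching from Hive over a HiveServer2 JDBC connection without any SparkContext. The host, port, credentials, query and table are hypothetical, and it assumes the hive-jdbc driver is on the classpath:

import java.sql.DriverManager

object HiveJdbcFetch {
  def main(args: Array[String]): Unit = {
    // Hypothetical HiveServer2 endpoint; adjust host, port, database and credentials.
    val url = "jdbc:hive2://hive-server:10000/default"
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(url, "user", "")
    try {
      // Hypothetical table and columns, just to illustrate the call pattern.
      val rs = conn.createStatement().executeQuery("SELECT id, name FROM some_table LIMIT 10")
      while (rs.next()) println(rs.getString(1) + "\t" + rs.getString(2))
    } finally conn.close()
  }
}

This keeps the launching program a plain Scala app, in line with Sebastian's suggestion, though it gives up Spark's distributed reads for the preprocessing step.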
Re: Launching multiple spark jobs within a main spark job.
OK. I think it is a somewhat unusual usage pattern, but it should work.

As I said before, if you want those Spark applications to share cluster resources, proper configs are needed for Spark.

If you submit the main driver and all other Spark applications in client mode under YARN, you should make sure the node running the driver has enough resources to run them all.

I am not sure if you can use `SparkLauncher` to submit them in different modes, e.g., the main driver in client mode and the others in cluster mode. Worth trying.

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
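For reference, `SparkLauncher` does expose the deploy mode per launched application, so mixing modes should at least be configurable; a hedged sketch, with the jar path and main class as placeholders:

import org.apache.spark.launcher.SparkLauncher

object ChildLauncher {

  // deployMode is chosen per child: "cluster" puts the child's driver on a
  // NodeManager node, "client" keeps it on the node running this launcher.
  def launchChild(deployMode: String): Int =
    new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      .setDeployMode(deployMode)
      .launch()
      .waitFor()

  def main(args: Array[String]): Unit =
    println("child spark-submit exited with code " + launchChild("cluster"))
}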
Re: Launching multiple spark jobs within a main spark job.
Hi Team,

Thanks for your responses. Let me give more details on how I am trying to launch the jobs.

The main spark job will launch other spark jobs, similar to calling multiple spark-submit invocations from within a Spark driver program. The spawned threads launch totally different components as new jobs, so these cannot be implemented using Spark actions.

Sample code:
-

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.SparkContext
import org.apache.spark.launcher.SparkLauncher

object MainSparkJob {

  def main(args: Array[String]): Unit = {

    val sc = new SparkContext(..)

    // Fetch from Hive using HiveContext
    // Fetch from HBase

    // Spawn multiple Futures, each launching a separate Spark application
    val future1 = Future {
      val sparkjob = new SparkLauncher(...).launch()
      sparkjob.waitFor()
    }

    // Similarly, future2 to futureN

    future1.onComplete { ... }
  }

} // end of MainSparkJob
--

On Wed, Dec 21, 2016 at 3:13 PM, David Hodeffi <david.hode...@niceactimize.com> wrote:
> I am not familiar with any problem with that. In any case, if you run a
> Spark application you would have multiple jobs anyway, so it makes sense
> that this is not a problem. [...]
Re: Launching multiple spark jobs within a main spark job.
Hi,

As you launch multiple Spark jobs through `SparkLauncher`, I think it actually works as if you ran multiple Spark applications with `spark-submit`.

By default each application will try to use all available nodes. If your purpose is to share cluster resources across those Spark jobs/applications, you may need to set some configs properly. Please check out: http://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications

You said you launch the main Spark job on the YARN cluster; if you are using cluster mode, you will actually be submitting those Spark jobs/applications from the node on which the driver runs. That looks a bit unusual. It looks like you try to fetch some data first and then run some jobs on that data. Can't you just do those jobs in the main driver as Spark actions with its API?

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
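As a hedged illustration of the resource-sharing point, each launched application can be capped (or allowed to scale between bounds) through configs passed on the launcher. The numbers, jar path and class name below are placeholders, and dynamic allocation additionally assumes the external shuffle service is enabled on the NodeManagers:

import org.apache.spark.launcher.SparkLauncher

object CappedChildLaunch {

  def main(args: Array[String]): Unit = {
    val child = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      // Static cap: each application asks YARN for at most this much.
      .setConf("spark.executor.instances", "4")
      .setConf("spark.executor.memory", "4g")
      .setConf("spark.executor.cores", "2")
      // Or let YARN grow/shrink each application between bounds instead:
      // .setConf("spark.dynamicAllocation.enabled", "true")
      // .setConf("spark.dynamicAllocation.maxExecutors", "8")
      // .setConf("spark.shuffle.service.enabled", "true")
      .launch()

    child.waitFor()
  }
}

With several such children running side by side, per-application caps like these are what keeps any single job from grabbing the whole cluster, which is what the scheduling-across-applications page describes for YARN.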