Check the source code for SparkLauncher:
https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java#L541

A separate process will be started using `spark-submit`, and if it uses
`yarn-cluster` mode, the driver may be launched on another NodeManager or on
the same one, so you would need to work around that if you want to avoid hot
spots.
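
For illustration, a minimal sketch of that behaviour (the jar path and main
class below are hypothetical placeholders): SparkLauncher forks a separate
spark-submit process, and with master "yarn" and deploy mode "cluster" (the
yarn-cluster mode mentioned above) the child driver is placed by YARN on
whichever NodeManager the RM chooses.

---------------------
import org.apache.spark.launcher.SparkLauncher

object LaunchChildJob {
  def main(args: Array[String]): Unit = {
    // launch() forks a separate spark-submit process for the child application.
    val process = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical placeholder
      .setMainClass("com.example.ChildJob")     // hypothetical placeholder
      .setMaster("yarn")
      .setDeployMode("cluster")                 // child driver runs in a YARN container
      .launch()

    // Block until the child spark-submit process exits.
    val exitCode = process.waitFor()
    println(s"spark-submit exited with code $exitCode")
  }
}
----------------------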

On Wed, Dec 21, 2016 at 8:19 AM, Naveen <hadoopst...@gmail.com> wrote:

> Thanks Liang!
> I get your point. It would mean that when launching Spark jobs, the mode
> needs to be specified as client for all of them.
> However, my concern is whether the memory of the driver that launches the
> Spark jobs will be used up entirely by the Futures (SparkContexts), or
> whether these spawned SparkContexts will get different nodes/executors from
> the resource manager.
>
> On Wed, Dec 21, 2016 at 6:43 PM, Naveen <hadoopst...@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>> Yes, for fetching the details from Hive and HBase, I would want to use
>> Spark's HiveContext etc.
>> However, based on your point, I might have to check whether a JDBC-based
>> driver connection could be used to do the same.
>>
>> Main reason for this is to avoid a client-server architecture design.
>>
>> If we go with a normal Scala app without creating a SparkContext, as per
>> your suggestion, then:
>> 1. It turns out to be a client program running on a single node of the
>> cluster, and any repeated invocation through an xyz scheduler will always
>> run from that same node.
>> 2. Having the client program on a single data node might create a hotspot
>> on that node, which could become a bottleneck since all invocations would
>> create their JVMs on that node.
>> 3. With the above, we would lose Spark on YARN's ability to dynamically
>> place a driver on any available data node through RM and NM coordination.
>> Invoking a Spark job with YARN in cluster mode helps distribute the
>> multiple (main) applications uniformly across the cluster, as in the
>> sketch below.
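>>
>> To make that concrete, a minimal, hedged sketch (the jar path and main
>> class are hypothetical placeholders) of how the deploy mode would be set
>> through SparkLauncher: "client" keeps the child driver JVM on the
>> launching node, while "cluster" lets the RM/NM place it on any available
>> node.
>>
>> ---------------------
>> import org.apache.spark.launcher.SparkLauncher
>>
>> object DeployModeSketch {
>>   def main(args: Array[String]): Unit = {
>>     // "client" would run the child driver on this node (the hotspot risk
>>     // above); "cluster" lets YARN place it on any available NodeManager.
>>     val process = new SparkLauncher()
>>       .setAppResource("/path/to/child-job.jar") // hypothetical placeholder
>>       .setMainClass("com.example.ChildJob")     // hypothetical placeholder
>>       .setMaster("yarn")
>>       .setDeployMode("cluster")
>>       .launch()
>>     process.waitFor()
>>   }
>> }
>> ---------------------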
>>
>> Thanks and please let me know your views.
>>
>>
>> On Wed, Dec 21, 2016 at 5:43 PM, Sebastian Piu <sebastian....@gmail.com>
>> wrote:
>>
>>> Is there any reason you need a context on the application launching the
>>> jobs?
>>> You can use SparkLauncher in a normal app and just listen for state
>>> transitions
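>>>
>>> For reference, a minimal sketch of that approach (the jar path, main
>>> class, and handler bodies are hypothetical placeholders): startApplication
>>> returns a SparkAppHandle and invokes the listener on state transitions,
>>> so the launching app itself needs no SparkContext.
>>>
>>> ---------------------
>>> import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}
>>>
>>> object LaunchWithListener {
>>>   def main(args: Array[String]): Unit = {
>>>     val handle = new SparkLauncher()
>>>       .setAppResource("/path/to/child-job.jar") // hypothetical placeholder
>>>       .setMainClass("com.example.ChildJob")     // hypothetical placeholder
>>>       .setMaster("yarn")
>>>       .setDeployMode("cluster")
>>>       .startApplication(new SparkAppHandle.Listener {
>>>         // Called whenever the child application's state changes.
>>>         override def stateChanged(h: SparkAppHandle): Unit =
>>>           println(s"child state: ${h.getState}")
>>>         override def infoChanged(h: SparkAppHandle): Unit = ()
>>>       })
>>>
>>>     // Keep the JVM alive until the child application reaches a final state.
>>>     while (!handle.getState.isFinal) Thread.sleep(1000)
>>>   }
>>> }
>>> ---------------------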
>>>
>>> On Wed, 21 Dec 2016, 11:44 Naveen, <hadoopst...@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> Thanks for your responses.
>>>> Let me give more details, with a picture, of how I am trying to launch the jobs.
>>>>
>>>> The main Spark job will launch other Spark jobs, similar to calling
>>>> spark-submit multiple times from within a Spark driver program.
>>>> The threads spawned for the new jobs are totally different components,
>>>> so they cannot be implemented using Spark actions.
>>>>
>>>> sample code:
>>>>
>>>> ---------------------
>>>>
>>>> import scala.concurrent.Future
>>>> import scala.concurrent.ExecutionContext.Implicits.global
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>> import org.apache.spark.launcher.SparkLauncher
>>>>
>>>> object MainSparkJob {
>>>>
>>>>   def main(args: Array[String]): Unit = {
>>>>
>>>>     val sc = new SparkContext(new SparkConf())
>>>>
>>>>     // Fetch from Hive using HiveContext
>>>>     // Fetch from HBase
>>>>
>>>>     // Spawning multiple Futures, each launching a separate spark-submit process
>>>>     val future1 = Future {
>>>>       val sparkJob = new SparkLauncher()
>>>>         .setAppResource("...")   // placeholder: child application jar
>>>>         .setMainClass("...")     // placeholder: child job's main class
>>>>         .launch()
>>>>       sparkJob.waitFor()
>>>>     }
>>>>
>>>>     // Similarly, future2 to futureN.
>>>>
>>>>     future1.onComplete { result =>
>>>>       // ... handle completion
>>>>       ()
>>>>     }
>>>>   }
>>>>
>>>> } // end of MainSparkJob
>>>> ----------------------
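>>>>
>>>> As a complementary sketch of "future2 to futureN" (the jar paths and
>>>> main class are hypothetical placeholders), the N launches could be
>>>> collected and awaited together, each Future merely blocking on its own
>>>> child spark-submit process:
>>>>
>>>> ---------------------
>>>> import scala.concurrent.{Await, Future}
>>>> import scala.concurrent.ExecutionContext.Implicits.global
>>>> import scala.concurrent.duration.Duration
>>>> import org.apache.spark.launcher.SparkLauncher
>>>>
>>>> object LaunchAllChildJobs {
>>>>   def main(args: Array[String]): Unit = {
>>>>     val childJars = Seq("/path/to/job1.jar", "/path/to/job2.jar") // hypothetical
>>>>
>>>>     // One Future per child job; each forks its own spark-submit process.
>>>>     val futures: Seq[Future[Int]] = childJars.map { jar =>
>>>>       Future {
>>>>         new SparkLauncher()
>>>>           .setAppResource(jar)
>>>>           .setMainClass("com.example.ChildJob") // hypothetical placeholder
>>>>           .setMaster("yarn")
>>>>           .setDeployMode("cluster")
>>>>           .launch()
>>>>           .waitFor()
>>>>       }
>>>>     }
>>>>
>>>>     // Wait for every child job to finish and report the exit codes.
>>>>     val exitCodes = Await.result(Future.sequence(futures), Duration.Inf)
>>>>     println(s"child exit codes: $exitCodes")
>>>>   }
>>>> }
>>>> ----------------------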
>>>>
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> On Wed, Dec 21, 2016 at 3:13 PM, David Hodeffi <
>>>> david.hode...@niceactimize.com> wrote:
>>>>
>>>> I am not familiar with any problem with that.
>>>>
>>>> Anyway, if you run a Spark application you would have multiple jobs,
>>>> so it makes sense that this is not a problem.
>>>>
>>>>
>>>>
>>>> Thanks, David.
>>>>
>>>>
>>>>
>>>> *From:* Naveen [mailto:hadoopst...@gmail.com]
>>>> *Sent:* Wednesday, December 21, 2016 9:18 AM
>>>> *To:* d...@spark.apache.org; user@spark.apache.org
>>>> *Subject:* Launching multiple spark jobs within a main spark job.
>>>>
>>>>
>>>>
>>>> Hi Team,
>>>>
>>>>
>>>>
>>>> Is it OK to spawn multiple Spark jobs within a main Spark job? My main
>>>> Spark job's driver, which was launched on the YARN cluster, will do some
>>>> preprocessing and, based on it, needs to launch multiple Spark jobs on
>>>> the YARN cluster. I am not sure if this is the right pattern.
>>>>
>>>>
>>>>
>>>> Please share your thoughts.
>>>>
>>>> Sample code I have is below for better understanding.
>>>>
>>>> ---------------------
>>>>
>>>>
>>>>
>>>> import scala.concurrent.Future
>>>> import scala.concurrent.ExecutionContext.Implicits.global
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>> import org.apache.spark.launcher.SparkLauncher
>>>>
>>>> object MainSparkJob {
>>>>
>>>>   def main(args: Array[String]): Unit = {
>>>>
>>>>     val sc = new SparkContext(new SparkConf())
>>>>
>>>>     // Fetch from Hive using HiveContext
>>>>     // Fetch from HBase
>>>>
>>>>     // Spawning multiple Futures, each launching a separate spark-submit process
>>>>     val future1 = Future {
>>>>       val sparkJob = new SparkLauncher()
>>>>         .setAppResource("...")   // placeholder: child application jar
>>>>         .setMainClass("...")     // placeholder: child job's main class
>>>>         .launch()
>>>>       sparkJob.waitFor()
>>>>     }
>>>>
>>>>     // Similarly, future2 to futureN.
>>>>
>>>>     future1.onComplete { result =>
>>>>       // ... handle completion
>>>>       ()
>>>>     }
>>>>   }
>>>>
>>>> } // end of MainSparkJob
>>>>
>>>> ----------------------
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
