Re: Launching multiple spark jobs within a main spark job.
Thanks Liang, Vadim and everyone for your inputs!

With this clarity, I have tried client mode for both the main and the sub spark jobs. Every main spark job and its corresponding threaded spark jobs show up on the YARN applications list, and the jobs are getting executed properly. I now need to test with cluster mode at both levels, which requires setting up spark-submit and a few configurations properly on all data nodes in the cluster. I will share updates as I execute and analyze further.

My remaining concern is how to throttle the launching of multiple jobs based on the YARN cluster's available capacity. This exercise will be similar to a break-point analysis of the cluster. The problem is that we will not know the file sizes until we read them into memory, and since Spark's memory mechanics are subtle and fragile, we need to be very sure we avoid OOM (out-of-memory) issues. I am not sure whether there is a process that can poll the ResourceManager's information and tell whether further jobs can be submitted to YARN.

On Thu, Dec 22, 2016 at 7:26 AM, Liang-Chi Hsieh wrote:
> If you run the main driver and other Spark jobs in client mode, you can
> make sure they (I meant all the drivers) are running at the same node. [...]
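On the throttling question, one option (a rough sketch only) is to poll the ResourceManager's REST API before each submission and gate on the free memory it reports. The RM address, the 8 GB threshold, the crude JSON handling, and the reliance on the availableMB/appsPending fields documented for the RM REST API are all assumptions to verify against your Hadoop version.

import scala.io.Source

// Rough sketch: ask the YARN ResourceManager how much memory is free before
// launching another child job. Host, threshold and parsing are assumptions.
object YarnCapacityGate {

  private val metricsUrl = "http://rm-host:8088/ws/v1/cluster/metrics" // hypothetical RM web address

  // Crudely pull a numeric field such as availableMB or appsPending out of the metrics JSON.
  private def metric(json: String, field: String): Long = {
    val pattern = ("\"" + field + "\"\\s*:\\s*(\\d+)").r
    pattern.findFirstMatchIn(json).map(_.group(1).toLong).getOrElse(0L)
  }

  def canSubmit(minFreeMb: Long = 8192): Boolean = {
    val src = Source.fromURL(metricsUrl)
    val json = try src.mkString finally src.close()
    metric(json, "availableMB") >= minFreeMb && metric(json, "appsPending") == 0
  }

  def main(args: Array[String]): Unit =
    println(if (canSubmit()) "OK to launch another job" else "Cluster busy, hold off")
}

A launcher loop could call canSubmit() before each SparkLauncher invocation and sleep/retry otherwise; it does not remove the OOM risk from unknown file sizes, but it at least ties submissions to what the ResourceManager reports as free.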
Re: Launching multiple spark jobs within a main spark job.
If you run the main driver and the other Spark jobs in client mode, you can make sure they (I meant all the drivers) are running on the same node. Of course, all the drivers then consume resources on that same node.

If you run the main driver in client mode but the other Spark jobs in cluster mode, the drivers of those Spark jobs will be launched on other nodes in the cluster. It should work too; it is the same as running one Spark app in client mode and several others in cluster mode.

If you run your main driver in cluster mode, and run the other Spark jobs in cluster mode too, you may need Spark properly installed on all nodes in the cluster, because those Spark jobs will be launched from whichever node the main driver is running on.

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
RE: Launching multiple spark jobs within a main spark job.
I am not familiar with any problem with that. In any case, if you run a Spark application you would have multiple jobs anyway, so it makes sense that this is not a problem.

Thanks,
David

From: Naveen [mailto:hadoopst...@gmail.com]
Sent: Wednesday, December 21, 2016 9:18 AM
To: dev@spark.apache.org; u...@spark.apache.org
Subject: Launching multiple spark jobs within a main spark job.

Hi Team,

Is it OK to spawn multiple spark jobs within a main spark job? My main spark job's driver, which was launched on the YARN cluster, will do some preprocessing and, based on it, needs to launch multiple spark jobs on the YARN cluster. I am not sure if this is the right pattern.

Please share your thoughts.
Sample code I have is as below for better understanding:
-

object MainSparkJob {

  main(...) {

    val sc = new SparkContext(..)

    // Fetch from Hive using HiveContext
    // Fetch from HBase

    // spawning multiple Futures
    val future1 = Future {
      val sparkjob = SparkLauncher(...).launch; sparkjob.waitFor
    }

    // Similarly, future2 to futureN

    future1.onComplete { ... }
  }

} // end of MainSparkJob
--
Re: Launching multiple spark jobs within a main spark job.
Is there any reason you need a context on the application launching the jobs? You can use SparkLauncher in a normal app and just listen for state transitions.

On Wed, 21 Dec 2016, 11:44 Naveen wrote:
> Thanks for your responses.
> Let me give more details of how I am trying to launch jobs. [...]
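A minimal sketch of that suggestion, using the launcher API's SparkAppHandle.Listener (available since Spark 1.6); the jar path and main class below are placeholders:

import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchAndListen {
  def main(args: Array[String]): Unit = {
    val done = new CountDownLatch(1)

    // startApplication returns a handle instead of a raw Process,
    // so the launching app can react to the child's state transitions.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit = {
          println(s"child ${h.getAppId} -> ${h.getState}")
          if (h.getState.isFinal) done.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    done.await() // block until the child application reaches a terminal state
    println(s"final state: ${handle.getState}")
  }
}

Because the handle carries the application id and state, the launching app needs no SparkContext of its own; it simply reacts as the child application moves through its lifecycle.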
Re: Launching multiple spark jobs within a main spark job.
Thanks Liang! I get your point. It would mean that when launching the spark jobs, the mode needs to be specified as client for all of them. However, my concern is whether the memory of the driver that launches the spark jobs will be used up entirely by the Futures (and their SparkContexts), or whether these spawned SparkContexts will get different nodes/executors from the resource manager?

On Wed, Dec 21, 2016 at 6:43 PM, Naveen wrote:
> Hi Sebastian,
>
> Yes, for fetching the details from Hive and HBase, I would want to use
> Spark's HiveContext etc. [...]
Re: Launching multiple spark jobs within a main spark job.
Hi Sebastian,

Yes, for fetching the details from Hive and HBase, I would want to use Spark's HiveContext etc. However, based on your point, I might have to check whether a JDBC-based driver connection could be used to do the same.

The main reason for this is to avoid a client-server architecture design.

If we go with a normal Scala app without creating a SparkContext, as per your suggestion, then:
1. It becomes a client program running on a single node of the cluster, and any repeated invocation through an external scheduler will always run from that same node.
2. Having the client program on a single data node might create a hotspot for that node, which could become a bottleneck since all invocations would create JVMs on that node itself.
3. With the above, we would lose Spark on YARN's feature of dynamically placing the driver on any available data node through RM and NM coordination. Invoking a spark job in cluster mode on YARN helps distribute the multiple (main) applications uniformly across the cluster.

Thanks, and please let me know your views.

On Wed, Dec 21, 2016 at 5:43 PM, Sebastian Piu wrote:
> Is there any reason you need a context on the application launching the
> jobs? You can use SparkLauncher in a normal app and just listen for state
> transitions [...]
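On the JDBC idea mentioned above, here is a minimal sketch of fetching from Hive over a HiveServer2 JDBC connection without any SparkContext. The host, port, credentials, query and table are hypothetical, and it assumes the hive-jdbc driver is on the classpath:

import java.sql.DriverManager

object HiveJdbcFetch {
  def main(args: Array[String]): Unit = {
    // Hypothetical HiveServer2 endpoint; adjust host, port, database and credentials.
    val url = "jdbc:hive2://hive-server:10000/default"
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(url, "user", "")
    try {
      // Hypothetical table and columns, just to illustrate the call pattern.
      val rs = conn.createStatement().executeQuery("SELECT id, name FROM some_table LIMIT 10")
      while (rs.next()) println(rs.getString(1) + "\t" + rs.getString(2))
    } finally conn.close()
  }
}

This keeps the launching program a plain Scala app, in line with Sebastian's suggestion, though it gives up Spark's distributed reads for the preprocessing step.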
Re: Launching multiple spark jobs within a main spark job.
OK. I think it is a somewhat unusual usage pattern, but it should work.

As I said before, if you want those Spark applications to share cluster resources, proper configs are needed for Spark.

If you submit the main driver and all other Spark applications in client mode under YARN, you should make sure the node running the driver has enough resources to run them all.

I am not sure if you can use `SparkLauncher` to submit them in different modes, e.g., the main driver in client mode and the others in cluster mode. Worth trying.

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
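For reference, `SparkLauncher` does expose the deploy mode per launched application, so mixing modes should at least be configurable; a hedged sketch, with the jar path and main class as placeholders:

import org.apache.spark.launcher.SparkLauncher

object ChildLauncher {

  // deployMode is chosen per child: "cluster" puts the child's driver on a
  // NodeManager node, "client" keeps it on the node running this launcher.
  def launchChild(deployMode: String): Int =
    new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      .setDeployMode(deployMode)
      .launch()
      .waitFor()

  def main(args: Array[String]): Unit =
    println("child spark-submit exited with code " + launchChild("cluster"))
}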
Re: Launching multiple spark jobs within a main spark job.
Hi Team,

Thanks for your responses. Let me give more details on how I am trying to launch the jobs.

The main spark job will launch other spark jobs, similar to calling multiple spark-submit invocations from within a Spark driver program. The spawned threads launch totally different components as new jobs, so these cannot be implemented using Spark actions.

Sample code:
-

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.SparkContext
import org.apache.spark.launcher.SparkLauncher

object MainSparkJob {

  def main(args: Array[String]): Unit = {

    val sc = new SparkContext(..)

    // Fetch from Hive using HiveContext
    // Fetch from HBase

    // Spawn multiple Futures, each launching a separate Spark application
    val future1 = Future {
      val sparkjob = new SparkLauncher(...).launch()
      sparkjob.waitFor()
    }

    // Similarly, future2 to futureN

    future1.onComplete { ... }
  }

} // end of MainSparkJob
--

On Wed, Dec 21, 2016 at 3:13 PM, David Hodeffi <david.hode...@niceactimize.com> wrote:
> I am not familiar with any problem with that. In any case, if you run a
> Spark application you would have multiple jobs anyway, so it makes sense
> that this is not a problem. [...]
Re: Launching multiple spark jobs within a main spark job.
Hi,

As you launch multiple Spark jobs through `SparkLauncher`, I think it actually works as if you ran multiple Spark applications with `spark-submit`.

By default each application will try to use all available nodes. If your purpose is to share cluster resources across those Spark jobs/applications, you may need to set some configs properly. Please check out: http://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications

You said you launch the main Spark job on the YARN cluster; if you are using cluster mode, you will actually be submitting those Spark jobs/applications from the node on which the driver runs. That looks a bit unusual. It looks like you try to fetch some data first and then run some jobs on that data. Can't you just do those jobs in the main driver as Spark actions with its API?

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
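As a hedged illustration of the resource-sharing point, each launched application can be capped (or allowed to scale between bounds) through configs passed on the launcher. The numbers, jar path and class name below are placeholders, and dynamic allocation additionally assumes the external shuffle service is enabled on the NodeManagers:

import org.apache.spark.launcher.SparkLauncher

object CappedChildLaunch {

  def main(args: Array[String]): Unit = {
    val child = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar") // hypothetical jar
      .setMainClass("com.example.ChildJob")     // hypothetical class
      .setMaster("yarn")
      // Static cap: each application asks YARN for at most this much.
      .setConf("spark.executor.instances", "4")
      .setConf("spark.executor.memory", "4g")
      .setConf("spark.executor.cores", "2")
      // Or let YARN grow/shrink each application between bounds instead:
      // .setConf("spark.dynamicAllocation.enabled", "true")
      // .setConf("spark.dynamicAllocation.maxExecutors", "8")
      // .setConf("spark.shuffle.service.enabled", "true")
      .launch()

    child.waitFor()
  }
}

With several such children running side by side, per-application caps like these are what keeps any single job from grabbing the whole cluster, which is what the scheduling-across-applications page describes for YARN.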