Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-08 Thread bsikander
Qiao, Richard wrote
> Comparing #1 and #3, my understanding of “submitted” is “the jar is
> submitted to executors”. With this concept, you may define your own
> status.

In SparkLauncher, SUBMITTED means that the Driver was able to acquire cores
from the Spark cluster and the Launcher is waiting for the Driver to connect
back. Once it connects back, the Driver's state changes to CONNECTED.
As Marcelo mentioned, the Launcher can only tell me about the Driver's state;
it is not possible to infer the state of the application (executors) from it.
For the state of the executors we can use a SparkListener.

With the combination of both Launcher + Listener, I have a solution. As you
mentioned, even if only 1 executor is allocated to the application, the state
will change to RUNNING. So in my application, I change the status of my job
to RUNNING only if I receive RUNNING from the Launcher and an onExecutorAdded
event from the SparkListener.
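
[Editor's note: a minimal sketch of that combination; the class and method
names here are illustrative, not from the thread, and the executor counter
would be fed by the SparkListener callbacks.]
-
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.spark.launcher.SparkAppHandle;

class JobStatusTracker implements SparkAppHandle.Listener {
    private final AtomicInteger executorCount = new AtomicInteger(0);
    private volatile SparkAppHandle.State driverState = SparkAppHandle.State.UNKNOWN;

    @Override
    public void stateChanged(SparkAppHandle handle) {
        driverState = handle.getState();
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {}

    // Wire these to SparkListener.onExecutorAdded/onExecutorRemoved.
    void executorAdded()   { executorCount.incrementAndGet(); }
    void executorRemoved() { executorCount.decrementAndGet(); }

    // RUNNING only when the driver reports RUNNING and at least one executor is up.
    String overallStatus() {
        return (driverState == SparkAppHandle.State.RUNNING && executorCount.get() > 0)
                ? "RUNNING" : "WAITING";
    }
}
-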






Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-08 Thread bsikander
Qiao, Richard wrote
> For your example question, the answer is yes.

Perfect. I am assuming that this is true for Spark standalone/YARN/Mesos.







Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Qiao, Richard
For your example question, the answer is yes.
“For example, if an application wanted 4 executors
(spark.executor.instances=4) but the Spark cluster can only provide 1
executor, then I will only receive 1 onExecutorAdded event. Will the
application state change to RUNNING (even if only 1 executor was allocated)?”

Best Regards
Richard


On 12/7/17, 2:40 PM, "bsikander"  wrote:

Marcelo Vanzin wrote
> I'm not sure I follow you here. This is something that you are
> defining, not Spark.

Yes, you are right. In my code,
1) my notion of RUNNING is that both driver + executors are in RUNNING
state.
2) my notion of WAITING is that any one of driver/executors is in WAITING
state.

So,
- SparkLauncher provides me the details about the "driver":
RUNNING/SUBMITTED/WAITING
- SparkListener provides me the details about the "executors" via
onExecutorAdded/onExecutorRemoved

I want to combine both SparkLauncher + SparkListener to achieve my view of
RUNNING/WAITING.

The only thing confusing me here is that I don't know how Spark internally
moves applications from WAITING to RUNNING state.
For example, if an application wanted 4 executors
(spark.executor.instances=4) but the Spark cluster can only provide 1
executor, then I will only receive 1 onExecutorAdded event. Will the
application state change to RUNNING (even if only 1 executor was allocated)?

Once I am clear on this logic, I can implement my feature.










Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Qiao, Richard
For #2, do you mean “RUNNING” showing in the “Driver” table? If so, that is
not a problem: the driver really is running while no executor is available,
and that combination (driver running while no executors) is itself a status
you can catch.
Comparing #1 and #3, my understanding of “submitted” is “the jar is submitted
to executors”. With this concept, you may define your own status.

Best Regards
Richard


On 12/4/17, 4:06 AM, "bsikander"  wrote:

So, I tried to use SparkAppHandle.Listener with SparkLauncher as you
suggested. The behavior of the Launcher is not what I expected.

1- If I start the job (using SparkLauncher) and my Spark cluster has enough
cores available, I receive events in my class implementing
SparkAppHandle.Listener and I see the status changing from
UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING. All good here.

2- If my Spark cluster has cores only for my Driver process (running in
cluster mode) but no cores for my executors, then I still receive the
RUNNING event. I was expecting something else: since my executors have no
cores and the Master UI shows WAITING state for them, the listener should
report SUBMITTED instead of RUNNING.

3- If my Spark cluster has no cores even for the driver process, then
SparkLauncher fires no events at all. The state stays UNKNOWN. I would
have expected it to be at least SUBMITTED.

*Is there any way with which I can reliably get the WAITING state of a job?*
Driver=RUNNING, executor=RUNNING -> overall state should be RUNNING
Driver=RUNNING, executor=WAITING -> overall state should be SUBMITTED/WAITING
Driver=WAITING, executor=WAITING -> overall state should be
CONNECTED/SUBMITTED/WAITING














Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
That's the Spark Master's view of the application. I don't know
exactly what it means in the different run modes; I'm more familiar
with YARN. But I wouldn't be surprised if, as with the others, it mostly
tracks the driver's state.

On Thu, Dec 7, 2017 at 12:06 PM, bsikander  wrote:
> [image attachment: Spark Master UI screenshot showing the application state]
>
> See the image. I am referring to this state when I say "Application State".



-- 
Marcelo




Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread bsikander

[image attachment: Spark Master UI screenshot showing the application state]

See the image. I am referring to this state when I say "Application State".






Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
On Thu, Dec 7, 2017 at 11:40 AM, bsikander  wrote:
> For example, if an application wanted 4 executors
> (spark.executor.instances=4) but the Spark cluster can only provide 1
> executor, then I will only receive 1 onExecutorAdded event. Will the
> application state change to RUNNING (even if only 1 executor was allocated)?

What application state are you talking about? That's the thing that
you seem to be confused about here.

As you've already learned, SparkLauncher only cares about the driver.
So RUNNING means the driver is running.

And there's no concept of running anywhere else I know of that is
exposed to Spark applications. So I don't know which code you're
referring to when you say "the application state change to RUNNING".

-- 
Marcelo




Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread bsikander
Marcelo Vanzin wrote
> I'm not sure I follow you here. This is something that you are
> defining, not Spark.

Yes, you are right. In my code,
1) my notion of RUNNING is that both driver + executors are in RUNNING
state.
2) my notion of WAITING is that any one of driver/executors is in WAITING
state.

So,
- SparkLauncher provides me the details about the "driver":
RUNNING/SUBMITTED/WAITING
- SparkListener provides me the details about the "executors" via
onExecutorAdded/onExecutorRemoved

I want to combine both SparkLauncher + SparkListener to achieve my view of
RUNNING/WAITING.

The only thing confusing me here is that I don't know how Spark internally
moves applications from WAITING to RUNNING state.
For example, if an application wanted 4 executors
(spark.executor.instances=4) but the Spark cluster can only provide 1
executor, then I will only receive 1 onExecutorAdded event. Will the
application state change to RUNNING (even if only 1 executor was allocated)?

Once I am clear on this logic, I can implement my feature.
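
[Editor's note: a minimal sketch of the listener half, assuming the Spark 2.x
SparkListener API; the class name is illustrative.]
-
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerExecutorAdded;
import org.apache.spark.scheduler.SparkListenerExecutorRemoved;

// Tracks how many executors are currently registered; combination logic
// elsewhere can treat "count > 0" as "executors are RUNNING".
class ExecutorCountListener extends SparkListener {
    private final AtomicInteger liveExecutors = new AtomicInteger(0);

    @Override
    public void onExecutorAdded(SparkListenerExecutorAdded event) {
        liveExecutors.incrementAndGet();
    }

    @Override
    public void onExecutorRemoved(SparkListenerExecutorRemoved event) {
        liveExecutors.decrementAndGet();
    }

    public int liveExecutorCount() {
        return liveExecutors.get();
    }
}
-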






Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
On Tue, Dec 5, 2017 at 12:43 PM, bsikander  wrote:
> 2) If I use context.addSparkListener, I can customize the listener, but then
> I miss the onApplicationStart event. Also, I don't know Spark's logic for
> changing the state of an application from WAITING -> RUNNING.

I'm not sure I follow you here. This is something that you are
defining, not Spark.

"SparkLauncher" has its own view of that those mean, and it doesn't match yours.

"SparkListener" has no notion of whether an app is running or not.

It's up to you to define what waiting and running mean in your code,
and map the events Spark provides you to those concepts.

e.g., a job is running after your listener gets an "onJobStart" event.
But the application might have been running already before that job
started.

-- 
Marcelo
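
[Editor's note: a small illustration of the mapping Marcelo describes; the
"activeJobs" bookkeeping is my own, not Spark's.]
-
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;

// One possible definition: a job is "running" while at least one job is
// between onJobStart and onJobEnd. The application itself may be alive
// (and may have been running already) regardless of this flag.
class JobActivityListener extends SparkListener {
    private int activeJobs = 0;

    @Override
    public synchronized void onJobStart(SparkListenerJobStart jobStart) {
        activeJobs++;
    }

    @Override
    public synchronized void onJobEnd(SparkListenerJobEnd jobEnd) {
        activeJobs--;
    }

    synchronized boolean jobRunning() {
        return activeJobs > 0;
    }
}
-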




Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread bsikander
Thank you for the reply.

I am not a Spark expert, but I was reading through the code and thought
that the state changed from SUBMITTED to RUNNING only after executors
(CoarseGrainedExecutorBackend) were registered.
https://github.com/apache/spark/commit/015f7ef503d5544f79512b626749a1f0c48b#diff-a755f3d892ff2506a7aa7db52022d77cR95

Since you mentioned that the Launcher has no idea about executors, my
understanding is probably not correct.



SparkListener is an option, but it has its own pitfalls.
1) If I use spark.extraListeners, I get all the events, but I cannot
customize the Listener, since I have to pass the class name as a string to
spark-submit/Launcher.
2) If I use context.addSparkListener, I can customize the listener, but then
I miss the onApplicationStart event. Also, I don't know Spark's logic for
changing the state of an application from WAITING -> RUNNING.
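
[Editor's note: for reference, the two registration routes look roughly like
this; a sketch, where ExecutorCountListener is the counting listener sketched
above and the package name is made up.]
-
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Route 1: spark.extraListeners takes a class name, so the listener needs a
// zero-arg constructor and cannot be customized per run:
//   spark-submit --conf spark.extraListeners=com.example.ExecutorCountListener ...

// Route 2: programmatic registration accepts a custom-built instance, but any
// event fired before this call (e.g. onApplicationStart) is missed.
SparkConf conf = new SparkConf().setAppName("listener-demo");
JavaSparkContext jsc = new JavaSparkContext(conf);
jsc.sc().addSparkListener(new ExecutorCountListener());
-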

Maybe you can answer: if I have a Spark job which needs 3 executors and the
cluster can only provide 1 executor, will the application be in WAITING or
RUNNING?
If I know Spark's logic, then I can program something with the
SparkListener.onExecutorAdded event to correctly figure out the state.

Another alternative could be the Spark Master JSON endpoint (http://<>:8080/json),
but the problem with this is that it returns everything, and I was not able
to find any way to filter it.






Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
SparkLauncher operates at a different layer than Spark applications.
It doesn't know about executors or driver or anything, just whether
the Spark application was started or not. So it doesn't work for your
case.

The best option for your case is to install a SparkListener and
monitor events. But that will not tell you when things do not happen,
just when they do happen, so maybe even that is not enough for you.


On Mon, Dec 4, 2017 at 1:06 AM, bsikander  wrote:
> So, I tried to use SparkAppHandle.Listener with SparkLauncher as you
> suggested. The behavior of the Launcher is not what I expected.
>
> 1- If I start the job (using SparkLauncher) and my Spark cluster has enough
> cores available, I receive events in my class implementing
> SparkAppHandle.Listener and I see the status changing from
> UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING. All good here.
>
> 2- If my Spark cluster has cores only for my Driver process (running in
> cluster mode) but no cores for my executors, then I still receive the
> RUNNING event. I was expecting something else: since my executors have no
> cores and the Master UI shows WAITING state for them, the listener should
> report SUBMITTED instead of RUNNING.
>
> 3- If my Spark cluster has no cores even for the driver process, then
> SparkLauncher fires no events at all. The state stays UNKNOWN. I would
> have expected it to be at least SUBMITTED.
>
> *Is there any way with which I can reliably get the WAITING state of a job?*
> Driver=RUNNING, executor=RUNNING -> overall state should be RUNNING
> Driver=RUNNING, executor=WAITING -> overall state should be SUBMITTED/WAITING
> Driver=WAITING, executor=WAITING -> overall state should be
> CONNECTED/SUBMITTED/WAITING



-- 
Marcelo




Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-04 Thread bsikander
So, I tried to use SparkAppHandle.Listener with SparkLauncher as you
suggested. The behavior of the Launcher is not what I expected.

1- If I start the job (using SparkLauncher) and my Spark cluster has enough
cores available, I receive events in my class implementing
SparkAppHandle.Listener and I see the status changing from
UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING. All good here.

2- If my Spark cluster has cores only for my Driver process (running in
cluster mode) but no cores for my executors, then I still receive the
RUNNING event. I was expecting something else: since my executors have no
cores and the Master UI shows WAITING state for them, the listener should
report SUBMITTED instead of RUNNING.

3- If my Spark cluster has no cores even for the driver process, then
SparkLauncher fires no events at all. The state stays UNKNOWN. I would
have expected it to be at least SUBMITTED.

*Is there any way with which I can reliably get the WAITING state of a job?*
Driver=RUNNING, executor=RUNNING -> overall state should be RUNNING
Driver=RUNNING, executor=WAITING -> overall state should be SUBMITTED/WAITING
Driver=WAITING, executor=WAITING -> overall state should be
CONNECTED/SUBMITTED/WAITING
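
[Editor's note: that mapping, written out as a sketch; the state names follow
SparkAppHandle.State, and the executors-running flag would come from a
SparkListener.]
-
import org.apache.spark.launcher.SparkAppHandle;

final class OverallState {
    // Collapse driver state + executor availability into the desired view.
    static String of(SparkAppHandle.State driver, boolean executorsRunning) {
        if (driver == SparkAppHandle.State.RUNNING) {
            return executorsRunning ? "RUNNING" : "WAITING"; // driver up, executors pending
        }
        return "WAITING"; // driver still CONNECTED/SUBMITTED/UNKNOWN
    }
}
-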










Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-08 Thread Davide.Mandrini
In this case, the only way to check the status is via REST calls to the Spark
json API, accessible at http://:/json/
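
[Editor's note: a rough sketch of such a call from Java. The host and port are
assumptions, the field names "activeapps", "id" and "state" should be verified
against the JSON your master actually returns, and Jackson is used for parsing.]
-
import java.net.URL;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MasterStatePoller {
    public static void main(String[] args) throws Exception {
        // Fetch the standalone master's JSON view of the cluster.
        JsonNode root = new ObjectMapper().readTree(new URL("http://spark-master:8080/json/"));
        // Print each active application's id and state (e.g. WAITING/RUNNING).
        for (JsonNode app : root.path("activeapps")) {
            System.out.println(app.path("id").asText() + " -> " + app.path("state").asText());
        }
    }
}
-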






Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-08 Thread bsikander
Thank you for the reply.

I am currently not using SparkLauncher to launch my driver. Rather, I am
using the old-fashioned spark-submit, and moving to SparkLauncher is not an
option right now.
Do I have any options there?






Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-07 Thread Davide.Mandrini
Hello Behroz,

you can use a SparkAppHandle.Listener to get state updates for the launched
application (for events from inside the application there is also
SparkListener, cf.
https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/scheduler/SparkListener.html
)

You first need to create your own SparkAppListener class that implements
SparkAppHandle.Listener:
-
private static class SparkAppListener implements SparkAppHandle.Listener, Runnable {

    SparkAppListener() {}

    @Override
    public void stateChanged(SparkAppHandle handle) {
        String sparkAppId = handle.getAppId();
        SparkAppHandle.State appState = handle.getState();
        log.info("Spark job with app id: " + sparkAppId
                + ", State changed to: " + appState);
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {}

    @Override
    public void run() {}
}
-



Then you can run it in a thread via an Executors.newCachedThreadPool() (or
with a simple new Thread()):
-
private final static ExecutorService listenerService =
        Executors.newCachedThreadPool();

SparkAppListener appListener = new SparkAppListener();
listenerService.execute(appListener);

SparkLauncher launcher = new SparkLauncher()
        .setAppName(appName)
        .setSparkHome(sparkHome)
        .setAppResource(appResource)
        .setMainClass(mainClass)
        .setMaster(master); // further configuration elided in the original

SparkAppHandle appHandle = launcher.startApplication(appListener);
-

At this point, every time the state changes, your
SparkAppListener.stateChanged method will be invoked.

Hope it helps,
Davide






Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-07 Thread bsikander
Anyone?






Programmatically get status of job (WAITING/RUNNING)

2017-10-30 Thread Behroz Sikander
Hi,

I have a Spark cluster running in client mode. I programmatically submit
jobs to the Spark cluster; under the hood, I am using spark-submit.

If my cluster is overloaded and I start a context, the driver JVM keeps
waiting for executors. The executors are in WAITING state because the
cluster does not have enough resources. Here are the log messages from the
driver logs:

2017-10-27 13:20:15,260 WARN Timer-0
org.apache.spark.scheduler.TaskSchedulerImpl []: Initial job has not
accepted any resources; check your cluster UI to ensure that workers
are registered and have sufficient resources

2017-10-27 13:20:30,259 WARN Timer-0
org.apache.spark.scheduler.TaskSchedulerImpl []: Initial job has not
accepted any resources; check your cluster UI to ensure that workers
are registered and have sufficient resources

Is it possible to programmatically check the status of the application (e.g.
RUNNING/WAITING)? I know that we can use the application id and then query
the history server, but I would like a solution that does not involve REST
calls to the history server.

SparkContext should know about the state. How can I get this information
from sc?


Regards,

Behroz
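
[Editor's note: one way to probe this from inside the driver, sketched under
the assumption of Spark 2.x, where SparkContext exposes a SparkStatusTracker.
Depending on the version, getExecutorInfos() may include an entry for the
driver itself, so the threshold below is an assumption to verify.]
-
import org.apache.spark.SparkContext;
import org.apache.spark.SparkStatusTracker;

final class ExecutorProbe {
    // True once at least one executor has registered with the driver;
    // until then the application is effectively still waiting for resources.
    static boolean executorsRegistered(SparkContext sc) {
        SparkStatusTracker tracker = sc.statusTracker();
        return tracker.getExecutorInfos().length > 1; // "> 1" assumes the driver is listed too
    }
}
-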