Re: Getting spark job progress programmatically

2014-11-20 Thread andy petrella
Awesome! And Patrick just gave his LGTM ;-)


Re: Getting spark job progress programmatically

2014-11-19 Thread Aniket Bhatnagar
I have for now submitted a JIRA ticket @
https://issues.apache.org/jira/browse/SPARK-4473. I will collate all my
experiences (& hacks) and submit them as a feature request for a public API.

Re: Getting spark job progress programmatically

2014-11-19 Thread Aniket Bhatnagar
Thanks for pointing this out, Mark. Had totally missed the existing JIRA
items.


Re: Getting spark job progress programmatically

2014-11-19 Thread Mark Hamstra
This is already being covered by SPARK-2321 and SPARK-4145.  There are pull
requests that are already merged or already very far along -- e.g.,
https://github.com/apache/spark/pull/3009

If there is anything that needs to be added, please add it to those issues
or PRs.
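
For reference, with the status tracker those tickets introduce (exposed as
sc.statusTracker), polling progress for a job group could look roughly like
the sketch below; method names follow the in-flight PRs and may shift before
release.

  // Sketch: poll per-group progress via the status tracker from
  // SPARK-2321/SPARK-4145. Assumes an existing SparkContext `sc`;
  // names follow the in-flight PRs and may differ in the released API.
  def groupProgress(sc: org.apache.spark.SparkContext, group: String): Double = {
    val tracker = sc.statusTracker
    val stageIds = tracker.getJobIdsForGroup(group)
      .flatMap(jobId => tracker.getJobInfo(jobId))  // Option -> 0 or 1 infos
      .flatMap(_.stageIds())
    val stages = stageIds.flatMap(id => tracker.getStageInfo(id))
    val total  = stages.map(_.numTasks).sum
    val done   = stages.map(_.numCompletedTasks).sum
    if (total == 0) 0.0 else done.toDouble * 100 / total
  }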


Getting spark job progress programmatically

2014-11-18 Thread Aniket Bhatnagar
I am writing yet another Spark job server and have been able to submit jobs
and return/save results. I let multiple jobs use the same spark context but
I set job group while firing each job so that I can in future cancel jobs.
Further, what I deserve to do is provide some kind of status
update/progress on running jobs (a % completion but be awesome) but I am
unable to figure out appropriate spark API to use. I do however see status
reporting in spark UI so there must be a way to get status of various
stages per job group. Any hints on what APIs should I look at?
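
For context, the job-group plumbing I have so far looks roughly like this
sketch (app and group names are made up; setJobGroup and cancelJobGroup are
public SparkContext methods, and the group applies to the submitting thread):

  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical job-server setup: one shared context, one group per job.
  val sc = new SparkContext(
    new SparkConf().setAppName("job-server").setMaster("local[*]"))

  // Tag everything submitted from this thread with group "job-42".
  sc.setJobGroup("job-42", "demo job submitted via the server")
  val result = sc.parallelize(1 to 1000000).map(_ * 2).count()

  // Later, from another thread, all jobs in the group can be cancelled:
  // sc.cancelJobGroup("job-42")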


Re: Getting spark job progress programmatically

2014-11-18 Thread andy petrella
I started a quick hack for that in the notebook; you can head to:
https://github.com/andypetrella/spark-notebook/blob/master/common/src/main/scala/notebook/front/widgets/SparkInfo.scala
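
Roughly, the idea is a SparkListener that records each stage's task count at
submit time and counts task completions. A simplified sketch of that approach
(not a verbatim extract from SparkInfo.scala):

  import scala.collection.mutable
  import org.apache.spark.scheduler._

  // Simplified sketch: percentage completion per active stage.
  class ProgressListener extends SparkListener {
    private val total = mutable.Map[Int, Int]()                     // stageId -> task count
    private val done  = mutable.Map[Int, Int]().withDefaultValue(0) // stageId -> finished tasks

    override def onStageSubmitted(s: SparkListenerStageSubmitted): Unit =
      synchronized { total(s.stageInfo.stageId) = s.stageInfo.numTasks }

    override def onTaskEnd(t: SparkListenerTaskEnd): Unit =
      synchronized { done(t.stageId) += 1 }

    override def onStageCompleted(s: SparkListenerStageCompleted): Unit =
      synchronized { total -= s.stageInfo.stageId; done -= s.stageInfo.stageId }

    // Percentage completion for each stage still running.
    def percentages: Map[Int, Double] = synchronized {
      total.map { case (id, n) =>
        id -> (if (n == 0) 100.0 else done(id) * 100.0 / n)
      }.toMap
    }
  }

Register it with sc.addSparkListener(new ProgressListener) (a @DeveloperApi
method) and poll percentages from wherever you render progress.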



Re: Getting spark job progress programmatically

2014-11-18 Thread Aniket Bhatnagar
Thanks Andy. This is very useful. This gives me all active stages & their
percentage completion, but I am unable to tie stages to a job group (or a
specific job). I looked at Spark's code, and to me it
seems org.apache.spark.scheduler.ActiveJob's group ID should get propagated
to StageInfo (possibly in the StageInfo.fromStage method). For now, I will
have to write my own version of JobProgressListener that stores a
stageId-to-groupId mapping.
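
In the meantime, something like this listener sketch might work: the group
set via setJobGroup appears to travel in the job's local properties under the
key "spark.jobGroup.id", and SparkListenerJobStart exposes the job's stage IDs
(field shapes here follow Spark 1.1 and may differ across versions).

  import scala.collection.mutable
  import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

  // Sketch: recover the job group for each stage from the job-start event.
  class GroupMappingListener extends SparkListener {
    private val stageToGroup = mutable.Map[Int, String]()

    override def onJobStart(jobStart: SparkListenerJobStart): Unit = synchronized {
      // "spark.jobGroup.id" is the property key SparkContext.setJobGroup uses.
      val group = Option(jobStart.properties)
        .flatMap(p => Option(p.getProperty("spark.jobGroup.id")))
        .getOrElse("(no group)")
      jobStart.stageIds.foreach(id => stageToGroup(id) = group)
    }

    def groupOf(stageId: Int): Option[String] = synchronized { stageToGroup.get(stageId) }
  }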

I will submit a JIRA ticket and seek the Spark devs' opinion on this. Many
thanks for your prompt help, Andy.

Thanks,
Aniket


Re: Getting spark job progress programmatically

2014-11-18 Thread andy petrella
yep, we should also propose adding this stuff to the public API.

Any other ideas?
