Re: Getting spark job progress programmatically
Awesome! And Patrick just gave his LGTM ;-)
Re: Getting spark job progress programmatically
I have for now submitted a JIRA ticket at https://issues.apache.org/jira/browse/SPARK-4473. I will collate all my experiences (and hacks) and submit them as a feature request for a public API.
Re: Getting spark job progress programmatically
Thanks for pointing this out, Mark. I had totally missed the existing JIRA items.
Re: Getting spark job progress programmatically
This is already being covered by SPARK-2321 and SPARK-4145. There are pull requests that are already merged or already very far along -- e.g., https://github.com/apache/spark/pull/3009. If there is anything that needs to be added, please add it to those issues or PRs.
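For what it's worth, the public progress API that grew out of SPARK-2321 is exposed on the SparkContext as a status tracker from Spark 1.2 onwards. A rough sketch of querying progress for a job group with it (the helper object and group id are illustrative, not part of the PR):

import org.apache.spark.SparkContext

object GroupProgress {
  // Estimate percentage completion of all jobs in a job group,
  // assuming Spark 1.2+ and its SparkStatusTracker.
  def percentComplete(sc: SparkContext, groupId: String): Double = {
    val tracker = sc.statusTracker
    // Job ids registered under the group -> their stage ids.
    val stageIds = tracker.getJobIdsForGroup(groupId)
      .flatMap(jobId => tracker.getJobInfo(jobId))
      .flatMap(_.stageIds())
    val stages = stageIds.flatMap(id => tracker.getStageInfo(id))
    val total = stages.map(_.numTasks()).sum
    val done = stages.map(_.numCompletedTasks()).sum
    if (total == 0) 0.0 else done * 100.0 / total
  }
}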
Getting spark job progress programmatically
I am writing yet another Spark job server and have been able to submit jobs and return/save results. I let multiple jobs use the same Spark context, but I set a job group while firing each job so that I can cancel jobs in the future. What I would also like to do is provide some kind of status update/progress on running jobs (a % completion would be awesome), but I am unable to figure out the appropriate Spark API to use. I do, however, see status reporting in the Spark UI, so there must be a way to get the status of the various stages per job group. Any hints on which APIs I should look at?
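For context, the job-group part is just SparkContext.setJobGroup / cancelJobGroup. A minimal sketch of how a job server might assign groups (the group id, paths, and object name are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object JobGroupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("job-server").setMaster("local[*]"))

    // Every job submitted through the server runs under its own group id,
    // so it can later be cancelled as a unit.
    val groupId = "request-42"  // hypothetical id generated per submission
    sc.setJobGroup(groupId, "word count submitted via the job server", interruptOnCancel = true)

    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("output")

    // Cancellation hook: kills all jobs launched under that group.
    // sc.cancelJobGroup(groupId)

    sc.stop()
  }
}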
Re: Getting spark job progress programmatically
I started a quick hack for that in the notebook; you can head to: https://github.com/andypetrella/spark-notebook/blob/master/common/src/main/scala/notebook/front/widgets/SparkInfo.scala
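The gist of that widget is a SparkListener that counts finished tasks per stage. A stripped-down sketch of the same idea (class and method names here are mine, not the notebook's):

import org.apache.spark.scheduler._
import scala.collection.mutable

class StageProgressListener extends SparkListener {
  // stageId -> (tasks finished so far, total tasks in the stage)
  private val progress = mutable.Map[Int, (Int, Int)]()

  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = synchronized {
    progress(event.stageInfo.stageId) = (0, event.stageInfo.numTasks)
  }

  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = synchronized {
    progress.get(event.stageId).foreach { case (done, total) =>
      progress(event.stageId) = (done + 1, total)
    }
  }

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = synchronized {
    progress.remove(event.stageInfo.stageId)
  }

  // Percentage completion of all currently active stages.
  def activeStages: Map[Int, Double] = synchronized {
    progress.map { case (stageId, (done, total)) =>
      stageId -> (if (total == 0) 100.0 else done * 100.0 / total)
    }.toMap
  }
}

// Register it on the shared context: sc.addSparkListener(new StageProgressListener)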
Re: Getting spark job progress programmatically
Thanks Andy. This is very useful. It gives me all active stages and their percentage completion, but I am unable to tie stages to a job group (or a specific job). I looked at Spark's code and, to me, it seems org.apache.spark.scheduler.ActiveJob's group ID should get propagated to StageInfo (possibly in the StageInfo.fromStage method). For now, I will have to write my own version of JobProgressListener that stores a stage ID to group ID mapping. I will submit a JIRA ticket and seek the Spark devs' opinion on this. Many thanks for your prompt help, Andy. Thanks, Aniket
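That workaround can be hung off the job-start event, whose properties carry the group id that setJobGroup sets (the property key is assumed here to be "spark.jobGroup.id"). A rough sketch, with illustrative names:

import org.apache.spark.scheduler._
import scala.collection.mutable

class JobGroupStageListener extends SparkListener {
  // stageId -> job group id the stage was launched under
  private val stageToGroup = mutable.Map[Int, String]()

  override def onJobStart(event: SparkListenerJobStart): Unit = synchronized {
    val group = Option(event.properties)
      .flatMap(p => Option(p.getProperty("spark.jobGroup.id")))
      .getOrElse("(no group)")
    event.stageIds.foreach(id => stageToGroup(id) = group)
  }

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = synchronized {
    stageToGroup.remove(event.stageInfo.stageId)
  }

  // Stage IDs currently known for a given job group.
  def stagesForGroup(groupId: String): Set[Int] = synchronized {
    stageToGroup.collect { case (stageId, g) if g == groupId => stageId }.toSet
  }
}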
Re: Getting spark job progress programmatically
Yep, we should also propose adding this stuff to the public API. Any other ideas?