[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113400#comment-14113400 ]

Apache Spark commented on SPARK-2636:
-------------------------------------

User 'lirui-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/2176

no where to get job identifier while submit spark job through spark API
-----------------------------------------------------------------------

                Key: SPARK-2636
                URL: https://issues.apache.org/jira/browse/SPARK-2636
            Project: Spark
         Issue Type: New Feature
         Components: Java API
           Reporter: Chengxiang Li
             Labels: hive

In Hive on Spark, we want to track Spark job status through the Spark API. The basic idea is as follows:
# Create a Hive-specific Spark listener and register it on the Spark listener bus.
# The Hive-specific listener derives job status from Spark listener events.
# The Hive driver tracks job status through the Hive-specific listener.
The current problem is that the Hive driver needs a job identifier to track a specific job's status through the Spark listener, but there is no Spark API that returns a job identifier (such as a job ID) when a Spark job is submitted. I think any other project that tries to track job status through the Spark API would suffer from this as well.

--
This message was sent by Atlassian JIRA (v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
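The three-step listener scheme described in the issue can be sketched against the public SparkListener API. This is an illustrative sketch only, not the actual Hive on Spark code; the class and method names (`HiveJobStatusListener`, `statusOf`) are hypothetical:

```scala
import org.apache.spark.scheduler._
import scala.collection.mutable

// Hypothetical Hive-side listener: records job status keyed by job ID,
// driven purely by events from the Spark listener bus.
class HiveJobStatusListener extends SparkListener {
  private val jobStatus = mutable.Map[Int, String]()

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    jobStatus.synchronized { jobStatus(jobStart.jobId) = "RUNNING" }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    jobStatus.synchronized {
      jobStatus(jobEnd.jobId) = jobEnd.jobResult match {
        case JobSucceeded => "SUCCEEDED"
        case _            => "FAILED"
      }
    }

  // The driver would poll this; returns None for unknown job IDs.
  def statusOf(jobId: Int): Option[String] =
    jobStatus.synchronized { jobStatus.get(jobId) }
}

// Registration on the listener bus, given a SparkContext sc:
//   sc.addSparkListener(new HiveJobStatusListener)
```

The sketch also shows exactly where the gap is: the events carry a `jobId`, but nothing returned to the caller at submission time tells the driver which `jobId` belongs to the job it just submitted.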
[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110172#comment-14110172 ]

Rui Li commented on SPARK-2636:
-------------------------------

Just want to make sure I understand everything correctly: I think the user submits a job via an RDD action, which in turn calls {{SparkContext.runJob -> DAGScheduler.runJob -> DAGScheduler.submitJob -> DAGScheduler.handleJobSubmitted}}. The requirement is that we should return some job ID to the user, so I think putting that in a DAGScheduler method doesn't help? BTW, {{DAGScheduler.submitJob}} returns a {{JobWaiter}} which contains the job ID. Also, by job ID, do we mean {{org.apache.spark.streaming.scheduler.Job.id}} or {{org.apache.spark.scheduler.ActiveJob.jobId}}? Please let me know if I misunderstand anything.
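For readers following the call chain above, the internal shape being discussed looks roughly like the following paraphrased sketch (simplified from Spark's private scheduler code; parameter lists abbreviated, not the exact signatures):

```scala
// Paraphrased internals, not public API:
//
// class DAGScheduler {
//   // Non-blocking submission; the returned JobWaiter knows the job ID.
//   def submitJob[T, U](rdd: RDD[T], func: (TaskContext, Iterator[T]) => U,
//                       /* ... partitions, callSite, listener, ... */): JobWaiter[U]
//
//   // Blocking variant used by SparkContext.runJob: waits on the
//   // JobWaiter and returns nothing ID-related to the caller.
//   def runJob[T, U](/* same arguments */): Unit
// }
```

Because `DAGScheduler` and `JobWaiter` are `private[spark]`, the job ID held by the `JobWaiter` never reaches user code, which is why the comment argues that adding the ID to a DAGScheduler method alone would not satisfy the requirement.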
[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086494#comment-14086494 ]

Marcelo Vanzin commented on SPARK-2636:
---------------------------------------

(BTW, just checked SPARK-2321, so if you really mean the {{Job}} id, ignore my comments, since yes, it's kind of a pain to know the ID of a job you're submitting to the context.)
[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087102#comment-14087102 ]

Chengxiang Li commented on SPARK-2636:
--------------------------------------

{quote}
There are two ways I think. One is for DAGScheduler.runJob to return an integer (or long) id for the job. An alternative, which I think is better and relates to SPARK-2321, is for runJob to return some Job object that has information about the id and can be queried about progress.
{quote}
DAGScheduler is a Spark-internal class; users can hardly use it directly. I like your second idea: return a job-info object when submitting a Spark job at the SparkContext (JavaSparkContext in this case) or RDD level. Actually, AsyncRDDActions has already done part of this work; I think it may be a good place to fix this issue.
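To make the AsyncRDDActions suggestion concrete: the async actions already return a handle ({{FutureAction}}) at submission time instead of blocking, so exposing the job ID on that handle is a natural fit. A sketch, assuming a live SparkContext {{sc}}; the {{jobIds}} accessor shown in the comment is the kind of addition being proposed here and only exists in later Spark versions:

```scala
import org.apache.spark.SparkContext._  // brings rddToAsyncRDDActions into scope
import org.apache.spark.FutureAction

// Asynchronous action: submits a job and returns immediately with a handle.
val rdd = sc.parallelize(1 to 1000, 10)
val future: FutureAction[Long] = rdd.countAsync()

// What this issue asks for: an identifier available right after submission,
// e.g. the jobIds accessor that later Spark versions expose on FutureAction:
//   val ids: Seq[Int] = future.jobIds
// Those IDs could then be matched against SparkListenerJobStart/JobEnd events.
```

The design advantage over returning a bare integer from {{runJob}} is that the handle can also answer cancellation and progress queries, which is the direction SPARK-2321 was heading.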
[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071311#comment-14071311 ]

Chengxiang Li commented on SPARK-2636:
--------------------------------------

cc [~rxin] [~xuefuz]