[ https://issues.apache.org/jira/browse/LIVY-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Björn Lohrmann updated LIVY-533:
--------------------------------
    Description:
Running stages of Spark jobs submitted via Livy's programmatic API cannot always be cancelled successfully.

The current implementation of JobWrapper.cancel() interrupts the worker thread on the Spark driver (via Future.cancel(true)):
[https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/rsc/src/main/java/org/apache/livy/rsc/driver/JobWrapper.java#L84]

This does not always cancel all activity in Spark, e.g. long-running stages may remain unaffected.

The Spark way of cancelling jobs seems to be via SparkContext.setJobGroup()/cancelJobGroup(), which is also used in Livy's REPL Session:
[https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/repl/src/main/scala/org/apache/livy/repl/Session.scala#L164]

I have opened a PR that invokes setJobGroup()/cancelJobGroup() in addition to interrupting the worker thread running on the driver:
[https://github.com/apache/incubator-livy/pull/128]

It would be great if the fix could make it into the 0.6 release.

  was:
Running stages of Spark jobs submitted via Livy's programmatic API cannot always be cancelled successfully.

The current implementation of JobWrapper.cancel() interrupts the worker thread on the Spark driver (via Future.cancel(true)):
[https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/rsc/src/main/java/org/apache/livy/rsc/driver/JobWrapper.java#L84]

This does not always cancel all activity in Spark, e.g. long-running stages may remain unaffected.

The Spark way of cancelling jobs seems to be via SparkContext.setJobGroup()/cancelJobGroup(), which is also used in Livy's REPL Session:
[https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/repl/src/main/scala/org/apache/livy/repl/Session.scala#L164]

I have opened a PR that invokes setJobGroup()/cancelJobGroup() in addition to interrupting the worker thread running on the driver:
[https://github.com/apache/incubator-livy/pull/128]


> Spark jobs submitted via programmatic API cannot always be canceled
> ---------------------------------------------------------------------
>
>                 Key: LIVY-533
>                 URL: https://issues.apache.org/jira/browse/LIVY-533
>             Project: Livy
>          Issue Type: Bug
>          Components: RSC
>   Affects Versions: 0.5.0
>            Reporter: Björn Lohrmann
>            Priority: Major
>              Labels: pull-request-available
>
> Running stages of Spark jobs submitted via Livy's programmatic API cannot
> always be cancelled successfully.
> The current implementation of JobWrapper.cancel() interrupts the worker
> thread on the Spark driver (via Future.cancel(true)):
> [https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/rsc/src/main/java/org/apache/livy/rsc/driver/JobWrapper.java#L84]
> This does not always cancel all activity in Spark, e.g. long-running stages
> may remain unaffected.
> The Spark way of cancelling jobs seems to be via
> SparkContext.setJobGroup()/cancelJobGroup(), which is also used in
> Livy's REPL Session:
> [https://github.com/apache/incubator-livy/blob/4cfb6bcb8fb9ac6b2d6c8b3d04b20f647b507e1f/repl/src/main/scala/org/apache/livy/repl/Session.scala#L164]
> I have opened a PR that invokes setJobGroup()/cancelJobGroup() in addition to
> interrupting the worker thread running on the driver:
> [https://github.com/apache/incubator-livy/pull/128]
>
> It would be great if the fix could make it into the 0.6 release.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
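
The behaviour described in this issue can be illustrated with a small JDK-only simulation. It is a sketch under stated assumptions, not Livy's or Spark's code: `FakeStage`, `stageSurvivesInterrupt`, and the two executors are hypothetical names, and a shared flag merely plays the role that SparkContext.cancelJobGroup() would play on a real cluster. It shows why Future.cancel(true) on the driver-side worker thread may leave work running elsewhere, and why an additional explicit cancellation signal is needed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelSketch {

    /** Hypothetical stand-in for a long-running stage; it runs until its flag is set. */
    static final class FakeStage implements Runnable {
        // Plays the role of a job-group cancellation signal (assumption, not Spark's API).
        final AtomicBoolean groupCancelled = new AtomicBoolean(false);
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(1);

        @Override
        public void run() {
            started.countDown();
            try {
                while (!groupCancelled.get()) {
                    Thread.sleep(5); // simulated work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finished.countDown();
            }
        }
    }

    /**
     * Cancels the driver-side future with interruption and, optionally, also
     * raises the group-cancel flag. Returns true if the stage is still running
     * shortly afterwards.
     */
    static boolean stageSurvivesInterrupt(boolean alsoCancelGroup) throws Exception {
        ExecutorService driver = Executors.newSingleThreadExecutor();  // "worker thread on the driver"
        ExecutorService cluster = Executors.newSingleThreadExecutor(); // "where stages actually run"
        FakeStage stage = new FakeStage();
        try {
            Future<?> job = driver.submit(() -> {
                cluster.submit(stage);
                try {
                    stage.finished.await(); // driver thread blocks on the stage
                } catch (InterruptedException e) {
                    // Only the driver thread is interrupted; the stage is untouched.
                }
                return null;
            });
            stage.started.await();
            job.cancel(true); // interrupt the driver-side worker thread
            if (alsoCancelGroup) {
                stage.groupCancelled.set(true); // explicit cancellation of the running work
            }
            // await() returns false on timeout, meaning the stage has not finished.
            return !stage.finished.await(300, TimeUnit.MILLISECONDS);
        } finally {
            stage.groupCancelled.set(true);
            driver.shutdownNow();
            cluster.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("interrupt only, stage still running: " + stageSurvivesInterrupt(false));
        System.out.println("interrupt + group cancel, stage still running: " + stageSurvivesInterrupt(true));
    }
}
```

In this simulation the first call should report that the stage is still running and the second that it has stopped. Whether a real Spark stage reacts to a driver-thread interrupt depends on what the job is doing at that moment, which matches the "not always" wording of the report.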