[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub


cloud-fan commented on code in PR #42199:
URL: https://github.com/apache/spark/pull/42199#discussion_r1331581159


sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:

```diff
@@ -77,6 +79,11 @@ object SQLExecution {
     }
     val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
     executionIdToQueryExecution.put(executionId, queryExecution)
+    val originalInterruptOnCancel = sc.getLocalProperty(SPARK_JOB_INTERRUPT_ON_CANCEL)
+    if (originalInterruptOnCancel == null) {
+      val interruptOnCancel = sparkSession.conf.get(SQLConf.INTERRUPT_ON_CANCEL)
+      sc.setInterruptOnCancel(interruptOnCancel)
```

Review Comment:
   This has some side effects: if a task fails many times, we abort its stage and, if this flag is set to true, interrupt the other running tasks. However, if an interrupted task was doing file scans, the task completion callback that frees its file streams is never triggered.
   
   I think this will lead to a memory leak, since runtime execution errors are common, especially with ANSI mode.
   
   How does the thriftserver leverage this feature?
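   A minimal plain-Scala sketch of that concern (all names hypothetical; this only mimics the completion-callback pattern, not Spark's actual `TaskRunner`): cleanup registered as a completion callback never runs if the task thread is interrupted before reaching it, whereas a `finally` block would.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: a "task" registers stream cleanup as a completion
// callback instead of a finally block, then gets interrupted mid-scan.
object CallbackLeakSketch {
  final class FakeStream { @volatile var closed = false }

  // Returns whether the stream ended up closed.
  def run(): Boolean = {
    val completionCallbacks = ArrayBuffer.empty[() => Unit]
    val stream = new FakeStream
    completionCallbacks += (() => stream.closed = true)

    val task = new Thread(() => {
      try {
        Thread.sleep(60000)                // stands in for a long file scan
        completionCallbacks.foreach(_())   // only reached on normal completion
      } catch {
        case _: InterruptedException => () // interrupted: callbacks are skipped
      }
    })
    task.start()
    task.interrupt()                       // what interrupt-on-cancel would do
    task.join()
    stream.closed
  }

  def main(args: Array[String]): Unit =
    println(s"stream closed: ${run()}")    // prints "stream closed: false"
}
```

   Whether the interrupt lands before or during the sleep, `Thread.sleep` throws `InterruptedException`, so the callback loop is never reached and the stream stays open.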



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub


cloud-fan commented on code in PR #42199:
URL: https://github.com/apache/spark/pull/42199#discussion_r1331684745


sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:

```diff
@@ -77,6 +79,11 @@ object SQLExecution {
     }
     val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
     executionIdToQueryExecution.put(executionId, queryExecution)
+    val originalInterruptOnCancel = sc.getLocalProperty(SPARK_JOB_INTERRUPT_ON_CANCEL)
+    if (originalInterruptOnCancel == null) {
+      val interruptOnCancel = sparkSession.conf.get(SQLConf.INTERRUPT_ON_CANCEL)
+      sc.setInterruptOnCancel(interruptOnCancel)
```

Review Comment:
   > does this PR just leverage a feature that SC already have?
   
   Yes. The problem is that we now enable this feature by default, while the feature can lead to a memory leak.
   
   That's why I asked how the thriftserver leverages this feature: after this PR, all queries will use it, not only the thriftserver.






[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub


cloud-fan commented on code in PR #42199:
URL: https://github.com/apache/spark/pull/42199#discussion_r1332544711


sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:

```diff
@@ -77,6 +79,11 @@ object SQLExecution {
     }
     val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
     executionIdToQueryExecution.put(executionId, queryExecution)
+    val originalInterruptOnCancel = sc.getLocalProperty(SPARK_JOB_INTERRUPT_ON_CANCEL)
+    if (originalInterruptOnCancel == null) {
+      val interruptOnCancel = sparkSession.conf.get(SQLConf.INTERRUPT_ON_CANCEL)
+      sc.setInterruptOnCancel(interruptOnCancel)
```

Review Comment:
   I think it's OK to interrupt tasks when we explicitly cancel jobs, but `sc.setInterruptOnCancel` does more than that: it also interrupts tasks when the stage finishes (completed -> interrupt speculative tasks; failed -> interrupt the remaining running tasks). This has side effects and we'd better avoid it.
   
   Is it possible we can also call `cancelJobGroup` explicitly in the sql-shell?
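   The explicit pattern suggested here might look like the following (an untested sketch assuming a live `SparkContext` named `sc`; the group id and description are made up):

```scala
// Untested sketch: scope interruption to an explicit cancel, instead of
// enabling interrupt-on-cancel globally for every SQL execution.
val groupId = "sql-shell-query-1" // hypothetical id
sc.setJobGroup(groupId, "query from sql-shell", interruptOnCancel = true)
try {
  // run the query / trigger jobs here
} finally {
  sc.clearJobGroup()
}

// Elsewhere (e.g. on Ctrl-C in the shell): cancel only this group's jobs.
sc.cancelJobGroup(groupId)
```

   With this shape, `Thread.interrupt()` only fires on an explicit user cancellation, not as a side effect of stage-failure cleanup.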






[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-21 Thread via GitHub


cloud-fan commented on code in PR #42199:
URL: https://github.com/apache/spark/pull/42199#discussion_r1332984898


sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:

```diff
@@ -77,6 +79,11 @@ object SQLExecution {
     }
     val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
     executionIdToQueryExecution.put(executionId, queryExecution)
+    val originalInterruptOnCancel = sc.getLocalProperty(SPARK_JOB_INTERRUPT_ON_CANCEL)
+    if (originalInterruptOnCancel == null) {
+      val interruptOnCancel = sparkSession.conf.get(SQLConf.INTERRUPT_ON_CANCEL)
+      sc.setInterruptOnCancel(interruptOnCancel)
```

Review Comment:
   Cool. Can we revert it first, for now?






[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-21 Thread via GitHub


cloud-fan commented on code in PR #42199:
URL: https://github.com/apache/spark/pull/42199#discussion_r1333767286


sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:

```diff
@@ -77,6 +79,11 @@ object SQLExecution {
     }
     val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
     executionIdToQueryExecution.put(executionId, queryExecution)
+    val originalInterruptOnCancel = sc.getLocalProperty(SPARK_JOB_INTERRUPT_ON_CANCEL)
+    if (originalInterruptOnCancel == null) {
+      val interruptOnCancel = sparkSession.conf.get(SQLConf.INTERRUPT_ON_CANCEL)
+      sc.setInterruptOnCancel(interruptOnCancel)
```

Review Comment:
   Looking at the API doc:
   ```scala
  /**
   * Set the behavior of job cancellation from jobs started in this thread.
   *
   * @param interruptOnCancel If true, then job cancellation will result in `Thread.interrupt()`
   * being called on the job's executor threads. This is useful to help ensure that the tasks
   * are actually stopped in a timely manner, but is off by default due to HDFS-1208, where HDFS
   * may respond to Thread.interrupt() by marking nodes as dead.
   *
   * @since 3.5.0
   */
  def setInterruptOnCancel(interruptOnCancel: Boolean): Unit = {
    setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, interruptOnCancel.toString)
  }
   ```
   
   This PR basically turns the feature on by default in Spark SQL, which is risky: our internal test pipeline started to fail with this change, due to a file stream leak.
   
   I'm OK with adding a config so that people can control it, but can we keep the behavior the same as before? (false by default)
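   As a plain-JVM illustration of what the quoted doc means (independent of Spark): interrupting a thread blocked in `Thread.sleep` raises `InterruptedException`, and `try`/`finally` cleanup still runs; the leak risk comes from cleanup that is not inside a `finally`.

```scala
// Demonstrates plain Thread.interrupt() semantics on a blocked thread.
object InterruptBehavior {
  // Returns (sawInterrupt, finallyRan).
  def run(): (Boolean, Boolean) = {
    @volatile var sawInterrupt = false
    @volatile var finallyRan = false
    val t = new Thread(() => {
      try Thread.sleep(60000) // long blocking call, like a slow read
      catch { case _: InterruptedException => sawInterrupt = true }
      finally finallyRan = true // finally-based cleanup still runs
    })
    t.start()
    t.interrupt() // analogous to interruptOnCancel = true on cancellation
    t.join()
    (sawInterrupt, finallyRan)
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints "(true,true)"
}
```

   The interrupt wakes the blocked thread promptly, which is exactly the "stopped in a timely manner" behavior the doc describes, along with the HDFS-1208 caveat.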


