Eyal Zituny created SPARK-27330:
-----------------------------------

             Summary: ForeachWriter is not being closed once a batch is aborted
                 Key: SPARK-27330
                 URL: https://issues.apache.org/jira/browse/SPARK-27330
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Eyal Zituny


When a micro-batch is killed (interrupted) outside the actual processing done 
by the ForeachDataWriter (for example, while iterating the input iterator), 
DataWritingSparkTask catches the interrupted exception and calls 
dataWriter.abort().

The problem is that ForeachDataWriter has an empty implementation of the abort 
method.

As a result, I have encountered issues with connections that were opened in 
the "open" method when the writer was created but were never closed.

This was not the behavior before Spark 2.4.

My suggestion is to call ForeachWriter.close() when DataWriter.abort() is 
called. The exception should also be passed along, in order to notify the 
foreach writer that this task has failed.
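
A minimal sketch of the suggestion above, written in Java with simplified 
stand-in types so that it runs without Spark. SimpleForeachWriter, 
SimpleForeachDataWriter, the closeOnAbort flag and the abort(Throwable) 
signature are hypothetical names introduced for illustration; they are not 
Spark's actual classes or API. The sketch only models the reported sequence: 
the writer is opened on creation, the task is killed while advancing the 
iterator, and abort() either does nothing or forwards the failure to close().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the user-facing writer contract (open / process / close).
abstract class SimpleForeachWriter<T> {
    abstract boolean open(long partitionId, long epochId);
    abstract void process(T value);
    abstract void close(Throwable errorOrNull);
}

// Hypothetical model of the data writer that wraps the user's writer.
class SimpleForeachDataWriter<T> {
    private final SimpleForeachWriter<T> writer;
    private final boolean closeOnAbort; // false models the empty abort() from the report
    private final boolean opened;
    private boolean closed = false;

    SimpleForeachDataWriter(SimpleForeachWriter<T> writer, long partitionId,
                            long epochId, boolean closeOnAbort) {
        this.writer = writer;
        this.closeOnAbort = closeOnAbort;
        // Resources (e.g. connections) are acquired as soon as the writer is created.
        this.opened = writer.open(partitionId, epochId);
    }

    void write(T record) {
        if (!opened) return;
        try {
            writer.process(record);
        } catch (RuntimeException e) {
            closeOnce(e); // failures during processing already reach close()
            throw e;
        }
    }

    void commit() {
        closeOnce(null);
    }

    // Assumed signature: the report asks for the exception to be passed along.
    void abort(Throwable cause) {
        if (closeOnAbort) {
            closeOnce(cause); // suggested behavior: release resources, report the failure
        }
        // Otherwise nothing happens and whatever open() acquired is never released.
    }

    private void closeOnce(Throwable errorOrNull) {
        if (!closed) {
            closed = true;
            writer.close(errorOrNull);
        }
    }
}

class AbortDemo {
    // Runs one task that is "killed" while advancing the iterator, outside process().
    static List<String> runKilledTask(boolean closeOnAbort) {
        List<String> events = new ArrayList<>();
        SimpleForeachWriter<String> userWriter = new SimpleForeachWriter<String>() {
            boolean open(long partitionId, long epochId) { events.add("open"); return true; }
            void process(String value) { events.add("process:" + value); }
            void close(Throwable err) { events.add(err == null ? "close:ok" : "close:error"); }
        };
        SimpleForeachDataWriter<String> dataWriter =
            new SimpleForeachDataWriter<>(userWriter, 0L, 0L, closeOnAbort);

        // Yields one row, then throws from hasNext() to mimic a kill check in the iterator.
        Iterator<String> rows = new Iterator<String>() {
            private int served = 0;
            public boolean hasNext() {
                if (served >= 1) throw new IllegalStateException("task killed");
                return true;
            }
            public String next() { served++; return "row" + served; }
        };

        try {
            while (rows.hasNext()) dataWriter.write(rows.next());
            dataWriter.commit();
        } catch (RuntimeException e) {
            dataWriter.abort(e); // the task catches the failure and aborts the writer
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runKilledTask(false)); // [open, process:row1]
        System.out.println(runKilledTask(true));  // [open, process:row1, close:error]
    }
}
```

With the empty abort, the event list ends after process and close() is never 
reached. With the suggested behavior, close() is called once with the 
exception, so a connection opened in open() can be released.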

 

Stack trace from the exception I encountered:
org.apache.spark.TaskKilledException: null
 at org.apache.spark.TaskContextImpl.killTaskIfInterrupted(TaskContextImpl.scala:149)
 at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:36)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
 at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
 at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
 at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
 at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
 at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
 at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)

 



