[jira] [Updated] (SPARK-19617) Fix the race condition when starting and stopping a query quickly

Shixiong Zhu (JIRA) Thu, 16 Feb 2017 11:06:17 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Shixiong Zhu updated SPARK-19617:
---------------------------------
    Description: 
The streaming thread in StreamExecution uses the following ways to check if it 
should exit:
- Catch an InterruptException.
- `StreamExecution.state` is TERMINATED.

when starting and stopping a query quickly, the above two checks may both fail.
- Hit [HADOOP-14084|https://issues.apache.org/jira/browse/HADOOP-14084] and 
swallow InterruptException
- StreamExecution.stop is called before `state` becomes `ACTIVE`. Then 
[runBatches|https://github.com/apache/spark/blob/dcc2d540a53f0bd04baead43fdee1c170ef2b9f3/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L252]
 changes the state from `TERMINATED` to `ACTIVE`.

If the above cases both happen, the query will hang forever.

  was:
Saw the following exception in some test log:
{code}
17/02/14 21:20:10.987 stream execution thread for this_query [id = 
09fd5d6d-bea3-4891-88c7-0d0f1909188d, runId = 
a564cb52-bc3d-47f1-8baf-7e0e4fa79a5e] WARN Shell: Interrupted while joining on: 
Thread[Thread-48,5,main]
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        at java.lang.Thread.join(Thread.java:1323)
        at org.apache.hadoop.util.Shell.joinThread(Shell.java:626)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:577)
        at org.apache.hadoop.util.Shell.run(Shell.java:479)
        at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:491)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:532)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1066)
        at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:176)
        at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
        at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
        at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
        at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
        at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:733)
        at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.mkdirs(HDFSMetadataLog.scala:385)
        at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:75)
        at 
org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
        at 
org.apache.spark.sql.execution.streaming.FileStreamSourceLog.<init>(FileStreamSourceLog.scala:36)
        at 
org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:59)
        at 
org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:246)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:145)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:141)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:257)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:141)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:136)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:191)
{code}

This is the cause of some test timeout failures on Jenkins.


> Fix the race condition when starting and stopping a query quickly
> -----------------------------------------------------------------
>
>                 Key: SPARK-19617
>                 URL: https://issues.apache.org/jira/browse/SPARK-19617
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2, 2.1.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>
> The streaming thread in StreamExecution uses the following ways to check if 
> it should exit:
> - Catch an InterruptException.
> - `StreamExecution.state` is TERMINATED.
> when starting and stopping a query quickly, the above two checks may both 
> fail.
> - Hit [HADOOP-14084|https://issues.apache.org/jira/browse/HADOOP-14084] and 
> swallow InterruptException
> - StreamExecution.stop is called before `state` becomes `ACTIVE`. Then 
> [runBatches|https://github.com/apache/spark/blob/dcc2d540a53f0bd04baead43fdee1c170ef2b9f3/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L252]
>  changes the state from `TERMINATED` to `ACTIVE`.
> If the above cases both happen, the query will hang forever.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-19617) Fix the race condition when starting and stopping a query quickly

Reply via email to