[jira] [Updated] (SPARK-19617) Fix the race condition when starting and stopping a query quickly

2017-02-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19617:
-
Fix Version/s: 2.2.0

> Fix the race condition when starting and stopping a query quickly
> -
>
> Key: SPARK-19617
> URL: https://issues.apache.org/jira/browse/SPARK-19617
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.2.0
>
>
> The streaming thread in StreamExecution uses the following checks to decide 
> whether it should exit:
> - Catch an InterruptedException.
> - `StreamExecution.state` is TERMINATED.
> When a query is started and stopped quickly, both checks may fail:
> - The thread hits [HADOOP-14084|https://issues.apache.org/jira/browse/HADOOP-14084] and the 
> InterruptedException is swallowed.
> - StreamExecution.stop is called before `state` becomes `ACTIVE`. Then 
> [runBatches|https://github.com/apache/spark/blob/dcc2d540a53f0bd04baead43fdee1c170ef2b9f3/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L252]
>  changes the state from `TERMINATED` to `ACTIVE`.
> If both cases happen, the query will hang forever.
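>
> As an illustration, here is a minimal, self-contained sketch of how the two 
> failure modes combine (this is not the actual StreamExecution code; the object 
> and method names are made up for the example):
> {code}
> import java.util.concurrent.atomic.AtomicReference
>
> // Illustrative sketch only, not the real StreamExecution.
> object RaceSketch {
>   sealed trait State
>   case object INITIALIZED extends State
>   case object ACTIVE extends State
>   case object TERMINATED extends State
>
>   val state = new AtomicReference[State](INITIALIZED)
>
>   // Analogous to runBatches(): runs on the stream execution thread.
>   def runBatches(): Unit = {
>     // Unconditional write: a TERMINATED stored earlier by stop() is overwritten here.
>     state.set(ACTIVE)
>     while (state.get() == ACTIVE) {
>       try {
>         Thread.sleep(10) // stand-in for running one batch
>       } catch {
>         // Mimics the HADOOP-14084 symptom: the interrupt is swallowed.
>         case _: InterruptedException =>
>       }
>     }
>   }
>
>   // Analogous to stop(): runs on the caller's thread.
>   def stop(queryThread: Thread): Unit = {
>     state.set(TERMINATED)
>     queryThread.interrupt()
>   }
>
>   def main(args: Array[String]): Unit = {
>     val t = new Thread(new Runnable { def run(): Unit = runBatches() })
>     t.setDaemon(true) // let the demo JVM exit even if the query thread hangs
>     t.start()
>     stop(t) // if this wins the race before state.set(ACTIVE), the loop never sees TERMINATED
>     t.join(1000)
>     println(s"state = ${state.get()}, query thread still alive = ${t.isAlive}")
>   }
> }
> {code}
> When stop() loses the race, the loop exits normally; when it wins, the state is 
> overwritten and, with the interrupt swallowed, nothing is left to tell the thread to stop.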






[jira] [Updated] (SPARK-19617) Fix the race condition when starting and stopping a query quickly

2017-02-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19617:
-
Description: 
The streaming thread in StreamExecution uses the following checks to decide whether 
it should exit:
- Catch an InterruptedException.
- `StreamExecution.state` is TERMINATED.

When a query is started and stopped quickly, both checks may fail:
- The thread hits [HADOOP-14084|https://issues.apache.org/jira/browse/HADOOP-14084] and the 
InterruptedException is swallowed.
- StreamExecution.stop is called before `state` becomes `ACTIVE`. Then 
[runBatches|https://github.com/apache/spark/blob/dcc2d540a53f0bd04baead43fdee1c170ef2b9f3/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L252]
 changes the state from `TERMINATED` to `ACTIVE`.

If both cases happen, the query will hang forever.
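
One way to close this window (a sketch only, assuming `state` is an 
`AtomicReference`; this is not necessarily the exact change that ships in 2.2.0) is 
to make the transition to `ACTIVE` conditional, so a `stop()` that wins the race is 
never undone:
{code}
import java.util.concurrent.atomic.AtomicReference

// Illustrative sketch of a guarded state transition, not the actual patch.
object GuardedTransitionSketch {
  val state = new AtomicReference[String]("INITIALIZED")

  def runBatchesGuarded(): Unit = {
    // Only INITIALIZED -> ACTIVE is allowed. If stop() already stored TERMINATED,
    // the compare-and-set fails and the thread returns instead of overwriting it.
    if (state.compareAndSet("INITIALIZED", "ACTIVE")) {
      while (state.get() == "ACTIVE" && !Thread.currentThread().isInterrupted) {
        // run one batch ...
      }
    }
  }

  def stopGuarded(queryThread: Thread): Unit = {
    state.set("TERMINATED")
    queryThread.interrupt()
  }
}
{code}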

  was:
Saw the following exception in some test log:
{code}
17/02/14 21:20:10.987 stream execution thread for this_query [id = 
09fd5d6d-bea3-4891-88c7-0d0f1909188d, runId = 
a564cb52-bc3d-47f1-8baf-7e0e4fa79a5e] WARN Shell: Interrupted while joining on: 
Thread[Thread-48,5,main]
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.util.Shell.joinThread(Shell.java:626)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:577)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:491)
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:532)
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1066)
at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:176)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:733)
at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.mkdirs(HDFSMetadataLog.scala:385)
at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:75)
at 
org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
at 
org.apache.spark.sql.execution.streaming.FileStreamSourceLog.<init>(FileStreamSourceLog.scala:36)
at 
org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:59)
at 
org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:246)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:145)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:141)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:257)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:141)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:136)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:191)
{code}

This is the cause of some test timeout failures on Jenkins.
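
The WARN at the top of that trace comes from org.apache.hadoop.util.Shell.joinThread, 
the code path that [HADOOP-14084|https://issues.apache.org/jira/browse/HADOOP-14084] 
(referenced in the updated description above) associates with the swallowed 
InterruptedException. A minimal sketch (illustrative only, not Hadoop's actual code) 
of why swallowing it hides a stop request from the caller:
{code}
// Illustrative only: swallowing InterruptedException hides the stop request.
object SwallowedInterruptSketch {
  def waitOnChild(child: Thread): Unit = {
    try {
      child.join()
    } catch {
      case _: InterruptedException =>
        // join() clears the interrupt flag when it throws, so catching and only
        // logging means callers that later check Thread.interrupted() see nothing.
        System.err.println("WARN Interrupted while joining on: " + child)
        // A defensive caller would re-assert it: Thread.currentThread().interrupt()
    }
  }
}
{code}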


> Fix the race condition when starting and stopping a query quickly
> -
>
> Key: SPARK-19617
> URL: https://issues.apache.org/jira/browse/SPARK-19617
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0

[jira] [Updated] (SPARK-19617) Fix the race condition when starting and stopping a query quickly

2017-02-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19617:
-
Summary: Fix the race condition when starting and stopping a query quickly  
(was: Fix a case that a query may not stop due to HADOOP-14084)

> Fix the race condition when starting and stopping a query quickly
> -
>
> Key: SPARK-19617
> URL: https://issues.apache.org/jira/browse/SPARK-19617
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> Saw the following exception in some test log:
> {code}
> 17/02/14 21:20:10.987 stream execution thread for this_query [id = 
> 09fd5d6d-bea3-4891-88c7-0d0f1909188d, runId = 
> a564cb52-bc3d-47f1-8baf-7e0e4fa79a5e] WARN Shell: Interrupted while joining 
> on: Thread[Thread-48,5,main]
> java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1249)
>   at java.lang.Thread.join(Thread.java:1323)
>   at org.apache.hadoop.util.Shell.joinThread(Shell.java:626)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:577)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:491)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:532)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1066)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:176)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:733)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.mkdirs(HDFSMetadataLog.scala:385)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:75)
>   at 
> org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSourceLog.<init>(FileStreamSourceLog.scala:36)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:59)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:246)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:145)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:141)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:257)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:141)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:136)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:191)
> {code}
> This is the cause of some test timeout failures on Jenkins.


