[ 
https://issues.apache.org/jira/browse/SPARK-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated SPARK-6222:
------------------------------------
    Description: 
When testing for our next release, our internal tests written by [~wypoon] 
caught a regression in Spark Streaming between 1.2.0 and 1.3.0. The test runs 
a FlumePolling stream to read data from Flume, then kills the Application 
Master. Once YARN restarts it, the test waits until no more data is left to be 
written, then verifies the original input against the data on HDFS. This was 
passing in 1.2.0, but is failing now.
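The verification step above can be sketched as follows. This is a minimal, hedged illustration, assuming the recovered stream output has already been pulled down to a local file (e.g. via `hdfs dfs -getmerge`); the helper name and file names are hypothetical, not part of the actual Cloudera test harness.

```shell
# Succeeds only if the recovered records match the original input,
# ignoring ordering (streaming output order is not deterministic).
verify_no_loss() {
  diff <(sort "$1") <(sort "$2") > /dev/null
}
```

For example, `verify_no_loss original.txt recovered.txt` exits 0 exactly when no records were lost.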

Since the test ties into Cloudera's internal infrastructure and build process, 
it cannot be run directly on an Apache build. However, I have been working on 
isolating the commit that caused the regression, and have confirmed that it was 
SPARK-5147 (PR [4149|https://github.com/apache/spark/pull/4149]). I have 
reproduced the failure several times using the test; it is consistently 
reproducible.

To re-confirm, I reverted just this one commit (plus the Clock consolidation 
commit, to avoid conflicts), and the issue was no longer reproducible.

Since this is a data loss issue, I believe it is a blocker for Spark 1.3.0.
/cc [~tdas], [~pwendell]

  was:
When testing for our next release, our internal tests written by [~wypoon] 
caught a regression in Spark Streaming between 1.2.0 and 1.3.0. The test runs 
FlumePolling stream to read data from Flume, then kills the Application Master. 
Once YARN restarts it, the test waits until no more data is to be written and 
verifies the original against the data on HDFS. This was passing in 1.2.0, but 
is failing now.

Since the test ties into Cloudera's internal infrastructure and build process, 
it cannot be directly run on an Apache build. But I have been working on 
isolating the commit that may have caused the regression. I have confirmed that 
it was caused by SPARK-5157 (PR # 
[4149|https://github.com/apache/spark/pull/4149]). I confirmed this several 
times using the test and the failure is consistently reproducible. 

To re-confirm, I reverted just this one commit (and Clock consolidation one to 
avoid conflicts), and the issue was no longer reproducible.

Since this is a data loss issue, I believe this is a blocker for Spark 1.3.0
/cc [~tdas], [~pwendell]


> [STREAMING] All data may not be recovered from WAL when driver is killed
> ------------------------------------------------------------------------
>
>                 Key: SPARK-6222
>                 URL: https://issues.apache.org/jira/browse/SPARK-6222
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Hari Shreedharan
>            Priority: Blocker
>
> When testing for our next release, our internal tests written by [~wypoon] 
> caught a regression in Spark Streaming between 1.2.0 and 1.3.0. The test runs 
> a FlumePolling stream to read data from Flume, then kills the Application 
> Master. Once YARN restarts it, the test waits until no more data is left to 
> be written, then verifies the original input against the data on HDFS. This 
> was passing in 1.2.0, but is failing now.
> Since the test ties into Cloudera's internal infrastructure and build 
> process, it cannot be run directly on an Apache build. However, I have been 
> working on isolating the commit that caused the regression, and have 
> confirmed that it was SPARK-5147 (PR 
> [4149|https://github.com/apache/spark/pull/4149]). I have reproduced the 
> failure several times using the test; it is consistently reproducible.
> To re-confirm, I reverted just this one commit (plus the Clock consolidation 
> commit, to avoid conflicts), and the issue was no longer reproducible.
> Since this is a data loss issue, I believe it is a blocker for Spark 1.3.0.
> /cc [~tdas], [~pwendell]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
