[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10127


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161567164
  
@zsxwing Please check this. 
I think this problem was caused by #9707.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161608484
  
**[Test build #2162 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2162/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)`
   * `class ParamGridBuilder @Since("1.2.0")`
   * `class TrainValidationSplit @Since("1.5.0") (@Since("1.5.0") override val uid: String)`





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/10127

[SPARK-12122][STREAMING] Prevent batches from being submitted twice after 
recovering StreamingContext from checkpoint



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-12122

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10127


commit d904b25a7037e2b12693158f29e069f13aa0fa78
Author: Tathagata Das 
Date:   2015-12-03T09:30:27Z

Remove duplicate







[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161591828
  
**[Test build #47134 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47134/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161569412
  
**[Test build #47134 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47134/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161567552
  
**[Test build #2161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2161/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161567545
  
**[Test build #2162 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2162/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161592110
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47134/





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161592106
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161590894
  
**[Test build #2161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2161/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161768001
  
**[Test build #2165 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2165/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161767836
  
**[Test build #2164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2164/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161768758
  
**[Test build #47163 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47163/consoleFull)**
 for PR 10127 at commit 
[`fe69fbf`](https://github.com/apache/spark/commit/fe69fbfe185e12e70d46003a40988fff57cd24b7).





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/10127#discussion_r46604122
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -220,7 +220,8 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
 logInfo("Batches pending processing (" + pendingTimes.size + " batches): " +
   pendingTimes.mkString(", "))
 // Reschedule jobs for these times
-val timesToReschedule = (pendingTimes ++ downTimes).distinct.sorted(Time.ordering)
+val timesToReschedule = (pendingTimes ++ downTimes).filter { _ != restartTime }
+  .distinct.sorted(Time.ordering)
--- End diff --

Explained offline: 

The restart time is always checkpointTime+1 (assuming batch duration = 1). 
However, `pendingTimes` can already contain batches >= checkpointTime+1. As a 
result, `timesToReschedule` can include batches >= checkpointTime+1, which are 
submitted explicitly and then submitted again by the timer.
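
The overlap described above can be illustrated with a small standalone sketch. It is not the Spark code: batch times are modelled as plain `Long` values with a batch duration of 1, and `RescheduleSketch`, its `timesToReschedule` helper, and the sample values are hypothetical names chosen for illustration. It mirrors only the filter from the quoted diff, which drops `restartTime` itself from the explicitly rescheduled times.

```scala
// Hypothetical, simplified model of the rescheduling step (not the Spark implementation).
// Batch times are plain Longs and the batch duration is assumed to be 1.
object RescheduleSketch {

  // Times that would be resubmitted explicitly on restart. The timer is assumed
  // to start firing at restartTime, so that time is excluded here.
  def timesToReschedule(
      pendingTimes: Seq[Long],
      downTimes: Seq[Long],
      restartTime: Long): Seq[Long] =
    (pendingTimes ++ downTimes).filter(_ != restartTime).distinct.sorted

  def main(args: Array[String]): Unit = {
    val checkpointTime = 10L
    val restartTime = checkpointTime + 1

    // Assumed sample data: batch 11 was already pending when the checkpoint was written.
    val pendingTimes = Seq(9L, 10L, 11L)
    val downTimes = Seq.empty[Long]

    // Without the filter, restartTime is rescheduled here and fired again by the timer.
    val unfiltered = (pendingTimes ++ downTimes).distinct.sorted
    val filtered = timesToReschedule(pendingTimes, downTimes, restartTime)

    println(s"without filter: $unfiltered")
    println(s"with filter:    $filtered")
  }
}
```

In this model the unfiltered list still contains the restart time, so that batch would be submitted once explicitly and once by the timer; the filtered list leaves it to the timer alone.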





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161783905
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47163/





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161783902
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161783301
  
**[Test build #2164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2164/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161768648
  
LGTM





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161783330
  
**[Test build #2165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2165/consoleFull)**
 for PR 10127 at commit 
[`d904b25`](https://github.com/apache/spark/commit/d904b25a7037e2b12693158f29e069f13aa0fa78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10127#issuecomment-161783734
  
**[Test build #47163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47163/consoleFull)**
 for PR 10127 at commit 
[`fe69fbf`](https://github.com/apache/spark/commit/fe69fbfe185e12e70d46003a40988fff57cd24b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12122][STREAMING] Prevent batches from ...

2015-12-03 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10127#discussion_r46592510
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -220,7 +220,8 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
 logInfo("Batches pending processing (" + pendingTimes.size + " batches): " +
   pendingTimes.mkString(", "))
 // Reschedule jobs for these times
-val timesToReschedule = (pendingTimes ++ downTimes).distinct.sorted(Time.ordering)
+val timesToReschedule = (pendingTimes ++ downTimes).filter { _ != restartTime }
+  .distinct.sorted(Time.ordering)
--- End diff --

Could you clarify why `pendingTimes` may contain `restartTime`? 

