[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...

2017-01-24 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16687


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...

2017-01-23 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16687#discussion_r97483845
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -146,6 +147,11 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
       while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
         Thread.sleep(pollTime)
       }
+      if (shouldCheckpoint
+        && !(lastProcessedBatch - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
+        ssc.graph.updateCheckpointData(lastProcessedBatch)
+        checkpointWriter.write(new Checkpoint(ssc, lastProcessedBatch), false)
+      }
--- End diff --

Do one more checkpoint before stopping.
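
The guard in the diff above can be illustrated with a minimal, self-contained sketch. `Time` and `Duration` here are simplified stand-ins written for this example (not Spark's classes), and `needsFinalCheckpoint` is a hypothetical helper name. The sketch only models the condition and writes no checkpoint.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's Time/Duration classes.
final case class Duration(millis: Long) {
  def isMultipleOf(that: Duration): Boolean = millis % that.millis == 0
}
final case class Time(millis: Long) {
  def -(that: Time): Duration = Duration(millis - that.millis)
}

object StopCheckpointSketch {
  // Hypothetical helper: true when the last processed batch does not fall on a
  // checkpoint boundary, so an extra checkpoint at stop time records new progress.
  def needsFinalCheckpoint(
      shouldCheckpoint: Boolean,
      lastProcessedBatch: Time,
      zeroTime: Time,
      checkpointDuration: Duration): Boolean =
    shouldCheckpoint && !(lastProcessedBatch - zeroTime).isMultipleOf(checkpointDuration)

  def main(args: Array[String]): Unit = {
    val zero = Time(0L)
    val interval = Duration(11000L) // assumed: checkpoint every 11 one-second batches
    println(needsFinalCheckpoint(true, Time(13000L), zero, interval)) // true: off boundary
    println(needsFinalCheckpoint(true, Time(22000L), zero, interval)) // false: on boundary
  }
}
```

When the last batch already lands on a checkpoint boundary, the regular checkpoint has covered it, which is presumably why the diff skips the extra write in that case.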





[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...

2017-01-23 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16687#discussion_r97483687
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -837,6 +839,29 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
     assert(latch.await(60, TimeUnit.SECONDS))
   }
 
+  test("SPARK-19343 Do once optimistic checkpoint before stop") {
+    val testDirectory = Utils.createTempDir().getAbsolutePath()
+    val checkpointDirectory = Utils.createTempDir().getAbsolutePath()
+    ssc = new StreamingContext(conf.clone.set("someKey", "someValue"), batchDuration)
+    ssc.checkpoint(checkpointDirectory)
+    val stream = ssc.textFileStream(testDirectory).checkpoint(batchDuration * 11)
+    stream.foreachRDD { rdd => rdd.count() }
+    ssc.start()
+    try {
+      Thread.sleep(batchDuration.milliseconds * 13)
+      ssc.stop(true, true)
--- End diff --

Sleep for 13 batch durations, so there should be only one checkpoint before 
pr





[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...

2017-01-23 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16687

[SPARK-19343][DStreams] Do once optimistic checkpoint before stop

## What changes were proposed in this pull request?

When a streaming job restarts from a checkpoint, it rebuilds several batches 
until it finds the latest checkpointed RDD. Doing one optimistic checkpoint 
just before stopping reduces this unnecessary recomputation.
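
As a rough illustration of the saving, the sketch below counts the batches processed since the last regular checkpoint, using the 11-batch interval and 13-batch run from the unit test in this pull request. It is a simplified model, not Spark code: `batchesToRecompute` is a hypothetical name, and the model assumes a restart must rebuild exactly the batches processed after the last checkpoint.

```scala
object RecomputationSketch {
  // Hypothetical model: batches processed since the most recent checkpoint boundary,
  // assumed here to be the batches a restart would have to rebuild.
  def batchesToRecompute(lastProcessedBatch: Int, checkpointEveryNBatches: Int): Int =
    lastProcessedBatch % checkpointEveryNBatches

  def main(args: Array[String]): Unit = {
    // Stop after batch 13 with a checkpoint every 11 batches.
    val withoutFinalCheckpoint = batchesToRecompute(13, 11)
    println(s"Without a final checkpoint: $withoutFinalCheckpoint batches to rebuild")
    // With a checkpoint written at stop time, the model has nothing left to rebuild.
    println("With a final checkpoint: 0 batches to rebuild")
  }
}
```

Under this model, the worst case without the final checkpoint is one batch short of a full checkpoint interval.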

## How was this patch tested?

Added a new unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19343

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16687


commit a63306e53c19b0db6574260c9716c6a76cf223e0
Author: uncleGen 
Date:   2017-01-24T06:24:08Z

SPARK-19343: Do once optimistic checkpoint before stop



