[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
GitHub user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16687

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
GitHub user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16687#discussion_r97483845

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -146,6 +147,11 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
       while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
         Thread.sleep(pollTime)
       }
+      if (shouldCheckpoint
+          && !(lastProcessedBatch - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
+        ssc.graph.updateCheckpointData(lastProcessedBatch)
+        checkpointWriter.write(new Checkpoint(ssc, lastProcessedBatch), false)
+      }
--- End diff --

Do one more checkpoint before stopping.
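The stop-time condition in this diff can be modelled in isolation. The sketch below is a hypothetical simplification, not Spark code: it uses plain millisecond `Long`s instead of Spark's `Time` and `Duration` classes, and the names `FinalCheckpointCheck`, `alreadyCheckpointed` and `shouldWriteFinalCheckpoint` are invented for illustration. It assumes that `isMultipleOf` reduces to a remainder check on milliseconds.

```scala
// Hypothetical, simplified model of the check added before stop in JobGenerator.
// Times are plain milliseconds (Long), not Spark's Time/Duration types.
object FinalCheckpointCheck {
  // Models `(lastProcessedBatch - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)`:
  // true when the periodic schedule already checkpointed the last processed batch.
  def alreadyCheckpointed(lastProcessedMs: Long, zeroTimeMs: Long, checkpointIntervalMs: Long): Boolean = {
    require(checkpointIntervalMs > 0, "checkpoint interval must be positive")
    (lastProcessedMs - zeroTimeMs) % checkpointIntervalMs == 0
  }

  // Write one extra checkpoint on stop only if checkpointing is enabled and the
  // last processed batch was not covered by a periodic checkpoint.
  def shouldWriteFinalCheckpoint(
      checkpointingEnabled: Boolean,
      lastProcessedMs: Long,
      zeroTimeMs: Long,
      checkpointIntervalMs: Long): Boolean =
    checkpointingEnabled &&
      !alreadyCheckpointed(lastProcessedMs, zeroTimeMs, checkpointIntervalMs)

  def main(args: Array[String]): Unit = {
    // 1000 ms batches, checkpoint interval of 11 batches.
    println(shouldWriteFinalCheckpoint(true, 13000L, 0L, 11000L))  // true: 13000 is not a multiple of 11000
    println(shouldWriteFinalCheckpoint(true, 22000L, 0L, 11000L))  // false: already checkpointed
    println(shouldWriteFinalCheckpoint(false, 13000L, 0L, 11000L)) // false: checkpointing disabled
  }
}
```

Guarding with the remainder check avoids writing a duplicate checkpoint when the last batch already fell on the periodic schedule.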
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
GitHub user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16687#discussion_r97483687

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -837,6 +839,29 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
     assert(latch.await(60, TimeUnit.SECONDS))
   }
+  test("SPARK-19343 Do once optimistic checkpoint before stop") {
+    val testDirectory = Utils.createTempDir().getAbsolutePath()
+    val checkpointDirectory = Utils.createTempDir().getAbsolutePath()
+    ssc = new StreamingContext(conf.clone.set("someKey", "someValue"), batchDuration)
+    ssc.checkpoint(checkpointDirectory)
+    val stream = ssc.textFileStream(testDirectory).checkpoint(batchDuration * 11)
+    stream.foreachRDD { rdd => rdd.count() }
+    ssc.start()
+    try {
+      Thread.sleep(batchDuration.milliseconds * 13)
+      ssc.stop(true, true)
--- End diff --

Sleep for 13 batch durations, so that before this PR only one checkpoint would be done.
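The arithmetic behind the 13-batch sleep can be checked with a small timeline model. This is a hypothetical sketch, not the Spark test: it assumes batch `k` (1-based) completes exactly at `k` batch durations after the start, ignores scheduling jitter, and treats the 11-batch interval from the test as the only checkpoint schedule. The names `CheckpointTimeline`, `periodicCheckpointBatches` and `checkpointBatchesWithFinal` are invented for illustration.

```scala
// Hypothetical timeline model: which batches get checkpointed, with and
// without one extra checkpoint on stop. Batches are numbered from 1.
object CheckpointTimeline {
  // Periodic schedule: checkpoint every `intervalBatches` batches.
  def periodicCheckpointBatches(batches: Int, intervalBatches: Int): List[Int] = {
    require(intervalBatches > 0, "interval must be positive")
    (1 to batches).filter(k => k % intervalBatches == 0).toList
  }

  // With the patch: also checkpoint the last batch if the schedule missed it.
  def checkpointBatchesWithFinal(batches: Int, intervalBatches: Int): List[Int] = {
    val periodic = periodicCheckpointBatches(batches, intervalBatches)
    if (batches > 0 && batches % intervalBatches != 0) periodic :+ batches else periodic
  }

  def main(args: Array[String]): Unit = {
    println(periodicCheckpointBatches(13, 11))  // List(11)
    println(checkpointBatchesWithFinal(13, 11)) // List(11, 13)
  }
}
```

Under these assumptions, 13 batches with an 11-batch interval yield exactly one periodic checkpoint, so a second checkpoint can only come from the new stop-time path.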
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16687

[SPARK-19343][DStreams] Do once optimistic checkpoint before stop

## What changes were proposed in this pull request?

When a streaming job restarts from a checkpoint, it rebuilds several batches until it reaches the latest checkpointed RDD. This patch therefore does one optimistic checkpoint just before stopping, to reduce unnecessary recomputation on restart.

## How was this patch tested?

Added a new unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19343

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16687.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #16687

commit a63306e53c19b0db6574260c9716c6a76cf223e0
Author: uncleGen
Date: 2017-01-24T06:24:08Z

SPARK-19343: Do once optimistic checkpoint before stop
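The recomputation saving described in the PR can be illustrated by counting batches. This is a hypothetical sketch that only counts batches: it assumes that on restart every batch after the latest checkpoint must be rebuilt, and does not model Spark's actual lineage-based recovery. The names `RecoveryCost`, `lastPeriodicCheckpoint` and `batchesToRebuild` are invented for illustration.

```scala
// Hypothetical cost model: how many batches must be rebuilt after a restart,
// with and without one extra checkpoint on stop. Batches are numbered from 1.
object RecoveryCost {
  // Latest batch covered by the periodic schedule, or 0 if none yet.
  def lastPeriodicCheckpoint(lastProcessed: Int, intervalBatches: Int): Int = {
    require(intervalBatches > 0, "interval must be positive")
    (lastProcessed / intervalBatches) * intervalBatches
  }

  // Batches between the latest checkpoint and the last processed batch.
  def batchesToRebuild(lastProcessed: Int, intervalBatches: Int, finalCheckpoint: Boolean): Int =
    if (finalCheckpoint) 0
    else lastProcessed - lastPeriodicCheckpoint(lastProcessed, intervalBatches)

  def main(args: Array[String]): Unit = {
    println(batchesToRebuild(13, 11, finalCheckpoint = false)) // 2
    println(batchesToRebuild(13, 11, finalCheckpoint = true))  // 0
  }
}
```

In this model the saving is at most one checkpoint interval minus one batch, which is why the extra checkpoint is described as optimistic rather than required for correctness.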