[ https://issues.apache.org/jira/browse/SPARK-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333119#comment-15333119 ]
Aamir Abbas commented on SPARK-15919:
-------------------------------------

I have tried the solution you suggested, i.e. the window() function. Here's my code.

{code}
Duration batchInterval = new Duration(300000); // 5 minutes
javaStream.window(batchInterval, batchInterval).dstream().saveAsTextFiles(getBaseOutputPath(), "");
{code}

Actual output: the snippet gets the base output path only once, creates folders under that path, and saves each record from the RDDs as a separate file.

Expected output: a new base output path is obtained every time the window() function is applied, and all the records from the RDDs are saved in a single file.

Please let me know whether I am applying the window() function incorrectly, and how to do it correctly.

> DStream "saveAsTextFile" doesn't update the prefix after each checkpoint
> ------------------------------------------------------------------------
>
>                 Key: SPARK-15919
>                 URL: https://issues.apache.org/jira/browse/SPARK-15919
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.6.1
>         Environment: Amazon EMR
>            Reporter: Aamir Abbas
>
> I have a Spark streaming job that reads a data stream, and saves it as a text
> file after a predefined time interval. In the function
> stream.dstream().repartition(1).saveAsTextFiles(getOutputPath(), "");
> The function getOutputPath() generates a new path every time the function is
> called, depending on the current system time.
> However, the output path prefix remains the same for all the batches, which
> effectively means that the function is not called again for the next batch of
> the stream, although the files are being saved after each checkpoint interval.
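
A likely explanation for the fixed prefix reported above: in Java, a method argument such as getBaseOutputPath() is evaluated once, at the moment saveAsTextFiles(...) is called while the job is being set up, and the resulting String is then reused for every batch. The plain-Java sketch below has no Spark dependency and only illustrates that evaluation order. The class name, nextPath(), and both save* methods are hypothetical stand-ins, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PrefixEvaluationDemo {
    // Hypothetical stand-in for getBaseOutputPath(): yields a new value on every call.
    static int counter = 0;

    static String nextPath() {
        return "out/path-" + (counter++);
    }

    // Mimics an API that takes the prefix as a plain String:
    // the caller's expression was already evaluated, so every batch sees the same value.
    static List<String> saveWithFixedPrefix(String prefix, int batches) {
        List<String> used = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            used.add(prefix);
        }
        return used;
    }

    // Mimics a per-batch callback: the prefix is recomputed for each batch.
    static List<String> saveWithDeferredPrefix(Supplier<String> prefix, int batches) {
        List<String> used = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            used.add(prefix.get());
        }
        return used;
    }

    public static void main(String[] args) {
        // nextPath() runs exactly once here, before saveWithFixedPrefix starts.
        System.out.println(saveWithFixedPrefix(nextPath(), 3));
        // nextPath() runs once per batch here.
        System.out.println(saveWithDeferredPrefix(PrefixEvaluationDemo::nextPath, 3));
    }
}
```

In a Spark job, the deferred variant would roughly correspond to computing the path inside a per-batch callback such as foreachRDD and calling saveAsTextFile on each RDD there. That mapping is an assumption of this sketch and has not been confirmed in this thread.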