[ https://issues.apache.org/jira/browse/SPARK-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329308#comment-15329308 ]
Aamir Abbas commented on SPARK-15919:
-------------------------------------

This is exactly where our requirement differs. In our use case, we need to save all the RDDs that arrive within an interval, say 5 minutes, to a single file. So if 1,000 RDDs arrive every 5 minutes, we need to save them in one file instead of 1,000 files. Please let us know whether and how Spark supports this, as it is a legitimate requirement. If it is not currently supported, please let us know whether this should be filed as a feature request.

> DStream "saveAsTextFile" doesn't update the prefix after each checkpoint
> ------------------------------------------------------------------------
>
>                 Key: SPARK-15919
>                 URL: https://issues.apache.org/jira/browse/SPARK-15919
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.6.1
>         Environment: Amazon EMR
>            Reporter: Aamir Abbas
>
> I have a Spark streaming job that reads a data stream and saves it as a
> text file after a predefined time interval. The relevant call is:
>
>     stream.dstream().repartition(1).saveAsTextFiles(getOutputPath(), "");
>
> The function getOutputPath() generates a new path each time it is called,
> based on the current system time.
>
> However, the output path prefix remains the same for all batches, which
> effectively means getOutputPath() is not called again for the next batch
> of the stream, even though the files are being saved after each checkpoint
> interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
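The reported behavior follows from when the prefix argument is evaluated: `saveAsTextFiles(getOutputPath(), "")` calls `getOutputPath()` exactly once, while the streaming graph is being built, and that frozen string is reused for every batch. A per-batch callback (the `foreachRDD` pattern) re-evaluates the path on each batch instead. The sketch below illustrates this evaluation-timing difference in plain Python with a toy stand-in for a DStream; `FakeDStream`, `get_output_path`, and the bucket path are all hypothetical, not Spark APIs.

```python
def get_output_path(clock):
    # Hypothetical path generator keyed on the current "time".
    return f"s3://bucket/output/{clock}"

class FakeDStream:
    """Tiny stand-in for a DStream, just enough to show evaluation timing."""
    def __init__(self):
        self.actions = []

    def save_as_text_files(self, prefix):
        # prefix arrives here as a plain string -- frozen at setup time,
        # mirroring how saveAsTextFiles receives an already-computed prefix.
        self.actions.append(lambda batch_time: prefix)

    def foreach_rdd(self, fn):
        # fn is invoked once per batch, so it can compute a fresh path.
        self.actions.append(fn)

    def run_batch(self, batch_time):
        return [action(batch_time) for action in self.actions]

stream = FakeDStream()
stream.save_as_text_files(get_output_path(clock=0))     # evaluated once, now
stream.foreach_rdd(lambda t: get_output_path(clock=t))  # evaluated per batch

paths_batch_1 = stream.run_batch(1)
paths_batch_2 = stream.run_batch(2)
# The frozen prefix repeats across batches; the callback's path tracks
# the batch time.
```

In real Spark Streaming code, the commonly suggested workaround is the same pattern: `stream.foreachRDD(rdd -> rdd.repartition(1).saveAsTextFile(getOutputPath()))`, which calls `getOutputPath()` on every batch rather than once at setup.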