[ 
https://issues.apache.org/jira/browse/SPARK-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333119#comment-15333119
 ] 

Aamir Abbas commented on SPARK-15919:
-------------------------------------

I have tried the solution you suggested, i.e. the window() function. Here is my code.

{code}
Duration batchInterval = new Duration(300000); // 5 minutes
javaStream.window(batchInterval, batchInterval)
    .dstream()
    .saveAsTextFiles(getBaseOutputPath(), "");
{code}

What actually happens is that the base output path is computed only once, 
folders are created under that path, and each record from the RDDs is saved 
as a separate file.

I expected a new base output path to be computed every time the window() 
function is applied, with all the records from the RDDs saved in a single file.

Please let me know if I am applying the window() function incorrectly, and how 
to do it correctly.
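
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: saveAsTextFiles() takes its prefix as a plain String, and Java evaluates that argument once, when the stream is set up, not once per batch. The Spark documentation describes the per-batch output name as "prefix-TIME_IN_MS.suffix", which would explain the timestamped folders under one fixed prefix. The plain-Java example below has no Spark dependency; the class, the method names, and the counter-based path generator are all hypothetical stand-ins for getBaseOutputPath() and the batch loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PrefixEvaluationSketch {

    /** Eager: the prefix is a plain String, fixed before any batch runs. */
    static List<String> saveWithFixedPrefix(String prefix, int batches) {
        List<String> used = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            used.add(prefix); // same value for every batch
        }
        return used;
    }

    /** Deferred: the supplier is invoked again for each batch. */
    static List<String> saveWithPrefixPerBatch(Supplier<String> prefix, int batches) {
        List<String> used = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            used.add(prefix.get()); // fresh value per batch
        }
        return used;
    }

    /** Hypothetical path generator standing in for getBaseOutputPath(). */
    static Supplier<String> counterPaths() {
        AtomicInteger n = new AtomicInteger();
        return () -> "out/run-" + n.incrementAndGet();
    }

    public static void main(String[] args) {
        // The argument expression runs once, before the "batches" start.
        System.out.println(saveWithFixedPrefix(counterPaths().get(), 3));
        // prints [out/run-1, out/run-1, out/run-1]

        // The supplier runs inside the loop, once per "batch".
        System.out.println(saveWithPrefixPerBatch(counterPaths(), 3));
        // prints [out/run-1, out/run-2, out/run-3]
    }
}
```

In Spark terms, the deferred variant would correspond to computing the path inside a foreachRDD() callback and calling saveAsTextFile() on each RDD there. Calling coalesce(1) or repartition(1) on the RDD first should yield a single part file per batch. That mapping is an assumption about how the job could be restructured and has not been tested against Spark 1.6.1.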

> DStream "saveAsTextFile" doesn't update the prefix after each checkpoint
> ------------------------------------------------------------------------
>
>                 Key: SPARK-15919
>                 URL: https://issues.apache.org/jira/browse/SPARK-15919
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.6.1
>         Environment: Amazon EMR
>            Reporter: Aamir Abbas
>
> I have a Spark streaming job that reads a data stream and saves it as a text 
> file after a predefined time interval, using the call 
> stream.dstream().repartition(1).saveAsTextFiles(getOutputPath(), "");
> The function getOutputPath() generates a new path every time it is called, 
> depending on the current system time.
> However, the output path prefix remains the same for all the batches, which 
> effectively means that the function is not called again for the next batch of 
> the stream, although the files are being saved after each checkpoint interval. 



