Hi Tathagata, Thanks for your answer. Please see my further question below:
On Wed, Jul 16, 2014 at 6:57 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Answers inline.
>
> On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am currently using Spark Streaming to conduct a real-time data
>> analytics. We receive data from Kafka. We want to generate output files
>> that contain results that are based on the data we receive from a specific
>> time interval.
>>
>> I have several questions on Spark Streaming's timestamp:
>>
>> 1) If I use saveAsTextFiles, it seems Spark streaming will generate files
>> in complete minutes, such as 5:00:01, 5:00:01 (converted from Unix time),
>> etc. Does this mean the results are based on the data from 5:00:01 to
>> 5:00:02, 5:00:02 to 5:00:03, etc. Or the time stamps just mean the time the
>> files are generated?
>
> File named 5:00:01 contains results from data received between 5:00:00
> and 5:00:01 (based on system time of the cluster).
>
>> 2) If I do not use saveAsTextFiles, how do I get the exact time interval
>> of the RDD when I use foreachRDD to do custom output of the results?
>
> There is a version of foreachRDD which allows you specify the function
> that takes in Time object.
>
>> 3) How can we specify the starting time of the batches?
>
> What do you mean? Batches are timed based on the system time of the
> cluster.

I would like to control the starting time and ending time of each batch.
For example, if I use saveAsTextFiles as the output method and the batch
interval is 1 minute, Spark aligns the batches to complete minutes, such as
5:01:00, 5:02:00, 5:03:00. It will not produce results at 5:01:03, 5:02:03,
5:03:03, etc. My goal is to generate output for a customized interval, such
as from 5:01:30 to 5:02:29, 5:02:30 to 5:03:29, etc.

I checked the API of foreachRDD with the time parameter. There seems to be
no explanation of what that parameter means. Does it mean the starting time
of the first batch?

>> Thanks!
>>
>> Bill
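
To make the question concrete, here is a minimal sketch of the interval arithmetic I have in mind. It is plain Python, not Spark API, and all function names are hypothetical. It assumes the batch time marks the END of the interval, as the answer about saveAsTextFiles suggests, and it assumes (this is only my guess at a workaround, not a confirmed Spark feature) that I could run with a 30-second batch interval and group the batches into 1-minute windows shifted by 30 seconds myself.

```python
def hms(hours, minutes, seconds):
    """Milliseconds since midnight for a wall-clock time (illustration only)."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000


def batch_interval(batch_time_ms, batch_duration_ms):
    """Interval [start, end) covered by a batch.

    Assumption: the batch time is the END of the interval, i.e. the
    batch stamped 5:00:01 holds data received from 5:00:00 to 5:00:01.
    """
    return batch_time_ms - batch_duration_ms, batch_time_ms


def window_for(batch_start_ms, window_ms, offset_ms):
    """Shifted window [start, end) that contains a batch starting at batch_start_ms.

    Windows are window_ms long and begin offset_ms after each multiple
    of window_ms (e.g. 5:01:30, 5:02:30, ... for 60 s windows, 30 s offset).
    """
    start = (batch_start_ms - offset_ms) // window_ms * window_ms + offset_ms
    return start, start + window_ms


# A 1-second batch stamped 5:00:01 covers 5:00:00 up to 5:00:01.
one_second_batch = batch_interval(hms(5, 0, 1), 1000)

# Two consecutive 30-second batches fall into the same shifted window.
first = window_for(hms(5, 1, 30), 60_000, 30_000)
second = window_for(hms(5, 2, 0), 60_000, 30_000)
```

Here both `first` and `second` are the window from 5:01:30 up to 5:02:30, which is the kind of customized interval I am after. Whether Spark Streaming offers a direct way to get this is exactly what I am asking.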