Hi Tathagata,

Thanks for your answer. Please see my further question below:


On Wed, Jul 16, 2014 at 6:57 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Answers inline.
>
>
> On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay <bill.jaypeter...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am currently using Spark Streaming to conduct a real-time data
>> analytics. We receive data from Kafka. We want to generate output files
>> that contain results that are based on the data we receive from a specific
>> time interval.
>>
>> I have several questions on Spark Streaming's timestamp:
>>
>> 1) If I use saveAsTextFiles, it seems Spark Streaming generates files
>> aligned to whole batch intervals, with names such as 5:00:01, 5:00:02
>> (converted from Unix time), etc. Does this mean the results are based on
>> the data from 5:00:01 to 5:00:02, 5:00:02 to 5:00:03, etc., or do the
>> timestamps just indicate when the files were generated?
>>
> A file named 5:00:01 contains the results from data received between 5:00:00
> and 5:00:01 (based on the system time of the cluster).
>
>
>
>> 2) If I do not use saveAsTextFiles, how do I get the exact time interval
>> of the RDD when I use foreachRDD to do custom output of the results?
>>
> There is a version of foreachRDD which allows you to specify a function
> that takes in a Time object.
>
>
>> 3) How can we specify the starting time of the batches?
>>
>
> What do you mean? Batches are timed based on the system time of the
> cluster.
>
I would like to control the starting and ending time of each batch. For
example, if I use saveAsTextFiles as the output method and the batch interval
is 1 minute, Spark will align the intervals to whole minutes, such as
5:01:00, 5:02:00, 5:03:00. It will not produce results at times like 5:01:03,
5:02:03, 5:03:03, etc. My goal is to generate output for customized
intervals, such as 5:01:30 to 5:02:29, 5:02:30 to 5:03:29, etc.
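For context, my current setup is roughly the following (a simplified sketch;
the Kafka topic, consumer group, and output paths are illustrative
placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Simplified sketch; topic, group, and paths are placeholders.
val conf = new SparkConf().setAppName("KafkaAnalytics")
val ssc = new StreamingContext(conf, Minutes(1))  // 1-minute batch interval

// Receive messages from Kafka and keep only the message values.
val lines = KafkaUtils.createStream(
  ssc, "zkhost:2181", "my-consumer-group", Map("my-topic" -> 1)).map(_._2)

// Output directories are named with the batch time, aligned to whole minutes,
// e.g. hdfs:///output/results-<batchTimeInMillis>.txt
lines.saveAsTextFiles("hdfs:///output/results", "txt")

ssc.start()
ssc.awaitTermination()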

I checked the API of foreachRDD with the Time parameter. There seems to be no
explanation of what that parameter means. Does it refer to the starting time
of the first batch?
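
For reference, this is roughly how I would use that overload, assuming the
Time argument marks the boundary of each batch (which is exactly what I am
trying to confirm); lines here is the Kafka DStream from the sketch above,
and the output path is just an illustration:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time

val fmt = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss")

// foreachRDD overload that also passes the batch Time; I am assuming the
// Time value refers to the batch boundary, which is what I want to confirm.
lines.foreachRDD { (rdd: RDD[String], time: Time) =>
  val dir = "hdfs:///output/custom-" + fmt.format(new Date(time.milliseconds))
  rdd.saveAsTextFile(dir)  // write this batch under a directory named by its time
}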

>
>
>>
>> Thanks!
>>
>> Bill
>>
>
>
