Re: Spark Streaming timing considerations

2014-07-21 Thread Laeeq Ahmed
Hi TD, Thanks for the help. The only problem left here is that the dstreamTime contains some extra information, which seems to be a date, i.e. 1405944367000 ms, whereas my application timestamps are just in seconds, which I converted to ms, e.g. 2300, 2400, 2500, etc. So the filter doesn't take effect. I
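The mismatch described above can be made concrete with a minimal sketch: the batch time is in Unix epoch milliseconds while the application timestamps are relative milliseconds, so no comparison between the two scales can ever succeed. The values are taken from the thread; the variable names are illustrative.

```python
# Why the filter never matches: dstreamTime is Unix epoch milliseconds
# (~1.4e12), while the application timestamps are relative milliseconds
# (2300, 2400, ...), so an equality or range test across the two scales
# can never be true.
dstream_time_ms = 1405944367000
app_ts_ms = [2300, 2400, 2500]

print(any(ts == dstream_time_ms for ts in app_ts_ms))  # False
```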

Re: Spark Streaming timing considerations

2014-07-21 Thread Sean Owen
That is just standard Unix time. 1405944367000 = Sun, 09 Aug 46522 05:56:40 GMT

On Mon, Jul 21, 2014 at 5:43 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
> Hi TD, Thanks for the help. The only problem left here is that the dstreamTime contains some extra information which seems date i.e.

Re: Spark Streaming timing considerations

2014-07-21 Thread Sean Owen
Uh, right. I mean: 1405944367 = Mon, 21 Jul 2014 12:06:07 GMT

On Mon, Jul 21, 2014 at 5:47 PM, Sean Owen so...@cloudera.com wrote:
> That is just standard Unix time. 1405944367000 = Sun, 09 Aug 46522 05:56:40 GMT
> On Mon, Jul 21, 2014 at 5:43 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi
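The correction above can be checked directly: 1405944367 is epoch seconds, and treating the millisecond value 1405944367000 as seconds is what produces the year-46522 date. A minimal Python sketch, using only the numbers from the thread:

```python
from datetime import datetime, timezone

# dstreamTime values are Unix epoch milliseconds; divide by 1000 before
# feeding them to APIs that expect epoch seconds.
epoch_ms = 1405944367000
epoch_s = epoch_ms // 1000

dt = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
print(dt)  # 2014-07-21 12:06:07+00:00
```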

Re: Spark Streaming timing considerations

2014-07-21 Thread Tathagata Das
You will have to use some function that converts between the dstreamTime (ms since epoch, the same format as returned by System.currentTimeMillis) and your application-level time.

TD

On Mon, Jul 21, 2014 at 9:47 AM, Sean Owen so...@cloudera.com wrote:
> Uh, right. I mean: 1405944367 = Mon, 21 Jul 2014
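One way to do the conversion TD describes is to anchor the application clock at the stream's start. This is only a sketch: `stream_start_ms` is an assumed, application-supplied value, not part of the Spark API, and the helper name is hypothetical.

```python
# Hypothetical helper: map a DStream batch time (Unix epoch ms, the same
# format as System.currentTimeMillis) onto an application-level clock
# that starts at 0 ms when the stream starts. stream_start_ms must be
# recorded by the application itself.
def to_app_time_ms(dstream_time_ms, stream_start_ms):
    return dstream_time_ms - stream_start_ms

# Example: a batch stamped 1405944367000 ms, stream started 2.5 s earlier.
print(to_app_time_ms(1405944367000, 1405944364500))  # 2500
```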

Re: Spark Streaming timing considerations

2014-07-17 Thread Laeeq Ahmed
Hi TD, I have been able to filter the first WindowedRDD, but I am not sure how to make a generic filter. The larger window is 8 seconds and I want to fetch 4 seconds based on the application timestamp. I have seen an earlier post which suggests a timeStampBasedwindow, but I am not sure how to make

Re: Spark Streaming timing considerations

2014-07-17 Thread Tathagata Das
You have to define the range of records that needs to be filtered out in every windowed RDD, right? For example, when the DStream.window has data from times 0 - 8 seconds by DStream time, you only want to filter out data that falls into, say, 4 - 8 seconds by application time. This latter
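The range filter TD describes can be sketched on plain data, independent of Spark: given the records collected by an 8-second window, keep only those whose application timestamp falls in the chosen sub-range. The record layout `(app_ts_ms, value)` and the helper name are assumptions for illustration.

```python
# Keep only records whose application timestamp (ms) falls in
# [start_ms, end_ms); in Spark this predicate would be passed to
# an RDD/DStream filter over the windowed data.
def filter_by_app_time(records, start_ms, end_ms):
    return [(ts, v) for ts, v in records if start_ms <= ts < end_ms]

window = [(2300, "a"), (4100, "b"), (5900, "c"), (7999, "d"), (8200, "e")]
print(filter_by_app_time(window, 4000, 8000))
# [(4100, 'b'), (5900, 'c'), (7999, 'd')]
```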

Re: Spark Streaming timing considerations

2014-07-11 Thread Tathagata Das
This is not in the current streaming API. Queue stream is useful for testing with generated RDDs, but not for actual data. For an actual data stream, the slack time can be implemented by doing DStream.window over a larger window that takes the slack time into consideration, and then the required
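The slack-time idea above can be sketched as: size the window to cover the interval you need plus the slack, then keep only records whose application timestamp is at least the slack amount old, so late-arriving records have had time to show up. All names and the record layout `(app_ts_ms, value)` are illustrative assumptions, not Spark API.

```python
# Window length must cover the required interval plus the slack time.
def required_window_ms(interval_ms, slack_ms):
    return interval_ms + slack_ms

# Keep only records that are at least slack_ms old at batch time, so
# stragglers within the slack have already arrived.
def mature_records(records, batch_time_ms, slack_ms):
    cutoff = batch_time_ms - slack_ms
    return [(ts, v) for ts, v in records if ts <= cutoff]

print(required_window_ms(4000, 2000))                         # 6000
print(mature_records([(1000, "x"), (5500, "y")], 6000, 2000))  # [(1000, 'x')]
```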