RE: How to use ManualClock with Spark streaming

2017-04-05 Thread Mendelson, Assaf
You can try taking a look at this: 
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/

Thanks,
  Assaf.

From: Hemalatha A [mailto:hemalatha.amru...@googlemail.com]
Sent: Wednesday, April 05, 2017 1:59 PM
To: Saisai Shao; user@spark.apache.org
Subject: Re: How to use ManualClock with Spark streaming

Any updates on how I can use ManualClock without editing the Spark source
code?

On Wed, Mar 1, 2017 at 10:19 AM, Hemalatha A
<hemalatha.amru...@googlemail.com> wrote:
It is certainly possible through a hack.
I was referring to the post below, where TD says it is possible through a hack. I
wanted to know if there is any way other than editing the Spark source code.

https://groups.google.com/forum/#!searchin/spark-users/manualclock%7Csort:relevance/spark-users/ES8X1l_xn5s/6PvGGRDfgnMJ

On Wed, Mar 1, 2017 at 7:09 AM, Saisai Shao
<sai.sai.s...@gmail.com> wrote:
I don't think using ManualClock is the right way to fix your problem in
Spark Streaming.

ManualClock in Spark is mainly used for unit tests, where the test code
advances the time manually to make the test work. That usage is different
from the scenario you describe.

Thanks
Jerry

On Tue, Feb 28, 2017 at 10:53 PM, Hemalatha A
<hemalatha.amru...@googlemail.com> wrote:

Hi,

I am running a streaming application that reads data from Kafka and performs
window operations on it. I have a use case where all incoming events have a
fixed latency of 10s, which means data belonging to minute 10:00:00 will arrive
10s late, at 10:00:10.

I want to set the Spark clock to ManualClock and set the time 10s behind,
so that the batch calculation triggers at 10:00:10, by which time all the
events for the previous minute have arrived.

But I see that "spark.streaming.clock" is hardcoded to
"org.apache.spark.util.SystemClock" in the code.

Is there an easy way to hack this property to use ManualClock?
--


Regards
Hemalatha




--


Regards
Hemalatha



--


Regards
Hemalatha


Re: How to use ManualClock with Spark streaming

2017-04-05 Thread Hemalatha A
Any updates on how I can use ManualClock without editing the Spark
source code?




-- 


Regards
Hemalatha


Re: How to use ManualClock with Spark streaming

2017-03-20 Thread ??????????
Hi Hemalatha,

You can use time windows; it looks like this:

from pyspark.sql.functions import window
df.groupBy(window('timestamp', '20 seconds', '10 seconds')).count()
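For reference, the suggested `window('timestamp', '20 seconds', '10 seconds')` assigns each event to every 20-second window that starts on a 10-second boundary and contains it. The pure-Python sketch below (no Spark needed) reproduces that assignment for a single timestamp. It assumes windows are aligned to the Unix epoch with no start offset and are half-open `[start, end)`; `sliding_windows` is a hypothetical helper, not a Spark function.

```python
from datetime import datetime, timezone


def sliding_windows(ts, length_s=20, slide_s=10):
    """Return the half-open [start, end) windows containing ts.

    Windows are length_s seconds long, start every slide_s seconds and are
    aligned to the Unix epoch (no start offset). Sub-second parts are dropped.
    """
    t = int(ts.timestamp())
    start = t - (t % slide_s)  # latest window start at or before t
    windows = []
    # Walk back one slide at a time while the window still covers t.
    while start > t - length_s:
        windows.append((datetime.fromtimestamp(start, tz=timezone.utc),
                        datetime.fromtimestamp(start + length_s, tz=timezone.utc)))
        start -= slide_s
    return sorted(windows)


event = datetime(2017, 2, 28, 10, 0, 5, tzinfo=timezone.utc)
for start, end in sliding_windows(event):
    print(start.time(), "->", end.time())
# 09:59:50 -> 10:00:10
# 10:00:00 -> 10:00:20
```

With a 20 s length and a 10 s slide, every event lands in exactly two overlapping windows.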


Re: How to use ManualClock with Spark streaming

2017-02-28 Thread Saisai Shao
I don't think using ManualClock is the right way to fix your problem in
Spark Streaming.

ManualClock in Spark is mainly used for unit tests, where the test code
advances the time manually to make the test work. That usage is different
from the scenario you describe.

Thanks
Jerry
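To illustrate the unit-test usage described above: with a manual clock, time moves only when the test advances it, so scheduled work fires deterministically without sleeping. The sketch below is a hypothetical pure-Python analogue for illustration only. `FakeClock` and `BatchTrigger` are made-up names, and this is not the API of Spark's `org.apache.spark.util.ManualClock`.

```python
class FakeClock:
    """Hypothetical test clock: time changes only when the test advances it."""

    def __init__(self, start_ms=0):
        self._now_ms = start_ms

    def get_time_ms(self):
        return self._now_ms

    def advance(self, ms):
        self._now_ms += ms


class BatchTrigger:
    """Hypothetical component that fires one batch every interval_ms of clock time."""

    def __init__(self, clock, interval_ms):
        self.clock = clock
        self.interval_ms = interval_ms
        self.next_fire_ms = clock.get_time_ms() + interval_ms
        self.fired = []  # clock times at which batches fired

    def poll(self):
        # Fire every batch whose scheduled time has been reached on the clock.
        while self.clock.get_time_ms() >= self.next_fire_ms:
            self.fired.append(self.next_fire_ms)
            self.next_fire_ms += self.interval_ms


clock = FakeClock()
trigger = BatchTrigger(clock, interval_ms=1000)
clock.advance(2500)   # the test, not the wall clock, moves time forward
trigger.poll()
print(trigger.fired)  # [1000, 2000]
```

Because the component reads time only through the injected clock, a test can step through hours of schedule instantly. A production job would use a real clock instead, which is why this pattern does not directly address the latency problem in the question.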



How to use ManualClock with Spark streaming

2017-02-28 Thread Hemalatha A
Hi,

I am running a streaming application that reads data from Kafka and performs
window operations on it. I have a use case where all incoming events have a
fixed latency of 10s, which means data belonging to minute 10:00:00 will
arrive 10s late, at 10:00:10.

I want to set the Spark clock to ManualClock and set the time 10s behind,
so that the batch calculation triggers at 10:00:10, by which time all the
events for the previous minute have arrived.

But I see that "spark.streaming.clock" is hardcoded to
"org.apache.spark.util.SystemClock" in the code.

Is there an easy way to hack this property to use ManualClock?
-- 


Regards
Hemalatha
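The requirement in the question can also be phrased in event-time terms: close each one-minute window only after a fixed allowed lateness (10 s here) has passed, rather than shifting the system clock. The pure-Python sketch below simulates that rule. It assumes, as in the question, that every event arrives 10 s after its event time. `window_start` and `collect_window` are hypothetical helpers, not Spark APIs. In Spark itself, Structured Streaming's event-time windows with `withWatermark` follow a similar idea, although their exact triggering semantics differ from this simulation.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)  # one-minute tumbling windows, as in the question


def window_start(event_time):
    """Start of the one-minute window that contains event_time."""
    return event_time.replace(second=0, microsecond=0)


def collect_window(events, start, lateness):
    """Return event times in window [start, start + WINDOW) that arrived by the close.

    events is a list of (event_time, arrival_time) pairs; the window is closed
    (computed) at processing time start + WINDOW + lateness.
    """
    close_at = start + WINDOW + lateness
    return [event_time for event_time, arrived in events
            if window_start(event_time) == start and arrived <= close_at]


# Hypothetical events, each arriving 10 s after its event time.
events = [
    (datetime(2017, 2, 28, 9, 59, 30), datetime(2017, 2, 28, 9, 59, 40)),
    (datetime(2017, 2, 28, 9, 59, 58), datetime(2017, 2, 28, 10, 0, 8)),
    (datetime(2017, 2, 28, 10, 0, 3), datetime(2017, 2, 28, 10, 0, 13)),
]
minute = datetime(2017, 2, 28, 9, 59)

print(len(collect_window(events, minute, timedelta(0))))           # 1
print(len(collect_window(events, minute, timedelta(seconds=10))))  # 2
```

With no allowed lateness, the event stamped 09:59:58 (arriving at 10:00:08) misses the 09:59 window. With 10 s of lateness, the window closes at 10:00:10 and includes it, which matches the behaviour the question asks for.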