Re: How to reduceByKeyAndWindow in Structured Streaming?

2018-07-30 Thread oripwk
Thanks guys, it really helps.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How to reduceByKeyAndWindow in Structured Streaming?

2018-06-28 Thread Tathagata Das
The fundamental conceptual difference between windowing in DStreams and
Structured Streaming is that DStreams use the arrival time of the record in
Spark (aka processing time), whereas Structured Streaming uses event time. If
you want to exactly replicate DStream's processing-time windows in
Structured Streaming, then you can just add the current timestamp as an
additional column in the DataFrame and group by using that.

streamingDF
.withColumn("processing_time", current_timestamp())
.groupBy($"key", window($"processing_time", "5 minutes"))
.agg(sum($"value") as "total")
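If you also need the sliding behavior of reduceByKeyAndWindow(func, windowDuration, slideDuration), window() accepts an optional slide interval, and a watermark bounds the state kept for late data. A minimal event-time sketch (the rate source, column names, and durations here are assumptions for illustration, not from the mail above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("SlidingWindowSketch").getOrCreate()
import spark.implicits._

// Hypothetical source: the built-in `rate` test source emits
// columns `timestamp` (event time) and `value` (a monotonic Long).
val streamingDF = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 10)
  .load()
  .withColumn("key", ($"value" % 10).cast("string")) // synthetic key for the example

val totals = streamingDF
  .withWatermark("timestamp", "10 minutes")          // bound state; drop very late events
  .groupBy($"key",
    window($"timestamp", "10 minutes", "5 minutes")) // 10-minute windows sliding every 5
  .agg(sum($"value") as "total")
```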


On Thu, Jun 28, 2018 at 2:24 AM, Gerard Maas  wrote:

> Hi,
>
> In Structured Streaming lingo, "ReduceByKeyAndWindow" would be a window
> aggregation with a composite key.
> Something like:
> stream.groupBy($"key", window($"timestamp", "5 minutes"))
>.agg(sum($"value") as "total")
>
> The aggregate could be any supported SQL function.
> Is this what you are looking for? Otherwise, share your specific use case
> to see how it could be implemented in Structured Streaming.
>
> kr, Gerard.
>
> On Thu, Jun 28, 2018 at 10:21 AM oripwk  wrote:
>
>> In Structured Streaming, there's the notion of event-time windowing.
>>
>> However, this is not quite the same as DStream's windowing operations: in
>> Structured Streaming, windowing groups the data into fixed time windows, and
>> every event in a time window is associated with its group.
>>
>> In DStreams, by contrast, it simply outputs all the data from a bounded
>> window in time (the last 10 minutes, for example).
>>
>> The question was also asked here, if it makes it clearer:
>> <https://stackoverflow.com/questions/49821646/is-there-someway-to-do-the-eqivalent-of-reducebykeyandwindow-in-spark-structured>
>>
>> How can the latter be achieved in Structured Streaming?
>>
>>
>>


Re: How to reduceByKeyAndWindow in Structured Streaming?

2018-06-28 Thread Gerard Maas
Hi,

In Structured Streaming lingo, "ReduceByKeyAndWindow" would be a window
aggregation with a composite key.
Something like:
stream.groupBy($"key", window($"timestamp", "5 minutes"))
   .agg(sum($"value") as "total")

The aggregate could be any supported SQL function.
Is this what you are looking for? Otherwise, share your specific use case
to see how it could be implemented in Structured Streaming.
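To run the aggregation above end to end, a minimal query could look like the following (a sketch; the rate source, console sink, and trigger interval are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("WindowedAggSketch").getOrCreate()
import spark.implicits._

// Hypothetical source: the built-in `rate` source provides `timestamp` and `value`.
val stream = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 10)
  .load()
  .withColumn("key", ($"value" % 10).cast("string"))

val query = stream
  .groupBy($"key", window($"timestamp", "5 minutes"))
  .agg(sum($"value") as "total")
  .writeStream
  .outputMode("update")                        // emit only windows changed in each trigger
  .format("console")
  .trigger(Trigger.ProcessingTime("1 minute"))
  .start()

query.awaitTermination()
```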

kr, Gerard.

On Thu, Jun 28, 2018 at 10:21 AM oripwk  wrote:

> In Structured Streaming, there's the notion of event-time windowing.
>
> However, this is not quite the same as DStream's windowing operations: in
> Structured Streaming, windowing groups the data into fixed time windows, and
> every event in a time window is associated with its group.
>
> In DStreams, by contrast, it simply outputs all the data from a bounded
> window in time (the last 10 minutes, for example).
>
> The question was also asked here, if it makes it clearer:
> <https://stackoverflow.com/questions/49821646/is-there-someway-to-do-the-eqivalent-of-reducebykeyandwindow-in-spark-structured>
>
> How can the latter be achieved in Structured Streaming?
>
>
>


How to reduceByKeyAndWindow in Structured Streaming?

2018-06-28 Thread oripwk
In Structured Streaming, there's the notion of event-time windowing.

However, this is not quite the same as DStream's windowing operations: in
Structured Streaming, windowing groups the data into fixed time windows, and
every event in a time window is associated with its group.

In DStreams, by contrast, it simply outputs all the data from a bounded
window in time (the last 10 minutes, for example).

The question was also asked here, if it makes it clearer:
<https://stackoverflow.com/questions/49821646/is-there-someway-to-do-the-eqivalent-of-reducebykeyandwindow-in-spark-structured>

How can the latter be achieved in Structured Streaming?
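For reference, the DStream behavior described above ("all the data from the last 10 minutes") can be sketched like this (the socket source, host, port, and input format are assumptions for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ReduceByKeyAndWindowSketch").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

// Hypothetical input: lines of "key value" pairs arriving on a socket.
val pairs = ssc.socketTextStream("localhost", 9999)
  .map(_.split(" "))
  .map(a => (a(0), a(1).toLong))

// Sum values per key over the last 10 minutes, recomputed every 5 minutes.
// The window is defined by *arrival* (processing) time, not event time.
val windowed = pairs.reduceByKeyAndWindow(
  (a: Long, b: Long) => a + b, Minutes(10), Minutes(5))

windowed.print()
ssc.start()
ssc.awaitTermination()
```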


