Burak Yavuz created SPARK-14160:
-----------------------------------

             Summary: Windowing for structured streaming
                 Key: SPARK-14160
                 URL: https://issues.apache.org/jira/browse/SPARK-14160
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Burak Yavuz


This JIRA tracks the status of event-time windowing operations for 
continuous queries.

The proposed API is as follows.

The window takes three parameters:
 1. Time column. This will generally be the event time column for the record, 
but it should be possible to use ingestion time as well using an expression.
 2. The window length
 3. Slide interval (optional). The slide interval will create new windows with 
the window length provided in (2) at each interval. If the slide interval is 
not provided, we will generate tumbling windows.
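
To make the relationship between the window length and the slide interval concrete, here is a minimal plain-Scala sketch of how a record's timestamp could be mapped to the windows that contain it. It is not Spark code and not part of the proposal: the `WindowAssignment` object and `windowStarts` helper are hypothetical, and it assumes that timestamps are in milliseconds, that windows are aligned to the epoch, and that each window is the half-open interval [start, start + length).

```scala
// Hypothetical helper, not part of Spark: lists the start times of all
// windows that contain a given timestamp. All values are in milliseconds.
object WindowAssignment {
  def windowStarts(ts: Long, length: Long, slide: Option[Long] = None): Seq[Long] = {
    // Without a slide interval the windows tumble: the step equals the length.
    val step = slide.getOrElse(length)
    require(length > 0 && step > 0, "length and slide must be positive")
    // Start of the latest window that begins at or before ts
    // (assumes windows are aligned to the epoch).
    val lastStart = Math.floorDiv(ts, step) * step
    // Walk backwards one step at a time while the window still covers ts.
    Iterator.iterate(lastStart)(_ - step)
      .takeWhile(start => start + length > ts)
      .toList
      .reverse
  }
}
```

Under these assumptions, `WindowAssignment.windowStarts(45000L, 30000L)` yields the single tumbling window starting at 30000, while passing `Some(10000L)` as the slide yields three overlapping windows starting at 20000, 30000 and 40000.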

Examples:
Consider the following schema for our data:
{code} sensor_id, measurement, timestamp {code}

To generate 30-second tumbling windows and average the measurement values 
for each sensor, we may write something like:
{code}
df.window("timestamp", 30.seconds)
  .groupBy("sensor_id")
  .agg(mean("measurement"))
{code}

using the Dataset/DataFrame API.
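
For intuition about the result such a query would be expected to produce, the following plain-Scala sketch emulates the same aggregation on an in-memory collection. It does not use Spark or the proposed `window` method: the `Reading` case class, the `TumblingMean` object and the millisecond timestamps are assumptions made only for illustration, and windows are again assumed to be aligned to the epoch.

```scala
// Plain-Scala emulation of a tumbling-window average per sensor.
// Reading and TumblingMean are hypothetical names, not Spark APIs.
final case class Reading(sensorId: String, measurement: Double, timestampMs: Long)

object TumblingMean {
  // Returns the mean measurement keyed by (window start in ms, sensor id).
  def meanPerWindow(readings: Seq[Reading], lengthMs: Long): Map[(Long, String), Double] =
    readings
      // Each record falls into exactly one tumbling window.
      .groupBy(r => (Math.floorDiv(r.timestampMs, lengthMs) * lengthMs, r.sensorId))
      .map { case (key, rs) => key -> rs.map(_.measurement).sum / rs.size }
}
```

With a window length of 30000 ms, two readings from sensor "a" at 1000 ms and 29000 ms are averaged together under the key `(0, "a")`, while a reading from the same sensor at 31000 ms lands in the next window under `(30000, "a")`.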




