[
https://issues.apache.org/jira/browse/FLINK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937910#comment-15937910
]
ASF GitHub Bot commented on FLINK-5653:
---------------------------------------
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/3574
Hi @huawei-flink, let me explain the idea of using `MapState` and its
benefits in more detail.
I'll start with how `ListState` works. With `ListState` we get
efficient access to the head element of the list. However, when updating
the `ListState`, we cannot remove individual elements; we have to clear the
complete state and reinsert all elements that should remain. Hence we always
need to deserialize and serialize all elements of a `ListState`.
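To illustrate the update cost described above, here is a minimal sketch (not Flink's actual `ListState` API; `String` stands in for `Row`) of retracting the oldest element from a list-style state that only supports clear-and-reinsert:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of retracting the head element from a ListState-like store:
// the whole list must be cleared and the remaining rows re-added, so in
// a real state backend every row is deserialized and serialized again.
public class ListStateSketch {
    private final List<String> state = new ArrayList<>();

    public void add(String row) {
        state.add(row);
    }

    public String retractOldest() {
        String oldest = state.get(0);
        // Copy everything except the head ...
        List<String> remaining = new ArrayList<>(state.subList(1, state.size()));
        state.clear();           // no per-element remove on ListState
        state.addAll(remaining); // ... and reinsert all remaining rows
        return oldest;
    }
}
```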
With the `MapState` approach, we would put the elements in a map which is
keyed on their processing timestamp. Since multiple records can arrive within
the same millisecond, we use a `List[Row]` as value type for the map. To
process a new row, we have to find the "oldest" row (i.e., the one with the
smallest timestamp) to retract it from the accumulator. With `ListState` this
is trivial, it is the head element. With `MapState` we have to iterate over the
keys and find the smallest one (smallest processing timestamp). This requires
to deserialize all keys, but these are only `Long` values and not complete
rows. With the smallest key, we can get the `List[Row]` value and take the
first `Row` from the list and retract it from the accumulator. When updating the
state, we only update the `List[Row]` value of the smallest key (or possibly
remove the entry if the `List[Row]` becomes empty).
So the benefit of using `MapState` over `ListState` is that we only read `n`
`Long` keys (plus read/write one `List[Row]`) instead of reading and writing `n` `Row` values.
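The `MapState` bookkeeping described above can be sketched as follows. This is an illustration with a plain `java.util.HashMap` standing in for Flink's `MapState` and `String` standing in for `Row`, not the actual operator code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the MapState approach: rows are keyed by their processing
// timestamp; multiple rows can share a millisecond, so the value is a list.
public class MapStateSketch {
    // processing timestamp -> rows that arrived at that time
    private final Map<Long, List<String>> state = new HashMap<>();

    public void add(long timestamp, String row) {
        state.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(row);
    }

    // Scan the keys for the smallest timestamp (in the real MapState case
    // only Long keys are deserialized, not full rows), take the first row
    // of its list, and update or remove that single entry.
    public String retractOldest() {
        long minKey = Long.MAX_VALUE;
        for (long key : state.keySet()) {
            if (key < minKey) {
                minKey = key;
            }
        }
        List<String> rows = state.get(minKey);
        String oldest = rows.remove(0);
        if (rows.isEmpty()) {
            state.remove(minKey);    // drop the now-empty entry
        } else {
            state.put(minKey, rows); // rewrite only this one entry
        }
        return oldest;
    }
}
```

Only the single `List[Row]` under the smallest key is rewritten per retraction; all other rows stay serialized in the state backend untouched.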
> Add processing time OVER ROWS BETWEEN x PRECEDING aggregation to SQL
> --------------------------------------------------------------------
>
> Key: FLINK-5653
> URL: https://issues.apache.org/jira/browse/FLINK-5653
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: Stefano Bortoli
>
> The goal of this issue is to add support for OVER ROWS aggregations on
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
>   a,
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime()
>     ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime()
>     ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single
> threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5656)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some
> of the restrictions are trivial to address, we can add the functionality in
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with
> RexOver expression).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)