godfreyhe commented on a change in pull request #17651:
URL: https://github.com/apache/flink/pull/17651#discussion_r747367356



##########
File path: docs/content.zh/docs/dev/table/sql/queries/window-deduplication.md
##########
@@ -0,0 +1,115 @@
+---
+title: "Window Deduplication"
+weight: 16
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Window Deduplication
+{{< label Streaming >}}
+
+Window Deduplication is a special form of [Deduplication]({{< ref 
"docs/dev/table/sql/queries/deduplication" >}}) that removes rows duplicated 
over a set of columns, keeping only the first or the last row for each window 
and partition key.
+
+For streaming queries, unlike regular Deduplication on continuous tables, Window 
Deduplication does not emit intermediate results; it emits only a final result 
at the end of the window. Moreover, Window Deduplication purges all 
intermediate state once it is no longer needed.
+Window Deduplication queries therefore perform better when users don't 
need results updated per record. Usually, Window Deduplication is applied 
directly to a [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}). 
It can also be combined with other operations based on 
[Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}), such as 
[Window Aggregation]({{< ref "docs/dev/table/sql/queries/window-agg" >}}), 
[Window TopN]({{< ref "docs/dev/table/sql/queries/window-topn">}}) and [Window 
Join]({{< ref "docs/dev/table/sql/queries/window-join">}}). 
+
+Window Deduplication is defined with the same syntax as regular 
Deduplication; see the [Deduplication documentation]({{< ref 
"docs/dev/table/sql/queries/deduplication" >}}) for more information.
+In addition, Window Deduplication requires that the `PARTITION BY` clause 
contain the `window_start` and `window_end` columns of the relation.
+Otherwise, the optimizer won't be able to translate the query.
+
+Flink uses `ROW_NUMBER()` to remove duplicates, in the same way as a [Window 
Top-N query]({{< ref "docs/dev/table/sql/queries/window-topn" >}}). In theory, 
Window Deduplication is a special case of Window Top-N in which N is one 
and the rows are ordered by processing time or event time.
+
+The following shows the syntax of the Window Deduplication statement:
+
+```sql
+SELECT [column_list]
+FROM (
+   SELECT [column_list],
+     ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, col_key1...]
+       ORDER BY time_attr [asc|desc]) AS rownum
+   FROM table_name) -- relation applied windowing TVF
+WHERE (rownum = 1 | rownum <=1 | rownum < 2) [AND conditions]
+```
+
+**Parameter Specification:**
+
+- `ROW_NUMBER()`: Assigns a unique, sequential number to each row, starting 
with one.
+- `PARTITION BY window_start, window_end [, col_key1...]`: Specifies the 
partition columns, which must include `window_start`, `window_end`, and any 
other partition keys.
+- `ORDER BY time_attr [asc|desc]`: Specifies the ordering column, which must be a 
[time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}). 
Currently Flink supports [processing time attribute]({{< ref 
"docs/dev/table/concepts/time_attributes" >}}#processing-time) and [event time 
attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}#event-time). 
Ordering by ASC keeps the first row; ordering by DESC keeps the 
last row.
+- `WHERE (rownum = 1 | rownum <=1 | rownum < 2)`: The `rownum = 1 | rownum <=1 
| rownum < 2` is required for Flink to recognize that this query is a Window 
Deduplication.

Review comment:
       `is required for Flink to recognize the first/last row is needed` ?
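
For context on the clause under discussion, a minimal sketch of a complete Window Deduplication query may help. It assumes a hypothetical table `Bid` with an event-time attribute `bidtime` and columns `price`, `item` and `supplier_id`; none of these names come from the patch above. The 10-minute tumbling window is likewise only an illustrative choice.

```sql
-- Hypothetical table `Bid` with event-time attribute `bidtime` (assumption,
-- not part of the patch). Keeps the last row of each 10-minute tumbling window.
SELECT bidtime, price, item, supplier_id, window_start, window_end
FROM (
   SELECT bidtime, price, item, supplier_id, window_start, window_end,
     ROW_NUMBER() OVER (PARTITION BY window_start, window_end
       ORDER BY bidtime DESC) AS rownum
   FROM TABLE(
     TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
)
WHERE rownum <= 1;
```

Here `ORDER BY bidtime DESC` combined with `rownum <= 1` selects the last row per window, which is what the filter tells the planner; switching to `ASC` would select the first row instead.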




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

