[ https://issues.apache.org/jira/browse/FLINK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838202#comment-15838202 ]
ASF GitHub Bot commented on FLINK-5529:
---------------------------------------

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3191#discussion_r97833002

    --- Diff: docs/dev/windows.md ---
    @@ -758,8 +831,33 @@ input
     val input: DataStream[T] = ...

     input
    -    .windowAll(<window assigner>)
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .allowedLateness(<time>)
         .<windowed transformation>(<window function>)
     {% endhighlight %}
     </div>
     </div>
    +
    +<span class="label label-info">Note</span> When using the `GlobalWindows` window assigner no
    +data is ever considered late because the end timestamp of the global window is `Long.MAX_VALUE`.
    +
    +### Late elements considerations
    +
    +When specifying an allowed lateness greater than 0, the window along with its content is kept after the watermark passes
    +the end of the window. In these cases, when a late but not dropped element arrives, it will trigger another firing for the
    --- End diff --

    Not "it will trigger" but "it could trigger" (all depends on the Trigger 😉 )


> Improve / extends windowing documentation
> -----------------------------------------
>
>                 Key: FLINK-5529
>                 URL: https://issues.apache.org/jira/browse/FLINK-5529
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Documentation
>            Reporter: Stephan Ewen
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0, 1.3.0
>
>
> Suggested Outline:
> {code}
> Windows
> (0) Outline: The anatomy of a window operation
>   stream
>      [.keyBy(...)]         <-  keyed versus non-keyed windows
>       .window(...)         <-  required: "assigner"
>      [.trigger(...)]       <-  optional: "trigger" (else default trigger)
>      [.evictor(...)]       <-  optional: "evictor" (else no evictor)
>      [.allowedLateness()]  <-  optional, else zero
>       .reduce/fold/apply() <-  required: "function"
> (1) Types of windows
>   - tumble
>   - slide
>   - session
>   - global
> (2) Pre-defined windows
>    timeWindow() (tumble, slide)
>    countWindow() (tumble, slide)
>      - mention that count windows are inherently
>        resource leaky unless limited key space
> (3) Window Functions
>   - apply: most basic, iterates over elements in window
>
>   - aggregating: reduce and fold, can be used with "apply()" which will get
>     one element
>
>   - forward reference to state size section
> (4) Advanced Windows
>   - assigner
>     - simple
>     - merging
>   - trigger
>     - registering timers (processing time, event time)
>     - state in triggers
>   - life cycle of a window
>     - create
>     - state
>     - cleanup
>       - when is window contents purged
>       - when is state dropped
>       - when is metadata (like merging set) dropped
> (5) Late data
>   - picture
>   - fire vs fire_and_purge: late accumulates vs late resurrects (cf
>     discarding mode)
>
> (6) Evictors
>   - TBD
>
> (7) State size: How large will the state be?
>   Basic rule: Each element has one copy per window it is assigned to
>     --> num windows * num elements in window
>     --> example: tumbling is one copy, sliding(n,m) is n/m copies
>     --> per key
>   Pre-aggregation:
>     - if reduce or fold is set -> one element per window (rather than num
>       elements in window)
>     - evictor voids pre-aggregation from the perspective of state
>   Special rules:
>     - fold cannot pre-aggregate on session windows (and other merging windows)
> (8) Non-keyed windows
>   - all elements through the same windows
>   - currently not parallel
>   - possibly parallel in the future when having pre-aggregation functions
>   - inherently (by definition) produce a result stream with parallelism one
>   - state similar to one key of keyed windows
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
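
For readers of the archive, the "Basic rule" under (7) in the outline above can be illustrated with a little arithmetic. The following is only a rough sketch in plain Java: the class and method names are hypothetical and not part of Flink, and it assumes the window size is an exact multiple of the slide, so that sliding(n,m) assigns each element to exactly n/m windows.

```java
// Illustrative sketch only: WindowStateEstimate and its methods are
// hypothetical names, not part of Flink's API.
final class WindowStateEstimate {

    /**
     * Number of windows (and therefore copies) per element.
     * Tumbling windows are the special case slide == windowSize, giving 1.
     * Assumes windowSize is a positive multiple of slide.
     */
    static long copiesPerElement(long windowSize, long slide) {
        if (windowSize <= 0 || slide <= 0 || windowSize % slide != 0) {
            throw new IllegalArgumentException(
                "sketch assumes a positive window size that is a multiple of the slide");
        }
        return windowSize / slide;
    }

    /**
     * Rough per-key estimate following the outline's rule
     * "num windows * num elements in window". With pre-aggregation
     * (reduce or fold) each window holds a single element instead.
     */
    static long stateEntriesPerKey(long windowSize, long slide,
                                   long elementsPerWindow, boolean preAggregated) {
        long windows = copiesPerElement(windowSize, slide);
        return preAggregated ? windows : windows * elementsPerWindow;
    }

    public static void main(String[] args) {
        // Tumbling window of size 10: one copy per element.
        System.out.println(copiesPerElement(10, 10));
        // Sliding window of size 10 with slide 5: two copies per element.
        System.out.println(copiesPerElement(10, 5));
        // Same sliding window, 1000 elements per window, no pre-aggregation.
        System.out.println(stateEntriesPerKey(10, 5, 1000, false));
        // Same sliding window with reduce/fold pre-aggregation.
        System.out.println(stateEntriesPerKey(10, 5, 1000, true));
    }
}
```

The estimate deliberately ignores evictors and merging windows, which the outline lists as cases where pre-aggregation does not apply.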