[ https://issues.apache.org/jira/browse/FLINK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839971#comment-15839971 ]
ASF GitHub Bot commented on FLINK-5529: --------------------------------------- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3191#discussion_r98028880 --- Diff: docs/dev/windows.md --- @@ -795,15 +821,17 @@ necessarily the ones that arrive first or last. ## Allowed Lateness -When working with *event-time* windowing it can happen that elements arrive late, *i.e.* the watermark that Flink uses to +When working with *event-time* windowing, it can happen that elements arrive late, *i.e.* the watermark that Flink uses to keep track of the progress of event-time is already past the end timestamp of a window to which an element belongs. See [event time](./event_time.html) and especially [late elements](./event_time.html#late-elements) for a more thorough discussion of how Flink deals with event time. -By default, late elements are dropped if their associated window was already evaluated. However, +By default, late elements are dropped when the watermark is past the end of the window. However, Flink allows to specify a maximum *allowed lateness* for window operators. Allowed lateness -specifies by how much time elements can be late before they are dropped. Elements that arrive -within the allowed lateness of a window are still added to the window and trigger an immediate evaluation of the window which might emit elements. +specifies by how much time elements can be late before they are dropped, and its default value is 0. +Elements that arrive after the watermark is past the end of the window but before it passes the end of +window plus the allowed lateness, are still added to the window. Depending on the trigger used, --- End diff -- "passes the end of window" -> "**passed** the end of **the** window" > Improve / extends windowing documentation > ----------------------------------------- > > Key: FLINK-5529 > URL: https://issues.apache.org/jira/browse/FLINK-5529 > Project: Flink > Issue Type: Sub-task > Components: Documentation > Reporter: Stephan Ewen > Assignee: Kostas Kloudas > Fix For: 1.2.0, 1.3.0 > > > Suggested Outline: > {code} > Windows > (0) Outline: The anatomy of a window operation > stream > [.keyBy(...)] <- keyed versus non-keyed windows > .window(...) <- required: "assigner" > [.trigger(...)] <- optional: "trigger" (else default trigger) > [.evictor(...)] <- optional: "evictor" (else no evictor) > [.allowedLateness()] <- optional, else zero > .reduce/fold/apply() <- required: "function" > (1) Types of windows > - tumble > - slide > - session > - global > (2) Pre-defined windows > timeWindow() (tumble, slide) > countWindow() (tumble, slide) > - mention that count windows are inherently > resource leaky unless limited key space > (3) Window Functions > - apply: most basic, iterates over elements in window > > - aggregating: reduce and fold, can be used with "apply()" which will get > one element > > - forward reference to state size section > (4) Advanced Windows > - assigner > - simple > - merging > - trigger > - registering timers (processing time, event time) > - state in triggers > - life cycle of a window > - create > - state > - cleanup > - when is window contents purged > - when is state dropped > - when is metadata (like merging set) dropped > (5) Late data > - picture > - fire vs fire_and_purge: late accumulates vs late resurrects (cf > discarding mode) > > (6) Evictors > - TDB > > (7) State size: How large will the state be? > Basic rule: Each element has one copy per window it is assigned to > --> num windows * num elements in window > --> example: tumbline is one copy, sliding(n,m) is n/m copies > --> per key > Pre-aggregation: > - if reduce or fold is set -> one element per window (rather than num > elements in window) > - evictor voids pre-aggregation from the perspective of state > Special rules: > - fold cannot pre-aggregate on session windows (and other merging windows) > (8) Non-keyed windows > - all elements through the same windows > - currently not parallel > - possible parallel in the future when having pre-aggregation functions > - inherently (by definition) produce a result stream with parallelism one > - state similar to one key of keyed windows > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)