[ https://issues.apache.org/jira/browse/BEAM-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122741#comment-17122741 ]
Beam JIRA Bot commented on BEAM-7825:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.

> Python's DirectRunner emits multiple panes per window and does not discard
> late data
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-7825
>                 URL: https://issues.apache.org/jira/browse/BEAM-7825
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.13.0
>         Environment: OS: Debian rodete.
> Beam versions: 2.15.0.dev.
> Python versions: Python 2.7, Python 3.7
>            Reporter: Alexey Strokach
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The documentation for Beam's Windowing and Triggers functionality [states
> that|https://beam.apache.org/documentation/programming-guide/#triggers] _"if
> you use Beam’s default windowing configuration and default trigger, Beam
> outputs the aggregated result when it estimates all data has arrived, and
> discards all subsequent data for that window"_. However, it seems that the
> current behavior of Python's DirectRunner is inconsistent with both of those
> points. As the {{StreamingWordGroupIT.test_discard_late_data}} test shows,
> DirectRunner appears to process every data point that it reads from the input
> stream, irrespective of whether or not the timestamp of that data point is
> older than the timestamps of the windows that have already been processed.
> Furthermore, as the {{StreamingWordGroupIT.test_single_output_per_window}}
> test shows, DirectRunner generates multiple "panes" for the same window,
> apparently disregarding the notion of a watermark?
> The Dataflow runner passes both of those end-to-end tests.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
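
For reference, the behavior that the quoted documentation describes (one pane per window, late data discarded) can be sketched in plain Python. This is only an illustration under stated assumptions, not Beam code and not the DirectRunner's implementation: `run_default_trigger`, the `("data", ...)` / `("watermark", ...)` event tuples, and the integer timestamps are hypothetical names invented here. It also assumes zero allowed lateness and treats a fixed window as closed as soon as the watermark reaches the window's end.

```python
from collections import defaultdict


def fixed_window_start(timestamp, window_size):
    """Return the start of the fixed window that contains `timestamp`."""
    return timestamp - (timestamp % window_size)


def run_default_trigger(events, window_size):
    """Simulate "emit once at the watermark, then drop late data".

    `events` is a list of hypothetical tuples:
      ("data", timestamp, value)   an element with an event-time timestamp
      ("watermark", timestamp)     the watermark advancing to `timestamp`
    Returns (panes, dropped): one (window_start, sum) pane per window,
    and the (timestamp, value) pairs that arrived too late.
    """
    watermark = float("-inf")
    buffers = defaultdict(list)  # window_start -> buffered values
    panes = []
    dropped = []

    for event in events:
        if event[0] == "watermark":
            watermark = max(watermark, event[1])
            # Emit exactly one pane for every window the watermark has passed.
            for start in sorted(buffers):
                if start + window_size <= watermark:
                    panes.append((start, sum(buffers.pop(start))))
        else:
            _, timestamp, value = event
            start = fixed_window_start(timestamp, window_size)
            if start + window_size <= watermark:
                # The window already closed: discard instead of re-firing.
                dropped.append((timestamp, value))
            else:
                buffers[start].append(value)

    return panes, dropped


events = [
    ("data", 1, 10),
    ("data", 4, 5),
    ("watermark", 10),  # closes window [0, 10)
    ("data", 3, 7),     # late for window [0, 10)
    ("data", 12, 2),
    ("watermark", 20),  # closes window [10, 20)
]
panes, dropped = run_default_trigger(events, window_size=10)
print(panes)
print(dropped)
```

In this sketch the late element at timestamp 3 never produces a second pane for window [0, 10). The issue reports that the Python DirectRunner violates this expectation, while the Dataflow runner meets it.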