[ 
https://issues.apache.org/jira/browse/BEAM-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122741#comment-17122741
 ] 

Beam JIRA Bot commented on BEAM-7825:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days, so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> Python's DirectRunner emits multiple panes per window and does not discard 
> late data
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-7825
>                 URL: https://issues.apache.org/jira/browse/BEAM-7825
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.13.0
>         Environment: OS: Debian rodete.
> Beam versions: 2.15.0.dev.
> Python versions: Python 2.7, Python 3.7
>            Reporter: Alexey Strokach
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The documentation for Beam's Windowing and Triggers functionality [states 
> that|https://beam.apache.org/documentation/programming-guide/#triggers] _"if 
> you use Beam’s default windowing configuration and default trigger, Beam 
> outputs the aggregated result when it estimates all data has arrived, and 
> discards all subsequent data for that window"_. However, it seems that the 
> current behavior of Python's DirectRunner is inconsistent with both of those 
> points. As the {{StreamingWordGroupIT.test_discard_late_data}} test shows, 
> DirectRunner appears to process every data point that it reads from the input 
> stream, irrespective of whether or not the timestamp of that data point is 
> older than the timestamps of the windows that have already been processed. 
> Furthermore, as the {{StreamingWordGroupIT.test_single_output_per_window}} 
> test shows, DirectRunner generates multiple "panes" for the same window, 
> apparently disregarding the watermark.
> The Dataflow runner passes both of those end-to-end tests.
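
For readers unfamiliar with the semantics quoted above, the following is a
minimal pure-Python sketch of the documented behavior (one pane per window,
late data discarded). It is a simplified model, not Beam's implementation and
not the code of either test: it assumes fixed windows, zero allowed lateness,
and an explicit stream of watermark events. The names WINDOW_SIZE,
window_start, and run_default_trigger are hypothetical and chosen only for
illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical fixed-window size, in seconds


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the fixed window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def run_default_trigger(events, size=WINDOW_SIZE):
    """Simplified model of the default trigger with zero allowed lateness.

    `events` is a list, in arrival order, of either
    ("data", timestamp, value) or ("watermark", timestamp) tuples.
    Returns (panes, dropped): one (window_start, sum) pane per window,
    plus the (timestamp, value) pairs discarded as late.
    """
    buffers = defaultdict(list)  # open windows: window start -> buffered values
    panes = []
    dropped = []
    watermark = float("-inf")

    for event in events:
        if event[0] == "watermark":
            watermark = max(watermark, event[1])
            # Emit exactly one pane for every window the watermark has passed.
            for start in sorted(buffers):
                if start + size <= watermark:
                    panes.append((start, sum(buffers.pop(start))))
        else:
            _, timestamp, value = event
            start = window_start(timestamp, size)
            if start + size <= watermark:
                # The window has already closed, so the element is late.
                dropped.append((timestamp, value))
            else:
                buffers[start].append(value)

    return panes, dropped


events = [
    ("data", 1, 1),
    ("data", 4, 1),
    ("watermark", 10),  # closes window [0, 10)
    ("data", 3, 1),     # late: its window has already closed
    ("data", 12, 1),
    ("watermark", 20),  # closes window [10, 20)
]
panes, dropped = run_default_trigger(events)
print(panes)    # [(0, 2), (10, 1)]
print(dropped)  # [(3, 1)]
```

Under this model, the behavior described in the report would correspond to the
late element at timestamp 3 being processed rather than dropped, and to a
window appearing more than once in the list of panes.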



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
