[ 
https://issues.apache.org/jira/browse/BEAM-9733?focusedWorklogId=422178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422178
 ]

ASF GitHub Bot logged work on BEAM-9733:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Apr/20 17:36
            Start Date: 14/Apr/20 17:36
    Worklog Time Spent: 10m 
      Work Description: mxm commented on issue #11362: [BEAM-9733] Always let 
ImpulseSourceFunction emit a final watermark
URL: https://github.com/apache/beam/pull/11362#issuecomment-613579619
 
 
   > @mxm these are significant changes after the PR was approved. Questions:
   
   I'm aware of that. I've tried to keep the necessary changes at a minimum. 
I'm happy to hear your feedback.
   
   > 1. What test coverage do we have for ensuring that watermarks don't bypass 
in-progress elements?
   
   We have tests in `ExecutableStageDoFnOperatorTest` 
(`outputsAreTaggedCorrectly`, `testEnsureDeferredStateCleanupTimerFiring`) and 
through our integration tests. Admittedly, it would be good to dedicate a test 
specifically to watermark behavior.
   
   > 2. Do these changes affect how the main input watermark interacts with the 
side input watermark?
   
   Effectively, side inputs in portability were broken before because (a) the 
side input watermark hold was abused by the the portable operator (b) only 
`processWatermark` was overridden but for proper support for side inputs we 
have to override `processWatermark1` (1 is the main input when we have side 
inputs, `DoFnoperator#processWatermark` calls `processWatermark1` when we do 
not have side inputs, but `processWatermark` is not called when we have side 
input, only `processWatermark1`).
    
   > 3. Will the added watermark logging affect the usefulness of debug logging 
for other investigations (I had in the past removed it after done debugging 
issues)
   
   It greatly helped me to debug the current behavior and develop the solution. 
I find it immensely helpful and would like to keep it if further debugging is 
necessary. It is very useful to have debug information already built-in, 
instead of having to add it manually every time (which in any case will be 
possible).
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 422178)
    Time Spent: 3h 10m  (was: 3h)

> ImpulseSourceFunction does not emit a final watermark
> -----------------------------------------------------
>
>                 Key: BEAM-9733
>                 URL: https://issues.apache.org/jira/browse/BEAM-9733
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.21.0
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The Flink Runner's {{ImpulseSourceFunction}} does not emit a final watermark, 
> unless {{--shutdownSourcesOnFinalWatermark}} flag has been specified (the 
> flag is used in tests to shutdown the pipeline after reading all data). Most 
> pipelines will be long-running and thus do not specify the flag. 
> Not sending out the final watermark causes GroupByKey to hold back the data 
> of event time windows until the pipeline is shut down (the final watermark is 
> always emitted on pipeline shutdown which is why using the above flag works).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to