[ https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=262543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262543 ]

ASF GitHub Bot logged work on BEAM-7520:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jun/19 18:25
            Start Date: 18/Jun/19 18:25
    Worklog Time Spent: 10m 
      Work Description: je-ik commented on pull request #8815: [BEAM-7520] fire timers only for single instant at a moment
URL: https://github.com/apache/beam/pull/8815#discussion_r294964234
 
 

 ##########
 File path: runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
 ##########
 @@ -651,16 +652,12 @@ private void processElement(Map<W, W> windowToMergeResult, WindowedValue<InputT>
       // it but the local output watermark (also for this key) has not. After data is emitted and
       // the output watermark hold is released, the output watermark on this key will immediately
       // exceed the end of the window (otherwise we could see multiple ON_TIME outputs)
-      this.isEndOfWindow =
-          timerInternals.currentInputWatermarkTime().isAfter(window.maxTimestamp())
-              && outputWatermarkBeforeEOW;
+      this.isEndOfWindow = !timestamp.isBefore(window.maxTimestamp()) && outputWatermarkBeforeEOW;
 
       // The "GC time" is reached when the input watermark surpasses the end of the window
       // plus allowed lateness. After this, the window is expired and expunged.
       this.isGarbageCollection =
-          timerInternals
-              .currentInputWatermarkTime()
-              .isAfter(LateDataUtils.garbageCollectionTime(window, windowingStrategy));
+          !timestamp.isBefore(LateDataUtils.garbageCollectionTime(window, windowingStrategy));
 
 Review comment:
   This one was tough. The problem was that previously the input watermark
moved at the end of each bundle, and each bundle also contained its own timers.
But because timers can generate new timers, and these are processed in a
different bundle, the timer's actual timestamp must be taken into account.
Otherwise a timer set for some earlier time might trigger window garbage
collection, because the input watermark is already way ahead. I'm not quite
sure of the other consequences, though. Is it correct that one bundle might
generate multiple bundles, or is there a bundle atomicity requirement? On the
other hand, runners that don't have a concept of bundles (and therefore by
definition have bundles of size 1) have to generate multiple bundles from a
single bundle, so that might be OK. Am I right?
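The review comment argues for replacing a watermark-based check with a check based on the timer's own timestamp. The Java sketch below contrasts the two checks. It is a hypothetical simplification, not Beam's ReduceFnRunner:

- The class and method names are invented for illustration.
- Timestamps are plain epoch milliseconds rather than Joda Instants.
- The GC time is assumed to be the window's max timestamp plus allowed lateness.

In the scenario modeled here, a timer set inside the window fires in a later bundle, after the input watermark has already jumped far ahead.

```java
// Hypothetical sketch: compares the old (watermark-based) and new
// (timer-timestamp-based) garbage-collection checks. Not Beam code.
final class TimerGcCheckSketch {

  // Assumed simplification: GC time = window max timestamp + allowed lateness (millis).
  static long gcTime(long windowMaxTimestamp, long allowedLateness) {
    return windowMaxTimestamp + allowedLateness;
  }

  // Old behaviour: GC is decided by the input watermark when the timer fires,
  // mirroring currentInputWatermarkTime().isAfter(gcTime).
  static boolean isGcByWatermark(long inputWatermark, long gcTime) {
    return inputWatermark > gcTime;
  }

  // New behaviour: GC is decided by the firing timer's own timestamp,
  // mirroring !timestamp.isBefore(gcTime).
  static boolean isGcByTimerTimestamp(long timerTimestamp, long gcTime) {
    return timerTimestamp >= gcTime;
  }

  public static void main(String[] args) {
    long windowMaxTimestamp = 999L;
    long gc = gcTime(windowMaxTimestamp, 0L);
    // A timer set inside the window, processed in a later bundle after the
    // input watermark has already advanced to the maximum value.
    long timerTimestamp = 500L;
    long inputWatermark = Long.MAX_VALUE;
    System.out.println("watermark-based GC: " + isGcByWatermark(inputWatermark, gc));
    System.out.println("timestamp-based GC: " + isGcByTimerTimestamp(timerTimestamp, gc));
  }
}
```

Here the GC time is 999. With the watermark at Long.MAX_VALUE, the watermark-based check reports garbage collection (true) for the timer at 500. The timestamp-based check reports false, so the window's state is still available when that timer fires.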
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 262543)
    Time Spent: 0.5h  (was: 20m)

> DirectRunner timers are not strictly time ordered
> -------------------------------------------------
>
>                 Key: BEAM-7520
>                 URL: https://issues.apache.org/jira/browse/BEAM-7520
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>    Affects Versions: 2.13.0
>            Reporter: Jan Lukavský
>            Assignee: Jan Lukavský
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers, timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere in [windowStart, windowEnd); let's denote that timestamp timerB.timestamp
>  - the input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But if timerB sets another timer (say for timerB.timestamp + 1), then the order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> This is not ordered by timestamp. The reason is that when the input
> watermark update is evaluated, WatermarkManager.extractFiredTimers()
> produces both timerA and timerB. That alone would be correct, but when
> timerB sets another timer, the ordering breaks.
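The ordering problem, and the idea in the PR title ("fire timers only for single instant at a moment"), can be sketched in Java as follows. This is a hypothetical simulation, not the DirectRunner's WatermarkManager. It makes these assumptions:

- Timers are (name, timestamp) pairs held in a priority queue.
- The window is [0, 1000), so timerA fires at 1000.
- timerB sets exactly one follow-up timer. It is named timerB' here so that it does not re-arm itself forever.

The sketch compares two strategies:

- **Batched:** every timer eligible under the watermark is extracted and fired before any newly set timer is considered.
- **Single-instant:** only the timers at the earliest pending timestamp are fired before the queue is re-examined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical simulation of timer firing order; names and structure are illustrative only.
final class TimerOrderingSketch {

  static final class Timer {
    final String name;
    final long timestamp;

    Timer(String name, long timestamp) {
      this.name = name;
      this.timestamp = timestamp;
    }
  }

  // Simulated user callback: firing "timerB" sets one follow-up timer 1 ms later.
  static void onTimer(Timer t, PriorityQueue<Timer> queue) {
    if (t.name.equals("timerB")) {
      queue.add(new Timer("timerB'", t.timestamp + 1));
    }
  }

  static PriorityQueue<Timer> initialTimers() {
    PriorityQueue<Timer> q = new PriorityQueue<>(Comparator.comparingLong((Timer t) -> t.timestamp));
    q.add(new Timer("timerA", 1000L)); // window.maxTimestamp() + 1, assuming maxTimestamp = 999
    q.add(new Timer("timerB", 500L)); // somewhere inside the window
    return q;
  }

  // Fires the next group of eligible timers; singleInstant limits the group to one timestamp.
  static List<String> fire(long watermark, boolean singleInstant) {
    PriorityQueue<Timer> queue = initialTimers();
    List<String> fired = new ArrayList<>();
    while (!queue.isEmpty() && queue.peek().timestamp <= watermark) {
      long instant = queue.peek().timestamp;
      List<Timer> batch = new ArrayList<>();
      while (!queue.isEmpty()
          && queue.peek().timestamp <= watermark
          && (!singleInstant || queue.peek().timestamp == instant)) {
        batch.add(queue.poll());
      }
      // Timers set while firing this batch go back into the queue and are
      // only considered on the next iteration of the outer loop.
      for (Timer t : batch) {
        fired.add(t.name + "@" + t.timestamp);
        onTimer(t, queue);
      }
    }
    return fired;
  }

  public static void main(String[] args) {
    long watermark = Long.MAX_VALUE; // stands in for BoundedWindow.TIMESTAMP_MAX_VALUE
    System.out.println("batched:        " + fire(watermark, false));
    System.out.println("single instant: " + fire(watermark, true));
  }
}
```

Running main prints [timerB@500, timerA@1000, timerB'@501] for the batched strategy, which reproduces the misordering described above. It prints [timerB@500, timerB'@501, timerA@1000] for the single-instant strategy. Re-examining the queue after each instant is what lets a newly set, earlier timer overtake timerA.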



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
