kennknowles commented on PR #36838:
URL: https://github.com/apache/beam/pull/36838#issuecomment-3558736022

   > > Thanks @kennknowles, @clairemcginty is planning to do some testing today.
   > 
   > Ran our 
[tests](https://github.com/spotify/scio/actions/runs/19513763696/job/55860003717)
 with a locally built snapshot of this branch, still seeing the following error:
   > 
   > ```
   > [info]   Cause: java.lang.IllegalArgumentException: Cannot output with 
timestamp 1970-01-01T00:00:00.001Z. Output timestamps must be no earlier than 
the timestamp of the current input or timer (1970-01-01T00:00:00.010Z) minus 
the allowed skew (0 milliseconds) and no later than 294247-01-10T04:00:54.775Z. 
See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the 
allowed skew.
   > [info]   at 
org.apache.beam.runners.core.SimpleDoFnRunner.checkTimestamp(SimpleDoFnRunner.java:263)
   > [info]   at 
org.apache.beam.runners.core.SimpleDoFnRunner.access$1300(SimpleDoFnRunner.java:89)
   > [info]   at 
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.lambda$outputWindowedValue$0(SimpleDoFnRunner.java:464)
   > [info]   at 
org.apache.beam.sdk.values.WindowedValues$Builder.output(WindowedValues.java:222)
   > [info]   at 
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWindowedValue(SimpleDoFnRunner.java:467)
   > [info]   at 
org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:123)
   > [info]   at 
org.apache.beam.sdk.values.WindowedValues$Builder.output(WindowedValues.java:222)
   > [info]   at 
org.apache.beam.sdk.transforms.DoFn$OutputReceiver.outputWindowedValue(DoFn.java:416)
   > [info]   at 
com.spotify.scio.transforms.FileDownloadDoFn.lambda$processElement$0(FileDownloadDoFn.java:105)
   > [info]   at 
com.spotify.scio.transforms.FileDownloadDoFn.flush(FileDownloadDoFn.java:141)
   > [info]   ...
   > ```
   > 
   > is it possible the precision of the timestamp checks changed, or something 
like that? `1970-01-01T00:00:00.001Z` and `1970-01-01T00:00:00.010Z` are only 
off by a millisecond
   
   I don't think so. There's https://s.apache.org/beam-timestamp-strategy but I 
don't think it is related to this.
   
   Please note that this failure is in a different place than the original 
report so my commentary may not apply to both. I looked at the Scio code at 
https://github.com/spotify/scio/blob/041ae06b881dc142ec7e5e6392af7bf627a5d75f/scio-core/src/main/java/com/spotify/scio/transforms/FileDownloadDoFn.java#L105
 and I do see that the DoFn is pulling elements from `batch` and outputting 
them in the context of a different element. So this will definitely look to the 
runtime like outputting an element with a timestamp that is older than the 
current input element. I think the fact that it worked before may have just 
been luck. Though I can see how unlikely it is for it to have worked for as 
long as it has on that basis. Nonetheless the fact that the `FileDownloadDoFn` 
would hit this error is pretty plain.
   
   In discussion with @scwhittle he points out that this would be perfectly 
legal if done from `@FinishBundle`. So it is a bit restrictive to disallow it 
in a process element. The use case of accumulating a batch in a local variable 
and flushing periodically as in `FileDownloadDoFn` is clearly something we want 
to allow. To do so we have a few options:
   
    - Eliminate the timestamp skew check. This might be fine, especially if it 
is done intentionally. The check is just to protect DoFn authors from error.
    - Track all timestamps in the bundle and allow skew back to the earliest in 
the bundle. This seems pretty ad hoc. It allows this use case but without sound 
reasoning.
    - Check timestamps against the watermark rather than the current element. I 
think this is principled but I'm not sure if we have plumbed the needed 
watermark to the right place for this to be easy.
   
   I suggest this:
   
    - Immediately, you can just set allowed skew to a large number and it will 
effectively disable the check.
    - Short term, we can add a method to just disable the check entirely for a 
DoFn that has this computation pattern.
    - Medium term, we can revise it to check against the watermark, after 
thinking a bit more to ensure this is going to be ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to