kennknowles commented on PR #36838: URL: https://github.com/apache/beam/pull/36838#issuecomment-3558736022
> > Thanks @kennknowles, @clairemcginty is planning to do some testing today. > > Ran our [tests](https://github.com/spotify/scio/actions/runs/19513763696/job/55860003717) with a locally built snapshot of this branch, still seeing the following error: > > ``` > [info] Cause: java.lang.IllegalArgumentException: Cannot output with timestamp 1970-01-01T00:00:00.001Z. Output timestamps must be no earlier than the timestamp of the current input or timer (1970-01-01T00:00:00.010Z) minus the allowed skew (0 milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew. > [info] at org.apache.beam.runners.core.SimpleDoFnRunner.checkTimestamp(SimpleDoFnRunner.java:263) > [info] at org.apache.beam.runners.core.SimpleDoFnRunner.access$1300(SimpleDoFnRunner.java:89) > [info] at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.lambda$outputWindowedValue$0(SimpleDoFnRunner.java:464) > [info] at org.apache.beam.sdk.values.WindowedValues$Builder.output(WindowedValues.java:222) > [info] at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWindowedValue(SimpleDoFnRunner.java:467) > [info] at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:123) > [info] at org.apache.beam.sdk.values.WindowedValues$Builder.output(WindowedValues.java:222) > [info] at org.apache.beam.sdk.transforms.DoFn$OutputReceiver.outputWindowedValue(DoFn.java:416) > [info] at com.spotify.scio.transforms.FileDownloadDoFn.lambda$processElement$0(FileDownloadDoFn.java:105) > [info] at com.spotify.scio.transforms.FileDownloadDoFn.flush(FileDownloadDoFn.java:141) > [info] ... > ``` > > is it possible the precision of the timestamp checks changed, or something like that? `1970-01-01T00:00:00.001Z` and `1970-01-01T00:00:00.010Z` are only off by a millisecond I don't think so. There's https://s.apache.org/beam-timestamp-strategy but I don't think it is related to this. Please note that this failure is in a different place than the original report so my commentary may not apply to both. I looked at the Scio code at https://github.com/spotify/scio/blob/041ae06b881dc142ec7e5e6392af7bf627a5d75f/scio-core/src/main/java/com/spotify/scio/transforms/FileDownloadDoFn.java#L105 and I do see that the DoFn is pulling elements from `batch` and outputting them in the context of a different element. So this will definitely look to the runtime like outputting an element with a timestamp that is older than the current input element. I think the fact that it worked before may have just been luck. Though I can see how unlikely it is for it to have worked for as long as it has on that basis. Nonetheless the fact that the `FileDownloadDoFn` would hit this error is pretty plain. In discussion with @scwhittle he points out that this would be perfectly legal if done from `@FinishBundle`. So it is a bit restrictive to disallow it in a process element. The use case of accumulating a batch in a local variable and flushing periodically as in `FileDownloadDoFn` is clearly something we want to allow. To do so we have a few options: - Eliminate the timestamp skew check. This might be fine, especially if it is done intentionally. The check is just to protect DoFn authors from error. - Track all timestamps in the bundle and allow skew back to the earliest in the bundle. This seems pretty ad hoc. It allows this use case but without sound reasoning. - Check timestamps against the watermark rather than the current element. I think this is principled but I'm not sure if we have plumbed the needed watermark to the right place for this to be easy. I suggest this: - Immediately, you can just set allowed skew to a large number and it will effectively disable the check. - Short term, we can add a method to just disable the check entirely for a DoFn that has this computation pattern. - Medium term, we can revise it to check against the watermark, after thinking a bit more to ensure this is going to be ok. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
