[ https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507379#comment-15507379 ]
Ben Chambers commented on BEAM-644:
-----------------------------------

Minor note on "A function from TimestampedElement<T> to new timestamp that always falls within D of the original timestamp."

Rather than "within D", I think the requirement is that for an input with timestamp t, the output timestamp is >= t+D. This ensures that the output timestamp's relation to the output watermark is no later than the input timestamp's relation to the input watermark.

> Primitive to shift the watermark while assigning timestamps
> -----------------------------------------------------------
>
>                 Key: BEAM-644
>                 URL: https://issues.apache.org/jira/browse/BEAM-644
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>
> There is a general need, especially important in the presence of
> SplittableDoFn, to be able to assign new timestamps to elements without
> making them late or droppable.
> - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one
> to produce late data, but does not allow one to shift the watermark so the
> new data is on-time.
> - For a SplittableDoFn, one may receive an element such as the name of a log
> file that contains elements for the day preceding the log file. The timestamp
> on the filename must currently be the beginning of the log. If such elements
> are constantly flowing, it may be OK, but since we don't know that an element
> is coming, in the absence of data, the watermark may advance. We need a way to
> keep it far enough back even in the absence of data holding it back.
> One idea is a new primitive ShiftWatermark / AdjustTimestamps with the
> following pieces:
> - A constant duration (positive or negative) D by which to shift the
> watermark.
> - A function from TimestampedElement<T> to new timestamp that always falls
> within D of the original timestamp.
> With this primitive added, outputWithTimestamp and withAllowedTimestampSkew
> could be removed, simplifying DoFn.
> Alternatively, all of this functionality could be bolted on to DoFn.
> This ticket is not a proposal, but a record of the issue and ideas that were
> mentioned.
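
To make the ">= t+D" bound from the comment concrete, here is a minimal sketch in plain Java (java.time only, no Beam dependency). The class and method names are hypothetical and are not part of any Beam API. It assumes the primitive shifts the watermark by exactly D, and that "on time" means a timestamp is not before the watermark. Under those assumptions, an on-time input (t >= input watermark) with output timestamp >= t+D is also on time against the shifted watermark.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; none of these names exist in Beam.
final class ShiftWatermarkSketch {

    // Assumption: the output watermark is the input watermark shifted by D.
    static Instant shiftWatermark(Instant inputWatermark, Duration d) {
        return inputWatermark.plus(d);
    }

    // The bound from the comment: output timestamp >= t + D.
    static boolean satisfiesLowerBound(Instant inputTs, Instant outputTs, Duration d) {
        return !outputTs.isBefore(inputTs.plus(d));
    }

    // Assumption: an element is on time if its timestamp is not before the watermark.
    static boolean isOnTime(Instant ts, Instant watermark) {
        return !ts.isBefore(watermark);
    }

    public static void main(String[] args) {
        // Log-file case from the description: records cover the preceding day, so D = -1 day.
        Duration d = Duration.ofDays(-1);
        Instant fileTs = Instant.parse("2016-09-20T00:00:00Z");          // timestamp on the file name
        Instant inputWatermark = Instant.parse("2016-09-19T23:00:00Z");  // file name is on time
        Instant recordTs = Instant.parse("2016-09-19T06:00:00Z");        // a record inside the log

        Instant outputWatermark = shiftWatermark(inputWatermark, d);

        System.out.println("bound holds: " + satisfiesLowerBound(fileTs, recordTs, d));
        System.out.println("on time vs. unshifted watermark: " + isOnTime(recordTs, inputWatermark));
        System.out.println("on time vs. shifted watermark: " + isOnTime(recordTs, outputWatermark));
    }
}
```

With the unshifted watermark the record would count as late, which is the withAllowedTimestampSkew situation described in the issue. Shifting the watermark by D keeps the same record on time.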