[ https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953812#comment-15953812 ]
Daniel Halperin commented on BEAM-644: -------------------------------------- [~jkff] any update here w.r.t. {{SplittableDoFn}}? > Primitive to shift the watermark while assigning timestamps > ----------------------------------------------------------- > > Key: BEAM-644 > URL: https://issues.apache.org/jira/browse/BEAM-644 > Project: Beam > Issue Type: New Feature > Components: beam-model, beam-model-runner-api > Reporter: Kenneth Knowles > > There is a general need, especially important in the presence of > SplittableDoFn, to be able to assign new timestamps to elements without > making them late or droppable. > - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one > to produce late data, but does not allow one to shift the watermark so the > new data is on-time. > - For a SplittableDoFn, one may receive an element such as the name of a log > file that contains elements for the day preceding the log file. The timestamp > on the filename must currently be the beginning of the log. If such elements > are constantly flowing, it may be OK, but since we don't know that element is > coming, in that absence of data, the watermark may advance. We need a way to > keep it far enough back even in the absence of data holding it back. > One idea is a new primitive ShiftWatermark / AdjustTimestamps with the > following pieces: > - A constant duration (positive or negative) D by which to shift the > watermark. > - A function from TimestampedElement<T> to new timestamp that is >= t + D > So, for example, AdjustTimestamps(<-60 minutes>, f) would allow f to make > timestamps up to 60 minutes earlier. > With this primitive added, outputWithTimestamp and withAllowedTimestampSkew > could be removed, simplifying DoFn. > Alternatively, all of this functionality could be bolted on to DoFn. > This ticket is not a proposal, but a record of the issue and ideas that were > mentioned. -- This message was sent by Atlassian JIRA (v6.3.15#6346)