Hi Thomas,

Thanks a lot for explaining.

Best,
Shen

On Thu, Jan 26, 2017 at 12:48 PM, Thomas Groh <[email protected]> wrote:

> The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which is
> equivalent to -2**63 microseconds. We also occasionally refer to this
> timestamp as "negative infinity".
>
> The default watermark policy for a bounded source should be negative
> infinity until all of the data is read, then positive infinity. There isn't
> really a default watermark policy for an unbounded source - this is
> dependent on the data that hasn't been read from that source, so it's
> dependent on where you're reading from.
>
> Currently, modifying the timestamp of an element from within a DoFn does
> not modify the watermark; modifying a timestamp forwards in time is
> generally "safe", as it can't cause data to move to behind the watermark -
> this is why moving elements backwards in time requires setting
> "withAllowedTimestampSkew" (which also doesn't modify the watermark, which
> means that elements that are moved backwards in time can become late and be
> dropped by a runner). I don't think we currently have any changes in-flight
> to make this configurable.
>
> On Wed, Jan 25, 2017 at 9:24 PM, Shen Li <[email protected]> wrote:
>
> > Hi,
> >
> > When reading from a source with no timestamp specified on elements, what
> > should be the default timestamp? I presume that it should be 0 as I saw
> > PAssertTest trying to set timestamps to very small values with 0 allowed
> > timestamp skew. Is that right?
> >
> > What about the default watermark policy?
> >
> > If a ParDo modifies the timestamp using
> > DoFnProcessContext.outputWithTimestamp, how should that affect the output
> > watermark? Say the ParDo adds 100 seconds to the timestamp of each element
> > in processElement, how could the runner know it should also add 100 seconds
> > to output timestamps?
> >
> > Thanks,
> >
> > Shen
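
For anyone reading this thread later, the rules Thomas describes can be modelled in a few lines of plain Java. This is only a sketch with no Beam dependency: the class WatermarkSketch and all of its members are hypothetical names made up for illustration, timestamps are assumed to be held as milliseconds, and "late" is simplified to mean "timestamp behind the watermark" (a real runner decides lateness per window).

```java
import java.util.concurrent.TimeUnit;

/** Dependency-free model of the rules in this thread. Hypothetical names, not Beam code. */
public class WatermarkSketch {
    // "Negative infinity": -2**63 microseconds, stored here as milliseconds (assumption).
    static final long TIMESTAMP_MIN_MILLIS = TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE);
    // "Positive infinity": the mirror-image upper bound, also an assumption of this sketch.
    static final long TIMESTAMP_MAX_MILLIS = TimeUnit.MICROSECONDS.toMillis(Long.MAX_VALUE);

    /** Bounded source: negative infinity until all data is read, then positive infinity. */
    static long boundedSourceWatermark(boolean allDataRead) {
        return allDataRead ? TIMESTAMP_MAX_MILLIS : TIMESTAMP_MIN_MILLIS;
    }

    /** Forward shifts are always allowed; backward shifts only up to the allowed skew. */
    static boolean shiftAllowed(long inputTs, long outputTs, long allowedSkewMillis) {
        return inputTs - outputTs <= allowedSkewMillis;
    }

    /** Simplified lateness: the element's timestamp is behind the watermark. */
    static boolean isLate(long elementTs, long watermark) {
        return elementTs < watermark;
    }

    public static void main(String[] args) {
        long watermark = 1_000_000L;      // unchanged by anything the DoFn does
        long inputTs = 1_000_500L;        // element is on time when it arrives
        long forward = inputTs + 100_000L;  // the "add 100 seconds" case
        long backward = inputTs - 1_000L;   // moved one second into the past

        System.out.println(shiftAllowed(inputTs, forward, 0L));      // true
        System.out.println(isLate(forward, watermark));              // false
        System.out.println(shiftAllowed(inputTs, backward, 0L));     // false
        System.out.println(shiftAllowed(inputTs, backward, 1_000L)); // true
        System.out.println(isLate(backward, watermark));             // true
    }
}
```

The last two lines show the caveat from the reply: permitting the backward shift does not move the watermark, so the shifted element can end up behind it and a runner may drop it.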
