I am not so sure this is a good idea. Here are some systems and their precision:
Arrow - microseconds
BigQuery - microseconds
New Java Instant - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
PostgreSQL - microseconds
Pubsub publish time - nanoseconds
MSSQL datetime2 - 100 nanoseconds (the original datetime is about 3 millis)
Cassandra - milliseconds

IMO it is important to be able to treat any of these as a Beam timestamp, even though they aren't all streaming. Who knows when we might be ingesting a streamed changelog from one of them, or reprocessing an archived stream. For this purpose, I think we should either standardize on nanoseconds or make the runner's resolution independent of the data representation.

I've had some offline conversations about this. I think we can keep higher-than-runner precision in the user data, and allow WindowFns and DoFns to operate on that higher-precision data, and still have consistent watermark treatment. Watermarks are just bounds, after all. (A minimal sketch of this idea follows the quoted message below.)

Kenn

On Tue, Apr 16, 2019 at 6:48 PM Thomas Weise <[email protected]> wrote:

> The Python SDK currently uses timestamps in microsecond resolution, while
> the Java SDK, as most would probably expect, uses milliseconds.
>
> This causes a few difficulties with portability (Python coders need to
> convert to millis for WindowedValue and Timers), which is related to a bug
> I'm looking into:
>
> https://issues.apache.org/jira/browse/BEAM-7035
>
> As Luke pointed out, the issue was previously discussed:
>
> https://issues.apache.org/jira/browse/BEAM-1524
>
> I'm not privy to the reasons why we decided to go with micros in the
> first place, but would it be too big a change, or impractical for other
> reasons, to switch the Python SDK to millis before it gets more users?
>
> Thanks,
> Thomas
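Here is that sketch: a minimal, self-contained Java illustration of "watermarks are just bounds". The names (PrecisionDemo, toRunnerMillis) are hypothetical, not Beam API. The element keeps its full nanosecond event time; the timestamp handed to the runner is that time rounded *down* to the runner's millisecond grid. Because the rounded value is never later than the true event time, a watermark that is a correct lower bound for the millisecond timestamps is also a correct bound for the nanosecond ones, and a WindowFn or DoFn can still read the finer value from the element itself.

import java.time.Instant;

// Hypothetical sketch, not Beam API: keep nanosecond precision in user data
// while the runner only ever sees millisecond timestamps.
final class PrecisionDemo {

  // Floor a full-precision event time to the runner's millisecond resolution.
  // Instant normalizes nano-of-second to [0, 999999999], so toEpochMilli()
  // rounds down (toward negative infinity) even for pre-epoch instants.
  static long toRunnerMillis(Instant eventTime) {
    return eventTime.toEpochMilli();
  }

  public static void main(String[] args) {
    Instant fine = Instant.parse("2019-04-17T01:48:00.123456789Z");
    long coarse = toRunnerMillis(fine); // ...123; the trailing 456789 nanos are dropped

    // A WindowFn or DoFn can still window or sort by `fine`, while the
    // runner advances its watermark using only the coarse value.
    System.out.println(fine + " -> runner-visible epoch millis " + coarse);
  }
}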

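One more small, purely illustrative snippet on the conversion the quoted message describes (Python-SDK microsecond timestamps squeezed into the millisecond fields of WindowedValue); the helper microsToMillis is made up, not a Beam coder. The rounding direction matters: Java's integer division truncates toward zero, so Math.floorDiv is the variant that keeps the converted value a lower bound even for negative (pre-epoch) timestamps.

// Illustrative only; microsToMillis is a hypothetical helper, not Beam API.
final class MicrosToMillis {

  // Floor division keeps the millisecond value <= the microsecond event time.
  static long microsToMillis(long micros) {
    return Math.floorDiv(micros, 1000);
  }

  public static void main(String[] args) {
    System.out.println(microsToMillis(1_555_462_080_123_456L)); // 1555462080123
    System.out.println(microsToMillis(-1L)); // -1 (plain `/ 1000` would give 0)
  }
}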