I am not so sure this is a good idea. Here are some systems and their
timestamp precisions:

Arrow - microseconds
BigQuery - microseconds
Java's new Instant (java.time) - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
PostgreSQL - microseconds
Pub/Sub publish time - nanoseconds
MSSQL datetime2 - 100 nanoseconds (the original datetime was about 3 millis)
Cassandra - milliseconds

IMO it is important to be able to treat a timestamp from any of these
systems as a Beam timestamp, even though they aren't all streaming systems.
Who knows when we might be ingesting a streamed changelog from one of them,
or using them to reprocess an archived stream. For this purpose, I think we
should either standardize on nanoseconds or make the runner's resolution
independent of the data representation.
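
To make the nanosecond option concrete, here is a rough sketch (plain
Python with hypothetical names, not an actual Beam API) of folding
timestamps from systems with different native resolutions onto a single
nanosecond axis:

    # Hypothetical sketch: normalize source timestamps to nanoseconds.
    NANOS_PER = {"millis": 1_000_000, "micros": 1_000, "nanos": 1}

    def to_nanos(value, unit):
        """Convert a timestamp from its source resolution to nanoseconds."""
        return value * NANOS_PER[unit]

    # A Cassandra write time (millis) and a Pub/Sub publish time (nanos)
    # become directly comparable, with no loss in either direction:
    cassandra_ts = to_nanos(1_555_459_200_123, "millis")
    pubsub_ts = to_nanos(1_555_459_200_123_456_789, "nanos")
    assert cassandra_ts <= pubsub_ts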

I've had some offline conversations about this. I think we can keep
higher-than-runner precision in the user data, allow WindowFns and DoFns
to operate on that full-precision data, and still have consistent
watermark treatment. Watermarks are just bounds, after all.
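
As a toy illustration of the "bounds" point (again hypothetical Python,
not Beam code): if the runner only reasons in milliseconds, flooring each
element's nanosecond timestamp to millis keeps every watermark comparison
conservative, because flooring can only move a timestamp earlier, never
later:

    def floor_to_millis(nanos):
        # The runner-resolution view of a full-precision timestamp.
        return nanos // 1_000_000

    def is_on_time(element_nanos, watermark_millis):
        # A watermark promises that no future element is earlier than it.
        # Flooring the element's timestamp can only make the element look
        # earlier, so anything judged on time here is also on time at
        # full precision; the bound stays valid.
        return floor_to_millis(element_nanos) >= watermark_millis

    t = 1_555_459_200_000_123_456  # nanosecond event time in user data
    w = 1_555_459_200_000          # millisecond runner watermark
    assert is_on_time(t, w)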

Kenn

On Tue, Apr 16, 2019 at 6:48 PM Thomas Weise <[email protected]> wrote:

> The Python SDK currently uses timestamps in microsecond resolution while
> Java SDK, as most would probably expect, uses milliseconds.
>
> This causes a few difficulties with portability (Python coders need to
> convert to millis for WindowedValue and Timers); this is related to a bug
> I'm looking into:
>
> https://issues.apache.org/jira/browse/BEAM-7035
>
> As Luke pointed out, the issue was previously discussed:
>
> https://issues.apache.org/jira/browse/BEAM-1524
>
> I'm not privy to the reasons why we decided to go with micros in the
> first place, but would it be too big a change, or impractical for other
> reasons, to switch the Python SDK to millis before it gets more users?
>
> Thanks,
> Thomas
>
>
