Re: [DISCUSS] More precision supported by DATETIME field in Schema

Reuven Lax Tue, 06 Nov 2018 11:47:03 -0800

The main difference (though possibly theoretical) is when time runs out.
With 64 bits and nanosecond precision, we can only represent times about
244 years in the future (or the past).
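
As a rough check of that limit, here is a minimal Java sketch. It assumes a signed 64-bit count of nanoseconds since the 1970 Unix epoch; the thread does not fix the epoch or the signedness, so both are assumptions, and the class name NanoRange is made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class NanoRange {
    // Latest instant a signed 64-bit nanosecond count from the Unix epoch can hold
    // (assumption: epoch is 1970-01-01T00:00:00Z).
    public static Instant maxInstant() {
        return Instant.ofEpochSecond(0, Long.MAX_VALUE);
    }

    // Earliest instant under the same assumption.
    public static Instant minInstant() {
        return Instant.ofEpochSecond(0, Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Print the two ends of the representable range and their years.
        System.out.println(maxInstant() + " year " + maxInstant().atZone(ZoneOffset.UTC).getYear());
        System.out.println(minInstant() + " year " + minInstant().atZone(ZoneOffset.UTC).getYear());
    }
}
```

Under these assumptions the range ends in 2262, which is 244 years after 2018, and the representable span is roughly 292 years on either side of 1970.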


On Tue, Nov 6, 2018 at 11:30 AM Kenneth Knowles <k...@apache.org> wrote:

> I like nanoseconds as extremely future-proof. What about speccing this out
> in stages: (1) domain of values, (2) portable encoding that can represent
> those values, (3) language-specific types to embed the values in?
>
> 1. If it is a nanosecond-precision absolute time, and we eventually want
> to migrate event time timestamps to match, then we need values for "end of
> global window" and "end of time". TBH I am not sure we need both of these
> any more. We can either define a max on the nanosecond range or create
> distinguished values.
>
> 2. For portability, presumably an order-preserving integer encoding of
> nanoseconds since epoch, with whatever tweaks are needed to represent the
> end of time. It might be useful to find a way to allow multiple encodings.
> Not super useful at a particular version, but it might have given us a
> migration path. It would also allow experiments for performance.
>
> 3. We could probably find a way to keep user-facing API compatibility here
> while increasing underlying precision at 1 and 2, but it is probably not
> worth it. A new Java type IMO addresses the lossiness issue because a user
> would have to explicitly request truncation to assign to a millis event
> time timestamp.
>
> Kenn
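
One possible shape for the order-preserving encoding in point 2, as a sketch only: flip the sign bit of the nanosecond count and write it big-endian, so that comparing encodings byte by byte (unsigned) agrees with comparing the timestamps. Using Long.MAX_VALUE as a distinguished "end of time" value is an assumption for illustration, as are the names OrderedNanosCodec, END_OF_TIME and compareBytes; nothing here is Beam's actual coder.

```java
import java.nio.ByteBuffer;

public class OrderedNanosCodec {
    // Hypothetical sentinel: the largest long stands for "end of time".
    public static final long END_OF_TIME = Long.MAX_VALUE;

    // Flip the sign bit, then write big-endian (ByteBuffer's default order),
    // so negative values sort before positive ones byte-wise.
    public static byte[] encode(long epochNanos) {
        return ByteBuffer.allocate(Long.BYTES).putLong(epochNanos ^ Long.MIN_VALUE).array();
    }

    // Inverse of encode: read big-endian, then flip the sign bit back.
    public static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Lexicographic comparison treating each byte as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // A pre-epoch timestamp should sort before a post-epoch one.
        System.out.println(compareBytes(encode(-1L), encode(1L)) < 0);
        // The sentinel should sort after every ordinary timestamp.
        System.out.println(compareBytes(encode(1L), encode(END_OF_TIME)) < 0);
        // Round trip.
        System.out.println(decode(encode(-42L)) == -42L);
    }
}
```

With this layout the sentinel sorts after every ordinary value, which is one way to get an "end of time" without a separate flag.
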
>
> On Tue, Nov 6, 2018 at 12:55 AM Charles Chen <c...@google.com> wrote:
>
>> Is the proposal to do this for both Beam Schema DATETIME fields as well
>> as for Beam timestamps in general?  The latter likely has a bunch of
>> downstream consequences for all runners.
>>
>> On Tue, Nov 6, 2018 at 12:38 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> +1 to more precision even to the nano level, probably via Reuven's
>>> proposal of a different internal representation.
>>> On Tue, Nov 6, 2018 at 9:19 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>> >
>>> > +1 to offering more granular timestamps in general. I think it will be
>>> > odd if setting the element timestamp from a row DATETIME field is
>>> > lossy, so we should seriously consider upgrading that as well.
>>> > On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <c...@google.com> wrote:
>>> > >
>>> > > One related issue that came up before is that we (perhaps
>>> > > unnecessarily) restrict the precision of timestamps in the Python SDK
>>> > > to milliseconds because of legacy reasons related to the Java runner's
>>> > > use of Joda time. Perhaps Beam portability should natively use a more
>>> > > granular timestamp unit.
>>> > >
>>> > > On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruw...@google.com> wrote:
>>> > >>
>>> > >> Thanks Reuven!
>>> > >>
>>> > >> I think Reuven gives the third option:
>>> > >>
>>> > >> Change the internal representation of the DATETIME field in Row. Still
>>> > >> keep the public ReadableDateTime getDateTime(String fieldName) API to
>>> > >> be compatible with existing code. And I think we could add one more
>>> > >> API, getDateTimeNanosecond. This option is different from option one
>>> > >> because option one actually maintains two implementations of time.
>>> > >>
>>> > >> -Rui
>>> > >>
>>> > >> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <re...@google.com> wrote:
>>> > >>>
>>> > >>> I would vote that we change the internal representation of Row to
>>> > >>> something other than Joda. Java 8 times would give us at least
>>> > >>> microseconds, and if we want nanoseconds we could simply store it as
>>> > >>> a number.
>>> > >>>
>>> > >>> We should still keep accessor methods that return and take Joda
>>> > >>> objects, as the rest of Beam still depends on Joda.
>>> > >>>
>>> > >>> Reuven
>>> > >>>
>>> > >>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruw...@google.com> wrote:
>>> > >>>>
>>> > >>>> Hi Community,
>>> > >>>>
>>> > >>>> The DATETIME field in Beam Schema/Row is implemented by Joda's
>>> > >>>> DateTime (see Row.java#L611 and Row.java#L169). Joda's DateTime is
>>> > >>>> limited to millisecond precision. That is good enough to represent
>>> > >>>> event-time timestamps, but it is not enough for real "time" data.
>>> > >>>> For "time" type data, we probably need to support up to nanosecond
>>> > >>>> precision.
>>> > >>>>
>>> > >>>> Unfortunately, Joda decided to stay at millisecond precision:
>>> > >>>> https://github.com/JodaOrg/joda-time/issues/139.
>>> > >>>>
>>> > >>>> If we want to support nanosecond precision, we have two options:
>>> > >>>>
>>> > >>>> Option one: use the current FieldType's metadata field, so that we
>>> > >>>> could set something in the metadata and Row could check it to decide
>>> > >>>> what is saved in the DATETIME field: Joda's DateTime or an
>>> > >>>> implementation that supports nanoseconds.
>>> > >>>>
>>> > >>>> Option two: add another field type (maybe called TIMESTAMP?) whose
>>> > >>>> implementation supports higher-precision time.
>>> > >>>>
>>> > >>>> What do you think about the need for higher-precision time types,
>>> > >>>> and which option do you prefer?
>>> > >>>>
>>> > >>>> -Rui
>>>
>>
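
The option discussed in the quoted thread (store the value as a plain nanosecond number, keep a millisecond accessor for existing callers, add a nanosecond accessor) could look roughly like the sketch below. It is not Row's real implementation: the class NanoDateTimeValue and its method names are hypothetical, the epoch is assumed to be 1970, and the millisecond accessor returns a long where Beam would presumably wrap it in a Joda object.

```java
import java.time.Instant;

public class NanoDateTimeValue {
    private static final long NANOS_PER_MILLI = 1_000_000L;
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Internal representation: nanoseconds since the Unix epoch (assumed).
    private final long epochNanos;

    public NanoDateTimeValue(long epochNanos) {
        this.epochNanos = epochNanos;
    }

    // Full-precision accessor, standing in for the proposed nanosecond getter.
    public long getEpochNanos() {
        return epochNanos;
    }

    // Lossy millisecond accessor for existing callers. floorDiv rounds toward
    // negative infinity, so pre-epoch values truncate in the same direction
    // as post-epoch ones.
    public long getEpochMillis() {
        return Math.floorDiv(epochNanos, NANOS_PER_MILLI);
    }

    // Lossless conversion to java.time.Instant, which has nanosecond precision.
    public Instant toInstant() {
        return Instant.ofEpochSecond(
            Math.floorDiv(epochNanos, NANOS_PER_SECOND),
            Math.floorMod(epochNanos, NANOS_PER_SECOND));
    }

    public static void main(String[] args) {
        NanoDateTimeValue value = new NanoDateTimeValue(1_500_000_001L);
        System.out.println(value.getEpochNanos());
        System.out.println(value.getEpochMillis());
        System.out.println(value.toInstant());
    }
}
```

Keeping truncation inside one explicitly named accessor makes the precision loss visible at the call site, which is the concern raised earlier about assigning to a millisecond event-time timestamp.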
