Added a few folks for visibility.

On Fri, Nov 9, 2018 at 12:43 AM Robert Bradshaw <rober...@google.com> wrote:

> We *might* have a few bits left in the WindowedValue representation to
> make this backwards compatible if we really wanted.
>
> The use of java.time.Instant means that we won't be able to upgrade
> (even in v3) our internal timestamps to match without either
> internally supporting >64 bits of precision or limiting the date
> range. But using the standard Java time does make a lot of sense.
> On Fri, Nov 9, 2018 at 12:33 AM Rui Wang <ruw...@google.com> wrote:
> >
> > https://github.com/apache/beam/pull/6991
> >
> > I am using java.time.Instant as the internal representation to replace
> Joda time for the DateTime field in the PR. java.time.Instant uses a long
> to store seconds-since-epoch and an int to store nanoseconds-of-second,
> so the full 64 bits are available for seconds-since-epoch and no
> precision is lost.
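As a sketch of that two-field representation (epoch seconds in a long, nanos-of-second in an int), using only the standard java.time API:

```java
import java.time.Instant;

public class InstantPrecisionDemo {
    public static void main(String[] args) {
        // Instant stores a long of seconds since the epoch plus an int of
        // nanoseconds-of-second, so sub-millisecond precision is preserved.
        Instant t = Instant.ofEpochSecond(1_541_721_600L, 123_456_789);
        System.out.println(t.getEpochSecond()); // 1541721600
        System.out.println(t.getNano());        // 123456789
    }
}
```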
> >
> > Comments on this PR are very welcome.
> >
> > -Rui
> >
> > On Wed, Nov 7, 2018 at 1:15 AM Reuven Lax <re...@google.com> wrote:
> >>
> >> As you said, this would be update incompatible across all streaming
> pipelines. At the very least this would be a big problem for Dataflow
> users, and I believe many Flink users as well. I'm not sure the benefit
> here justifies causing problems for so many users.
> >>
> >> Reuven
> >>
> >> On Wed, Nov 7, 2018 at 4:56 PM Robert Bradshaw <rober...@google.com>
> wrote:
> >>>
> >>> Yes, microseconds is a good compromise, covering a long enough
> >>> timespan that there is little chance the limit would be hit (even for
> >>> processing historical data).
> >>>
> >>> Regarding backwards compatibility, could we just change the internal
> >>> representation of Beam's element timestamps, possibly with new APIs to
> >>> access the finer granularity? (True, it may not be upgrade
> >>> compatible.)
> >>> On Tue, Nov 6, 2018 at 8:46 PM Reuven Lax <re...@google.com> wrote:
> >>> >
> >>> > The main difference (though possibly theoretical) is when time runs
> out. With 64 bits and nanosecond precision, we can only represent times
> about 292 years in the future (or the past).
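As a quick back-of-the-envelope check: a signed 64-bit count of nanoseconds spans roughly plus or minus 292 years around the epoch.

```java
public class NanosRange {
    public static void main(String[] args) {
        // 2^63 ns ~= 9.22e9 s; divided by seconds per Julian year
        // (365.25 days) that is about 292.3 years in each direction.
        double years = Math.pow(2, 63) / 1e9 / (365.25 * 24 * 3600);
        System.out.println(years); // ~292.3
    }
}
```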
> >>> >
> >>> > On Tue, Nov 6, 2018 at 11:30 AM Kenneth Knowles <k...@apache.org>
> wrote:
> >>> >>
> >>> >> I like nanoseconds as extremely future-proof. What about speccing
> this out in stages: (1) the domain of values, (2) a portable encoding that
> can represent those values, (3) language-specific types to embed the
> values in?
> >>> >>
> >>> >> 1. If it is a nanosecond-precision absolute time, and we eventually
> want to migrate event time timestamps to match, then we need values for
> "end of global window" and "end of time". TBH I am not sure we need both of
> these any more. We can either define a max on the nanosecond range or
> create distinguished values.
> >>> >>
> >>> >> 2. For portability, presumably an order-preserving integer encoding
> of nanoseconds since the epoch, with whatever tweaks are needed to
> represent the end of time. It might be useful to find a way to allow
> multiple encodings: not super useful at any particular version, but it
> would give us a migration path and allow performance experiments.
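One way such an order-preserving encoding could look (a sketch only, not an actual Beam coder): write the value big-endian with the sign bit flipped, so unsigned lexicographic comparison of the encoded bytes matches signed numeric order of the timestamps; the maximum encodable value could then be reserved as the "end of time" sentinel.

```java
import java.nio.ByteBuffer;

public class OrderPreservingEncoding {
    // Flipping the sign bit of a signed 64-bit value and writing it
    // big-endian makes unsigned lexicographic byte comparison agree
    // with signed numeric order.
    static byte[] encode(long nanosSinceEpoch) {
        return ByteBuffer.allocate(8)
            .putLong(nanosSinceEpoch ^ Long.MIN_VALUE)
            .array();
    }

    // Unsigned lexicographic comparison of two 8-byte encodings.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        long[] ts = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (int i = 0; i + 1 < ts.length; i++) {
            // Byte order matches numeric order for each adjacent pair.
            System.out.println(
                compareBytes(encode(ts[i]), encode(ts[i + 1])) < 0); // true
        }
    }
}
```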
> >>> >>
> >>> >> 3. We could probably find a way to keep user-facing API
> compatibility here while increasing underlying precision at 1 and 2, but
> it's probably not worth it. A new Java type IMO addresses the lossiness
> issue because a user would have to explicitly request truncation to
> assign to a millis event time timestamp.
> >>> >>
> >>> >> Kenn
> >>> >>
> >>> >> On Tue, Nov 6, 2018 at 12:55 AM Charles Chen <c...@google.com>
> wrote:
> >>> >>>
> >>> >>> Is the proposal to do this for Beam Schema DATETIME fields as
> well as for Beam timestamps in general?  The latter likely has a bunch of
> downstream consequences for all runners.
> >>> >>>
> >>> >>> On Tue, Nov 6, 2018 at 12:38 AM Ismaël Mejía <ieme...@gmail.com>
> wrote:
> >>> >>>>
> >>> >>>> +1 to more precision even to the nano level, probably via Reuven's
> >>> >>>> proposal of a different internal representation.
> >>> >>>> On Tue, Nov 6, 2018 at 9:19 AM Robert Bradshaw <
> rober...@google.com> wrote:
> >>> >>>> >
> >>> >>>> > +1 to offering more granular timestamps in general. I think it
> will be
> >>> >>>> > odd if setting the element timestamp from a row DATETIME field
> is
> >>> >>>> > lossy, so we should seriously consider upgrading that as well.
> >>> >>>> > On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <c...@google.com>
> wrote:
> >>> >>>> > >
> >>> >>>> > > One related issue that came up before is that we (perhaps
> unnecessarily) restrict the precision of timestamps in the Python SDK to
> milliseconds, for legacy reasons related to the Java runner's use of Joda
> time.  Perhaps Beam portability should natively use a more granular
> timestamp unit.
> >>> >>>> > >
> >>> >>>> > > On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruw...@google.com>
> wrote:
> >>> >>>> > >>
> >>> >>>> > >> Thanks Reuven!
> >>> >>>> > >>
> >>> >>>> > >> I think Reuven gives the third option:
> >>> >>>> > >>
> >>>> > >> Change the internal representation of the DATETIME field in
> Row, but keep the public ReadableDateTime getDateTime(String fieldName)
> API to stay compatible with existing code. I think we could also add one
> more API, getDateTimeNanosecond. This option is different from option one
> because option one actually maintains two implementations of time.
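A rough sketch of what that third option could look like: one internal java.time.Instant with both a legacy millisecond view and a full-precision view (the class and method names here are illustrative only, not Beam's actual API):

```java
import java.time.Instant;

// Hypothetical sketch: a single nanosecond-precision internal value
// exposed through a legacy millis accessor and a full-precision one.
public class DateTimeField {
    private final Instant value;

    DateTimeField(Instant value) { this.value = value; }

    // Legacy view: millis-since-epoch, as a Joda-backed getter would
    // expose. Sub-millisecond digits are dropped here, and only here.
    long getDateTimeMillis() { return value.toEpochMilli(); }

    // Full-precision view: epoch seconds plus nanosecond-of-second.
    long getEpochSecond() { return value.getEpochSecond(); }
    int getNanoOfSecond() { return value.getNano(); }

    public static void main(String[] args) {
        DateTimeField f =
            new DateTimeField(Instant.ofEpochSecond(1, 999_999_999));
        System.out.println(f.getDateTimeMillis()); // 1999 (sub-ms dropped)
        System.out.println(f.getNanoOfSecond());   // 999999999
    }
}
```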
> >>> >>>> > >>
> >>> >>>> > >> -Rui
> >>> >>>> > >>
> >>> >>>> > >> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <re...@google.com>
> wrote:
> >>> >>>> > >>>
> >>> >>>> > >>> I would vote that we change the internal representation of
> Row to something other than Joda. Java 8 times would give us at least
> microseconds, and if we want nanoseconds we could simply store it as a
> number.
> >>> >>>> > >>>
> >>> >>>> > >>> We should still keep accessor methods that return and take
> Joda objects, as the rest of Beam still depends on Joda.
> >>> >>>> > >>>
> >>> >>>> > >>> Reuven
> >>> >>>> > >>>
> >>> >>>> > >>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruw...@google.com>
> wrote:
> >>> >>>> > >>>>
> >>> >>>> > >>>> Hi Community,
> >>> >>>> > >>>>
> >>>> > >>>> The DATETIME field in Beam Schema/Row is implemented with
> Joda's DateTime (see Row.java#L611 and Row.java#L169). Joda's DateTime is
> limited to millisecond precision. That is good enough to represent event
> time timestamps, but it is not enough for real "time" data. For a "time"
> type, we probably need to support precision up to nanoseconds.
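To make the lossiness concrete, a small java.time sketch of what truncating a nanosecond-precision value to milliseconds drops:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class MillisTruncationDemo {
    public static void main(String[] args) {
        // A nanosecond-precision value, e.g. from a high-resolution source.
        Instant fine = Instant.ofEpochSecond(0, 123_456_789);
        // Truncating to milliseconds (all a millis-backed DATETIME field
        // can hold) silently discards the sub-millisecond part.
        Instant coarse = fine.truncatedTo(ChronoUnit.MILLIS);
        System.out.println(fine.getNano());   // 123456789
        System.out.println(coarse.getNano()); // 123000000
    }
}
```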
> >>> >>>> > >>>>
> >>>> > >>>> Unfortunately, Joda has decided to stay at millisecond
> precision: https://github.com/JodaOrg/joda-time/issues/139.
> >>> >>>> > >>>>
> >>> >>>> > >>>> If we want to support the precision of nanosecond, we
> could have two options:
> >>> >>>> > >>>>
> >>>> > >>>> Option one: utilize the current FieldType's metadata field,
> so that we could set something into the metadata and Row could check it
> to decide what is saved in the DATETIME field: Joda's DateTime or an
> implementation that supports nanoseconds.
> >>> >>>> > >>>>
> >>>> > >>>> Option two: add another field type (maybe called TIMESTAMP?)
> whose implementation supports higher-precision time.
> >>> >>>> > >>>>
> >>>> > >>>> What do you think about the need for higher-precision time
> types, and which option do you prefer?
> >>> >>>> > >>>>
> >>> >>>> > >>>> -Rui
>
