Added a few folks for visibility.

On Fri, Nov 9, 2018 at 12:43 AM Robert Bradshaw <rober...@google.com> wrote:
> We *might* have a few bits left in the WindowedValue representation to
> make this backwards compatible if we really wanted.
>
> The use of java.time.Instant means that we won't be able to upgrade
> (even in v3) our internal timestamps to match without either internally
> supporting >64 bits of precision or limiting the date range. But using
> the standard Java time does make a lot of sense.
>
> On Fri, Nov 9, 2018 at 12:33 AM Rui Wang <ruw...@google.com> wrote:
> >
> > https://github.com/apache/beam/pull/6991
> >
> > In this PR I am using java.time.Instant as the internal representation,
> > replacing Joda time for the DATETIME field. java.time.Instant uses a
> > long to store seconds-after-epoch and an int to store
> > nanoseconds-of-second, so the full 64 bits are used for
> > seconds-after-epoch and nothing is lost.
> >
> > Comments on this PR are very welcome.
> >
> > -Rui
> >
> > On Wed, Nov 7, 2018 at 1:15 AM Reuven Lax <re...@google.com> wrote:
> >>
> >> As you said, this would be update-incompatible across all streaming
> >> pipelines. At the very least this would be a big problem for Dataflow
> >> users, and I believe many Flink users as well. I'm not sure the
> >> benefit here justifies causing problems for so many users.
> >>
> >> Reuven
> >>
> >> On Wed, Nov 7, 2018 at 4:56 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>
> >>> Yes, microseconds is a good compromise for covering a long enough
> >>> timespan that there's little reason it could be hit (even for
> >>> processing historical data).
> >>>
> >>> Regarding backwards compatibility, could we just change the internal
> >>> representation of Beam's element timestamps, possibly with new APIs
> >>> to access the finer granularity? (True, it may not be upgrade
> >>> compatible.)
> >>>
> >>> On Tue, Nov 6, 2018 at 8:46 PM Reuven Lax <re...@google.com> wrote:
> >>> >
> >>> > The main difference (though possibly theoretical) is when time
> >>> > runs out.
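Rui's description of the java.time.Instant layout above (a long of seconds-after-epoch plus an int of nanosecond-of-second) can be seen directly through the class's accessors; a minimal sketch:

```java
import java.time.Instant;

public class Main {
    public static void main(String[] args) {
        // Instant is internally a (long epochSecond, int nanoOfSecond) pair,
        // so sub-millisecond precision is preserved exactly.
        Instant t = Instant.ofEpochSecond(1_541_721_600L, 123_456_789L);
        System.out.println(t.getEpochSecond()); // 1541721600
        System.out.println(t.getNano());        // 123456789
    }
}
```

Note that, unlike an epoch-millis long, nothing below the millisecond is rounded away here; truncation only happens if a caller explicitly converts, e.g. via `toEpochMilli()`.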
> >>> > With 64 bits and nanosecond precision, we can only represent times
> >>> > about 244 years in the future (or the past).
> >>> >
> >>> > On Tue, Nov 6, 2018 at 11:30 AM Kenneth Knowles <k...@apache.org> wrote:
> >>> >>
> >>> >> I like nanoseconds as extremely future-proof. What about speccing
> >>> >> this out in stages: (1) the domain of values, (2) a portable
> >>> >> encoding that can represent those values, (3) language-specific
> >>> >> types to embed the values in?
> >>> >>
> >>> >> 1. If it is a nanosecond-precision absolute time, and we eventually
> >>> >> want to migrate event-time timestamps to match, then we need values
> >>> >> for "end of global window" and "end of time". TBH I am not sure we
> >>> >> need both of these any more. We can either define a max on the
> >>> >> nanosecond range or create distinguished values.
> >>> >>
> >>> >> 2. For portability, presumably an order-preserving integer encoding
> >>> >> of nanoseconds since epoch, with whatever tweaks are needed to
> >>> >> represent the end of time. It might be useful to find a way to
> >>> >> allow multiple encodings. Not super useful at a particular version,
> >>> >> but it might have given us a migration path. It would also allow
> >>> >> experiments for performance.
> >>> >>
> >>> >> 3. We could probably find a way to keep user-facing API
> >>> >> compatibility here while increasing the underlying precision in
> >>> >> (1) and (2), but it is probably not worth it. A new Java type IMO
> >>> >> addresses the lossiness issue, because a user would have to
> >>> >> explicitly request truncation to assign to a millis event-time
> >>> >> timestamp.
> >>> >>
> >>> >> Kenn
> >>> >>
> >>> >> On Tue, Nov 6, 2018 at 12:55 AM Charles Chen <c...@google.com> wrote:
> >>> >>>
> >>> >>> Is the proposal to do this for both Beam Schema DATETIME fields
> >>> >>> as well as for Beam timestamps in general? The latter likely has
> >>> >>> a bunch of downstream consequences for all runners.
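Reuven's "244 years" figure above checks out with quick arithmetic: a signed 64-bit count of nanoseconds measured from the 1970 epoch overflows around the year 2262, roughly 244 years after this 2018 thread. A sketch:

```java
public class Main {
    // Seconds in a Julian year (365.25 days).
    static final long SECONDS_PER_YEAR = 31_557_600L;

    // How many years a signed 64-bit nanosecond counter can span,
    // measured forward from its zero point (the 1970 epoch).
    static long yearsFromEpoch() {
        long maxSeconds = Long.MAX_VALUE / 1_000_000_000L; // ~9.22e9 seconds
        return maxSeconds / SECONDS_PER_YEAR;              // ~292 years
    }

    public static void main(String[] args) {
        System.out.println("Range from 1970: ~" + yearsFromEpoch() + " years"); // ~292
        System.out.println("Overflow year: ~" + (1970 + yearsFromEpoch()));     // ~2262
        System.out.println("Years left as of 2018: ~" + (1970 + yearsFromEpoch() - 2018)); // ~244
    }
}
```

By comparison, microsecond precision in the same 64 bits pushes the overflow out by a factor of 1000, to roughly 292,000 years either side of the epoch.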
> >>> >>>
> >>> >>> On Tue, Nov 6, 2018 at 12:38 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> >>> >>>>
> >>> >>>> +1 to more precision, even to the nano level, probably via
> >>> >>>> Reuven's proposal of a different internal representation.
> >>> >>>>
> >>> >>>> On Tue, Nov 6, 2018 at 9:19 AM Robert Bradshaw <rober...@google.com> wrote:
> >>> >>>> >
> >>> >>>> > +1 to offering more granular timestamps in general. I think it
> >>> >>>> > will be odd if setting the element timestamp from a row
> >>> >>>> > DATETIME field is lossy, so we should seriously consider
> >>> >>>> > upgrading that as well.
> >>> >>>> >
> >>> >>>> > On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <c...@google.com> wrote:
> >>> >>>> > >
> >>> >>>> > > One related issue that came up before is that we (perhaps
> >>> >>>> > > unnecessarily) restrict the precision of timestamps in the
> >>> >>>> > > Python SDK to milliseconds for legacy reasons related to the
> >>> >>>> > > Java runner's use of Joda time. Perhaps Beam portability
> >>> >>>> > > should natively use a more granular timestamp unit.
> >>> >>>> > >
> >>> >>>> > > On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruw...@google.com> wrote:
> >>> >>>> > >>
> >>> >>>> > >> Thanks Reuven!
> >>> >>>> > >>
> >>> >>>> > >> I think Reuven gives a third option: change the internal
> >>> >>>> > >> representation of the DATETIME field in Row, but keep the
> >>> >>>> > >> public ReadableDateTime getDateTime(String fieldName) API
> >>> >>>> > >> so existing code stays compatible. I think we could also
> >>> >>>> > >> add one more API, getDateTimeNanosecond. This option
> >>> >>>> > >> differs from option one because option one actually
> >>> >>>> > >> maintains two implementations of time.
> >>> >>>> > >>
> >>> >>>> > >> -Rui
> >>> >>>> > >>
> >>> >>>> > >> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <re...@google.com> wrote:
> >>> >>>> > >>>
> >>> >>>> > >>> I would vote that we change the internal representation
> >>> >>>> > >>> of Row to something other than Joda. Java 8 times would
> >>> >>>> > >>> give us at least microseconds, and if we want nanoseconds
> >>> >>>> > >>> we could simply store it as a number.
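Reuven's suggestion, combined with Rui's "third option" above, could be sketched as a wrapper that stores a java.time.Instant internally, keeps a millisecond view for Joda-based callers, and adds a nanosecond accessor. The names here (DateTimeValue, getDateTimeMillis, getNanoOfSecond) are hypothetical, and the Joda-facing view is modeled as raw epoch millis to keep the sketch dependency-free:

```java
import java.time.Instant;

public class Main {
    // Hypothetical sketch: full-precision storage with a legacy millis view.
    static final class DateTimeValue {
        private final Instant value;

        DateTimeValue(Instant value) { this.value = value; }

        // Joda-compatible view: a Joda DateTime carries only epoch millis,
        // so this is all the legacy accessor could ever return.
        long getDateTimeMillis() { return value.toEpochMilli(); }

        // New accessor exposing the full nanosecond precision.
        int getNanoOfSecond() { return value.getNano(); }
    }

    public static void main(String[] args) {
        DateTimeValue v = new DateTimeValue(Instant.ofEpochSecond(10, 123_456_789));
        System.out.println(v.getDateTimeMillis()); // 10123 -- truncated to millis
        System.out.println(v.getNanoOfSecond());   // 123456789 -- nothing lost internally
    }
}
```

The key property is the one Kenn notes above: loss only occurs when a caller explicitly asks for the millisecond view, never inside the stored representation.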
> >>> >>>> > >>>
> >>> >>>> > >>> We should still keep accessor methods that return and
> >>> >>>> > >>> take Joda objects, as the rest of Beam still depends on
> >>> >>>> > >>> Joda.
> >>> >>>> > >>>
> >>> >>>> > >>> Reuven
> >>> >>>> > >>>
> >>> >>>> > >>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruw...@google.com> wrote:
> >>> >>>> > >>>>
> >>> >>>> > >>>> Hi Community,
> >>> >>>> > >>>>
> >>> >>>> > >>>> The DATETIME field in Beam Schema/Row is implemented
> >>> >>>> > >>>> with Joda's DateTime (see Row.java#L611 and
> >>> >>>> > >>>> Row.java#L169). Joda's DateTime is limited to
> >>> >>>> > >>>> millisecond precision. That is good enough to represent
> >>> >>>> > >>>> event-time timestamps, but it is not enough for real
> >>> >>>> > >>>> "time" data. For a "time" type we probably need to
> >>> >>>> > >>>> support precision up to the nanosecond.
> >>> >>>> > >>>>
> >>> >>>> > >>>> Unfortunately, Joda decided to keep millisecond
> >>> >>>> > >>>> precision: https://github.com/JodaOrg/joda-time/issues/139.
> >>> >>>> > >>>>
> >>> >>>> > >>>> If we want to support nanosecond precision, we have two
> >>> >>>> > >>>> options:
> >>> >>>> > >>>>
> >>> >>>> > >>>> Option one: utilize the current FieldType's metadata
> >>> >>>> > >>>> field, so that we could set something into the metadata
> >>> >>>> > >>>> and Row could check it to decide what is saved in the
> >>> >>>> > >>>> DATETIME field: Joda's DateTime or an implementation
> >>> >>>> > >>>> that supports nanoseconds.
> >>> >>>> > >>>>
> >>> >>>> > >>>> Option two: add another field (maybe called a TIMESTAMP
> >>> >>>> > >>>> field?) with an implementation that supports higher time
> >>> >>>> > >>>> precision.
> >>> >>>> > >>>>
> >>> >>>> > >>>> What do you think about the need for higher-precision
> >>> >>>> > >>>> time types, and which option is preferred?
> >>> >>>> > >>>>
> >>> >>>> > >>>> -Rui
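The lossiness Rui describes at the start of the thread can be demonstrated by round-tripping a nanosecond-precision timestamp through epoch milliseconds, which is all a millisecond-precision type such as Joda's DateTime retains; a minimal sketch:

```java
import java.time.Instant;

public class Main {
    // Round-trip a timestamp through epoch millis, simulating storage in a
    // millisecond-precision type like Joda's DateTime.
    static Instant throughMillis(Instant t) {
        return Instant.ofEpochMilli(t.toEpochMilli());
    }

    public static void main(String[] args) {
        Instant t = Instant.ofEpochSecond(0, 123_456_789);
        Instant back = throughMillis(t);
        System.out.println(t.getNano());    // 123456789
        System.out.println(back.getNano()); // 123000000 -- sub-millisecond part lost
    }
}
```

This silent truncation is exactly what Kenn's proposed "explicit truncation" API shape would surface to the user instead.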