Thanks for the comments! > Can you give a concrete example of an query where you are proposing a change? create table tkudu (id int primary key, t timestamp) stored as kudu; insert into tkudu values (1,"1970-01-01 00:00:00.1111111"), -- all sub-second parts are 7 digit (2,"1970-01-01 23:59:59.9999999"), (3,"1969-12-31 23:59:59.9999999"); select * from tkudu;
This currently returns: 1,1970-01-01 00:00:00.111111000 2,1970-01-02 00:00:00 3,1970-01-01 00:00:00 1 was rounded down to microsec precision, while 2 and 3 were rounded up and also stepped to another way. With truncation towards negative infinity the query would return this: 1,1970-01-01 00:00:00.111111000 2,1970-01-01 23:59:59.999999000 3,1969-12-31 23:59:59.999999000 So 1 would be the same, and 2 and 3 would be truncated from 7 to 6 digits and would not step to a new day. My goal is not to change how Impala writes Kudu, but to get to a consensus before going forward with writing Parquet milli/micro timestamps. ( https://gerrit.cloudera.org/#/c/12247/ ) >That said, Oracle, Netezza, Vertica, and Postgres all round. Db2 truncates. Thanks, good to know! So it looks like that rounding is more popular, but consistency with Hive may be more important in Impala's case. On Thu, Jan 24, 2019 at 7:59 PM Greg Rahn <[email protected]> wrote: > For things like this the ISO SQL spec states "the choice of whether to > round or truncate is implementation-defined". That said, Oracle, Netezza, > Vertica, and Postgres all round. Db2 truncates. > > On Wed, Jan 23, 2019 at 12:26 PM Csaba Ringhofer <[email protected] > > > wrote: > > > Timestamps are often represented as ticks since some epoch, e.g. > 1970.01.01 > > 00:00:00, so negative timestamps make sense as times before the epoch - I > > meant rounding vs truncating towards 0 vs rounding towards negative > > infinite in this sense. Truncating towards negative infinity means that > > timestamps are always truncated to an earlier timestamp. Truncating > towards > > 0 would mean that before 1970, timestamps are truncated upwards, which > can > > lead to similar troubles as the I ones mentioned with rounding. On x86 > c++, > > when a time_t is divided by an integer, the result is rounded towards 0, > so > > a naive implementation that uses time_t to represent timestamps can > > truncate towards 0, but In impala::TimestampValue, time_ should be the > > non-negative nanoseconds since midnight, so it can be simply divided with > > 1000 to convert from nanoseconds to microseconds. > > >
