My intuition is that this would not be as easy as decimal_v2 to put behind a flag. Is that right?
On Wed, Nov 1, 2017 at 2:26 PM, Greg Rahn <greg.r...@gmail.com> wrote: > It might not be that simple. > > One side-effect of the return type promotion of EXTRACT and DATE_PART() is > that INSERTs that use either could potentially fail after this change > simply because of the type mismatch and the fact that Impala does not do > implicit type casting even if the value could fit. > > For example: > create table t(i int); -- assume this is a destination for any EXTRACT or > DATE_PART() value > insert into t values( cast(10 as bigint)); -- emulating BIGINT return type > ERROR: Expression 'cast(10 as bigint)' (type: BIGINT) would need to be cast > to INT for column 'I' > > The solution here is an obvious CAST because w/o the nanosecond support, no > values can overflow INT, but that still requires changes from end users. > > Similarly with #2, there is a value change. Arguably a good change, but a > change nonetheless. Again, there is a trivial solution to provide an > equivalent expression that returns the legacy value, but again, a change. > > Based on this I would suggest we apply this change to the 3.0 branch and > make sure we call out both behavioral changes in the 3.0 docs. > > > > On Tue, Oct 31, 2017 at 2:48 PM, Zachary Amsden <zams...@cloudera.com> > wrote: > >> In discussion about IMPALA-5607, it came up that implementing this as >> described is a compatibility breaking change. There are a couple of >> required changes: >> >> 1) Return data type for date_part and EXTRACT FROM must be promoted to >> BIGINT. >> 2) MILLISECONDS now will include seconds part in the calculation of >> milliseconds. >> >> I think the first is a non-issue; we're promoting the type to be wide >> enough to hold nanoseconds precision values (with the seconds component >> included). An alternative could be to return this as a decimal type, but >> that seems rather unwieldy for other date expressions so I'd prefer these >> values to all be returned as integral types. >> >> The bigger issue is including seconds in the calculation of milliseconds, >> microseconds and nanoseconds breaks the existing value returned for >> milliseconds, which is just bare milliseconds with no seconds component. >> For compatibility with other SQL implementations, I think we'd like to >> include seconds with all of these date parts, but that is certainly >> debatable. >> >> The question then, is anyone relying on this functionality that can't >> easily workaround such a change? The Impala documentation doesn't specify >> this behavior either way, and there isn't a formal specification for how >> sub-second granularity time is handled. Whatever we decide, we should >> document this going forward. >> >> - Zach >>