[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-09-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762533#comment-17762533
 ] 

Julian Hyde commented on CALCITE-5919:
--

[~mbudiu], But also: create links. Click the 'More' menu, then 'Link', and 'is 
related to'.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-09-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762531#comment-17762531
 ] 

Julian Hyde commented on CALCITE-5919:
--

Please make Jira links, so the links can be found from the other end. Also, 
just paste the case id, e.g. CALCITE-3529, so that it will change when the 
referenced case is closed.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-09-06 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762530#comment-17762530
 ] 

Mihai Budiu commented on CALCITE-5919:
--

This is also related to https://issues.apache.org/jira/browse/CALCITE-3529

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-09-06 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762528#comment-17762528
 ] 

Mihai Budiu commented on CALCITE-5919:
--

This is related to https://issues.apache.org/jira/browse/CALCITE-3530


> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-21 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757156#comment-17757156
 ] 

Julian Hyde commented on CALCITE-5919:
--

The default implementation of `TIME` should remain `int` (and 
`java.lang.Integer`).

Based on some criteria - say, a `TIME` data type that requires more than about 
31 bits of precision (9 decimal digits) - then we could switch to another 
implementation. I'm fine for that alternative implementation to use 
`BigDecimal`.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-17 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755798#comment-17755798
 ] 

Mihai Budiu commented on CALCITE-5919:
--

Let me rephrase my question: 

Is it acceptable to change JavaTypeFactoryImpl.getJavaClass to return 
BigDecimal instead of int for TIME?

This change will probably affect not only many Calcite uses, but possibly many 
other programs that link to Calcite. But this change seems the only way to 
implement this feature.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-11 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753443#comment-17753443
 ] 

Mihai Budiu commented on CALCITE-5919:
--

[~julianhyde]

The problem is that there is a lot of code already in Calcite which assumes 
that time values are represented as int values encoding milliseconds since 
midnight. This choice is made in JavaTypeFactoryImpl, and it looks to me like 
both compile-time optimizations and the runtime depend on this choice. 

Here is an example from SqlFunctions:

{code:java}
/** SQL {@code DATETIME(DATE, TIME)} function; returns a Calcite TIMESTAMP. */
  public static long datetime(int daysSinceEpoch, int millisSinceMidnight) {
// BigQuery's DATETIME function returns a Calcite TIMESTAMP,
// represented internally as milliseconds since epoch UTC.
return daysSinceEpoch * DateTimeUtils.MILLIS_PER_DAY + millisSinceMidnight;
  }
{code}

This function assumes that a time value is represented as an integer which 
contains the millisSinceMidnight. There may be hundreds such locations in the 
code. 

Even more insidious, there is code in RexImpTable 
ExtractImplementor.implementSafe which looks like this: 
{code:java}
case MILLISECOND:
  case MICROSECOND:
  case NANOSECOND:
if (sqlTypeName == SqlTypeName.DATE) {
  return Expressions.constant(0L);
}
operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
!isIntervalType);
return Expressions.multiply(
operand, Expressions.constant((long) (1 / 
unit.multiplier.doubleValue(;
{code}

This code implicitly assumes that time values can be arguments to functions 
like "mod" and "multiply", which is not true for BigDecimal values.

This is a failure of abstraction: the fact that TIME values are represented as 
'int' values in the compiler has leaked in many places.

The problem would be slightly more tractable if the compile-time and run-time 
representations were independent, as you suggest, but I am not sure that is the 
case.

I may make an effort to see if this particular change is tractable, but if it's 
too much work I will give up. 
The only alternative solution I can foresee is to disallow precision >= 3 for 
time values, because otherwise some optimizations produce wrong results.



(There is at least one other bug here, it shouldn't do a floating point 
division when it could do a precise division, but forget about that.)


> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-11 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753382#comment-17753382
 ] 

Julian Hyde commented on CALCITE-5919:
--

Good one. It seems to me that we should use exact arbitrary precision 
arithmetic (i.e. BigDecimal) throughout the compilation process. That way, we 
would at least not lose the least-significant digits of constants if they 
happened to be smaller than milliseconds.

Extending the precision beyond 32 bits would, of course, be a separate task, 
moving away from {{int}} as the runtime representation.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-11 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753346#comment-17753346
 ] 

Mihai Budiu commented on CALCITE-5919:
--

This one may be tricky to fix.
I notice that the JavaTypeFactoryImpl makes this choice about computing with 
TIME types:

{code}
 @Override public Type getJavaClass(RelDataType type) {
if (type instanceof JavaType) {
  JavaType javaType = (JavaType) type;
  return javaType.getJavaClass();
}
if (type instanceof BasicSqlType || type instanceof IntervalSqlType) {
  switch (type.getSqlTypeName()) {
  case TIME:
return type.isNullable() ? Integer.class : int.class;
{code}

So this limits the precision of the Time values that can be represented in Java 
to fit in an integer. So the evaluator cannot represent microsecond or 
nanosecond precision values even if the type system would require them. I will 
try to change this to Long, but I am afraid this may break other stuff.

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5919) Compile-time implementation of EXTRACT ignores sub-millisecond values

2023-08-11 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753345#comment-17753345
 ] 

Mihai Budiu commented on CALCITE-5919:
--

Here is a test which reproduces this problem when added to RelOptRulesTest:
{code:Java}
  @Test void testExtractMicroseconds() {
RelDataTypeSystemImpl typeSystem = new RelDataTypeSystemImpl() {
  @Override
  public int getMaxPrecision(SqlTypeName typeName) {
if (typeName.equals(SqlTypeName.TIME))
  return 6;
return super.getMaxPrecision(typeName);
  }
};
final String sql = "select extract(microsecond from time 
'00:00:00.123456')";
sql(sql)
.withFactory(f -> f.withTypeSystem(ignore -> typeSystem))
.withRule(CoreRules.PROJECT_REDUCE_EXPRESSIONS)
.check();
  }
{code}

The plans before is:
{code}
LogicalProject(EXPR$0=[EXTRACT(FLAG(MICROSECOND), 00:00:00.123456:TIME(6))])
  LogicalValues(tuples=[[{ 0 }]])
{code}

The plan after is:
{code}
LogicalProject(EXPR$0=[123000:BIGINT])
  LogicalValues(tuples=[[{ 0 }]])
{code}

> Compile-time implementation of EXTRACT ignores sub-millisecond values
> -
>
> Key: CALCITE-5919
> URL: https://issues.apache.org/jira/browse/CALCITE-5919
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> When enabling the optimization rule PROJECT_REDUCE_EXPRESSIONS the 
> compile-time evaluation of an expression like EXTRACT(MICROSECONDS FROM TIME 
> '13:30:25.575401') will produce a result up to milliseconds only, 
> irrespective of the type system provided. (This query will evaluate to 
> 25575000 instead of 25575401).
> The bug is in RexImpTable.ExtractImplementor, here:
> {code:java}
> case MILLISECOND:
>   case MICROSECOND:
>   case NANOSECOND:
> if (sqlTypeName == SqlTypeName.DATE) {
>   return Expressions.constant(0L);
> }
> operand = mod(operand, TimeUnit.MINUTE.multiplier.longValue(), 
> !isIntervalType); // << BUG
> return Expressions.multiply(
> operand, Expressions.constant((long) (1 / 
> unit.multiplier.doubleValue(;
> {code}
> The mod operation uses a multiplier for MINUTE that is expressed in 
> milliseconds, so it always truncates away values below milliseconds. The 
> multiplier seems to assume that the type system precision for TIME is set to 
> 3, which is the default value, but may be overridden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)