Re: Storing RexLiteral as BigDecimal values

Julian Hyde Wed, 31 Jan 2024 15:46:38 -0800

Mihai,

I accept your point that literals may have values that are binary floating 
point. (As we discussed, they are impossible to create via SQL, but may be 
created via constant reduction.)


But I still have a concern that developers will accidentally create numeric 
literals with double values. Due to how Java rounds binary floating point 
values when it prints them, even with exhaustive testing we may not notice that 
literals are not exact. How to we protect against this?

One idea is to create a class BigBinary (as BigDecimal, but exponents that are 
powers of two rather than of ten, and also allowing IEEE special values NaN, 
+Inf, -Inf, -0). Internally it could contain a double, or be extended to 
support binary floating point values with more than 64 bits. 

Julian




> On Jan 30, 2024, at 4:13 PM, Mihai Budiu <mbu...@gmail.com> wrote:
> 
> Highly relevant discussion: https://issues.apache.org/jira/browse/CALCITE-2067
> 
> This change would indeed enable representing other DOUBLE values which cannot 
> be represented by BigDecimal, such as NaN, Infinities, and -0. It looks like 
> the evaluator does not reduce such expressions.
> 
> That issue also had a PR, which was rejected at the time, but I think a new 
> PR would look relatively similar. Julian's objection at the time was that 
> this changes the SQL language; however, this doesn't change the syntax of the 
> language, just the representation that the evaluator uses internally. This 
> would enable changes in the syntax too, but they can be done separately if 
> desired.
> 
> Mihai
> 
> ________________________________
> From: Mihai Budiu <mbu...@gmail.com>
> Sent: Tuesday, January 30, 2024 4:00 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: Storing RexLiteral as BigDecimal values
> 
> 
>  *
> Regarding [1], it looks like the power function implementations with DECIMAL 
> arguments return DOUBLE. Where the BigDecimal Java type has custom functions 
> that have potentially higher accuracy than FP versions, using them could be 
> beneficial. Related to this, Calcite uses POWER to represent SQRT, but I 
> noticed that SQRT gives more precise results. However, the Java 
> implementation of Math.pow is very careful, and seems to be optimized for the 
> case of integer arguments: 
> https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#pow-double-double-
> 
> 
>  *   This is what POWER looks like in the SqlStdOperatorTable, which suggests 
> that even the type checker assumes that this function always returns DOUBLE:
> 
> /** The {@code POWER(numeric, numeric)} function.
>   *
>   * <p>The return type is always {@code DOUBLE} since we don't know
>   * what the result type will be by just looking at the operand types. For
>   * example {@code POWER(INTEGER, INTEGER)} can return a non-INTEGER if the
>   * second operand is negative.
>   */
>  public static final SqlBasicFunction POWER =
>      SqlBasicFunction.create("POWER",
>          ReturnTypes.DOUBLE_NULLABLE,
>          OperandTypes.NUMERIC_NUMERIC,
>          SqlFunctionCategory.NUMERIC);
> 
> 
>  *   Using a string representation for double literals, as you propose, still 
> introduces rounding problems, when the Double value is converted to a string. 
> So I think this will just replace one problem with another.
> 
> 
>  *
> Finally, regarding the subject of this email, the right solution seems to 
> allow RexLiteral with a DOUBLE type (and similarly for REAL) to store values 
> either as BigDecimal, which is suitable when the literals come from the 
> parser, or as Double, which is suitable when values come from the evaluator. 
> I plan to file a JIRA issue to enable this feature.
> 
> Thank you,
> Mihai
> 
> ________________________________
> From: Julian Hyde <jhyde.apa...@gmail.com>
> Sent: Tuesday, January 30, 2024 3:42 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: Storing RexLiteral as BigDecimal values
> 
> In the case of POWER, Postgres has both a decimal and double version[1], and 
> Calcite should do the same.
> 
> However, I take your point that if the function returns a double, then 
> constant-reduction should create a double literal.
> 
> I am concerned that developers will accidentally create double literals, when 
> double literals are not possible in SQL (until we implement DOUBLE ‘3.14’ 
> syntax, as discussed in [2]). Until then, is it better to store the Rex 
> equivalent of CAST(‘3.14’ AS DOUBLE)?
> 
> Julian
> 
> [1] https://www.postgresql.org/docs/8.2/functions-math.html
> [2] https://issues.apache.org/jira/browse/CALCITE-6052
> 
>> On Jan 30, 2024, at 3:21 PM, Mihai Budiu <mbu...@gmail.com> wrote:
>> 
>> Well, irrespective of the type of the arguments, the POWER function returns 
>> a double value.
>> 
>> These are the 4 implementations in SqlFunctions:
>> 
>> public static double power(double b0, double b1) {
>>   return Math.pow(b0, b1);
>> }
>> 
>> public static double power(double b0, BigDecimal b1) {
>>   return Math.pow(b0, b1.doubleValue());
>> }
>> 
>> public static double power(BigDecimal b0, double b1) {
>>   return Math.pow(b0.doubleValue(), b1);
>> }
>> 
>> public static double power(BigDecimal b0, BigDecimal b1) {
>>   return Math.pow(b0.doubleValue(), b1.doubleValue());
>> }
>> 
>> The problem is that, for the query below, the optimizer will then convert 
>> the double result of power into a BigDecimal, rounding it in the process.
>> 
>> Mihai
>> ________________________________
>> From: Julian Hyde <jhyde.apa...@gmail.com>
>> Sent: Tuesday, January 30, 2024 3:03 PM
>> To: dev@calcite.apache.org <dev@calcite.apache.org>
>> Subject: Re: Storing RexLiteral as BigDecimal values
>> 
>> It’s surprising to me that 1004.3e0 should have type DOUBLE or REAL; I would 
>> expect it to have type DECIMAL(5, 1), where it can be represented exactly.
>> 
>>> On Jan 30, 2024, at 2:53 PM, Mihai Budiu <mbu...@gmail.com> wrote:
>>> 
>>> e0 shows that these are DOUBLE values.
>>> Moreover, power returns a DOUBLE value.
>>> In FP the result is the wrong one, but that's the semantics of the power 
>>> function in FP.
>>> 
>>> Mihai
>>> ________________________________
>>> From: Julian Hyde <jhyde.apa...@gmail.com>
>>> Sent: Tuesday, January 30, 2024 2:50 PM
>>> To: dev@calcite.apache.org <dev@calcite.apache.org>
>>> Subject: Re: Storing RexLiteral as BigDecimal values
>>> 
>>> The inputs are decimals, and the correct answer is 1008618.49, also a 
>>> decimal, and cannot be exactly represented as a binary floating point. I’m 
>>> not sure why in this case you want a binary floating point.
>>> 
>>>> On Jan 30, 2024, at 2:46 PM, Mihai Budiu <mbu...@gmail.com> wrote:
>>>> 
>>>> I am evaluating this expression: SELECT power(1004.3e0, 2e0)
>>>> The result in Java, or Postgres, when formatted as a string, is 
>>>> 1008618.4899999999
>>>> The result produced by the Calcite simplification code is 1008618.49
>>>> The simplification code can produce RexLiterals - that's where this would 
>>>> be useful.
>>>> This rounding error is not really necessary.
>>>> 
>>>> Mihai
>>>> ________________________________
>>>> From: Julian Hyde <jhyde.apa...@gmail.com>
>>>> Sent: Tuesday, January 30, 2024 2:40 PM
>>>> To: dev@calcite.apache.org <dev@calcite.apache.org>
>>>> Subject: Re: Storing RexLiteral as BigDecimal values
>>>> 
>>>> Can you give a scenario where a RexLiteral should have a double value?
>>>> 
>>>>> On Jan 30, 2024, at 2:36 PM, Mihai Budiu <mbu...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I have a question about the representation of RexLiteral values.
>>>>> Currently DOUBLE-valued literals are represented using a BigDecimal.
>>>>> This causes small rounding errors, introduced in the RexBuilder.clean() 
>>>>> function.
>>>>> This causes FP expressions that are evaluated at compilation-time to 
>>>>> produce results that are slightly off from the same expressions that may 
>>>>> be evaluated at runtime, for no real reason. For example, I am running 
>>>>> some Postgres tests for FP values, and they fail because of this small 
>>>>> difference.
>>>>> 
>>>>> I know that FP values cannot be compared for equality, and tests are 
>>>>> supposed to have some slack, but I think that this particular rounding 
>>>>> error is not necessary.
>>>>> 
>>>>> Why can't RexLiteral actually store a Double value internally when the 
>>>>> type is Double?
>>>>> Is this a bug or is there a deeper reason for this representation?
>>>>> If it's a bug I can file a JIRA issue and probably fix it.
>>>>> 
>>>>> Thank you,
>>>>> Mihai
>>>> 
>>> 
>> 
>

Re: Storing RexLiteral as BigDecimal values

Reply via email to