RE: DPath arithmetic conversions and overflow/overflow

Sood, Harinder Fri, 17 Oct 2025 23:01:37 -0700

If everything is int32 then going across platform boundaries makes it less 
prone to endian issues.


Sincerely,
 Harinder Sood


 Senior Program Manager
  [email protected]
  240 805 4219
  owlcyberdefense.com

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Monday, October 6, 2025 9:01 AM
To: [email protected]
Subject: Re: DPath arithmetic conversions and overflow/overflow

Makes sense. I took a look at what Saxon's XPath implementation does and it 
looks like they promote things to longs for arithmetic, even int arithmetic. So 
the likelihood of arithmetic overflow is pretty low.

This feels like the right approach and is similar to what you suggest, just 
promoting to long instead of int. And I imagine any performance differences 
between long and int are probably minimal on modern systems.
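A minimal Java sketch of the promote-to-long idea, assuming narrow integer operands are widened to `long` before the operation. The class and method names are hypothetical, and this is not a description of how Saxon or Daffodil actually implement it.

```java
public class LongPromotion {
    // Hypothetical helper: widen both int operands to long before multiplying.
    static long mulPromoted(int a, int b) {
        return (long) a * (long) b;
    }

    // The same multiplication done in int arithmetic silently wraps on overflow.
    static int mulInt(int a, int b) {
        return a * b;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;             // 2147483647
        System.out.println(mulInt(big, 2));      // -2 (wrapped)
        System.out.println(mulPromoted(big, 2)); // 4294967294
    }
}
```

Any product of two 32-bit values fits in a signed 64-bit `long` (the extreme case is 2^62), so a single int operation cannot overflow once promoted; overflow is only possible for long operands or long chains of operations.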


On 2025-10-03 01:23 PM, Mike Beckerle wrote:
>> I also would be hesitant to cast everything to xs:integer, since our
>> implementation backs that with java.math.BigInteger. I would guess
>> there's a performance hit from switching from primitive types to
>> BigInteger. Not sure if that would be enough to notice though,
>> especially since DPath expressions aren't usually that common.
> 
> I agree promoting everything to BigInteger has performance 
> implications I don't like.
> These are all boxed numbers inside Daffodil, but still a BigInteger is 
> more expensive than a boxed primitive number.
> 
> 
>> There's also the consideration that if we cast everything to
>> xs:integer then we still need to downcast to the expected resulting
>> type, e.g.:
>>
>> <element name="foo" type="xs:short"
>>     dfdl:inputValueCalc="{ ../short * ../short }" />
>>
>> We could add an implicit downcast to the result of the expression,
>> and maybe overflow is just considered an error in that case?
> 
> Whether we convert to xs:integer or xs:int (Java style) or do
> promotion to the next bigger size (int * int => long) (byte * byte =>
> short), you'd still need to insert a downcast in this situation of a
> short * short result going into an element of type short.
> 
> If we insert that automatically, then that would be compatible with 
> behavior today. If that downcast causes a runtime error, that's expected.
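The implicit downcast described above can be sketched in Java with `java.math.BigInteger`, assuming the expression is computed as xs:integer (BigInteger) and then narrowed to the element's declared type. `BigInteger.shortValueExact()` (Java 8+) throws `ArithmeticException` when the value does not fit, which models the "overflow is a runtime error" option. The class and method names are hypothetical.

```java
import java.math.BigInteger;

public class ImplicitDowncast {
    // Hypothetical model of { ../short * ../short } computed as xs:integer,
    // followed by an implicit downcast to the element type xs:short.
    static short shortTimesShort(short x, short y) {
        BigInteger product = BigInteger.valueOf(x).multiply(BigInteger.valueOf(y));
        // Throws ArithmeticException if the product is outside [-32768, 32767].
        return product.shortValueExact();
    }

    public static void main(String[] args) {
        System.out.println(shortTimesShort((short) 100, (short) 300)); // 30000
        try {
            shortTimesShort((short) 200, (short) 200); // 40000 does not fit in a short
        } catch (ArithmeticException e) {
            System.out.println("runtime error: " + e.getMessage());
        }
    }
}
```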
> 
> The change in behavior would be in intermediate results inside an
> expression. For example, in an expression like a * b + c where all three
> are shorts, if a * b overflows a short, today's behavior is incorrect. We
> really do want a * b to create an int, which is an incompatible change in
> behavior for the case where a * b causes overflow, or the "+ c" causes
> overflow.
> 
> Though as incompatibilities go, I expect this one is very rarely hit.
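To make the intermediate-result case concrete, here is a Java sketch comparing two hypothetical semantics for a * b + c on shorts: one that forces every intermediate result back to short and raises an error on overflow, and one that keeps the intermediate in int and checks only the final downcast. The names and the values 200, 200, and -10000 are arbitrary illustrations.

```java
public class IntermediateOverflow {
    // Narrow to short, raising an error instead of wrapping.
    static short checkedShort(int v) {
        if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
            throw new ArithmeticException("short overflow: " + v);
        }
        return (short) v;
    }

    // Hypothetical "short op short => short" semantics with overflow errors.
    static short narrowMulAdd(short a, short b, short c) {
        short ab = checkedShort(a * b); // intermediate forced back to short
        return checkedShort(ab + c);
    }

    // Promoted semantics: a * b stays an int; only the final result is narrowed.
    static short promotedMulAdd(short a, short b, short c) {
        int ab = a * b;
        return checkedShort(ab + c);
    }

    public static void main(String[] args) {
        short a = 200, b = 200, c = -10000;
        System.out.println(promotedMulAdd(a, b, c)); // 30000
        try {
            narrowMulAdd(a, b, c); // 200 * 200 = 40000 overflows a short
        } catch (ArithmeticException e) {
            System.out.println("narrow: " + e.getMessage());
        }
    }
}
```

If intermediate results wrapped silently instead of raising an error, a * b + c would end up with the same final short either way, because two's-complement wrapping commutes with + and *; the difference appears when overflow is an error or when a division follows.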
> 
> 
> 
> 
> 
> 
> 
> On Thu, Oct 2, 2025 at 8:54 AM Steve Lawrence <[email protected]> wrote:
> 
>> I couldn't find that phrasing about casting to an xs:integer in the spec.
>> Maybe the AI hallucinated it?
>>
>> I did find this in the spec in Section B.1:
>>
>>> Note that type promotion is different from subtype substitution. For
>>> example:
>>>
>>>      A function that expects a parameter $p of type xs:float can be
>>>      invoked with a value of type xs:decimal. This is an example of type
>>>      promotion. The value is actually converted to the expected type.
>>>      Within the body of the function, $p instance of xs:decimal returns
>>>      false.
>>>
>>>      A function that expects a parameter $p of type xs:decimal can be
>>>      invoked with a value of type xs:integer. This is an example of
>>>      subtype substitution. The value retains its original type. Within
>>>      the body of the function, $p instance of xs:integer returns true.
>>
>> And here's the definition of subtype substitution:
>>
>>> [Definition: The use of a value whose dynamic type is derived from an
>>> expected type is known as subtype substitution.] Subtype substitution
>>> does not change the actual type of a value. For example, if an
>>> xs:integer value is used where an xs:decimal value is expected, the
>>> value retains its type as xs:integer.
>>
>> In the case of an xs:short being passed into a function that expects
>> an xs:integer, that sounds like it would just be subtype
>> substitution, so we would not cast the xs:short to an xs:integer, and
>> inside the function the type is treated as an xs:short. But the spec
>> isn't clear to me on whether that implies the result is also an
>> xs:short or whether it is cast to something. Keeping it a short feels
>> very likely to run into overflow/underflow.
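As a loose Java analogy for the two cases quoted from Section B.1 (an analogy only; Java's typing rules are not XPath's): passing a `Short` where a `Number` is expected resembles subtype substitution, since the object keeps its dynamic type, while passing an `int` where a `double` is expected resembles type promotion, since the value is converted at the call. The method names are hypothetical.

```java
public class SubstitutionVsPromotion {
    // Subtype-substitution analogue: a Short passed where a Number is
    // expected keeps its dynamic type inside the method.
    static String dynamicType(Number p) {
        return p.getClass().getSimpleName();
    }

    // Type-promotion analogue: an int argument is converted to double at the
    // call site, so inside the method the value really is a double.
    static String promotedType(double p) {
        Object boxed = p; // autoboxes to Double
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(dynamicType(Short.valueOf((short) 5))); // Short
        System.out.println(promotedType(5));                       // Double
    }
}
```

Neither case says what the result type of an arithmetic operation on the substituted value should be, which is the open question here.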
>>
>> I also would be hesitant to cast everything to xs:integer, since our
>> implementation backs that with java.math.BigInteger. I would guess
>> there's a performance hit from switching from primitive types to
>> BigInteger. Not sure if that would be enough to notice though,
>> especially since DPath expressions aren't usually that common.
>>
>> There's also the consideration that if we cast everything to 
>> xs:integer then we still need to downcast to the expected resulting 
>> type, e.g.:
>>
>> <element name="foo" type="xs:short"
>>     dfdl:inputValueCalc="{ ../short * ../short }" />
>>
>> We could add an implicit downcast to the result of the expression, 
>> and maybe overflow is just considered an error in that case?
>>
>>
>>
>> On 2025-10-01 05:17 PM, Mike Beckerle wrote:
>>> Ok, I looked at this and got some AI coaching....
>>>
>>> The phrase in the XPath spec says:
>>>
>>>     "If both operands are of type xs:integer or are derived from
>>>     xs:integer, then the operands are cast to xs:integer and the result
>>>     is an xs:integer."
>>>
>>> This is explicit about operands being derived from xs:integer in that
>>> part, but when it says they are cast, it doesn't qualify that in any
>>> way, so I think the right interpretation of this is that they are cast
>>> to exactly the xs:integer type.
>>>
>>> ChatGPT agrees: "XPath and XQuery ... deliberately avoid proliferating
>>> narrow integer subtypes in arithmetic results. Instead, the
>>> specification says:
>>>
>>>   - For + - * div idiv mod, if both operands are subtypes of xs:integer,
>>>     they are *promoted to xs:integer*, not kept at the narrower type.
>>>   - That way, all arithmetic on integer subtypes collapses to
>>>     xs:integer."
>>>
>>> - Now, admittedly, I wrote a bunch of that code, and my thought would
>>>   not have been to do that lazy thing of just casting everything to
>>>   xs:integer.
>>> - Rather, I would have wanted promotion to the least common supertype
>>>   for addition and multiplication, and promotion to just the larger of
>>>   the two argument types for division and subtraction of unsigned types.
>>>   (Subtraction of signed types has to be treated like addition.)
>>> - So probably if we just did this promotion right, the problem
>>>   wouldn't occur.
>>> - Certainly having short + short create short is a bug.
>>> - I am wondering if I made the mistake of taking the *least upper
>>>   bound* of the argument types, not the least common supertype. The
>>>   least upper bound of X and X is, well, X.
>>>
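A minimal Java sketch contrasting the least-upper-bound rule with "next bigger size" promotion (int * int => long, byte * byte => short, as discussed in this thread) for +, -, and *. It is restricted to the signed types xs:byte, xs:short, xs:int, xs:long, and xs:integer; the enum and method names are hypothetical, and division and unsigned types are deliberately left out.

```java
public class PromotionRules {
    // Hypothetical model of the signed XSD integer types, narrowest first.
    enum IntType { BYTE, SHORT, INT, LONG, INTEGER }

    static IntType wider(IntType a, IntType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // Least upper bound: the wider operand type. LUB(X, X) is just X,
    // so short + short stays short and can overflow.
    static IntType leastUpperBound(IntType a, IntType b) {
        return wider(a, b);
    }

    // Promotion for +, -, and *: one size above the wider operand.
    // xs:integer is unbounded (BigInteger-backed), so it promotes to itself.
    static IntType promoteForAddSubMul(IntType a, IntType b) {
        IntType w = wider(a, b);
        return w == IntType.INTEGER ? w : IntType.values()[w.ordinal() + 1];
    }

    public static void main(String[] args) {
        System.out.println(leastUpperBound(IntType.SHORT, IntType.SHORT));     // SHORT
        System.out.println(promoteForAddSubMul(IntType.SHORT, IntType.SHORT)); // INT
        System.out.println(promoteForAddSubMul(IntType.BYTE, IntType.INT));    // LONG
        System.out.println(promoteForAddSubMul(IntType.LONG, IntType.LONG));   // INTEGER
    }
}
```

Promoting one size up is enough for a single +, -, or *: the extreme short product (-32768) * (-32768) = 1073741824 fits in an int, and the extreme int product fits in a long. Nested expressions keep widening, so (a * b) * c on shorts would reach long.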
>>> On Wed, Oct 1, 2025 at 2:18 PM Steve Lawrence <[email protected]> wrote:
>>>
>>>> I'm trying to fix https://issues.apache.org/jira/browse/DAFFODIL-2574.
>>>> The core issue is that Java arithmetic operations return Int, even if,
>>>> for example, you are adding two Shorts. Our DPath implementation
>>>> doesn't expect that, and assumes xs:short + xs:short always results in
>>>> an xs:short; that way it knows all the types at compile time and can
>>>> put in appropriate conversions.
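A small Java illustration of the behavior described above: under Java's binary numeric promotion, both `short` operands are widened to `int`, so `a + b` has type `int`, and narrowing back to `short` requires an explicit cast that keeps only the low 16 bits. The class and method names are illustrative.

```java
public class ShortArithmetic {
    // short + short is evaluated in int, so the sum cannot overflow here.
    static int add(short a, short b) {
        return a + b;
    }

    // `short s = a + b;` would not compile; an explicit cast is required,
    // and it silently discards everything above the low 16 bits.
    static short addNarrowed(short a, short b) {
        return (short) (a + b);
    }

    public static void main(String[] args) {
        short a = 30000;
        short b = 30000;
        System.out.println(add(a, b));         // 60000
        System.out.println(addNarrowed(a, b)); // -5536
    }
}
```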
>>>>
>>>> I was looking through the XPath/XQuery spec to figure out what the
>>>> correct behavior is, and it feels kind of ambiguous. It defines
>>>> arithmetic functions like:
>>>>
>>>> op:numeric-add($arg1 as numeric, $arg2 as numeric) as numeric
>>>>
>>>> But it doesn't really say what the resulting numeric should be. It
>>>> really just says
>>>>
>>>> op:operation(xs:integer, xs:integer)
>>>>
>>>> should return "xs:integer", but it's not completely clear if that's
>>>> saying the result should be promoted to an xs:integer, or the result
>>>> just needs to be derived from xs:integer. The latter is my
>>>> interpretation, suggesting we should not promote, and I think it is
>>>> what DPath intends.
>>>>
>>>> But that then has issues with underflow/overflow: what happens when a
>>>> short + short doesn't fit into a short? Do we promote to an int? Do we
>>>> error? The spec does say this regarding overflow and underflow:
>>>>
>>>> For xs:integer operations, implementations that support
>>>> limited-precision integer operations ·must· select from the following
>>>> options:
>>>>
>>>>   They ·may· choose to always raise an error [err:FOAR0002].
>>>>
>>>>   They ·may· provide an ·implementation-defined· mechanism that allows
>>>>   users to choose between raising an error and returning a result that
>>>>   is modulo the largest representable integer value. See [ISO 10967].
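The two options quoted above can be sketched in Java for 64-bit values, assuming a hypothetical `OverflowMode` switch stands in for the implementation-defined mechanism. `Math.addExact` throws `ArithmeticException` on overflow, while plain `+` on `long` wraps around in two's complement (modulo 2^64), which is one plausible way to realize the modulo option.

```java
public class OverflowModes {
    // Hypothetical implementation-defined switch between the two spec options.
    enum OverflowMode { ERROR, WRAP }

    static long add(long a, long b, OverflowMode mode) {
        // ERROR: Math.addExact throws ArithmeticException if the sum overflows.
        // WRAP: plain + wraps around in two's complement.
        return mode == OverflowMode.ERROR ? Math.addExact(a, b) : a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(Long.MAX_VALUE, 1, OverflowMode.WRAP)); // -9223372036854775808
        try {
            add(Long.MAX_VALUE, 1, OverflowMode.ERROR);
        } catch (ArithmeticException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```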
>>>>
>>>> So we could just detect overflow and error, but it feels like
>>>> short/byte operations are likely to overflow. That might break
>>>> usability, but it might also detect cases people weren't expecting?
>>>>
>>>> Or we could do what Java does and just promote arithmetic operations
>>>> to Int, which is likely to just do the right thing and not overflow.
>>>> But that does mean you would likely need to add downcasts that might
>>>> not be expected, e.g.:
>>>>
>>>>      <element name="foo" type="xs:short"
>>>>        dfdl:inputValueCalc="{ xs:short(../short1 + ../short2) }" />
>>>>
>>>> In order for DPath to work the way it does, I think we do need to
>>>> make a compile-time decision. I don't think DPath really wants to
>>>> promote things at runtime to whatever type fits the arithmetic result
>>>> and just assume everything is a Numeric. But I guess that could be an
>>>> option too, and there might just be a little bit of runtime overhead
>>>> to check types and arithmetic results.
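For comparison, the runtime alternative (promote to whatever type fits the result and treat everything as a generic number) might look like this Java sketch. It assumes results are returned as `java.lang.Number`, a `Long` when the sum fits and otherwise a `BigInteger`; the names are hypothetical, and the exception path and the caller's later type tests are the kind of runtime overhead mentioned above.

```java
import java.math.BigInteger;

public class RuntimePromotion {
    // Hypothetical runtime-typed addition: stay in long when possible,
    // fall back to BigInteger only when the long sum would overflow.
    static Number add(long a, long b) {
        try {
            return Math.addExact(a, b); // autoboxed to Long
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        Number small = add(30000, 30000);
        Number large = add(Long.MAX_VALUE, 1);
        System.out.println(small + " " + small.getClass().getSimpleName()); // 60000 Long
        System.out.println(large + " " + large.getClass().getSimpleName()); // 9223372036854775808 BigInteger
    }
}
```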
>>>>
>>>> Thoughts?
>>>>
>>>
>>
>>
>
