Re: RFR: 8361842: Move input validation checks to Java for java.lang.StringCoding intrinsics [v8]

John R Rose Fri, 18 Jul 2025 16:21:36 -0700

On Thu, 17 Jul 2025 15:33:37 GMT, Raffaello Giulietti <rgiulie...@openjdk.org> 
wrote:


> What is the thinking when an `@IntrinsicCandidate` method invokes another 
> `@IntrinsicCandidate` method? Which part is responsible for the checks?

This is a good question.  Suppose IC1 calls IC2 and both are intrinsic 
candidates, and suppose that M1 and M2 are their checked "front doors".

I think the answer has to be that, once you start executing IC1, you cannot 
expect any further checks.  Probably some assembler macro implements IC2 and it 
may be called from more than one place.  The tricky thing to prove is that all 
uses of IC2's intrinsic code, whether direct (via M2) or indirect (via things 
like M1) have adequate checks.

If intrinsics are factored this way (as they are for string methods) I think 
that IC1 has to advertise that it calls IC2, so that the front door method M1 
is responsible for validity checks for both IC1 and IC2.  That is because after 
intrinsic expansion, IC2 is reached without going through M2; the entry was 
indirectly from M1.  So M1 has to duplicate M2's front door checks.

To make this workable, it may be that M2's front door checks are factored into 
a subroutine FD2, so that M1 can refer to FD2, rather than do risky code 
duplication.

If (as in this case) IC2 loops over calls to IC1, then perhaps M2 should have a 
companion method FD2R which checks a range a range of inputs to IC2, so that M1 
can call FD2R.  If all goes well, then FD2R has a range check that duplicates 
the front door logic of M1, so that the JIT can remove the duplicate checking.

In the case of `StringUTF16.getChar`, I see it is marked as trusted, and it 
does not have a front-door method, and does have many callers.  In the terms of 
this PR, perhaps it should be renamed `getChar0` (or the like) to make it more 
clear (at non-local use points) that it must be called from trusted code.  
Perhaps it should also have a range check method associated with it, so that 
some callers can use that range check method, so that the non-local 
responsibility is more clearly fulfilled.

Maybe some callers (if less performance critical) should be changed to call a 
properly checked front-door method, `getChar` (as opposed to `getChar0`).  
Remaining callers of `getChar0` should be clearly linked to the front-door 
checks required by `getChar0`.

The above seems to be the principled way to deal with an unchecked intrinsic 
called from many trusted use sites.  The basic idea is that every trusted use 
site should reaffirm its responsibility locally, not just hope that a non-local 
assert will catch a bug.  We want some kind of reviewable (static/local) proof 
that each use site (of an unchecked private intrinsics) has correct checks.

Some examples:  A new front-door `getChar` method can be used in less important 
places like `AbstractStringBuilder::getChar`.

In trusted loops over `getChar` like `String::encodeASCII`, the loop containing 
`getChar` can be prefaced by a range check which is batched for all the loop 
iterations, something like `StringUTF16.getCharChecks(val, 0, len)`.  The same 
pattern occurs in `String::encode8859_1` and `encodeUTF8_UTF16` and 
`computeSizeUTF8_UTF16` and maybe elsewhere.  The `val` reference and limit 
variable `len` or `sl` should be marked `final` to ensure that the batched 
range check remains correct (because it should not take loop-variant inputs).

As I read through `String.java` I see that a batched range check would cover a 
lot of use cases…  I haven't read though all the uses of `getChar`, however.

The intrinsic `encodeISOArray0` (was `implEncodeISOArray`) calls `getChar`.  
This is an example where its front door method (now `encodeISOArray` with no 
"0") should call a batched check method like `getCharBatchChecks`.  Let's look 
at this in detail.  The `getCharBatchChecks` method could look like this:


//non-public
void getCharBatchChecks(byte[] val, int charStart, int charSize) {
  Objects.requireNonNull(val, "val");  // *** what style guide mandates this 
line??
  Preconditions.checkFromIndexSize(charStart << 1, charSize << 1, val.length, 
Preconditions.AIOOBE_FORMATTER);
  // *** using "char" in the names helps reduce confusion from the mix of byte 
and char indexes
}
…
static int encodeISOArray(…) {
    …
    StringUTF16.getCharBatchChecks(sa, sp, len);  // next method loops over 
getChar(sa, sp++)
    return encodeISOArray0(sa, sp, da, dp, len);
}


Note that after inlining, the batch checks exactly match pre-existing checks 
for the caller intrinsic.  Perhaps the caller's checks could be removed 
manually, or perhaps the JIT removes the duplication.

Actually, I think you got this documentation wrong:


@param sp the index of the <em>byte (not character!)</em> from the source array 
to start reading from


AFAICT, `sp` is a char index; note that `getChar` scales it as `(index=sp)<<1`.

Note that `getChar` has zero javadoc, so you are left to guess helplessly about 
its index operand.

This stuff is complicated to get right.  The above exercise in wiring up the 
checking logic tends to uncover bugs and misconceptions, I think.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25998#issuecomment-3091231708

Re: RFR: 8361842: Move input validation checks to Java for java.lang.StringCoding intrinsics [v8]

Reply via email to