I read JLS 5.2 more carefully and discovered that while assignment context supports primitive narrowing of constant expressions of type int (and smaller) down to byte, short, or char:

    byte b = 0;

it does not support primitive narrowing from long to int:

    int x = 0L;  // error

My best guess at the rationale is that because there is no literal suffix for int/short/byte, int literals behave like "poly expressions", whereas long literals are just long literals.  That's an irritating asymmetry (but fixable).

Here is the relevant text from 5.2:

    In addition, if the expression is a constant expression (§15.29) of type byte, short, char, or int:
    • A narrowing primitive conversion may be used if the variable is of type byte, short, or char, and the value of the constant expression is representable in the type of the variable.
    • A narrowing primitive conversion followed by a boxing conversion may be used if the variable is of type Byte, Short, or Character, and the value of the constant expression is representable in the type byte, short, or char respectively.
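A minimal illustration of those two bullets (standard behavior today; the variable names are mine):

    byte b1 = 100;   // OK: int constant, representable in byte, so it narrows
    Byte b2 = 100;   // OK: narrowing to byte, then boxing to Byte
    char c1 = 65;    // OK: representable in char
    byte b3 = 200;   // error: 200 is not representable in byte
    Byte b4 = 200;   // error: same reason; the boxing rule doesn't rescue it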



On 3/3/2022 10:17 AM, Dan Heidinga wrote:
On Wed, Mar 2, 2022 at 3:13 PM Brian Goetz <brian.go...@oracle.com> wrote:


On 3/2/2022 1:43 PM, Dan Heidinga wrote:
Making the pattern match compatible with assignment conversions makes
sense to me and follows a similar rationale to that used with
MethodHandle::asType following the JLS 5.3 invocation conversions.
Though with MHs we had the ability to add additional conversions under
MethodHandles::explicitCastArguments. With pattern matching, we don't
have the same ability to make the "extra" behaviour opt-in / opt-out.
We just get one chance to pick the right behaviour.
Indeed.  And the thing that I am trying to avoid here is creating _yet
another_ new context in which a different bag of ad-hoc conversions is
possible.  While it might be justifiable from a local perspective to say
"it's OK if `int x` does unboxing, but having it do range checking seems
new and different, so let's not do that", from a global perspective,
that means we add a new context ("pattern match context") to
assignment, loose invocation, strict invocation, cast, and numeric
contexts.  That is the kind of incremental complexity I'd like to avoid,
if there is a unifying move we can pull.
I'm in agreement on not adding new contexts but I had the opposite
impression here.  Doesn't "having it do range checking" require a new
context, since this is different from what assignment contexts allow
today?  Or is it the case that regular, non-match assignment must be
total with nothing left over, which allows it to use the same context
despite not being able to do the dynamic range check?  As this
sentence shows, I'm confused about how dynamic range checking fits in
the existing assignment context.

Or are we suggesting that assignment allows:

byte b = new Long(5);

to succeed if we can unbox + meet the dynamic range check?  I'm
clearly confused here.

Conversions like unboxing or casting are burdened by the fact that they
have to be total, which means the "does it fit" / "if so, do it" / "if
not, do something else (truncate, throw, etc)" all have to be crammed
into a single operation.  What pattern matching does is extract the "does it
fit, and if so do it" part into a more primitive operation, from which other
operations can be composed.
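To make that concrete, here is a sketch of the difference (my code, not a proposed API): a cast must always produce something, while the "does it fit" question composes with whatever the caller wants to do next:

    Object o = 300;    // an Integer whose value does not fit in a byte

    // Total conversion: check, act, and fallback are crammed together;
    // the cast cannot fail, so it silently truncates: (byte) 300 == 44
    byte viaCast = (byte) ((Integer) o).intValue();

    // Composable form: ask "does it fit?" first, then decide what to do
    if (o instanceof Integer i && i >= Byte.MIN_VALUE && i <= Byte.MAX_VALUE) {
        byte b = (byte) (int) i;   // safe: the range check already passed
    } else {
        // the caller chooses: throw, widen, take another branch, ...
    }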
Is it accurate to say this is less about reusing assignment context and
more about completely replacing it with a new pattern context on top of
which assignment can be built?

At some level, what I'm proposing is all spec-shuffling; we'll either
say "a widening primitive conversion is allowed in assignment context",
or we'll say that primitive `P p` matches any primitive type Q that can
be widened to P.  We'll end up with a similar number of rules, but we
might be able to "shake the box" to make them settle to a lower energy
state, and be able to define (whether we explicitly do so or not)
assignment context to support "all the cases where the LHS, viewed as a
type pattern, is exhaustive on the RHS, potentially with remainder,
throwing if remainder is encountered."  (That's what unboxing does; it
throws when remainder is encountered.)
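For example, the remainder of unboxing in today's Java is null, and assignment throws when it hits it:

    Integer boxed = null;
    int i = boxed;   // NullPointerException at runtime: the remainder case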
Ok. So maybe I'm not confused.  We'd allow the `byte b = new Long(5);`
code to compile and throw not only on a failed unbox, but also on a
dynamic range check failure.

If we took this "dynamic hook" behaviour to the limit, what other new
capabilities does it unlock?  Is this the place to connect other
user-supplied conversion operations as well?  Maybe I'm running too
far with this idea but it seems like this could be laying the
groundwork for other interesting behaviours.  Am I way off in the
weeds here?

As to the range check, it has always bugged me that you see code that
looks like:

      if (i >= -128 && i <= 127) { byte b = (byte) i; ... }

because of the accidental specificity, and the attendant risk of error
(using <= instead of <, or using 127 instead of 128). Being able to say:

      if (i instanceof byte b) { ... }

is better not because it is more compact, but because you're actually
asking the right question -- "does this int value fit in a byte."  I'm
sad we don't really have a way to ask this question today; it seems an
omission.
I had been thinking about this when I wrote my response and I like
having the compiler generate the range check for me.  As you say, way
easier to avoid errors that way.

Intuitively, the behaviour you propose is kind of what we want - all
the possible byte cases end up in the byte case and we don't need to
adapt the long case to handle those that would have fit in a byte.
I'm slightly concerned that this changes Java's historical approach
and may lead to surprises when refactoring existing code that treats
unbox(Long) one way and unbox(Short) another.  Will users be confused
when an unbox(Long) whose value is in the short range ends up in a case
that was only intended for unbox(Short)?  I'm having a hard time finding an
example that would trip on this but my lack of imagination isn't
definitive =)
I'm worried about this too.  We examined it briefly, and ran away, when
we were thinking about constant patterns, specifically:

      Object o = ...
      switch (o) {
          case 0: ...
          default: ...
      }

What would this mean?  What I wouldn't want it to mean is "match Long 0,
Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the
line for "magic".  (Note that this is about defining what the _constant
pattern_ means, not the primitive type pattern.)  I think it's probably
reasonable to say this is a type error; 0 is applicable to primitive
numerics and their boxes, but not to Number or Object.  I think that is
consistent with what I'm suggesting about primitive type patterns, but
I'd have to think about it more.
Object o = ...
switch (o) {
     case (long) 0: ...      // can we say this?  Probably not
     case long l && l == 0:  // otherwise this would become the way to catch most of the constant 0 cases
     default: ...
}

I'm starting to think the constant pattern will feel less like magic
once the dynamic range checking becomes commonplace.


Something like the following shouldn't be surprising given the existing
rules around unbox + widening primitive conversion (though it may be
when first encountered as I expect most users haven't really
internalized the JLS 5.2 rules):
As Alex said to me yesterday: "JLS Ch 5 contains many more words than
any prospective reader would expect to find on the subject, but once the
reader gets over the overwhelm of how much there is to say, will find
none of the words surprising."  There's a deeper truth to this
statement: Java is not actually as simple a language as its mythology
suggests, but we win by hiding the complexity in places users generally
don't have to look, and if and when they do confront the complexity,
they find it unsurprising, and go back to ignoring it.

So in point of fact, *almost no one* has read JLS 5.2, but it still does
"what users would likely find reasonable".

Number n = ...;
switch (n) {
    case long l -> ...
    case int i -> ...    // dead code
    case byte b -> ...   // dead code
    default -> ...
}
Correct.  We have rules for pattern dominance, which are used to give
compile errors on dead cases; we'd have to work through the details to
confirm that `long l` dominates `int i`, but I'd hope this is the case.
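For comparison, here is how dominance already works for reference type patterns (my example, not from the thread); the question is whether `long l` would dominate `int i` the way `CharSequence` dominates `String`:

    Object o = "hello";
    switch (o) {
        case CharSequence cs -> System.out.println("char sequence");
        case String s -> System.out.println("string");   // error: dominated by the preceding case
        default -> System.out.println("something else");
    }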

But this may be more surprising, as I suggested above:

Number n = new Long(5);
switch (n) {
    case byte b -> ...   // matches here
    case int i -> ...
    case long l -> ...
    default -> ...
}

Overall, I like the extra dynamic range check but would be fine with
leaving it out if it complicates the spec, given that it feels like a
pretty deep-in-the-weeds corner case.
It is probably not a forced move to support the richer interpretation of
primitive patterns now.  But I think the consequence of doing so may be
surprising: rather than "simplifying the language" (as one might hope
that "leaving something out" would do), I think there's a risk that it
makes things more complicated, because (a) it effectively creates yet
another conversion context that is distinct from the too-many we have
now, and (b) creates a sharp edge where refactoring from local variable
initialization to let-bind doesn't work, because assignment would then
be looser than let-bind.
Ok. You're saying that the dynamic range check is essential enough
that it's worth a new context if we can't adjust the meaning of
assignment context.

One reason this is especially undesirable is that one of the forms of
let-bind is a let-bind *expression*:

      let P = p, Q = q
      in <expression>

which is useful for pulling out subexpressions and binding them to a
variable, but for which the scope of that variable is limited.  If
refactoring from:

Possible typo in the example.  Attempted to fix:

      int x = stuff;
      m(f(x));

to

      m(let x = stuff in f(x))
      // x no longer in scope here
Not sure I follow this example; why is introducing a new variable in
this scope useful?

was not possible because of a silly mismatch between the conversions in
let context and the conversions in assignment context, then we're
putting users in the position of having to choose between richer
conversions and richer scoping.
Ok.  I think I see where this is going and while it may be clearer
with a larger example, I agree with the principle that this
refactoring should be possible.

--Dan

(Usual warning (Remi): I'm mentioning let-expressions because it gives a
sense of where some of these constraints come from, but this is not a
suitable time to design the let-expression feature.)

