Re: What's in a CONSTANT_Class?

2017-06-08 Thread Dan Smith
Some initial notes below attempting to flesh out what our two long-term options 
look like.

> On Jun 7, 2017, at 1:53 PM, John Rose  wrote:

> Comparing these options in detail makes me comfortable with
> declaring that a CONSTANT_Class is *mainly* a file reference,
> and *also* an L-mode type.

Let me highlight this as the source of all these problems. Trying to make a 
single constant pool entry represent two different things is painful. It leads 
to confusion about the model, tortured language explaining basic things like 
what gets "returned" from resolution, attempts to explain away cases that don't 
follow the rules, bugs, etc.

That said, we must live with the legacy of years ago and make the best of it. 
Looking at the two viable strategies:

> 1. Wrap a new CP node (a "mode node") around the file-oriented C_Class node - 
> Q[Class["Foo"]]

Here's the syntax I would use, more or less:

CONSTANT_Class_info {
   u1 tag; // 7
   u2 name_index; // Utf8
}

CONSTANT_PrimitiveType_info {
   u1 tag; // 19
   u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
                 // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
}

CONSTANT_ClassType_info {
   u1 tag; // 20
   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
   u2 class_index; // Class
}

CONSTANT_ArrayType_info {
   u1 tag; // 21
   u2 component_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

CONSTANT_SpeciesType_info {
   u1 tag; // 22
   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
   u2 class_index; // Class
   u2 enclosing_index; // ClassType or SpeciesType
   u2 typearg_count;
   u2 typeargs[typearg_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

CONSTANT_MethodDescriptor_info {
   u1 tag; // 23
   u2 parameter_count;
   u2 parameter_descriptors[parameter_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
   u2 return_descriptor; // PrimitiveType, ClassType, ArrayType, SpeciesType, or 0 (void)
}

CONSTANT_FieldDescriptor_info { // is this wrapper useful?
   u1 tag; // 24
   u2 type_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

(I thought about a CONSTANT_Type_info union rather than all these flavors of 
type constants, but it's not great because 1) constant pool entries already 
form a tagged union, so we don't need another union layer, and 2) 
CONSTANT_Class_info can also be used to represent types—once you've got 2 
flavors, might as well have 5+.)
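
To make the shape of option (1) concrete, here is a minimal Java sketch (assuming Java 17 or later) that models a few of the constants above as a tagged union and prints them in the bracket notation used in this thread. The class and method names (TypeConstants, ClassRef, render) are hypothetical, the "Array[...]" rendering is invented for illustration, and SpeciesType is left out to keep it short. It is not a proposed API.

```java
// Hypothetical in-memory model of the option (1) constants; not a real API.
public class TypeConstants {
    // The "flavors of type constant" form one tagged union.
    public sealed interface TypeConstant permits PrimitiveType, ClassType, ArrayType {}

    // CONSTANT_Class: the file-oriented node, just a name.
    public record ClassRef(String name) {}

    // CONSTANT_PrimitiveType: typeCode is a descriptor character such as 'I'.
    public record PrimitiveType(char typeCode) implements TypeConstant {}

    // CONSTANT_ClassType: a mode node wrapped around the Class node.
    public record ClassType(char mode, ClassRef cls) implements TypeConstant {
        public ClassType {
            if (mode != 'L' && mode != 'Q') {
                throw new IllegalArgumentException("mode must be 'L' or 'Q'");
            }
        }
    }

    // CONSTANT_ArrayType: points at its component type.
    public record ArrayType(TypeConstant component) implements TypeConstant {}

    // Render in the thread's bracket notation, e.g. Q[Class["Foo"]].
    public static String render(TypeConstant t) {
        if (t instanceof PrimitiveType p) return String.valueOf(p.typeCode());
        if (t instanceof ClassType c) return c.mode() + "[Class[\"" + c.cls().name() + "\"]]";
        if (t instanceof ArrayType a) return "Array[" + render(a.component()) + "]";
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        TypeConstant qFoo = new ClassType('Q', new ClassRef("Foo"));
        System.out.println(render(qFoo));                 // Q[Class["Foo"]]
        System.out.println(render(new ArrayType(qFoo)));  // Array[Q[Class["Foo"]]]
    }
}
```

The point of the sketch is only that the mode lives in a separate node, so the same ClassRef can be shared by an L-mode and a Q-mode ClassType.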


> 2. Insert a new CP node inside the type-oriented C_Class node - 
> Class[Q["Foo"]] or Class[Q[File["Foo"]]]

Possible syntax for this:

CONSTANT_Class_info {
   u1 tag; // 7
   u2 name_index; // Utf8, PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, SpeciesDescriptor
}

CONSTANT_PrimitiveDescriptor_info {
   u1 tag; // 19
   u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
                 // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
}

CONSTANT_ClassDescriptor_info {
   u1 tag; // 20
   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
   u2 class_index; // ClassFile
}

CONSTANT_ClassFile_info {
   u1 tag; // 25
   u2 class_index; // Utf8
}

CONSTANT_ArrayDescriptor_info {
   u1 tag; // 21
   u2 component_index; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}

CONSTANT_SpeciesDescriptor_info {
   u1 tag; // 22
   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
   u2 class_index; // ClassFile
   u2 enclosing_index; // ClassDescriptor or SpeciesDescriptor
   u2 typearg_count;
   u2 typeargs[typearg_count]; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}

CONSTANT_MethodDescriptor_info {
   u1 tag; // 23
   u2 parameter_count;
   u2 parameter_descriptors[parameter_count]; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
   u2 return_descriptor; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, SpeciesDescriptor, or 0 (void)
}

CONSTANT_FieldDescriptor_info { // is this wrapper useful?
   u1 tag; // 24
   u2 type_index; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}



Here's an overview of spec changes, assuming one of the sets of syntactic 
changes above. As I look at this, both approaches seem mostly fine. Option (1) 
has messier rules for resolution, because it has to deal with the duality of 
CONSTANT_Class. Option (2) has messier treatment of this_class, in exchange for 
eliminating the duality of CONSTANT_Class.

The rules about where types can appear can be additive (new constants allowed 
in certain places) or negative (certain kinds of CONSTANT_Class disallowed in 
certain places), but either way, you've *mostly* got to touch all of the same 
places.


Syntax

Need to describe where certain kinds of types or class references can appear. 
In option (1), some of this can be enforced to some extent by limiting the 
types of constants allowed in certain places. But, generally, both option (1) 
and option (2)

Re: What's in a CONSTANT_Class?

2017-06-08 Thread John Rose
On Jun 8, 2017, at 5:00 PM, Dan Smith  wrote:
> 
> Some initial notes below attempting to flesh out what our two long-term 
> options look like.

I like both of your sketches.

I think we should also try this variation:  CONSTANT_Class is legacy only.
That way folks won't encounter CONSTANT_Class as a False Friend,
as they encounter it in new CP structures.

It is *neither* mainly a file nor mainly a type, but only a legacy abbreviation.
For loading a class file we have CONSTANT_ClassFile and for naming
a class type we have CONSTANT_ClassType (and the other types of option 1).

The legacy meaning of CONSTANT_Class is retained, but the preferred
translation of "String.class" is ldc[CONSTANT_ClassType['L',CFS]] where
CFS is CONSTANT_ClassFile[Utf8["java/lang/String"]].

An object string field is getfield[CONSTANT_Fieldref[ClassType['L',...],
NameAndType["myStr", CONSTANT_ClassType['L',CFS]]]].

The stringy type descriptors are tucked away inside C_NameAndType.
I guess it's sufficient to say that the second component of C_NAT
preferentially points at a C_XType (for X in Primitive, Class, Array,
Species) but may also point at legacy Utf8.  The Utf8 semantics
could be defined by expansion to hypothetical CP entries (a point
you already made).

— John

Re: What's in a CONSTANT_Class?

2017-06-08 Thread John Rose
(more comments)

On Jun 8, 2017, at 5:00 PM, Dan Smith  wrote:
> 
> 
> CONSTANT_Class_info {
>   u1 tag; // 7
>   u2 name_index; // Utf8
> }

If we decide to sideline the previous guy as a False Friend,
then this is the place where resolution really happens:

CONSTANT_ClassFile_info {
   u1 tag; // 25
   u2 name_index; // Utf8
}

> CONSTANT_PrimitiveType_info {
>   u1 tag; // 19
>   u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
> // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
> }

Alternative encoding:  Assign a compact range of tags 32..39,
one per primitive.  Another alternative:  Hardwire the top 8 CP
indexes (starting at 2^16-9).  But these alternatives just remove
a minor eyesore from class files; instead of lots of UTF8 encodings
there will be a little dance at the beginning of every CP that
recalls to mind the perennial favorites 'int', 'boolean', etc.
For the CP type system, one type for primitives is better,
I guess.

I slightly prefer the smaller code points, because they are
easier to decode with a short array.  But a perfect hash code
would be a clever alternative for either encoding.

If we use Utf8 strings for types (in non-legacy CP structure)
then the actual ASCII code points would be more appealing.
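
A minimal sketch of the "short array" decoding mentioned above, assuming a reader that accepts either encoding of type_code from Dan's comment (the ASCII descriptor character, or the small code 4..11). The class name PrimitiveCodes and the method decode are hypothetical.

```java
// Hypothetical decoder for the type_code byte of a CONSTANT_PrimitiveType_info.
public class PrimitiveCodes {
    // Index = small code point (4..11); value = descriptor character.
    // Slots 0..3 are unused.
    private static final char[] BY_SMALL_CODE = {
        0, 0, 0, 0, 'Z', 'C', 'F', 'D', 'B', 'S', 'I', 'J'
    };

    /** Accepts either encoding and returns the descriptor character. */
    public static char decode(int typeCode) {
        if (typeCode >= 4 && typeCode <= 11) {
            return BY_SMALL_CODE[typeCode];      // small code: one array load
        }
        switch (typeCode) {                      // ASCII code: already the answer
            case 'Z': case 'C': case 'B': case 'S':
            case 'I': case 'J': case 'F': case 'D':
                return (char) typeCode;
            default:
                throw new IllegalArgumentException("not a primitive type code: " + typeCode);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(10));   // I
        System.out.println(decode('I'));  // I
    }
}
```

With the small code points the table has 12 slots; indexing directly by ASCII value would need a table reaching up to 'Z' (90), which is the trade-off the paragraph above alludes to.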

> 
> CONSTANT_ClassType_info {
>   u1 tag; // 20
>   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
>   u2 class_index; // Class
s/Class/ClassFile/
> }
> 
> CONSTANT_ArrayType_info {
>   u1 tag; // 21
>   u2 component_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }
> 
> CONSTANT_SpeciesType_info {
>   u1 tag; // 22
>   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
>   u2 class_index; // Class
>   u2 enclosing_index; // ClassType or SpeciesType
>   u2 typearg_count;
>   u2 typeargs[typearg_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }
s/Class/ClassFile/
…which raises the question of whether the species is type-like or file-like.
The mode_code also raises this question.
Why must a mode also be assigned when a template is expanded?
When a class file is loaded, a mode is not assigned.
Perhaps both class files and species are "pre-types", things with
names and typed members, but which are not yet themselves types.

> CONSTANT_MethodDescriptor_info {
>   u1 tag; // 23
>   u2 parameter_count;
>   u2 parameter_descriptors[parameter_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
>   u2 return_descriptor; // PrimitiveType, ClassType, ArrayType, SpeciesType, or 0 (void)
> }

The void quasi-type should be lumped into PrimitiveType, for the sake
of ldc (void.class).

> CONSTANT_FieldDescriptor_info { // is this wrapper useful?
>   u1 tag; // 24
>   u2 type_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }

I don't think this wrapper is useful.  Instead we have the lopsided
distinction between the star in FieldRef[,NameAndType[,*]] and the star
in MethodRef[,NameAndType[,*]].  In the case of FieldRef, it is any
of the types (but not PT-void), and in the case of MethodRef, it is
a MethodDescriptor.

MethodDescriptor is an extra tricky nut to crack here, I think, because
it has an unlimited arity.  That makes logical sense, but major JVMs
(IBM, ours) have baked in an assumption that CP entries are fixed
in size except for Utf8 strings.  In JSR 292 we pushed the BSM
specifiers into a side table for this reason.  We could put method
descriptor lists into a similar side table.  I don't have a good suggestion
here.  For method types the flat Utf8 strings are seductive, at least until
you have 100 repetitions of the substring "Ljava/lang/Object;".

If we break the arity limit of 2, then we should also consider merging
NameAndType into FieldRef and MethodRef, at which point the
genericity of NameAndType becomes moot.  The three components
of a FieldRef would be (holder:ClassType,name:Utf8,type:XType)
and the components of a MethodRef could be (holder:ClassType,
name:Utf8,descr:MethodDescriptor).  At that point the MethodDescriptor
could be unfolded into the MethodRef, right?  Then the only
high-arity node would be MethodRef.  (Except for C_MethodType.
But that could be made a legacy guy also, since he is built on
top of flat strings, and condy can materialize him easily enough.)

> (I thought about a CONSTANT_Type_info union rather than all these flavors of 
> type constants, but it's not great because 1) constant pool entries already 
> form a tagged union, so we don't need another union layer, and 2) 
> CONSTANT_Class_info can also be used to represent types—once you've got 2 
> flavors, might as well have 5+.)

Yep.  And you could push that a little farther by giving each
PrimitiveType its own tag.  The PTs are the odd thing here.
There are no constants except them that have a payload
of less than a byte.  Just as constants seem to have a
maximum size (arity 2) they also seem to have a minimum
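size (32 bits or so).  Note that very small integer constants
(which would correspond to PT sub-tags) are *not* usually
stored in the CP; they are loaded with short instructions
like "bipush", not "ldc".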

Re: What's in a CONSTANT_Class?

2017-06-09 Thread John Rose
Whatever tag numbering scheme we do will have to move
up by two, since CONSTANT_Module_info.tag = 19 and 
CONSTANT_Package_info.tag = 20.

I think we can take our next node at tag 21, for CONSTANT_Q
or CONSTANT_Value or whatever we were calling it on Wednesday.

I particularly like your ClassType_info; I'd like to use that for MVT,
with T_VALUETYPE (14) as the only mode_code that is valid,
at first.  Later T_OBJECT (12) for symmetry, and T_UNION (3? 16?).

(Much much later T_INT or another primitive, associated with a Class,
is a very interesting thing to contemplate.)

— John

On Jun 8, 2017, at 5:00 PM, Dan Smith  wrote:
> 
> CONSTANT_PrimitiveType_info {
>   u1 tag; // 19
>   u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
> // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
> }
> 
> CONSTANT_ClassType_info {
>   u1 tag; // 20
>   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
>   u2 class_index; // Class
> }



Re: What's in a CONSTANT_Class?

2017-06-09 Thread Dan Smith
> On Jun 8, 2017, at 8:44 PM, John Rose  wrote:
> 
> The void quasi-type should be lumped into PrimitiveType, for the sake
> of ldc (void.class).

I see the appeal, though it also, unfortunately, expands the set of "primitive 
types" and means we have to restrict that set at the use sites:

CONSTANT_ArrayType_info {
   u1 tag; // 21
   u2 component_index; // PrimitiveType **but not void**, ClassType, ArrayType, or SpeciesType
}

CONSTANT_SpeciesType_info {
   u1 tag; // 22
   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
   u2 class_index; // Class
   u2 enclosing_index; // ClassType or SpeciesType
   u2 typearg_count;
   u2 typeargs[typearg_count]; // PrimitiveType **but not void**, ClassType, ArrayType, or SpeciesType
}

CONSTANT_MethodDescriptor_info {
   u1 tag; // 23
   u2 parameter_count;
   u2 parameter_descriptors[parameter_count]; // PrimitiveType **but not void**, ClassType, ArrayType, or SpeciesType
   u2 return_descriptor; // PrimitiveType, ClassType, ArrayType, SpeciesType, or 0 (void)
}

CONSTANT_FieldDescriptor_info { // is this wrapper useful?
   u1 tag; // 24
   u2 type_index; // PrimitiveType **but not void**, ClassType, ArrayType, or SpeciesType
}

- Any non-void type (CONSTANT_Class, CONSTANT_ArrayType, CONSTANT_ClassType, 
CONSTANT_SpeciesType, or CONSTANT_PrimitiveType **that isn't void**):
    anewarray
    verification_type_info.Object_variable_info

- Any type or void (CONSTANT_Class, CONSTANT_ArrayType, CONSTANT_ClassType, 
CONSTANT_SpeciesType, or CONSTANT_PrimitiveType):
    ldc
    BootstrapMethods.bootstrap_arguments

I prefer the discipline of making 'void' a separate entity (CONSTANT_Void?) 
that we don't necessarily call a "type", although not sure that carries its 
weight.
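
A minimal sketch of what "restrict that set at the use sites" amounts to if void is lumped into PrimitiveType: every site that takes a primitive has to know whether void is acceptable there. The enum, the set, and the use of 'V' as the type code for void are assumptions made for illustration; the site list follows the structures and bullets above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical checker: which use sites may point at a void PrimitiveType?
public class VoidUseSites {
    public enum UseSite {
        ARRAY_COMPONENT, SPECIES_TYPE_ARGUMENT, METHOD_PARAMETER, FIELD_TYPE,
        ANEWARRAY, OBJECT_VARIABLE_INFO,            // "any non-void type"
        METHOD_RETURN, LDC, BOOTSTRAP_ARGUMENT      // "any type or void"
    }

    private static final Set<UseSite> VOID_OK =
        EnumSet.of(UseSite.METHOD_RETURN, UseSite.LDC, UseSite.BOOTSTRAP_ARGUMENT);

    /** 'V' as the code for void is an assumption, borrowed from method descriptors. */
    public static boolean accepts(UseSite site, char primitiveCode) {
        return primitiveCode != 'V' || VOID_OK.contains(site);
    }

    public static void main(String[] args) {
        System.out.println(accepts(UseSite.LDC, 'V'));              // true
        System.out.println(accepts(UseSite.ARRAY_COMPONENT, 'V'));  // false
        System.out.println(accepts(UseSite.ARRAY_COMPONENT, 'I'));  // true
    }
}
```

With a separate CONSTANT_Void the same restriction would instead be expressed by which constant kinds each site lists, with no per-site check on the primitive's code.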

> Note that very small integer constants
> (which would correspond to PT sub-tags) are *not* usually
> stored in the CP; they are loaded with short instructions
> like "bipush", not "ldc".

Yes, and that works fine for instructions (see also newarray). The new 
requirement here is for another constant pool entry to need to talk about one 
of these very small things, and in a polymorphic way (e.g., the component type 
of an array may be a primitive or some other type).

—Dan

Re: What's in a CONSTANT_Class?

2017-06-09 Thread John Rose
On Jun 9, 2017, at 8:43 AM, Dan Smith  wrote:
> 
> I prefer the discipline of making 'void' a separate entity (CONSTANT_Void?) 
> that we don't necessarily call a "type", although not sure that carries its 
> weight.

I think on balance the JLS would be cleaner if we admitted void is a type, with 
some funny restrictions.

(IIRC Alex tilts this way too.)

Allowing  in return position to assume  will be attractive with new 
generics.




Re: What's in a CONSTANT_Class?

2017-06-09 Thread John Rose
On Jun 9, 2017, at 1:45 PM, John Rose  wrote:
> 
> Allowing  in return position to assume  will be attractive with new 
> generics.

Allowing Map to assume V=void with no layout footprint derives 
Set.
This is a trick Rust plays successfully.  I've always wanted to pull that trick.



Re: What's in a CONSTANT_Class?

2017-06-10 Thread Remi Forax
You're not alone :)

I want the result of an async procedure call to be a CompletableFuture 
too.

Rémi 

On June 9, 2017 10:53:14 PM GMT+02:00, John Rose  wrote:
>On Jun 9, 2017, at 1:45 PM, John Rose  wrote:
>> 
>> Allowing  in return position to assume  will be attractive
>with new generics.
>
>Allowing Map to assume V=void with no layout footprint
>derives Set.
>This is a trick Rust plays successfully.  I've always wanted to pull
>that trick.


Re: What's in a CONSTANT_Class?

2017-06-12 Thread Karen Kinnear
Dan,

I am really glad we are exploring the longer term picture of how to handle the 
constant pool.

(note: not to be confused with the Minimal Value Types exercise)

I would like to add a couple of constraints/questions/concerns please:

1) No change in meaning of any existing constant pool entries
   Dan Heidinga correctly pointed out the challenge: while we may have a
   classfile version on the classfile as originally generated, tools will
   be injecting bytecodes assuming the meaning of existing constant pool
   entries, and will be adding constant pool entries before they have any
   knowledge of classfile version changes.

2) Impact on APIs
   I need a better understanding of how a user is going to represent the
   difference between a QFoo and an LFoo in source, and of whether we are
   going to change or augment APIs that currently take a class name if we
   want them to support value types rather than always requiring boxing.
   Today we have a name/loader unique runtime type guarantee, and the
   loader can be determined from context.

3) Impact on tools
   We need feedback from tool developers.
   Maurizio has mentioned concerns relative to ASM.
   Please look at the JNI and JVMTI type signatures: they expose, for
   instance, the JVMS BasicTypes, so changes here break tools.

http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/types.html#wp16432

4) Prototype support
   For value types and for specialization, let’s make sure that any
   proposal could have a prototype implementation that allows generation
   of separate classfiles, so two UTF8s. (E.g., what I was proposing was
   that any derived type would have both its own name and a link to the
   “root” type from which it derived. I think that could apply to species
   as well as to value types, but you have probably thought this through
   more than I have.)

5) Adding Type information rather than Class information has a ripple
   effect on the JVM implementation. We need to study in more detail how
   this changes other classfile structures such as StackMapTable etc.

6) Descriptor ambiguity
   We need to make sure that we design descriptors after we have figured
   out what a UType is. Descriptor matching (nominal or structural) works
   with exact matches. If you introduce a polymorphism that allows for
   multiple potential correct matches, you have to work out resolution,
   overriding, and selection rules in great detail (and pay the
   performance cost).

thanks,
Karen



Re: What's in a CONSTANT_Class?

2017-06-14 Thread Karen Kinnear
Update from hotspot implementation:

We would like to request that for the MVT Early Access we keep the TEMPORARY
CONSTANT_Class_info “;Q”.

This is far easier for us to implement (we have a prototype in progress), and
we believe that it will be easier for bytecode generators to adopt, which will
allow us to get more people trying MVT so we get more feedback.

We would also like to keep the explicit separate name for the derived value
class, so that from an implementation standpoint we are able to continue to
use the (name, class loader) pair as a unique lookup.
So the JVMS as proposed explicitly calls out in 5.3 Creation and Loading that
the derived value class has the name ClassName$Value.

For Early Access we would like to keep this naming convention, stable across
reboots, so people can generate bytecodes that reference value types by name
distinctly from their value capable class.

thanks,
Karen

p.s. This will allow us time to do the longer-term exploration of where the
class/type/constant pool forms should evolve.

Re: What's in a CONSTANT_Class?

2017-06-14 Thread Remi Forax
Hi Karen,
With my ASM Hat,
both CONSTANT_Class_info “;Q” and a CONSTANT_ValueType_info that references 
a UTF8 are OK for me.

Weirdly, having a CONSTANT_Value_info that references a CONSTANT_Class_info is 
a little harder to implement because the implementation of ASM is sensitive to 
the number of levels of indirection (it's hardcoded to be 4; a constant method 
handle has 4 levels).

In the longer term, I think that the spec of CONSTANT_Class should be changed to 
accept a class descriptor and not a class name (which it is not anyway, BTW, 
because arrays are accepted in order to encode a method call to an array's clone()).
It will allow more sharing and, unlike a class name, a class descriptor is an 
extensible format.

From the VM point of view, it's easy to know if a CONSTANT_Class is a 
descriptor or not: if it's a descriptor, the last character is a ';'.
I also think that the bytecode version corresponding to 10 should require that 
all CONSTANT_Class entries are encoded as class descriptors.
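
A minimal sketch of the last-character check described above. The class and method names are hypothetical. One caveat is mine rather than from the mail: array forms such as "[I" are already accepted in CONSTANT_Class today and do not end in ';', so a real check would probably also look at a leading '['.

```java
// Hypothetical classifier for the Utf8 string behind a CONSTANT_Class.
public class ClassConstantForm {
    /** The rule as stated: a descriptor ends with ';'. */
    public static boolean lastCharRule(String utf8) {
        return utf8.endsWith(";");
    }

    /** Assumption added here: array forms start with '[' whatever they end with. */
    public static boolean isArrayForm(String utf8) {
        return utf8.startsWith("[");
    }

    public static void main(String[] args) {
        System.out.println(lastCharRule("java/lang/String"));     // false: plain class name
        System.out.println(lastCharRule("Ljava/lang/String;"));   // true: class descriptor
        System.out.println(lastCharRule("[I"));                   // false, yet an array descriptor
        System.out.println(isArrayForm("[I"));                    // true
    }
}
```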

regards,
Rémi


Re: What's in a CONSTANT_Class?

2017-06-14 Thread John Rose
On Jun 14, 2017, at 9:22 AM, Remi Forax  wrote:
> 
> With my ASM Hat,
> both CONSTANT_Class_info “;Q” and CONSTANT_ValueType_info that 
> references an UTF8 are Ok for me.

Between those two I prefer the first since it doesn't require a new CP tag.

> Weirdly, having a CONSTANT_Value_info that references a CONSTANT_Class_info is 
> a little harder to implement because the implementation of ASM is sensitive to 
> the number of levels of indirection (it's hardcoded to be 4; a constant 
> method handle has 4 levels).

Interesting fact.  Won't that have to change with condy?
That allows bootstrap specifications to be recursive.

> In the longer term, I think that the spec of CONSTANT_Class should be changed to 
> accept a class descriptor and not a class name (which it is not anyway, BTW, 
> because arrays are accepted in order to encode a method call to an array's clone()).
> It will allow more sharing and, unlike a class name, a class descriptor is an 
> extensible format.

[Flat strings won't take us there]

Remi, flat strings don't go far enough.  They are moderately
extensible, and certainly accommodate new ground types like
QFoo; and UFoo;, but there are two big problems.  First, they
suffer from combinatorial explosion (*less* sharing in flat strings)
and second they incompletely support expression-holes which
are required when we get to generics.

We live with the combinatorial problems of method type descriptors,
but I think that's a place we want to retreat from. (Look at the encoding
of (Object,Object,Object)Object:  The flatness requires repetition of
the whole qualified name four times, just in this one descriptor.)

When we go to parameterized types, ground types will have multiple
levels of nesting, which turns the problem from quadratic to
exponential.  At that point it's more than today's irritant.

You can patch this with repeat operators, but the natural format
is a tree, which represents all subparts uniformly, rather than some
as a defining use, and others as repeated uses.
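
To put numbers on the (Object,Object,Object)Object example, here is a small sketch using java.lang.invoke.MethodType to produce the flat descriptor and count the repeated substring. The treeEntryBytes figure assumes the CONSTANT_MethodDescriptor_info layout Dan sketched (u1 tag, u2 count, one u2 per parameter, u2 return) and excludes the shared ClassType, Class, and Utf8 entries, which are stored once however often they are referenced. The class and helper names are hypothetical.

```java
import java.lang.invoke.MethodType;

// Illustrative size comparison: flat descriptor string vs. a tree-shaped entry.
public class FlatDescriptorCost {
    /** Flat Utf8 descriptor for (Object,Object,Object)Object. */
    public static String flat() {
        return MethodType
            .methodType(Object.class, Object.class, Object.class, Object.class)
            .toMethodDescriptorString();
    }

    /** Number of non-overlapping occurrences of part in text. */
    public static int countOccurrences(String text, String part) {
        int count = 0;
        for (int i = text.indexOf(part); i >= 0; i = text.indexOf(part, i + part.length())) {
            count++;
        }
        return count;
    }

    /** Bytes for a hypothetical tree entry: u1 tag + u2 count + u2 per parameter + u2 return. */
    public static int treeEntryBytes(int parameterCount) {
        return 1 + 2 + 2 * parameterCount + 2;
    }

    public static void main(String[] args) {
        String d = flat();
        System.out.println(d);
        System.out.println(d.length());                                // 74
        System.out.println(countOccurrences(d, "Ljava/lang/Object;")); // 4
        System.out.println(treeEntryBytes(3));                         // 11
    }
}
```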

[String-tagged shallow trees]

For non-ground generic types, a type string could be something
like a format string.  (The format "hello, %s" has a string-typed hole.)
In that case, the string doesn't give you everything you need;
it must be joined by a vector of operands.  At that point you've
invented trees, and then the real question is whether tree nodes
should be tagged by format strings (an infinite number of them)
or by a handful of simple CP-style tags.

I handled both these issues in Pack200 with the CONSTANT_Signature
CP type (present only in Pack200 archives), whose content is a format
string (with N>=0 holes) plus an implicitly counted vector of (N) CP refs
of type CONSTANT_Class.  (Primitives are inlined.)  For technical
reasons the hole syntax, if any, must be different from either string
format notations and Pack200 with future JVMs; I think it should be
a simple period '.'.  (For discussion of signature meta-characters see
my "Symbolic Freedom" manifesto ca. 2008.)

For values+generics we'll probably want to look at an experimental design
like this that uses string-tagged tree nodes.  They are very compact (hence
their use in Pack200).

[Byte-tagged deep trees]

But I think for ease of tooling we will end up with the other option,
which is *more* tree nodes tagged by a very small finite set of
CP-style tags.  This is why I support designs like the ones
Dan has been sketching.

In that style of tree, a format string like "hello, %s" breaks down into
nested AST (Append[Literal["hello, "],Param[]]).  Instead of parsing
the string to find holes, the holes are directly represented, along
with every other part, in a strongly-typed AST tree.
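
A minimal Java sketch (assuming Java 17 or later) of the "hello, %s" example as a strongly typed tree. The node names Append, Literal, and Param follow the notation above; the fill method and the class name FormatTree are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical AST for a format with holes: no string parsing is needed to find them.
public class FormatTree {
    public sealed interface Node permits Literal, Param, Append {}
    public record Literal(String text) implements Node {}
    public record Param() implements Node {}                        // a hole
    public record Append(Node left, Node right) implements Node {}

    /** Walks the tree left to right, taking one argument per Param. */
    public static String fill(Node node, Iterator<String> args) {
        if (node instanceof Literal l) return l.text();
        if (node instanceof Param) return args.next();
        if (node instanceof Append a) {
            String left = fill(a.left(), args);   // left first, so holes fill in order
            return left + fill(a.right(), args);
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        Node hello = new Append(new Literal("hello, "), new Param());
        System.out.println(fill(hello, List.of("world").iterator()));  // hello, world
    }
}
```

Every node kind here carries one of a small, fixed set of tags, which is the property the paragraph above contrasts with an unbounded set of format strings.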

An advantage of Dan-style trees is they are more strongly normalizing.
With the format-based trees you always have small types sliding inline
into the format strings, or out as explicit nodes (for uses like ldc).
The programmer's educated instincts prefer one way to say one
thing, rather than many ways to say the same thing.  Stronger
normalization leads to better compactness and fewer bugs.

[Constant inlining?]

Dan-style trees *could* be made much more compact, comparable
to format strings, by extending the CP to support inlining of constant
expressions into other expressions.  This weakens the strong normalization
of constants, but at a lower level where it can be hidden; constants
presented via tools like ASM can be normalized easily, with a single
clever rule ("unwind the inlining by making temporary CP nodes").
ASM does stuff like this in reverse already, by interning ("normalizing")
constants.

We probably need something like this anyway, for the future
CONSTANT_Group syntax, which doesn't pay for itself if it has to
burn its way through the limited (u2) index space of the CP; so it
needs some form of inlining, for constants that occur only inside
the group and don't need global sharing.

> From the VM point of view, it's easy to know if a CONSTANT_Class is a 
> descriptor or not: if it's a descriptor, the last character is a ';'.

Re: What's in a CONSTANT_Class?

2017-06-14 Thread John Rose
On Jun 14, 2017, at 8:54 AM, Karen Kinnear  wrote:
> 
> We would like to request that for the MVT Early Access we keep the TEMPORARY 
> CONSTANT_Class_info “;Q”.

Nit: For uniformity, the syntax wants to be ";" + field_signature,
which implies ";Q;".  Without that uniformity you need
to specify a third syntax (neither field nor method signature),
which is not good spec. economy, even for a temporary feature.



Re: What's in a CONSTANT_Class?

2017-06-15 Thread forax
> From: "John Rose" 
> To: "Rémi Forax" 
> Cc: "Karen Kinnear" ,
> valhalla-spec-experts@openjdk.java.net
> Sent: Wednesday, June 14, 2017 23:55:23
> Subject: Re: What's in a CONSTANT_Class?

> On Jun 14, 2017, at 9:22 AM, Remi Forax wrote:

>> With my ASM Hat,
>> both CONSTANT_Class_info “;Q” and CONSTANT_ValueType_info that 
>> references
>> an UTF8 are Ok for me.

> Between those two I prefer the first since it doesn't require a new CP tag.

>> Weirdly, having a CONSTANT_Value_info that references a CONSTANT_Class_info is
>> a little harder to implement because the implementation of ASM is sensitive to
>> the number of levels of indirection (it's hardcoded to be 4; a constant
>> method handle has 4 levels).

> Interesting fact. Won't that have to change with condy?
> That allows bootstrap specifications to be recursive.

I have not implemented condy yet, but I believe it's not an issue if the way to 
look up a value is lazy: 
the user code will be recursive, not the ASM internals. 

>> In the longer term, I think that the spec of CONSTANT_Class should be changed to
>> accept a class descriptor and not a class name (which it is not anyway, BTW,
>> because arrays are accepted in order to encode a method call to an array's clone()).
>> It will allow more sharing and, unlike a class name, a class descriptor is an
>> extensible format.

> [Flat strings won't take us there]

> Remi, flat strings don't go far enough. They are moderately
> extensible, and certainly accommodate new ground types like
> QFoo; and UFoo;, but there are two big problems. First, they
> suffer from combinatorial explosion (*less* sharing in flat strings)
> and second they incompletely support expression-holes which
> are required when we get to generics.

(BTW, neither you nor Karen answered my mail asking why we need UFoo;) 

I agree that we may need a tree of constants, but only if the interpreter needs 
that in order to interpret the code. 
In my opinion, we should only use a tree of constants if it makes sense for the 
interpreter; otherwise, the constant should be flattened as a String. 
For example, an interpreter of Java 9 does not need to extract the types from a 
method descriptor in order to run (the verifier does, but not the interpreter), 
so for me a method descriptor does not need to be a tree of constants, at least 
for Java 9. 

> We live with the combinatorial problems of method type descriptors,
> but I think that's a place we want to retreat from. (Look at the encoding
> of (Object,Object,Object)Object: The flatness requires repetition of
> the whole qualified name four times, just in this one descriptor.)

> When we go to parameterized types, ground types will have multiple
> levels of nesting, which turns the problem from quadratic to
> exponential. At that point it's more than today's irritant.

> You can patch this with repeat operators, but the natural format
> is a tree, which represents all subparts uniformly, rather than some
> as a defining use, and others as repeated uses.
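
The repetition argument above can be made concrete with a small sketch. It compares the flat descriptor for (Object,Object,Object)Object with a tree-shaped form whose parameters and return type are indexes into a shared pool of class names. The names DescriptorSharing, TreeDescriptor, and flatten are hypothetical, invented for illustration; this is not a proposed class-file layout.

```java
import java.util.Arrays;
import java.util.List;

class DescriptorSharing {
    // Count non-overlapping occurrences of needle in haystack.
    static int occurrences(String haystack, String needle) {
        int count = 0;
        int from = haystack.indexOf(needle);
        while (from >= 0) {
            count++;
            from = haystack.indexOf(needle, from + needle.length());
        }
        return count;
    }

    // Hypothetical tree node: every type is an index into a shared pool,
    // so a class name is stored once however often it is used.
    static final class TreeDescriptor {
        final int[] parameterIndexes;
        final int returnIndex;

        TreeDescriptor(int[] parameterIndexes, int returnIndex) {
            this.parameterIndexes = parameterIndexes;
            this.returnIndex = returnIndex;
        }
    }

    // Rebuild the flat descriptor string from the tree form (class types only).
    static String flatten(List<String> pool, TreeDescriptor d) {
        StringBuilder sb = new StringBuilder("(");
        for (int index : d.parameterIndexes) {
            sb.append('L').append(pool.get(index)).append(';');
        }
        sb.append(')').append('L').append(pool.get(d.returnIndex)).append(';');
        return sb.toString();
    }

    public static void main(String[] args) {
        String flat =
            "(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;";
        // The flat form spells out the qualified name once per use.
        System.out.println(occurrences(flat, "java/lang/Object")); // prints 4

        // The tree form stores the name once and refers to it by index.
        List<String> pool = Arrays.asList("java/lang/Object");
        TreeDescriptor tree = new TreeDescriptor(new int[] {0, 0, 0}, 0);
        System.out.println(flatten(pool, tree).equals(flat)); // prints true
    }
}
```

The flat string carries the name four times; the tree carries it once plus four small indexes, which is the sharing the tree format is meant to recover.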

I fully agree. Specializing the code should not require patching constants in 
the constant pool. 
The patchable content should be represented by an index inside a tree, and the 
interpreter should maintain an array (in fact two arrays, because you have 
method parameter types and class parameter types) of the corresponding type 
arguments. 
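
One possible reading of the two-array idea, as a minimal sketch: a tree node is either a ground type or a hole that names a slot in the class-level or method-level type-argument array, and resolution looks the slot up instead of patching the pool. All names here (TypeHoles, Node, resolve) are hypothetical and only illustrate the shape of the idea.

```java
class TypeHoles {
    // A node is either a fixed (ground) descriptor or a hole that indexes
    // into one of the two type-argument arrays.
    static final class Node {
        final String fixed;        // non-null for a ground type
        final boolean methodLevel; // true: hole refers to method type arguments
        final int index;           // slot in the chosen array (unused if fixed)

        private Node(String fixed, boolean methodLevel, int index) {
            this.fixed = fixed;
            this.methodLevel = methodLevel;
            this.index = index;
        }

        static Node ground(String descriptor) { return new Node(descriptor, false, -1); }
        static Node classHole(int index)      { return new Node(null, false, index); }
        static Node methodHole(int index)     { return new Node(null, true, index); }
    }

    // Resolve a node against the current specialization without touching
    // the node itself: the shared constant stays immutable.
    static String resolve(Node node, String[] classArgs, String[] methodArgs) {
        if (node.fixed != null) {
            return node.fixed;
        }
        return node.methodLevel ? methodArgs[node.index] : classArgs[node.index];
    }

    public static void main(String[] args) {
        String[] classArgs = {"I"};                   // assumed class type argument
        String[] methodArgs = {"Ljava/lang/String;"}; // assumed method type argument
        System.out.println(resolve(Node.classHole(0), classArgs, methodArgs));
        System.out.println(resolve(Node.methodHole(0), classArgs, methodArgs));
        System.out.println(resolve(Node.ground("J"), classArgs, methodArgs));
    }
}
```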

> [String-tagged shallow trees]

> For non-ground generic types, a type string could be something
> like a format string. (The format "hello, %s" has a string-typed hole.)
> In that case, the string doesn't give you everything you need;
> it must be joined by a vector of operands. At that point you've
> invented trees, and then the real question is whether tree nodes
> should be tagged by format strings (an infinite number of them)
> or by a handful of simple CP-style tags.

> I handled both these issues in Pack200 with the CONSTANT_Signature
> CP type (present only in Pack200 archives), whose content is a format
> string (with N>=0 holes) plus an implicitly counted vector of (N) CP refs
> of type CONSTANT_Class. (Primitives are inlined.) For technical
> reasons the hole syntax, if any, must be different from either string
> format notations and Pack200 with future JVMs; I think it should be
> a simple period '.'. (For discussion of signature meta-characters see
> my "Symbolic Freedom" manifesto ca. 2008.)
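
To illustrate the "format string plus operand vector" shape described above, here is a minimal sketch that uses '.' as the hole character, as suggested in the text. It is an illustration only and does not reproduce Pack200's actual encoding; the class SignatureForm and its methods are hypothetical. Primitives stay inline in the form, and each hole consumes the next class-name operand.

```java
import java.util.Arrays;
import java.util.List;

class SignatureForm {
    // Number of holes in the form; this is the implicit operand count N.
    static int holeCount(String form) {
        int holes = 0;
        for (int i = 0; i < form.length(); i++) {
            if (form.charAt(i) == '.') {
                holes++;
            }
        }
        return holes;
    }

    // Fill each '.' hole, left to right, with the next class-name operand,
    // wrapping it as an L-descriptor. Other characters are copied through.
    static String expand(String form, List<String> classNames) {
        if (holeCount(form) != classNames.size()) {
            throw new IllegalArgumentException("operand count does not match hole count");
        }
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < form.length(); i++) {
            char c = form.charAt(i);
            if (c == '.') {
                out.append('L').append(classNames.get(next++)).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Form with three holes and one inlined primitive (I).
        String form = "(.I.).";
        List<String> operands =
            Arrays.asList("java/lang/String", "java/util/List", "java/lang/Object");
        System.out.println(expand(form, operands));
        // prints (Ljava/lang/String;ILjava/util/List;)Ljava/lang/Object;
    }
}
```

Many signatures share the same small form string, such as "(.I.)." here, which is where the compactness of the string-tagged approach comes from.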

> For values+generics we'll probably want to look at an experimental design
> like this that uses string-tagged tree nodes. They are very compact (hence
> their use in Pack200).

The problem of String tagging is that you often n

Re: What's in a CONSTANT_Class?

2017-06-15 Thread John Rose
On Jun 15, 2017, at 3:09 PM, fo...@univ-mlv.fr wrote:
> 
> 
> (BTW, neither you nor Karen did answer to my mail asking why we need UFoo; )

I'm working on a manifesto about this.  Short answer is Q-Foo and L-Foo need to
be disjoint types, so that interpreter processing of L-Foo is undisturbed and
Q-Foo can use non-heap buffering.  But, some language features require a *union*
of Q-Foo and L-Foo.  That could be Q-MaybeRef but it is so fundamental
to translation strategies that it seems to merit a new type-kind.  It can be the
final one, since it is a disjoint union containing all values of the other
type-kinds (L,Q,I,J,F,D).  To soften the blow, we can then align Q and U
type-kinds to use exactly the same representation, so that in practice verified
Q-values are carried using the same format as verified U-values (of the same
class) but with a little less exercise of the power of the carrier (refs and
nulls don't appear).
The motivating uses of U-types are any-kinded type parameters and interfaces
which are implemented by values.  Once accepted, there are many serendipitous
uses of U-types that arise, including use cases where you want "Q-Foo or null"
(that's U-Foo) or "Foo and I don't want to know if it was boxed or not" (U-Foo
again).
Finally, putting U-types in the heap, if we get that far, gives us frozen arrays
and frozen objects "for free".

Trust me, we spent many months trying to find a way to implement type-vars
and interfaces without U-types and the alternatives were all worse.  (Always
box for interface calls, but no heisenboxes?  No.  Always box any-type vars?
Also no.  Always specialize any-generic code at bytecode level?  No, no, no.
Use U-types for operands in generic algos and interface defaults?  Yes!)

> 
> I agree that we may need a tree of constants, but only if the interpreter needs 
> that in order to interpret the code.
> In my opinion, we should only use a tree of constants if it makes sense for 
> the interpreter, otherwise, the constant should be flattened as a String.

There are other reasons to avoid strings, notably better type checking.
Also footprint, if the complexity exponent goes above 2 (as I already argued).
> 
> You can patch this with repeat operators, but the natural format
> is a tree, which represents all subparts uniformly, rather than some
> as a defining use, and others as repeated uses.
> 
> I fully agree. Specializing the code should not require patching constants in 
> the constant pool.
> The patchable content should be represented by an index inside a tree and the 
> interpreter should maintain an array (in fact two arrays because you have 
> method parameter types and class parameter types) of the corresponding type 
> arguments.
> 

(Another manifesto I'm working on!)  IMO, enhancing constant pools so they can
be smoothly parameterized is Job One for VM support of extended generics.
> 
> Actually, for the proposed extension, you look at the *first* character to see
> if it is a ';'.  It's a different place (already existing) in the system
> where you check to see whether the name is of the form Foo or "LFoo;", and strip
> the decorations in the latter case.  You *could* get away with Class["QFoo;"]
> but I don't recommend it, because it's a little harder to decode for both
> human readers and parsers.
> 
> I do not understand why?

We read a prefix first so it sets a context, and then we read the rest.
We human readers are used to this.  It's like backslash superquoting.
If you don't know what you are reading until you read the end, then
you have to read it twice.

For a computer, the flag is at foo[0] rather than foo[foo.length-1], which
is a little less complex.  And putting the flag at the beginning allows the
computer to stream over the text, choosing a parser up front, and then
like the human reader it can "read the rest" with confidence.  Streaming
over small strings like this has no performance advantage, but streaming
code is easier to understand and reason about, hence less buggy.

There's also my prejudice, to be frank.  I've always been annoyed by the
obtuseness of the check sym[0]=='L' && sym[sym.length-1]==';'.  That
sort of oddity breeds bugs, compared to a left-to-right parse.
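
A minimal sketch of the contrast drawn above: the legacy test has to look at both ends of the string, while a leading marker lets a parser pick its mode from the first character alone. The enum, the class name ClassNameForm, and the exact text that follows a leading ';' are assumptions made for illustration; only the idea of a leading ';' flag comes from the discussion.

```java
class ClassNameForm {
    enum Kind { PLAIN_NAME, ARRAY_DESCRIPTOR, EXTENDED }

    // Prefix-first dispatch: one look at index 0 selects the parser.
    // A leading ';' cannot collide with a plain class name, because class
    // names in the class-file format may not contain ';'.
    static Kind classify(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty CONSTANT_Class string");
        }
        switch (s.charAt(0)) {
            case ';': return Kind.EXTENDED;
            case '[': return Kind.ARRAY_DESCRIPTOR;
            default:  return Kind.PLAIN_NAME;
        }
    }

    // The two-ended check: a class name may itself start with 'L'
    // (for example "List"), so the first character alone is not enough
    // and the last character must be inspected as well.
    static boolean isLDescriptor(String s) {
        return s.length() >= 2
            && s.charAt(0) == 'L'
            && s.charAt(s.length() - 1) == ';';
    }

    public static void main(String[] args) {
        System.out.println(classify(";QFoo"));               // EXTENDED
        System.out.println(classify("[Ljava/lang/Object;")); // ARRAY_DESCRIPTOR
        System.out.println(classify("java/lang/Object"));    // PLAIN_NAME
        System.out.println(isLDescriptor("LFoo;"));          // true
        System.out.println(isLDescriptor("List"));           // false
    }
}
```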

> 
> 
> If I understand what you are saying, that's not MVT at all, since it
> would force a revolution in tools.  So we won't do that.  It's overwhelmingly
> likely that legacy uses of CONSTANT_Class will coexist with new
> CP forms for multiple releases, even if this gives up the advantages
> of normal forms.
> 
> yes, it's post-MVT, and given there will be other changes in the constant 
> pool, tools will need to be updated, so we can also mandate that CONSTANT_Class 
> use only the descriptor format at that time. 

Those mandates are harder to pull off than they look beforehand.
(Who knew modules would be so excruciatingly hard to "mandate"?)
I think we'll have to do a gentle introduction with peaceful coexistence
for at least a couple years.

> 
> [In the crystal