On 07/06/2009 06:03 PM, John Rose wrote:
> For small integers serialized into the class format, I suggest using
> Pack's UNSIGNED5 format, which scales cleanly from 1 to 5 bytes, is
> monotonic and continuous throughout the 32-bit range, and has
> efficient (bit-twiddling) encoders and decoders.
A simple, efficient, and easily-implementable idea is to
just replace all the u2 counts and indexes in classfile
format by unsigned5. We can also replace u4 counts and sizes,
to reduce the typical size of class files.
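
To make the size argument concrete, here is a minimal Java sketch of an unsigned5 codec. It assumes the Pack200 parameters as I understand them (at most 5 bytes, H = 64, so bytes below 192 end the value). The class and method names are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of any existing library.
// Assumes Pack200-style UNSIGNED5: up to 5 bytes, H = 64, L = 256 - H = 192.
class Unsigned5 {
    private static final int MAX_BYTES = 5;
    private static final int H = 64;
    private static final int L = 256 - H; // bytes below L terminate the value

    // Encodes an unsigned 32-bit value (passed as a long) into 1 to 5 bytes.
    static byte[] encode(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("not an unsigned 32-bit value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(MAX_BYTES);
        long z = value;
        for (int i = 0; ; i++) {
            if (z < L || i == MAX_BYTES - 1) {
                out.write((int) z);        // final byte: low digit, or the fifth byte
                break;
            }
            z -= L;
            out.write((int) (L + z % H));  // continuation byte in 192..255
            z /= H;
        }
        return out.toByteArray();
    }

    // Decodes one value starting at offset: the sum of byte[i] * 64^i.
    // A production decoder would also bounds-check the buffer.
    static long decode(byte[] buf, int offset) {
        long sum = 0;
        for (int i = 0; i < MAX_BYTES; i++) {
            int b = buf[offset + i] & 0xFF;
            sum += (long) b << (6 * i);
            if (b < L) {
                break;
            }
        }
        return sum;
    }
}
```

Under these assumptions, any count or index below 192 takes one byte instead of two (u2) or four (u4), and values up to 12479 take two bytes.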
In terms of specification, I'd (mostly) use two new types:
v2 - either u2 or unsigned5, depending on version number/flags
v4 - either u4 or unsigned5, depending on version number/flags
Thus for example the Code attribute:
Code_attribute {
    v2 attribute_name_index;
    v4 attribute_length;
    v2 max_stack;
    v2 max_locals;
    v4 code_length;
    u1 code[code_length];
    v2 exception_table_length;
    {   v2 start_pc;
        v2 end_pc;
        v2 handler_pc;
        v2 catch_type;
    } exception_table[exception_table_length];
    v2 attributes_count;
    attribute_info attributes[attributes_count];
}
This approach has the advantage that it's simple to
modify an existing classfile reader or producer;
the class files are compact; and it's easy to
read/write "legacy" class files depending on a switch
or a version number.
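
As a rough illustration of how small the reader change could be, here is a Java sketch in which v2 and v4 reads dispatch on a single flag. The class name, the `compact` flag, and the use of ByteBuffer are all my assumptions for the example. How the flag would be derived from the version number is left open, and the unsigned5 decoding assumes the Pack200 parameters (at most 5 bytes, H = 64).

```java
import java.nio.ByteBuffer;

// Hypothetical reader sketch; names are illustrative only.
class VarClassReader {
    private final ByteBuffer in;   // big-endian by default, like class files
    private final boolean compact; // true: v2/v4 are unsigned5; false: legacy u2/u4

    VarClassReader(ByteBuffer in, boolean compact) {
        this.in = in;
        this.compact = compact;
    }

    // v2: either u2 or unsigned5, depending on the flag.
    int readV2() {
        return compact ? (int) readUnsigned5() : (in.getShort() & 0xFFFF);
    }

    // v4: either u4 or unsigned5, depending on the flag.
    long readV4() {
        return compact ? readUnsigned5() : (in.getInt() & 0xFFFFFFFFL);
    }

    // Assumed Pack200-style UNSIGNED5: a byte below 192 ends the value.
    // A production reader would also check that the result fits in 32 bits.
    private long readUnsigned5() {
        long sum = 0;
        for (int i = 0; i < 5; i++) {
            int b = in.get() & 0xFF;
            sum += (long) b << (6 * i);
            if (b < 192) {
                break;
            }
        }
        return sum;
    }
}
```

The rest of the parser would call readV2() and readV4() wherever it reads u2 and u4 counts today, so legacy and compact files share one code path.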
The actual instructions could also be changed to take
an unsigned5 where appropriate. For example invokeXxx
would be followed by an unsigned5 instead of (indexbyte1,
indexbyte2). The if<cond> instructions would be followed
by branch5, rather than (branchbyte1,branchbyte2), etc.
Basically, this uses the Pack200 encoding for the counts and
offsets, while maintaining the structure of class files.
Alternatively, we could use 'wide wide' or a 'wide4'
instruction. That has the advantage that we can generate
code for the legacy format until we find out we need the
large model, rather than having to know up-front. I don't
know how useful that is - presumably one would specify a
-target flag if they want legacy format.
The classic encoding of switch statements is for direct
lookup in direct bytecode interpreters. Assuming there
are few or no interpreters that don't do at least *some*
processing of the instruction stream, we could encode
the switches more efficiently.
--
--Per Bothner
[email protected] http://per.bothner.com/