Re: [Hessian-interest] RFC: bytecode remapping

Scott Ferguson Fri, 15 Aug 2008 09:22:29 -0700

On Aug 14, 2008, at 12:58 PM, Fredrik Olsson wrote:

> Keep the range free for Hessian 2.1, or whatever. No need to occupy it
> just because it is there. I see a greater gain in having a nice free
> range in case a real need should arise.


After thinking about it, we should use it now.  It's unlikely anyone  
will ever implement a Hessian 2.1, because they'd be worried about  
compatibility issues.  Look at how long the move from IPv4 to IPv6  
has taken, for example.  The Hessian 1.0 to Hessian 2.0 upgrade is a huge  
improvement, so getting people to upgrade will be easy.  An  
incremental change from Hessian 2.0 to Hessian 2.1 would see greater  
resistance.

Besides, we know that the object model is good for Hessian, since it  
hasn't changed at all from Hessian 1.0 even with expanded use and new  
communications models like streaming/HMTP.  So the underlying type  
system is good, we're just looking at compact encodings.

I freed up 0x20 bytecodes even after adding medium string and medium  
binary sections (lengths 0-1023, taking only 4 bytecodes each).  There  
are also 6 unused individual bytecodes outside of any range, one of  
which is reserved for any future escape capability.
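
As a rough illustration of why 4 bytecodes are enough for lengths 0-1023: the bytecode can carry the top two bits of the length and one following octet the low eight bits, giving 4 x 256 = 1024 lengths. The Python sketch below assumes a hypothetical base bytecode of 0x30 (`MEDIUM_STRING_BASE`); this message doesn't say which codes the medium sections actually use, and the function names are made up for illustration.

```python
# Sketch: a "medium" length header that spends 4 bytecodes to cover
# lengths 0-1023 (4 codes x 256 values of the next octet = 1024 lengths).
# MEDIUM_STRING_BASE is a hypothetical base bytecode chosen for illustration.

MEDIUM_STRING_BASE = 0x30  # hypothetical; the message does not name the codes
MEDIUM_CODES = 4
MEDIUM_MAX = MEDIUM_CODES * 256 - 1  # 1023


def encode_medium_length(length: int, base: int = MEDIUM_STRING_BASE) -> bytes:
    """Encode a length as one bytecode plus one octet."""
    if not 0 <= length <= MEDIUM_MAX:
        raise ValueError(f"length {length} does not fit in 0-{MEDIUM_MAX}")
    # High 2 bits pick one of the 4 bytecodes; low 8 bits go in the next octet.
    return bytes([base + (length >> 8), length & 0xFF])


def decode_medium_length(data: bytes, base: int = MEDIUM_STRING_BASE) -> int:
    """Recover the length from a two-byte medium header."""
    code, low = data[0], data[1]
    if not base <= code < base + MEDIUM_CODES:
        raise ValueError(f"0x{code:02x} is not a medium-length bytecode")
    return ((code - base) << 8) | low


if __name__ == "__main__":
    for n in (0, 255, 256, 1023):
        header = encode_medium_length(n)
        print(n, header.hex(), decode_medium_length(header))
```

Compared with a bytecode followed by a 16-bit length, the medium form saves one byte for every value whose length falls in its range.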

So I'm proposing using that range as follows:

1. 0x10 - first 16 object instance types.  Many protocols use a small  
number of object types, so frequently we could fit every object  
instance into this group.

2. 0x8 - short, fixed-length, typed lists, e.g. an array like Foo[] or a  
specialized type like PriorityQueue.

3. 0x8 - short, fixed-length, untyped lists, e.g. ArrayList<Foo>.
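
One way to picture the three groups is as a single 0x20-code block that a decoder splits by offset, with the class ref or list length carried in the bytecode itself. This Python sketch assumes a hypothetical starting bytecode of 0x60 (`FREED_BASE`), the group order listed above, and that each list code encodes a length of 0-7; it only decodes the header byte, not the field values or list elements that would follow.

```python
# Sketch: classify a bytecode from the proposed 0x20-code block.
# FREED_BASE is a hypothetical starting bytecode; the message only gives sizes.
from typing import Tuple

FREED_BASE = 0x60  # hypothetical placement of the freed 0x20 codes

OBJECT_CODES = 0x10       # object instance, class ref 0-15 in the bytecode
TYPED_LIST_CODES = 0x8    # fixed-length typed list, assumed length 0-7
UNTYPED_LIST_CODES = 0x8  # fixed-length untyped list, assumed length 0-7


def classify(code: int, base: int = FREED_BASE) -> Tuple[str, int]:
    """Return (kind, embedded value) for a bytecode in the freed block."""
    offset = code - base
    if 0 <= offset < OBJECT_CODES:
        return ("object", offset)  # embedded value is the class-definition ref
    offset -= OBJECT_CODES
    if 0 <= offset < TYPED_LIST_CODES:
        return ("typed-list", offset)  # embedded value is the list length
    offset -= TYPED_LIST_CODES
    if 0 <= offset < UNTYPED_LIST_CODES:
        return ("untyped-list", offset)  # embedded value is the list length
    raise ValueError(f"0x{code:02x} is outside the proposed block")


if __name__ == "__main__":
    total = OBJECT_CODES + TYPED_LIST_CODES + UNTYPED_LIST_CODES
    print(hex(total))  # 0x20, the size of the freed range
    for code in (0x60, 0x6F, 0x70, 0x77, 0x78, 0x7F):
        print(hex(code), classify(code))
```

The three group sizes add up to exactly the 0x20 codes freed above (0x10 + 0x8 + 0x8).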

-- Scott

>
>
> Do not forget a code for compact 32-bit dates :)
>
> //Fredrik
>
> Sent from my iPhone
>
> On 14 Aug 2008, at 21:44, Scott Ferguson <[EMAIL PROTECTED]> wrote:
>
>> This one I'm not sure of.
>>
>> The lower case bytecodes currently used in Hessian 2.0 could be
>> remapped to upper case.  This would create a contiguous block from
>> 0x60 to 0xff for ranged encodings in addition to the low range from
>> 0x00-0x3f.
>>
>> The disadvantage is the remapping would break up some nice mnemonic
>> codes, most importantly, b/B and s/S.  The mnemonics are nice as
>> mental anchors in understanding Hessian, so I'm not sure I want to
>> give those up.  They could be remapped as A/B and R/S on the theory
>> that 'A' comes before 'B' ('A' being the non-final binary chunk) and
>> 'R' comes before 'S' ('R' being the non-final unicode chunk).
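
To make the "'A' before 'B'" mnemonic concrete, here is a minimal Python sketch of chunked binary under the proposed remapping. It assumes each chunk is the tag byte, a 16-bit big-endian length and the data, the same layout as the b/B chunks in Hessian 2.0; the function names are hypothetical.

```python
# Sketch of reading chunked binary under the proposed A/B remapping:
# 'A' tags a non-final chunk and 'B' tags the final chunk.
# Assumption: each chunk is tag byte + 16-bit big-endian length + data,
# mirroring the b/B chunks of Hessian 2.0.
import io
import struct
from typing import BinaryIO

NON_FINAL = ord("A")  # proposed replacement for 'b'
FINAL = ord("B")


def write_chunked_binary(data: bytes, chunk_size: int = 0x8000) -> bytes:
    """Emit 'A' chunks for all but the last piece, then one 'B' chunk."""
    if not 0 < chunk_size <= 0xFFFF:
        raise ValueError("chunk_size must fit in a 16-bit length")
    out = bytearray()
    while len(data) > chunk_size:
        out += bytes([NON_FINAL]) + struct.pack(">H", chunk_size) + data[:chunk_size]
        data = data[chunk_size:]
    out += bytes([FINAL]) + struct.pack(">H", len(data)) + data
    return bytes(out)


def read_chunked_binary(stream: BinaryIO) -> bytes:
    """Concatenate chunks until the final 'B' chunk has been read."""
    out = bytearray()
    while True:
        header = stream.read(3)
        if len(header) != 3:
            raise EOFError("stream ended inside a chunk header")
        tag = header[0]
        if tag not in (NON_FINAL, FINAL):
            raise ValueError(f"unexpected tag 0x{tag:02x}")
        (length,) = struct.unpack(">H", header[1:])
        chunk = stream.read(length)
        if len(chunk) != length:
            raise EOFError("stream ended inside chunk data")
        out += chunk
        if tag == FINAL:
            return bytes(out)


if __name__ == "__main__":
    blob = write_chunked_binary(b"hello world", chunk_size=4)
    print(blob)  # two 'A' chunks followed by one 'B' chunk
    print(read_chunked_binary(io.BytesIO(blob)))
```

String chunks would follow the same shape with 'R' and 'S'.
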
>>
>> In order to make this change, there should be some significant
>> benefit.
>>
>> The possible extensions:
>>
>> 1. extend short strings to 48 bytes (probably overkill, since most
>> strings fit into 32 bytes or are larger than 48, and saving 2 bytes is
>> just an 8% savings on the string itself, or only 1 byte if we added a
>> single-byte S.  So the total advantage is almost nothing.)
>>
>> 1a. We could add a two-octet string similar to the two-octet ints,
>> e.g. 0x70 - 0x7f could introduce strings between 0 and 4095, saving a
>> byte for almost all strings (even 0x70 - 0x77 would cover 0 to 2047).
>>
>> 2. extend short binary to 32.  Again, probably overkill.  Most binary
>> data is very large with a few exceptions that are small, so the
>> current 0-16 seems right.
>>
>> 2a. Like 1a, we could add two-octet binary, e.g. 0x60 - 0x6f could
>> introduce binary between 0 and 4095.
>>
>> 3. short-type-ref objects, i.e. if the object class ref is 0-15, encode
>> it in a single bytecode and save a byte.  Possible since protocols don't
>> tend to have too many object types, but I'm not sure how much that
>> would save.
>>
>> 4. short fixed-length arrays, i.e. encode the array length in the
>> first byte.   But I doubt the number of arrays in a typical
>> serialization is hugely significant.  Adding separate codes to skip
>> the null-type would probably be a bigger win (v x90 and V x90 having
>> their own single-byte codes).
>>
>> 5. extend one-octet int to -16 to 63.  This seems way overkill.  I
>> can't imagine the range 48-63 is really that critical.
>>
>> 6. extend two-octet int to -0x1000 to 0xffff.  Possible, but again
>> not sure how often 0x7fff to 0xffff shows up.
>>
>> 7. extend one-octet long to -0x08 to 0x20.  Possible, although longs
>> either aren't used all that often or are things like sequence numbers
>> or dates that are fairly big.
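
Each of these options is paid for out of the 256 bytecodes: a code that carries the whole value spends one bytecode per value, while a code followed by one extra octet covers 256 values per bytecode. A small hypothetical Python helper makes that budget explicit for a few of the options, including the cost of growing the one-octet int from its current -16..47 range:

```python
# Sketch: the bytecode cost of a ranged encoding.
# A bytecode followed by k extra octets covers 256**k values, so a range
# of N values needs ceil(N / 256**k) bytecodes out of the 256 available.
def codes_needed(lo: int, hi: int, extra_octets: int) -> int:
    """Bytecodes needed to encode every value in lo..hi (inclusive)."""
    values = hi - lo + 1
    per_code = 256 ** extra_octets
    return -(-values // per_code)  # integer ceiling division


# A few of the options above, as (lo, hi, extra octets after the bytecode).
OPTIONS = {
    "1a: two-octet string, lengths 0-4095": (0, 4095, 1),
    "1a: two-octet string, lengths 0-2047": (0, 2047, 1),
    "2a: two-octet binary, lengths 0-4095": (0, 4095, 1),
    "3: short-type-ref objects, refs 0-15": (0, 15, 0),
    "5: one-octet int, -16 to 63": (-16, 63, 0),
}

if __name__ == "__main__":
    for name, (lo, hi, extra) in OPTIONS.items():
        print(f"{name}: {codes_needed(lo, hi, extra)} bytecodes")
    # Growing the one-octet int from -16..47 (64 codes) to -16..63 costs:
    print(codes_needed(-16, 63, 0) - codes_needed(-16, 47, 0))  # 16
```

The two-octet forms are cheap per value covered, while single-octet ranges such as #5 spend one bytecode for every value they add.
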
>>
>> The only ones that sound plausible to me are #1a, #2a, and #3 and I'm
>> not sure if they're significant enough.
>>
>> thoughts?
>>
>> -- Scott
>>
>>
>>
>> _______________________________________________
>> hessian-interest mailing list
>> [email protected]
>> http://maillist.caucho.com/mailman/listinfo/hessian-interest
>>
>