re *Data Structure Issues*

My comments were based not on what you could do in principle, but on what
you can do on the CLR/JVM now.

I thought the main reasons for the array being located in the string were
GC related.
- Twice as many GC objects to trace during mark. Right now strings are not
checked by the mark phase, as it knows they are immutable and hold no
references. If the string were just a holder for an array object, that
would be more difficult. (It could be done, but not on the CLR.) Increasing
the number of mark checks can be a significant cost in a system that is
trying to reduce GC pauses.
- It hijacks the size field of the string object.
- The GC knows nothing about fixed arrays; they can't exist outside of an
object. Each of the huge number of small strings like "\n" now becomes 50
bytes + alignment (2 object headers (32), reference to array (8), 1 char
(2), array SyncRoot ref (8)) instead of 18 (object header (16) + 1 char
(2)). Now in theory you can build a framework + GC that keeps these arrays
in a separate area without a header etc., but I'd like to see it built
before allowing for it. GCs are complex and already full of compromises, so
something else may not work.*
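
To make the byte counts in the last bullet easy to check, here is a rough
sketch of the arithmetic. Java is used only for illustration; the class,
constant and method names are made up for this example, the sizes are the
64-bit figures quoted above, and alignment padding is ignored.

```java
// Back-of-envelope sizes for a short string, using the byte counts quoted
// in this mail. These are assumptions taken from the text, not measurements.
final class StringLayoutCost {
    static final int OBJECT_HEADER = 16; // per-object header
    static final int REFERENCE = 8;      // one managed reference
    static final int CHAR = 2;           // one UTF-16 code unit

    // Current layout: the characters live inside the string object.
    static int inlineBytes(int chars) {
        return OBJECT_HEADER + chars * CHAR;
    }

    // Alternative layout: the string is a holder that points at an array.
    static int holderPlusArrayBytes(int chars) {
        int headers = 2 * OBJECT_HEADER; // string object + array object
        int arrayRef = REFERENCE;        // string -> array
        int syncRootRef = REFERENCE;     // array SyncRoot ref, as counted above
        return headers + arrayRef + syncRootRef + chars * CHAR;
    }

    public static void main(String[] args) {
        System.out.println("inline:         " + inlineBytes(1));
        System.out.println("holder + array: " + holderPlusArrayBytes(1));
    }
}
```

For a one-character string this gives 18 bytes against 50, so the fixed
overhead dominates exactly in the case that is most common.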

Re cache - String cache performance is dwarfed by the cost of creating new
strings, which is very common, so you would need some form of slices first.
I would have liked a slice to be just a reference with some high bits used
for the length, but that is not possible on the CLR, so you have the cost
of 2 objects (the slice struct and the underlying string). For heavy string
work, though, the underlying string becomes less important: because slices
reuse the existing string data, far fewer strings are created, and
performance will be much better with immutable slices. (GC mark will be
heavier, but most of these slices will never escape the nursery.)
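
A minimal sketch of what I mean by an immutable slice, written in Java for
illustration. StrSlice and its methods are hypothetical names, not an
existing API. On the JVM the slice is itself a small heap object rather
than a struct, so this only shows the sharing of character data, not the
exact allocation cost.

```java
// Hypothetical immutable slice: a view onto the characters of an existing
// string. Creating or narrowing a slice never copies character data.
final class StrSlice implements CharSequence {
    private final String base; // shared underlying string
    private final int start;   // offset of the first char in base
    private final int len;     // number of chars in the view

    StrSlice(String base, int start, int len) {
        if (start < 0 || len < 0 || start > base.length() - len) {
            throw new IndexOutOfBoundsException();
        }
        this.base = base;
        this.start = start;
        this.len = len;
    }

    static StrSlice of(String s) {
        return new StrSlice(s, 0, s.length());
    }

    @Override public int length() {
        return len;
    }

    @Override public char charAt(int i) {
        if (i < 0 || i >= len) throw new IndexOutOfBoundsException();
        return base.charAt(start + i);
    }

    // Narrowing allocates only the small slice object.
    @Override public StrSlice subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) throw new IndexOutOfBoundsException();
        return new StrSlice(base, start + from, to - from);
    }

    // A real string is materialised only when explicitly asked for.
    @Override public String toString() {
        return base.substring(start, start + len);
    }

    boolean sharesStorageWith(StrSlice other) {
        return base == other.base;
    }
}
```

Splitting "key=value" into a key slice and a value slice this way creates
two small objects and no new character arrays; both still point at the
original string.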

I'm not sure large strings are that critical. I just don't see much C# code
that works on long strings where the long string is stored for a long time.
The long strings I do see are often byte[] (often even in C the XML
processor is a COM object), e.g. a web page or XML document, which is then
parsed (as bytes!) once to create many short strings (e.g. XML or DOM node
content), or packed to byte[] to send to other machines. In nearly all
cases the byte[] is UTF-8, and every one of these small strings is
converted from UTF-8 after getting it from a native parser.

Now if we had UTF-8 strings and slices you could do lots of tricks. For
example, in a using block you could read the native string from the web
server or driver directly, with no conversion, and during parsing create
just what you need, e.g. DOM nodes. The developer then needs to decide: for
short-lived work, build the DOM nodes from slices; for long-lived work,
copy the native array (e.g. with Marshal.Copy or Buffer.MemoryCopy) to move
it into the GC heap. You have the option either way. This is especially
important for unpacking messages in a higher-level network stack. Right now
WCF deals with a byte[] that comes off the wire and creates messages for
the user by running a JSON/XML deserializer that copies just that message,
which reduces the GC work. If you don't do this, GC pauses become a big
problem, as I found when I stored a large number of messages from WCF in
queues for 3000 handheld clients.
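
The slice-or-copy choice can be sketched like this, again in Java and again
with made-up names (Utf8Slice, splitLines, detach). The wire buffer is an
ordinary byte[] here; a native buffer would need the platform's own interop
calls, which this sketch does not attempt.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical view onto a range of UTF-8 bytes inside a larger buffer.
final class Utf8Slice {
    private final byte[] buf;
    private final int start;
    private final int len;

    Utf8Slice(byte[] buf, int start, int len) {
        if (start < 0 || len < 0 || start > buf.length - len) {
            throw new IndexOutOfBoundsException();
        }
        this.buf = buf;
        this.start = start;
        this.len = len;
    }

    int length() {
        return len;
    }

    boolean sharesBuffer(byte[] other) {
        return buf == other;
    }

    // Long-lived use: copy just this range so the big buffer can be released.
    Utf8Slice detach() {
        return new Utf8Slice(Arrays.copyOfRange(buf, start, start + len), 0, len);
    }

    // Convert to a UTF-16 string only when one is really needed.
    String decode() {
        return new String(buf, start, len, StandardCharsets.UTF_8);
    }

    // Short-lived use: split on '\n' without copying or decoding. This is
    // safe for UTF-8 because multi-byte sequences never contain byte 0x0A.
    static List<Utf8Slice> splitLines(byte[] wire) {
        List<Utf8Slice> out = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < wire.length; i++) {
            if (wire[i] == '\n') {
                out.add(new Utf8Slice(wire, from, i - from));
                from = i + 1;
            }
        }
        if (from < wire.length) {
            out.add(new Utf8Slice(wire, from, wire.length - from));
        }
        return out;
    }
}
```

A message that is processed and dropped stays a slice of the wire buffer. A
message that goes into a queue gets detach() called on it first, so the
queue holds a small private copy instead of pinning the whole buffer.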

Ben


* Thinking further, I could see this being allowed in the language, e.g.
you GCAlloc a large block (just like you will do for regions) and then
manage it with unsafe code in the runtime (i.e. not by the GC, and hence
with no header). If the holder object gets disposed, so does the array it
holds. However, consider the huge amount of string code like
abc = str1 + str2 + ... + strN. At the moment the string goes in the
nursery and incurs little memory cost (a cmpxchg on the nursery pointer,
then a call to a method that creates the string header and copies or loads
the body data) and no dispose cost. With the holder design these strings
would be creating subarrays in a heap and then removing them, adding
significantly to the cost.
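
For anyone not familiar with why nursery allocation is so cheap, here is a
toy model in Java. It is only a sketch: the Nursery class is invented for
this mail, a byte[] stands in for the nursery memory, and object headers
and alignment are left out. The point is that allocation is one
compare-and-swap on a pointer, and that nothing is freed per object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy bump-pointer nursery. Not how any real runtime lays out memory.
final class Nursery {
    private final byte[] space;
    private final AtomicInteger top = new AtomicInteger(0);

    Nursery(int bytes) {
        space = new byte[bytes];
    }

    // Returns the offset of the new block, or -1 when the nursery is full
    // (a real runtime would run a minor collection at that point).
    int allocate(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException();
        while (true) {
            int old = top.get();
            if (bytes > space.length - old) return -1;
            // The compare-and-swap is the whole allocation.
            if (top.compareAndSet(old, old + bytes)) return old;
        }
    }

    // Concatenation: one allocation plus the copies, and no free afterwards.
    int concat(byte[] a, byte[] b) {
        int at = allocate(a.length + b.length);
        if (at < 0) return -1;
        System.arraycopy(a, 0, space, at, a.length);
        System.arraycopy(b, 0, space, at + a.length, b.length);
        return at;
    }

    // Dead temporaries cost nothing to release: the pointer is just reset.
    void reset() {
        top.set(0);
    }

    int used() {
        return top.get();
    }

    byte byteAt(int offset) {
        return space[offset];
    }
}
```

A heap that hands out and takes back individual subarrays has to do free
list or similar bookkeeping on both ends of every temporary string, which
is the extra cost I am worried about.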



_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
