On Mon, Sep 2, 2013 at 2:01 PM, Jonathan S. Shapiro <[email protected]> wrote:

> On Sun, Sep 1, 2013 at 8:23 PM, Bennie Kloosteman <[email protected]> wrote:
>
>> I'm particularly interested in variable size known only at run time.
>> Without it I don't think it's possible to have fast string, and I think
>> fast string is a huge win for middle tier servers and mobile devices.
>>
>
> I think you are obsessing over a very difficult problem that has no
> business being in the runtime layer.
>

String does exactly this: it is an object whose size is known only at run
time. Wouldn't you say it is part of the runtime layer? And while I haven't
seen the string implementation in the CLR, I have seen it in Mono and know
it pretty well: the class, its unsafe methods, the global libs which use
the unsafe pointer directly, and the newobj routines, which have custom
code for strings.



> Can I ask you to define "fast string". What are the complexities of the
> following operations:
>
> Get next character in sequence
>

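    // index is an in/out byte offset (an out param) pointing at the last
    // byte consumed; 0x10 is the escape byte
    var ch = ptr[index + 1];
    if (ch != 0x10)               // common case: one byte per character
    {
        index++;
        return (char) ch;
    }
    else                          // rarer
    {
        if (!fourCharEscape)
        {
            // the 16 bit value follows the escape byte
            unsigned short wide = *((unsigned short *)(ptr + index + 2));
            index = index + 3;
            return (char) wide;
        }
        else handle the 4 char escape (new Chinese chars etc.) // very rare
    }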
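To make the decoding step concrete, here is a minimal C sketch of it. It is
only an illustration under assumptions that are mine, not part of the
proposal: the name fs_next_char is hypothetical, the position is the offset
of the next unread byte, the 16-bit value after the 0x10 escape is stored low
byte first, and the 4-byte escape is left out because its layout is not
specified above.

```c
#include <stddef.h>
#include <stdint.h>

#define FS_ESCAPE 0x10  /* escape byte taken from the pseudocode above */

/* Decode the character that starts at byte offset *pos and advance *pos
   past it. Returns the character, or -1 at end of input or when an escape
   byte is not followed by its two payload bytes. */
static int32_t fs_next_char(const uint8_t *buf, size_t len, size_t *pos)
{
    size_t i = *pos;
    if (i >= len)
        return -1;
    if (buf[i] != FS_ESCAPE) {      /* common case: one byte, one character */
        *pos = i + 1;
        return buf[i];
    }
    if (i + 2 >= len)               /* truncated escape sequence */
        return -1;
    /* Assumed layout: low byte first. The bytes are combined by hand so
       that no unaligned 16-bit load is needed. */
    *pos = i + 3;
    return (int32_t)(buf[i + 1] | ((uint32_t)buf[i + 2] << 8));
}
```

The common path is one compare and one increment, which is the point of the
scheme: getting the next character stays O(1) whether or not it is escaped.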




> Get character at (arbitrary) index i
>

This is much rarer without a GetIndex, except for fixed-length format
strings. For that use case I wanted FormatString : FastString, which stores
the indexes of the {?} values that get replaced, since such format strings
also have high reuse. In C# you normally use a search like IndexOf, e.g.
IndexOf("{0}") or IndexOf("%s"), and then use the index. Fast string would
return a byte index, and hence it would be

return  str[index] ;
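The FormatString idea can be sketched in C as follows. This is a rough
illustration, not a design: the type format_string, both function names, the
single-digit "{n}" placeholder syntax and the cap of 8 slots are all
assumptions made for the example. The format text is scanned once, the byte
offset of each placeholder is stored, and every later expansion reuses those
offsets instead of searching again.

```c
#include <stddef.h>
#include <string.h>

#define FMT_MAX_SLOTS 8   /* arbitrary cap for this sketch */

typedef struct {
    const char *text;                    /* format text, not copied */
    size_t      len;
    size_t      slot_at[FMT_MAX_SLOTS];  /* byte offset of each "{n}" */
    size_t      slot_count;
} format_string;

/* Scan the text once for "{d}" (single digit) and record byte offsets. */
static void format_string_init(format_string *f, const char *text)
{
    f->text = text;
    f->len = strlen(text);
    f->slot_count = 0;
    for (size_t i = 0; i + 2 < f->len && f->slot_count < FMT_MAX_SLOTS; i++) {
        if (text[i] == '{' && text[i + 1] >= '0' && text[i + 1] <= '9'
            && text[i + 2] == '}')
            f->slot_at[f->slot_count++] = i;
    }
}

/* Expand into out (capacity cap); args[n] replaces "{n}". Returns the
   length written, or (size_t)-1 if out is too small or n >= nargs. */
static size_t format_string_apply(const format_string *f,
                                  const char *const *args, size_t nargs,
                                  char *out, size_t cap)
{
    size_t src = 0, dst = 0;
    for (size_t s = 0; s <= f->slot_count; s++) {
        /* Copy the literal run up to the next placeholder (or the end). */
        size_t stop = (s < f->slot_count) ? f->slot_at[s] : f->len;
        size_t lit = stop - src;
        if (dst + lit >= cap)
            return (size_t)-1;
        memcpy(out + dst, f->text + src, lit);
        dst += lit;
        if (s == f->slot_count)
            break;
        /* Substitute the argument named by the digit inside the braces. */
        size_t n = (size_t)(f->text[stop + 1] - '0');
        if (n >= nargs)
            return (size_t)-1;
        size_t alen = strlen(args[n]);
        if (dst + alen >= cap)
            return (size_t)-1;
        memcpy(out + dst, args[n], alen);
        dst += alen;
        src = stop + 3;              /* skip the three bytes of "{n}" */
    }
    out[dst] = '\0';
    return dst;
}
```

Because the offsets are byte offsets into the original text, no character
counting is involved when the format string is applied.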

Now if you say "I want the 132nd letter in a string", that is not
meaningful, and it is incorrect in some Asian languages, since they mix
multiple Unicode chars with ASCII in the encoding as discussed, i.e.
word[5] may be the 2nd character. But a naive implementation would go:

    int escape_count = 0;
    // for short strings (< 8) do as per "next character in sequence"
    // above; for long strings:
    for (int i = 0; i < length; i = i + 32)
        escape_count += SIMD_SCAN_32CHARS_FOR_ESCAPE(&str[i]);
    return str[index + escape_count];
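For comparison, here is a scalar C sketch of turning a character index into a
byte offset. It is a stand-in for the SIMD scan, not an implementation of it:
SIMD_SCAN_32CHARS_FOR_ESCAPE is only a placeholder name in the pseudocode, and
the function name fs_byte_offset and the 3-byte escape layout (0x10 plus two
payload bytes) are assumptions carried over from the decoding step earlier.

```c
#include <stddef.h>
#include <stdint.h>

#define FS_ESCAPE 0x10  /* escape byte taken from the pseudocode above */

/* Return the byte offset of the character with 0-based index char_index,
   or (size_t)-1 if the string holds fewer characters than that. */
static size_t fs_byte_offset(const uint8_t *buf, size_t len, size_t char_index)
{
    size_t pos = 0;
    for (size_t seen = 0; pos < len; seen++) {
        if (seen == char_index)
            return pos;
        /* An escaped character takes 3 bytes, a plain one takes 1. */
        pos += (buf[pos] == FS_ESCAPE) ? 3 : 1;
    }
    return (size_t)-1;
}
```

Walking the bytes in order skips the payload of each escape, so a payload
byte that happens to equal 0x10 is not counted as a second escape. A blind
count of 0x10 bytes over 32-byte blocks would have to deal with that case
separately.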


Note the performance cost with C# strings: str.Substring(
str.IndexOf(lookupString), length) requires creating a new string each
time. We discussed a mutable ptr / length slice lookup previously (even
using a 64 bit pointer with the length in the high bits), which would be
nice, but it won't work with C# string as the array is private.

So fast strings, with format strings and slices, will likely give much
higher performance than standard C# and Java strings, yet provide a nice
API (even a plug-in equivalent is possible if you don't take advantage of
slices). There is no conversion from common web data to the rarer UTF-16,
and there is 30-35% less heap, which reduces paging and improves locality /
cache behaviour. A DOM tree can be built with just a single parse of the
original UTF-8, using slices in the DOM tree nodes, instead of building
huge amounts of C# strings which are later discarded. Likewise XML parsing
etc. I just don't think C# and Java's string is efficient enough for this.
Sure, you can work with char[] or byte[] (which you are doing anyway with a
UTF-8 source), but then you pay a huge cost as you convert back and forth
to string objects, so it's only worth it for some very narrow cases.
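As a small illustration of the single-parse idea, the C sketch below makes
one pass over a markup buffer and records each element name as a slice into
that same buffer. It is deliberately simplistic and the names slice and
collect_tag_names are hypothetical: it treats any '<' not followed by '/' as
the start of an element, so comments, processing instructions and attribute
values containing '<' are not handled.

```c
#include <stddef.h>

/* A view into the original document bytes. */
typedef struct {
    const char *ptr;
    size_t      len;
} slice;

/* Record the name of every opening tag as a slice into doc.
   Returns the number of slices written (at most max). */
static size_t collect_tag_names(const char *doc, size_t len,
                                slice *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < len && count < max; i++) {
        /* Skip everything except the '<' of an opening tag. */
        if (doc[i] != '<' || i + 1 >= len || doc[i + 1] == '/')
            continue;
        size_t start = i + 1, end = start;
        while (end < len && doc[end] != '>' && doc[end] != ' '
               && doc[end] != '/')
            end++;
        out[count].ptr = doc + start;   /* points into doc, nothing copied */
        out[count].len = end - start;
        count++;
        i = end;                        /* resume after the name */
    }
    return count;
}
```

Each node name costs a pointer and a length rather than a heap-allocated
string, and the source buffer stays in its original encoding throughout.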


I'm probably obsessing, but I'm getting excited at where BitC# can go. I
can actually envisage C++, Java and C# devs looking at it and using it.
Add fastString, explicitly unboxed value types and fixed arrays, regions,
SIMD extensions and some newer syntax / const correctness, and you have a
good case for a product which will be mature and stable quickly thanks to
Mono and the CLR.

You get better benchmarks on Windows than C# and Java (fast string,
unboxed value types and fixed array ops, some more SIMD). It can even hold
up against some C++ benchmarks: if you put the C++ code in a lib, unlike in
micro benchmarks, I think it will be very competitive.
You get a much smaller heap with lower GC pauses (fast string and region
analysis), which matters for web / middle tier servers and mobile devices
(XAML uses a LOT of strings).
The newer syntax will reduce code size and improve code. And to C++ and
Java devs it won't be just a Java clone, so the stance that it will be on
the CLR at first, and that when we can we build our own runtime with a
better GC, is a good one.
You can use Mono full AOT to produce a kernel with a bit of work.

I have many more thoughts, but I'm not putting them down as I want to see
where Shap is going.

I think the case may be so good that I'm thinking about getting some bods
here in Shanghai and working on extensions for it, though that's probably
2 years away.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev