On Mon, Sep 2, 2013 at 2:01 PM, Jonathan S. Shapiro <[email protected]> wrote:
> On Sun, Sep 1, 2013 at 8:23 PM, Bennie Kloosteman <[email protected]> wrote:
>
>> Im particularly interested in variable size known only at run time .
>> Without it i dont think its possible to have fast string and i think fast
>> string is a huge win for middle tier servers and mobile devices.
>>
>
> I think you are obsessing over a very difficult problem that has no
> business being in the runtime layer.
>

String has exactly this kind of variable size, so I'd say it is part of
the runtime layer. And while I haven't seen the string implementation in
the CLR, I have seen it in Mono and know it pretty well: the class, the
unsafe methods, the global libs which use the unsafe pointer directly,
and the newobj routines which have custom code for strings.

> Can I ask you to define "fast string". What are the complexities of the
> following operations:
>
> Get next character in sequence
>

    var ch = ptr[index + 1];
    if (ch != 0x10) {                  // common case: no escape byte
        index++;                       // index is an out param
        return (char) ptr[index];
    } else {                           // rarer
        if (!is4CharEscape) {
            // read the 16-bit value that follows the escape byte
            unsigned short *short_ptr = (unsigned short *) &ptr[index + 2];
            index = index + 3;
            return (char) *short_ptr;
        } else {
            // handle 4 char escape (new Chinese chars etc.), very rare
        }
    }

> Get character at (arbitrary) index i
>

This is much rarer without a GetIndex, except for fixed-length format
strings. For this use case I wanted FormatString : FastString, which has
indexes for the {?} values that get replaced, since such format strings
also have high reuse. In C# you normally use a search like IndexOf, e.g.
IndexOf("{0}") or IndexOf("%s"), and then use the index.
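To make the FormatString idea a bit more concrete, here is a rough sketch in C of a format string that remembers the byte offsets of its {?} values, so that reuse needs no searching. Only the name FormatString comes from the discussion; the struct fields, MAX_HOLES, format_string_init and format_string_apply are made up for illustration, and the sketch assumes single-digit placeholders such as {0} and NUL-terminated arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOLES 8   /* hypothetical limit, for the sketch only */

/* Hypothetical FormatString: template bytes plus precomputed hole positions. */
typedef struct {
    const char *text;                  /* template bytes, not owned */
    size_t      len;                   /* template length in bytes */
    size_t      hole_start[MAX_HOLES]; /* byte offset of each '{' */
    size_t      hole_end[MAX_HOLES];   /* byte offset just past each '}' */
    int         hole_arg[MAX_HOLES];   /* argument number inside the braces */
    int         holes;                 /* number of placeholders found */
} FormatString;

/* Scan the template once and record where each {N} placeholder sits. */
static void format_string_init(FormatString *f, const char *text)
{
    assert(f != NULL && text != NULL);
    f->text = text;
    f->len = strlen(text);
    f->holes = 0;
    for (size_t i = 0; i + 2 < f->len && f->holes < MAX_HOLES; i++) {
        if (text[i] == '{' && text[i + 1] >= '0' && text[i + 1] <= '9'
                && text[i + 2] == '}') {
            f->hole_start[f->holes] = i;
            f->hole_end[f->holes] = i + 3;
            f->hole_arg[f->holes] = text[i + 1] - '0';
            f->holes++;
            i += 2;                    /* skip the rest of the placeholder */
        }
    }
}

/* Expand the template using the recorded offsets; no search at format time.
   Returns bytes written (excluding the NUL), or 0 if out is too small or an
   argument is missing. */
static size_t format_string_apply(const FormatString *f,
                                  const char *const *args, int nargs,
                                  char *out, size_t cap)
{
    size_t w = 0, pos = 0;
    for (int h = 0; h <= f->holes; h++) {
        /* copy the literal run before this hole (or the tail after the last) */
        size_t lit_end = (h < f->holes) ? f->hole_start[h] : f->len;
        size_t lit_len = lit_end - pos;
        if (w + lit_len >= cap) return 0;
        memcpy(out + w, f->text + pos, lit_len);
        w += lit_len;
        if (h == f->holes) break;
        /* copy the argument that fills this hole */
        if (f->hole_arg[h] >= nargs) return 0;
        size_t alen = strlen(args[f->hole_arg[h]]);
        if (w + alen >= cap) return 0;
        memcpy(out + w, args[f->hole_arg[h]], alen);
        w += alen;
        pos = f->hole_end[h];
    }
    out[w] = '\0';
    return w;
}
```

The template is scanned once in format_string_init; every later format_string_apply call is just a handful of memcpy calls between known offsets, which is where the benefit of high reuse would come from.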
Fast string would return a byte index, and hence the lookup would just be

    return str[index];

Now say you want the 132nd letter in a string. This is not meaningful,
and it is incorrect in some Asian languages, since one character may
take several encoded units as discussed, i.e. word[5] may be the 2nd
character. But a naive implementation would go:

    int escape_count = 0;
    // for short strings (< 8), do as per "next character in sequence" above
    // for long strings:
    for (int i = 0; i < length; i = i + 32)
        escape_count += SIMD_SCAN_32CHARS_FOR_ESCAPE(&str[i]);
    return ptr[index + escape_count];

Note the performance cost in C# strings:
str.Substring(str.IndexOf(lookupString), length) requires creating a new
string each time. We discussed a mutable ptr / length slice lookup
previously (even using a 64-bit pointer with the length in the high
bits), which would be nice, but it won't work with C# string as the
array is private.

So Fast Strings, with format strings and slices, will likely give much
higher performance than standard C# and Java strings yet provide a nice
API (even a plug-in equivalent is possible if you don't take advantage
of slices). There is no conversion from common web data to the rarer
UTF-16, and there is 30-35% less heap, reducing paging and improving
locality / cache behaviour. A DOM tree can be built with just a single
parse of the original UTF-8, using slices in the DOM tree nodes instead
of building huge amounts of C# strings which are later discarded.
Likewise XML parsing etc. I just don't think C#'s and Java's string is
efficient enough for this. Sure, you can work with char[] or byte[]
(which you are anyway with a UTF-8 source), but then you pay a huge cost
converting back and forth to string objects, so it's only worth it for
some very narrow cases.

I'm probably obsessing, but I'm getting excited at where BitC# can go.
I can actually envisage C++, Java and C# devs looking at it and using it.
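As an illustration of the slice idea, here is a minimal sketch in C of a ptr / length view over the original bytes, with a search that returns a byte index and a substring that allocates nothing. Everything named here (Slice, slice_from_cstr, slice_index_of, slice_sub, slice_char_offset) is hypothetical. Note also that the sketch assumes standard UTF-8 rather than the escape-byte encoding sketched earlier in the thread, and that the character scan is a plain scalar loop, not the SIMD scan mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice: a view into someone else's UTF-8 bytes; nothing is copied. */
typedef struct {
    const unsigned char *ptr;
    size_t               len;   /* length in bytes, not characters */
} Slice;

static Slice slice_from_cstr(const char *s)
{
    Slice v = { (const unsigned char *) s, strlen(s) };
    return v;
}

/* Byte index of the first occurrence of needle in hay, or -1 if absent. */
static ptrdiff_t slice_index_of(Slice hay, Slice needle)
{
    if (needle.len > hay.len) return -1;
    for (size_t i = 0; i + needle.len <= hay.len; i++)
        if (memcmp(hay.ptr + i, needle.ptr, needle.len) == 0)
            return (ptrdiff_t) i;
    return -1;
}

/* Sub-slice [start, start + len) as a view on the same buffer; no allocation. */
static Slice slice_sub(Slice s, size_t start, size_t len)
{
    assert(start <= s.len && len <= s.len - start);
    Slice v = { s.ptr + start, len };
    return v;
}

/* Byte offset of the n-th character (0-based), or s.len if there are fewer.
   UTF-8 continuation bytes (10xxxxxx) do not start a character. */
static size_t slice_char_offset(Slice s, size_t n)
{
    size_t seen = 0;
    for (size_t i = 0; i < s.len; i++) {
        if ((s.ptr[i] & 0xC0) != 0x80) {
            if (seen == n) return i;
            seen++;
        }
    }
    return s.len;
}
```

A DOM node built this way could hold a Slice for its text instead of a freshly allocated string, so a single parse of the source buffer would be enough, as long as the buffer outlives the nodes that point into it.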
Using FastString, explicitly unboxed value types and fixed arrays,
regions, SIMD extensions, and some newer syntax / const correctness, you
have a good case for a product which will be mature and stable quickly
thanks to Mono and the CLR.

You get better benchmarks on Windows than C# and Java (fast string,
unboxed value types and fixed array ops, some more SIMD). It can even
compete with some C++ benchmarks; if you put the C++ code in a lib,
unlike micro benches, I think it will be very competitive.

You get a much smaller heap with lower GC pauses (fast string and region
analysis), which matters for web / middle-tier servers and mobile
devices (XAML uses a LOT of strings).

The newer syntax will reduce code size and improve code. And to C++ and
Java devs it won't be just a Java clone, so the stance that it will be
on the CLR at first, and that we build our own runtime with a better GC
when we can, is a good one. You can use Mono full AOT to produce a
kernel with a bit of work.

I have many more thoughts, but I'm not putting them down as I want to
see where Shap is going. I think the case may be so good that I'm
thinking about getting some bods here in Shanghai working on extensions
for it, though that's probably 2 years away.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
