Wow!!!

I dispeared for a few days to work on this and came back to find I
wasn't alone. :-P

I was testing some similar concepts myself. I've also been testing
moving things back and forth to unamanged internal calls to see if I
could speed it up and so far I've got mixed results. Some things
faster; some things slower. Mostly slower for most of the functions,
but it really depended on the size of the string I was working with
which was the deciding factor if it was faster to be unmanged or not.

One thing I found is that NUnit itself can't be trusted 100% with
testing some of this :-P It calls corelib use these same functions we
are tweaking are slowing it down or speeding it up in some cases. It's
fine because the test suite is a test in of itself in a way. Just
something interesting I noticed when I was testing, when I managed to
slow down strings so much in one test (lovely printf on each char) that
it took NUnit about 30 seconds to initialize before my test even ran :-)

I'll post all my benchmarks and patches in the morning when I can
assembly them in some readable format.

--
Zac Bowling
http://zacbowling.com/


----- Message from [EMAIL PROTECTED] ---------
   Date: Fri, 17 Mar 2006 19:02:50 +0900
   From: Atsushi Eno <[EMAIL PROTECTED]>
Reply-To: Atsushi Eno <[EMAIL PROTECTED]>
Subject: Re: String performance boost (Re: [Mono-dev] Patch to boost
speed ofUnicodeEncoding)
     To: Andreas Nahr <[EMAIL PROTECTED]>


Hi,

Your patch is based on pretty old code(!) ;-) I tried it for
each functionality:

- CopyTo():
  The patch makes it faster. But I guess CopyChar() functionality
  overlaps Paolo's memcpy() implementation in the latest code,
  though it was not simply replaceable.
- ToCharArray():
  ditto. The improvement rate is like you showed, about 5%.
- Trim():
  The patch resulted much better, as you showed. I think it is
  the best improvement.
- GetHashCode(), Insert(), Remove(),
  ToUpperInvariant(), ToLowerInvariant():
  We already have managed implementation like yours.
- ToUpper(), ToLower():
  Based on your patch, I rather made TextInfo.ToUpper() and
  ToLower() based on pointer implementation, so the mind is already
  checked in svn :-)
- Replace():
  There was a one-liner bug in the implementation. Even after
  fixing it, the result became much worse than existing one, about
  1.5x slower :-(
- IndexOf():
  It didn't make improvement - I think it is because we already
  have managed implementation by now.
- IndexOfAny():
  The patch made results much worse :-( It doubled execution time.
- Join():
  The patch was a bit worse, about 7-8% loss.
- PadLeft(), PadRight():
  Almost no difference - and seems like it still has bugs (some
  NUnit tests fail).
- Split():
  It looks pretty fast but it is too buggy to fix right now :(

I'm attaching a patch based on the original source, so anyone can
try them handy. Those which is buggy or makes performance worse
are commented out.

I think Trim() can be checked in, with a few changes
(changing CharCopy() to InternalStrcpy()). For CharCopy(),
I'd wait for further review. ToCharArray() is also nicer
to have, but it depends on CharCopy().

Thanks a bunch, Andreas.

Atsushi Eno

Andreas Nahr wrote:
It's just bad naming ;)
The String.cs.old contains the managed implementations. Please note
that there are usually multiple implementation-tries for each method
and all but one is just commented out. Its extremely unlikely that
the ones currently "active" are the best ones.
Also I can remember that at least one of the stats is completely
wrong because there was a bug in the implemenation that copied only
half the data.
If looking at the numbers I guess it is Replace because the
difference is really huge

I'll also attach the file I used for the microbenchmarks. But it is
a complete mess and I doubt anyone could use anything from it.

Andreas

Wow, the numbers are quite impressive. Can you please attach your
String.cs(.new) ? :-)

Atsushi Eno

Andreas Nahr wrote:
Hi Zac, Hi Kornél,

I've been working quite some time on improving the existing String class a
long time ago (about 2-3 years), but as I got a new job back then
never had
the time to finish anything. My findings back then were that
purely managed
implementations basically always outperformed the internalcalls
(And I guess
the JIT is now even more evolved than it was 3 years ago).
However as I sayed it was never finished and contains bugs. Moreover it
doesn't care at all for alignment issues.
If anyone want's to look at it - I attach my String-Testing class. You'll
find lots of different tries to optimize the Methods. But beware, the code
is in horrible shape, far from being usable.
Some optimizations use specific string-domain knowlege (like "equals"
testing first for the first char and after that from the end of
the string)

My conclusion was: We should have a few managed functions to do the work
(MemoryCopy, MemoryCompare, possibly for Byte* and Char*). They should be
managed so that optimizers and optimizing compilers are able to do
optimizations even on IL level. Whenever possible the JIT should replace
these at runtime (provided they aren't optimized away) with
architecture-specific assembly versions with the managed version
as fallback
mechanism.


Here are some findings from microbenchmarks made back then (the
first number
is always the time in milliseconds for the existing unmanaged
implementation
(internalcall), the second for a tested managed implementation, the Number
in () is the length of the string tested):

Some Methods still need internalcalls to create new Strings, but
were still
faster than the native implementations (The optimum would be to
internalize
InternalAllocateStr calls).

CopyTo (000): 810 -> 381
CopyTo (010): 832 -> 441
CopyTo (100): 1352 -> 881
CopyTo (512): 3395 -> 3014

ToCharArray (000): 5067 -> 4466
ToCharArray (002): 5317 -> 4857
ToCharArray (015): 8041 -> 7691
ToCharArray (960): 2915 -> 2894

ToCharArray (with parameters): Similar to above

Trim (): 6930 -> 6760
Trim (custom search Chars): 10596 -> 9413
Trim (default search Chars): 10455 -> 7210
Trim (no trimmed chars, long string): 1893 -> 661
Trim (no trimmed chars, short string): 1893 -> 631

Replace (004 - one replace): 37264 -> 3135
Replace (004 - nothing to replace): 3735 -> 310
Replace (961 - everything replaced): 2584 -> 501
Replace (961 - only last char replaced): 2463 -> 481

Split (default split Chars, long string, lots splitted): 42421 -> 8523
Split (custom split Chars, long string, non found): 2944 -> 2263
Split (custom split Chars, long string, lots found): 22062 -> 7330
Split (default split Chars, short string, non found): 2093 -> 761
Split (default split Chars, short string, nearly only splitChars): 8002 ->
5067

IndexOf (17): 1132 -> 791
IndexOf (2162): 10576 -> 7862

LastIndexOf (similar to above)

IndexOfAny (long string, nothing found): 25867 -> 2984 (Break even bei ca.
75chars)

LastIndexOfAny  (similar to above)

PadLeft/ PadRight: slightly slower that current (Should get faster once
optimized CharCopy is available): 1012 -> 1031

Remove: slightly slower that current (Should get faster once optimized
CharCopy is available): 2153 -> 2283


If somebody is interesting in picking this up I might be able to help a
litte.
Andreas

_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list




----- End message from [EMAIL PROTECTED] -----


_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list

Reply via email to