There are a few things to notice when extending unicode data. MS have (or used to on 2.0) a lot of broken stuff when it comes to unicode/locales. So sometimes fixing for MS compatibility means breaking correctness.
On Sat, May 15, 2010 at 12:04 PM, Damien Diederen <d...@crosstwine.com> wrote: > > Hi Andreas, > > "Andreas Nahr" <classdevelopm...@a-softtech.com> writes: > > Hi, > > > > I'm the author (of a large part) of the current Char class and > > unfortunately I think you overlooked some things with the changes. > > > > What you are doing is (heavily) degrading the performance of most of > > the char methods (some of which are VERY commonly used) for supporting > > a single (likely VERY rarely used case) in a single (likely not used > > very much) method. > > Thanks for the feedback. You are completely right: we were discussing > access times to the packed data, and I got sidetracked into measuring > that, mistakenly assuming these accesses would still be inlined in the > interesting cases. > > I now had a look at the disassembly (better late than never), which is a > bit sobering. Thanks for the sanity check! > > > 1. Lets first start with performance. Unfortunately your micro > > benchmark is completely omitting the most important speed factor: All > > the methods were designed to be inlinable (because they tend to be > > called in tight loops, as they are even within corelib). After your > > change likely most/none of them would be inlinable anymore. Your > > benchmark, however, does not CALL A SINGLE METHOD. It just measures > > the overhead of two additional mathematical operators and a memory > > access. That is not the big speed-sink. But even then your approach > > looses 25% perf for no additional gain. The original implementation > > was the result of (literally) thousands of performance tests. > > Heh. Ironically, I introduced the testing method to *avoid* measuring > function call overhead… > > My initial iteration (in Bugzilla) was trying to take advantage of the > initial linear portion of the table, but the conditional function call > would probably also result in the method not being inlineable. > > > 2. Gain vs. impact. I don't see much sense in changing lots of methods > > in Char to something worse just to support one corner case that could > > be handled separately. > > Another goal of my approach was to support multiple variants of the > database while still sharing identical pages. Maybe I should punt on > that, either by only supporting v2.0–3.5 or by including multiple flat > tables (which shouldn't be faulted in if not used). > > > 3. Imho the alternative is relatively simple: Just leave the Char > > methods as they are and add the special casing only for the one > > relevant method. You could just add another table (or two for bi-level > > table compression or one that reuses the data from the first table) as > > a separate entity. So you can leave all the original methods intact > > and ONLY if somebody really uses that method and ONLY if he uses it to > > query the astral plane using surrogates he actually has to pay the > > price for it (in the runtime loading the relevant table into memory > > and using that additional lookup). > > I just implemented such a solution, and quickly measured its performance > with the attached microbenchmark (feel free to point me to a more > representative test suite if you have one at hand). On this (x86-64) > computer, it (unsurprisingly) results in the same numbers as the > unpached version: > > $ time mono -O=all CharMicrobenchs.exe --load /tmp/mono.txt --repeat 5000 > --run > > - Original Mono: 10.5s; > - Bi-level table: 15.6s; > - Linear + bi-level table: 10.5s. > > (Run a few times; jitter is minimal. Mono.txt is a lynx dump of > http://de.wikipedia.org/wiki/Mono-Projekt.) > > This approach grows the category data table from 64 to ~70kB, and > requires an additional 8kB index for the astral planes. The resulting > runtime produces the same results as Microsoft's v2.0.50727 and > v3.5.21022 for the whole Unicode range, but has no support for v4. > > Would that be an acceptable overhead? > > Would supporting v4.0.30319 be worth it at an additional cost of ~78kB > of data (which shouldn't be faulted in) in the Mono binary? > > Further comments are of course welcome. I will prepare a new series of > patches in the near future. > > > Happy hacking > > Andreas > > > > P.S. And sorry for being somewhat harsh on the patches. > > Heh. That's what I get for cutting corners ;) > > Cheers, > Damien > > -- > http://crosstwine.com > tel: +49 21 89 29 39 > cell: +49 174 34 89 428 > > “Strong Opinions, Weakly Held” > — Bob Johansen > > _______________________________________________ > Mono-devel-list mailing list > Mono-devel-list@lists.ximian.com > http://lists.ximian.com/mailman/listinfo/mono-devel-list > >
_______________________________________________ Mono-devel-list mailing list Mono-devel-list@lists.ximian.com http://lists.ximian.com/mailman/listinfo/mono-devel-list