On Mon, Dec 21, 2015 at 4:08 PM, Christian Schudt <christian.sch...@gmx.de> wrote:
> If you mean having a huge code point table, like in your tables.go file: I
> think Java already has such tables internally.
> What could be improved here, is that Character.getType(cp) could only be
> invoked once. I haven’t done any benchmark for this, but I don’t expect a
> significant performance benefit.
Out of curiosity, I answered my own question here. I'm using Go, which also has lots of Unicode tables in the standard library, so I benchmarked two approaches: running the algorithm, and looking up a value in the large pre-generated trie. For the algorithm I modified the version in my generator slightly to remove the NFKC step, which is very slow; this way it more closely resembles your algorithm. I have no idea where the bottlenecks or optimizations in Java would be, so these results may be meaningless to you, but at least in Go the single trie lookup was much faster:

    $ go test -bench . -benchmem
    PASS
    BenchmarkAsciiLookup-4          300000000     3.85 ns/op    0 B/op    0 allocs/op
    BenchmarkFullwidthLookup-4      200000000     9.21 ns/op    0 B/op    0 allocs/op
    BenchmarkAsciiCalculate-4       100000000     17.4 ns/op    0 B/op    0 allocs/op
    BenchmarkFullwidthCalculate-4    20000000     71.4 ns/op    0 B/op    0 allocs/op
    ok      _/home/sam/Projects/golang-x-text/unicode/precis    7.632s

Each benchmark looks up or calculates the derived property for a single character: the ASCII benchmarks use 'u' and the fullwidth benchmarks use 'ｕ' (U+FF55 FULLWIDTH LATIN SMALL LETTER U), which was chosen very scientifically, I assure you. The second column is the number of iterations that were run; Go keeps increasing it until the benchmark has run long enough (one second by default) to give a reliable per-operation time.

For the worst case there's a pretty good speed difference (9.21 ns vs. 71.4 ns per character). Whether that difference is worth pre-generating the data is another matter, of course ☺

Best,
Sam

-- 
Sam Whited
pub 4096R/54083AE104EA7AD3
https://blog.samwhited.com
_______________________________________________
precis mailing list
precis@ietf.org
https://www.ietf.org/mailman/listinfo/precis