Cool, thanks for your answers. The source code for OpenJDK can be downloaded if you want to take a look at it. You are probably right that they don't decode the characters lazily, since their strings are UTF-16.

The commented version of opIndex is a bit faster on my Core 2. This is the first time I have seen such speed differences between processors. :) I also found that the trie is usually queried twice for each matching character in the input string. You can't optimize opIndex any further (but try size_t in there instead of uint, it helped here) unless you make changes on a larger scale. So if you find out that the second query isn't required, that would help more than anything else.

I said it on IRC today: this library will be my reference for compile-time code generation in D. There is a lot of expertise in it, good work!
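
To make the size_t suggestion concrete, here is a minimal sketch of what I mean. Table and data are made-up stand-ins, not the actual types in the library, and whether the change pays off depends on the target and the compiler, so measure before keeping it.

```d
// Hypothetical stand-in for the lookup table; the names Table and data
// are illustrative and not taken from the library.
struct Table
{
    ubyte[] data;

    // The index is typed as size_t, which matches the native pointer
    // width. On a 64-bit target this can spare the compiler a widening
    // step that a 32-bit uint index may need before the address
    // computation.
    ubyte opIndex(size_t i) const
    {
        return data[i];
    }
}

void main()
{
    ubyte[] raw = [10, 20, 30];
    auto t = Table(raw);

    assert(t[1] == 20);

    // Callers that still hold a uint keep working, because uint
    // converts implicitly to size_t.
    uint j = 2;
    assert(t[j] == 30);
}
```

Existing call sites don't need to change, so it is a cheap experiment.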

P.S.: I'm fine with treating anything that is escaped, but not special, as is. \w did cause an infinite loop though, so you may want to test with the original regex. For \. you can assert(false, r"\. is not a valid escape sequence") or just ignore the backslash. Personally I usually don't escape anything just to be on the safe side. :p
