On Saturday, December 03, 2011 01:44:33 Dmitry Olshansky wrote:
> On 03.12.2011 1:08, Marco Leise wrote:
> > Cool, thx for your answers. The source code for OpenJDK can be
> > downloaded if you want to take a look at it. You are probably right
> > about them not decoding the characters lazily since their strings are
> > UTF-16.
> > The commented version of opIndex is a bit faster on my Core 2. This is
> > the first time that I witnessed such speed differences between
> > processors. :)
>
> Wow. I knew something was wrong with non-BT test code, from what I heard
> it should have been faster but it wasn't for me :)
>
> > Also I found that the trie is usually queried twice for each matching
> > character in the input string. You can't optimize opIndex any further
> > (but try size_t in there instead of uint, it helped here) unless you
> > make some changes on the larger scale. So if you should find out that
> > the second query isn't required, that would help more than anything
> > else.
> > I said it on IRC today: This library will be my reference for compile
> > time code generation in D. There is a lot of expertise in it, good work!
>
> There I have two options to work through:
> - separate negative and positive character classes, it would kill
>   possible branching here.
> - and now looking at test_11 in your profile output, I see the likely
>   culprit: I should re-think lookahead tests, they used to reduce number
>   of savepoints during matching.
>
> > P.S.: I'm fine with treating anything that is escaped, but not special,
> > as is. \w did cause an infinite loop though, so you may want to test
>
> Hm can't reproduce.
>
> > with the original regex. For \. you can assert(false, "\. is not a valid
> > escape sequence")
>
> No that was bad idea ... and I planned to change that exception. Now I'm
> more into ignore the backslash.
>
> > or just ignore the backslash. Personally I usually
> > don't escape anything just to be on the safe side. :p
>
> Worthy of a small community poll.

Also, in case you didn't see it: http://d.puremagic.com/issues/show_bug.cgi?id=7045

- Jonathan M Davis