On 03.12.2011 1:08, Marco Leise wrote:
> Cool, thx for your answers. The source code for OpenJDK can be
> downloaded if you want to take a look at it. You are probably right
> about them not decoding the characters lazily since their strings are
> UTF-16.
> The commented version of opIndex is a bit faster on my Core 2. This is
> the first time that I witnessed such speed differences between
> processors. :)

Wow. I knew something was wrong with the non-BT test code: from what I heard it should have been faster, but it wasn't for me :)

> Also I found that the trie is usually queried twice for each matching
> character in the input string. You can't optimize opIndex any further
> (but try size_t in there instead of uint, it helped here) unless you
> make some changes on the larger scale. So if you should find out that
> the second query isn't required, that would help more than anything else.
> I said it on IRC today: This library will be my reference for compile
> time code generation in D. There is a lot of expertise in it, good work!
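
To make the size_t suggestion concrete, here is a minimal sketch of a table lookup whose index is widened to size_t once, up front. TinyTrie, pageOf and pages are hypothetical names invented for this illustration; the layout is a plain two-level table and is not the trie used by the library under discussion. Whether size_t actually beats uint depends on the compiler and the CPU, as the timings in this thread suggest.

```d
import std.stdio : writeln;

// Hypothetical two-level lookup table, used only to illustrate the indexing
// pattern. It is NOT the trie from the library discussed in this thread.
struct TinyTrie
{
    ubyte[] pageOf; // high bits of a code point -> page number (0 = empty page)
    bool[] pages;   // 256 membership flags per page

    // Build a table that contains exactly the given characters.
    this(const(dchar)[] members)
    {
        pageOf = new ubyte[0x110000 >> 8];
        pages = new bool[256]; // page 0 stays all-false
        foreach (dchar ch; members)
        {
            size_t hi = ch >> 8;
            if (pageOf[hi] == 0)
            {
                // Allocate a fresh page; this sketch supports at most 255.
                assert(pages.length / 256 < 256, "too many pages for a ubyte");
                pageOf[hi] = cast(ubyte)(pages.length / 256);
                pages.length = pages.length + 256;
            }
            pages[pageOf[hi] * 256 + (ch & 0xFF)] = true;
        }
    }

    // The index is widened to size_t once, so both lookups use a
    // pointer-sized integer instead of a 32-bit uint.
    bool opIndex(dchar ch) const
    {
        size_t i = ch;
        return pages[pageOf[i >> 8] * 256 + (i & 0xFF)];
    }
}

void main()
{
    auto t = TinyTrie("ab\u00E9\u4E2D"d);
    writeln(t['a'], " ", t['z'], " ", t['\u4E2D']); // prints "true false true"
}
```

The sketch performs exactly one lookup per character; caching that result in the caller would be one way to avoid the second query mentioned above, assuming the second query really is redundant.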


There I have two options to work through:
- Separate negative and positive character classes; that would kill possible branching here.
- Looking at test_11 in your profile output, I see the likely culprit: I should rethink the lookahead tests, they used to reduce the number of savepoints during matching.

> P.S.: I'm fine with treating anything that is escaped, but not special,
> as is. \w did cause an infinite loop though, so you may want to test

Hm, can't reproduce.

> with the original regex. For \. you can assert(false, "\. is not a valid
> escape sequence")

No, that was a bad idea ... and I planned to change that exception. Now I'm leaning more towards ignoring the backslash.

> or just ignore the backslash. Personally I usually
> don't escape anything just to be on the safe side. :p

Worthy of a small community poll.
