On 2012-08-01 08:11, Jonathan M Davis wrote:

> I'm not using regexes at all. It's using string mixins to reduce code
> duplication, but it's effectively hand-written. If I do it right, it should be
> _very_ difficult to make it any faster than it's going to be. It even
> specifically avoids decoding Unicode characters and operates on ASCII
> characters as much as possible.

That's a good idea. Most code can be treated as ASCII (I assume most people code in English). Basically only string literals would contain characters outside the ASCII range.
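To illustrate why this works without decoding: in UTF-8 every byte of a multi-byte sequence is >= 0x80, so it can never be mistaken for an ASCII delimiter such as `"` (0x22) or `\` (0x5C). A lexer can therefore skip over a string literal byte by byte, even one full of non-ASCII text. The following is a minimal sketch in Python, assuming a C-like literal syntax with backslash escapes; `skip_string_literal` is a hypothetical helper, not code from the lexer under discussion.

```python
def skip_string_literal(src: bytes, start: int) -> int:
    """Return the index just past the closing quote of the string literal
    that begins at src[start] (which must be b'"').

    Operates on raw UTF-8 bytes and never decodes them: bytes belonging to
    multi-byte UTF-8 sequences are all >= 0x80, so they cannot equal
    '"' (0x22) or '\\' (0x5C) and are simply stepped over.
    Assumes C-like escapes where a backslash escapes the next byte.
    """
    if start >= len(src) or src[start] != 0x22:
        raise ValueError("not at a string literal")
    i = start + 1
    n = len(src)
    while i < n:
        b = src[i]
        if b == 0x5C:      # backslash: skip it and the byte it escapes
            i += 2
        elif b == 0x22:    # unescaped closing quote ends the literal
            return i + 1
        else:              # any other ASCII byte or UTF-8 byte: advance
            i += 1
    raise ValueError("unterminated string literal")


if __name__ == "__main__":
    src = 'x = "héllo \\"wörld\\"";'.encode("utf-8")
    end = skip_string_literal(src, src.index(b'"'))
    print(src[end:])
```

If the escaped character after a backslash is itself non-ASCII, skipping one byte lands on a continuation byte, which the loop then steps over like any other high byte, so the result is still correct.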

BTW, have you seen this:

http://woboq.com/blog/utf-8-processing-using-simd.html

--
/Jacob Carlborg
