You're welcome, Garry.

Stefan - a simple check for whether the pattern contains any regex 
metacharacters (e.g. '.', '*', etc.) can decide if a pattern is definitely not 
a regular expression. If the pattern does contain metacharacters, only the 
user can know which reading is intended, so there would have to be a 
command-line switch to force one interpretation or the other.
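
A minimal sketch of that check, in Python for brevity. The metacharacter set 
and the name is_definitely_literal are my own assumptions for illustration; the 
exact set depends on which regex dialect is in play.

```python
# Assumed metacharacter set; adjust it to the regex dialect actually in use.
REGEX_METACHARS = frozenset(".^$*+?()[]{}|" + "\\")


def is_definitely_literal(pattern: str) -> bool:
    """Return True when the pattern contains no regex metacharacters.

    True means the pattern can only be a plain string. False means it is
    ambiguous and the user has to say which interpretation is wanted.
    """
    return not any(ch in REGEX_METACHARS for ch in pattern)
```

A False result does not prove the pattern is a regex. It only means the fast 
literal path cannot be chosen automatically, which is where the switch comes in.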

In terms of absolute maximum performance, one usually does not want to do the 
search "line at a time" at all. Rather, one wants to run an assembly-optimized 
"strchr"/"memchr"-like search over the biggest chunks of input possible. In this 
context that means scanning the input for the least likely byte that any match 
must contain. Some bytes may be much less likely than newlines, and so a big 
boost may be possible (though note that input data sets do vary). Once the byte 
offset of a hit is known, one determines from there whether it is part of a 
real match. That is all quite a bit more bookkeeping/logic/hassle, obviously.
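
To make the bookkeeping concrete, here is a rough sketch for the literal-string 
case only, again in Python. The helper names, the 64 KiB sample size, and the 
"count bytes in a sample" heuristic for picking the rare byte are all 
assumptions of mine, not anything from an existing tool. bytes.find stands in 
for the memchr-like primitive.

```python
from typing import List


def pick_rare_offset(needle: bytes, sample: bytes) -> int:
    """Offset in needle of the byte that occurs least often in sample."""
    return min(range(len(needle)), key=lambda i: sample.count(needle[i:i + 1]))


def matching_lines(data: bytes, needle: bytes) -> List[bytes]:
    """Return the lines of data containing needle, without splitting lines first."""
    if not needle or b"\n" in needle:
        raise ValueError("needle must be non-empty and contain no newline")
    k = pick_rare_offset(needle, data[:65536])  # assumed sample size
    rare = needle[k:k + 1]
    lines = []
    pos = 0
    while True:
        hit = data.find(rare, pos)  # scan the whole buffer for the rare byte
        if hit == -1:
            break
        start = hit - k  # where the needle would have to begin
        if start >= 0 and data.startswith(needle, start):
            # Real match: only now look for the enclosing line boundaries.
            line_start = data.rfind(b"\n", 0, start) + 1
            line_end = data.find(b"\n", start)
            if line_end == -1:
                line_end = len(data)
            lines.append(data[line_start:line_end])
            pos = line_end + 1  # skip the rest of this line
        else:
            pos = hit + 1  # false hit, keep scanning
    return lines
```

Newlines are only searched for around confirmed matches, so when the rare byte 
really is rare most of the input is touched by a single fast scan. A regex 
would need an extra step to extract a byte that every match must contain.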
