On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote:
> Peter Haworth:
> # On Wed, 30 Jan 2002 17:45:58 +0000, Graham Barr wrote:
> # > On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
> # > > # rx_setprops P0, "i", 2
> # > > # branch $start0
> # > > # $advance:
> # > > # rx_advance P0, $fail
> # > > # $start0:
> # > > # rx_literal P0, "a", $advance
> # > > #
> # > > # First, we set the rx engine to case-insensitive. Why is that bad? It's
> # > > # setting a runtime property for what should be compile-time
> # > > # unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
> # > > # details of unicode in the first place just feels wrong, but I digress.
> # > >
> # > > That "i" does a once-off case-folding operation on the target string.
> # > > All other input to the engine MUST already be case-folded for speed.
> # >
> # > Hm, is that going to work ? What about a rx like /^a(?i:b)C/ where the
> # > case insensitivity only applies to part of the pattern ?
> #
> # Or worse, in /^a(b)c/i, where you want to capture the original character,
> # not the case-folded version?
>
> Parentheses just record a pair of indices, not a string.
Yes, I was assuming that.

However, what is to be gained by case-folding the input string? Because parts of an rx can be case-insensitive while other parts are case-sensitive, we will probably need two sorts of ops anyway (or a way to tell the op to be case-insensitive). And you will only be able to do the case folding when the whole rx is case-insensitive.

It also means creating a copy of the input string, which is something the current rx engine in perl5 tries to avoid. And while I will agree that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i, that is normally only the case for small-ish strings.

Graham.
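
To make the two objections concrete, here is a rough sketch using Python's `re` module as a stand-in for the Perl regexes discussed in this thread. It is not Parrot or perl5 engine code. The function names are hypothetical, and the "fold everything first" strategy is only my reading of the proposal. The capture example assumes ASCII input, so that lower-casing leaves every index in place.

```python
import re

# Pattern from the thread: only the "b" is case-insensitive.
MIXED = re.compile(r"^a(?i:b)C")


def match_with_scoped_flag(s):
    """Match the original string; the engine handles (?i:...) itself."""
    return MIXED.match(s) is not None


def match_after_folding(s):
    """Hypothetical 'fold everything first' strategy.

    Both the input and the pattern are lower-cased, then matched
    case-sensitively. Note that s.lower() creates a copy of the input.
    """
    folded_input = s.lower()
    folded_pattern = re.compile(r"^abc")
    return folded_pattern.match(folded_input) is not None


def capture_original(s):
    """Whole-pattern case-insensitivity done by folding, as in /^a(b)c/i.

    The group is recorded as a pair of indices, so slicing the ORIGINAL
    string recovers the original case. Assumes ASCII input, where
    lower-casing does not change the string length.
    """
    m = re.match(r"^a(b)c", s.lower())
    if m is None:
        return None
    start, end = m.span(1)
    return s[start:end]
```

With these definitions, `match_with_scoped_flag("ABC")` is false because the "a" and "C" are still case-sensitive, while `match_after_folding("ABC")` is true. That is the sense in which pre-folding only works when the whole rx is case-insensitive. `capture_original("aBc")` returns `"B"`, which matches the point that parentheses only need to record indices.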