> Because parts of an rx can be case-insensitive while other parts
> are case-sensitive, we will probably need two sorts of ops anyway
> (or a way to tell the op to be case-insensitive). And you will
> only be able to do the case folding when the whole rx is
> case-insensitive.
I don't like that suggestion. I think we should have one set of ops but
two input strings: the original and a case-folded copy. The rx engine
chooses the right one depending on the current case-sensitivity. Two
regex opcodes will be used for this purpose, op-case-sensitive-start and
op-case-insensitive-start. Each opcode switches the engine over to the
corresponding string: its beginning, end, current position, etc.

> It also means creating a copy of the input string, which is something
> the current rx engine in perl5 tries to avoid. And while I will agree
> that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> that is normally only the case for small-ish strings.

I don't think the perl5 approach is the best choice. Unicode case folding
is much, much more expensive than malloc/free. And we can always use a
per-thread free list: unless the regex is nested or the string is very
big, we don't need to allocate any memory.

Hong
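
For illustration only, the two-string scheme described above could be sketched roughly as follows. This is a toy in Python, not the actual engine: the names fold, compile_ops and match_at are hypothetical, the only "real" op is a literal match, and fold is a simplified stand-in for Unicode case folding that assumes the folded copy keeps the same offsets as the original (full case folding can change string length, which a real implementation would have to handle).

```python
CS_START = "op-case-sensitive-start"
CI_START = "op-case-insensitive-start"


def fold(s):
    # Simplified stand-in for Unicode case folding (an assumption of this
    # sketch): lowercase a code point only when the result is still a single
    # code point, so offsets in the folded copy line up with the original.
    return "".join(c.lower() if len(c.lower()) == 1 else c for c in s)


def compile_ops(parts):
    """Turn (literal, case_insensitive) pairs into a flat op list.

    A switch op is emitted only where the case-sensitivity changes, and
    case-insensitive literals are folded once, at compile time.
    """
    ops = []
    insensitive = False
    for literal, ci in parts:
        if ci != insensitive:
            ops.append((CI_START if ci else CS_START, None))
            insensitive = ci
        ops.append(("lit", fold(literal) if ci else literal))
    return ops


def match_at(ops, original, pos=0):
    """Return the end offset of a match anchored at pos, or None."""
    folded = fold(original)   # the second input string, folded once
    current = original        # matching starts out case-sensitive
    for op, arg in ops:
        if op == CI_START:
            current = folded      # switch to the folded copy
        elif op == CS_START:
            current = original    # switch back to the original
        elif not current.startswith(arg, pos):
            return None           # literal op failed
        else:
            pos += len(arg)
    return pos


# "Foo" must match exactly, "bar" may match in any case.
ops = compile_ops([("Foo", False), ("bar", True)])
print(match_at(ops, "FooBAR"))
print(match_at(ops, "fooBAR"))
```

The point of the design is that the literal op itself never needs to know about case: the switch ops only change which string it reads from, so one set of ops serves both modes.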