On Fri, Jul 02, 2004 at 03:03:49PM -0400, JOSEPH RYAN wrote:
: Sure.  The parser won't care what kind of characters
: make up the operator, as long as it's defined by the
: time the operator is encountered.  The "operator"
: rules in the grammar will probably be as simple as this:
:
:     # where x is the type of operator; infix, prefix, etc
:     rule x_operator:u2 {
:         %*X_OPERATORS
:     }

Maybe not *quite* that simple--we have to guarantee the longest token
wins.  But maybe that's how % should work in rules anyway.  We'd have
to get a little fancy and generate an ordered match rule for the
current set of keys, and presumably cache that so we don't recalculate
it every time.  And then we'd have to flush the cache if the set of
keys changes.

Also, the :u2 would almost certainly be set at the top of the file as
a pragma rather than putting it on every rule, since bare Perl code is
always considered language-independent Unicode.  (Literal strings
convert to the actual Unicode support level for data, of course.)
(It may be possible to write Perl programs in other encodings, but
those would be source-filtered into Unicode before the Perl parser
ever sees them.)

On the other hand, the default is likely "use graphemes" anyway, so we
probably don't even need the pragma in the file containing the Perl
grammar at all...

Larry
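
For illustration only, here is a minimal sketch of the "ordered match
rule plus cache" idea, written in Python rather than Perl 6 because
the rule syntax above was not yet settled.  The class OperatorTable
and its methods are hypothetical names, not part of any Perl design.
It relies on the fact that Python's re module tries alternatives left
to right, so sorting the keys longest-first makes the longest token
win.

```python
import re


class OperatorTable:
    """Hypothetical stand-in for a hash such as %*X_OPERATORS."""

    def __init__(self, operators=()):
        self._ops = set(operators)
        self._pattern = None  # cached ordered alternation, built lazily

    def add(self, op):
        if op not in self._ops:
            self._ops.add(op)
            self._pattern = None  # key set changed: flush the cache

    def remove(self, op):
        if op in self._ops:
            self._ops.remove(op)
            self._pattern = None  # key set changed: flush the cache

    def _compiled(self):
        if self._pattern is None:
            # Longest keys first, so "<=>" is tried before "<=" and "<".
            ordered = sorted(self._ops, key=lambda s: (-len(s), s))
            alternation = "|".join(re.escape(op) for op in ordered)
            self._pattern = re.compile(alternation)
        return self._pattern

    def match(self, text, pos=0):
        """Return the longest operator starting at pos, or None."""
        if not self._ops:
            return None
        m = self._compiled().match(text, pos)
        return m.group(0) if m else None


infix = OperatorTable(["+", "+=", "<", "<=", "<=>"])
first = infix.match("<=> b")    # "<=>", not "<" or "<="
infix.add("**")                 # flushes the cached pattern
second = infix.match("** 2")    # "**", found after the rebuild
```

The pattern is rebuilt only on the first match after the key set
changes, which is the caching behaviour described above.  A real
implementation might prefer a trie to a sorted alternation, but the
invalidation logic would look much the same.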