On Fri, Jul 02, 2004 at 03:03:49PM -0400, JOSEPH RYAN wrote:
: Sure.  The parser won't care what kind of characters
: make up the operator, as long as it's defined by the
: time the operator is encountered.  The "operator" 
: rules in the grammar will probably be as simple as this:
: 
: # where x is the type of operator; infix, prefix, etc
: rule x_operator:u2 {
:     %*X_OPERATORS
: }

Maybe not *quite* that simple--we have to guarantee the longest
token wins.  But maybe that's how % should work in rules anyway.
We'd have to get a little fancy and generate an ordered match rule
for the current set of keys, and presumably cache that so we don't
recalculate it every time.  And then we'd have to flush the cache if
the set of keys changes.
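
To make the caching idea concrete, here is a rough sketch in Python
rather than in rule syntax.  Everything in it is an assumption for
illustration: the class name OperatorTable and its methods are made up,
and it measures key length in code points, not graphemes.  It sorts the
current keys longest-first, builds one ordered alternation from them,
caches the compiled result, and throws the cache away whenever the key
set changes.

```python
import re


class OperatorTable:
    """Hypothetical stand-in for a hash of operators such as %X_OPERATORS."""

    def __init__(self, ops=()):
        self._ops = set(ops)
        self._pattern = None  # cached ordered alternation, built lazily

    def add(self, op):
        if op not in self._ops:
            self._ops.add(op)
            self._pattern = None  # key set changed: flush the cache

    def remove(self, op):
        if op in self._ops:
            self._ops.discard(op)
            self._pattern = None  # key set changed: flush the cache

    def _compiled(self):
        if self._pattern is None:
            # Longest keys first, so the ordered alternation tries the
            # longest candidate token before any of its prefixes.
            ordered = sorted(self._ops, key=lambda k: (-len(k), k))
            self._pattern = re.compile("|".join(re.escape(k) for k in ordered))
        return self._pattern

    def match(self, text, pos=0):
        """Return the longest operator starting at pos, or None."""
        if not self._ops:
            return None  # an empty alternation would match the empty string
        m = self._compiled().match(text, pos)
        return m.group(0) if m else None


ops = OperatorTable(["+", "++", "+="])
first = ops.match("++x")    # longest key wins over the bare "+"
ops.add("+++")              # new operator defined later: cache is flushed
second = ops.match("+++x")  # pattern is rebuilt on the next match
```

The sort is the only part that matters for "longest token wins"; the
rest is bookkeeping so that the alternation is rebuilt only after an
operator is added or removed, not on every match.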

Also, the :u2 would almost certainly be set at the top of the file as
a pragma rather than putting it on every rule, since bare Perl code
is always considered language-independent Unicode.  (Literal strings
convert to the actual Unicode support level for data, of course.)
(It may be possible to write Perl programs in other encodings, but
those would be source-filtered into Unicode before the Perl parser
ever sees them.)

On the other hand, the default is likely "use graphemes" anyway,
so we probably don't even need the pragma in the file containing the
Perl grammar at all...

Larry
