On Thu, Jul 08, 2004 at 04:49:33AM -0600, Luke Palmer wrote:
: Michele Dondi writes:
: > On the wild side of things, could there be the possibility of even
: > defining new ones?
:
: That's what I meant by:
:
:     grammatical_category:postcircumfix
:
: Though it wouldn't be so magical as to just know what you mean.  If you're
: mucking with the grammar, though, you should be able to insert hooks.
: After all, the writers of the perl 6 parser have to do it.
:
:     rule prefix_op() {
:         (@(%Perl::guts::grammatical_categories«prefix»))
:             <prefix_op>
:       |
:         <term>
:     }
:
: Or something.

I like it when someone says "or something" about the same place I'd
say "or something".  :-)

However, in the interests of dewaffling, I have a couple of quibbles.
I don't know what that @() is doing there--I presume you meant @{}.
Also, it's not clear that you want an array there, but I understand
you're indicating that the tokens have to be matched in some particular
order that is unspecified but not arbitrary (presumably longer tokens
preceding any shorter prefixes of those tokens).  As I said in another
message, though, we might want to force hashes to automatically tokenize
in a longest-token-first fashion (or at least have the option of doing
so), and using a hash would allow the keys to be the strings and the
values to be individual actions to be taken.  With an array match,
you might find yourself redispatching individual operators in a switch
statement to provide that kind of specificity.

For efficiency, either an array or a hash would want to be preprocessed
into some other kind of trie or other data structure for fast tokenizing
anyway, so it's not like doing it with an array is buying you much
unless you really need to specify the order of matching.  You might
think we need to specify order so that lexicalized operator definitions
can override more global ones, but I suspect we actually have to copy
the array or hash into the derived grammar in any event to properly
emulate method overriding for things that aren't really methods, so
that when we revert the grammar it reverts the user-defined operators
as well.  Or something...

My other quibble is that I hope this level of operator can be parsed
with operator precedence rather than rules.  Higher level rules drop
into the operator precedence parser when they see things like <expr>,
and the operator precedence parser drops into lower level rules before
returning a "term" token (or if a macro specifies a particular followup
parsing rule).

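To make the two quibbles concrete, here is a rough sketch in Python
rather than Perl 6.  It is not taken from any actual Perl 6 parser.
The table INFIX and the functions match_longest, parse_term and
parse_expr are hypothetical names, and the precedence numbers are made
up.  The hash keys are the operator strings and the values carry the
action, matching is longest-token-first, and a precedence-climbing loop
drops into a lower-level term rule.

```python
import re

# Hypothetical operator table: key is the token text, value is
# (precedence, action).  Names and precedence levels are illustrative only.
INFIX = {
    "+":  (10, lambda a, b: a + b),
    "-":  (10, lambda a, b: a - b),
    "*":  (20, lambda a, b: a * b),
    "**": (30, lambda a, b: a ** b),
}

def match_longest(table, text, pos):
    """Return the longest key of `table` that starts at text[pos], or None."""
    # Sorting by length stands in for the trie a real tokener would precompute.
    for key in sorted(table, key=len, reverse=True):
        if text.startswith(key, pos):
            return key
    return None

def skip_ws(text, pos):
    """Advance past whitespace and return the new position."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def parse_term(text, pos):
    """Lower-level 'rule': an integer literal or a parenthesized expression."""
    pos = skip_ws(text, pos)
    if pos < len(text) and text[pos] == "(":
        value, pos = parse_expr(text, pos + 1, 0)
        pos = skip_ws(text, pos)
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"missing ')' at position {pos}")
        return value, pos + 1
    m = re.compile(r"\d+").match(text, pos)
    if not m:
        raise SyntaxError(f"expected a term at position {pos}")
    return int(m.group()), m.end()

def parse_expr(text, pos=0, min_prec=0):
    """Precedence climbing over INFIX, dropping into parse_term for terms."""
    left, pos = parse_term(text, pos)
    while True:
        pos = skip_ws(text, pos)
        op = match_longest(INFIX, text, pos)
        if op is None or INFIX[op][0] < min_prec:
            return left, pos
        prec, action = INFIX[op]
        # Every operator is treated as left-associative in this sketch.
        right, pos = parse_expr(text, pos + len(op), prec + 1)
        left = action(left, right)
```

Calling parse_expr("2 ** 3 * 4 + 1") returns a (value, end position)
pair.  Because "**" is tried before "*", the input is not mis-tokenized
as two multiplications, and the action is fetched straight from the
hash value, with no switch statement to redispatch on the operator.
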
Of course, it's possible that our tokener is just a fancy rule, in
which case it would strongly resemble what you have above, only maybe
with more alternatives, depending on where we decide to recognize the
various kinds of terms.

Oddly, depending on how we decide to do operator precedence, we might
not do the conventional thing of treating parenthesized expressions
as terms, but just make parens into pseudo operators that jack up the
internal precedence and return the parens as individual tokens.  But
maybe we should stick with the ordinary recursive definition--it might
give better error messages on missing parens, and we've already
eliminated the 20-odd recursion levels that a strict recursive descent
parser would impose on parentheses anyway.  Or something.  :-)

Larry