Hello, I have a design dilemma that will become real some time in the future, and considering how large it is, I thought it would be a good idea to take a quick look forward now.
I am building a Bison parser for a language, or, to be precise, for several very similar languages. I have a "main" language and four other languages which are all subsets of it. Concretely, I'm building a parser for the XPath language, and the flavours I need to be able to distinguish are:

* XPath 2.0. This is as broad as it gets.
* XPath 1.0. A subset of XPath 2.0; XPath 2.0 is an extension of XPath 1.0.
* XSL-T 2.0 Patterns. A small subset of XPath 2.0.
* XSL-T 1.0 Patterns. A small subset of XPath 1.0.
* W3C XML Schema Selectors. An even smaller subset of XPath 1.0.

My question is how I should practically modularize the code in order to support these different languages efficiently.

First of all, my thought is that the scanner (flex) is the same in every case (i.e., it supports all tokens in XPath 2.0), and that distinguishing the various "languages" is done at a higher level (the parser).

Distinguishing XPath 1.0 from 2.0 is, as far as I can tell, the easiest. Since XPath 2.0 is an extension of 1.0, one can pass the parser an argument which signifies whether 1.0 is being parsed, and in the actions for 2.0-only expressions error out if so. In other words, conditional checks on a per-action basis.

This approach, however, easily becomes complex when the other grammars are taken into account, because one needs to be "context" aware. For example, XSL-T Patterns is a subset, but the disallowed constructs are only disallowed in certain scenarios. Hence, if one continued with conditional tests ("What language am I parsing?") inside actions, it would require implementing "non-terminal awareness".

Another approach, which seems attractive to me if it's possible, is to modularize the grammar at the API/file level. For example, the tokens are declared in one file, the non-terminals are grouped in files, and a separate parser is constructed for each language.
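To illustrate the per-action conditional approach, here is a minimal sketch. All names (xpath_lang, LANG_XPATH10, TO_KW, RangeExpr, AdditiveExpr) are illustrative, not taken from a real XPath grammar:

```yacc
/* Hedged sketch: a single grammar with a run-time language flag
 * passed in via %parse-param.  Range expressions ("1 to 10") exist
 * only in XPath 2.0, so the action rejects them in 1.0 mode. */
%parse-param { int xpath_lang }   /* e.g. LANG_XPATH10 or LANG_XPATH20 */
%token TO_KW                      /* the XPath 2.0 "to" keyword */

%%
RangeExpr:
    AdditiveExpr
  | AdditiveExpr TO_KW AdditiveExpr
      {
        if (xpath_lang == LANG_XPATH10)
          YYERROR;  /* "to" is not valid XPath 1.0; report and bail */
      }
  ;
```

As the text above notes, this stays simple only as long as validity depends on the language alone; once it depends on *where* in the grammar a construct appears, every such action needs extra context.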
It would be preferred if it was also modularized at the object level, but I guess the disadvantage wouldn't be that big if it wasn't. In other words, if one could "select the start symbol depending on the language", that would solve my problems, it seems. I don't know how this "Bison modularization" would be done practically, though.

What are people's experiences with these kinds of problems? What are the approaches for solving them?

Cheers,

Frans

PS. For those interested, here are the EBNF productions for what I'm talking about:

XPath 2.0 (1.0 is merely a subset): http://www.w3.org/TR/xpath20/#nt-bnf
XSL-T Patterns: http://www.w3.org/TR/xslt20/#pattern-syntax
W3C XML Schema Selectors: http://www.w3.org/TR/xmlschema-1/#coss-identity-constraint

By the way, there's also an interesting document on parser/scanner construction for XPath, "Building a Tokenizer for XPath or XQuery": http://www.w3.org/TR/2005/WD-xquery-xpath-parsing-20050404/

_______________________________________________
Help-bison@gnu.org
http://lists.gnu.org/mailman/listinfo/help-bison
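On the "select the start symbol depending on the language" idea: since Bison allows only one %start symbol, a common yacc/Bison idiom is to simulate multiple start symbols with pseudo-tokens that the scanner emits exactly once, before any real input token. A hedged sketch (all token and non-terminal names are illustrative):

```yacc
/* Classic "pseudo start token" trick: one grammar, one parser,
 * but the first token the scanner returns selects which
 * sub-language is actually parsed this run. */
%token START_XPATH START_PATTERN START_SELECTOR

%start TopLevel
%%
TopLevel:
    START_XPATH    Expr       /* full XPath expression */
  | START_PATTERN  Pattern    /* XSL-T pattern subset */
  | START_SELECTOR Selector   /* W3C XML Schema selector subset */
  ;
```

The scanner would keep a flag (set from the same argument mentioned earlier) and return the chosen START_* token on its first call. This gives one parser object per language family while keeping the shared productions in a single grammar file, at the cost of the LALR tables covering all variants at once.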