Hi Vincent,
I dusted off my books on parsing and compiling (also using some Web-sites to do research) and looked at writing a formal grammar for font-shorthand. Because font-variant font-style and font-weight can occur in any order, I could not (currently) come up with a grammar in which the directing sets were disjoint for each non-terminal. So I was unable to come up with an LL(1) grammar. For instance, here are two productions of my attempt at a grammar: <style-variant-weight> -> <variant-weight> <style-variant-weight> -> <variant-style> In each case, the first set of <style-variant-weight> shares a common element in two different productions, the literal values for variant. One needs to look ahead one more token to see if one has a <variant-weight> or a <variant-style>. According to Gough's "Syntax Analysis and Software Tools" (1988) "For every production of the augmented grammar we derive a set of possible 1-lookahead symbols, which we call the director set for that production. If and only if the director sets for different productions of the same non-terminal are disjoint, i.e. have no common elements, is the grammar LL(1)." Also the grammar is ambiguous as we've discussed. <font> -> <style-variant-weight> <size> [ / <line-height>] <family> If the string starts with 'normal' and then goes on to define <size> and <family> then one isn't sure whether style or variant or weight are being specified. Somehow one needs to special case 'normal' so that when the string begins with normal - one value (say font-weight is set) and the other two are not set which according to the spec means they are reset to normal as well. The books and web articles I read only discussed using recursive descent when the grammar is LL(1). I have the feeling that despite the ambiguities in the grammar it is almost LL(k) because font-variant and font-style and font-weight almost have disjoint values. It is at least LL(3) and I suspect it is LL(6). Given your greater knowledge of parsing, do you know if an LL(k) parser can always be implemented as recursive descent if one looks k tokens ahead in one's parsing routine? I also noticed that the fact that space separates the tokens must be in an important part of any solution to the problem and that the font-shorthand is more easily parsed (by any software) from right-to-left than left-to-right. This is because font-family is not nullable and in a right-to-left parsing is the first element encountered. A non-terminal symbol is nullable if null can be validly derived from it in terms of the grammar. I'm not as convinced as you are that recursive descent parsing or a formal bottom-up-parser will make the code simpler rather than more complex because of the complexities of a formal grammar. Of course, however complex the grammar, a table-generating tool - like ANTLR - will generate code, however complex, which will faithfully reflect the inputted grammar. However, none of the other properties in FOP use a table-generating tool like ANTLR - and I'm not sure what the consequences would be to FOP of introducing such a tool. Given the complexities of the grammar, I'm sure that a recursive descent parser will be quite complex, and if we are going to use a grammar driven approach we would be better off with a tool that generates parsers from grammars rather than the recursive descent approach. Also an advantage of parser generators is that one doesn't have to rewrite so much code to correct a mistake in one's grammar, if one makes a mistake, or if the grammar changes. Recursive descent parsing can pose its own maintenance nightmares. The current approach in FOP for font-shorthand is obscurely written but strikes me as basically sound. 1) One parses from right-to-left using the fact that spaces divide tokens 2) One lets property makers determine whether they apply to a token. Each property maker is a little parser of the token one feeds it. Because the property makers determine whether they apply to a token, one can handle the fact that variant, weight and style can occur in any order by feeding the current token to each of the property makers for font-variant, font-weight, and font-style in turn. Whatever they accept is ipso-facto a font-variant or a font-weight or font-style. Just want to let you know I take the problem seriously, and I'm not trying to duck the responsibility of coming up with an adequate solution. I'm not sure what I did fits into a "job priority" which is why I spent many hours this weekend on this research. You are free to disagree with my observations and I notice that on fop-dev forums you challenge us to make the code simpler, more reusable, and better structured. Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600