Re: Formal Grammar — some thoughts

Allan Odgaard Sat, 29 Jul 2006 15:02:12 -0700

On 29/7/2006, at 23:22, Eric Astor wrote:

1. interpreting tokens as literal text when end token is missing,
example: `this is __not starting bold`.

This is actually simple to deal with in most formal grammars - since formal grammars are recursive, you simply define bold (for example) as:
bold := ('__' SPAN '__') | ('**' SPAN '**')

Well, yes, you can put that in your formal grammar, but the generated parser will have a problem. Parsers generally tokenize the text and then go through it token-by-token, selecting which rule to pick.

So this parser will only see the `__` token (not what follows) and will then pick the bold rule. If we have defined SPAN as not containing any `\n`, then when it reaches end-of-line it will give the error that it sees `\n` but expected `__`.
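To make the failure concrete, here is a minimal sketch in Python of a hand-written parser that commits to the bold rule as soon as it sees `__`. It is not the output of any real parser generator; the token set and the names `tokenize`, `parse_line` and `ParseError` are assumptions made up for the illustration.

```python
import re

# Assumed token set for this sketch: "__", "\n", a run of other text, or a lone "_".
TOKEN_RE = re.compile(r"__|\n|[^_\n]+|_")


class ParseError(Exception):
    """Raised when the committed rule cannot be completed."""


def tokenize(text):
    return TOKEN_RE.findall(text)


def parse_line(tokens):
    """Go through the tokens one at a time, picking a rule from the current token only."""
    nodes = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "__":
            # Commit to the bold rule without looking at what follows.
            i += 1
            span = []
            # SPAN may not contain "\n" (or another "__").
            while i < len(tokens) and tokens[i] not in ("__", "\n"):
                span.append(tokens[i])
                i += 1
            if i == len(tokens) or tokens[i] != "__":
                found = "end of input" if i == len(tokens) else repr(tokens[i])
                raise ParseError(f"saw {found} but expected '__'")
            i += 1  # consume the closing "__"
            nodes.append(("bold", "".join(span)))
        else:
            nodes.append(("text", tokens[i]))
            i += 1
    return nodes
```

Calling `parse_line(tokenize("this is __not starting bold\n"))` raises `ParseError` at the newline instead of treating the `__` as literal text, while `"this is __bold__ text"` parses as expected.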

Given a sufficiently large look-ahead (in parser terms, i.e. looking at the next n tokens) and defining some dummy rules to deal with isolated `__` it could possibly be pulled off, but it could likely still be fooled.
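A rough sketch of what that might look like, again in Python and again with made-up names (`parse_with_lookahead` and the token set are assumptions, not part of any existing tool): before committing to bold, scan ahead on the same line for a closing `__`, and fall back to a dummy "literal `__`" rule when there is none.

```python
import re

# Assumed token set for this sketch: "__", "\n", a run of other text, or a lone "_".
TOKEN_RE = re.compile(r"__|\n|[^_\n]+|_")


def parse_with_lookahead(text):
    tokens = TOKEN_RE.findall(text)
    nodes = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "__":
            # Look ahead as far as needed on this line for the closing "__".
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("__", "\n"):
                j += 1
            if j < len(tokens) and tokens[j] == "__" and j > i + 1:
                # A non-empty span closed on the same line: pick the bold rule.
                nodes.append(("bold", "".join(tokens[i + 1:j])))
                i = j + 1
                continue
            # Dummy rule: an isolated "__" is just literal text.
            nodes.append(("text", "__"))
            i += 1
        else:
            nodes.append(("text", tokens[i]))
            i += 1
    return nodes
```

With this, `this is __not starting bold` keeps its `__` as literal text. It can still be fooled, though: for `snake__case and other__name` the sketch happily reports `case and other` as bold, which is probably not what the writer meant.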

A slightly related problem is the ambiguity when seeing `___` in the text. That will be tokenized as the two tokens `__` and `_`, i.e. first start bold, then italic. But the entire line could be: `___bold and__ only italic_`.

I.e. in this particular case it should have been tokenized as `_` and `__`.
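The tokenization problem can be shown with a small Python sketch of a longest-match tokenizer. The names `tokenize_greedy` and `is_properly_nested` are invented for the example, and the nesting check is a simplification that treats every delimiter as a toggle.

```python
import re

# Longest match first: "__" is tried before "_".
GREEDY_RE = re.compile(r"__|_|[^_]+")


def tokenize_greedy(text):
    return GREEDY_RE.findall(text)


def is_properly_nested(tokens):
    """Treat each "_" or "__" as a toggle; True if every span closes in stack order."""
    stack = []
    for tok in tokens:
        if tok not in ("_", "__"):
            continue
        if stack and stack[-1] == tok:
            stack.pop()        # closes the innermost open span
        else:
            stack.append(tok)  # opens a new span
    return not stack


greedy = tokenize_greedy("___bold and__ only italic_")
# The split that would have been needed for this particular line.
wanted = ["_", "__", "bold and", "__", " only italic", "_"]
```

Here `greedy` starts with `__` followed by `_`, so bold is opened before italic and then closed first, and `is_properly_nested(greedy)` is false. The `wanted` split nests correctly, but a tokenizer cannot know to choose it without seeing the rest of the line.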

A workaround would be using `*` for either the bold or italic. I.e. the strict parser would disallow three consecutive `*` or `_` if and only if bold has a longer span than italic.


_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
