On 29/7/2006, at 23:22, Eric Astor wrote:

> > 1. interpreting tokens as literal text when end token is missing,
> > example: `this is __not starting bold`.
>
> This is actually simple to deal with in most formal grammars - since formal
> grammars are recursive, you simply define bold (for example) as:
> bold := ('__' SPAN '__') | ('**' SPAN '**')

Well, yes, you can put that in your formal grammar, but the generated parser will have a problem. Parsers generally tokenize the text first and then walk through it token by token, choosing which rule to apply based on the token they are currently looking at.

So this parser will only see the `__` token (not what follows) and will then pick the bold rule. If we have defined SPAN as not containing any `\n`, then when it reaches end-of-line it will give the error that it sees `\n` but expected `__`.
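To make the failure concrete, here is a minimal sketch in Python of a hand-written parser that behaves the way described above. It is not the output of any real parser generator; the token pattern and the names `tokenize`, `parse_line` and `ParseError` are made up for illustration, and SPAN is assumed to be anything up to the next `__` or `\n`.

```python
import re

# Hypothetical token set: the '__' delimiter, a newline, a run of ordinary
# text, or a lone underscore.
TOKEN_RE = re.compile(r"__|\n|[^_\n]+|_")


class ParseError(Exception):
    pass


def tokenize(text):
    return TOKEN_RE.findall(text)


def parse_line(tokens):
    """Parse one line, committing to the bold rule as soon as '__' is seen."""
    out = []
    i = 0
    while i < len(tokens) and tokens[i] != "\n":
        if tokens[i] == "__":
            # Only the current token is inspected, so the bold rule is chosen
            # without knowing whether a closing '__' exists.
            i += 1
            span = []
            while i < len(tokens) and tokens[i] not in ("__", "\n"):
                span.append(tokens[i])
                i += 1
            if i >= len(tokens) or tokens[i] != "__":
                found = repr(tokens[i]) if i < len(tokens) else "end of input"
                raise ParseError(f"saw {found} but expected '__'")
            out.append(("bold", "".join(span)))
            i += 1
        else:
            out.append(("text", tokens[i]))
            i += 1
    return out
```

Calling `parse_line(tokenize("this is __not starting bold\n"))` raises `ParseError`, while the same call on `"this is __bold__\n"` yields a text node followed by a bold node.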

Given a sufficiently large look-ahead (in parser terms, i.e. looking at the next n tokens) and defining some dummy rules to deal with isolated `__` it could possibly be pulled off, but it could likely still be fooled.
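A rough sketch of what that look-ahead plus a dummy rule might look like, again in hand-written Python rather than a generated parser. The function name `parse_line_with_lookahead` and the token pattern are assumptions for illustration, and the look-ahead here is simply "scan to the end of the line", which a typical fixed-n parser would not give you.

```python
import re

# Hypothetical token set: '__', newline, a run of ordinary text, or a lone '_'.
TOKEN_RE = re.compile(r"__|\n|[^_\n]+|_")


def parse_line_with_lookahead(text):
    tokens = TOKEN_RE.findall(text)
    out = []
    i = 0
    while i < len(tokens) and tokens[i] != "\n":
        if tokens[i] == "__":
            # Look ahead for a closing '__' before the end of the line.
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("__", "\n"):
                j += 1
            if j < len(tokens) and tokens[j] == "__":
                out.append(("bold", "".join(tokens[i + 1:j])))
                i = j + 1
                continue
            # Dummy rule: an isolated '__' is kept as literal text.
            out.append(("text", "__"))
        else:
            out.append(("text", tokens[i]))
        i += 1
    return out
```

This handles the unterminated case, but it only knows about one delimiter; once italic and nesting enter the picture the same trick has to be repeated for each combination, which is where it can still be fooled.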

A slightly related problem is the ambiguity when seeing `___` in the text. That will be tokenized as the two tokens `__` and `_`, i.e. first start bold, then italic. But the entire line could be: `___bold and__ only italic_`.

I.e. in this particular case it should have been tokenized as `_` and `__`.
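The mismatch can be shown with a small Python sketch. The greedy pattern and the helper `is_properly_nested` are hypothetical, and the nesting check is deliberately naive: it treats the first occurrence of a delimiter as an opener and the next as its closer.

```python
import re

# Hypothetical tokenizer that prefers the longest delimiter, as described above.
GREEDY_RE = re.compile(r"__|_|[^_]+")


def tokenize_greedy(text):
    return GREEDY_RE.findall(text)


def is_properly_nested(tokens):
    """Return True if every delimiter closes the innermost open one."""
    stack = []
    for tok in tokens:
        if tok in ("_", "__"):
            if stack and stack[-1] == tok:
                stack.pop()          # closes the innermost open delimiter
            elif tok in stack:
                return False         # would close a delimiter that is not innermost
            else:
                stack.append(tok)    # opens a new span
    return not stack


greedy = tokenize_greedy("___bold and__ only italic_")
# greedy == ['__', '_', 'bold and', '__', ' only italic', '_']

# The split that the line actually needs: italic opens first, then bold.
wanted = ["_", "__", "bold and", "__", " only italic", "_"]
```

Here `is_properly_nested(greedy)` is false and `is_properly_nested(wanted)` is true, so the tokenizer would have to know how the line ends in order to split `___` correctly.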

A workaround would be to use `*` for one of bold or italic and `_` for the other, e.g. `_**bold and** only italic_`. I.e. the strict parser would disallow three consecutive `*` or `_` if and only if bold has a longer span than italic.
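As a rough illustration of such a strict check, the Python sketch below flags any run of three or more identical delimiter characters. This is a simplification: it rejects every such run and does not attempt the span-length condition mentioned above. The name `has_ambiguous_run` is made up for the example.

```python
import re

# Three or more consecutive '*' or three or more consecutive '_'.
TRIPLE_RUN = re.compile(r"\*{3,}|_{3,}")


def has_ambiguous_run(line):
    """Return True if the line contains a delimiter run a strict parser might reject."""
    return TRIPLE_RUN.search(line) is not None
```

With this check, `___bold and__ only italic_` is flagged, while the mixed form `_**bold and** only italic_` passes because no single delimiter character repeats three times in a row.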
