On 30/7/2006, at 22:34, Michel Fortin wrote:

[...] I'd like to point out that in my view John's implementation is already doing tokenization in some form [...]

Well, this [1] is what people generally mean when they speak of tokenizing input.

[...] For example, let's create a link with a new "tokenized" way from this:

    __some text [with a link__ oh!](somewhere)

[...] See? No invalid nesting anymore!

Now try the same on these two lines of text:

    This `is raw [text`](#)

    This is a [`link](#) and more text`

If you choose to replace links with md5 placeholders first, the first line will come out wrong; if you choose to convert the raw `code` spans first, the second line will come out wrong.
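
To make the order dependence concrete, here is a toy reconstruction in Python. It is only a sketch of the md5-placeholder trick, not the actual Markdown.pl pass structure, and the regexes are deliberately naive:

    import hashlib
    import re

    LINK = re.compile(r'\[([^\]]*)\]\(([^)]*)\)')
    CODE = re.compile(r'`([^`]*)`')

    def links_first(text):
        # Pass 1: stash links behind md5 placeholders, pass 2: code
        # spans, pass 3: restore the stashed links.
        stash = {}
        def hide(m):
            key = hashlib.md5(m.group(0).encode()).hexdigest()
            stash[key] = '<a href="%s">%s</a>' % (m.group(2), m.group(1))
            return key
        text = LINK.sub(hide, text)
        text = CODE.sub(r'<code>\1</code>', text)
        for key, html in stash.items():
            text = text.replace(key, html)
        return text

    def code_first(text):
        # Same trick, opposite order: stash code spans, then do links.
        stash = {}
        def hide(m):
            key = hashlib.md5(m.group(0).encode()).hexdigest()
            stash[key] = '<code>%s</code>' % m.group(1)
            return key
        text = CODE.sub(hide, text)
        text = LINK.sub(r'<a href="\2">\1</a>', text)
        for key, html in stash.items():
            text = text.replace(key, html)
        return text

    print(links_first('This `is raw [text`](#)'))
    # -> This `is raw <a href="#">text`</a>   (the code span is lost)
    print(code_first('This is a [`link](#) and more text`'))
    # -> This is a [<code>link](#) and more text</code>   (the link is lost)

Whichever construct gets stashed first swallows the other one's delimiter, so a fixed pass order always sacrifices one of the two lines.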

This is easy to handle with a real parser; actually, even a regexp can do it. There is little need for the multi-pass content-obfuscation paradigm currently being used ;)
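
For instance, a single left-to-right pass where the leftmost construct wins handles both lines consistently. A minimal sketch in Python (not a full span parser, just the two constructs above):

    import re

    # One left-to-right scan: at each position try a code span or a
    # link; the leftmost opener wins, so nothing needs to be hashed
    # out and restored later.
    SPAN = re.compile(r'''
        `(?P<code>[^`]+)`                       # a `code span`
      | \[(?P<text>[^\]]*)\]\((?P<url>[^)]*)\)  # a [link](url)
    ''', re.VERBOSE)

    def spans(line):
        def emit(m):
            if m.group('code') is not None:
                return '<code>%s</code>' % m.group('code')
            return '<a href="%s">%s</a>' % (m.group('url'), m.group('text'))
        return SPAN.sub(emit, line)

    print(spans('This `is raw [text`](#)'))
    # -> This <code>is raw [text</code>](#)
    print(spans('This is a [`link](#) and more text`'))
    # -> This is a <a href="#">`link</a> and more text`

Both lines come out following the same rule (whichever construct opens first wins), which is exactly what a single tokenizing pass gives you for free.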

[...] This is far from having a formal grammar, but it shows that a lot more could be done by reusing the current approach.

Well, yes, a lot more can be done. But I think the energy would be better spent moving toward a more formal grammar and more standard parsing mechanisms. This is quite a challenge, and it can't be done without revising some parts of the syntax. On the other hand, the problematic parts (e.g. nested block elements) are often not handled consistently (or properly) by the current implementation anyway, so I'd think it would be possible to tweak them a bit.


[1] http://en.wikipedia.org/wiki/Lexer
