Re: Formal Grammar — some thoughts

Michel Fortin Mon, 31 Jul 2006 12:03:34 -0700

On 30 Jul 2006, at 21:29, Allan Odgaard wrote:

> On 30/7/2006, at 22:34, Michel Fortin wrote:
>> [...] I'd like to point out that in my view John's implementation is already doing tokenization in some form [...]
> Well, this here [1] is what people generally refer to when speaking of tokenizing input.

Yeah, I know that isn't exactly a tokenization process. I just wanted to draw a parallel between the way Markdown currently works and a regular tokenizer. I called it *some form* of tokenization, and more often than not I put the word "token" in quotes to emphasise how loose the comparison is.

At the same time, I'm not sure I have a better name than "token" for these md5 hashes, should they ever be replaced by some other, non-hashing labeling scheme.
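For readers unfamiliar with the hashing trick under discussion, here is a minimal Python sketch of the idea. It is not John's actual code: the helper names `protect_code_spans` and `restore`, and both regexes, are hypothetical simplifications. One pass replaces each code span with the md5 digest of its HTML and records the pair in a table, so a later pass (here, a toy emphasis rule) cannot see inside it. A final step swaps the digests back.

```python
import hashlib
import re

# Hypothetical, simplified patterns: single backtick pairs, no escaping rules.
CODE_SPAN = re.compile(r"`([^`]+)`")
EMPHASIS = re.compile(r"\*([^*]+)\*")


def protect_code_spans(text, table):
    """Replace each code span with the md5 hex digest of its HTML.

    The digest -> HTML pair is stored in `table` so it can be restored later.
    """
    def repl(m):
        html = "<code>%s</code>" % m.group(1)
        key = hashlib.md5(html.encode("utf-8")).hexdigest()
        table[key] = html
        return key
    return CODE_SPAN.sub(repl, text)


def restore(text, table):
    """Swap every digest back for the HTML it stands for."""
    for key, html in table.items():
        text = text.replace(key, html)
    return text


table = {}
hidden = protect_code_spans("Use `*x*` here", table)
# The emphasis pass only sees a hex digest, so it leaves the code span alone.
hidden = EMPHASIS.sub(r"<em>\1</em>", hidden)
print(restore(hidden, table))  # Use <code>*x*</code> here
```

The asterisks inside the code span survive the emphasis pass because that pass only ever sees the 32-character hex digest, which is what makes the digests behave like opaque "tokens".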

> Now try the same on these two lines of text:
>
>     This `is raw [text`](#)
>
>     This is a [`link](#) and more text`
>
> If you choose to replace links with an md5 first, then the result of converting the first line will be wrong, whereas if you choose to convert raw first, the second line will be wrong.
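To make the ordering problem concrete, here is a minimal Python sketch. It assumes deliberately simplified regexes for code spans and inline links (they are not Markdown.pl's actual patterns). The hypothetical `render` function runs one pass per construct in a chosen order and hides each pass's output behind a numbered placeholder, standing in for the md5 hashes, so later passes cannot match inside it.

```python
import re

# Simplified, hypothetical patterns; not taken from any real implementation.
CODE_SPAN = re.compile(r"`([^`]+)`")
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)]*)\)")


def render(text, order):
    """Apply one regex pass per construct, in the given order.

    Each pass hides its HTML behind a numbered placeholder so that
    later passes cannot match inside it.
    """
    saved = []

    def stash(html):
        saved.append(html)
        return "\x00%d\x00" % (len(saved) - 1)

    passes = {
        "code": (CODE_SPAN, lambda m: stash("<code>%s</code>" % m.group(1))),
        "link": (INLINE_LINK,
                 lambda m: stash('<a href="%s">%s</a>' % (m.group(2), m.group(1)))),
    }
    for name in order:
        pattern, repl = passes[name]
        text = pattern.sub(repl, text)

    # A later placeholder may wrap an earlier one, so restore newest first.
    for i in reversed(range(len(saved))):
        text = text.replace("\x00%d\x00" % i, saved[i])
    return text


line1 = "This `is raw [text`](#)"
line2 = "This is a [`link](#) and more text`"
for order in (["link", "code"], ["code", "link"]):
    print(order)
    print("  ", render(line1, order))
    print("  ", render(line2, order))
```

With links first, the first line's code span is broken up by the link; with code spans first, the second line's link is swallowed by the code span. Neither fixed order gets both lines the way a reader would expect.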

What's wrong and what's right here? One could argue that, since the syntax description doesn't define it, whichever construct comes first should win, with no priority given to one syntax construct over another. But the fact remains that the behavior is undefined, and John's reference implementation gives code spans priority over links.

> This is easy to handle with a real parser, actually; even a regexp can do it. There is little need for this multi-pass content obfuscation paradigm currently being used ;)

I thought a while ago about combining all the span-level regular expressions into one big expression: this would implement the whichever-comes-first rule. But I don't see the multi-pass approach as wrong either: it simply implements some priority relationship among the different syntax constructs.
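Here is a minimal Python sketch of that single-expression idea. It assumes simplified code-span and inline-link patterns (hypothetical, not taken from any implementation), and `render_first_wins` is an illustrative name. Both constructs become branches of one alternation. Because the regex engine scans left to right, whichever construct starts earlier in the text claims it.

```python
import re

# One alternation over two simplified (hypothetical) span patterns.
# re scans left to right, so the construct that starts first wins.
SPAN = re.compile(
    r"`(?P<code>[^`]+)`"                               # code span
    r"|\[(?P<text>[^\]]+)\]\((?P<href>[^)]*)\)"        # inline link
)


def render_first_wins(text):
    """Convert code spans and inline links in a single left-to-right scan."""
    def repl(m):
        if m.group("code") is not None:
            return "<code>%s</code>" % m.group("code")
        return '<a href="%s">%s</a>' % (m.group("href"), m.group("text"))
    return SPAN.sub(repl, text)


print(render_first_wins("This `is raw [text`](#)"))
print(render_first_wins("This is a [`link](#) and more text`"))
```

On the two lines quoted earlier, this yields a code span for the first and a link for the second. The sketch does not process spans nested inside link text; a fuller version would recurse on it.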

One question, though: is it really that important that these border cases be consistent across all implementations? No doubt it would be a good thing, but at what price in terms of implementation complexity?

>> [...] This is far from having a formal grammar, but it shows that a lot more could be done by reusing the current approach.
> Well, yes, a lot more can be done. But I think the energy would be better spent trying to move toward a more formal grammar and more standard parsing mechanisms. This is quite a challenge, and it can’t be done without revising some parts of the syntax; OTOH, the problematic parts (e.g. nested block elements) are often not handled consistently (or properly) by the current implementation, so I’d think it would be possible to tweak this a bit.

Formal grammar or not, the specification could certainly be revised to clarify a lot of edge cases. That said, I don't think the syntax should be allowed to *change* just to accommodate a formal grammar requirement.



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/


_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
