Allan Odgaard wrote:

Though without changing a lot of edge-case behavior, I find it hard to see Markdown using such a rule-based implementation, so personally I am favoring a new Markdown-inspired language.

For my part, I'm currently trying to specify parsing rules for Markdown Extra, and to make the specification usable for parsing Markdown too. The idea is to preserve the way it works now, but to handle edge cases in a consistent and predictable manner. What I want to achieve is interoperability between implementations of the current Markdown and Markdown Extra languages, not to create a new look-alike language.

The problem so far has been that the formal syntax normally used to define grammars does not support Markdown’s notion of embedding, but as mentioned here http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001002.html I have had some success with a rule-based implementation that uses a stack for aggregating rules that need to be applied to the current line before it is handed to the regular parser -- this allows a specification without code and which is unambiguous about edge cases since the rules are exhaustive, unlike a document written in English.

I'd like to point out one thing: you can always write in English what you can write with a formal grammar; if you write things carefully, they'll be precise and unambiguous. This has the disadvantage of being more verbose, but the advantage that you don't need to learn a new "language", namely the grammar notation.

That said, I'm currently looking at how to specify Markdown formally. Whether to use a grammar or English is to be decided later. I'm looking at the general form of a rule, and I find the post you linked above gives pretty good insight into what I need. Each rule in your lost rule-based implementation had this (quoting):

1. A regexp that makes the parser enter the context the rule
represents (e.g. block quote, list, raw, etc.).

2. A list of which rules are allowed in the context of this rule.

3. A regexp for leaving the context of this rule.

4. A regexp which is pushed onto a stack when entering the context of
this rule, and popped again when leaving this rule.

The fourth item here is really the interesting part, because it is
what made Markdown nesting work (99% of the time) despite this being
100% rule-driven.
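
To make the four parts quoted above concrete, here is a minimal Python sketch of how such a rule might be represented. Everything in it is an assumption made for illustration: the `Rule` type, the `BLOCKQUOTE` rule and its regexps, and the `strip_prefixes` helper are hypothetical names, not taken from the implementation under discussion. It reads item 4 as a regexp that is applied to each incoming line while the context is open, which is only one possible interpretation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    enter: re.Pattern        # (1) regexp that enters the context
    allowed: tuple           # (2) names of rules allowed inside this context
    leave: re.Pattern        # (3) regexp that leaves the context
    line_prefix: re.Pattern  # (4) regexp pushed on the stack while inside


# Hypothetical block quote rule; the patterns are simplified for illustration.
BLOCKQUOTE = Rule(
    name="blockquote",
    enter=re.compile(r" {0,3}> ?"),
    allowed=("blockquote", "list", "paragraph"),
    leave=re.compile(r"\s*$"),
    line_prefix=re.compile(r" {0,3}> ?"),
)


def strip_prefixes(stack, line):
    """Apply each stacked regexp to the line, outermost context first.

    Returns what is left for the regular parser, or None when one of the
    stacked regexps does not match the line.
    """
    for pattern in stack:
        match = pattern.match(line)
        if match is None:
            return None
        line = line[match.end():]
    return line


# Two nested block quotes: the same regexp has been pushed twice.
stack = [BLOCKQUOTE.line_prefix, BLOCKQUOTE.line_prefix]
inner = strip_prefixes(stack, "> > nested text")
shallow = strip_prefixes(stack, "> only one level")
```

Here `inner` is the text handed on to the regular parser, and `shallow` is `None` because the second stacked regexp does not match. What a real implementation should do in that case (leave the context, or treat the line as a lazy continuation) is deliberately left open.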

I'm not sure what the regular expression in 4 does, besides being pushed onto and popped from the stack (perhaps it's the end-of-block expression), but overall it looks pretty good, and is pretty similar to how I'm currently approaching the problem. There are a couple of subtleties I'm not sure these rules can catch, though.

In my idea, you'd have parametrized rules. For instance, the list of allowed rules (2) should change depending on the context: you shouldn't have a link within a link, but you can have emphasis in your link; therefore, the emphasis rule, when within a link, shouldn't have a link rule in its list of sub-rules (2). You also need a way for the regular expression in 3 to vary depending on what you caught in 1 (to match the same number of backticks in a code span, for instance, or to catch a matching closing HTML tag).

The way I see it, rules need to be parametrized so the above can be changed without having to define 2^(number of syntax elements) rules, such as EmphasisWithinLink, LinkWithinEmphasis, CodeSpanWithinLinkWithinEmphasis, and so on.
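
As a rough illustration of what "parametrized" could mean here, the Python sketch below passes the context in as a parameter instead of defining one rule per combination. The names `INLINE`, `sub_rules` and `code_span_closer` are hypothetical, and the rule set is deliberately tiny; this is a sketch under those assumptions, not a proposed specification.

```python
import re

# Hypothetical set of inline rule names, for illustration only.
INLINE = frozenset({"emphasis", "link", "code_span"})


def sub_rules(rule_name, excluded=frozenset()):
    """Return (allowed sub-rules, exclusions to pass further down).

    `excluded` holds the rules forbidden by enclosing contexts, so a single
    emphasis rule serves both inside and outside a link.
    """
    if rule_name == "link":
        # No link within a link, at any depth below this one.
        excluded = excluded | {"link"}
    if rule_name == "code_span":
        # Nothing is parsed inside a code span.
        return frozenset(), excluded
    return INLINE - excluded, excluded


def code_span_closer(opening):
    """Build the leave regexp (3) from what the enter regexp (1) matched:
    a run of exactly as many backticks as the opening run."""
    ticks = opening.group(0)
    return re.compile(r"(?<!`)" + re.escape(ticks) + r"(?!`)")


# Emphasis inside a link inherits the "no link" exclusion.
in_link, link_excluded = sub_rules("link")
in_emphasis_in_link, _ = sub_rules("emphasis", link_excluded)

# A code span opened with two backticks is closed only by two backticks.
text = "`` a ` b `` c"
opening = re.match(r"`+", text)
rest = text[opening.end():]
closing = code_span_closer(opening).search(rest)
span_content = rest[:closing.start()]
```

With this shape, the number of rule definitions stays proportional to the number of syntax elements, since the combinations are produced at parse time by threading `excluded` through the nesting.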


Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/


_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss