Re: evolving the spec (was: forking Markdown.pl?)

Michel Fortin Mon, 03 Mar 2008 04:30:11 -0800

Allan Odgaard wrote:

Though without changing a lot of edge-case behavior, I find it hard to see Markdown using such rule-based implementation, so personally I am favoring a new Markdown-inspired language.

For my part, I'm currently trying to specify parsing rules for Markdown Extra, and to make the specification usable to parse Markdown too. The idea is to preserve the way it is working now, but to handle edge cases in a consistent and predictable manner. What I want to achieve is interoperability between implementations for the current Markdown and Markdown Extra languages, not creating a new look-alike language.

The problem so far has been that the formal syntax normally used to define grammars does not support Markdown’s notion of embedding, but as mentioned here http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001002.html I have had some success with a rule-based implementation that uses a stack for aggregating rules that need to be applied to the current line before it is handed to the regular parser -- this allows a specification without code and which is unambiguous to edge-cases since the rules are exhaustive, unlike a document written in English.

I'd like to point out one thing: you can always write in English what you can with a formal grammar; if you write things correctly, they'll be precise and unambiguous. This has the disadvantage of being more verbose, but the advantage that you don't need to learn a new "language", which is the grammar.

That said, I'm currently looking at how to specify Markdown formally. Whether to use a grammar or English is to be decided later. I'm looking at the general form of a rule, and I find the post you linked above gives pretty good insight into what I need. Each rule in your lost rule-based implementation had this (quoting):

1. A regexp that makes the parser enter the context the rule
represents (e.g. block quote, list, raw, etc.).

2. A list of which rules are allowed in the context of this rule.

3. A regexp for leaving the context of this rule.

4. A regexp which is pushed onto a stack when entering the context of
this rule, and popped again when leaving this rule.

The fourth item here is really the interesting part, because it is
what made Markdown nesting work (99% of the time) despite this being
100% rule-driven.
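
One possible reading of these four parts, sketched in Python. All names here (`Rule`, `BLOCKQUOTE`, `strip_prefixes`) are hypothetical, and treating item 4 as a line-prefix regexp that is stripped before the line reaches the regular parser is an assumption based on the description of the stack above, not a confirmed detail of the lost implementation:

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str
    enter: "re.Pattern"    # 1. regexp that enters the context
    allowed: List[str]     # 2. names of rules allowed inside this context
    leave: "re.Pattern"    # 3. regexp that leaves the context
    prefix: "re.Pattern"   # 4. regexp pushed on the stack while inside


# Hypothetical block quote rule; the patterns are illustrative only.
BLOCKQUOTE = Rule(
    name="blockquote",
    enter=re.compile(r"^> ?"),
    allowed=["blockquote", "list", "paragraph"],
    leave=re.compile(r"^\s*$"),
    prefix=re.compile(r"^> ?"),
)


def strip_prefixes(line: str, stack: List["re.Pattern"]) -> Optional[str]:
    """Apply each stacked regexp in order, outermost context first.

    Returns what is left of the line for the regular parser, or None
    if one of the stacked prefixes does not match.
    """
    for pattern in stack:
        match = pattern.match(line)
        if match is None:
            return None
        line = line[match.end():]
    return line


# Two nested block quotes mean the same prefix is on the stack twice.
stack = [BLOCKQUOTE.prefix, BLOCKQUOTE.prefix]
print(strip_prefixes("> > nested text", stack))  # nested text
```

Under this reading, nesting falls out of the stack depth: the inner parser never sees the `>` markers, only the text that remains once every enclosing context has consumed its own prefix.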

I'm not sure what the regular expression in 4 does, besides being pushed and popped from the stack (perhaps it's the end-of-block expression), but overall it looks pretty good, and is pretty similar to how I'm currently approaching the problem. There are a couple of subtleties I'm not sure these rules can catch, though.

In my idea, you'd have parametrized rules. For instance, the list of allowed rules (2) should change depending on the context: you shouldn't have a link within a link, but you can have emphasis in your link; therefore, the emphasis rule, when within a link, shouldn't have a link rule in its list of sub-rules (2). You also need a way for the regular expression in 3 to vary depending on what you caught in 1 (to match the same number of backticks in a code span, for instance; to catch a matching closing HTML tag; etc.).
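
To illustrate the second point, here is a minimal sketch in Python of a leaving expression (3) that is built from what the entering expression (1) caught, using code spans as the example. The function names are hypothetical and this only shows the idea; it is not how Markdown.pl or PHP Markdown actually do it:

```python
import re

# Rule part 1: a code span opens with a run of one or more backticks.
OPEN_CODE_SPAN = re.compile(r"`+")


def closing_pattern(opening):
    """Rule part 3, derived from the opener: a run of exactly as many
    backticks as the opener, not adjacent to any other backtick."""
    n = len(opening)
    return re.compile(r"(?<!`)`{%d}(?!`)" % n)


def first_code_span(text):
    """Return the content of the first code span in text, or None."""
    opened = OPEN_CODE_SPAN.search(text)
    if opened is None:
        return None
    closed = closing_pattern(opened.group()).search(text, opened.end())
    if closed is None:
        return None
    return text[opened.end():closed.start()]


print(first_code_span("a ``x ` y`` b"))  # x ` y
```

The single backtick inside the span does not close it, because the closing expression was built for a two-backtick opener. The same approach would apply to a closing HTML tag whose name is taken from the opening tag.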

The way I see it, rules need to be parametrized so the above can be changed without having to define 2^(number of syntax elements) rules, such as EmphasisWithinLink, LinkWithinEmphasis, CodeSpanWithinLinkWithinEmphasis, and so on.
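
As a rough sketch of what that parametrization could look like (in Python, with hypothetical names and a made-up table of restrictions), the list of allowed sub-rules can be computed from the chain of enclosing contexts instead of being written out once per combination:

```python
# One entry per syntax element, instead of one rule per combination.
INLINE_RULES = ("emphasis", "link", "code_span")

# What each rule forbids in everything nested beneath it.
# These particular restrictions are assumptions for illustration.
FORBIDS_BELOW = {
    "emphasis": set(),
    "link": {"link"},                # no link within a link
    "code_span": set(INLINE_RULES),  # nothing is parsed inside code
}


def allowed_rules(context_path):
    """Sub-rules allowed at the innermost context of context_path,
    where context_path lists the enclosing rules, outermost first."""
    forbidden = set()
    for name in context_path:
        forbidden |= FORBIDS_BELOW[name]
    return [rule for rule in INLINE_RULES if rule not in forbidden]


print(allowed_rules(["emphasis"]))          # ['emphasis', 'link', 'code_span']
print(allowed_rules(["link", "emphasis"]))  # ['emphasis', 'code_span']
```

Here "emphasis within a link" is not a separate rule: it is the ordinary emphasis rule evaluated with a parameter that says a link is already open somewhere above it.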



Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/


_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
