On Mar 3, 2008, at 7:30 AM, Michel Fortin wrote:
Allan Odgaard wrote:
4. A regexp which is pushed onto a stack when entering the context of
this rule, and popped again when leaving this rule.

The fourth item here is really the interesting part, because it is
what made Markdown nesting work (99% of the time) despite this being
100% rule-driven.

I'm not sure what the regular expression in 4 does, besides being pushed and popped from the stack (perhaps it's the end-of-block expression), but overall it looks pretty good, and is pretty similar to how I'm currently approaching the problem. There are a couple of subtleties I'm not sure these rules can catch, though.

I assume Allan let the grammar refer back to this stack as if it were an ordinary rule, so you could use the stack to collect levels of indentation. It's like a limited kind of parameterization. I'd been planning to use recursive transformation to handle nesting, since it makes memoization easier and ought to be a little more readable. But I'll try Allan's idea if mine gets hairy.
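
To make the stack idea concrete, here is a rough Python sketch of how I imagine it could work. This is my guess, not Allan's actual implementation: the names strip_prefixes, BLOCKQUOTE and LIST_INDENT are made up for illustration, and the two patterns are simplified stand-ins for the real Markdown rules. Each rule pushes a prefix pattern when it is entered, and a line belongs to the innermost context only if every stacked pattern matches in order.

```python
import re

# Hypothetical prefix patterns; real Markdown prefixes are more involved.
BLOCKQUOTE = re.compile(r"> ?")    # pushed on entering a blockquote
LIST_INDENT = re.compile(r" {4}")  # pushed on entering a list item


def strip_prefixes(line, stack):
    """Consume each stacked prefix in order.

    Returns the rest of the line, or None if some enclosing
    context does not continue on this line.
    """
    pos = 0
    for pattern in stack:
        m = pattern.match(line, pos)
        if m is None:
            return None
        pos = m.end()
    return line[pos:]


stack = []
stack.append(BLOCKQUOTE)   # entering the blockquote rule
stack.append(LIST_INDENT)  # entering a list item nested inside it
inner = strip_prefixes(">     nested text", stack)
stack.pop()                # leaving the list item
outer = strip_prefixes("> quoted text", stack)
```

The point of the sketch is only that the stack carries the nesting depth, so no rule has to know how deeply it is nested.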

I like the direction you're both going, and I'm hoping we can come up with a definition that doesn't use any English at all. Admittedly, that'll be a lot easier for a version that does change some behavior at the edges -- like ditching Markdown's undocumented 'precedence rules' (<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000746.html>).

I'm going to build my own little prototype to experiment with this stuff (<http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001042.html>). My goal is to come up with a formal grammar that doubles as a (slow) reference implementation. You'll feed a grammar and an input file into a generic text-munging tool, which will spit out either the transformed output or an AST. The tool will be small, easy to port, and completely general -- you could use it to implement html2txt or smartypants or an HTML sanitizer, for example. That's the plan, anyway; we'll see how the first iteration turns out.

The way I see it, rules need to be parametrized so the above can be changed without having to define 2^(number of syntax elements) rules, such as EmphasisWithinLink, LinkWithinEmphasis, CodeSpanWithinLinkWithinEmphasis, and so on.
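
One way to picture that kind of parametrization, as a small Python sketch rather than anyone's actual design: a single inline rule takes the set of constructs that are currently forbidden, and each construct adds itself to that set before recursing. The name render_inline, the "link"/"emphasis" labels and the two simplified patterns are assumptions made up for the example, and the output is not HTML-escaped.

```python
import re

# Simplified, hypothetical patterns; not the real Markdown span syntax.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")
EMPH = re.compile(r"\*([^*]+)\*")


def render_inline(text, forbidden=frozenset()):
    """Render spans, skipping any construct named in `forbidden`."""
    out = []
    pos = 0
    while pos < len(text):
        if "link" not in forbidden:
            m = LINK.match(text, pos)
            if m:
                # Inside a link, links are no longer allowed.
                inner = render_inline(m.group(1), forbidden | {"link"})
                out.append('<a href="%s">%s</a>' % (m.group(2), inner))
                pos = m.end()
                continue
        if "emphasis" not in forbidden:
            m = EMPH.match(text, pos)
            if m:
                # Inside emphasis, emphasis is no longer allowed.
                inner = render_inline(m.group(1), forbidden | {"emphasis"})
                out.append("<em>%s</em>" % inner)
                pos = m.end()
                continue
        out.append(text[pos])  # no rule applies: copy one character
        pos += 1
    return "".join(out)


emph_around_link = render_inline("*a [b](u) c*")
link_around_emph = render_inline("[*x*](u)")
```

With the parameter, "emphasis within link" and "link within emphasis" are the same two rules called with different arguments, so the rule count grows with the number of constructs instead of with their combinations.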

Since I'm doing something packrat-ish, I'm hoping I can use lookahead to keep the rules from exploding.
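
For anyone unfamiliar with the term, here is a toy Python sketch of what I mean by packrat-ish with lookahead. It is not my prototype: make_parser and the three rule names are made up for the example, and the grammar (a "*", one or more non-"*" characters, then a "*") is far simpler than real emphasis. Each rule is a function from a position to an end position or None, memoized per position, and plain_char uses a negative lookahead on star instead of a separate "character inside emphasis" rule.

```python
from functools import lru_cache


def make_parser(text):
    """Build memoized rule functions over `text` (hypothetical toy grammar)."""

    @lru_cache(maxsize=None)
    def star(pos):
        # star <- "*"
        return pos + 1 if text.startswith("*", pos) else None

    @lru_cache(maxsize=None)
    def plain_char(pos):
        # plain_char <- !star .
        # Negative lookahead: succeed only where star would fail.
        if pos < len(text) and star(pos) is None:
            return pos + 1
        return None

    @lru_cache(maxsize=None)
    def emphasis(pos):
        # emphasis <- star plain_char+ star
        cur = star(pos)
        if cur is None:
            return None
        nxt = plain_char(cur)
        if nxt is None:
            return None  # need at least one plain character
        while nxt is not None:
            cur = nxt
            nxt = plain_char(cur)
        return star(cur)

    return emphasis


end = make_parser("*hi* there")(0)
```

Because every (rule, position) result is cached, a lookahead that probes a rule costs nothing extra when the same rule is tried again at that position.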

John Fraser
_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
