On Mar 3, 2008, at 7:30 AM, Michel Fortin wrote:
Allan Odgaard wrote:
4. A regexp which is pushed onto a stack when entering the context of
this rule, and popped again when leaving this rule.
The fourth item here is really the interesting part, because it is
what made Markdown nesting work (99% of the time) despite this being
100% rule-driven.
I'm not sure what the regular expression in 4 does, besides being
pushed and popped from the stack (perhaps it's the end-of-block
expression), but overall it looks pretty good, and is pretty similar
to how I'm currently approaching the problem. There are a couple of
subtleties I'm not sure these rules can catch, though.
I assume Allan let the grammar refer back to this stack as if it were
an ordinary rule, so you could use the stack to collect levels of
indentation. It's like a limited kind of parameterization. I'd been
planning to use recursive transformation to handle nesting, since it
makes memoization easier and ought to be a little more readable. But
I'll try Allan's idea if mine gets hairy.
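To make the stack idea concrete, here is a minimal Python sketch of how I read it. This is my guess, not Allan's actual implementation: each rule pushes a prefix pattern when it is entered, and every later line has to match the pushed patterns in order to stay inside those contexts. The names BLOCKQUOTE, LIST_INDENT, strip_context and leave_unmatched are made up for illustration.

```python
import re

# Hypothetical prefix patterns; a real grammar would define its own.
BLOCKQUOTE = re.compile(r"> ?")
LIST_INDENT = re.compile(r" {4}")


def strip_context(line, stack):
    """Match each pushed pattern in order at the start of `line`.

    Returns (depth, rest): how many stack entries matched, and the text
    left over after the matched prefixes.
    """
    pos = 0
    for depth, pattern in enumerate(stack):
        m = pattern.match(line, pos)
        if m is None:
            return depth, line[pos:]
        pos = m.end()
    return len(stack), line[pos:]


def leave_unmatched(stack, depth):
    """Pop every context whose prefix the current line failed to match."""
    del stack[depth:]


stack = [BLOCKQUOTE, LIST_INDENT]      # inside a list item inside a blockquote
depth, rest = strip_context("> back in the quote", stack)
leave_unmatched(stack, depth)          # the list-item context is popped
```

The number of matched entries tells the parser how many enclosing contexts the line still belongs to, which is roughly what I mean by collecting levels of indentation.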
I like the direction you're both going, and I'm hoping we can come up
with a definition that doesn't use any English at all. Admittedly,
that'll be a lot easier for a version that does change some behavior
at the edges -- like ditching Markdown's 'undocumented precedence
rules' (<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000746.html>).
I'm going to build my own little prototype to experiment with this
stuff (<http://six.pairlist.net/pipermail/markdown-discuss/2008-February/001042.html>).
My goal is to come up with a formal grammar that doubles as a
(slow) reference implementation. You'll feed a grammar and an input
file into a generic text-munging tool, which will spit out either the
transformed output or an AST. The tool will be small, easy to port,
and completely general -- you could use it to implement html2txt or
smartypants or an HTML sanitizer, for example. That's the plan,
anyway; we'll see how the first iteration turns out.
The way I see it, rules need to be parametrized so the above can be
changed without having to define 2^(number of syntax elements)
rules, such as EmphasisWithinLink, LinkWithinEmphasis,
CodeSpanWithinLinkWithinEmphasis, and so on.
Since I'm doing something packrat-ish, I'm hoping I can use lookahead
to keep the rules from exploding.
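As a rough illustration of the parameterization Michel describes, here is a small Python sketch in which a single inline rule takes the set of enclosing constructs as an argument, so there is no need for separate EmphasisWithinLink-style rules. Everything in it is an assumption made for the example: the DELIMS table, the toy syntax (`*...*` for emphasis, `[...]` for a link with no URL part), and the use of functools.lru_cache as a stand-in for packrat memoization. It is not the grammar or the tool discussed in this thread.

```python
from functools import lru_cache

# Hypothetical toy syntax: name -> (opening delimiter, closing delimiter).
DELIMS = {"em": ("*", "*"), "link": ("[", "]")}


def parse(text):
    @lru_cache(maxsize=None)   # memoized on (pos, active, closer), packrat-style
    def inlines(pos, active, closer):
        """Parse inline nodes from `pos` until `closer` or end of input."""
        nodes = []
        while pos < len(text) and text[pos] != closer:
            node, pos = inline(pos, active)
            nodes.append(node)
        return tuple(nodes), pos

    def inline(pos, active):
        """One rule for every construct; `active` names the enclosing ones."""
        ch = text[pos]
        for name, (opener, closer) in DELIMS.items():
            # A construct may not open inside itself (no link within a link).
            if ch == opener and name not in active:
                children, end = inlines(pos + 1, active | {name}, closer)
                if end < len(text):            # the closing delimiter was found
                    return (name, children), end + 1
        return ch, pos + 1                     # otherwise a literal character

    return inlines(0, frozenset(), None)[0]


tree = parse("*[a]*")      # emphasis containing a link
flat = parse("[[a]]")      # the inner "[" stays literal: links do not nest
```

The point is only that the set passed down as `active` does the work that a family of hand-written combination rules would otherwise have to do.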
John Fraser
_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss