Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-24 Thread Aristotle Pagaltzis
* Yuri Takhteyev [EMAIL PROTECTED] [2008-05-23 08:35]:
 * Aristotle Pagaltzis [EMAIL PROTECTED] [2008-05-23 05:40]:
  I also agree with your opposition to them; if anything, one
  should filter the *output* of a Markdown-to-HTML conversion
  so that it won't matter whether people write literal `em`
  tags or use asterisks.
 
 This is true in theory... I actually just recently wrote
 something along those lines in Lua [1] to use with my Lua wiki.
 The idea is to do as you suggest: Convert from MD to HTML
 first, then filter the HTML. To make it safe, I parse HTML as
 XHTML and complain if it doesn't parse. Hence a problem: if the
 user screws up with their HTML (and my filter is pretty
 unforgiving), it becomes hard to communicate to them what went
 wrong. I can tell them where there is a problem in the overall
 HTML, but this doesn't help much, since the user didn't know
 there was all of this HTML to begin with.

It seems to me that filtering is a red herring in your case. If
you want to allow users to enter literal tags, you will have this
problem whether you filter the ultimate output or not.

 There is no easy way to show them where the problem occurred
 relative to the input that they provided, or to show them the
 content with just _their_ HTML escaped. So, a good solution in
 Markdown itself actually would be a good thing.

If your XHTML parser has a streaming input mode, you can couple
your Markdown converter directly to the XHTML parser and feed the
HTML output to it as you go. If the XHTML parser throws a well-
formedness error, you can then relate it to the vicinity of the
last Markdown chunk you converted to HTML and passed into the
XHTML parser.

It will sometimes be an earlier chunk; e.g. if the user writes
`&nbsp` (notice the missing semicolon) and this is exactly at the
end of the HTML chunk you pass to the XHTML parser, then the XHTML
parser will have to wait until the next chunk before it can decide
that the entity is broken.
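
A minimal sketch of that coupling in Python, using the standard
library's incremental expat parser as a stand-in for the XHTML parser.
The function name and the (source line, HTML) pairing are assumptions
made for illustration; a real converter would have to expose something
equivalent.

```python
import xml.parsers.expat


def first_bad_chunk(chunks):
    """Feed HTML chunks to an incremental XML parser, one at a time.

    `chunks` is a list of (markdown_line, html) pairs (a hypothetical
    interface). Returns None if everything is well-formed, otherwise
    (markdown_line, message) for the chunk that was being fed when the
    parser reported the error.
    """
    parser = xml.parsers.expat.ParserCreate()
    # Wrap all chunks in one root element so the sequence of
    # fragments forms a single document.
    parser.Parse("<root>", False)
    for markdown_line, html in chunks:
        try:
            parser.Parse(html, False)
        except xml.parsers.expat.ExpatError as err:
            return (markdown_line, str(err))
    try:
        parser.Parse("</root>", True)
    except xml.parsers.expat.ExpatError as err:
        # Only detectable at the very end, e.g. an element left open.
        last_line = chunks[-1][0] if chunks else None
        return (last_line, str(err))
    return None


chunks = [(1, "<p>ok</p>"), (3, "<p><em>oops</p>"), (5, "<p>fine</p>")]
print(first_bad_chunk(chunks))
```

The reported line is only the vicinity of the problem: the parser may
not notice an error until a later chunk arrives.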

If you don’t want to couple the Markdown converter with an XHTML
parser that closely, it’s still possible to do this, but the
Markdown converter will have to be able to accept streaming input
itself and will need to generate output sufficiently frequently
that you can track the correlation of input and output with a
useful amount of precision. The glue code that combines the
Markdown converter with the XHTML parser will have to do some
relatively hairy (tho not very complex) bookkeeping in that case.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/
___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss


Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-24 Thread Yuri Takhteyev
 It seems to me that filtering is a red herring in your case. If
 you want to allow users to enter literal tags, you will have this
 problem whether you filter the ultimate output or not.

If I want to allow them, then yes, but this is not the case I was
considering.  Suppose I do _not_ want to allow them to enter HTML
tags.  This is easy to implement as an option in a Markdown converter.
However, if the converter doesn't do that, then I have a much harder
task: user's tags are now mixed with Markdown's tags, and I have to
figure out how to sort them out.  There _is_ a difference between the
em inserted by markdown and the em inserted by the user.  I know
Markdown's em will be balanced.  I am not sure that the user's will
be.  At this point the only way to be sure that the HTML is valid is
to parse it.

 If your XHTML parser has a streaming input mode, you can couple
 your Markdown converter directly to the XHTML parser and feed the
 HTML output to it as you go. If the XHTML parser throws a well-
 formedness error, you can then relate it to the vicinity of the
 last Markdown chunk you converted to HTML and passed into the
 XHTML parser.

I am not quite sure what you mean, but Markdown documents can't always
be processed on a chunk by chunk basis. Consider:

Here is a [link][id].

... 100KB of text...

[id]: http://example.com/  Optional Title Here

This document cannot be processed correctly unless it's considered all
at the same time.
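
To make the point concrete, here is a rough Python sketch of
reference-link resolution. The regular expressions are deliberately
simplified (titles are ignored, nothing is escaped) and are not taken
from any real Markdown implementation; the function name is made up.

```python
import re

# Simplified pattern for "[id]: url" definition lines.
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+).*$", re.MULTILINE)
# Simplified pattern for "[text][id]" reference links.
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def resolve_references(text):
    # Pass 1: scan the whole document for definitions.
    urls = {m.group(1).lower(): m.group(2) for m in DEFINITION.finditer(text)}
    body = DEFINITION.sub("", text)

    # Pass 2: only now can each reference be turned into a link.
    def link(m):
        label = m.group(1)
        key = (m.group(2) or label).lower()
        if key not in urls:
            return m.group(0)  # leave unresolved references untouched
        return '<a href="%s">%s</a>' % (urls[key], label)

    return REFERENCE.sub(link, body)


first_chunk = "Here is a [link][id].\n"
definition = "[id]: http://example.com/  Optional Title Here\n"
print(resolve_references(first_chunk))               # reference stays unresolved
print(resolve_references(first_chunk + definition))  # reference becomes a link
```

The first chunk on its own cannot be converted correctly, because the
definition it depends on may sit anywhere later in the input.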

 If you don't want to couple the Markdown converter with an XHTML
 parser that closely, it's still possible to do this, but the
 Markdown converter will have to be able to accept streaming input
 itself and will need to generate output sufficiently frequently
 that you can track the correlation of input and output with a
 useful amount of precision.

Sure, if you want to drop support for references, footnotes, etc.  But
it's much simpler to implement a safe mode that escapes or validates
all HTML submitted by the user.

  - yuri

-- 
http://sputnik.freewisdom.org/


Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-24 Thread Aristotle Pagaltzis
* Yuri Takhteyev [EMAIL PROTECTED] [2008-05-24 21:35]:
 * Aristotle Pagaltzis [EMAIL PROTECTED] [2008-05-24 11:15]:
  If your XHTML parser has a streaming input mode, you can
  couple your Markdown converter directly to the XHTML parser
  and feed the HTML output to it as you go. If the XHTML parser
  throws a well-formedness error, you can then relate it to
  the vicinity of the last Markdown chunk you converted to HTML
  and passed into the XHTML parser.
 
 I am not quite sure what you mean, but Markdown documents can't
 always be processed on a chunk by chunk basis. Consider:
 
 Here is a [link][id].
 
 ... 100KB of text...
 
 [id]: http://example.com/  Optional Title Here
 
 This document cannot be processed correctly unless it's
 considered all at the same time.

Good point, so streaming the Markdown input is not possible. But
that doesn’t mean you can’t generate the output piecemeal and
also feed it to the XHTML parser that way.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-23 Thread Yuri Takhteyev
 I also agree with your opposition to them; if anything, one
 should filter the *output* of a Markdown-to-HTML conversion
 so that it won't matter whether people write literal `em`
 tags or use asterisks.

This is true in theory...  I actually just recently wrote something
along those lines in Lua [1] to use with my Lua wiki.  The idea is to
do as you suggest: Convert from MD to HTML first, then filter the
HTML.  To make it safe, I parse HTML as XHTML and complain if it
doesn't parse.  Hence a problem: if the user screws up with their HTML
(and my filter is pretty unforgiving), it becomes hard to communicate
to them what went wrong.  I can tell them where there is a problem in
the overall HTML, but this doesn't help much, since the user didn't
know there was all of this HTML to begin with.  There is no easy way
to show them where the problem occurred relative to the input that
they provided, or to show them the content with just _their_ HTML
escaped.  So, a good solution in Markdown itself actually would be a
good thing.

  - yuri

[1]: http://sputnik.freewisdom.org/lib/xssfilter/

-- 
http://sputnik.freewisdom.org/


Optional features (was: Markdown Extra Specification (First Draft))

2008-05-22 Thread Aristotle Pagaltzis
* Sherwood Botsford [EMAIL PROTECTED] [2008-05-07 22:10]:
 THAT said, however, maintaining perfect backward compatibility
 slows down progress.

I don’t know. It seems to me perfect backward compatibility is
not even possible, considering that Markdown.pl is not set in
stone (John takes bug reports and writes fixes, every so often)
and yet is not formally defined anywhere. As such, there is no
way to say what is backward compatible and what isn’t. I think
at most, backcompat for the purposes of a spec for Markdown can
only be defined as targeting a particular feature set, but not
an exact implementation of it.

That is, after all, the entire reason for the spec effort in the
first place.

 Can markdown extra have a configuration file:
 The default behaviour is to emulate markdown.
 The configuration file allows for new features that don't fit
 well into the old set.

Optional features are dangerous and impede interoperability.

Everyone who ever thinks about chipping in on the design of
a spec should read [section 5 of RFC 3339][1]. That RFC is
a spec for a particular datetime format, but section 5 is
largely agnostic of the nature of the format, and lays down
the principles according to which the design decisions for
this format were made. [Section 5.3][2] is the part with
direct relevance to your stipulation, but the entire section
is readworthy.

[1]: http://tools.ietf.org/html/rfc3339#section-5
[2]: http://tools.ietf.org/html/rfc3339#section-5.3

One problem is that every new option leads to a geometric
increase in the number of feature combinations that have to be
tested.
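
As a back-of-the-envelope illustration in Python, assuming each option
is an independent on/off switch (the option names are made up):

```python
from itertools import product


def configurations(options):
    """Yield every on/off combination of the given option names."""
    for flags in product([False, True], repeat=len(options)):
        yield dict(zip(options, flags))


options = ["footnotes", "tables", "definition_lists"]  # hypothetical names
print(len(list(configurations(options))))  # 2 ** 3 = 8 combinations
```

Each additional switch doubles the number of configurations a
conforming implementation would have to be tested against.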

Another issue is that Markdown is a document format. If it has
many optional features, what are the chances that a document
ostensibly written in Markdown, which I send to you, will work in
your implementation of Markdown exactly as it did in mine? You
really, really don’t want to have to wonder.

This was a major reason why SGML mostly failed, for example, and
only gained traction when it was restandardised as XML. SGML had
legions of optional author-friendly features that made it an
extreme amount of work to implement a parser that correctly
implemented the entire spec. The XML working group sat down and
basically chucked out 95% of the optional features and made the
rest mandatory. The rest is history.

Optional features in a document format are an invitation for
interoperability problems. Since the entire point of the Markdown
spec effort was to reduce existing interoperability problems,
I strongly advise that as little as possible in the spec be made
optional. Ideally, nothing would be.

It is, mind, perfectly fine to have two (or maybe three?) specs
of which one is a superset of the other, as seems to be Michel’s
current thrust with Markdown vs Markdown Extra. Assuming that no
feature in either spec is optional, that means you would be able
to expect Markdown Extra documents to work in all Markdown Extra
processors, and all Markdown documents to work in all Markdown
and Markdown Extra processors. The scope of the problem is much
smaller in such a scenario, enough so to be perfectly tractable.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-22 Thread Allan Odgaard

On 22 May 2008, at 08:10, Aristotle Pagaltzis wrote:


 [...]
 Optional features are dangerous and impede interoperability.
 
 Everyone who ever thinks about chipping in on the design of
 a spec should read [section 5 of RFC 3339][1]. [...]


I love how it says:

[...] A format which includes rarely used options is
likely to cause interoperability problems [...] The
format defined below includes only one rarely used
option: fractions of a second. [...]

Which reminds me of when svn started to report fractions of seconds in  
their ‘svn log --xml’ output, causing a few log visualizers to break :)




Re: Optional features (was: Markdown Extra Specification (First Draft))

2008-05-22 Thread Michel Fortin

On 2008-05-22 at 2:10, Aristotle Pagaltzis wrote:


 It is, mind, perfectly fine to have two (or maybe three?) specs
 of which one is a superset of the other, as seems to be Michel’s
 current thrust with Markdown vs Markdown Extra. Assuming that no
 feature in either spec is optional, that means you would be able
 to expect Markdown Extra documents to work in all Markdown Extra
 processors, and all Markdown documents to work in all Markdown
 and Markdown Extra processors. The scope of the problem is much
 smaller in such a scenario, enough so to be perfectly tractable.


I perfectly agree with this by the way: optional features should be
kept to a minimum. It may be interesting to note there are currently
two configurable parsing-related[^1] settings in PHP Markdown:


Tab width (default = 4)

:   This one comes from a similar configuration option in
    Markdown.pl and is essentially the size in spaces for one
    indent through a Markdown document. When John Gruber says
    four spaces or one tab in his syntax description document,
    he really means tab-width spaces or one tab, where
    tab-width is a configurable parameter defaulting to 4.

    I'm not aware of anyone changing this parameter, and I'm not
    even sure of how well it works, but it is clear that changing
    this will break many documents written with a different tab
    width in mind.

No markup (default = false)
No entities (default = false)

:   This one prevents the parser from skipping over HTML tags
    and/or HTML character entities. I was originally opposed to
    it, and in some way I still am. I decided to add it because
    there were too many people attempting to disable HTML by
    preprocessing the input with strip_tags or a substitution
    regular expression without realizing they were breaking
    automatic links, code spans and code blocks with HTML in
    them, and sometimes blockquotes.

    I'm no fan of this mode, but I feel it was the best way to
    avoid people breaking the syntax by accident, so I've added
    it in.
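
To illustrate the kind of breakage described above, here is a rough
Python stand-in for the regular-expression approach. It is not PHP's
strip_tags, and the function name is made up for the example.

```python
import re


def naive_strip_tags(text):
    """Remove anything that looks like a tag, before Markdown sees it."""
    return re.sub(r"<[^>]*>", "", text)


source = "See <http://example.com/> and the `<em>` tag."
print(naive_strip_tags(source))
# prints: See  and the `` tag.
```

Both the automatic link and the contents of the code span are gone
before the Markdown parser ever gets a chance to interpret them.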

I'm not sure those features should be formally part of the spec. I  
believe however that if the spec is well written it should be pretty  
trivial to see what must be changed to achieve them.


[^1]:
    A parsing-related setting is a setting that changes the
    interpretation of the document given in output. The opposite
    is an output-related setting, which changes the HTML
    output but does not affect the interpretation the parser
    makes of the document.


Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/

