Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread John MacFarlane

+++ Sean Leonard [Jul 12 14 17:26 ]:

> I think I can move on to my next question:
>
> It seems that all Markdown content is expected to appear inside of a
> block-level element in HTML parlance; i.e., inside  or one of
> its block-level descendants (, , , , ...,
> etc.).
>
> I tried to do some <head> stuff, as in:
> http://johnmacfarlane.net/babelmark2/?text=%3Chead%3E%3Ctitle%3EHello+World%3C%2Ftitle%3E%3Cmeta+name%3D%22author%22+content%3D%22Alice%22%3E%3C%2Fhead%3E%0A%0AI+am+some+text.%0A%3Cdiv%3Eand+i+am+inside+*myself*%3C%2Fdiv%3E%0A%0AThe+end.
>
> And not surprisingly, the results are all over the place. Clearly this
> is not an effective way to communicate HTML metadata, since Markdown
> is designed to process HTML block-level content.
>
> Therefore, *when it matters*, what are strategies that Markdown users
> currently use to manage HTML metadata such as those metadata items
> defined in  and
> ?
>
> I am interested in items such as:
> title
> meta name info (author, generator, description, keywords)
> link rel (stylesheet, icon, etc.)
> language (either http-equiv content-language, or )
> date [not part of HTML, but see pandoc_title_block]
> ?


There is no standardization here.  However, pandoc has moved on to a
more flexible system allowing structured YAML metadata, which may be
placed anywhere in the document.

http://johnmacfarlane.net/pandoc/README.html#yaml-metadata-block

___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss


Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Aristotle Pagaltzis
* Alan Hogan  [2014-07-10 23:00]:
> You are entirely correct that there is a strong chance that this API
> call would actually send an updated copy of a JSON object including
> fields such as “title”, “date”, “url”, and “body”, the last of which
> may implicitly or explicitly be Markdown data. (And the MIME type on
> that call would be application/json or whatever.) But perhaps the most
> RESTful way to do this would be to send a plain Markdown file (as
> text/markdown).

Perhaps, yes, but not actually.

As far as REST is concerned, either approach is equally valid.

However, it is nicer if you can use a widespread MIME type that is more
specific than something ultra-generic like application/json, since these
generic MIME types tell you essentially nothing about the application-
level meaning of the data, which weakens the utility of intermediaries.
(I.e. a reverse proxy in front of your app might try to do clever stuff
based on the MIME type of a request; if your data is overly generically
labelled then the proxy must parse the response body to figure out what
type of data it is dealing with. Conversely for the same reason you also
don’t want to invent ultra-specific one-off MIME types, because existing
infrastructure will have no idea what type of thing that might be.)
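To make the labelling point concrete, here is a minimal sketch of a file server choosing a specific type for Markdown files rather than a generic one. It uses only Python's standard `mimetypes` module; the `text/markdown` label is the one under discussion in this thread, and the helper name `content_type_for` is made up for illustration.

```python
import mimetypes

# Register the label discussed in this thread for the usual extensions.
# (Assumption: the server wants to advertise Markdown files specifically.)
mimetypes.add_type("text/markdown", ".md")
mimetypes.add_type("text/markdown", ".markdown")


def content_type_for(path):
    """Return a Content-Type value for a file path (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to the most generic type when nothing better is known.
    return guessed or "application/octet-stream"
```

An intermediary that sees the specific label can act on it without opening the body, which is the advantage described above.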

But it is totally feasible for a few standard rules to be applied by the
server to extract metadata from the content of a Markdown document.

That is in fact exactly what my own hack for serving a directory full’a
Markdown files as a static site does. Furthermore,

* Sean Leonard  [2014-07-13 02:30]:
> I tried to do some <head> stuff, as in:
> http://johnmacfarlane.net/babelmark2/?text=%3Chead%3E%3Ctitle%3EHello+World%3C%2Ftitle%3E%3Cmeta+name%3D%22author%22+content%3D%22Alice%22%3E%3C%2Fhead%3E%0A%0AI+am+some+text.%0A%3Cdiv%3Eand+i+am+inside+*myself*%3C%2Fdiv%3E%0A%0AThe+end.
>
> And not surprisingly, the results are all over the place. Clearly this
> is not an effective way to communicate HTML metadata, since Markdown
> is designed to process HTML block-level content.

… I use a hacked Markdown processor that treats head-level elements just
like block-level elements (I find it a missed opportunity that at least
this much is not part of standard Markdown), then I run a HTML5 parser
over the output to normalise it and finally, I use an XSL transform
against the DOM from that to pull any remaining head elements up into
the head, before re-serialising the whole shebang.
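For readers who want the shape of that last step, a rough sketch follows. It is not the hack described above: it swaps the HTML5 parser and XSL transform for a plain regular expression, handles only `title`, `meta` and `link`, and the function name `hoist_head_elements` is invented for illustration.

```python
import re

# Matches <title>...</title>, or a single <meta ...> / <link ...> tag.
# (Assumption: a regex stand-in for a real HTML5 parser plus XSLT.)
HEAD_ELEMENT = re.compile(
    r"<title\b[^>]*>.*?</title>|<(?:meta|link)\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def hoist_head_elements(fragment):
    """Split processor output into (head_markup, body_markup)."""
    head_parts = HEAD_ELEMENT.findall(fragment)
    body = HEAD_ELEMENT.sub("", fragment).strip()
    return "".join(head_parts), body
```

The two returned strings can then be placed into a page template's head and body respectively.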

(The H1-as-title extraction is only a fallback. So I can give documents
an explicit title different from their first heading, or even provide
a title when there are no headings present.)

(I have designs for releasing this thing someday but its current form is
cobbled together too hackily to work for anyone else.)


Which leads me to:

* Waylan Limberg  [2014-07-13 03:30]:
> I should also point out that a number of projects will use the first
> <h1> Header in the document as the title. And if the file is stored on
> the file system, the creation and modification date may be pulled from
> the file system.

a) Yup, exactly.


* Waylan Limberg  [2014-07-10 17:55]:
> Why do we need a Mime Type?

b) I find the need for a MIME type trivially evident because I already
   have directories full of files with nothing but Markdown in them.


Regards,
-- 
Aristotle Pagaltzis // 


Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Fletcher T. Penney
These are some of the things that led me to release MultiMarkdown 9 years 
ago:

* The realization that Markdown documents could be complete documents, and not 
just a snippet of text to be inserted in a blog CMS

* That these complete documents would need some sort of metadata (Gruber was 
not a fan of this idea)

* That Markdown could be converted to more than just HTML (e.g. LaTeX, etc.)


The MultiMarkdown metadata syntax was based on a blosxom plugin (I believe it 
was simply called meta??)

I would recommend checking out MMD (in addition to pandoc as you mentioned) if 
you're interested in Markdown related tools that support metadata.



FTP


-- 
Fletcher T. Penney
fletc...@fletcherpenney.net 

On Jul 12, 2014, at 8:26 PM, Sean Leonard  wrote:

> It seems that all Markdown content is expected to appear inside of a 
> block-level element in HTML parlance; i.e., inside  or one of its 
> block-level descendants (, , , , ..., etc.).



> Therefore, *when it matters*, what are strategies that Markdown users 
> currently use to manage HTML metadata such as those metadata items defined in 
>  and 
> ?
> 
> I am interested in items such as:
> title
> meta name info (author, generator, description, keywords)
> link rel (stylesheet, icon, etc.)
> language (either http-equiv content-language, or )
> date [not part of HTML, but see pandoc_title_block]
> ?
> 
> I recognize that in many use cases, Markdown is for content fragments: stick 
> this blob of text somewhere in a page and be done with it. But increasingly 
> there are Markdown files (.md, .markdown) that are being treated as discrete 
> documents. So for those latter cases, some metadata is desirable.
> 
> Are the following also true (or aesthetically agreeable)?
> - there are no concerted CROSS-TOOL efforts to insert metadata into Markdown 
> streams
>  (I am aware of pandoc_title_block)
> - inserting metadata into Markdown streams in a CROSS-TOOL way would be 
> kludgey
>  e.g. use an inert comment at the top:
>  [/Title/]: # (This comment could include metadata)
>  (but nobody does this)
> 
> -Sean
> 





Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Waylan Limberg

> On Jul 12, 2014, at 8:52 PM, Karl Dubost  wrote:
> 
> 
>> On 13 Jul 2014, at 09:26, Sean Leonard  wrote:
>> Therefore, *when it matters*, what are strategies that Markdown users 
>> currently use to manage HTML metadata such as those metadata items defined in
> 
> search for multi-markdown.
> http://fletcher.github.io/MultiMarkdown-4/metadata

Yes, that is one example. A few other implementations have similar extensions. 
However, I think the best example is Jekyll [1], the static file generator 
behind GitHub Pages (admittedly, Jekyll is not a markdown parser, but a tool 
that uses one). Although its metadata syntax is not really that much different 
than the other metadata extensions, it is important to note that Jekyll 
supports more than one text format (markdown, textile). Behind the scenes, the 
code removes the "frontmatter" first (which it passes on to a YAML parser), 
then passes the remaining text on to the appropriate parser. The point is that 
the one file contains 2 documents: a YAML document and a markdown document; each 
parsed by a separate tool. So, while other markdown parsers may parse the 
frontmatter with the same tool, I still think of the metadata as being 
something other than markdown.
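The two-documents-in-one-file idea can be sketched in a few lines. This is not Jekyll's code, only an illustration assuming the front matter is delimited by lines consisting of three dashes at the very top of the file; the function name `split_front_matter` is hypothetical.

```python
def split_front_matter(source):
    """Return (front_matter, body); front_matter is '' when absent."""
    lines = source.splitlines(keepends=True)
    # The block must start on the very first line of the file.
    if not lines or lines[0].strip() != "---":
        return "", source
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front_matter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return front_matter, body
    # No closing delimiter: treat the whole file as body text.
    return "", source
```

The first string would go to a YAML parser and the second to whichever markup parser suits the file, so neither parser needs to know about the other.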

I should also point out that a number of projects will use the first <h1> 
Header in the document as the title. And if the file is stored on the file 
system, the creation and modification date may be pulled from the file system.  
Some even use the file name for the title (converting underscores to spaces and 
title casing). But those are the least flexible systems. The most flexible 
systems generally store the metadata in separate columns in a database 
alongside the markdown.
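A small sketch of those fallbacks, for illustration only. It assumes the title is the first ATX-style level-one heading (setext headings are ignored here), and the helper names are made up.

```python
import os
import re
from datetime import datetime, timezone

# First ATX level-one heading: "# Title", optionally closed with "#".
ATX_H1 = re.compile(r"^#[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def title_from_filename(path):
    """Convert underscores to spaces and title-case the file stem."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace("_", " ").title()


def guess_title(markdown_text, path):
    """Prefer the first level-one heading, else fall back to the file name."""
    match = ATX_H1.search(markdown_text)
    if match:
        return match.group(1)
    return title_from_filename(path)


def modified_time(path):
    """Modification date as recorded by the file system (UTC)."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
```

All three values are derived rather than declared, which is why these systems are the least flexible: the author cannot override any of them from inside the document.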

One thing is for certain, there is absolutely no standardization regarding 
metadata associated with markdown documents and many (most?) parsers do nothing 
to address the issue.

IMO, pure markdown is just human readable HTML fragments. That, I guess, is 
part of the reason why I asked why we need a mime type way back in my first 
response. Those HTML fragments don't really stand on their own, so why would a 
pure markdown file be transported on its own outside of some container that 
contains all that other metadata? Especially when that container already has a 
mime type of its own.

[1]: http://jekyllrb.com/docs/frontmatter/

Waylan


Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Shane McCarron
We did some work on accessible Markdown a year ago, adding RDFa and ARIA
markup to help add metadata to the content. I don't think I have any good
pointers right now, but it was all about making sure the generated HTML was
WCAG compliant and semantically meaningful.
On Jul 12, 2014 7:52 PM, "Karl Dubost"  wrote:

>
> On 13 Jul 2014, at 09:26, Sean Leonard  wrote:
> > Therefore, *when it matters*, what are strategies that Markdown users
> currently use to manage HTML metadata such as those metadata items defined
> in
>
> search for multi-markdown.
> http://fletcher.github.io/MultiMarkdown-4/metadata
>
> --
> Karl Dubost 🐄
> http://www.la-grange.net/karl/
>


Re: Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Karl Dubost

On 13 Jul 2014, at 09:26, Sean Leonard  wrote:
> Therefore, *when it matters*, what are strategies that Markdown users 
> currently use to manage HTML metadata such as those metadata items defined in 

search for multi-markdown.
http://fletcher.github.io/MultiMarkdown-4/metadata

-- 
Karl Dubost 🐄
http://www.la-grange.net/karl/



Markdown internal metadata Re: Markdown validity

2014-07-12 Thread Sean Leonard

I think I can move on to my next question:

It seems that all Markdown content is expected to appear inside of a 
block-level element in HTML parlance; i.e., inside  or one of its 
block-level descendants (, , , , ..., etc.).


I tried to do some <head> stuff, as in:
http://johnmacfarlane.net/babelmark2/?text=%3Chead%3E%3Ctitle%3EHello+World%3C%2Ftitle%3E%3Cmeta+name%3D%22author%22+content%3D%22Alice%22%3E%3C%2Fhead%3E%0A%0AI+am+some+text.%0A%3Cdiv%3Eand+i+am+inside+*myself*%3C%2Fdiv%3E%0A%0AThe+end.

And not surprisingly, the results are all over the place. Clearly this 
is not an effective way to communicate HTML metadata, since Markdown is 
designed to process HTML block-level content.


Therefore, *when it matters*, what are strategies that Markdown users 
currently use to manage HTML metadata such as those metadata items 
defined in  and 
?


I am interested in items such as:
title
meta name info (author, generator, description, keywords)
link rel (stylesheet, icon, etc.)
language (either http-equiv content-language, or )
date [not part of HTML, but see pandoc_title_block]
?

I recognize that in many use cases, Markdown is for content fragments: 
stick this blob of text somewhere in a page and be done with it. But 
increasingly there are Markdown files (.md, .markdown) that are being 
treated as discrete documents. So for those latter cases, some metadata 
is desirable.


Are the following also true (or aesthetically agreeable)?
- there are no concerted CROSS-TOOL efforts to insert metadata into 
Markdown streams

  (I am aware of pandoc_title_block)
- inserting metadata into Markdown streams in a CROSS-TOOL way would be 
kludgey

  e.g. use an inert comment at the top:
  [/Title/]: # (This comment could include metadata)
  (but nobody does this)
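To show what reading such inert comments back out might look like, here is a sketch. The convention is purely hypothetical (as noted, nobody does this): it assumes the label between the slashes is the key and the parenthesised text is the value, and the function name is invented.

```python
import re

# Matches lines such as:  [/Title/]: # (Some value)
# (Assumption: label = key, parenthesised text = value.)
META_COMMENT = re.compile(
    r"^\[/(?P<key>[^/\]]+)/\]:[ \t]*#[ \t]*\((?P<value>.*)\)[ \t]*$",
    re.MULTILINE,
)


def extract_comment_metadata(text):
    """Collect key/value pairs from inert reference-style comments."""
    return {
        match.group("key").lower(): match.group("value")
        for match in META_COMMENT.finditer(text)
    }
```

Because each such line is a link reference definition, a Markdown processor should drop it from the output, which is what makes the comment inert.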

-Sean



Re: Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Waylan Limberg

> On Jul 12, 2014, at 6:23 PM, Sean Leonard  wrote:
> 
> On 7/12/2014 12:31 PM, Waylan Limberg wrote:
>>> On Jul 12, 2014, at 2:52 PM, Michel Fortin  wrote:
>>> [snip]
>>> When you have a question like this, just try it in Babelmark 2:
>>> http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E
>> Yes, that's what we all do. And to answer your other question, notice that 
>> only two of the implementations on Babelmark2 failed. Remember, most of 
>> these implementations were written to be run on web servers. We can't have 
>> our web servers crashing just because a user submitted invalid markdown. 
>> What a parser doesn't understand it just passes through. What it 
>> misunderstands it garbles, but it is specifically designed to never choke.
>> 
>> As Michel alluded to, most parsers are simply a series of regular expression 
>> substitutions which are run in a predetermined order. If a regex never 
>> matches a part of the text, then that part passes through untouched. Yes, 
>> that means the HTML is parsed by regex - which we all know is a bad idea -- 
>> but it is not really parsed in the way that browsers parse HTML. The regex 
>> just finds anything surrounded by angle brackets and ignores it. With the 
>> exception of the limited block level stuff, we don't even care if there are 
>> opening and/or closing tags. Yes, that can result in improperly nested 
>> stuff, but that is the author's fault and the parser should not bring the 
>> whole server down for that. The Author can (should?) preview in a browser 
>> and fix it before publishing.
>> 
>> However, I should point out that while the above describes most parsers (as 
>> most are more or less direct ports of markdown.pl - which works this way), 
>> there are a few that use other methods under the hood. For example, a few 
>> generate a parse tree which is then fed into a renderer (I believe Pandoc 
>> works like that, which allows it to output many more formats than just 
>> HTML), but they are the rare exception.
> 
> I see.
> 
> Here is a real-world example of what I was citing:
> http://johnmacfarlane.net/babelmark2/?text=Hello+I+am+some+*text*.%0A%3Cdiv%3EHello+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F%22%3Ethat+is+nice%3C%2Fa%3E+chance+%26+circumstance%26hellip%3B%0A%0AThe+end.
> 
> Truly, it looks like there is great diversity in Markdown-land.
> 
> Ok, so any standard mentioning Historical Markdown cannot say that any 
> particular behavior is normative when it comes to HTML validity. Some check 
> for HTML (island) validity and behave differently; others don't. The end...I 
> guess.

Yes, but select "normalize" (which normalizes insignificant white space in the 
output), and the number of variations decreases. Unfortunately, there is 
absolutely no standardization in how the various implementations handle white 
space (I don't think I've seen two that match exactly in every corner case). 
Either way though, hit the "preview" button (top right of output) to see how 
the browser renders the output and all but a couple render in the browser 
exactly the same.

And that is what makes markdown so great. You don't need to know or understand 
HTML to write it if you are using markdown. And if you have only an elementary 
knowledge of HTML, you can break into HTML on those few occasions when markdown 
won't do what you need.

Waylan


Re: Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Sean Leonard

On 7/12/2014 12:31 PM, Waylan Limberg wrote:
>> On Jul 12, 2014, at 2:52 PM, Michel Fortin  wrote:
>> [snip]
>> When you have a question like this, just try it in Babelmark 2:
>> http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E
> Yes, that's what we all do. And to answer your other question, notice that only 
> two of the implementations on Babelmark2 failed. Remember, most of these 
> implementations were written to be run on web servers. We can't have our web 
> servers crashing just because a user submitted invalid markdown. What a parser 
> doesn't understand it just passes through. What it misunderstands it garbles, 
> but it is specifically designed to never choke.
>
> As Michel alluded to, most parsers are simply a series of regular expression 
> substitutions which are run in a predetermined order. If a regex never matches 
> a part of the text, then that part passes through untouched. Yes, that means 
> the HTML is parsed by regex - which we all know is a bad idea -- but it is not 
> really parsed in the way that browsers parse HTML. The regex just finds 
> anything surrounded by angle brackets and ignores it. With the exception of the 
> limited block level stuff, we don't even care if there are opening and/or 
> closing tags. Yes, that can result in improperly nested stuff, but that is the 
> author's fault and the parser should not bring the whole server down for that. 
> The Author can (should?) preview in a browser and fix it before publishing.
>
> However, I should point out that while the above describes most parsers (as 
> most are more or less direct ports of markdown.pl - which works this way), 
> there are a few that use other methods under the hood. For example, a few 
> generate a parse tree which is then fed into a renderer (I believe Pandoc works 
> like that, which allows it to output many more formats than just HTML), but 
> they are the rare exception.


I see.

Here is a real-world example of what I was citing:
http://johnmacfarlane.net/babelmark2/?text=Hello+I+am+some+*text*.%0A%3Cdiv%3EHello+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F%22%3Ethat+is+nice%3C%2Fa%3E+chance+%26+circumstance%26hellip%3B%0A%0AThe+end.

Truly, it looks like there is great diversity in Markdown-land.

Ok, so any standard mentioning Historical Markdown cannot say that any 
particular behavior is normative when it comes to HTML validity. Some 
check for HTML (island) validity and behave differently; others don't. 
The end...I guess.


Sean



Re: Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Waylan Limberg

> On Jul 12, 2014, at 2:52 PM, Michel Fortin  wrote:
> [snip]
> When you have a question like this, just try it in Babelmark 2:
> http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E

Yes, that's what we all do. And to answer your other question, notice that only 
two of the implementations on Babelmark2 failed. Remember, most of these 
implementations were written to be run on web servers. We can't have our web 
servers crashing just because a user submitted invalid markdown. What a parser 
doesn't understand it just passes through. What it misunderstands it garbles, 
but it is specifically designed to never choke.

As Michel alluded to, most parsers are simply a series of regular expression 
substitutions which are run in a predetermined order. If a regex never matches 
a part of the text, then that part passes through untouched. Yes, that means 
the HTML is parsed by regex - which we all know is a bad idea -- but it is not 
really parsed in the way that browsers parse HTML. The regex just finds 
anything surrounded by angle brackets and ignores it. With the exception of the 
limited block level stuff, we don't even care if there are opening and/or 
closing tags. Yes, that can result in improperly nested stuff, but that is the 
author's fault and the parser should not bring the whole server down for that. 
The Author can (should?) preview in a browser and fix it before publishing.
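A toy version of that design, to make the pass-through behaviour visible. The rule list is a made-up, heavily simplified assumption (real ports of markdown.pl run many more steps, including block-level ones), and `render_inline` is a hypothetical name.

```python
import re

# Ordered substitutions; order matters (strong must run before emphasis).
# (Assumption: three inline rules only, for illustration.)
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]


def render_inline(text):
    """Apply each rule in turn; text that no rule matches is left alone."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Nothing in this loop can reject its input: an unclosed tag or stray asterisk simply fails to match and comes out the other side unchanged.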

However, I should point out that while the above describes most parsers (as 
most are more or less direct ports of markdown.pl - which works this way), 
there are a few that use other methods under the hood. For example, a few 
generate a parse tree which is then fed into a renderer (I believe Pandoc works 
like that, which allows it to output many more formats than just HTML), but 
they are the rare exception.

Waylan


Re: Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Michel Fortin
On 12 Jul 2014, at 10:32, Sean Leonard  wrote:

> Markdown may have a concept of HTML validity. A Markdown processor that 
> identifies HTML in Markdown content may determine that the HTML is valid or 
> invalid. For example, it may identify  ... [end of document] as HTML 
> that is invalid because it lacks a closing  tag. Then, it has five 
> choices:
> 1. treat the invalid HTML as text--pass the text-as-text to the markup (i.e., 
> turn & into &amp; , < into &lt; , etc.)
> 2. treat the invalid HTML as Markdown--keep on processing the input and look 
> for markdown inside of it (thus *hello* inside the invalid HTML will get 
> marked up...and http://www.example.com/";>hello[end of 
> document] will become a real link with the literal text '' preceding it)
>  <-- this is the same behavior as "not identifying the text as HTML in the 
> first place"
> 3. pass the invalid HTML as HTML
> 4. attempt to fix the HTML...thus  href="http://www.example.com/";>hello[end of document] might become 
> http://www.example.com/";>hello
> 5. fail due to HTML invalidity
> 
> ?

Is that really a question?

1. Turning `&` and `<` into `&amp;` and `&lt;` is part of the official syntax 
rules. Hopefully every Markdown parser does that.
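As a sketch of that rule: encode an ampersand unless it already begins an entity, and encode a left angle bracket unless it looks like the start of a tag. The patterns are modelled loosely on the approach taken in markdown.pl; the Python function name is an assumption for illustration.

```python
import re

# "&" that does not start something shaped like &name; &#123; or &#x1F;
BARE_AMP = re.compile(r"&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)")
# "<" that is not followed by a character that could open a tag
BARE_LT = re.compile(r"<(?![a-z/?$!])", re.IGNORECASE)


def encode_amps_and_angles(text):
    """Escape bare & and < while leaving entities and tags untouched."""
    text = BARE_AMP.sub("&amp;", text)
    return BARE_LT.sub("&lt;", text)
```

For instance, the `&` in `AT&T` and the `<` in `4 < 5` get encoded, while `&copy;` and `<em>` are left as they are.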

2. 3. 4. 5. We have implementations doing all of that, probably mixing a few of 
those solutions depending on the exact error.

When you have a question like this, just try it in Babelmark 2:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E


-- 
Michel Fortin
michel.for...@michelf.ca
http://michelf.ca



Re: Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Aristotle Pagaltzis
* Sean Leonard  [2014-07-12 16:35]:
> However, if you have Markdown in the HTML content with markdown="1" as
> with PHP Markdown Extra, it is necessary to parse the HTML with
> something other than a straight HTML parser since the straight HTML
> parser will misinterpret the Markdown (e.g., & will be a validation
> error).

That parser is Markdown itself. You can already put Markdown inside
HTML tags, it’s just that normally Markdown will only parse the content
of inline tags like EM and SPAN, not block tags like P or DIV. This was
an explicit design choice. The markdown="1" attribute does nothing more
than turn off this distinction temporarily.

(The block tag rule allows you to write portions of your document as
plain old HTML when Markdown is insufficient, and also allows you to
pass stuff through Markdown several times (e.g. fragments in a CMS
getting passed through Markdown at various stages of page assembly)
without screwing up the document. I consider it the smartest choice in
the design of Markdown: the reason it has been adopted where other
syntaxes have remained confined to niches. It means almost any HTML
fragment is also a Markdown fragment, so it’s easy to add Markdown to
any publishing workflow that involves HTML somewhere even if it wasn’t
designed for that at all, and the content can then be ported piecemeal
instead of boil-the-ocean. Classic embrace-and-extend.)

> Therefore:
> Markdown has no concept of markdown validity.

Correct.

> Markdown may have a concept of HTML validity.

Not really. Individual processors may, but Markdown itself has nothing
to say about that. The original implementation of course is implemented
as a text substitution system, which means if you give it Markdown that
contains invalid HTML then you’ll simply get HTML that’s invalid in the
same way, to then be interpreted by the browser however the browser may.
My guess is that the majority of implementations behave equivalently to
this, though depending on their design they could differ completely.

Regards,
-- 
Aristotle Pagaltzis // 


Markdown validity Re: Agreeing on "Historical Markdown"

2014-07-12 Thread Sean Leonard

As I'm thinking about this, I have other questions:

Can a Markdown parser/processor fail? Is there a concept of Markdown 
validity--i.e., can Markdown content be invalid (from the perspective of 
Markdown, not (X)HTML)?


As I understand it:
A Markdown processor identifies Markdown control sequences (aka 
markdown, in lowercase) in a stream of text and converts these sequences 
to the target markup--namely (X)HTML.
A Markdown processor identifies (X)HTML in markdown and passes this 
content to the target markup.
 <-- Do Markdown processors (i.e., existing implementations) attempt to 
fix or normalize the markup (by deserializing and then reserializing the 
markup), or is it a straight pass? It sounds like whether or not a 
Markdown processor reserializes the markup is implementation-dependent; 
Gruber's syntax rules do not say. However, if you have Markdown in the 
HTML content with markdown="1" as with PHP Markdown Extra, it is 
necessary to parse the HTML with something other than a straight HTML 
parser since the straight HTML parser will misinterpret the Markdown 
(e.g., & will be a validation error).



Therefore:
Markdown has no concept of markdown validity. A Markdown processor never 
fails due to invalid markdown input. If a sequence of text is not 
recognized as markdown (i.e., control sequences), it is treated as text 
and passed accordingly to the target markup. (This property is directly 
related to the "degradation" feature of Markdown, namely, if your 
processor cannot understand the markdown, the output is "worse" than an 
author intended, but does not cause utter failure--the non-understood 
markdown is visible in the output. This is in contrast to HTML, where 
tags or attributes that are not understood have no effect on the 
presentation of the HTML.)


Markdown may have a concept of HTML validity. A Markdown processor that 
identifies HTML in Markdown content may determine that the HTML is valid 
or invalid. For example, it may identify  ... [end of document] as 
HTML that is invalid because it lacks a closing  tag. Then, it has 
five choices:
1. treat the invalid HTML as text--pass the text-as-text to the markup 
(i.e., turn & into &amp; , < into &lt; , etc.)
2. treat the invalid HTML as Markdown--keep on processing the input and 
look for markdown inside of it (thus *hello* inside the invalid HTML 
will get marked up...and href="http://www.example.com/";>hello[end of document] will become a 
real link with the literal text '' preceding it)
  <-- this is the same behavior as "not identifying the text as HTML in 
the first place"

3. pass the invalid HTML as HTML
4. attempt to fix the HTML...thus href="http://www.example.com/";>hello[end of document] might become 
http://www.example.com/";>hello

5. fail due to HTML invalidity

?

Sean
