Re: Metadata syntax (was Universal syntax for Markdown)
Oh My God I actually agree with Big Bird here. This whole discussion is getting far beyond what I think of as a lightweight markup system. Personally I think metadata should be processed separately from markdown data. Keep it unixy - one tool, one job. On 09/20/2011 01:03 PM, bowerb...@aol.com wrote: sherwood said: Well if your dogs are like mine, they will eat practically anything. Lately in addition to their kibble they've been catching pocket gophers and mice. A border collie is much less lovable with 'mouse breath' gophers and mice taste _great_ to a dog -- a dietary delicacy for many millennia now... it's your kibble they don't really care for. its redeeming feature is that it's so _easy_. but i bet there are several brands of kibble which your dogs still turn up their noses at. as the ad man replied, when asked why his costly campaign hadn't moved more units of the client's dogfood: dogs hate its taste. people are the same way. they'll eat a _lot_ of things, including some that you consider to taste _dreadful_ (e.g., ms-word), but that does not mean that they will eat _anything_. *** anyway, this conversation sounds confused... aside from questions of philosophy, it seems to me that there is confusion about just what sort of metadata we're all talking about, and how it's used, by whom, for what purposes... and so on and so forth, and hmm baby swing. but maybe i'm the only one confused...:+) you all seem like bright competent fellows, so i'm sure you'll get it all worked out, so i'm gonna go back to my sandbox and play. have a nice day... -bowerbird ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
On Tue, Sep 20, 2011 at 1:30 AM, David Sanson dsan...@gmail.com wrote: On Sep 19, 2011, at 4:02 PM, Rob McBroom mailingli...@skurfer.com wrote: Those sound like reasons for the metadata to *identify* the abstract, but I see no requirement that it must be literally *stored* there. If the metadata contained something like abstract: relative/path/to/abstract.mdown That would allow for all of the above scenarios while keeping the metadata syntax/section simple. But that makes the document far less portable, and I'm liable to lose the extra file at some point. I'd much rather have it be self-contained---not, of course, if that means that the document suddenly becomes weirdly ugly and complicated, but I don't see anyone proposing a solution that makes documents weird and ugly and complicated. Given that the abstract is actually part of the content (it is generally printed as part of the document, right?) it would seem more sensible to have the meta-data refer to a section name/path within the document. We can probably assume any markdown parser is capable of identifying the content between a heading and its next same-or-higher-depth sibling. Abstract could be a default value, supporting the simplest case first example Fletcher T. Penney provided above; This way content is in the right place (in the document, and appearing where you would expect it to with a simple abstract-unaware markdown converter), english speakers just write their document, and others can provide the abstract header, without needing to know anything about parsing or serialization rules. I realize I'm following up on the least-important aspect of this conversation, but I do wonder: what are genuine use cases where meta-data really does contain structured/formattable content that should not be considered part of the document content? It doesn't look like the abstract is really a valid case. 
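The claim above, that a parser can identify the content between a heading and its next same-or-higher-depth sibling, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `extract_section` is hypothetical, only ATX-style (`#`) headings are recognized, and headings inside code blocks are not treated specially.

```python
import re

# ATX heading: one to six '#' marks, a space, the title, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def extract_section(text, name="Abstract"):
    """Return the text between the heading called `name` and the next
    heading of the same or a higher level, or None if `name` is absent.

    `name` defaults to "Abstract", mirroring the default suggested above.
    """
    body = []
    depth = None  # number of '#' marks of the matched heading, once found
    for line in text.splitlines():
        m = HEADING.match(line)
        if depth is None:
            # Still searching for the named heading.
            if m and m.group(2).strip().lower() == name.lower():
                depth = len(m.group(1))
            continue
        if m and len(m.group(1)) <= depth:
            break  # a sibling or higher-level heading closes the section
        body.append(line)
    if depth is None:
        return None
    return "\n".join(body).strip()


if __name__ == "__main__":
    sample = "\n".join([
        "# Title",
        "",
        "## Abstract",
        "",
        "Short summary.",
        "",
        "## Introduction",
        "",
        "Body.",
    ])
    print(extract_section(sample))
```

A metadata field naming a different heading (for a non-English abstract, say) would simply be passed as `name`.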
Re: Metadata syntax (was Universal syntax for Markdown)
+++ Tao Klerks [Sep 20 11 03:15 ]:

I realize I'm following up on the least-important aspect of this conversation, but I do wonder: what are genuine use cases where meta-data really does contain structured/formattable content that should not be considered part of the document content? It doesn't look like the abstract is really a valid case.

I think that the abstract is a fine case. Although one *could* handle it the way you suggest, by having the metadata specify a section of the document to use as the abstract, I don't see the advantage of that. It is natural to distinguish between the body text, which is *always* part of the produced document, whether a fragment or a standalone document is being produced, and regardless of the format or template used, and the metadata, which sometimes appear in the produced document, depending on one's purposes, and which appear differently in different formats. Once you make this distinction, the abstract clearly falls on the side of the metadata.

Other cases:

* bibliographic data for the document itself, which you might want to print in some presentations but not others
* revision history
* tags
* bibliography entries used in the document
* settings for things like default stylesheets

On the last item: pandoc includes a powerful citation formatting system, citeproc. So you can use plain text citations in your document, like this [@smith99, p. 30; @barney04], and pandoc will format them according to a style sheet you select and include a bibliography (if the style sheet calls for that). This is a huge convenience, as you can write the document once, and change the citation style (even from author-date to footnotes) by selecting a different CSL stylesheet on the command line.

Currently you need to specify the bibliography database on the command line as well (it can be bibtex, endnote, or any number of other formats). Ideally, though, the document itself should specify where its bibliographical entries are coming from. This could just be a file path, but if you want the document to be truly portable, it would be nice to be able to include the structured bibliography entries themselves in metadata at the end of the document. This could be done easily with a data description language as powerful as lua/yaml/json.

John
Re: Metadata syntax (was Universal syntax for Markdown)
On Tue, Sep 20, 2011 at 9:56 AM, John MacFarlane j...@berkeley.edu wrote:

I think that the abstract is a fine case. Although one *could* handle it the way you suggest, by having the metadata specify a section of the document to use as the abstract, I don't see the advantage of that. It is natural to distinguish between the body text, which is *always* part of the produced document, whether a fragment or a standalone document is being produced, and regardless of the format or template used, and the metadata, which sometimes appear in the produced document, depending on one's purposes, and which appear differently in different formats. Once you make this distinction, the abstract clearly falls on the side of the metadata.

In that case, you're talking about metadata in the more general sense - like link definitions, footnotes, and other constructs that are currently treated as a special case in markdown. I'm all for having a special syntax for defining the abstract, as long as the author doesn't have to worry about any escaping conventions and can just write it like he/she would any other regular markdown content.

Other cases:

* bibliographic data for the document itself, which you might want to print in some presentations but not others
* revision history
* tags
* bibliography entries used in the document
* settings for things like default stylesheets

Point taken, most of these are good cases for supporting structured content, but not formattable/markdown content, right?

Currently you need to specify the bibliography database on the command line as well (it can be bibtex, endnote, or any number of other formats). Ideally, though, the document itself should specify where its bibliographical entries are coming from. This could just be a file path, but if you want the document to be truly portable, it would be nice to be able to include the structured bibliography entries themselves in metadata at the end of the document. This could be done easily with a data description language as powerful as lua/yaml/json.

Absolutely - but the (possibly unattainable) ideal would be a situation where tools and experts can specify complex structured metadata, and regular joe can change his title, author, and other basic/simple values and lists, specifying values that contain apostrophes, commas and other natural punctuation, without blowing anything up in the process. As soon as he needs to specify/modify something that contains structure (or even something multi-line?) it seems fair that he should have to use a tool or do some research on the standard (esp. as most if not all of the structured-data use cases relate to tools already).

My concern with a pure-lua/yaml/json metadata format is that it requires specialized knowledge (not related to the existing markdown standards/experience) on the part of the user for even the most trivial changes to the simplest fields - *especially* if structured/markdown content such as the abstract is placed in a metadata field!
Re: Metadata syntax (was Universal syntax for Markdown)
+++ Tao Klerks [Sep 20 11 10:34 ]:

On Tue, Sep 20, 2011 at 9:56 AM, John MacFarlane j...@berkeley.edu wrote:

I think that the abstract is a fine case. Although one *could* handle it the way you suggest, by having the metadata specify a section of the document to use as the abstract, I don't see the advantage of that. It is natural to distinguish between the body text, which is *always* part of the produced document, whether a fragment or a standalone document is being produced, and regardless of the format or template used, and the metadata, which sometimes appear in the produced document, depending on one's purposes, and which appear differently in different formats. Once you make this distinction, the abstract clearly falls on the side of the metadata.

In that case, you're talking about metadata in the more general sense - like link definitions, footnotes, and other constructs that are currently treated as a special case in markdown. I'm all for having a special syntax for defining the abstract, as long as the author doesn't have to worry about any escaping conventions and can just write it like he/she would any other regular markdown content.

Yes, absolutely. There are two ways to approach this while keeping 'abstract' a metadata field:

(1) There could be a special syntax for designating metadata fields as markdown (or alternatively markdown could be the default, and there could be a special syntax for designating them plain strings). I showed in my original post how lunamark implements this:

    abstract = m[[
      Here's the abstract. You can put anything you want
      here, including blank lines. No special escaping is needed.
      It can be flush left, but I've left a small indent because
      it looks nice.

      * item 1
      * item 2
    ]]

The 'm' indicates that the content is markdown. If you left it out, you'd have a plain string.

(2) It could just be conventional that certain fields ('abstract', 'title', etc.) are interpreted as markdown.

Other cases:

* bibliographic data for the document itself, which you might want to print in some presentations but not others
* revision history
* tags
* bibliography entries used in the document
* settings for things like default stylesheets

Point taken, most of these are good cases for supporting structured content, but not formattable/markdown content, right?

Right in most cases, but one might want a free-form revision history that is just markdown, and bibliographic entries might include abstracts etc.

Currently you need to specify the bibliography database on the command line as well (it can be bibtex, endnote, or any number of other formats). Ideally, though, the document itself should specify where its bibliographical entries are coming from. This could just be a file path, but if you want the document to be truly portable, it would be nice to be able to include the structured bibliography entries themselves in metadata at the end of the document. This could be done easily with a data description language as powerful as lua/yaml/json.

Absolutely - but the (possibly unattainable) ideal would be a situation where tools and experts can specify complex structured metadata, and regular joe can change his title, author, and other basic/simple values and lists, specifying values that contain apostrophes, commas and other natural punctuation, without blowing anything up in the process. As soon as he needs to specify/modify something that contains structure (or even something multi-line?) it seems fair that he should have to use a tool or do some research on the standard (esp. as most if not all of the structured-data use cases relate to tools already).
My concern with a pure-lua/yaml/json metadata format is that it requires specialized knowledge (not related to the existing markdown standards/experience) on the part of the user for even the most trivial changes to the simplest fields - *especially* if structured/markdown content such as the abstract is placed in a metadata field! I understand the concern. YAML is particularly bad this way, because you get used to not quoting or escaping things, but then your document blows up when you have a colon in the field. I think lua is a nice compromise--more regular and predictable, but you don't have to quote the fields as in json, and you have a really nice multiline string syntax that eliminates the need for escaping.[^1] But my lua-based proposal is compatible with also having a simpler way of specifying title, author, and date -- e.g. pandoc's, or Michael Thompson's proposal involving centering, or MMD's (though I think the Hamlet problem is serious). [^1]: What if your abstract contains `]]`, you might ask?
re: Metadata syntax (was Universal syntax for Markdown)
sherwood said: Well if your dogs are like mine, they will eat practically anything. Lately in addition to their kibble they've been catching pocket gophers and mice. A border collie is much less lovable with 'mouse breath' gophers and mice taste _great_ to a dog -- a dietary delicacy for many millennia now... it's your kibble they don't really care for. its redeeming feature is that it's so _easy_. but i bet there are several brands of kibble which your dogs still turn up their noses at. as the ad man replied, when asked why his costly campaign hadn't moved more units of the client's dogfood: dogs hate its taste. people are the same way. they'll eat a _lot_ of things, including some that you consider to taste _dreadful_ (e.g., ms-word), but that does not mean that they will eat _anything_. *** anyway, this conversation sounds confused... aside from questions of philosophy, it seems to me that there is confusion about just what sort of metadata we're all talking about, and how it's used, by whom, for what purposes... and so on and so forth, and hmm baby swing. but maybe i'm the only one confused... :+) you all seem like bright competent fellows, so i'm sure you'll get it all worked out, so i'm gonna go back to my sandbox and play. have a nice day... -bowerbird___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
The reStructuredText field list syntax might be a reasonable compromise. I like the fact that the metadata can occur anywhere in the document.

The [RST spec][] itself says that the leading colon can be dropped in well-defined contexts such as when a field list invariably occurs at the beginning of a document (PEPs and email messages). As John's example from Hamlet shows, this isn't the case for markdown documents. But it would be possible to grandfather in existing MMD documents by insisting, as MMD does, that first lines that contain a colon are unambiguously metadata, with apologies to Hamlet. Or one could introduce a delimiter, like `--`, and say that lines before that delimiter are unambiguously metadata, and so don't require the leading colon.

Frankly, I don't think it would be a terrible thing if implementations disagreed on this one detail---when is a leading colon required?---but agreed on everything else. Those who wish to trade flexibility for beauty could leave off the leading colon; those who value inter-operability over beauty could leave it on.

David

[RST spec]: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#field-lists
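To make the trade-off above concrete, here is a rough sketch comparing a rule that accepts bare `key: value` lines at the top of a document with a rule that requires the RST-style leading colon. Everything in it is an assumption made for illustration: the function name `leading_fields` and both regular expressions are hypothetical simplifications, not the actual MMD or docutils rules, and the sample line is made up in the spirit of the Hamlet example.

```python
import re

# Simplified patterns (assumptions, not the real MMD or RST grammars).
BARE_FIELD = re.compile(r"^([A-Za-z][A-Za-z0-9 _-]*):\s+(.*)$")
COLON_FIELD = re.compile(r"^:([^:]+):\s+(.*)$")


def leading_fields(text, require_leading_colon):
    """Collect key/value lines from the top of `text`, stopping at the
    first line that does not look like a field."""
    pattern = COLON_FIELD if require_leading_colon else BARE_FIELD
    fields = {}
    for line in text.splitlines():
        m = pattern.match(line)
        if m is None:
            break
        fields[m.group(1).strip()] = m.group(2).strip()
    return fields


if __name__ == "__main__":
    essay = "Hamlet: To be, or not to be.\n\nThe essay continues here.\n"
    # Without the leading colon the opening line is captured as metadata.
    print(leading_fields(essay, require_leading_colon=False))
    # With the leading colon required, nothing is captured.
    print(leading_fields(essay, require_leading_colon=True))
```

Under these simplified rules, the stricter form gives up some visual lightness in exchange for not capturing ordinary prose that happens to contain a colon.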
Re: Metadata syntax (was Universal syntax for Markdown)
+++ David Chambers [Sep 18 11 15:08 ]:

If we want to avoid defining our own serialization format, we have two options: we can adopt an existing format (such as JSON or YAML), or we can hand off the responsibility to application developers.

Yes, I agree, and I certainly agree that we shouldn't go down the path of reinventing YAML. My proposal was to use lua as a data description language, as it is more texty than json, less quirky than YAML, and more flexible than either. But I don't really expect to get consensus on that.

It seems to me that there are three levels at which we might hope to achieve consensus about metadata in markdown:

1. Agreement about which bits of the document are metadata, so these won't be processed as part of the document's text.
2. Agreement about a key-value format, so that all implementations can extract metadata into key/value pairs, with literal string values, in the same way.
3. Agreement about how the values are to be parsed into structured data, which bits are to be parsed as markdown, etc.

Consensus on 1 would be useful, because it would prevent your metadata from turning into displayed garbage when processed with another markdown implementation. My own proposal on 1 was to put metadata inside specially marked HTML comments. An advantage is that there is *already* agreement among implementations not to make this part of the displayed document, so no agreement is needed. In effect, my proposal already achieves consensus on 1.

Another possibility would be to put metadata inside '---' and '---'. This would solve two problems with MMD metadata: it would allow it to occur anywhere in the document, and it would avoid unwanted results when you happen to have a colon in the first line of your text.

As for 2, a minimal modification from MMD style metadata would be to allow blank lines in fields, by requiring continuation lines to be indented four spaces.

    ---
    field1: Value one.
        Continued here.

        Another paragraph.
    field2: Next field.
    ---

This would work best if we had something like the '---' '---' delimiters, since otherwise you have even more opportunities for unwanted captures (a blank line doesn't end metadata).

John
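The delimited format proposed above (fields between '---' lines, with continuation lines indented four spaces so that blank lines can appear inside a value) might be extracted along these lines. This is a sketch under assumptions, not any implementation's actual behavior: the name `parse_metadata_block` and the key pattern are hypothetical, and for simplicity the block is only recognized at the very start of the text, although the proposal would allow it anywhere.

```python
import re

# Hypothetical key pattern: letters, digits, underscores, spaces, hyphens.
FIELD = re.compile(r"^([A-Za-z0-9_][A-Za-z0-9_ -]*):\s*(.*)$")


def parse_metadata_block(text):
    """Return (metadata, body). Values are literal strings (level 2 in the
    list above); nothing is parsed as markdown or structured data here.
    If no well-formed block is found, return ({}, text) unchanged."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return {}, text
    meta = {}
    key = None  # the field currently being continued
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() == "---":
            # Closing delimiter: everything after it is the document body.
            body = "\n".join(lines[i + 1:])
            return {k: "\n".join(v).strip() for k, v in meta.items()}, body
        if key is not None and line.startswith("    "):
            meta[key].append(line[4:])  # indented continuation line
        elif key is not None and line.strip() == "":
            meta[key].append("")  # a blank line does not end the field
        else:
            m = FIELD.match(line)
            if m is None:
                return {}, text  # not a field: treat the text as plain body
            key = m.group(1).strip()
            meta[key] = [m.group(2)]
    return {}, text  # no closing delimiter found


if __name__ == "__main__":
    sample = "\n".join([
        "---",
        "field1: Value one.",
        "    Continued here.",
        "",
        "    Another paragraph.",
        "field2: Next field.",
        "---",
        "Body text: a colon here is harmless.",
    ])
    print(parse_metadata_block(sample))
```

Because the closing delimiter, not a blank line, ends the block, a colon in the first line of the body is never mistaken for a field.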
Re: Metadata syntax (was Universal syntax for Markdown)
Side note - the actual Hamlet line has a colon at the end of the line. So it would be fine in MMD. ;)

F

On Sep 19, 2011, at 10:30 AM, David Sanson dsan...@gmail.com wrote:

The reStructuredText field list syntax might be a reasonable compromise. I like the fact that the metadata can occur anywhere in the document.

The [RST spec][] itself says that the leading colon can be dropped in well-defined contexts such as when a field list invariably occurs at the beginning of a document (PEPs and email messages). As John's example from Hamlet shows, this isn't the case for markdown documents. But it would be possible to grandfather in existing MMD documents by insisting, as MMD does, that first lines that contain a colon are unambiguously metadata, with apologies to Hamlet. Or one could introduce a delimiter, like `--`, and say that lines before that delimiter are unambiguously metadata, and so don't require the leading colon.

Frankly, I don't think it would be a terrible thing if implementations disagreed on this one detail---when is a leading colon required?---but agreed on everything else. Those who wish to trade flexibility for beauty could leave off the leading colon; those who value inter-operability over beauty could leave it on.

David

[RST spec]: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#field-lists
Re: Metadata syntax (was Universal syntax for Markdown)
On Mon, Sep 19, 2011 at 11:31 AM, Fletcher Penney fletc...@fletcherpenney.net wrote:

Side note - the actual Hamlet line has a colon at the end of the line. So it would be fine in MMD.

Clever use of a dash there ;)
Re: Metadata syntax (was Universal syntax for Markdown)
I wonder how often multi-paragraph metadata comes up in real world use? Something like an abstract, IMO, shouldn't be metadata - it should be part of the document. How to *display* that section graphically (e.g. smaller font?, narrower width, etc.) is a problem for CSS/LaTeX/ODF/whatever - *not* for Markdown itself.

I published a book via Lulu (a friend's PhD thesis) using MMD - it had an abstract, dedication, acknowledgements, preface, ToC, lists of figures, etc. Each of which was formatted appropriately, without using metadata for any of it. I customized an XSLT to generate the LaTeX I desired, and the result was fantastic.

The only problem I have run into with MMD's metadata is that it would be nice to support markup inside some fields but not all, and that has rarely been a problem for me. This was easily remedied in MMD 2, but trickier in MMD 3.

I think the best shot at consensus is a basic syntax for general metadata (obviously I'm partial to MMD ;) that covers 90% of what people do. Then a more complicated shared syntax for those variants that want to support the kitchen sink approach to metadata.

Not to repeat myself, but I again think we're approaching this from the wrong end. If there's going to be a consensus, I think it's going to have to start with a shared philosophy for the standards. Each variant may end up with its own philosophy outside of that, but there has to be a common vision for the purpose of the standard. Until that happens, I don't think we'll get anywhere trying to sort out specific implementations for specific features - we don't have a shared understanding of the problem we're trying to solve.

F-

On Sep 19, 2011, at 11:12 AM, John MacFarlane wrote:

+++ David Chambers [Sep 18 11 15:08 ]:

If we want to avoid defining our own serialization format, we have two options: we can adopt an existing format (such as JSON or YAML), or we can hand off the responsibility to application developers.

Yes, I agree, and I certainly agree that we shouldn't go down the path of reinventing YAML. My proposal was to use lua as a data description language, as it is more texty than json, less quirky than YAML, and more flexible than either. But I don't really expect to get consensus on that.

It seems to me that there are three levels at which we might hope to achieve consensus about metadata in markdown:

1. Agreement about which bits of the document are metadata, so these won't be processed as part of the document's text.
2. Agreement about a key-value format, so that all implementations can extract metadata into key/value pairs, with literal string values, in the same way.
3. Agreement about how the values are to be parsed into structured data, which bits are to be parsed as markdown, etc.

Consensus on 1 would be useful, because it would prevent your metadata from turning into displayed garbage when processed with another markdown implementation. My own proposal on 1 was to put metadata inside specially marked HTML comments. An advantage is that there is *already* agreement among implementations not to make this part of the displayed document, so no agreement is needed. In effect, my proposal already achieves consensus on 1.

Another possibility would be to put metadata inside '---' and '---'. This would solve two problems with MMD metadata: it would allow it to occur anywhere in the document, and it would avoid unwanted results when you happen to have a colon in the first line of your text.

As for 2, a minimal modification from MMD style metadata would be to allow blank lines in fields, by requiring continuation lines to be indented four spaces.

    ---
    field1: Value one.
        Continued here.

        Another paragraph.
    field2: Next field.
    ---

This would work best if we had something like the '---' '---' delimiters, since otherwise you have even more opportunities for unwanted captures (a blank line doesn't end metadata).

John

--
Fletcher T. Penney
fletc...@fletcherpenney.net
Re: Metadata syntax (was Universal syntax for Markdown)
On Sep 19, 2011, at 3:28 PM, John MacFarlane wrote: I can think of many reasons for putting an abstract into metadata. The treatment of the abstract (like that of author and title) varies quite a bit depending on the output format. In LaTeX, it goes in a special environment; in HTML, it may go in a special DIV; for some purposes, you may want to omit it entirely and just store it for bibliographic purposes. If the markdown processor pulls it out as metadata, then a templating system can put it where it needs to go in the final document. Now of course, you can always postprocess the output of your markdown processor, locate the abstract, and mess around with the result. But that's uglier and much harder for end users than the approach above, which lunamark takes. Users are more likely to be able to modify a default template than write their own XSLT transformations. John I think it is somewhat of an academic debate whether it is better for a templating system to look for an abstract in the metadata, or to check the first h1 to see if it is called Abstract. I guarantee the computer doesn't care where it's located... I personally think that the raw markdown document looks much better if the abstract is part of the text than if it is part of a complicated metadata markup scheme. In any case, my primary point stands. For any consensus to come about, I think we need to agree on the fundamental purpose and philosophy of the consensus we claim to be interested in. Otherwise many of these discussions will continue to occur without much hope of moving forward to any actual outcome/resolution. Then it's just a bunch of us sitting around explaining why we each think our own dog food is the best. But in the meantime, MMD will continue to march forward --- platform independent processor code (Mac,Windows,*nix, and presumably iOS/Android??) 
without any external requirements that is lightning fast (Thanks to John's fabulous peg-markdown as a starting point, and Daniel's work on getting rid of the glib2 requirement), a Mac OS X text editor with built-in MMD syntax highlighting, exporting, and editing that I hope to release in the next month or two, and I hope to put together a proof of concept native MMD parser for iOS (built using the same code as the desktop version) if no one else out there beats me to it (which would be a welcome turn of events!) F- -- Fletcher T. Penney fletc...@fletcherpenney.net ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
On Sep 19, 2011, at 3:28 PM, John MacFarlane wrote: I can think of many reasons for putting an abstract into metadata. The treatment of the abstract (like that of author and title) varies quite a bit depending on the output format. In LaTeX, it goes in a special environment; in HTML, it may go in a special DIV; for some purposes, you may want to omit it entirely and just store it for bibliographic purposes. If the markdown processor pulls it out as metadata, then a templating system can put it where it needs to go in the final document. Those sound like reasons for the metadata to *identify* the abstract, but I see no requirement that it must be literally *stored* there. If the metadata contained something like abstract: relative/path/to/abstract.mdown That would allow for all of the above scenarios while keeping the metadata syntax/section simple. (Obviously, I lean toward Fletcher’s philosophy #2 on this.) -- Rob McBroom http://www.skurfer.com/ ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
re: Metadata syntax (was Universal syntax for Markdown)
fletcher said: For any consensus to come about, I think we need to agree on the fundamental purpose and philosophy of the consensus we claim to be interested in. it would be nice. Otherwise many of these discussions will continue to occur without much hope of moving forward to any actual outcome/resolution. yep. it's just a bunch of us sitting around explaining why we each think our own dog food is the best. uh-huh. and what's really ironic is that you're not even polling the general-public dogs you hope will eat the food that you're putting out... y'all seem to believe that they'll eat _anything_. But in the meantime, MMD will continue to march forward great to hear it! a Mac OS X text editor with built-in MMD syntax highlighting, exporting, and editing that I hope to release in the next month or two wha... the next month or two? what's the hold-up? and I hope to put together a proof of concept native MMD parser for iOS (built using the same code as the desktop version) if no one else out there beats me to it (which would be a welcome turn of events!) i don't understand why a parser has to be so hard... -bowerbird___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
On Mon, Sep 19, 2011 at 3:47 AM, John MacFarlane j...@berkeley.edu wrote:

Another major problem, in my view, is that if a document starts with a phrase followed by a colon, it gets swallowed into metadata: [...] Also, because this is recognizable as metadata wherever it occurs in the document, one could then drop the requirement that the metadata occur at the top of the document, which I think is undesirable.

Another alternative is to re-use the syntax that Markdown already has for document-level metadata:

    [1]: http://example.com/
    [^f1]: A footnote here

Perhaps:

    [title]: Here is the title.
    [abstract]: The abstract here. As with footnotes, lists etc.,
        indented lines continue the block.
    [author]: John

Not quite as natural as the unbracketed version, but more consistent with Markdown conventions and less likely to cause unpleasant surprises. (The obvious risk is the potential for collision with reference links, but I think it is quite minor, and could be minimized by special-casing metadata at the beginning of a document.)

From a syntax perspective, the idea would be that reference link definitions, footnotes, MMD-format references etc. are all removed as metadata. Keys starting with ^ are treated as footnotes, values matching the URI/title form may be re-inserted as reference links, etc.

Sam
Re: Metadata syntax (was Universal syntax for Markdown)
+++ Sam Angove [Sep 20 11 10:33 ]: On Mon, Sep 19, 2011 at 3:47 AM, John MacFarlane j...@berkeley.edu wrote: Another major problem, in my view, is that if a document starts with a phrase followed by a colon, it gets swallowed into metadata: [...] Also, because this is recognizable as metadata wherever it occurs in the document, one could then drop the requirement that the metadata occur at the top of the document, which I think is undesirable. Another alternative is to re-use the syntax that Markdown already has for document-level metadata: [1]: http://example.com/ [^f1]: A footnote here Perhaps: [title]:Here is the title. [abstract]: The abstract here. As with footnotes, lists etc., indented lines continue the block. [author]: John Not quite as natural as the unbracketed version, but more consistent with Markdown conventions and less likely to cause unpleasant surprises. (The obvious risk is the potential for collision with reference links, but I think it is quite minor, and could be minimized by special-casing metadata at the beginning of a document.) From a syntax perspective, the idea would be that reference link definitions, footnotes, MMD-format references etc. are all removed as metadata. Keys starting with ^ are treated as footnotes, values matching the URI/title form may be re-inserted as reference links, etc. I think this is a very nice idea. Authors would have to be careful not to use the same label for a reference link and a piece of metadata, but I don't see that being a big problem. If people didn't like the brackets, then I think the next best idea would be to require a delimiter of some kind, but keep the capacity for multiple paragraphs as with footnotes: --- title:Here is the title. author: John abstract: The abstract here. As with footnotes, lists etc., indented lines continue the block. --- John ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
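A minimal sketch of the bracketed proposal discussed above, assuming a definition is any line of the form `[label]: value`. The function name `classify_definitions`, the URL test, and the four-space continuation rule are illustrative assumptions, not part of any existing Markdown implementation.

```python
import re

# Hypothetical patterns: a "[label]: value" definition line, and a value
# that looks like a URI (optionally wrapped in <> and followed by a title).
DEF_RE = re.compile(r'^\[([^\]]+)\]:\s*(.*)$')
URL_RE = re.compile(r'^<?[A-Za-z][A-Za-z0-9+.-]*://\S+>?(\s+.*)?$')


def classify_definitions(text):
    """Sort [label]: definitions into footnotes, reference links and metadata."""
    footnotes, links, metadata = {}, {}, {}
    current = None  # (bucket, key) of the definition being continued
    for line in text.splitlines():
        match = DEF_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key.startswith('^'):
                bucket, key = footnotes, key[1:]   # keys starting with ^ are footnotes
            elif URL_RE.match(value):
                bucket = links                     # URI-shaped values stay reference links
            else:
                bucket = metadata                  # everything else is document metadata
            bucket[key] = value
            current = (bucket, key)
        elif current and line.strip() and line.startswith(('    ', '\t')):
            bucket, key = current
            bucket[key] = (bucket[key] + ' ' + line.strip()).strip()  # indented continuation
        elif line.strip():
            current = None                         # ordinary text ends the block
    return footnotes, links, metadata
```

The label collision John mentions shows up directly here: a metadata key and a reference-link label share one namespace, so a real implementation would need a rule for which one wins.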
Re: Metadata syntax (was Universal syntax for Markdown)
From: Fletcher T. Penney fletc...@fletcherpenney.net

I think the idea of metadata boils down to three perspectives:

1) I don't want it/need it/care about it --- get rid of it
2) I want something easy to write, easy to read, and fits with the Markdown philosophy of as little markup as possible to accomplish the job --- even if not quite as powerful (e.g. MultiMarkdown)
3) I want something powerful/flexible, even if it looks like computer code at the top of my document (e.g. lunamark)

Before there can be a unified standard, there has to be a unified philosophy (just like the rest of the standards debate on the list).

After some initial excitement that it might be possible to brew up a standard for Markdown extensions I have become disheartened. Metadata is one of the most commonly implemented extensions for Markdown. If we cannot agree that including metadata is important and that any standards should adhere to the fundamental philosophy of Markdown, then there is little hope for consensus. I suppose having Gruber as the absentee landlord of Markdown is better than turning Markdown into something completely different than what has worked so well for so many. Many of the proposals that I'm seeing try to solve problems that go far beyond the scope of what Markdown is or should ever be.

Here is what I believe to be the appropriate solution for Markdown metadata.

* Metadata is specified at the top of the document similar to RFC822 headers. The keys and values may be arbitrary. Multiple lines may be folded as in RFC822.
* Metadata lines may be enclosed in an HTML comment to hide metadata if original Markdown is used.
* Metadata is omitted from the output except that:
    * Keys matching standard header elements are included as the appropriate header element.
    * Keys that do not match standard header elements are included as standard HTML meta element.
* If metadata is present in the file then full HTML files are generated by default. This could be suppressed by a switch.
* Everything else is left up to the extension or whatever is processing the HTML.

For example:

Title: Markdown for Dummies
Tags: markdown, text, markup

<head>
<title>Markdown for Dummies</title>
<meta name="tags" content="markdown, text, markup">
</head>
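The rules listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the behaviour of any existing tool: the function names are hypothetical, `title` is assumed to be the only key that maps to a standard element, and the output uses `<head>`, the HTML element that holds title and meta.

```python
import html

# Assumption: only "title" has a dedicated element; other keys become <meta>.
STANDARD_HEAD_ELEMENTS = {'title'}


def parse_header_metadata(text):
    """Read RFC822-style 'Key: value' lines from the top of a document.

    Lines starting with whitespace are folded into the previous value.
    Returns (metadata dict, remaining body text).
    """
    lines = text.splitlines()
    meta, key, i = {}, None, 0
    while i < len(lines) and lines[i].strip():
        line = lines[i]
        if line[0] in ' \t' and key is not None:
            meta[key] += ' ' + line.strip()          # folded continuation line
        elif ':' in line:
            key, _, value = line.partition(':')
            key = key.strip().lower()
            meta[key] = value.strip()
        else:
            break                                    # not a header line: stop reading
        i += 1
    if not meta:
        return {}, text
    return meta, '\n'.join(lines[i:]).lstrip('\n')


def render_head(meta):
    """Emit known keys as their own element and the rest as <meta> elements."""
    parts = ['<head>']
    for key, value in meta.items():
        if key in STANDARD_HEAD_ELEMENTS:
            parts.append('<{0}>{1}</{0}>'.format(key, html.escape(value)))
        else:
            parts.append('<meta name="{}" content="{}">'.format(
                html.escape(key), html.escape(value)))
    parts.append('</head>')
    return '\n'.join(parts)
```

Note that this sketch treats any first line containing a colon as metadata, which is the ambiguity raised elsewhere in this thread.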
Re: Metadata syntax (was Universal syntax for Markdown)
On Mon, Sep 19, 2011 at 2:34 PM, Fletcher T. Penney fletc...@fletcherpenney.net wrote: Not to repeat myself, but I again think we're approaching this from the wrong end. If there's going to be a consensus, I think it's going to have to start with a shared philosophy for the standards. Each variant may end up with it's own philosophy outside of that, but there has to be a common vision for the purpose of the standard. It seems to me that many visions have been expressed here. I'm not sure what more can be done to generate consensus. But I'm happy to try to express my own. And, for what it is worth, bowerbird, this is the vision of user, not a developer :-) I have two visions, and I think they are compatible. One is an agreed upon text-y format for title, author, and date. The other is an agreed upon text-y-as-possible-but-no-doubt-more code-y format for arbitrary metadata. If I were going to push for consensus on one of these rather than the other, it would be the second, but I'd like to see both, and I think, as I've suggested before, that some of the reasons for resisting code-y metadata (elegance and aesthetics, not assuming that all documents are in English) are better thought of as reasons for developing a text-y format for title, author, and date. As for the code-y metadata, I think it is a mistake to think that we can imagine ahead of time all the ways this arbitrary metadata might be used, so I'd like it to be as flexible and powerful as possible. I've already mentioned one vision---the ability to embed bibliographic data in academic papers---but that's just something I think about because I am an academic who often uses markdown to write papers with lots of citations. Markdown is used in so many ways by so many different people---bloggers writing posts, academics writing research papers, scriveners writing novels, developers writing readme's, I say: make it as powerful as feasible and let the users discover new uses. 
There has been some discussion of whether or not there is any real need for multi-paragraph metadata, focusing on the example of abstracts. I currently use Jekyll for my website. By far the easiest way to generate a blurb for a given page---the sort of thing that on a blog gets shown before the fold---is to toss it into a metadata field and adjust Jekyll's templates to use the content of the blurb. There are no doubt other ways to do this---filters and scripts and pre- or post-processors. But that doesn't take away from the fact that using metadata is one very easy way to do this. So multi-paragraph metadata is something I use regularly in this context. There has been some discussion of whether or not markdown implementations should be responsible for parsing this code-y metadata. I suppose it is part of my vision that markdown implementations do parse this code, and pass it along as appropriate to templates and the like. But John's first point of possible consensus, 1. Agreement about which bits of the document are metadata, so these won't be processed as part of the document's text. would be of great value on its own. I've spent time converting documents from Scrivener or Mellel to MMD, and then to Pandoc's extended markdown. A MMD document with lots of metadata---even with hard line breaks---is, when used with other processors, a markdown file with a bunch of junk at the top that has to be trimmed away. Likewise, I've written documents using Pandoc's title-author-date blocks, and then needed to use those documents with other processors, and that stuff at the top was just so much junk that had to be trimmed away. So if everyone could just agree on what to ignore, that would be a serious improvement. But if markdown implementations are not themselves going to be responsible for parsing the code-y metadata, I would strongly prefer that the metadata be in a format that has existing wide support. 
I doubt that some decree by the markdown community will have the power to move all the developers who have developed all the various tools that use markdown and rely on metadata. And I think the whole thing is likely to be a nonstarter if it requires that these developers all write parsers for some new fangled format. Even if markdown implementations are going to handle to parsing, I guess someone is also going to need to write tools for translating existing data formats into the new format---unless we are assuming that nobody would ever want to use existing data as metadata in a markdown document? So if there were a standard out there for human writable/machine readable plaintext data that shares the values of markdown, I would think it made more sense to use that, and let the markdown community focus their intellectual energy on markdown. I had naively mentioned YAML in an earlier post just because among us naive users, it has the reputation of being such a standard. But I really don't know anything about plaintext data formats, and have no special affection for
Re: Metadata syntax (was Universal syntax for Markdown)
Well if your dogs are like mine, they will eat practically anything. Lately in addition to their kibble they've been catching pocket gophers and mice. A border collie is much less lovable with 'mouse breath' Respectfully, Sherwood of Sherwood's Forests Sherwood Botsford Sherwood's Forests -- http://Sherwoods-Forests.com 780-848-2548 50042 Range Rd 31 Warburg, Alberta T0C 2T0 On Mon, Sep 19, 2011 at 4:39 PM, bowerb...@aol.com wrote: fletcher said: For any consensus to come about, I think we need to agree on the fundamental purpose and philosophy of the consensus we claim to be interested in. it would be nice. Otherwise many of these discussions will continue to occur without much hope of moving forward to any actual outcome/resolution. yep. it's just a bunch of us sitting around explaining why we each think our own dog food is the best. uh-huh. and what's really ironic is that you're not even polling the general-public dogs you hope will eat the food that you're putting out... y'all seem to believe that they'll eat _anything_. But in the meantime, MMD will continue to march forward great to hear it! a Mac OS X text editor with built-in MMD syntax highlighting, exporting, and editing that I hope to release in the next month or two wha... the next month or two? what's the hold-up? and I hope to put together a proof of concept native MMD parser for iOS (built using the same code as the desktop version) if no one else out there beats me to it (which would be a welcome turn of events!) i don't understand why a parser has to be so hard... -bowerbird ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
+++ David Sanson [Aug 17 11 23:09 ]: First time posting here as well. I've been watching this discussion with interest. As a user of (extended) markdown, I have long hoped for a unified standard for (most or all) markdown extensions and a unified handling of metadata. Thanks for the thoughtful post. A few comments below, describing a metadata experiment I've done in implementing lunamark. It seems to me that one of the issues that arises when we start thinking about metadata is that there really are two different kinds of metadata: some metadata (title, author, date) is---at least in many cases---also part of the *content* of the document. This is the kind of metadata for which I feel the force of the demand for an elegant plaintext solution. For some bold suggestions in this direction, see this [old post by Michael Thompson][1] to the pandoc-discuss list. Here is one of his examples from that post: A Good Man Is Hard To Find Flannery O'Connor Spring 1952 The grandmother didn't want to go to Florida. She wanted to visit some of her connections in east Tennessee and she was seizing at every chance to change Bailey's mind. Isn't that so much *prettier* than any of the options currently in play? Email someone a document like that, and they will know exactly what you mean, and see no distracting markup. No doubt this presents challenges when it comes to parsing, and I have no idea whether or not those challenges are surmountable. Clearly some rules would have to be laid down (Does it have to be centered? Indented? Can I underline the title ala setext? Do I have to have two blank lines after the date? Can I leave the date out? etc.) And it raises issues for backwards compatibility too. But I think its worth having in view a solution that achieves a certain degree of perfection along this one dimension. But then there is the other kind of metadata. Tags, keywords, baseurls, paths to associated files, directives for webpage templating software, and so on and so on. 
This sort of stuff is definitely not content. It is a bunch of data that I want to associate with the file for some reason or other. It needs to be indefinitely extensible. It is frequently tied directly to some specific output format or context. In other contexts, it probably just needs to be ignored. Blosxom taught us that it should all be at the top of the document (and successors, like Jekyll, follow this tradition), but much of it is ugly enough that it could just as well be banished to the bottom of the document, where nobody but the author would ever have to look at it.

When it comes to this sort of metadata, I don't see any reason to look for something elegant, language-independent, and plaintext-y. This is where it feels like I just want a way of embedding a block of data within a markdown file, knowing that it won't be treated as content (and, depending on my processor and the context, knowing that it may be sucked up and used in various ways).

It is here that I agree with the sentiment that metadata shouldn't be part of the markdown spec, *but* I think markdown should be smart enough to ignore the metadata, so that I don't have to strip it out before feeding the document to a markdown processor.

One way to achieve this is to put metadata inside specially marked HTML comments. Then existing markdown parsers will all ignore it (at any rate, it won't display). That's what I did in lunamark's experimental 'lua_metadata' feature. Here's an example:

<!--@
catalog_number = "23423423A"
category = "fish"
tags = { "Arctic", "fish", "char" }
bib = { title = "Fishing for Arctic char",
        author = "Samuel Smith",
        publisher = "Alaska Press",
        year = 2008 }
-->

Inside the comment we just have lua declarations (they're processed in a sandbox, so metadata can't do anything nasty). This makes the metadata slightly less textual looking, but it gives you the ability to have metadata of various types: string, number, array, key-value table.
And it's actually pretty readable -- note that lua's table syntax was inspired by bibtex's format.

One thing that needs to be considered in a metadata format is that some metadata entries need to be parsed as markdown, while others should remain literal (suppose you have a product number with lots of '*' and '[' in it). I handle this by providing a function 'markdown' or 'm' that you can use:

<!--@
title = m"Reading *Hamlet*",
author = m"[Sally Cho](http://sallycho.net)"
-->

It doesn't matter whether you write markdown("foo"), m("foo"), markdown "foo", m"foo", etc. They all work. It would be possible to define other functions as well, even ones that do IO, and expose them individually without giving access to general IO functions. So
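As a rough illustration of the comment-based approach described in this message, the sketch below only separates the marked comments from the document text. It does not evaluate the Lua inside them. The `<!--@` opener and `-->` closer come from the post; the function name and the regular expression are assumptions.

```python
import re

# Matches an HTML comment whose opener is immediately followed by "@".
META_COMMENT_RE = re.compile(r'<!--@(.*?)-->', re.DOTALL)


def split_comment_metadata(text):
    """Return (list of raw metadata blocks, document text without them)."""
    blocks = [m.group(1).strip() for m in META_COMMENT_RE.finditer(text)]
    body = META_COMMENT_RE.sub('', text)
    return blocks, body
```

Because the blocks are ordinary HTML comments, a processor that knows nothing about them still hides them from the rendered output, which is the graceful-failure property the post is after.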
Re: Metadata syntax (was Universal syntax for Markdown)
I think the idea of metadata boils down to three perspectives: 1) I don't want it/need it/care about it --- get rid of it 2) I want something easy to write, easy to read, and fits with the Markdown philosophy of as little markup as possible to accomplish the job ---even if not quite as powerful (e.g. MultiMarkdown) 3) I want something powerful/flexible, even if it looks like computer code at the top of my document (e.g. lunamark) Before there can be a unified standard, there has to be a unified philosophy (just like the rest of the standards debate on the list). My philosophy, and therefore that of MMD, is #2 above. I obviously have more markup than plain Markdown, but I feel that my feature to markup ratio is as good or better than Markdown (obviously a personal opinion, not a fact). The metadata functionality is pretty powerful, fails gracefully when run through plain markdown (if you remember the extra two spaces at the end of lines), but does have some limitations. I reiterate one of my previous posts - if we want to have any sort of consensus for the Markdown derivatives, the first step is agreeing on a philosophy for those standards. Individual variants can still have their own features, but we would need agreement on the core. F- On Sep 18, 2011, at 11:53 AM, John MacFarlane wrote: +++ David Sanson [Aug 17 11 23:09 ]: snipped for brevity - please see original posts -- Fletcher T. Penney fletc...@fletcherpenney.net ___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Metadata syntax (was Universal syntax for Markdown)
Yes, the key question is: what's the right balance of flexibility vs. textiness? To my mind, multimarkdown comments just aren't flexible enough:

* There's no way to have multiline metadata fields that contain blank lines, e.g. an abstract with two paragraphs.
* There's no provision for structured data (e.g. key/value tables or lists), or for boolean or numerical fields.
* Metadata fields are interpreted as raw strings, not markdown. That's sometimes what you want, but not always. Titles often contain emphasis and other formatting, for example, and sometimes even footnotes (for acknowledgements). If these are just going into an html meta field, it doesn't much matter, but if you're using the metadata fields in templates, it does. (And sure, you could always run a raw string through your markdown processor again, before passing it to the template engine, but that creates problems for things like reference links and footnotes.)

Another major problem, in my view, is that if a document starts with a phrase followed by a colon, it gets swallowed into metadata:

% multimarkdown
To be or not to be: that is the question.
^D
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta name="tobeornottobe" content="that is the question."/>
</head>
<body>
</body>
</html>

That's not what most authors would expect! For this reason, I would favor something more like reStructuredText field lists, which mark the fields explicitly as fields:

:title: Here is the title.
:author: John
:abstract: The abstract here. It can span
    multiple lines.

    As long as the indentation is maintained.

This is not part of the metadata.

This is slightly less texty because of the leading colon, but less likely to capture regular text. Also, because this is recognizable as metadata wherever it occurs in the document, one could then drop the requirement that the metadata occur at the top of the document, which I think is undesirable.
When there's lots of metadata, it's nicer to put it at the bottom (or at least to put some of it at the bottom), so it doesn't interfere with reading the article. lunamark's lua_metadata allows that, by the way -- so you don't have to start the document with something that doesn't look like plain text.

One nice point that David Sanson made is that one could combine a simple, texty metadata format for common things like titles and authors with a flexible, more cody format for everything else. One should keep this in mind in thinking about how to balance flexibility vs. textiness.

John

+++ Fletcher T. Penney [Sep 18 11 12:06 ]:

I think the idea of metadata boils down to three perspectives:

1) I don't want it/need it/care about it --- get rid of it
2) I want something easy to write, easy to read, and fits with the Markdown philosophy of as little markup as possible to accomplish the job --- even if not quite as powerful (e.g. MultiMarkdown)
3) I want something powerful/flexible, even if it looks like computer code at the top of my document (e.g. lunamark)

Before there can be a unified standard, there has to be a unified philosophy (just like the rest of the standards debate on the list). My philosophy, and therefore that of MMD, is #2 above. I obviously have more markup than plain Markdown, but I feel that my feature to markup ratio is as good or better than Markdown (obviously a personal opinion, not a fact). The metadata functionality is pretty powerful, fails gracefully when run through plain markdown (if you remember the extra two spaces at the end of lines), but does have some limitations.

I reiterate one of my previous posts - if we want to have any sort of consensus for the Markdown derivatives, the first step is agreeing on a philosophy for those standards. Individual variants can still have their own features, but we would need agreement on the core.
F- On Sep 18, 2011, at 11:53 AM, John MacFarlane wrote: +++ David Sanson [Aug 17 11 23:09 ]: snipped for brevity - please see original posts -- Fletcher T. Penney fletc...@fletcherpenney.net
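The field-list idea from this message can be sketched as below, assuming a field is a line that starts with `:name:` and that indented lines (including ones after a blank line) continue the current field. The function name and the exact continuation rule are assumptions made for illustration; reStructuredText's own rules are more detailed.

```python
import re

# A field line starts with ":name:"; the name may not be empty or start with space.
FIELD_RE = re.compile(r'^:([^:\s][^:]*):\s*(.*)$')


def parse_field_list(text):
    """Collect ':key: value' fields anywhere in the text; return (fields, body)."""
    fields, body_lines = {}, []
    key = None              # field currently being continued, if any
    pending_blank = False   # saw a blank line that may be a paragraph break
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            key = match.group(1).strip()
            fields[key] = match.group(2).strip()
            pending_blank = False
        elif key is not None and not line.strip():
            pending_blank = True
        elif key is not None and line[:1] in (' ', '\t'):
            sep = '\n\n' if pending_blank else ' '
            fields[key] = (fields[key] + sep + line.strip()).strip()
            pending_blank = False
        else:
            key = None
            body_lines.append(line)
    return fields, '\n'.join(body_lines)
```

With the explicit leading colon, a first line such as "To be or not to be: that is the question." is left in the body rather than captured as a field.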
Re: Metadata syntax (was Universal syntax for Markdown)
On Sep 18, 2011, at 1:47 PM, John MacFarlane wrote: snipped To my mind, multimarkdown comments just aren't flexible enough: * There's no way to have multiline metadata fields that contain blank lines, e.g. an abstract with two paragraphs. True - but in MMD an abstract would be included in the document with a separate header, not as metadata. But you're correct that blank lines are not allowed. I've never needed them, but they aren't allowed. * There's no provision for structured data (e.g. key/value tables or lists), or for boolean or numerical fields. True. I've never needed them, and have never had them requested. But there is no provision for that. * Metadata fields are interpreted as raw strings, not markdown. That's sometimes what you want, but not always. Titles often contain emphasis and other formatting, for example, and sometimes even footnotes (for acknowledgements). If these are just going into an html meta field, it doesn't much matter, but if you're using the metadata fields in templates, it does. (And sure, you could always run a raw string through your markdown processor again, before passing it to the template engine, but that creates problems for things like reference links and footnotes.) This is a slight difference in behavior from MMD 2. I'm considering approaches to allow processing the contents of the metadata, as this can be an issue occasionally. Another major problem, in my view, is that if a document starts with a phrase followed by a colon, it gets swallowed into metadata: % multimarkdown To be or not to be: that is the question. ^D ?xml version=1.0 encoding=UTF-8 standalone=yes ? !DOCTYPE html html xmlns=http://www.w3.org/1999/xhtml; head meta name=tobeornottobe content=that is the question./ /head body /body /html That's not what most authors would expect! This is true. But a blank line at the top of the document solves the problem. 
And it doesn't match a URL on the first line as metadata, so I'm not sure how often this really happens in real life. For this reason, I would favor something more like reStructuredText field lists, which marks the fields explicitly as fields: :title:Here is the title. :author: John :abstract: The abstract here. It can span multiple lines. As long as the indentation is maintained. This is not part of the metadata. This is slightly less texty because of the leading colon, but less likely to capture regular text. This becomes a matter of values. To me, the ugliness of this approach outweighs the virtually negligible chance that I will have a document triggering metadata when I don't mean it. But it's certainly not as bad as some other alternatives. If it was proposed as a standard, I would try to vote against it, but would not necessarily boycott it within MultiMarkdown. Also, because this is recognizable as metadata wherever it occurs in the document, one could then drop the requirement that the metadata occur at the top of the document, which I think is undesirable. When there's lots of metadata, it's nicer to put it at the bottom (or at least to put some of it at the bottom), so it doesn't interfere with reading the article. lunamark's lua_metadata allows that, by the way -- so you don't have to start the document with something that doesn't look like plain text. I don't view metadata as necessarily belonging at the bottom, but the flexibility is a bonus. One nice point that David Sanson made is that one could combine a simple, texty metadata format for common things like titles and authors with a flexible, more cody format for everything else. One should keep this in mind in thining about how to balance flexibility vs. textiness. John My vote would be for something more akin to MMD's metadata as the first option, and then for something more robust as the optional variant for those who need it. 
The cody alternative could allow lists, key value pairs, multiple paragraphs, etc. I suspect it would be used by only a minority of users, but that the minority is going to be over-represented on this discussion list. F- -- Fletcher T. Penney fletc...@fletcherpenney.net
Re: Metadata syntax (was Universal syntax for Markdown)
On Sep 18, 2011, at 10:47 AM, John MacFarlane wrote: * There's no provision for structured data (e.g. key/value tables or lists), or for boolean or numerical fields. I'm not convinced that Markdown should have any say as to which data structure a particular value should be transformed into. These are the things I believe Markdown certainly should define: delimiters for metadata blocks (whitespace or otherwise) syntax for key–value pairs valid keys valid values Perhaps Markdown's responsibilities should be limited to the following: ensuring that metadata are omitted from the HTML output storing the key–value pairs (as strings) in a dictionary-like object The reason I lean towards this approach is that the alternative (defining syntax for lists, numbers, etc.) would impose extra syntax in common cases. Take the following, for example: date: Sunday, 22 May 2011 time: 6:30pm zone: America/Los_Angeles tags: JavaScript, regex, regular expressions To a human reader, tags is clearly a list. How, though, would a parser know that tags is a list but date—which also contains a comma—is not? Resolving this ambiguity would require that the tags be wrapped in square brackets (or the addition of some other syntax): date: Sunday, 22 May 2011 time: 6:30pm zone: America/Los_Angeles tags: [JavaScript, regex, regular expressions] What if list items are allowed to contain commas? Perhaps an item may be quoted to resolve this ambiguity. What happens, then, if one wishes to include a quoted item: tags: [foo, bar, baz!] If quotation marks are optional, would this necessitate wrapping baz! in an extra pair? These are certainly edge cases, but as we've agreed defining correct behaviour in such cases is important. If we want to avoid defining our own serialization format, we have two options: we can adopt an existing format (such as JSON or YAML), or we can hand off the responsibility to application developers. 
I favour the latter, because serialization formats, by necessity, contain quite a bit of punctuation. Transforming strings from a metadata dictionary into appropriate values is something with which I have first-hand experience. Mango provides a META_LISTS setting which determines which keys' (string) values should be transformed in lists. Sure, this required a bit of work on my part, but the end result is pleasing (no extra punctuation in my Markdown files). Won't this lead to a situation where one application cannot correctly process another application's metadata? Yes. If we're unwilling to accept this I fear we'll end up reinventing YAML. ;) David On Sep 18, 2011, at 11:07 AM, Fletcher T. Penney wrote: On Sep 18, 2011, at 1:47 PM, John MacFarlane wrote: snipped To my mind, multimarkdown comments just aren't flexible enough: * There's no way to have multiline metadata fields that contain blank lines, e.g. an abstract with two paragraphs. True - but in MMD an abstract would be included in the document with a separate header, not as metadata. But you're correct that blank lines are not allowed. I've never needed them, but they aren't allowed. * There's no provision for structured data (e.g. key/value tables or lists), or for boolean or numerical fields. True. I've never needed them, and have never had them requested. But there is no provision for that. * Metadata fields are interpreted as raw strings, not markdown. That's sometimes what you want, but not always. Titles often contain emphasis and other formatting, for example, and sometimes even footnotes (for acknowledgements). If these are just going into an html meta field, it doesn't much matter, but if you're using the metadata fields in templates, it does. (And sure, you could always run a raw string through your markdown processor again, before passing it to the template engine, but that creates problems for things like reference links and footnotes.) This is a slight difference in behavior from MMD 2. 
I'm considering approaches to allow processing the contents of the metadata, as this can be an issue occasionally. Another major problem, in my view, is that if a document starts with a phrase followed by a colon, it gets swallowed into metadata: % multimarkdown To be or not to be: that is the question. ^D ?xml version=1.0 encoding=UTF-8 standalone=yes ? !DOCTYPE html html xmlns=http://www.w3.org/1999/xhtml; head meta name=tobeornottobe content=that is the question./ /head body /body /html That's not what most authors would expect! This is true. But a blank line at the top of the document solves the problem. And it doesn't match a URL on the first line as metadata, so I'm not sure how often this really happens in real life. For this reason, I would favor something more like reStructuredText field lists, which marks the fields explicitly as fields: :title:Here is the title. :author: John :abstract:
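The hand-off approach David describes in this message (Markdown stores plain strings, the application decides which keys become lists) might look like the sketch below. `LIST_KEYS` and both function names are hypothetical and only echo the idea behind a setting like META_LISTS. They are not Mango's actual code.

```python
# Assumption: the application, not the markup, names the keys that hold lists.
LIST_KEYS = {'tags'}


def parse_pairs(block):
    """Store every 'key: value' line as a plain string, split at the first colon."""
    meta = {}
    for line in block.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return meta


def coerce(meta, list_keys=LIST_KEYS):
    """Turn the values of configured keys into lists; leave the rest as strings."""
    result = {}
    for key, value in meta.items():
        if key in list_keys:
            result[key] = [item.strip() for item in value.split(',') if item.strip()]
        else:
            result[key] = value
    return result
```

The date keeps its comma because `date` is not in `LIST_KEYS`, so no brackets or quoting are needed in the document itself. The cost, as the message concedes, is that another application with a different key list will read the same metadata differently.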
Re: Metadata syntax (was Universal syntax for Markdown)
On Aug 26, 2011, at 11:01 AM, bowerb...@aol.com wrote: there's something else that i generally put under metadata -- which other people do not -- which are the specifications used to create the output-formats. these include things like straight-quotes vs. curly, indented paragraphs vs. block, and the pagesize (for .pdf), the font, fontsize, leading, and so on. this allows the end-user who receives the z.m.l. file to create outputs matching what the author intended them to look like. in accordance with the all-text-in-one-file mandate of z.m.l., these specifications should be included in the text-file itself, and can fall in the metadata section, the colophon section, or in their own output specifications section, as you desire... and, of course, end-users can also change the specifications, so as to create output that is formatted to their own desires... -bowerbird I think the definition of such a section, for similar reasons (such metadata would only be considered in certain contexts such as publishing or CMS extensions), was a motivation for the metadata discussion. Alan Hogan
re: Metadata syntax (was Universal syntax for Markdown)
all pooped out, are you?

oh well, the conversation this time lasted longer than it ever has before, in my memory, so maybe you're just working up your stamina for next time...

so let me finish off this round...

***

christoph said:
> A Markdown document may contain metadata in a human readable form that the parser converts to a machine readable form of metadata automatically. A casual reader will understand the content directly and without distraction. Bowerbird will love this.

indeed, christoph... because you've begun to describe the very system that i use, for the very reason i use it.

i'll describe it more fully below, but first other stuff

***

i'm not sure i fully understand the mentality that says implementations of markdown 2.0 can toss metadata.

isn't the objective to dispense with implementations that act differently from each other?

ok, sure, i'm not naive; i realize that once a standard for markup 2.0 is made, someone will come along and tweak it for their benefit, and then we are once again on the path toward fracture.

but still, the goal for here and now is to unify all. right?

i feel the same way about command-line switches that turn on different modes, like quirks and extensions.

isn't it our zeitgeist to gather everyone under one roof? you'll just ignore (or never learn) features you don't need. so everyone gets what they want.

and if it's not possible, if you want to use the system you have been using which is tweaked the way you want it, just continue to do that... it's not like those scripts will stop working or something.

but manufacturing a situation where all of the differences are _blessed_ (rather than removed) is counterproductive.

***

now on to metadata...

as for the color of the metadata bikeshed, we have one shade of paint -- simple -- so that's what it must be... you've probably over-discussed it already, without even getting to the meat of the matter.
for _most_ purposes, the metadata is relatively unimportant, which you'll see quite clearly if you only begin to concentrate on specifics.

in a .pdf, for example, the metadata consists merely of title, author, subject, creator, and keywords. that's it...

in an .epub or a .mobi, you can specify a ton of metadata, if you want, but there's no standardized way of getting it, so you're basically whistling at a noisy construction site... (or doing pantomime in the dark, if you prefer that image.)

unless/until the microformat people get an upper-hand -- and lord help us if that kind of bureaucracy wins out -- metadata in .html continues to be a rather iffy thing, so at least for now, i think this issue needs little attention...

as for the matter of tags or keywords, they're _lame_, to a large degree, because they can be gleaned from the text itself in most cases. and perhaps more importantly, such descriptive judgments need to be accumulated over the input from hundreds or thousands of objective users, rather than plugged in by a document's author or publisher, or the specter of gaming the system makes it all worthless...

i'm not telling people not to use tags, but i think it's obvious that any worthwhile recommendation system will ignore 'em. your metadata often tries to tell lies; google knows the truth.

there are a lot of consultants selling metadata as a cure-all. it's more like snake-oil.

***

as for my system...

as i said, my focus is on _books_, so for me, the concept of the title-page (plus the cover) is the one that rules here.

the first section or chapter in a .zml file is the title-page, and _everything_ on that page is considered as metadata.

remember that my first pass consists of separating chunks -- a sequence of non-blank lines bordered by blank lines -- so the top chunk (of one or more lines) is defined as the title. the second chunk is considered to be the subtitle, and the third is considered to be the author.
the author chunk is required to start with the word "by", so if the second chunk starts with "by" and the third chunk does not, my routines assume that the book has no subtitle, so the second chunk is considered to be the author chunk.

subsequent chunks are required to be labeled appropriately, such as "edited by" or "illustrations by" or "plus additional contributions by" or "with preface by", and so on. you get the picture; it's clear.

other things which commonly appear on the title-page are the publisher's name and often the city where it is located, publication date, contact information for the author(s), etc.

none of this is particularly difficult to parse. nor does it sacrifice any power _or_ flexibility.

other info about the document is obtained in the course of analyzing it, like the number of chapters and illustrations, the size of the file, the number of references, and so forth.

you also have to acknowledge, at some point in time, that no matter what you do, you ain't gonna make a professional book-cataloger happy... and one of my close
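The title-page rules described in this message can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not bowerbird's actual routines: the names `split_chunks` and `parse_title_page` are hypothetical, and leaving every chunk after the title in `other` when nothing starts with "by" is an assumption the message does not spell out.

```python
import re


def split_chunks(text):
    """Split text into chunks: runs of non-blank lines bordered by blank lines."""
    raw = re.split(r"\n\s*\n", text.strip())
    # Join the lines of each chunk with single spaces so a chunk is one string.
    return [" ".join(line.strip() for line in c.splitlines()) for c in raw if c.strip()]


def starts_with_by(chunk):
    """True if the chunk begins with the word 'by' (assumed case-insensitive)."""
    return chunk.lower().startswith("by ")


def parse_title_page(text):
    """Apply the title / subtitle / author rules to the chunks of a title-page."""
    chunks = split_chunks(text)
    meta = {"title": None, "subtitle": None, "author": None, "other": []}
    if not chunks:
        return meta
    meta["title"] = chunks[0]
    second = chunks[1] if len(chunks) > 1 else None
    third = chunks[2] if len(chunks) > 2 else None
    if third is not None and starts_with_by(third):
        # Normal case: the second chunk is the subtitle, the third the author.
        meta["subtitle"] = second
        meta["author"] = third[3:].strip()
        meta["other"] = chunks[3:]
    elif second is not None and starts_with_by(second):
        # Second chunk starts with "by" and the third does not: no subtitle.
        meta["author"] = second[3:].strip()
        meta["other"] = chunks[2:]
    else:
        # Assumption: with no "by" chunk, leave the rest uninterpreted.
        meta["other"] = chunks[1:]
    return meta
```

The later labeled chunks ("edited by", "illustrations by", and so on) are left in `other` here; matching them against a list of known labels would be the natural next step.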
Re: Metadata syntax (was Universal syntax for Markdown)
+++ Sam Angove [Aug 18 11 12:26 ]:
> On Thu, Aug 18, 2011 at 7:29 AM, Fletcher T. Penney fletc...@fletcherpenney.net wrote:
> > The MMD format for metadata was actually taken from the Blosxom software that you mention.
>
> And before that, almost certainly taken from the Internet Message Format [1]. MultiMarkdown improves on the IETF version from a user's point of view (and becomes more Markdownish) by making it legal to do lazy line-folding.
>
> The result is something that's simple to write, simple to parse, and exactly what a normal person would come up with if you asked them to put some metadata at the top of a text file. I think it's a perfect fit for Markdown.
>
> YAML, by contrast, is complicated and outrageously heavy to include as a dependency -- data merging, references, different types of folding, user-defined data-types... you've got to be kidding.

I have to agree that YAML is overkill for our purposes, and adds a lot of complexity.
Metadata syntax (was Universal syntax for Markdown)
1) This is a totally separate issue from the discussion at hand I was responding to, which is how to try and converge the Markdown derivatives, so I renamed the thread.

2) That said, there are certainly multiple ways of including metadata information, and I'm happy to discuss. But I do want to be clear that I think this is a secondary discussion, and do not wish to distract from the bigger picture of how to develop a plan for unifying the core features of the Markdown family. How to merge metadata syntaxes should be a secondary or tertiary concern for such an effort.

For those who care about metadata syntaxes, read on. If you don't, feel free to skip to the next email that interests you. ;)

The MMD format for metadata was actually taken from the Blosxom software that you mention. As you may recall, the first line could be used as a title, but beyond that a syntax basically identical to that of MMD was the most common way of including metadata, and I believe that the plugin responsible was in fact called metadata. This was necessary to allow information such as dates, categories, etc. to be included in the document itself. The ability to include metadata using arbitrary keys created a blossoming (pun intended) of plugins that added many useful features to the blosxom package.

Your suggested syntax certainly requires less markup than that used by MMD currently, but at the cost of a great deal of flexibility, and would require more complexity in programming the parser.

You mention the English-centric nature of MMD metadata. This is certainly true, but no more so than HTML itself. One could certainly localize MMD to use any language you like (the beauty of open source), but to match your proposal in multiple languages would be quite complicated. For example, the following are valid MMD metadata dates, and easily used:

    date: 8/17/2011
    date: August 17th, 2011
    date: 2011-08-17
    date: 17/8/2011
    date: 14. Juni 2001
    date: 8 avril 2000

Writing a parser that would correctly catch all of these dates in any language would be quite difficult, and prone to error.

You mention tags as being easily recognized, but this is not always true:

    A sample document
    by John Smith, MD
    Director of Palliative Care, Division of General Medicine, Medical University of Somewhere

While perhaps not the best example of potential problems, this would be incorrectly interpreted as tags, when the author probably implies that this represents his academic affiliation and would like it to be properly placed after his name on the title page, or on the slide deck if generating via beamer.

So your example would work for simple metadata that relies only on numerical dates. For documents that fit your desired model, this syntax would be great and would involve less markup --- which is good. However, I suspect that for those who want metadata in their document, it would be too limiting --- which is not good. Many of my users, myself included, would end up right back where we started with needing another way to include metadata.

To help give you perspective on the power of the current metadata model, by properly including the right metadata, a single MMD document can be processed into a web page, a pdf slide show (aka powerpoint), and a pdf handout. Another document can be processed into letterhead, complete with logo, return address, recipient information, graphical signature, and even a properly addressed envelope. Another can be output as a properly formatted manuscript for submission to a publisher.

I don't expect all users to use the full power of metadata. Many users can simply ignore it altogether. But it is an incredibly useful feature that is one of the primary ways that I integrate MMD into my own personal workflow. It does take a bit of willingness to dig around and experiment in order to understand how metadata works.
So while I am certainly interested in ways to improve it, metadata will not be removed from MMD. That doesn't mean I expect all variants to use metadata, just because MMD does. Nor do I expect them to follow the MMD syntax if they do.

Other than yours, I haven't seen any proposals for a metadata syntax that had *less* markup than mine, nor did they seem any more human friendly than this syntax. And for my purposes, your proposal doesn't offer the flexibility that I would need for the ways I use MMD.

I've tried to throw a few things into your example, to show how it wouldn't work as well for my own use cases:

    ---
    Test Document for Automatic Metadata Detection
    Is this a subtitle, or a continuation of the title from above?
    by Christoph Freitag
    date: August 17, 2001
    Markdown, Standardization, MMD, Metadata
    affiliation:  University of Somewhere
    comment:      This looks funny aligned
Re: Metadata syntax (was Universal syntax for Markdown)
It is true that certain metadata (author and date, to provide two examples) are used far more frequently than return addresses or URIs for graphical signatures. That said, it would be foolish to try to imagine every way in which metadata might be used, nor do I see much value in doing so.

If Markdown is to process metadata, the syntax should support arbitrary key–value pairs. For example:

    author: Jesper Nøhr
    date: 17 August 2011
    tags: lol, omg, lulz

Formatted differently:

    author: Jesper Nøhr
    date: 17 August 2011
    tag: lol
    tag: omg
    tag: lulz

If -- again, if -- Markdown is to be charged with parsing metadata, my opinion is that its role should be limited to returning a dictionary-like metadata object (in addition to the HTML string generated from the remainder of the document's contents).

For the first example:

    {"date": "17 August 2011", "tags": "lol, omg, lulz", "author": "Jesper Nøhr"}

For the second example:

    {"date": "17 August 2011", "author": "Jesper Nøhr", "tag": ["lol", "omg", "lulz"]}

In my opinion, Markdown should *not* be responsible for any of the following:

- splitting lists (note that "lol, omg, lulz" is a string in the first example)
- converting date strings into date objects
- any other manipulation of values

In other words, every value should be either a string, or an ordered, list-like object containing two or more strings (in the case of a repeated key).

In addition to converting strings into appropriate objects, applications making use of Markdown's metadata feature would also be responsible for handling the fact that the value for a particular key may be a string for one document and a list of strings for another.

Fletcher touched on another question that should be discussed: should multiline values be accommodated and if so, how? I think it'd be great to support multiline strings.
I imagine the formatting looking something like this:

    author: Jesper Nøhr
    date:   17 August 2011
    lol:    Irony keffiyeh pitchfork, mustache letterpress tofu cred twee
            scenester thundercats gluten-free yr chambray sartorial
            stumptown. Homo cosby sweater gentrify banh mi letterpress,
            vinyl beard hoodie terry richardson. Art party whatever
            banksy, readymade skateboard you probably haven't heard of
            them tumblr tattooed PBR letterpress photo booth carles
            vegan organic.
    omg:    VHS carles photo booth food truck synth craft beer, wes
            anderson tofu banksy fanny pack stumptown.

This strikes me as being in the spirit of Markdown, as it's how one might structure this content if one were to produce it on a typewriter.

I'm interested to hear people's thoughts on multiline values and on the unfancy approach to metadata parsing that I (currently) favour.

David

On 17 August 2011 15:17, M Harris m...@2011.n0b.org wrote:

> So, hi all. First time commenting on the list.
>
> I personally think having tags (whether of type "author:" or type "by") is useful for two reasons. One: It allows multiple tags to be entered. Two, it clears up the potential problem listed by Fletcher regarding tags.
>
>     by Christoph Freitag
>     Affiliation: XYZ
>     by Fletcher T. Penney
>     Affiliation: ABC
>     tags: Markdown, Standardization, MMD, Metadata
>     desc: An interesting discussion of how metadata could be included usefully in Markdown, whilst being readable etc.
>
> Regarding the localisation problem then, I thought that this was a solved problem when it came to computing? (At least in the cases of the major world languages.) A parser could have a table of equivalent words, so in English "by", en français "de" (pardon my French*).
>
> * By which I mean, I'm not sure that's correct, because I'm only a learner.
>
> From: Christoph Freitag m...@christoph-freitag.de
>
> > Fletcher, sorry, but personally -- despite loving MMD (and even having used MMD CMS for a diary) -- I have never liked the way MMD handles metadata.
> > Partly this is because, not being a native English speaker, I dislike English meta descriptors. A localization could resolve this -- but I still think it looks ugly.
> >
> > However, do you actually need descriptors at all? I doubt it:
> >
> > * The title could be anything at the start of the document. Blosxom is a good example. Anything up to the first blank line is the title.
> > * After that, anything between the first blank line and the second blank line would be treated as additional metadata.
> > * Instead of the "Author:" descriptor, explicitly stated, it should suffice to write "by". What follows is the name of the author. (Localization would be easier as only this keyword would have to be known to the parser in a number of languages.)
> > * Dates would be self-explanatory, to a clever parser.
> > * Any list of words separated by commas on a single line would be treated as tags.
> > * Any more fanciful meta descriptors might be given explicitly just as in MMD before. This could be left to non-standard, personalized variants of Markdown.
> >
> > Thus the following would be a valid document:
> >
> > --- Test
Re: Metadata syntax (was Universal syntax for Markdown)
On Aug 17, 2011, at 6:17 PM, David Chambers david.chambers...@gmail.com wrote:

> I'm interested to hear people's thoughts on multiline values and on the unfancy approach to metadata parsing that I (currently) favour.

I agree that:

- multiline values are a must
- arbitrary key/value pairs are a must

When you describe the syntax you envision, I am just thinking, why redefine YAML? (In that case, if you'll allow me a moment of glibness, let's call that syntax YAYAML.)

AFAIK both YAML and JSON allow for representation of the same data types (numbers, strings, arrays, objects/dictionaries).

If we pick a format as preferred metadata syntax, my vote is for YAML. It's already defined, already implemented, already proven, and fairly natural. Hell, in TextMate, for example, the YAML bundle would simply apply to the appropriate section of the Markdown 2 doc (like JavaScript or PHP among HTML)!

Alan
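For comparison with the YAML route, the "unfancy" reading David describes (arbitrary keys, every value a string, a repeated key becoming a list of strings, multiline values folded together) fits in a few lines. This is a minimal Python sketch, not any implementation's actual parser. The name `parse_metadata` is hypothetical. Two behaviours are assumptions rather than anything agreed in the thread: continuation lines are recognised by leading whitespace or by the absence of a colon, and the first blank line ends the metadata block.

```python
def parse_metadata(text):
    """Return (metadata, body) for a document with leading 'key: value' lines."""
    lines = text.splitlines()
    pairs = []      # ordered (key, value) pairs, repeated keys kept as-is
    consumed = 0    # number of lines that belong to the metadata block
    for line in lines:
        if not line.strip():
            consumed += 1          # swallow the blank separator line
            break
        if pairs and (line[0].isspace() or ":" not in line):
            # Continuation line: fold it into the previous value.
            key, value = pairs[-1]
            pairs[-1] = (key, (value + " " + line.strip()).strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            pairs.append((key.strip(), value.strip()))
        else:
            # First line is not 'key: value', so there is no metadata at all.
            return {}, text
        consumed += 1
    meta = {}
    for key, value in pairs:
        if key not in meta:
            meta[key] = value                  # first occurrence: plain string
        elif isinstance(meta[key], list):
            meta[key].append(value)            # third and later occurrences
        else:
            meta[key] = [meta[key], value]     # second occurrence: make a list
    return meta, "\n".join(lines[consumed:])
```

Values are never split or converted, so "lol, omg, lulz" stays a single string and dates stay strings; the calling application decides what to do with them. A document that begins with a blank line yields an empty dictionary, which matches the blank-line workaround mentioned elsewhere in the thread.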
Re: Metadata syntax (was Universal syntax for Markdown)
On Thu, Aug 18, 2011 at 7:29 AM, Fletcher T. Penney fletc...@fletcherpenney.net wrote:

> The MMD format for metadata was actually taken from the Blosxom software that you mention.

And before that, almost certainly taken from the Internet Message Format [1]. MultiMarkdown improves on the IETF version from a user's point of view (and becomes more Markdownish) by making it legal to do lazy line-folding.

The result is something that's simple to write, simple to parse, and exactly what a normal person would come up with if you asked them to put some metadata at the top of a text file. I think it's a perfect fit for Markdown.

YAML, by contrast, is complicated and outrageously heavy to include as a dependency -- data merging, references, different types of folding, user-defined data-types... you've got to be kidding.

[1]: http://tools.ietf.org/html/rfc2822#section-2.2
Re: Metadata syntax (was Universal syntax for Markdown)
On Aug 17, 2011, at 7:26 PM, Sam Angove peas...@gmail.com wrote:

> YAML, by contrast, is complicated and outrageously heavy to include as a dependency -- data merging, references, different types of folding, user-defined data-types... you've got to be kidding.

Wow, I seem to have a vastly over-simplified conception of YAML.

Alan
Re: Metadata syntax (was Universal syntax for Markdown)
First time posting here as well. I've been watching this discussion with interest. As a user of (extended) markdown, I have long hoped for a unified standard for (most or all) markdown extensions and a unified handling of metadata.

It seems to me that one of the issues that arises when we start thinking about metadata is that there really are two different kinds of metadata: some metadata (title, author, date) is---at least in many cases---also part of the *content* of the document. This is the kind of metadata for which I feel the force of the demand for an elegant plaintext solution. For some bold suggestions in this direction, see this [old post by Michael Thompson][1] to the pandoc-discuss list. Here is one of his examples from that post:

                  A Good Man Is Hard To Find
                      Flannery O'Connor
                         Spring 1952

    The grandmother didn't want to go to Florida. She wanted to visit some of her connections in east Tennessee and she was seizing at every chance to change Bailey's mind.

Isn't that so much *prettier* than any of the options currently in play? Email someone a document like that, and they will know exactly what you mean, and see no distracting markup.

No doubt this presents challenges when it comes to parsing, and I have no idea whether or not those challenges are surmountable. Clearly some rules would have to be laid down (Does it have to be centered? Indented? Can I underline the title à la setext? Do I have to have two blank lines after the date? Can I leave the date out? etc.) And it raises issues for backwards compatibility too. But I think it's worth having in view a solution that achieves a certain degree of perfection along this one dimension.

But then there is the other kind of metadata. Tags, keywords, baseurls, paths to associated files, directives for webpage templating software, and so on and so on. This sort of stuff is definitely not content. It is a bunch of data that I want to associate with the file for some reason or other.
It needs to be indefinitely extensible. It is frequently tied directly to some specific output format or context. In other contexts, it probably just needs to be ignored. Blosxom taught us that it should all be at the top of the document (and successors, like Jekyll, follow this tradition), but much of it is ugly enough that it could just as well be banished to the bottom of the document, where nobody but the author would ever have to look at it.

When it comes to this sort of metadata, I don't see any reason to look for something elegant, language-independent, and plaintext-y. This is where it feels like I just want a way of embedding a block of data within a markdown file, knowing that it won't be treated as content (and, depending on my processor and the context, knowing that it may be sucked up and used in various ways). It is here that I agree with the sentiment that metadata shouldn't be part of the markdown spec, *but* I think markdown should be smart enough to ignore the metadata, so that I don't have to strip it out before feeding the document to a markdown processor.

Here is an extreme version of this: extant implementations of citeproc support JSON as a bibliography format. Imagine they supported YAML. Then imagine being able to stick something like this at the *end* of your markdown file,

    ---
    story:
      title: A Good Man is Hard to Find
      author: Flannery O'Connor
      date: Spring 1952
      key: oconnor1952
    story:
      title: The Old Man and the Sea
      author: Ernest Hemingway
      date: Sep 1952
      key: hemingway1952
    ...

and then being able to treat the same file as both your markdown file and your bibliography database, knowing that, when you run it through the markdown parser, that chunk of metadata will be ignored, and when you feed it as a database to your citeproc implementation, the markdown will be ignored.

This is just one example of the sort of flexibility and power that you might get from supporting arbitrary blocks of data within markdown files.
So, here is my *pipe dream* implementation of metadata in markdown:

1. A syntax for clean, language independent title, author, date (and ?) that looks the way you would have done it on a typewriter or in a plaintext email.

2. Support for embedding arbitrary metadata inside of appropriate delimiters (e.g., YAML's '---' and '...') *anywhere* within the document.

I would then add that, for simplicity, all markdown processors should look into the arbitrary metadata for a few common bits of metadata, namely, title, date, and author (perhaps with proper localizations). That way, I could write beautiful plaintext markdown, providing title, author, date as part of the content, if I wanted to, but if I was lazy, or was using a bunch of metadata and preferred to keep it all in one place, I could instead just specify that as metadata along with all the rest.

I guess this means that I
Re: Metadata syntax (was Universal syntax for Markdown)
As for the heaviness of YAML as a dependency, I think it would be reasonable to expect markdown itself to handle only the simplest YAML constructs when trying to find the few bits of metadata---title, author, date---that it should be responsible for.
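To make that division of labour concrete: a processor could lift out anything between a '---' line and a '...' line, pass the rest on as markdown, and look inside the lifted blocks only for flat `title:`, `author:` and `date:` lines. This is a minimal Python sketch of that idea, not a YAML parser. The names `split_metadata_blocks`, `common_fields` and `COMMON_KEYS` are hypothetical, and treating an unterminated '---' as ordinary content is an assumption (a '---' line can also be a horizontal rule in Markdown, which a real implementation would have to disambiguate).

```python
COMMON_KEYS = ("title", "author", "date")


def split_metadata_blocks(text):
    """Separate a document into (markdown_body, list_of_metadata_blocks)."""
    body, blocks, current = [], [], None
    for line in text.splitlines():
        stripped = line.strip()
        if current is None and stripped == "---":
            current = []                 # opening delimiter: start a block
        elif current is not None and stripped == "...":
            blocks.append(current)       # closing delimiter: finish the block
            current = None
        elif current is not None:
            current.append(line)         # inside a block: not content
        else:
            body.append(line)            # ordinary markdown content
    if current is not None:
        # Assumption: an unterminated '---' was not metadata; restore it.
        body.append("---")
        body.extend(current)
    return "\n".join(body), blocks


def common_fields(blocks):
    """Pick out flat title/author/date lines, ignoring everything else."""
    found = {}
    for block in blocks:
        for line in block:
            if line[:1].isspace():
                continue                 # nested or continued data: skip it
            key, sep, value = line.partition(":")
            key = key.strip().lower()
            if sep and key in COMMON_KEYS and value.strip() and key not in found:
                found[key] = value.strip()
    return found
```

Everything else in a block (nested records, lists, anchors) is simply carried along untouched for whatever downstream tool wants it, so the markdown processor never needs a full YAML dependency.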