Hey Nicolas,

this looks very detailed, and I think it could be useful for people trying to write other parser implementations for org-mode. Thanks for sharing!
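
To give an idea of how directly the headline rule quoted below (STARS KEYWORD PRIORITY TITLE TAGS) could map onto code in another implementation, here is a rough Python sketch. It is not how Emacs or org-ruby does it. The names `parse_headline`, `Headline` and `TODO_KEYWORDS` are made up for illustration, the keyword list is a stand-in for whatever the TODO keyword configuration contains, and the tag syntax (`:tag1:tag2:` at the end of the line) is my assumption, since the quoted text does not spell it out. Following the discussion below, STARS must start at column 0 and be followed by a space, and the sketch only counts the stars rather than claiming to compute a level.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: stand-in for the configured TODO keywords (case is significant).
TODO_KEYWORDS = ("TODO", "DONE")


class Headline(NamedTuple):
    stars: int                # number of asterisks, not necessarily the "level"
    keyword: Optional[str]
    priority: Optional[str]
    title: str
    tags: List[str]


def parse_headline(line: str, todo_keywords=TODO_KEYWORDS) -> Optional[Headline]:
    """Return a Headline, or None if the line is not a headline."""
    # STARS: one or more asterisks at column 0, followed by at least one space.
    m = re.match(r"(\*+) +(.*)", line.rstrip("\n"))
    if m is None:
        return None
    stars, rest = m.group(1), m.group(2).strip()

    # KEYWORD: first word, matched case-sensitively against the keyword list.
    keyword = None
    first, _, remainder = rest.partition(" ")
    if first in todo_keywords:
        keyword, rest = first, remainder.lstrip()

    # PRIORITY: a single letter preceded by '#', inside square brackets.
    priority = None
    pm = re.match(r"\[#([A-Za-z])\]\s*", rest)
    if pm:
        priority, rest = pm.group(1), rest[pm.end():]

    # TAGS (assumed syntax): colon-delimited words at the end of the line.
    tags: List[str] = []
    tm = re.search(r"(?:^|[ \t]+):([\w@#%:]+):$", rest)
    if tm:
        tags = tm.group(1).split(":")
        rest = rest[:tm.start()]

    return Headline(len(stars), keyword, priority, rest.strip(), tags)


if __name__ == "__main__":
    print(parse_headline("** TODO [#A] Write parser :dev:org:"))
```

A step-by-step parse seemed easier to compare against the prose definition than one large regular expression, since each optional part can be checked on its own.
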

By the way, is there somewhere a set of examples of Emacs org-mode -> html conversion covering all org-mode features? (How are changes to the org-mode -> html conversion in Emacs tested during development?)

I maintain the org-ruby gem, which is used to render org-mode texts to html, and currently there is no "roadmap" of features to implement for it. As a result, features and tweaks are added to the library only when someone submits a ticket on Github requesting them. (Here is a list of the export features currently supported, in case someone wants to take a look: https://github.com/bdewey/org-ruby/tree/master/spec/html_examples )

Having a set of example features from org-mode would be very useful to see how much of org-mode's export features other implementations cover.

Cheers everyone, keep org-mode being an awesome tool :)

- Waldemar

On Sat, Mar 9, 2013 at 7:06 AM, Nicolas Goaziou <n.goaz...@gmail.com> wrote:
> Hello,
>
> "Nicolas Richard" <theonewiththeevill...@yahoo.fr> writes:
>
>> Nicolas Goaziou <n.goaz...@gmail.com> writes:
>>> As discussed a few days ago, here is a document describing the complete
>>> Org syntax as read by the parser. I also added some comments. I am going
>>> to put the Org file on Worg, so anyone can update it and fix mistakes.
>>
>> [for the record, the org file mentioned by Nicolas is currently at
>> <http://orgmode.org/worg/dev/org-syntax.org>]
>>
>> This looks truly awesome. I give some (naïve) comments below, from my
>> non-expert point of view.
>
> Thank you for your comments.
>
>>> The paragraph is the unit of measurement. An element defines
>>> syntactical parts that are at the same level as a paragraph, i.e. which
>>> cannot contain or be included in a paragraph. An object is a part that
>>> could be included in an element. Greater elements are all parts that
>>> can contain an element.
>>
>> This is very clear but I'm slightly worried about confusion that might come
>> from "Greater element" not being an "element", and the word "element"
>> being a common word :
>
> element means "Element + Greater Element". It is to be understood as the
> opposite of object. I think there shouldn't be much ambiguity according
> to context.
>
>>> Empty lines belong to the largest element ending before them. For
>>> example, in a list, empty lines between items are part of the
>>> item before them, but empty lines at the end of a list belong to the
>>> plain list element.
>>
>> Is the word "element" (in /largest element ending.../) to be understood
>> as an "element" from the above definition ? I guess not (this would
>> require both list items and plain lists to be on the level 'element',
>> from your example)
>
> Again, it's a shortcut for "in the largest element or greater element
> ending before them".
>
>>> 1 Headlines and Sections
>>> ════════════════════════
>>>
>>> A headline is defined as:
>>>
>>> ╭────
>>> │ STARS KEYWORD PRIORITY TITLE TAGS
>>> ╰────
>>>
>>> STARS is a string starting at column 0 and containing at least one
>>> asterisk (and up to `org-inlinetask-min-level' if `org-inlinetask'
>>> library is loaded). It’s the sole compulsory part of a headline.
>>
>> Perhaps it should be mentioned that STARS has to end by a space (see
>> below).
>
> I agree.
>
>> I suggest adding : The number of stars defines the level of the
>> headline.
>
> Does it belong to the syntax definition? Level is how Org uses syntax
> internally. Also the sentence, although right, is misleading, because
> level definition also depends on `org-odd-levels-only'.
>
>>> KEYWORD is a TODO keyword, which has to belong to the list defined in
>>> `org-todo-keywords'. Case is significant.
>>
>> The option #+TODO: is used also.
>
> Then it should be ~org-todo-keywords-1~, which is where all TODO
> keywords are added eventually.
>
>>> PRIORITY is a priority cookie, i.e. a single letter preceded by a hash
>>> sign # and enclosed within square brackets. Case is significant.
>>
>> I suggest dropping "Case is significant" (or maybe give the whole story :
>> IIRC, it is the ascii code of the given letter that is used as
>> priority)
>
> I'm not sure that the purpose of this document should be to explain how
> syntax will be used.
>
>>> ╭────
>>> │ *
>>
>> I don't see a space character after that one in your email and it
>> doesn't seem to be recognized as a headline by the exporter (hence my
>> above suggestion)
>>
>>> If the first word appearing in the title is `org-comment-keyword',
>>> the
>>
>> That should be `org-comment-string' I guess.
>
> Indeed. Btw, I think this variable should be a defconst, not
> a defcustom. It just makes things harder for little benefit.
>
>>> A headline contains directly at most one section, followed by any
>>> number of headlines. Only a section can contain another section.
>>
>> From what I understand, "A section is delimited by two headlines (and
>> buffer limits)." [I initially thought it was "by two headlines of the
>> same level", which it is not from the structure example you give
>> later.]
>
> "Only a section can contain another section" is wrong. It should be
> removed.
>
>>> A section contains directly any greater element or element. Only
>>> a headline can contain a section. As an exception, text before the
>>> first headline in the document also belongs to a section.
>>
>>
>>> In a quoted headline contains a section, the latter will be considered
>>> as a “quote section”.
>>
>> s/In/If/
>
> Yes.
>
>> unsure: s/quote section/quoted section/ ?
>
> No, it is "quote section".
>
>>> BACKEND is a string constituted of alpha-numeric characters, hyphens
>>> or underscores.
>>
>> I suggest: BACKEND is a string which is an element of (mapcar 'car
>> org-export-registered-backends).
>
> Not really. Parser can understand #+attr_foo even if foo is not
> registered as a valid back-end.
>
>>> OPTIONAL and VALUE can contain any character but a new line. Only
>>> keywords in `org-element-dual-keywords' can have an optional value.
>>
>> I guess OPTIONAL cannot contain a closing square bracket ]
>
> It can.
>
>>> An affiliated keyword can appear on multiple lines if KEY belongs to
>>> `org-element-multiple-keywords' or if its pattern is “#+ATTR_BACKEND:
>>> VALUE”.
>>
>> I suggest s/on multiple lines/more than once/
>
> Ok.
>
>>> PARAMETERS can contain any character, and can be omitted.
>>
>> any other than new line, I guess.
>
> Correct.
>
>>> CONTENTS can contain any element, but another greater block of the
>>> same type.
>>
>> What is the type of a greater block ? the /name/ ?
>
> Yes. I think it should be better to say something like: CONTENTS cannot
> contain the string "#+END_NAME" on a line on its own.
>
>> I did have a quick look at the rest of your mail, and it is very nice to
>> have all of it written down explicitly, so again a big thanks for all of
>> this (and the rest of your) work. Unfortunately I don't have much time
>> right now to read it thoroughly, so just one single comment :
>>
>>> Even the LaTeX community suggests to use `\(...\)' over
>>> `$...$'. — ngz
>>
>> AFAIK that's not for technical reasons and also I would be curious to
>> know who does that in real documents : '$' is so much more convenient.
>
> Yes, I mixed $$...$$ and $...$. This sentence could be removed. Though
> I still maintain my POV about $...$. It may be convenient in a latex
> file, but in a free-form text format like Org, it's error prone.
>
> I also forgot to write about optional #+tblfm: line below Org tables.
>
> Would you (or Someone) mind updating the org-syntax.org file on Worg?
>
> Thank you again.
>
>
> Regards,
>
> --
> Nicolas Goaziou
>