Hello,
I've added org-element.el to the contrib directory. It is a complete parser and interpreter for Org syntax. While it was written to be extensible, it is also an attempt to normalize the current syntax and provide guidance for its evolution.

Org syntax can be divided into three categories: "Greater elements", "Elements" and "Objects".

An object can be defined anywhere on a line. It may span more than one line but never contains a blank one. Objects belong to the following types: `emphasis', `entity', `export-snippet', `footnote-reference', `inline-babel-call', `inline-src-block', `latex-fragment', `line-break', `link', `macro', `radio-target', `statistics-cookie', `subscript', `superscript', `target', `time-stamp' and `verbatim'.

An element always starts and ends at the beginning of a line. The only element type containing objects is called a `paragraph'. Other types are: `comment', `comment-block', `example-block', `export-block', `fixed-width', `horizontal-rule', `keyword', `latex-environment', `babel-call', `property-drawer', `quote-section', `src-block', `table' and `verse-block'.

Elements containing paragraphs are called greater elements. The types concerned are: `center-block', `drawer', `dynamic-block', `footnote-definition', `headline', `inlinetask', `item', `plain-list', `quote-block' and `special-block'.

Greater elements (except `headline' and `item' types) and elements (except `keyword', `babel-call', and `property-drawer' types) can have a fixed set of keywords as attributes. Those are called "affiliated keywords", to distinguish them from other keywords, which are full-fledged elements. In particular, the "name" affiliated keyword makes it possible to label almost any element in an Org buffer.

Notwithstanding affiliated keywords, each greater element, element and object has a fixed set of properties attached to it.
Among them, three are shared by all types: `:begin' and `:end', which refer to the beginning and ending buffer positions of the considered element or object, and `:post-blank', which holds the number of blank lines, or white spaces, at its end.

Some elements also have special properties whose value can hold objects themselves (e.g. an item tag, a headline name, a table cell). Such values are called "secondary strings".

Lisp-wise, an element or an object can be represented as a list. It follows the pattern (TYPE PROPERTIES CONTENTS), where:

  TYPE is a symbol describing the Org element or object.
  PROPERTIES is the property list attached to it. See the docstring of the appropriate parsing function for an exhaustive list.
  CONTENTS is a list of elements, objects or raw strings contained in the current element or object, when applicable.

An Org buffer is a nested list of such elements and objects, whose type is `org-data' and whose properties are nil.

The first part of this file implements a parser and an interpreter for each type of Org syntax.

The next two parts introduce two accessors and a function retrieving the smallest element containing point (respectively `org-element-get-property', `org-element-get-contents' and `org-element-at-point').

The following part creates a fully recursive buffer parser. It also provides a tool to map a function to elements or objects matching some criteria in the parse tree. Functions of interest are `org-element-parse-buffer', `org-element-map' and, to a lesser extent, `org-element-parse-secondary-string'.

The penultimate part is the cradle of an interpreter for the obtained parse tree: `org-element-interpret-data' (and its relative, `org-element-interpret-secondary').

The library ends by furnishing a set of interactive tools for element navigation and manipulation.
More specifically, that last part includes tools such as `org-element-forward', `org-element-backward', `org-element-drag-forward', `org-element-drag-backward', `org-element-mark-element', `org-element-up', `org-element-unindent-buffer'...

For the impatient (well, not quite, as you're still reading this), you can evaluate the following examples in an Org buffer:

  (org-element-parse-buffer)
  (org-element-parse-buffer 'headline)
  (org-element-parse-buffer 'headline 'visible-only)

Also, the following code will parse the buffer, interpret the parsed tree, and create a canonical copy of it (no indentation, lowercased blocks, standard keywords):

#+begin_src emacs-lisp
(let ((out (org-element-interpret-data (org-element-parse-buffer))))
  (switch-to-buffer (get-buffer-create "*Bijectivep*"))
  (erase-buffer)
  (insert out)
  (goto-char (point-min))
  (org-mode))
#+end_src

Besides allowing keywords like "#+name:", "#+caption:" or "#+attr_latex:" to be added to almost any Org element, the library also introduces two less noticeable changes:

1. "#+label:" keywords are deprecated in favor of "#+name:". For now, though, "label" is still treated as a synonym of "name".

2. Protected HTML snippets (like @<b>) are no longer supported, as they were too specific. Instead, a general mechanism to inline back-end specific commands is created. Thus, the HTML back-end will see "<b>some text</b>" while the LaTeX one will only see "some text" if the buffer contains:

     @html{<b>}some text@html{</b>}

   The syntax is heavier, but a configurable variable lets you define shortcuts, reducing it to, for example, @h{<b>}. No shortcut is provided by default. Also, the syntax is experimental and may change if it proves inadequate.

I will commit a generic exporter built on top of Elements, along with a LaTeX back-end, in a couple of days.

Feedback is welcome.

Regards,

--
Nicolas Goaziou
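
As an illustration of the (TYPE PROPERTIES CONTENTS) pattern described above, here is a minimal sketch that walks a small hand-written tree using only core list functions. The tree is not actual parser output: its positions are made up for the example, and the sketch assumes that contents follow TYPE and PROPERTIES directly in the list. The `my/' helpers are hypothetical names for this sketch only and are not part of org-element.el, which provides its own accessors.

#+begin_src emacs-lisp
;; Hand-written stand-in for a parse tree; NOT actual parser output.
;; Assumption: contents are spliced in after TYPE and PROPERTIES.
(defvar my/sample-tree
  '(org-data nil
     (headline (:begin 1 :end 30 :post-blank 0)
       (paragraph (:begin 12 :end 30 :post-blank 0)
         "Some "
         (emphasis (:begin 17 :end 24 :post-blank 1) "bold")
         "text.")))
  "Hypothetical tree following the (TYPE PROPERTIES CONTENTS) pattern.")

(defun my/node-type (node)
  "Return the TYPE symbol of NODE, or nil if NODE is a raw string."
  (and (consp node) (car node)))

(defun my/node-property (node property)
  "Return the value of PROPERTY in the property list of NODE."
  (and (consp node) (plist-get (nth 1 node) property)))

(defun my/node-contents (node)
  "Return the contents of NODE, or nil if NODE is a raw string."
  (and (consp node) (nthcdr 2 node)))

(defun my/collect-types (node)
  "Return, in depth-first order, the types of NODE and its descendants.
Raw strings have no type and are skipped."
  (when (consp node)
    (cons (my/node-type node)
          (apply #'append
                 (mapcar #'my/collect-types (my/node-contents node))))))
#+end_src

With these definitions, (my/collect-types my/sample-tree) evaluates to (org-data headline paragraph emphasis), and (my/node-property (nth 2 my/sample-tree) :begin) returns 1, the made-up starting position of the headline.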