Hello,

I've added org-element.el to the contrib directory. It is a complete parser
and interpreter for Org syntax.

While it was written to be extensible, it is also an attempt to
normalize current syntax and provide guidance for its evolution.

Org syntax can be divided into three categories: "Greater elements",
"Elements" and "Objects".

An object can be defined anywhere on a line. It may span more than one
line but never contains a blank one. Objects belong to the following
types: `emphasis', `entity', `export-snippet', `footnote-reference',
`inline-babel-call', `inline-src-block', `latex-fragment', `line-break',
`link', `macro', `radio-target', `statistics-cookie', `subscript',
`superscript', `target', `time-stamp' and `verbatim'.

An element always starts and ends at the beginning of a line. The only
element type that contains objects is `paragraph'. Other types
are: `comment', `comment-block', `example-block', `export-block',
`fixed-width', `horizontal-rule', `keyword', `latex-environment',
`babel-call', `property-drawer', `quote-section', `src-block', `table'
and `verse-block'.

Elements containing paragraphs are called greater elements. Their
types are: `center-block', `drawer', `dynamic-block',
`footnote-definition', `headline', `inlinetask', `item', `plain-list',
`quote-block' and `special-block'.

Greater elements (except the `headline' and `item' types) and elements
(except the `keyword', `babel-call' and `property-drawer' types) can have
a fixed set of keywords as attributes. Those are called "affiliated
keywords", to distinguish them from other keywords, which are
full-fledged elements. In particular, the "name" affiliated keyword
makes it possible to label almost any element in an Org buffer.
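
As a rough sketch, the three categories and the affiliated-keyword
exceptions above can be written down as plain lookup tables. The `my/...'
names are hypothetical and not part of org-element.el; the type lists are
copied from this message.

#+begin_src emacs-lisp
;; Illustration only: type lists copied from the description above.
(defconst my/object-types
  '(emphasis entity export-snippet footnote-reference inline-babel-call
    inline-src-block latex-fragment line-break link macro radio-target
    statistics-cookie subscript superscript target time-stamp verbatim))

(defconst my/element-types
  '(paragraph comment comment-block example-block export-block fixed-width
    horizontal-rule keyword latex-environment babel-call property-drawer
    quote-section src-block table verse-block))

(defconst my/greater-element-types
  '(center-block drawer dynamic-block footnote-definition headline
    inlinetask item plain-list quote-block special-block))

(defun my/syntax-category (type)
  "Return `object', `element' or `greater-element' for TYPE, or nil."
  (cond ((memq type my/object-types) 'object)
        ((memq type my/element-types) 'element)
        ((memq type my/greater-element-types) 'greater-element)))

(defun my/accepts-affiliated-keywords-p (type)
  "Return non-nil when TYPE may carry affiliated keywords.
Objects never do; among elements and greater elements, the five
types excluded above do not either."
  (and (memq (my/syntax-category type) '(element greater-element))
       (not (memq type '(headline item keyword babel-call
                         property-drawer)))))
#+end_src

For instance, (my/accepts-affiliated-keywords-p 'table) is non-nil, while
the same call with 'headline or 'link returns nil.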

Notwithstanding affiliated keywords, each greater element, element and
object has a fixed set of properties attached to it. Among them, three
are shared by all types: `:begin' and `:end', which refer to the
beginning and ending buffer positions of the considered element or
object, and `:post-blank', which holds the number of blank lines, or
white spaces, at its end.

Some elements also have special properties whose value can hold objects
themselves (e.g. an item tag, a headline name, a table cell). Such
values are called "secondary strings".

Lisp-wise, an element or an object can be represented as a list. It
follows the pattern (TYPE PROPERTIES CONTENTS), where:

  TYPE is a symbol describing the Org element or object.

  PROPERTIES is the property list attached to it. See the docstring of
  the appropriate parsing function for an exhaustive list.

  CONTENTS is a list of elements, objects or raw strings contained in
  the current element or object, when applicable.

An Org buffer is a nested list of such elements and objects, whose type
is `org-data' and whose properties are nil.
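
To make the list representation concrete, here is a small hand-written
tree and three accessors over it. This is only a sketch: the positions
and property values are made up, a real parse tree carries more
properties, and the `my/...' names are hypothetical. It also assumes
that the contents are spliced in directly after the property list.

#+begin_src emacs-lisp
;; Hand-written tree for illustration; not actual parser output.
(defvar my/sample-tree
  '(org-data nil
    (headline (:begin 1 :end 40 :post-blank 0)
     (paragraph (:begin 12 :end 40 :post-blank 0)
      "Some "
      (emphasis (:begin 17 :end 24 :post-blank 1) "bold")
      "text."))))

(defun my/node-type (node)
  "Return the TYPE symbol of NODE."
  (car node))

(defun my/node-property (node property)
  "Return the value of PROPERTY in the property list of NODE."
  (plist-get (nth 1 node) property))

(defun my/node-contents (node)
  "Return the contents of NODE: everything after its property list."
  (nthcdr 2 node))
#+end_src

With these, (my/node-type my/sample-tree) returns `org-data', and the
first element of its contents is the headline, whose `:begin' property
is 1.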

The first part of this file implements a parser and an interpreter for
each type of Org syntax.

The next two parts introduce two accessors and a function retrieving the
smallest element containing point (respectively
`org-element-get-property', `org-element-get-contents' and
`org-element-at-point').

The following part creates a fully recursive buffer parser. It also
provides a tool to map a function to elements or objects matching some
criteria in the parse tree. Functions of interest are
`org-element-parse-buffer', `org-element-map' and, to a lesser extent,
`org-element-parse-secondary-string'.
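
The idea of mapping over a parse tree can be sketched on plain lists.
The function below is not `org-element-map' and does not reproduce its
arguments; `my/tree-collect' is a hypothetical helper that walks a
hand-written (TYPE PROPERTIES CONTENTS...) tree depth first, assuming
contents follow the property list directly. It ignores secondary
strings stored in properties.

#+begin_src emacs-lisp
(defun my/tree-collect (node type fun)
  "Call FUN on every node of TYPE under NODE, depth first.
Return the results as a list.  Raw strings are skipped."
  (when (consp node)
    (append
     ;; Result for NODE itself, when its type matches.
     (and (eq (car node) type) (list (funcall fun node)))
     ;; Results for its contents, i.e. everything after the plist.
     (apply #'append
            (mapcar (lambda (child) (my/tree-collect child type fun))
                    (nthcdr 2 node))))))

;; Hand-written tree for illustration; not actual parser output.
(defvar my/walk-sample
  '(org-data nil
    (headline (:begin 1)
     (paragraph (:begin 5) "a" (emphasis (:begin 6) "b")))
    (paragraph (:begin 20) "c")))
#+end_src

For example, collecting the `:begin' property of every paragraph in
`my/walk-sample' with (lambda (p) (plist-get (nth 1 p) :begin)) returns
(5 20).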

The penultimate part is the cradle of an interpreter for the obtained
parse tree: `org-element-interpret-data' (and its relative,
`org-element-interpret-secondary').

The library ends by furnishing a set of interactive tools for element
navigation and manipulation.

More specifically, that last part includes some tools like
`org-element-forward', `org-element-backward',
`org-element-drag-forward', `org-element-drag-backward',
`org-element-mark-element', `org-element-up',
`org-element-unindent-buffer'... 

For the impatient (well, not quite, as you're still reading this), you
can evaluate the following examples in an Org buffer:

    (org-element-parse-buffer)
    (org-element-parse-buffer 'headline)
    (org-element-parse-buffer 'headline 'visible-only)

Also, the following code will parse the buffer, interpret the parsed
tree, and create a canonical copy of it (no indentation, lowercased
blocks, standard keywords):

#+begin_src emacs-lisp
(let ((out (org-element-interpret-data (org-element-parse-buffer))))
  (switch-to-buffer (get-buffer-create "*Bijectivep*"))
  (erase-buffer)
  (insert out)
  (goto-char (point-min))
  (org-mode))
#+end_src

Besides allowing keywords like "#+name:", "#+caption:" or
"#+attr_latex:" to be added to almost any Org element, the library also
introduces two less noticeable changes:

  1. "#+label:" keywords are deprecated in favor of "#+name:". For now,
     though, "label" is still treated as a synonym of "name".

  2. Protected HTML snippets (like @<b>) are no longer supported, as
     they were too specific.

     Instead, a general mechanism to inline back-end specific commands
     is created. Thus, the HTML back-end will see "<b>some text</b>"
     while the LaTeX one will only see "some text" if the buffer
     contains:

                     @html{<b>}some text@html{</b>}

     The syntax is heavier, but a configurable variable lets you define
     shortcuts that reduce it to, for example, @h{<b>}. No shortcut is
     provided by default.

     Also, the syntax is experimental and may change if it proves
     inadequate.
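
To make the intended behaviour of these snippets concrete, here is a
minimal sketch working on plain strings. It is not how org-element.el
or any exporter handles them: `my/render-snippets' is a hypothetical
helper, and its regexp assumes the experimental @BACKEND{TEXT} form
shown above, with a lowercase back-end name and no nested braces.

#+begin_src emacs-lisp
(defun my/render-snippets (text backend)
  "Return TEXT as the back-end named BACKEND (a string) would see it.
Snippets written for BACKEND are replaced by their contents;
snippets written for any other back-end are dropped."
  (replace-regexp-in-string
   "@\\([a-z]+\\){\\([^}]*\\)}"
   (lambda (snippet)
     ;; The match data refer to SNIPPET, the matched text itself.
     (if (string= (match-string 1 snippet) backend)
         (match-string 2 snippet)
       ""))
   text
   t    ; FIXEDCASE: do not alter the case of the replacement
   t))  ; LITERAL: insert the replacement verbatim
#+end_src

Called on "@html{<b>}some text@html{</b>}", this returns
"<b>some text</b>" for the "html" back-end and "some text" for the
"latex" one.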


I will commit a generic exporter built on top of Elements, along with
a LaTeX back-end, in a couple of days.

Feedback is welcome.


Regards,

-- 
Nicolas Goaziou
