Hi David and all,

David Masterson <dsmasterson92...@outlook.com> writes:
> Sebastian Miele <sebastian.mi...@gmail.com> writes:
>> Currently org-syntax.org says that "TITLE can be made of any
>> character but a new line.  Though, it will match after every other
>> part have been matched."  This does not reflect the currently
>> effective behavior that "* :t:" is a headline with title ":t:" and no
>> tags.
>
> Can you describe what should happen in a parser grammar (ie. BNF)?  If
> not, I would tend toward rethinking the structure of the Org file so
> that it can be described in a grammar.  Having a good grammar for Org
> files will promote it's acceptance beyond Emacs.

I do not know whether it can be expressed in a context-free grammar,
although it may very well be possible.  However, the way I understand
the above quote from org-syntax.org (which is, I think, in the end
preferable) is concisely expressible in a regular expression language
that can distinguish between greedy and non-greedy matching of
subexpressions, including Emacs Lisp's regular expressions:

#+BEGIN_SRC elisp
(rx line-start
    (maximal-match STARS SPACE)
    (maximal-match (optional KEYWORD SPACE))
    (maximal-match (optional PRIORITY SPACE))
    (maximal-match (optional COMMENT SPACE))
    (minimal-match (optional TITLE SPACE))
    (maximal-match (optional TAGS))
    (maximal-match (optional SPACE))
    line-end)
#+END_SRC

SPACE is (1+ (any " \t")).  TITLE is (1+ not-newline).  In the
following, I concentrate on differences from org-syntax.org.

The above expression contains COMMENT (matching "COMMENT") not as part
of the title but as a separate entity.  Although this is contrary to
org-syntax.org, it is how it is currently implemented, e.g., in
org-element-headline-parser.

TAGS currently effectively is (seq ":" (1+ TAG ":")).  In particular,
that means a TAGS specification in a headline must define at least one
tag.

I suggest changing that to (seq ":" (0+ TAG ":")), i.e., also allowing
TAGS specifications of zero tags (just ":").  This would make it
possible to clearly resolve the following ambiguity between TITLEs and
TAGS:

#+BEGIN_SRC org
,* :t:
,* :t: :
#+END_SRC

The former headline would have an empty TITLE and TAGS ":t:".  The
latter headline would have TITLE ":t:" and TAGS ":".
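
To make the difference between the two TAGS definitions concrete, here
is a minimal sketch that checks a few strings against both.  The names
my/tags-current, my/tags-proposed and my/tags-check are hypothetical,
and TAG is assumed to be (1+ (any alnum "_@#%")), which may not be
exactly what Org uses.

#+BEGIN_SRC elisp
;; Assumption: TAG is approximated as one or more of [[:alnum:]_@#%].
(defconst my/tags-current
  (rx string-start ":" (1+ (1+ (any alnum "_@#%")) ":") string-end)
  "TAGS requiring at least one tag (the current behavior).")

(defconst my/tags-proposed
  (rx string-start ":" (0+ (1+ (any alnum "_@#%")) ":") string-end)
  "TAGS also allowing zero tags (the suggested behavior).")

(defun my/tags-check (s)
  "Return (CURRENT PROPOSED): whether S is a TAGS under each definition."
  (list (and (string-match-p my/tags-current s) t)
        (and (string-match-p my/tags-proposed s) t)))

(my/tags-check ":")     ;(nil t)
(my/tags-check ":t:")   ;(t t)
(my/tags-check ":a:b:") ;(t t)
(my/tags-check ":t")    ;(nil nil)
#+END_SRC

Only the lone ":" changes status, so existing tag lists would be
unaffected by the change.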

The following toy can be used to test some cases.  It is not complete,
but it contains the essentials.

#+BEGIN_SRC elisp
(defun f (x)
  (let ((r (rx line-start
               (maximal-match (group (1+ "*")) (1+ (any " \t")))
               (maximal-match (group (optional "TODO" (1+ (any " \t")))))
               (minimal-match (optional (group (1+ not-newline)) (1+ (any " \t"))))
               (maximal-match (group (optional (seq ":" (0+ (any "a-z") ":")))))
               (maximal-match (optional (1+ (any " \t"))))
               line-end)))
    (when (let (case-fold-search) (string-match r x))
      (list :stars (match-string 1 x)
            :todo  (match-string 2 x)
            :title (let ((title (match-string 3 x))) (if title title ""))
            :tags  (match-string 4 x)))))

(f "*** :t:  :  ") ;(:stars "***" :todo "" :title ":t:" :tags ":")
(f "***    :t:  ") ;(:stars "***" :todo "" :title ""    :tags ":t:")
#+END_SRC
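
To illustrate why TITLE has to be matched non-greedily, here is a
reduced sketch that keeps only TITLE, TAGS and trailing SPACE and lets
the caller switch between the two modes.  The function
my/split-title-tags is hypothetical and, like the toy above, restricts
tags to single lowercase letters.

#+BEGIN_SRC elisp
(defun my/split-title-tags (s lazy)
  "Split S into (TITLE TAGS).
Match TITLE non-greedily when LAZY is non-nil, greedily otherwise."
  (let ((r (if lazy
               (rx line-start
                   (minimal-match
                    (optional (group (1+ not-newline)) (1+ (any " \t"))))
                   (group (optional ":" (0+ (any "a-z") ":")))
                   (optional (1+ (any " \t")))
                   line-end)
             (rx line-start
                 (maximal-match
                  (optional (group (1+ not-newline)) (1+ (any " \t"))))
                 (group (optional ":" (0+ (any "a-z") ":")))
                 (optional (1+ (any " \t")))
                 line-end))))
    (when (let (case-fold-search) (string-match r s))
      (list (or (match-string 1 s) "") (match-string 2 s)))))

;; Note the trailing space in the input.
(my/split-title-tags "title :t: " t)   ;("title" ":t:")
(my/split-title-tags "title :t: " nil) ;("title :t:" "")
#+END_SRC

With greedy matching, the trailing whitespace lets TITLE swallow the
tags, which is why TITLE should only "match after every other part"
has had its chance.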

Best wishes
Sebastian
