Hi David and all,

David Masterson <dsmasterson92...@outlook.com> writes:

> Sebastian Miele <sebastian.mi...@gmail.com> writes:
>> Currently org-syntax.org says that "TITLE can be made of any
>> character but a new line.  Though, it will match after every other
>> part have been matched."  This does not reflect the currently
>> effective behavior that "* :t:" is a headline with title ":t:" and no
>> tags.
>
> Can you describe what should happen in a parser grammar (ie. BNF)?  If
> not, I would tend toward rethinking the structure of the Org file so
> that it can be described in a grammar.  Having a good grammar for Org
> files will promote it's acceptance beyond Emacs.

I do not know whether it can be expressed in a context-free grammar,
although it may very well be possible.  However, my reading of the
above quote from org-syntax.org (which I think is preferable in the
end) can be expressed concisely in any regular expression language
that distinguishes greedy from non-greedy matching of subexpressions,
including Emacs Lisp's regular expressions:

#+BEGIN_SRC elisp
(rx line-start
    (maximal-match STARS SPACE)
    (maximal-match (optional KEYWORD SPACE))
    (maximal-match (optional PRIORITY SPACE))
    (maximal-match (optional COMMENT SPACE))
    (minimal-match (optional TITLE SPACE))
    (maximal-match (optional TAGS))
    (maximal-match (optional SPACE))
    line-end)
#+END_SRC

SPACE is (1+ (any " \t")).  TITLE is (1+ not-newline).

In the following, I concentrate on differences from org-syntax.org.

The above expression contains COMMENT (matching "COMMENT") not as part
of the title but as a separate entity.  Although this is contrary to
org-syntax.org, it is how it is implemented now, e.g., in
org-element-headline-parser.

TAGS currently is effectively (seq ":" (1+ TAG ":")).  In particular,
that means a TAGS specification in a headline must define at least one
tag.  I suggest changing that to (seq ":" (0+ TAG ":")), i.e., to also
allow TAGS specifications of zero tags (just ":").  This would make it
possible to clearly resolve the following ambiguity between TITLEs and
TAGS:

#+BEGIN_SRC org
,* :t:
,* :t: :
#+END_SRC

The former headline would have an empty TITLE and TAGS ":t:".  The
latter headline would have TITLE ":t:" and TAGS ":".

The following toy can be used to test some cases.  It is not complete,
but contains the essentials.

#+BEGIN_SRC elisp
(defun f (x)
  (let ((r (rx line-start
               (maximal-match (group (1+ "*")) (1+ (any " \t")))
               (maximal-match (group (optional "TODO" (1+ (any " \t")))))
               (minimal-match (optional (group (1+ not-newline))
                                        (1+ (any " \t"))))
               (maximal-match (group (optional (seq ":" (0+ (any "a-z") ":")))))
               (maximal-match (optional (1+ (any " \t"))))
               line-end)))
    (when (let (case-fold-search)
            (string-match r x))
      (list :stars (match-string 1 x)
            :todo (match-string 2 x)
            :title (let ((title (match-string 3 x)))
                     (if title title ""))
            :tags (match-string 4 x)))))

(f "*** :t: : ")
;; => (:stars "***" :todo "" :title ":t:" :tags ":")

(f "*** :t: ")
;; => (:stars "***" :todo "" :title "" :tags ":t:")
#+END_SRC

Best wishes
Sebastian
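
P.S.  To make the effect of the suggested TAGS change more concrete,
here is a minimal sketch that builds the toy regexp in both variants,
(1+ TAG ":") and (0+ TAG ":"), and parses the same line with each.  The
names my/toy-headline-regexp and my/toy-parse are made up for this
illustration, the TODO group is left out for brevity, and TAG is again
just a single lowercase letter.  The results only describe this toy
regexp; they say nothing about what org-element-headline-parser does.

#+BEGIN_SRC elisp
;; Hypothetical helper, not part of Org: return the toy headline
;; regexp.  With ALLOW-EMPTY-TAGS non-nil, TAGS is (seq ":" (0+ TAG ":")),
;; otherwise it is (seq ":" (1+ TAG ":")).
(defun my/toy-headline-regexp (allow-empty-tags)
  (let ((tags (if allow-empty-tags
                  '(seq ":" (0+ (any "a-z") ":"))
                '(seq ":" (1+ (any "a-z") ":")))))
    (rx-to-string
     `(seq line-start
           (group (1+ "*")) (1+ (any " \t"))
           ;; TITLE is matched non-greedily, so TAGS gets the first try.
           (minimal-match (optional (group (1+ not-newline))
                                    (1+ (any " \t"))))
           (group (optional ,tags))
           (optional (1+ (any " \t")))
           line-end)
     t)))

;; Hypothetical helper: parse LINE with the chosen variant and return
;; a plist of title and tags, or nil if LINE does not match.
(defun my/toy-parse (line allow-empty-tags)
  (let ((case-fold-search nil))
    (when (string-match (my/toy-headline-regexp allow-empty-tags) line)
      (list :title (or (match-string 2 line) "")
            :tags (match-string 3 line)))))

(my/toy-parse "* :t: : " t)
;; => (:title ":t:" :tags ":")

(my/toy-parse "* :t: : " nil)
;; => (:title ":t: :" :tags "")
#+END_SRC

With the (0+ ...) variant the trailing ":" is consumed as an empty TAGS
specification, so the title stays ":t:".  With the (1+ ...) variant the
lone ":" cannot be TAGS, and in this toy it ends up in the title.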