Ihor Radchenko <yanta...@gmail.com> writes:

> I am attaching a tentative patch that will make Org export remove
> zero-width spaces when those spaces actually separate the object
> boundaries.
>
> Any objections?

Given the objections raised, the zero-width space does not appear to be
a useful escape symbol, because it has valid uses as a standalone space
character. The objections could be addressed with intricate heuristics,
but I do not think that is a good direction: the code would be too
complex and fragile.

Therefore, I am proposing a different approach for shielding
fontification: introducing a special entity.

The new entity is \--, which is a valid boundary between emphasis
markup. It will be removed during export (replaced by "").

The specific choice of "\--" is somewhat arbitrary. The actual
requirements for the entity name are:

(1) No clash with LaTeX (which is why the simpler \- would not cut it);
(2) Being a valid markup boundary: the entity must end with one of
    (any space ?- ?\( ?' ?\" ?\{).

I am attaching a tentative patch introducing the new entity. Note that
some minor tweaks to the parser were needed. I do not see that as a big
deal: the current entity regexp already has much more cumbersome
exceptions.

Also, the patch will not work correctly on org → org export, as was
pointed out in one of the replies to the previous, abandoned approach.
I do not want to address it here because a much more appropriate
solution for this issue is changing org-element-interpret-data.
Consider

    (org-element-interpret-data '("asd" (bold () "bold") "bsd"))

This will return "asd*bold*bsd", which is not correct even though the
given Org datum is not wrong by itself. Such things can easily appear
when user filters are applied to the parse tree during org → org export.

Otherwise, the patch should be good enough to play around with and
kick-start the discussion.

WDYT?

Best,
Ihor

From 521a4b06578cf37f22e9f33d2f45b967419ad3a3 Mon Sep 17 00:00:00 2001
Message-Id: <521a4b06578cf37f22e9f33d2f45b967419ad3a3.1659013441.git.yanta...@gmail.com>
From: Ihor Radchenko <yanta...@gmail.com>
Date: Thu, 28 Jul 2022 21:02:26 +0800
Subject: [PATCH] Add new entity \-- serving as markup separator/escape symbol

* lisp/org-entities.el (org-entities): Add \-- entity. This entity is
exported as an empty string and simply serves as markup separator if
the user needs any.
* lisp/org.el (org-fontify-entities):
* lisp/org-element.el (org-element-entity-parser):
(org-element--set-regexps): Update entity regexp to match "-".
---
 lisp/org-element.el  | 4 ++--
 lisp/org-entities.el | 4 ++++
 lisp/org.el          | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 9e9b7c5ec..6405b4db8 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -258,7 +258,7 @@ (defun org-element--set-regexps ()
                       "\\$"
                       ;; Objects starting with "\": line break,
                       ;; entity, latex fragment.
-                      "\\\\\\(?:[a-zA-Z[(]\\|\\\\[ \t]*$\\|_ +\\)"
+                      "\\\\\\(?:[-a-zA-Z[(]\\|\\\\[ \t]*$\\|_ +\\)"
                       ;; Objects starting with raw text: inline Babel
                       ;; source block, inline Babel call.
                       "\\(?:call\\|src\\)_"))
@@ -3158,7 +3158,7 @@ (defun org-element-entity-parser ()
 
 Assume point is at the beginning of the entity."
   (catch 'no-object
-    (when (looking-at "\\\\\\(?:\\(?1:_ +\\)\\|\\(?1:there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z]+\\)\\(?2:$\\|{}\\|[^[:alpha:]]\\)\\)")
+    (when (looking-at "\\\\\\(?:\\(?1:_ +\\)\\|\\(?1:there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z-]+\\)\\(?2:$\\|{}\\|[^[:alpha:]]\\)\\)")
       (save-excursion
         (let* ((value (or (org-entity-get (match-string 1))
                           (throw 'no-object nil)))
diff --git a/lisp/org-entities.el b/lisp/org-entities.el
index d35e3fa8a..9d79d23fc 100644
--- a/lisp/org-entities.el
+++ b/lisp/org-entities.el
@@ -264,6 +264,10 @@ (defconst org-entities
      ("rsaquo" "\\guilsinglright{}" nil "›" ">" ">" "›")
 
      "* Other"
+
+     "** Escaping Org markup"
+     ("--" "" nil "" "" "" "")
+
      "** Misc. (often used)"
      ("circ" "\\^{}" nil "ˆ" "^" "^" "∘")
      ("vert" "\\vert{}" t "|" "|" "|" "|")
diff --git a/lisp/org.el b/lisp/org.el
index 937892ef3..29ccff83b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5828,7 +5828,7 @@ (defun org-fontify-entities (limit)
       ;; i.e., "\_ ", could be fontified anyway, and it would be
       ;; confusing when adding a second white space character.
       (while (re-search-forward
-              "\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
+              "\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z-]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
               limit t)
         (when (and (not (org-at-comment-p))
                    (setq ee (org-entity-get (match-string 1)))
-- 
2.35.1
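
For anyone who wants to poke at the regexp change without applying the patch, the following is a minimal sketch that only exercises the two regexps from the org.el hunk with plain `string-match`. The `my/` names are hypothetical helpers introduced for illustration; they are not part of Org, and the sketch does not load Org or consult `org-entities`.

```emacs-lisp
;; Illustration only.  The two regexps are copied verbatim from the
;; org.el hunk above; the `my/' names are hypothetical, not Org API.

(defconst my/entity-re-old
  "\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
  "Entity regexp used by `org-fontify-entities' before the patch.")

(defconst my/entity-re-new
  "\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z-]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
  "Entity regexp used by `org-fontify-entities' after the patch.")

(defun my/entity-name (regexp string)
  "Return the candidate entity name REGEXP finds in STRING, or nil."
  (when (string-match regexp string)
    (match-string 1 string)))

;; The old regexp requires a letter after the backslash, so "\--" is
;; never even considered as an entity name.
(my/entity-name my/entity-re-old "asd\\--*bold*") ; => nil

;; The new regexp also accepts "-" in the name, so "--" is picked up
;; when it is followed by a non-letter such as "*".
(my/entity-name my/entity-re-new "asd\\--*bold*") ; => "--"

;; Ordinary entities are still matched as before.
(my/entity-name my/entity-re-new "\\alpha{}")     ; => "alpha"
```

A regexp match alone is not sufficient: in the patched functions the candidate name is still passed to `org-entity-get`, so `\--` is only treated as an entity once the new `("--" ...)` entry is present in `org-entities`.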