Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
>> I am not sure if "Org *10*.0" is a good general example. It is probably >> one of those cases when users want fine control over emphasis and must >> use zero width space. > > This is simply the first example that crossed my mind. My point is that > changing the regexp substantially may not be rewarding, ultimately. Agree. At least not until we find a clearly better variant. Yet, I do not like the current state of things. Especially with regards to mixing emphasis and hidden parts of links. Situations like in /A link [[https://orgmode.org/?oops=true][Org Mode]] or like reported in https://list.orgmode.org/orgmode/87pmtqp79s@web.de/ can be very hard to notice and cause "mysterious" Org failures. >> +Sometimes, when marked text also contains the marker character itself, >> +the result may be unsettling. For example, [ 8 more citation lines. Click/Enter to show. ] >> + >> +#+begin_example >> +/One may expect this whole sentence to be italicized, but the >> +following ~user/?variable~ contains =/= character, which effectively >> +stops emphasis there./ >> +#+end_example >> + >> +You can use zero width space to help Org sorting out the ambiguity. >> +See [[*Escape Character]] for more details. > > LGTM! Applied. Best, Ihor
Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Hello, Ihor Radchenko writes: > Commit messages are also important, especially years later. I updated > the commit message in the attached new version of the patch. Note I'm not saying commit messages are not important. I just won't spend energy on the wording there. >> Thinking about it a bit more, you might be right: we may slightly change >> the closing part of the emphasis regexp, e.g.: >> >> (seq >>(not space) >>(group ,mark) >>(or (any space ?- ?') >>(and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end)) >>line-end)) >> >> The logic behind this is that in regular text, we assume usual >> punctuation rules apply. > > This will fail for "*Bold*?!" or "/Italics/!!!" Of course. Any regexp will fail somehow. > Also, is there any reason why we are not simply using punctuation > character class instead of listing punctuation chars explicitly (and > only for English)? What about "_你叫什么名字_?" > > Maybe just > > (seq > (not space) > (group ,mark) > (0+ (in punctuation)) > (or space line-end)) Historically, Org only focused on ASCII. But it makes sense to extend the allowed punctuation characters, indeed. This is orthogonal to OP's issue, however. >> My concern is that the more complicated is the rule, the more difficult >> it is to predict. Also, we introduce new corner case, e.g., >> >> Woot! I just released Org *10*.0! >> >> So, I'm not totally convinced it is worth the trouble. > > I am not sure if "Org *10*.0" is a good general example. It is probably > one of those cases when users want fine control over emphasis and must > use zero width space. This is simply the first example that crossed my mind. My point is that changing the regexp substantially may not be rewarding, ultimately. > +Sometimes, when marked text also contains the marker character itself, > +the result may be unsettling. For example, > + > +#+begin_example > +/One may expect this whole sentence to be italicized, but the > +following ~user/?variable~ contains =/= character, which effectively > +stops emphasis there./ > +#+end_example > + > +You can use zero width space to help Org sorting out the ambiguity. > +See [[*Escape Character]] for more details. LGTM! Regards, -- Nicolas Goaziou
Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Nicolas Goaziou writes: > Thanks for the update, and apologies in advance for being bold, as > I have some additional comments about it. Constructive critics and suggestions are always welcome. And we do not have pressing deadlines here :) >> * doc/org-manual.org (Emphasis and Monospace): Advice users to insert >> zero width space when Org does not parse emphasized text correctly. > > Org _does_ parse emphasized text correctly. It may be seen as > unintuitive, but it's really a fontification problem. Anyway, this is > just a commit message… Agree. It just that the example in the patch _feels_ wrong considering intuitive definition of verbatim borrowed from LaTeX. Commit messages are also important, especially years later. I updated the commit message in the attached new version of the patch. > Thinking about it a bit more, you might be right: we may slightly change > the closing part of the emphasis regexp, e.g.: > > (seq >(not space) >(group ,mark) >(or (any space ?- ?') >(and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end)) >line-end)) > > The logic behind this is that in regular text, we assume usual > punctuation rules apply. This will fail for "*Bold*?!" or "/Italics/!!!" Also, is there any reason why we are not simply using punctuation character class instead of listing punctuation chars explicitly (and only for English)? What about "_你叫什么名字_?" Maybe just (seq (not space) (group ,mark) (0+ (in punctuation)) (or space line-end)) > My concern is that the more complicated is the rule, the more difficult > it is to predict. Also, we introduce new corner case, e.g., > > Woot! I just released Org *10*.0! > > So, I'm not totally convinced it is worth the trouble. I am not sure if "Org *10*.0" is a good general example. It is probably one of those cases when users want fine control over emphasis and must use zero width space. Best, Ihor >From 9ad522e8d1f1184ef097611fc30b326b08d5b432 Mon Sep 17 00:00:00 2001 Message-Id: <9ad522e8d1f1184ef097611fc30b326b08d5b432.1637486504.git.yanta...@gmail.com> From: Ihor Radchenko Date: Fri, 19 Nov 2021 19:27:56 +0800 Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity * doc/org-manual.org (Emphasis and Monospace): Advice users to insert zero width space to force Org ignore emphasis markers. --- doc/org-manual.org | 12 1 file changed, 12 insertions(+) diff --git a/doc/org-manual.org b/doc/org-manual.org index 1d0213934..19f42fc77 100644 --- a/doc/org-manual.org +++ b/doc/org-manual.org @@ -10818,6 +10818,18 @@ ** Emphasis and Monospace ~org-fontify-emphasized-text~ to ~nil~. To narrow down the list of available markup syntax, you can customize ~org-emphasis-alist~. +Sometimes, when marked text also contains the marker character itself, +the result may be unsettling. For example, + +#+begin_example +/One may expect this whole sentence to be italicized, but the +following ~user/?variable~ contains =/= character, which effectively +stops emphasis there./ +#+end_example + +You can use zero width space to help Org sorting out the ambiguity. +See [[*Escape Character]] for more details. + ** Subscripts and Superscripts :PROPERTIES: :DESCRIPTION: Simple syntax for raising/lowering text. -- 2.32.0
Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Hello, Ihor Radchenko writes: > I updated the patch. If you have no objections on the new wording, I > will push it to main. Thanks for the update, and apologies in advance for being bold, as I have some additional comments about it. > * doc/org-manual.org (Emphasis and Monospace): Advice users to insert > zero width space when Org does not parse emphasized text correctly. Org _does_ parse emphasized text correctly. It may be seen as unintuitive, but it's really a fontification problem. Anyway, this is just a commit message… > +=*=, =/=, =_=, ===, and =~= symbols inside =verbatim= or ~code~ can > +sometimes produce unexpected markup. OK, but it's not limited to symbols within verbatim or code. What about something like: Sometimes, when marked text also contains the marker character itself, the result may be unsettling. ...example follows (see below)... > +#+begin_example > +/The whole line is supposed to be marked italic, but the following > +~user/?variable~ contains italics =/= marker and confuses Org parser/. > +#+end_example The whole line is not supposed to be marked as italic, as long as we follow Org syntax. And the parser is not confused at all. The user may be, however. I suggest: /One may expect this whole sentence to be italicized, but the following ~user/?variable~ contains =/= character, which effectively stops emphasis there./ > +You can use zero width space to help Org sorting out the ambiguity. > +See [[*Escape Character]] for more details. Thinking about it a bit more, you might be right: we may slightly change the closing part of the emphasis regexp, e.g.: (seq (not space) (group ,mark) (or (any space ?- ?') (and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end)) line-end)) The logic behind this is that in regular text, we assume usual punctuation rules apply. My concern is that the more complicated is the rule, the more difficult it is to predict. Also, we introduce new corner case, e.g., Woot! I just released Org *10*.0! So, I'm not totally convinced it is worth the trouble. Regards, -- Nicolas Goaziou
Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Nicolas Goaziou writes: >> +Sometimes, Org mode has difficulties recognising markup. It usually >> +happens when markup marker symbols are present inside verbatim or code >> +blocks: > > I disagree with the wording. Org mode has no difficulties recognizing > markup nor does it parses text incorrectly. This is similar to the well > known surprise: > > #+begin_example > * something > #+end_example > > Here, we are simply fooled by the fontification process. > > Otherwise, the patch looks good to me. I updated the patch. If you have no objections on the new wording, I will push it to main. >> + ;; Verify that we are at the right object. >> + (let ((object (save-excursion >> + (save-match-data >> + (goto-char (match-beginning 2)) >> + (org-element-context) >> + (and (memq (org-element-type object) >> +'(bold italic verbatim code strike-through)) >> + (eq (match-beginning 2) >> + (org-element-property :begin object)) > > I sympathize with the idea. As long as fontification does not rely on > the parser, we will face issues like the one at stake. So, ultimately, > Org will hopefully move in that direction. > > However, I'm not convinced making such local changes is going to help us > in the long run. It may be more fruitful to think this evolution > globally, as it involves a fair bit of rewriting of the fontification > process. For example, it may only be necessary to parse the part of the > Org document being fontified only once, and then apply all fontification > functions to the resulting tree rather than have each of them calling > the parser, like the above. > > In any case, I think fontification deserves a dedicated thread, in > addition to some love. Ok. I will create a new discussion thread on fontification. Best, Ihor >From a1a497a80578669ef1e96700aa592aadd8d0d7ec Mon Sep 17 00:00:00 2001 Message-Id: From: Ihor Radchenko Date: Fri, 19 Nov 2021 19:27:56 +0800 Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity * doc/org-manual.org (Emphasis and Monospace): Advice users to insert zero width space when Org does not parse emphasized text correctly. --- doc/org-manual.org | 11 +++ 1 file changed, 11 insertions(+) diff --git a/doc/org-manual.org b/doc/org-manual.org index a38dbec4a..9615209a1 100644 --- a/doc/org-manual.org +++ b/doc/org-manual.org @@ -10818,6 +10818,17 @@ ** Emphasis and Monospace ~org-fontify-emphasized-text~ to ~nil~. To narrow down the list of available markup syntax, you can customize ~org-emphasis-alist~. +=*=, =/=, =_=, ===, and =~= symbols inside =verbatim= or ~code~ can +sometimes produce unexpected markup. For example, + +#+begin_example +/The whole line is supposed to be marked italic, but the following +~user/?variable~ contains italics =/= marker and confuses Org parser/. +#+end_example + +You can use zero width space to help Org sorting out the ambiguity. +See [[*Escape Character]] for more details. + ** Subscripts and Superscripts :PROPERTIES: :DESCRIPTION: Simple syntax for raising/lowering text. -- 2.32.0
Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Hello, Ihor Radchenko writes: > * doc/org-manual.org (Emphasis and Monospace): Advice users to insert > zero width space when Org does not parse emphasized text correctly. > --- [...] > +Sometimes, Org mode has difficulties recognising markup. It usually > +happens when markup marker symbols are present inside verbatim or code > +blocks: I disagree with the wording. Org mode has no difficulties recognizing markup nor does it parses text incorrectly. This is similar to the well known surprise: #+begin_example * something #+end_example Here, we are simply fooled by the fontification process. Otherwise, the patch looks good to me. > + ;; Verify that we are at the right object. > + (let ((object (save-excursion > + (save-match-data > + (goto-char (match-beginning 2)) > + (org-element-context) > + (and (memq (org-element-type object) > +'(bold italic verbatim code strike-through)) > + (eq (match-beginning 2) > + (org-element-property :begin object)) I sympathize with the idea. As long as fontification does not rely on the parser, we will face issues like the one at stake. So, ultimately, Org will hopefully move in that direction. However, I'm not convinced making such local changes is going to help us in the long run. It may be more fruitful to think this evolution globally, as it involves a fair bit of rewriting of the fontification process. For example, it may only be necessary to parse the part of the Org document being fontified only once, and then apply all fontification functions to the resulting tree rather than have each of them calling the parser, like the above. In any case, I think fontification deserves a dedicated thread, in addition to some love. Regards, -- Nicolas Goaziou
[PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Nicolas Goaziou writes: > You can use a zero-width space to help Org sorting out the ambiguity, > for example (| denotes the zero-width space): > > /|A link [[https://orgmode.org/?oops=true][Org Mode]] > > /A code ~user|/?my-user-variable~ with slash/ Makes sense. Maybe we should also mention it in the Markup section of the manual? I attached a tentative patch. Also, part of the problem with /|A link [[https://orgmode.org/?oops=true][Org Mode]] is that the link is emphasised despite not being parser as a link by org-element. I attached a patch for our link/emphasis fontification that makes sure that fontification is always consistent with the parser. The patch might hit the performance slightly, but I do not see obvious slowdown using my 15Mb notes file. Best, Ihor >From 3b4a857582e848e9688a49c01b853ed577dd4935 Mon Sep 17 00:00:00 2001 Message-Id: <3b4a857582e848e9688a49c01b853ed577dd4935.1637321577.git.yanta...@gmail.com> From: Ihor Radchenko Date: Fri, 19 Nov 2021 19:27:56 +0800 Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity * doc/org-manual.org (Emphasis and Monospace): Advice users to insert zero width space when Org does not parse emphasized text correctly. --- doc/org-manual.org | 12 1 file changed, 12 insertions(+) diff --git a/doc/org-manual.org b/doc/org-manual.org index a38dbec4a..1db993d3d 100644 --- a/doc/org-manual.org +++ b/doc/org-manual.org @@ -10818,6 +10818,18 @@ ** Emphasis and Monospace ~org-fontify-emphasized-text~ to ~nil~. To narrow down the list of available markup syntax, you can customize ~org-emphasis-alist~. +Sometimes, Org mode has difficulties recognising markup. It usually +happens when markup marker symbols are present inside verbatim or code +blocks: + +#+begin_example +/The whole line is supposed to be marked italic, but the following +~user/?variable~ contains italics =/= marker and confuses Org parser/. +#+end_example + +You can use zero width space to help Org sorting out the ambiguity. +See [[*Escape Character]] for more details. + ** Subscripts and Superscripts :PROPERTIES: :DESCRIPTION: Simple syntax for raising/lowering text. -- 2.32.0 >From d5413e772fe6aedb8a8aa094f98c96fc20b82d3a Mon Sep 17 00:00:00 2001 Message-Id: From: Ihor Radchenko Date: Fri, 19 Nov 2021 19:13:54 +0800 Subject: [PATCH] org.el: Make emphasis and link fontification consistent with parser * lisp/org.el (org-do-emphasis-faces): (org-activate-links): Do not fontify just based on approximate regexps. Make sure the current object is emphasis. --- lisp/org.el | 62 - 1 file changed, 33 insertions(+), 29 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index cb1b58c51..d9f073100 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -5106,12 +5106,15 @@ (defun org-do-emphasis-faces (limit) (looking-at-p org-outline-regexp-bol ;; Match full emphasis markup regexp. (looking-at (if verbatim? org-verbatim-re org-emph-re)) - ;; Do not span over paragraph boundaries. - (not (string-match-p org-element-paragraph-separate - (match-string 2))) - ;; Do not span over cells in table rows. - (not (and (save-match-data (org-match-line "[ \t]*|")) - (string-match-p "|" (match-string 4)) + ;; Verify that we are at the right object. + (let ((object (save-excursion + (save-match-data + (goto-char (match-beginning 2)) + (org-element-context) + (and (memq (org-element-type object) +'(bold italic verbatim code strike-through)) + (eq (match-beginning 2) + (org-element-property :begin object)) (pcase-let ((`(,_ ,face ,_) (assoc marker org-emphasis-alist)) (m (if org-hide-emphasis-markers 4 2))) (font-lock-prepend-text-property @@ -5206,7 +5209,7 @@ (defun org-activate-links (limit) (eq 'org-tag face)) (let* ((link-object (save-excursion (goto-char start) -(save-match-data (org-element-link-parser +(save-match-data (org-element-context (link (org-element-property :raw-link link-object)) (type (org-element-property :type link-object)) (path (org-element-property :path link-object)) @@ -5229,29 +5232,30 @@ (defun org-activate-links (limit) ((and (pred functionp) f) (funcall f)) (_ `(:uri ,link))) 'font-lock-multiline t))) - (org-remove-flyspell-overlays-in start end) - (org-rear-nonsticky-at end) - (if (not (eq 'bracket style)) - (progn +(when (eq (org-element-type link-object) 'link) + (org-remove-flyspell-overlays-in start end) + (org-rear-nonsticky-at end) + (if (not (eq 'bracket style)) + (progn +(add-face-text-property start end face-property) +