Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-23 Thread Ihor Radchenko


>> I am not sure if "Org *10*.0" is a good general example. It is probably
>> one of those cases when users want fine control over emphasis and must
>> use zero width space.
>
> This is simply the first example that crossed my mind. My point is that
> changing the regexp substantially may not be rewarding, ultimately.

Agree. At least not until we find a clearly better variant.

Yet, I do not like the current state of things. Especially with regards
to mixing emphasis and hidden parts of links. Situations like in
/A link [[https://orgmode.org/?oops=true][Org Mode]]
or like reported in
https://list.orgmode.org/orgmode/87pmtqp79s@web.de/ can be very hard
to notice and cause "mysterious" Org failures.

>> +Sometimes, when marked text also contains the marker character itself,
>> +the result may be unsettling.  For example,
[ 8 more citation lines. Click/Enter to show. ]
>> +
>> +#+begin_example
>> +/One may expect this whole sentence to be italicized, but the
>> +following ~user/?variable~ contains =/= character, which effectively
>> +stops emphasis there./
>> +#+end_example
>> +
>> +You can use zero width space to help Org sorting out the ambiguity.
>> +See [[*Escape Character]] for more details.
>
> LGTM!

Applied.

Best,
Ihor



Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-22 Thread Nicolas Goaziou
Hello,

Ihor Radchenko  writes:

> Commit messages are also important, especially years later. I updated
> the commit message in the attached new version of the patch.

Note I'm not saying commit messages are not important. I just won't
spend energy on the wording there.

>> Thinking about it a bit more, you might be right: we may slightly change
>> the closing part of the emphasis regexp, e.g.:
>>
>>   (seq
>>(not space)
>>(group ,mark)
>>(or (any space ?- ?')
>>(and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end))
>>line-end))
>>
>> The logic behind this is that in regular text, we assume usual
>> punctuation rules apply.
>
> This will fail for "*Bold*?!" or "/Italics/!!!"

Of course. Any regexp will fail somehow.

> Also, is there any reason why we are not simply using punctuation
> character class instead of listing punctuation chars explicitly (and
> only for English)? What about "_你叫什么名字_?"
>
> Maybe just
>
> (seq
>  (not space)
>  (group ,mark)
>  (0+ (in punctuation))
>  (or space line-end))

Historically, Org only focused on ASCII. But it makes sense to extend
the allowed punctuation characters, indeed.

This is orthogonal to OP's issue, however.

>> My concern is that the more complicated is the rule, the more difficult
>> it is to predict. Also, we introduce new corner case, e.g.,
>>
>>   Woot! I just released Org *10*.0!
>>
>> So, I'm not totally convinced it is worth the trouble.
>
> I am not sure if "Org *10*.0" is a good general example. It is probably
> one of those cases when users want fine control over emphasis and must
> use zero width space.

This is simply the first example that crossed my mind. My point is that
changing the regexp substantially may not be rewarding, ultimately.

> +Sometimes, when marked text also contains the marker character itself,
> +the result may be unsettling.  For example,
> +
> +#+begin_example
> +/One may expect this whole sentence to be italicized, but the
> +following ~user/?variable~ contains =/= character, which effectively
> +stops emphasis there./
> +#+end_example
> +
> +You can use zero width space to help Org sorting out the ambiguity.
> +See [[*Escape Character]] for more details.

LGTM!

Regards,
-- 
Nicolas Goaziou



Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-21 Thread Ihor Radchenko
Nicolas Goaziou  writes:

> Thanks for the update, and apologies in advance for being bold, as
> I have some additional comments about it.

Constructive critics and suggestions are always welcome. And we do not
have pressing deadlines here :)

>> * doc/org-manual.org (Emphasis and Monospace): Advice users to insert
>> zero width space when Org does not parse emphasized text correctly.
>
> Org _does_ parse emphasized text correctly. It may be seen as
> unintuitive, but it's really a fontification problem. Anyway, this is
> just a commit message…

Agree. It just that the example in the patch _feels_ wrong considering
intuitive definition of verbatim borrowed from LaTeX.

Commit messages are also important, especially years later. I updated
the commit message in the attached new version of the patch.

> Thinking about it a bit more, you might be right: we may slightly change
> the closing part of the emphasis regexp, e.g.:
>
>   (seq
>(not space)
>(group ,mark)
>(or (any space ?- ?')
>(and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end))
>line-end))
>
> The logic behind this is that in regular text, we assume usual
> punctuation rules apply.

This will fail for "*Bold*?!" or "/Italics/!!!"

Also, is there any reason why we are not simply using punctuation
character class instead of listing punctuation chars explicitly (and
only for English)? What about "_你叫什么名字_?"

Maybe just

(seq
 (not space)
 (group ,mark)
 (0+ (in punctuation))
 (or space line-end))

> My concern is that the more complicated is the rule, the more difficult
> it is to predict. Also, we introduce new corner case, e.g.,
>
>   Woot! I just released Org *10*.0!
>
> So, I'm not totally convinced it is worth the trouble.

I am not sure if "Org *10*.0" is a good general example. It is probably
one of those cases when users want fine control over emphasis and must
use zero width space.

Best,
Ihor

>From 9ad522e8d1f1184ef097611fc30b326b08d5b432 Mon Sep 17 00:00:00 2001
Message-Id: <9ad522e8d1f1184ef097611fc30b326b08d5b432.1637486504.git.yanta...@gmail.com>
From: Ihor Radchenko 
Date: Fri, 19 Nov 2021 19:27:56 +0800
Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity

* doc/org-manual.org (Emphasis and Monospace): Advice users to insert
zero width space to force Org ignore emphasis markers.
---
 doc/org-manual.org | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/org-manual.org b/doc/org-manual.org
index 1d0213934..19f42fc77 100644
--- a/doc/org-manual.org
+++ b/doc/org-manual.org
@@ -10818,6 +10818,18 @@ ** Emphasis and Monospace
 ~org-fontify-emphasized-text~ to ~nil~.  To narrow down the list of
 available markup syntax, you can customize ~org-emphasis-alist~.
 
+Sometimes, when marked text also contains the marker character itself,
+the result may be unsettling.  For example,
+
+#+begin_example
+/One may expect this whole sentence to be italicized, but the
+following ~user/?variable~ contains =/= character, which effectively
+stops emphasis there./
+#+end_example
+
+You can use zero width space to help Org sorting out the ambiguity.
+See [[*Escape Character]] for more details.
+
 ** Subscripts and Superscripts
 :PROPERTIES:
 :DESCRIPTION: Simple syntax for raising/lowering text.
-- 
2.32.0



Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-20 Thread Nicolas Goaziou
Hello,

Ihor Radchenko  writes:

> I updated the patch. If you have no objections on the new wording, I
> will push it to main.

Thanks for the update, and apologies in advance for being bold, as
I have some additional comments about it.

> * doc/org-manual.org (Emphasis and Monospace): Advice users to insert
> zero width space when Org does not parse emphasized text correctly.

Org _does_ parse emphasized text correctly. It may be seen as
unintuitive, but it's really a fontification problem. Anyway, this is
just a commit message…

> +=*=, =/=, =_=, ===, and =~= symbols inside =verbatim= or ~code~ can
> +sometimes produce unexpected markup.  

OK, but it's not limited to symbols within verbatim or code. What about
something like:

  Sometimes, when marked text also contains the marker character itself,
  the result may be unsettling.

  ...example follows (see below)...

> +#+begin_example
> +/The whole line is supposed to be marked italic, but the following
> +~user/?variable~ contains italics =/= marker and confuses Org parser/.
> +#+end_example

The whole line is not supposed to be marked as italic, as long as we
follow Org syntax. And the parser is not confused at all. The user may
be, however.

I suggest:

  /One may expect this whole sentence to be italicized, but the
  following ~user/?variable~ contains =/= character, which effectively
  stops emphasis there./

> +You can use zero width space to help Org sorting out the ambiguity.
> +See [[*Escape Character]] for more details.

Thinking about it a bit more, you might be right: we may slightly change
the closing part of the emphasis regexp, e.g.:

  (seq
   (not space)
   (group ,mark)
   (or (any space ?- ?')
   (and (any ?. ?, ?\; ?: ?! ?? ?\" ?\) ?\} ?\\ ?\[) (or space line-end))
   line-end))

The logic behind this is that in regular text, we assume usual
punctuation rules apply.

My concern is that the more complicated is the rule, the more difficult
it is to predict. Also, we introduce new corner case, e.g.,

  Woot! I just released Org *10*.0!

So, I'm not totally convinced it is worth the trouble.

Regards,
-- 
Nicolas Goaziou



Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-19 Thread Ihor Radchenko
Nicolas Goaziou  writes:

>> +Sometimes, Org mode has difficulties recognising markup.  It usually
>> +happens when markup marker symbols are present inside verbatim or code
>> +blocks:
>
> I disagree with the wording. Org mode has no difficulties recognizing
> markup nor does it parses text incorrectly. This is similar to the well
> known surprise:
>
>   #+begin_example
>   * something
>   #+end_example
>
> Here, we are simply fooled by the fontification process.
>
> Otherwise, the patch looks good to me.

I updated the patch. If you have no objections on the new wording, I
will push it to main.

>> +   ;; Verify that we are at the right object.
>> +   (let ((object (save-excursion
>> +   (save-match-data
>> + (goto-char (match-beginning 2))
>> + (org-element-context)
>> + (and (memq (org-element-type object)
>> +'(bold italic verbatim code strike-through))
>> +  (eq (match-beginning 2)
>> +  (org-element-property :begin object))
>
> I sympathize with the idea. As long as fontification does not rely on
> the parser, we will face issues like the one at stake. So, ultimately,
> Org will hopefully move in that direction.
>
> However, I'm not convinced making such local changes is going to help us
> in the long run. It may be more fruitful to think this evolution
> globally, as it involves a fair bit of rewriting of the fontification
> process. For example, it may only be necessary to parse the part of the
> Org document being fontified only once, and then apply all fontification
> functions to the resulting tree rather than have each of them calling
> the parser, like the above.
>
> In any case, I think fontification deserves a dedicated thread, in
> addition to some love.

Ok. I will create a new discussion thread on fontification.

Best,
Ihor

>From a1a497a80578669ef1e96700aa592aadd8d0d7ec Mon Sep 17 00:00:00 2001
Message-Id: 
From: Ihor Radchenko 
Date: Fri, 19 Nov 2021 19:27:56 +0800
Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity

* doc/org-manual.org (Emphasis and Monospace): Advice users to insert
zero width space when Org does not parse emphasized text correctly.
---
 doc/org-manual.org | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/doc/org-manual.org b/doc/org-manual.org
index a38dbec4a..9615209a1 100644
--- a/doc/org-manual.org
+++ b/doc/org-manual.org
@@ -10818,6 +10818,17 @@ ** Emphasis and Monospace
 ~org-fontify-emphasized-text~ to ~nil~.  To narrow down the list of
 available markup syntax, you can customize ~org-emphasis-alist~.
 
+=*=, =/=, =_=, ===, and =~= symbols inside =verbatim= or ~code~ can
+sometimes produce unexpected markup.  For example,
+
+#+begin_example
+/The whole line is supposed to be marked italic, but the following
+~user/?variable~ contains italics =/= marker and confuses Org parser/.
+#+end_example
+
+You can use zero width space to help Org sorting out the ambiguity.
+See [[*Escape Character]] for more details.
+
 ** Subscripts and Superscripts
 :PROPERTIES:
 :DESCRIPTION: Simple syntax for raising/lowering text.
-- 
2.32.0



Re: [PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-19 Thread Nicolas Goaziou
Hello,

Ihor Radchenko  writes:

> * doc/org-manual.org (Emphasis and Monospace): Advice users to insert
> zero width space when Org does not parse emphasized text correctly.
> ---

[...]

> +Sometimes, Org mode has difficulties recognising markup.  It usually
> +happens when markup marker symbols are present inside verbatim or code
> +blocks:

I disagree with the wording. Org mode has no difficulties recognizing
markup nor does it parses text incorrectly. This is similar to the well
known surprise:

  #+begin_example
  * something
  #+end_example

Here, we are simply fooled by the fontification process.

Otherwise, the patch looks good to me.

> +   ;; Verify that we are at the right object.
> +   (let ((object (save-excursion
> +   (save-match-data
> + (goto-char (match-beginning 2))
> + (org-element-context)
> + (and (memq (org-element-type object)
> +'(bold italic verbatim code strike-through))
> +  (eq (match-beginning 2)
> +  (org-element-property :begin object))

I sympathize with the idea. As long as fontification does not rely on
the parser, we will face issues like the one at stake. So, ultimately,
Org will hopefully move in that direction.

However, I'm not convinced making such local changes is going to help us
in the long run. It may be more fruitful to think this evolution
globally, as it involves a fair bit of rewriting of the fontification
process. For example, it may only be necessary to parse the part of the
Org document being fontified only once, and then apply all fontification
functions to the resulting tree rather than have each of them calling
the parser, like the above.

In any case, I think fontification deserves a dedicated thread, in
addition to some love.

Regards,
-- 
Nicolas Goaziou



[PATCH] Re: c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’

2021-11-19 Thread Ihor Radchenko
Nicolas Goaziou  writes:

> You can use a zero-width space to help Org sorting out the ambiguity,
> for example (| denotes the zero-width space):
>
>   /|A link [[https://orgmode.org/?oops=true][Org Mode]]
>
>   /A code ~user|/?my-user-variable~ with slash/

Makes sense. Maybe we should also mention it in the Markup section of
the manual? I attached a tentative patch.

Also, part of the problem with
/|A link [[https://orgmode.org/?oops=true][Org Mode]]
is that the link is emphasised despite not being parser as a link by
org-element. I attached a patch for our link/emphasis fontification that
makes sure that fontification is always consistent with the parser. The
patch might hit the performance slightly, but I do not see obvious
slowdown using my 15Mb notes file.

Best,
Ihor
>From 3b4a857582e848e9688a49c01b853ed577dd4935 Mon Sep 17 00:00:00 2001
Message-Id: <3b4a857582e848e9688a49c01b853ed577dd4935.1637321577.git.yanta...@gmail.com>
From: Ihor Radchenko 
Date: Fri, 19 Nov 2021 19:27:56 +0800
Subject: [PATCH] org-manual.org: Clarify how to handle markup ambiguity

* doc/org-manual.org (Emphasis and Monospace): Advice users to insert
zero width space when Org does not parse emphasized text correctly.
---
 doc/org-manual.org | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/org-manual.org b/doc/org-manual.org
index a38dbec4a..1db993d3d 100644
--- a/doc/org-manual.org
+++ b/doc/org-manual.org
@@ -10818,6 +10818,18 @@ ** Emphasis and Monospace
 ~org-fontify-emphasized-text~ to ~nil~.  To narrow down the list of
 available markup syntax, you can customize ~org-emphasis-alist~.
 
+Sometimes, Org mode has difficulties recognising markup.  It usually
+happens when markup marker symbols are present inside verbatim or code
+blocks:
+
+#+begin_example
+/The whole line is supposed to be marked italic, but the following
+~user/?variable~ contains italics =/= marker and confuses Org parser/.
+#+end_example
+
+You can use zero width space to help Org sorting out the ambiguity.
+See [[*Escape Character]] for more details.
+
 ** Subscripts and Superscripts
 :PROPERTIES:
 :DESCRIPTION: Simple syntax for raising/lowering text.
-- 
2.32.0

>From d5413e772fe6aedb8a8aa094f98c96fc20b82d3a Mon Sep 17 00:00:00 2001
Message-Id: 
From: Ihor Radchenko 
Date: Fri, 19 Nov 2021 19:13:54 +0800
Subject: [PATCH] org.el: Make emphasis and link fontification consistent with
 parser

* lisp/org.el (org-do-emphasis-faces):
(org-activate-links): Do not fontify just based on approximate
regexps.  Make sure the current object is emphasis.
---
 lisp/org.el | 62 -
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index cb1b58c51..d9f073100 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5106,12 +5106,15 @@ (defun org-do-emphasis-faces (limit)
 			   (looking-at-p org-outline-regexp-bol
 		   ;; Match full emphasis markup regexp.
 		   (looking-at (if verbatim? org-verbatim-re org-emph-re))
-		   ;; Do not span over paragraph boundaries.
-		   (not (string-match-p org-element-paragraph-separate
-	(match-string 2)))
-		   ;; Do not span over cells in table rows.
-		   (not (and (save-match-data (org-match-line "[ \t]*|"))
-			 (string-match-p "|" (match-string 4))
+   ;; Verify that we are at the right object.
+   (let ((object (save-excursion
+   (save-match-data
+ (goto-char (match-beginning 2))
+ (org-element-context)
+ (and (memq (org-element-type object)
+'(bold italic verbatim code strike-through))
+  (eq (match-beginning 2)
+  (org-element-property :begin object))
 	(pcase-let ((`(,_ ,face ,_) (assoc marker org-emphasis-alist))
 			(m (if org-hide-emphasis-markers 4 2)))
 	  (font-lock-prepend-text-property
@@ -5206,7 +5209,7 @@ (defun org-activate-links (limit)
  (eq 'org-tag face))
 	  (let* ((link-object (save-excursion
 (goto-char start)
-(save-match-data (org-element-link-parser
+(save-match-data (org-element-context
 		 (link (org-element-property :raw-link link-object))
 		 (type (org-element-property :type link-object))
 		 (path (org-element-property :path link-object))
@@ -5229,29 +5232,30 @@ (defun org-activate-links (limit)
 	((and (pred functionp) f) (funcall f))
 	(_ `(:uri ,link)))
 			'font-lock-multiline t)))
-	(org-remove-flyspell-overlays-in start end)
-	(org-rear-nonsticky-at end)
-	(if (not (eq 'bracket style))
-		(progn
+(when (eq (org-element-type link-object) 'link)
+  (org-remove-flyspell-overlays-in start end)
+	  (org-rear-nonsticky-at end)
+	  (if (not (eq 'bracket style))
+		  (progn
+(add-face-text-property start end face-property)
+