Max Nikulin <maniku...@gmail.com> writes:

>> If you can, please do not make such assertions without testing.
>
> I am sorry, I had no intention to offend you. I missed that the removed 
> line with explicit list of punctuation characters was commented out. I 
> have tried the regexp used before (a part of v6.34)
>
>      facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John
>      Gruber's regular expression for URL's
>
> and it seems trailing dash was allowed.

Hmm. That's a really long time ago, earlier than the built-in Org in the
Emacs versions that are available in various distros. My reading of
"prior to v9.5" was more like "not too far before v9.5" (and I tested
everything down to the Org mode bundled with Emacs 26).

>>>> +: https://domain/test-
>>>
>>> example.org, example.net, example.com are domains reserved for usage in
>>> examples:
>>> <https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>
>> 
>> And so?
>
> http://example.org/dash- may be a bit better for docs. (For IPv6 
> addresses the difference should be more noticeable, but I do not 
> remember what range is reserved for usage in examples there.)

I see. I would not mind installing a patch, if you submit it.

>>> I have realized that some Org regexps use [:punct:] *regexp class* and
>>> others *syntax class*, see latex math regexp. I am in doubts if the
>>> discrepancy is intentional.
>> 
>> It is not intentional, but using syntax classes can sometimes be
>> fragile.
>
> Do you mean that result depends on current buffer? I do not have strong 
> opinion what variant should be used.

Not the current buffer. The current syntax table, inherited from
outline-mode. Some users customize that syntax table, which makes the
Org parser behave unexpectedly in some scenarios.

Also, there is the 'syntax-table text property, and I have managed to
break the Org parser in the past by trying to apply the 'syntax-table
property to code blocks in Org mode (I was trying to solve the
`forward-sexp' bug people frequently report).

So, we should generally avoid using syntax tables, so that Org syntax
becomes independent of user customizations in that area. Or, at least,
we should avoid introducing more uses of syntax classes where possible.

> ... What I do not like is that in the 
> case of $n$-th the character after second "$" is tested against syntax 
> class, while regexp class is used for links. This subtle difference is 
> almost certainly ignored in alternative implementations of the parser. 
> However I am not sure what characters besides dash and apostrophe are 
> affected and whether it depends on locale.

These kinds of inconsistencies should be resolved eventually. We should
use UTF syntax classes rather than the locale, and document this in the
org-syntax document.

>>> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
>>> Improve regexp heuristics
> [...]
>>>       (link http://example.org/a<b)
> [...]
>> It is heuristics. We cannot be 100% right. So, it is what it is.
>
> From my point of view it is at least close to a regression. I do not
> have any argument against http://example.org/a<b>, but the regexp should
> not match the whole "http://example.org/a<b)"

No bug reports, so your point is rather theoretical.

I do not mind improving the regexp, of course, but I am afraid that we
will need PEG or `org-element--parse-paired-brackets' to match paired
brackets accurately. And that kind of change would be breaking: we would
need to drop the regexp variable.
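
To illustrate why paired brackets call for more than a regexp, here is a
minimal Python sketch of a stack-based scan. It is not Org's code and
does not reproduce `org-element--parse-paired-brackets'; the function
name `trim_unbalanced' and the cut-off policy (stop before the first
bracket that has no partner) are assumptions made for this example only.

```python
# Hypothetical helper, for illustration only: not part of Org mode.
CLOSERS = {")": "(", ">": "<", "]": "[", "}": "{"}
OPENERS = set(CLOSERS.values())


def trim_unbalanced(candidate: str) -> str:
    """Return the longest prefix of CANDIDATE that contains no unpaired bracket."""
    stack = []            # indices of openers that are still waiting for a closer
    end = len(candidate)
    for i, ch in enumerate(candidate):
        if ch in OPENERS:
            stack.append(i)
        elif ch in CLOSERS:
            if stack and candidate[stack[-1]] == CLOSERS[ch]:
                stack.pop()          # properly paired, keep scanning
            else:
                end = i              # closer without a matching opener
                break
    if stack:
        # Some opener was never closed: cut before the earliest such opener.
        end = min(end, stack[0])
    return candidate[:end]


print(trim_unbalanced("http://example.org/a<b)"))
print(trim_unbalanced("http://example.org/a<b>"))
```

Under this policy the scan keeps all of http://example.org/a<b> and
stops before the "<" in "http://example.org/a<b)". The stack is exactly
the state that a single regular expression cannot carry.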

>>> I would consider [:space:] or \s-.
>> 
>> Do you mean "[^[:punct:][:space:]\t\n]"?
>
> I believe it might be an improvement ([:space:] includes \t).

https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=6cada29c0
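
As a rough illustration of the "last character is neither punctuation
nor whitespace" idea behind the character class discussed above, here is
a small Python sketch. It is only an approximation: Python's
`string.punctuation' and `string.whitespace' are ASCII-only stand-ins
for the Emacs [:punct:] and [:space:] classes, and the names `PUNCT',
`SPACE' and `trim_trailing' are made up for this example. It does not
reproduce the actual regexp in the commit.

```python
import string

# ASCII-only stand-ins (an assumption of this sketch), not the Emacs classes.
PUNCT = set(string.punctuation)
SPACE = set(string.whitespace)   # already contains "\t" and "\n"


def valid_last_char(ch: str) -> bool:
    """True if CH is neither punctuation nor whitespace in this approximation."""
    return ch not in PUNCT and ch not in SPACE


def trim_trailing(candidate: str) -> str:
    """Drop trailing characters until the last one passes valid_last_char."""
    end = len(candidate)
    while end > 0 and not valid_last_char(candidate[end - 1]):
        end -= 1
    return candidate[:end]


print("\t" in SPACE)
print(trim_trailing("http://example.org/path,"))
```

Note that "/" counts as punctuation in this stand-in set, so a real
heuristic would need to allow it explicitly as a final character.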

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
