On 16/06/2024 22:59, Ihor Radchenko wrote:
Max Nikulin writes:

I suspect, it worked prior to v9.5. Without a unit test it may be
accidentally broken again.

No, it did not work.
If you can, please do not make such assertions without testing.

I am sorry, I had no intention to offend you. I missed that the removed line with the explicit list of punctuation characters was commented out. I have now tried the regexp used earlier (part of v6.34)

facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John Gruber's regular expression for URL's

and it seems a trailing dash was allowed.

+: https://domain/test-

example.org, example.net, example.com are domains reserved for usage in
examples:
<https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>

And so?

http://example.org/dash- may be a bit better for docs. (For IPv6 addresses the difference should be more noticeable, but I do not remember which range is reserved for use in examples there.)
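To make the trailing-dash case concrete, here is a minimal sketch in Python, not the Emacs regexp used by Org. It assumes that ASCII `string.punctuation` is an acceptable stand-in for [:punct:] and that links start with http: or https:. The names STRICT, WITH_DASH and first_link are hypothetical and exist only for this illustration.

```python
import re
import string

# ASCII stand-in for the [:punct:] regexp class (an assumption; the Emacs
# class also covers non-ASCII characters).
PUNCT = re.escape(string.punctuation)

# Link body: scheme, then anything except brackets, parentheses and whitespace.
BODY = r"https?:[^\]\[ \t\n()<>]+"

# Last character: not punctuation and not whitespace, or a slash.
STRICT = re.compile(BODY + r"(?:[^" + PUNCT + r"\s]|/)")

# Hypothetical variant that additionally accepts a trailing dash.
WITH_DASH = re.compile(BODY + r"(?:[^" + PUNCT + r"\s]|[/-])")


def first_link(pattern, text):
    """Return the first match of PATTERN in TEXT, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


text = "See http://example.org/dash- for details."
print(first_link(STRICT, text))     # http://example.org/dash
print(first_link(WITH_DASH, text))  # http://example.org/dash-
```

With the strict variant the dash is silently dropped from the link, which is the kind of behaviour a unit test would pin down.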

I have realized that some Org regexps use [:punct:] *regexp class* and
others *syntax class*, see latex math regexp. I am in doubts if the
discrepancy is intentional.

It is not intentional, but using syntax classes can sometimes be
fragile.

Do you mean that the result depends on the current buffer? I do not have a strong opinion on which variant should be used. What I do not like is that in the case of $n$-th the character after the second "$" is tested against a syntax class, while a regexp class is used for links. This subtle difference is almost certainly ignored in alternative implementations of the parser. However, I am not sure which characters besides dash and apostrophe are affected and whether it depends on locale.

09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
Improve regexp heuristics
[...]
      (link http://example.org/a<b)
[...]
It is heuristics. We cannot be 100% right. So, it is what it is.

From my point of view it is at least close to a regression. I do not have any argument against http://example.org/a<b>, but the regexp should not match the whole "http://example.org/a<b)".
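One possible reading of this objection, sketched in Python rather than as an Emacs regexp: accept brackets inside a candidate link only when they are balanced. The function name trim_unbalanced and the decision to cut at the earliest unmatched bracket are assumptions made for this illustration. This is not the heuristic implemented in org-link-plain-re.

```python
# Bracket pairs considered by this sketch (an assumption).
PAIRS = {"(": ")", "<": ">", "[": "]"}
CLOSERS = set(PAIRS.values())


def trim_unbalanced(candidate):
    """Cut CANDIDATE before its earliest unmatched bracket, if any."""
    stack = []  # (index, expected closer) for openers that are still open
    for i, ch in enumerate(candidate):
        if ch in PAIRS:
            stack.append((i, PAIRS[ch]))
        elif ch in CLOSERS:
            if stack and stack[-1][1] == ch:
                stack.pop()
            else:
                # Closer that does not match the innermost opener: stop here.
                candidate = candidate[:i]
                break
    if stack:
        # Some opener was never closed: cut before the outermost one.
        return candidate[:stack[0][0]]
    return candidate


print(trim_unbalanced("http://example.org/a<b>"))  # http://example.org/a<b>
print(trim_unbalanced("http://example.org/a<b)"))  # http://example.org/a
```

Whether the second case should yield "http://example.org/a" or "http://example.org/a<b" is debatable. The sketch only shows that the mismatched ")" does not have to end up inside the link.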

[...]
Nowadays it is likely better to inspect
autolinking code for GitHub/GitLab or widely used python packages.

If you have concrete proposals, please share them.

Not yet. I consider inspecting Mozilla's code a kind of negative result in terms of usefulness for Org. Expanding the test suite by gathering examples of failed heuristics from bug reports requires enough reports. https://wpt.live/url/resources/urltestdata.json (https://github.com/web-platform-tests/wpt) is too specific to browsers and HTML/JS.

I would consider [:space:] or \s-.

Do you mean "[^[:punct:][:space:]\t\n]"?

I believe it might be an improvement ([:space:] includes \t).


