Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]

Max Nikulin Thu, 20 Jun 2024 05:17:26 -0700

On 16/06/2024 22:59, Ihor Radchenko wrote:

Max Nikulin writes:


I suspect, it worked prior to v9.5. Without a unit test it may be
accidentally broken again.


No, it did not work.
If you can, please do not make such assertions without testing.

I am sorry, I had no intention to offend you. I missed that the removed line with explicit list of punctuation characters was commented out. I have tried the regexp used before (a part of v6.34)

facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John Gruber's regular expression for URL's


and it seems trailing dash was allowed.

+: https://domain/test-


example.org, example.net, example.com are domains reserved for usage in
examples:
<https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>


And so?

http://example.org/dash- may be a bit better for docs. (For IPv6 addresses the difference should be more noticeable, but I do not remember what range is reserved for usage in examples there.)

I have realized that some Org regexps use [:punct:] *regexp class* and
others *syntax class*, see latex math regexp. I am not sure whether the
discrepancy is intentional.


It is not intentional, but using syntax classes can sometimes be
fragile.

Do you mean that the result depends on the current buffer? I do not have a strong opinion on which variant should be used. What I do not like is that in the case of $n$-th the character after the second "$" is tested against a syntax class, while a regexp class is used for links. This subtle difference is almost certainly ignored in alternative implementations of the parser. However, I am not sure what characters besides dash and apostrophe are affected and whether it depends on locale.

09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
Improve regexp heuristics

[...]

      (link http://example.org/a<b)

[...]

It is heuristics. We cannot be 100% right. So, it is what it is.

From my point of view it is at least close to a regression. I do not have any argument against http://example.org/a<b>, but the regexp should not match the whole "http://example.org/a<b)"


[...]

Nowadays it is likely better to inspect
autolinking code for GitHub/GitLab or widely used python packages.


If you have concrete proposals, please share them.

Not yet. I consider inspecting Mozilla's code as a kind of negative result from the point of view of usefulness for Org. Expanding the test suite by gathering examples of failed heuristics from bug reports requires enough reports. https://wpt.live/url/resources/urltestdata.json (https://github.com/web-platform-tests/wpt) is too specific for browsers and HTML/JS.

I would consider [:space:] or \s-.


Do you mean "[^[:punct:][:space:]\t\n]"?


I believe it might be an improvement ([:space:] includes \t).
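
To make the effect of such a trailing-character class concrete, here is a rough sketch in Python rather than Emacs Lisp. It is not org-link-plain-re: the pattern is a hypothetical simplification, [:punct:] is approximated by ASCII string.punctuation, and [:space:] by a fixed ASCII whitespace list (in Emacs the latter follows the syntax table of the current buffer). The helper names plain_link_re and find_link are made up for illustration.

```python
import re
import string

# ASCII-only stand-in for [:space:] (assumption: in Emacs this class
# follows the buffer's syntax table, which is not modeled here).
SPACE = r" \t\n\r\f\v"


def plain_link_re(allowed_trailing=""):
    """Build a simplified, hypothetical plain-link pattern.

    The link may not end in whitespace or in ASCII punctuation,
    except for characters listed in `allowed_trailing`.
    """
    forbidden = "".join(
        c for c in string.punctuation if c not in allowed_trailing
    )
    return re.compile(
        rf"https?:[^{SPACE}]*[^{re.escape(forbidden)}{SPACE}]"
    )


def find_link(text, allowed_trailing=""):
    """Return the first match of the simplified pattern, or None."""
    m = plain_link_re(allowed_trailing).search(text)
    return m.group(0) if m else None
```

With the plain "[^[:punct:][:space:]]" rule, find_link("see http://example.org/dash- here") gives "http://example.org/dash", so the trailing dash is lost. Calling it with allowed_trailing="-" gives "http://example.org/dash-". Whether dash should be exempted in Org is exactly the open question in this thread, the sketch only shows the mechanics.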
