https://bugs.documentfoundation.org/show_bug.cgi?id=91192
Stephan Bergmann <sberg...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sberg...@redhat.com

--- Comment #17 from Stephan Bergmann <sberg...@redhat.com> ---
The code that guesses which part of a larger text shall be auto-detected as a
URI is URIHelper::FindFirstURLInText (svl/source/misc/urihelper.cxx, containing
detailed documentation). Of necessity, it needs to apply some heuristics, and,
also of necessity, the algorithm's outcome will not necessarily match any given
user's exact expectations.

That said:

(In reply to sdc.blanco from comment #12)
> Asking for UXEval: Two questions.
>
> 1. Is it a considered a "bug" a potential URL that ends with # (or ?) does
> not include the # (or ?) in the URL recognition?
>
> (but, as noted, no problem if text follows # or ? )

Especially with "?" (and similar to e.g. "," and "."), the heuristics
conservatively try to avoid including trailing punctuation (for which it is
assumed that it was not meant to be part of the URI).

> 2. Is it a problem that the three characters: ^ | \ are not recognized as
> part of a URL (and URL recognition stops with these characters)?
>
> Relevant to note that these three characters are considered "unsafe" and
> should have percent-encoding ( https://www.ietf.org/rfc/rfc1738.txt )

That's not a "should" but a "must". None of those three characters can appear
in a URI as-is, they always need to be percent-encoded. The used heuristics in
general do not consider that a character that cannot appear in a URI would form
part of a to-be-detected URI.
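
For illustration only, the two behaviours described in comment #17 could be
sketched roughly as below. This is not the algorithm in
URIHelper::FindFirstURLInText, which is more involved and documented in
urihelper.cxx. The function name guess_uri, the URI_CHARS set (taken from the
unreserved and reserved characters of RFC 3986, plus "%") and the
TRAILING_PUNCTUATION set are assumptions made for this sketch.

```python
from urllib.parse import quote

# Characters that may appear literally in a URI per RFC 3986
# (unreserved + reserved), plus "%" which introduces a percent-escape.
# Note that "^", "|" and "\" are not in this set.
URI_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)

# Assumed set of characters treated as sentence punctuation when they
# end a candidate; the real heuristics may use a different set.
TRAILING_PUNCTUATION = ".,?#"


def guess_uri(text, start):
    """Guess the extent of a URI that begins at text[start] (hypothetical helper)."""
    end = start
    # Stop at the first character that cannot appear in a URI as-is.
    while end < len(text) and text[end] in URI_CHARS:
        end += 1
    # Conservatively drop trailing punctuation from the candidate.
    while end > start and text[end - 1] in TRAILING_PUNCTUATION:
        end -= 1
    return text[start:end]


def percent_encode(segment):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(segment, safe="")
```

With this sketch, a "?" that ends the candidate is dropped while a "?" followed
by further URI characters is kept, and scanning stops at "^", "|" or "\". A
user who wants such a character inside a link would write it percent-encoded,
for example "%7C" instead of "|".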