Hi Grisha,

Thank you for the detailed reply. I was not aware of the paste and 
glob-complete-word issues. OMG!

Background: I maintain Readline bindings for Python [1], which I started 
long ago because I was annoyed that the standard bindings did not allow for 
proper filename completion. My goal is also to provide good documentation [2] 
and examples [3].

The scenario I described is hypothetical insofar as I do not myself maintain 
an application with these requirements. I have, however, tested it in the past 
by using a Latin-1 terminal on a UTF-8 filesystem, and completion worked just 
fine (given that all three hooks were set).
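
To make the two directions concrete, here is a minimal sketch of the 
conversions such hooks would perform for a Latin-1 terminal on a UTF-8 
filesystem. It is plain Python and is not wired into Readline; the function 
names and the two encoding constants are hypothetical and only illustrate the 
terminal -> filesystem and filesystem -> terminal transformations discussed 
in this thread.

```python
# Assumed setup for illustration: Latin-1 terminal, UTF-8 filesystem.
TERMINAL_ENCODING = "latin-1"
FILESYSTEM_ENCODING = "utf-8"


def terminal_to_filesystem(name: bytes) -> bytes:
    """Convert bytes typed in the terminal to the filesystem encoding.

    This is the direction described for rl_directory_rewrite_hook and
    rl_filename_stat_hook in the thread.
    """
    return name.decode(TERMINAL_ENCODING).encode(FILESYSTEM_ENCODING)


def filesystem_to_terminal(name: bytes) -> bytes:
    """Convert a directory entry to the terminal encoding.

    This is the direction described for rl_filename_rewrite_hook.
    Characters that Latin-1 cannot represent are replaced with '?'.
    """
    text = name.decode(FILESYSTEM_ENCODING)
    return text.encode(TERMINAL_ENCODING, errors="replace")
```

The point of the sketch is that the second function expects filesystem 
bytes. Feeding it a string that is already in terminal representation is the 
failure mode described in the original report.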

I am, however, not sure that supporting different character sets is still 
useful in 2023. Everything is UTF-8 these days, and your code is fine if 
NFD/NFC is the only problem we have to solve. Yet another hook? I’d rather 
not...
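
For the NFD/NFC-only case, the idea of normalizing both the typed text and 
the directory entry before comparing them can be sketched as follows. This 
is an illustration in plain Python using the standard unicodedata module, not 
the code in complete.c; the helper names are hypothetical.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return the fully composed (NFC) form of a filename."""
    return unicodedata.normalize("NFC", name)


def matches_prefix(typed: str, entry: str) -> bool:
    """Compare typed text against a directory entry, ignoring whether
    either side is composed (NFC) or decomposed (NFD)."""
    return to_nfc(entry).startswith(to_nfc(typed))


# The same visible name in two normalizations.
composed = "M\u00e4dchen"      # 'ä' as a single code point
decomposed = "Ma\u0308dchen"   # 'a' followed by a combining diaeresis
```

Without normalization, a composed prefix does not match a decomposed entry 
even though both look identical in the terminal. With both sides normalized, 
pasted (decomposed) and typed (composed) input match the same entries.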

I will have to run some experiments.

Thanks again for your feedback,
Stefan

[1] https://github.com/stefanholek/rl
[2] https://rl.readthedocs.io/en/stable/completion.html#additional-hooks-for-when-the-filesystem-representation-differs-from-the-representation-in-the-terminal
[3] https://rl.readthedocs.io/en/stable/examples.html#filename-completion


> On 25. Sep 2023, at 08:15, Grisha Levit <[email protected]> wrote:
> 
> Hi Stefan,
> 
> On Sun, Sep 24, 2023, 06:38 Stefan H. Holek <[email protected]> wrote:
>> Hi All,
>>
>> There appears to be an issue with a recent addition to
>> rl_filename_completion_function. It now applies rl_filename_rewrite_hook to
>> the filename part of "what the user has typed". This seems wrong. Let me
>> explain.
>
> This was my patch (submitted to bug-bash, the original thread is at [1]) so
> I'll defend the motivation for it -- though I think you're right that the
> implementation was too narrowly focused on addressing the issue described
> there and can violate assumptions in existing code.
>
>> The rl_filename_rewrite_hook exists to convert data read from the filesystem
>> to a representation that works in the terminal. E.g. on macOS the filesystem
>> returns decomposed UTF-8, which must be converted to fully composed UTF-8
>> before comparing it to a string the user has typed.
>
> Side note: APFS preserves normalization -- so we get both composed and
> decomposed entries to compare against.  But that doesn't really affect this
> feature.
>
> For background, with either filesystem, macOS filenames are not the usual
> opaque byte strings that they are on other platforms but rather
> normalization-insensitive UTF-8 text, i.e.:
> * it's not possible to have two distinct directory entries that normalize
> equally
> * a file can be accessed using any name that normalizes the same as the
> filename
>
>> Now, the section below (in complete.c) appears to apply
>> rl_filename_rewrite_hook to a string in TERMINAL representation ('filename'
>> is the rightmost part of the path the user has typed):
>
> While text literally typed in will likely be NFC, any filenames pasted into
> the terminal (or placed there by glob-complete-word, etc) will retain the
> normalization stored on the filesystem -- which is usually _not_ NFC. See
> examples in the thread at [2].
>
>> I struggle to find this useful and in fact think it's dangerous and should be
>> backed out.
>
> So without normalizing the input text, it's not possible to reuse filenames
> read from the filesystem (`ls` output, etc.) as input to readline completion
> code.  Or rather, it would be possible, but Readline normalizes the directory
> entries so it only makes sense to normalize the text to match against them as
> well.
>
>> If I have an rl_filename_rewrite_hook that works in Readline 8.2, it may just
>> fail in 8.3 because it is applied to a string that is not in the expected
>> filesystem representation!
>>
>> Readline has so far worked fine in scenarios where the terminal encoding
>> differs from the filesystem encoding. I can use rl_directory_rewrite_hook and
>> rl_filename_stat_hook to go from terminal -> filesystem encoding, and
>> rl_filename_rewrite_hook to go from filesystem -> terminal encoding. It is my
>> understanding that these hooks have been added to support this use-case in
>> the first place.
>
> Is this an existing application or a hypothetical one? I'm not sure how this
> can work as described -- rl_directory_rewrite_hook only modifies the
> directory portion of the text, not the part after the final slash, and
> rl_filename_stat_hook is applied only after completion matches have already
> been generated.
>
> What was missing was a way to modify the filename portion of the text before
> generating completion matches. (Well, rl_filename_dequoting_function does
> that, but that only gets called if the name is quoted).
>
> Maybe a better approach is a separate variable (e.g.
> rl_filename_completion_hook) to serve this purpose since an application may
> want to perform different transformations on generated filenames vs input
> text.
> 
> [1]: https://lists.gnu.org/archive/html/bug-bash/2023-07/msg00050.html
> [2]: https://lists.gnu.org/archive/html/bug-bash/2023-07/msg00081.html

-- 
Stefan H. Holek
[email protected]
