Hi Grisha,

Thank you for the detailed reply. I was not aware of the paste and
glob-complete-word issues. OMG!

Background: I maintain Readline bindings for Python [1], which I started
long ago because I was annoyed that the standard bindings do not allow for
proper filename completion. My goal is to also provide good documentation [2]
and examples [3].

The scenario I have given is hypothetical insofar as I do not myself maintain
an application with these requirements. I have, however, tested it in the past
by using a Latin-1 terminal on a UTF-8 filesystem, and completion worked just
fine (given all three hooks are set).

I am, however, not sure supporting different character sets is still useful in
2023. Everything is UTF-8 these days, and your code is fine if NFD/NFC is the
only problem we have to solve.

Yet another hook? I’d rather not... I will have to run some experiments.

Thanks again for your feedback,
Stefan

[1] https://github.com/stefanholek/rl
[2] https://rl.readthedocs.io/en/stable/completion.html#additional-hooks-for-when-the-filesystem-representation-differs-from-the-representation-in-the-terminal
[3] https://rl.readthedocs.io/en/stable/examples.html#filename-completion

> On 25. Sep 2023, at 08:15, Grisha Levit <[email protected]> wrote:
>
> Hi Stefan,
>
> On Sun, Sep 24, 2023, 06:38 Stefan H. Holek <[email protected]> wrote:
> > Hi All,
> >
> > There appears to be an issue with a recent addition to
> > rl_filename_completion_function. It now applies rl_filename_rewrite_hook to
> > the filename part of "what the user has typed". This seems wrong. Let me
> > explain.
>
> This was my patch (submitted to bug-bash, the original thread is at [1]) so
> I'll defend the motivation for it -- though I think you're right that the
> implementation was too narrowly focused on addressing the issue described
> there and can violate assumptions in existing code.
>
> > The rl_filename_rewrite_hook exists to convert data read from the filesystem
> > to a representation that works in the terminal. E.g. on macOS the filesystem
> > returns decomposed UTF-8, which must be converted to fully composed UTF-8
> > before comparing it to a string the user has typed.
>
> Side note: APFS preserves normalization -- so we get both composed and
> decomposed entries to compare against. But that doesn't really affect this
> feature.
>
> For background, with either filesystem, macOS filenames are not the usual
> opaque byte strings that they are on other platforms but rather
> normalization-insensitive UTF-8 text, i.e.:
> * it's not possible to have two distinct directory entries that normalize
>   equally
> * a file can be accessed using any name that normalizes the same as the
>   filename
>
> > Now, the section below (in complete.c) appears to apply
> > rl_filename_rewrite_hook to a string in TERMINAL representation ('filename'
> > is the rightmost part of the path the user has typed):
>
> While text literally typed in will likely be NFC, any filenames pasted into
> the terminal (or placed there by glob-complete-word, etc) will retain the
> normalization stored on the filesystem -- which is usually _not_ NFC. See
> examples in the thread at [2].
>
> > I struggle to find this useful and in fact think it's dangerous and should be
> > backed out.
>
> So without normalizing the input text, it's not possible to reuse filenames
> read from the filesystem (`ls` output, etc.) as input to readline completion
> code. Or rather, it would be possible, but Readline normalizes the directory
> entries so it only makes sense to normalize the text to match against them as
> well.
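
To make the NFC/NFD mismatch described above concrete, here is a minimal
Python sketch. It is not Readline code and not the API of the rl bindings;
the helper names (to_nfc, matches) and the directory listing are made up for
illustration. It assumes directory entries are always rewritten to NFC before
comparison and only varies whether the input text is normalized too.

```python
import unicodedata


def to_nfc(name):
    """Return the fully composed (NFC) form of a name."""
    return unicodedata.normalize("NFC", name)


def matches(text, entries, normalize_text):
    """Prefix-match text against directory entries rewritten to NFC.

    normalize_text controls whether the input text is normalized as well.
    """
    if normalize_text:
        text = to_nfc(text)
    return [e for e in map(to_nfc, entries) if e.startswith(text)]


# Hypothetical directory listing; the first name is stored decomposed (NFD).
entries = [unicodedata.normalize("NFD", "r\u00e9sum\u00e9.txt"), "readme.txt"]

typed = "r\u00e9"                                 # typed by hand: NFC
pasted = unicodedata.normalize("NFD", "r\u00e9")  # pasted from `ls`: NFD

typed_hits = matches(typed, entries, normalize_text=False)
pasted_raw_hits = matches(pasted, entries, normalize_text=False)
pasted_norm_hits = matches(pasted, entries, normalize_text=True)
```

With these inputs the typed NFC prefix finds the entry, the pasted NFD prefix
finds nothing unless it is normalized first, and normalizing it restores the
match.
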
>
> > If I have an rl_filename_rewrite_hook that works in Readline 8.2, it may just
> > fail in 8.3 because it is applied to a string that is not in the expected
> > filesystem representation!
> >
> > Readline has so far worked fine in scenarios where the terminal encoding
> > differs from the filesystem encoding. I can use rl_directory_rewrite_hook and
> > rl_filename_stat_hook to go from terminal -> filesystem encoding, and
> > rl_filename_rewrite_hook to go from filesystem -> terminal encoding. It is my
> > understanding that these hooks have been added to support this use-case in
> > the first place.
>
> Is this an existing application or a hypothetical one? I'm not sure how this
> can work as described -- rl_directory_rewrite_hook only modifies the
> directory portion of the text, not the part after the final slash, and
> rl_filename_stat_hook is applied only after completion matches have already
> been generated.
>
> What was missing was a way to modify the filename portion of the text before
> generating completion matches. (Well, rl_filename_dequoting_function does
> that, but that only gets called if the name is quoted).
>
> Maybe a better approach is a separate variable (e.g.
> rl_filename_completion_hook) to serve this purpose since an application may
> want to perform different transformations on generated filenames vs input
> text.
>
> [1]: https://lists.gnu.org/archive/html/bug-bash/2023-07/msg00050.html
> [2]: https://lists.gnu.org/archive/html/bug-bash/2023-07/msg00081.html

--
Stefan H. Holek
[email protected]
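
For the Latin-1 terminal on a UTF-8 filesystem scenario discussed in this
thread, the following Python sketch shows why a filesystem -> terminal
conversion can fail when it is handed text that is already in terminal
representation. The function names are hypothetical stand-ins for what an
application might install as hooks; they are not Readline's or rl's API, and
the two encodings are assumptions taken from the test setup mentioned above.

```python
TERMINAL_ENCODING = "latin-1"   # assumption: encoding used by the terminal
FILESYSTEM_ENCODING = "utf-8"   # assumption: encoding of names on disk


def terminal_to_filesystem(raw):
    """Convert bytes typed in the terminal to the on-disk representation."""
    return raw.decode(TERMINAL_ENCODING).encode(FILESYSTEM_ENCODING)


def filesystem_to_terminal(raw):
    """Convert bytes read from the filesystem to terminal representation."""
    return raw.decode(FILESYSTEM_ENCODING).encode(TERMINAL_ENCODING)


def rewrite_fails(raw):
    """Report whether the filesystem -> terminal conversion rejects raw."""
    try:
        filesystem_to_terminal(raw)
    except UnicodeDecodeError:
        return True
    return False


typed = "caf\u00e9".encode(TERMINAL_ENCODING)      # b"caf\xe9"
on_disk = "caf\u00e9".encode(FILESYSTEM_ENCODING)  # b"caf\xc3\xa9"

round_trip_ok = (
    terminal_to_filesystem(typed) == on_disk
    and filesystem_to_terminal(on_disk) == typed
)
fails_on_typed = rewrite_fails(typed)      # terminal text fed to the rewrite
fails_on_disk = rewrite_fails(on_disk)     # filesystem text, as intended
```

Here the conversion works on the on-disk bytes but raises a decoding error on
the typed bytes, because a lone 0xE9 byte is not valid UTF-8. A real hook
could of course guard against this; the sketch only illustrates the
assumption that the rewrite step receives filesystem data.
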
