Jilles Tjoelker via austin-group-l at The Open Group wrote in
 <20240827145000.ga2...@stack.nl>:
 |On Fri, Aug 23, 2024 at 12:27:01PM +0200, Alejandro C via austin-group-l
 |at The Open Group wrote:
 |> wmemrchr(), and in general w*(), are functions that deal with wide
 |> characters --which have a fixed width--, not multi-byte characters
 |> --which have a variable width--.
 |
 |> Thus, searching backwards for a wc should be a trivial loop:
 |> [snip]
 |
 |I agree that the rationale is incorrect. However, I still agree that
 |wmemrchr() should not be added to the standard. Not only would it be
 |invention, but it would boil down to doing work to improve UTF-32
 |support (in most implementations). UTF-32 is inefficient with little
 |compensation (since single code points aren't that meaningful in today's
 |Unicode).

I agree that UTF-32 as such is not very helpful: in practice you
always have to watch for grapheme boundaries, which can very well
include modifiers etc. (ie even in the low 7-bit ASCII compatible
range). (Also note that, at least for older ISO C and POSIX versions,
wchar_t is not necessarily UTF-32 at all, and Citrus in particular
uses nifty things, or did when I looked about a decade ago.)

And yes, the entire family of functions is *not* usable in practice:
iswupper*(), towupper*(), these are all thoughtless ISO C inventions
that never spent a thought on reading the Unicode or ISO 10646
standards. For true internationalization you have to look at entire
sentences if you want to perform case conversions (if applicable) etc.
I think this came up on this list a decade ago, but even ISO C23, as
far as I have glanced over it, does not support anything usable at
all.

Anyhow, this isolated inspection of bytes or UTF-32 characters is
rarely useful. And for UTF-8 it is pretty easy to create jump tables;
I think the glib library did that already over two decades ago (ie,
if you know the UTF-8 string is syntactically correct, looking at the
first byte gives the length of the multi-byte sequence). For backward
scanning, well, I guess there are mathematical tricks for scanning
multiple bytes backward at once while looking for a starter byte. No
"trivial loop" though, but text processing was only trivial as long as
it was all American (and otherwise careless).

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
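
To make the UTF-8 remark above concrete, here is a minimal sketch of
the two operations, assuming the input is already known to be
well-formed UTF-8. The names utf8_seq_len() and utf8_prev() are
hypothetical, chosen for illustration; this is not glib's code or any
library's API. The backward scan is the plain byte-at-a-time version,
not one of the multi-byte tricks alluded to above.

```c
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte.
 * Returns 0 for bytes that cannot start a sequence (continuation
 * bytes 0x80..0xBF, the overlong leads 0xC0/0xC1, and 0xF5..0xFF).
 * A 256-entry lookup table would hold exactly these values. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead < 0xC2) return 0;  /* continuation byte or overlong lead */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF5) return 4;  /* 11110xxx, up to 0xF4 */
    return 0;                   /* never valid in UTF-8 */
}

/* Hypothetical helper: step back from p to the first byte of the
 * preceding code point, never going before start. Assumes well-formed
 * UTF-8, so continuation bytes (10xxxxxx) are simply skipped. */
static const unsigned char *utf8_prev(const unsigned char *start,
                                      const unsigned char *p)
{
    if (p == start)
        return start;
    do {
        --p;
    } while (p > start && (*p & 0xC0) == 0x80);
    return p;
}
```

Note that both helpers only find code point boundaries. They say
nothing about grapheme boundaries, which is the point made above.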