The WG20 issues have now been transferred to SC35/WG5. They issued a
standard, ISO/IEC 30112, built on POSIX, C and Unicode.
keld

On Fri, Jun 20, 2025 at 08:54:47AM +0000, Niu Danny via austin-group-l at The Open Group wrote:
> Does any of us still have a record of our interactions
> with WG20 - Internationalization?
>
> I intend to learn more about the background
> before making any kind of judgment.
>
> > On May 25, 2025, at 07:58, Niu Danny via austin-group-l at The Open Group
> > <[email protected]> wrote:
> >
> > I'd like to ask whether the following premise makes the following
> > implementation strategy valid. Not sure if it's on topic though.
> >
> > Localization in Unix was intended to sell the system to
> > non-English-speaking customers, but nowadays its relevance is
> > decreasing due to the development of language models of deployable
> > scale and improved translation algorithms. Although their accuracy is
> > debated, they're sufficient considering they're primarily just a
> > first-hand, built-in source, and users would purchase more
> > professional translation software or services for work.
> >
> > Internationalized regex is supposedly a subsidiary tool to
> > localization for text processing, but for a regex engine to be really
> > internationalized, I think a character database model is needed,
> > which is not easy, as the true boundary of a character is not always
> > clear in every culture. I suppose readers will expect Perl to be
> > mentioned, so yes, a large body of text-processing tools is written
> > in Perl, owing to its more versatile regex and programming language
> > syntax, as well as its diverse ecosystem.
> >
> > Regex in Unix really is mostly good for system administration,
> > especially for tasks that are meant to be automated, such as log
> > analysis and incident reports. Configuration editing and other tasks
> > that require human decisions, although they cannot be automated, can
> > be greatly augmented when a useful tool such as regex is available to
> > the user.
> >
> > I personally found another use of regex where localization prevented
> > me from doing what I needed. In web back-end programming, there's a
> > need for "path sanitization" when storing and retrieving files, to
> > prevent a malicious client from using a crafted path to overwrite or
> > access restricted data. Because the regex engine I used at the time
> > was bundled with internationalization support, I had to install an
> > additional dependency during deployment, which wasn't discovered
> > during development. Minor anecdote though.
> >
> > POSIX already permits an implementation to support no locales other
> > than the C/POSIX locale, so a regex implementation that has no
> > extension mechanism whatsoever, on a system implementation that
> > doesn't support defining additional locales, is conforming. But
> > here's the part that I'm not sure about:
> >
> > I want to implement an ASCII-based regex that's simultaneously a
> > byte-based regex. POSIX doesn't require me to use the exact ASCII
> > character set, so in theory I have the freedom to call the byte
> > values 128-255 [:nonchar:] or [:nonascii:] if I see fit. But in this
> > case, strictly speaking, I shouldn't advertise the charset as ASCII
> > in my environment. Yet programs that see ASCII can assume some
> > properties about the environment; would such assumptions in turn
> > make them strictly non-portable?
> >
> > How do you view these issues? Thanks for your opinion.
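
For illustration only, the byte-based, locale-independent path check
described in the quoted message could be sketched roughly as below. This
is not the poster's implementation: it assumes Python's re module with
bytes patterns (which match raw byte values and do not consult locale
data unless re.LOCALE is given), and the names SAFE_COMPONENT, NONASCII,
is_safe_component and is_safe_relative_path are hypothetical. The
allowed character set is likewise an assumption.

```python
import re

# Bytes patterns match raw byte values; no locale data is consulted
# unless re.LOCALE is passed. The allowed set here is an assumption.
SAFE_COMPONENT = re.compile(rb"[A-Za-z0-9._-]+")

# Hypothetical stand-in for a [:nonascii:] class: any byte in 128-255.
NONASCII = re.compile(rb"[\x80-\xff]")


def is_safe_component(name: bytes) -> bool:
    """Accept one path component made only of allowed ASCII bytes."""
    if name in (b".", b".."):
        # Reject the current/parent directory entries explicitly.
        return False
    return SAFE_COMPONENT.fullmatch(name) is not None


def is_safe_relative_path(path: bytes) -> bool:
    """Accept a relative path whose every component is safe."""
    if not path or path.startswith(b"/"):
        return False
    # An empty component (from "//" or a trailing "/") fails the
    # fullmatch above, so such paths are rejected as well.
    return all(is_safe_component(part) for part in path.split(b"/"))
```

Because the pattern is an allow-list over byte values, any byte in
128-255 is rejected without the engine needing to know what character
set, if any, those bytes belong to.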
