On May 7, 2023, at 11:41:45, Phil Smith III wrote:
>     ...
> This is especially confusing since “plain ol’ ASCII” maps directly to the 
> first part of UTF-8-encoded Unicode. This is of course A Good Thing in 
> general, but lets people cheat and get away with it—until they don’t.
> 
Yup. On macOS, sed regexes count UTF-8 characters, but printf field widths count octets:
516 $ printf '%3s|\n%3s|\n' 2π r
2π|
  r|
517 $
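
The difference between the two counts can be sketched in portable shell. This is only an illustration, not what sed or printf do internally: the helper names `octets` and `codepoints` are hypothetical, and the code-point count assumes the input is valid UTF-8 (it deletes the continuation bytes 0x80-0xBF and counts what is left).

```shell
#!/bin/sh
# Hypothetical helper: count the octets in a string.
octets() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Hypothetical helper: count UTF-8 code points by deleting continuation
# bytes (octal 200-277, i.e. 0x80-0xBF) and counting the remaining bytes.
# Assumes the input is valid UTF-8.
codepoints() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# "2π" built from explicit octets: 0x32, then 0xCF 0x80 (U+03C0).
s=$(printf '2\317\200')

printf 'octets=%s codepoints=%s\n' "$(octets "$s")" "$(codepoints "$s")"
# prints: octets=3 codepoints=2
```

With three octets already in the string, a field width of 3 adds no padding, which matches the unpadded first line of the output above.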

> As for your original question, I’m more than willing to believe in some code 
> page with hex AA as the NOT sign, just never seen it. Hard to search for, 
> too, alas. Do you know what page that is?

Host: UTF-8 output: CP852
         0  16  32  48  64  80  96 112 128 144 160 176 192 208 224 240
        00  10  20  30  40  50  60  70  80  90  A0  B0  C0  D0  E0  F0

 0  0                0   @   P   `   p           á   ░   └   đ   Ó   ­
 1  1            !   1   A   Q   a   q           í   ▒   ┴   Đ   ß   ˝
 2  2            "   2   B   R   b   r           ó   ▓   ┬   Ď   Ô   ˛
 3  3            #   3   C   S   c   s           ú   │   ├   Ë   Ń   ˇ
 4  4            $   4   D   T   d   t           Ą   ┤   ─   ď   ń   ˘
 5  5            %   5   E   U   e   u           ą   Á   ┼   Ň   ň   §
 6  6            &   6   F   V   f   v           Ž   Â   Ă   Í   Š   ÷
 7  7            '   7   G   W   g   w           ž   Ě   ă   Î   š   ¸
 8  8            (   8   H   X   h   x           Ę   Ş   ╚   ě   Ŕ   °
 9  9            )   9   I   Y   i   y           ę   ╣   ╔   ┘   Ú   ¨
10  A            *   :   J   Z   j   z           ¬   ║   ╩   ┌   ŕ   ˙
11  B            +   ;   K   [   k   {           ź   ╗   ╦   █   Ű   ű
12  C            ,   <   L   \   l   |           Č   ╝   ╠   ▄   ý   Ř
13  D            -   =   M   ]   m   }               Ż   ═   Ţ   Ý   ř
14  E            .   >   N   ^   n   ~           «   ż   ╬   Ů   ţ   ■
15  F            /   ?   O   _   o               »   ┐   ¤   ▀   ´    
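
A single cell of the table can be checked from the shell. This is a minimal sketch, assuming an iconv that recognizes the encoding name CP852 (GNU libc and libiconv both list it, but availability depends on the installation). It decodes the one octet 0xAA and dumps the UTF-8 result.

```shell
#!/bin/sh
# Decode the single octet 0xAA (octal 252) as CP852 and print it as UTF-8.
# Assumes the local iconv knows the encoding name CP852.
printf '\252' | iconv -f CP852 -t UTF-8
printf '\n'

# Dump the resulting UTF-8 octets; U+00AC NOT SIGN is encoded as c2 ac.
printf '\252' | iconv -f CP852 -t UTF-8 | od -An -tx1
```

Swapping the `-f` argument for another code page name is a quick way to see which character a given octet maps to there.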


> I’m a bit chary* of blindly accepting multiple code points as NOT signs. 
> Better to know how your input is encoded (or mandate it). Unless, of course, 
> it can be demonstrated that this particular multilingualism cannot be 
> misinterpreted.

+1

-- 
gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
