On May 7, 2023, at 11:41:45, Phil Smith III wrote:
> ...
> This is especially confusing since “plain ol’ ASCII” maps directly to the
> first part of UTF-8-encoded Unicode. This is of course A Good Thing in
> general, but lets people cheat and get away with it, until they don’t.
>
Yup.  In macOS, sed regex counts UTF-8 characters; printf counts octets:

516 $ printf '%3s|\n%3s|\n' 2π r
2π|
  r|
517 $
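In the transcript, "2π" is two characters but three octets in UTF-8 (π is CF 80). A byte-counting %3s therefore adds no padding to it, while "r" gets two spaces. Here is a minimal Python sketch of the same distinction, assuming UTF-8 text. The helpers pad_by_octets and pad_by_chars are hypothetical names for illustration. They only model what printf and sed appear to do here, not how either is implemented.

```python
import re

def pad_by_octets(s: str, width: int) -> str:
    """Right-justify s to `width`, counting UTF-8 octets (hypothetical helper;
    models the byte-counting %3s in the transcript)."""
    n_octets = len(s.encode("utf-8"))
    return " " * max(0, width - n_octets) + s

def pad_by_chars(s: str, width: int) -> str:
    """Right-justify s to `width`, counting code points (hypothetical helper)."""
    return s.rjust(width)

for word in ("2π", "r"):
    # Octet-counting pad on the left, character-counting pad on the right.
    print(f"{pad_by_octets(word, 3)}|  {pad_by_chars(word, 3)}|")

text = "2π"
print(len(text), len(text.encode("utf-8")))  # 2 code points, 3 octets
# A regex over decoded text sees two characters ...
print(re.fullmatch(r"..", text) is not None)                   # True
# ... but over the raw UTF-8 bytes it sees three, so ".." no longer matches.
print(re.fullmatch(rb"..", text.encode("utf-8")) is not None)  # False
```

Python's str operations count code points, which matches the sed behaviour described. Encoding to bytes first reproduces the printf behaviour. Neither counts user-perceived characters: an "é" written as e plus a combining accent is still two code points.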
> As for your original question, I’m more than willing to believe in some code
> page with hex AA as the NOT sign, just never seen it. Hard to search for,
> too, alas. Do you know what page that is?

Host: UTF-8  output: CP852
        0  16  32  48  64  80  96 112 128 144 160 176 192 208 224 240
        0  10  20  30  40  50  60  70  80  90  A0  B0  C0  D0  E0  F0
  0 0               0   @   P   `   p           á   ░   └   đ   Ó
  1 1           !   1   A   Q   a   q           í   ▒   ┴   Đ   ß   ˝
  2 2           "   2   B   R   b   r           ó   ▓   ┬   Ď   Ô   ˛
  3 3           #   3   C   S   c   s           ú   │   ├   Ë   Ń   ˇ
  4 4           $   4   D   T   d   t           Ą   ┤   ─   ď   ń   ˘
  5 5           %   5   E   U   e   u           ą   Á   ┼   Ň   ň   §
  6 6           &   6   F   V   f   v           Ž   Â   Ă   Í   Š   ÷
  7 7           '   7   G   W   g   w           ž   Ě   ă   Î   š   ¸
  8 8           (   8   H   X   h   x           Ę   Ş   ╚   ě   Ŕ   °
  9 9           )   9   I   Y   i   y           ę   ╣   ╔   ┘   Ú   ¨
 10 A           *   :   J   Z   j   z           ¬   ║   ╩   ┌   ŕ   ˙
 11 B           +   ;   K   [   k   {           ź   ╗   ╦   █   Ű   ű
 12 C           ,   <   L   \   l   |           Č   ╝   ╠   ▄   ý   Ř
 13 D           -   =   M   ]   m   }               Ż   ═   Ţ   Ý   ř
 14 E           .   >   N   ^   n   ~           «   ż   ╬   Ů   ţ   ■
 15 F           /   ?   O   _   o               »   ┐   ¤   ▀   ´

> I’m a bit chary* of blindly accepting multiple code points as NOT signs.
> Better to know how your input is encoded (or mandate it). Unless, of course,
> it can be demonstrated that this particular multilingualism cannot be
> misinterpreted.

+1

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
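Phil's question about which code page puts the NOT sign at hex AA can also be checked directly. Doing so illustrates the point about knowing (or mandating) the input encoding. The same character ¬ (U+00AC) is a different byte sequence in each encoding, and the byte AA means something different in each. Below is a minimal Python sketch using the standard library's cp852, latin-1 and utf-8 codecs. The helper name not_sign_bytes is hypothetical. The chart in the post came from a tool that is not shown, and this sketch does not try to reproduce it.

```python
NOT_SIGN = "\u00ac"  # ¬ (NOT SIGN)

def not_sign_bytes(codec: str) -> bytes:
    """Return the bytes that encode ¬ in `codec` (hypothetical helper)."""
    return NOT_SIGN.encode(codec)

for codec in ("cp852", "latin-1", "utf-8"):
    print(f"{codec:8} {not_sign_bytes(codec).hex(' ').upper()}")
# cp852    AA
# latin-1  AC
# utf-8    C2 AC

raw = b"\xaa"
print(raw.decode("cp852"))    # ¬ : the NOT sign, as in the CP852 chart
print(raw.decode("latin-1"))  # ª : FEMININE ORDINAL INDICATOR, U+00AA
try:
    raw.decode("utf-8")       # a lone 0xAA is a UTF-8 continuation byte
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)  # invalid start byte
```

The single byte AA decodes to ¬, to ª, or to an error depending on the codec. A program can therefore only tell which one it is when the encoding is declared, which is the case for mandating it.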