Re: [Jprogramming] RFC: unicode

Elijah Stone Fri, 18 Mar 2022 23:56:21 -0700

On Sat, 19 Mar 2022, Raul Miller wrote:

> How can a programmer write a program to handle text if the language does not allow text to exist?
That is nonsense. Programming is all about representation. You might as well say: how can a programmer write a program to handle graphs if the language does not allow graphs to exist?
A problem, with this argument, is that programs *are* text.

Or, more specifically, we use text to represent programs.

Indeed. In a hypothetical version of j without first-class textual data, an alternate representation would be needed to describe programs.

Note I do not think getting rid of text would be the _best_ thing. I would rather improve it.

the unicode suite of standards defines quite a variety of ways of representing unicode characters.

Unicode specifies precisely three representations (v14 §2.5): utf-32, utf-16, and utf-8. Of these, j conflates utf-8 with 'ucs-1' (a made-up encoding) and utf-16 with ucs-2 (a genuine, historically significant encoding). Of these, only utf-8 sees widespread use as an interchange format. UTF-16 is used by some corners of win32 and the jvm, but that is pretty much it (and I believe win32 has utf8 interfaces now). UTF-8 has 'won', and I do not think it is worth giving significant attention to other encodings.
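
As a rough illustration of the three encoding forms (in Python rather than j, purely because its codecs make the point compactly; the sample string and variable names are made up for this sketch), the same three code points occupy a different number of code units in each form:

```python
# Illustration only: one string, three Unicode encoding forms.
# The sample text is arbitrary: U+0061 (a), U+20AC (euro sign), U+1F600 (an emoji).
text = "a\u20ac\U0001F600"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # explicit byte order, so no BOM is emitted
utf32 = text.encode("utf-32-le")

code_points = [ord(c) for c in text]

utf8_units = len(utf8)          # 8-bit code units:  1 + 3 + 4
utf16_units = len(utf16) // 2   # 16-bit code units: 1 + 1 + 2 (surrogate pair)
utf32_units = len(utf32) // 4   # 32-bit code units: one per code point

print(code_points, utf8_units, utf16_units, utf32_units)
```

Only in utf-32 does the code-unit count equal the code-point count. Treating utf-8 octets or utf-16 units as if each were a whole character is the kind of conflation described above.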

Looking somewhat further afield, there are _thousands_ of historically significant encodings and codepages. I do not think it is sensible to provide first-class support for them all. Nor do I think it is desirable. It might be interesting to provide _decoders_ and _encoders_ for some (treating the foreign encoding as a sequence of opaque octets, not characters), but always in terms of the common canonicalization of unicode.
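
A minimal sketch of that decoder/encoder idea, again in Python and under stated assumptions: `decode_legacy` and `encode_legacy` are hypothetical helper names, and the two codepages are arbitrary examples. The legacy data stays a plain byte sequence until it is explicitly decoded, and Unicode text is the only character representation the program works with:

```python
def decode_legacy(octets: bytes, encoding: str) -> str:
    """Interpret opaque octets under a named legacy encoding, yielding Unicode text."""
    return octets.decode(encoding)


def encode_legacy(text: str, encoding: str) -> bytes:
    """Turn Unicode text back into octets in a named legacy encoding."""
    return text.encode(encoding)


# The same four octets mean different text depending on the codepage applied.
raw = b"caf\xe9"
as_latin1 = decode_legacy(raw, "latin-1")
as_cp437 = decode_legacy(raw, "cp437")

# Once decoded, interchange goes through utf-8 regardless of where the text came from.
interchange = as_latin1.encode("utf-8")
```

The octets carry no meaning on their own. The choice of decoder supplies it, which is why the foreign encoding never needs to be a first-class character type.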

...but I do not see what aspect of my proposal you object to, and it seems to avoid this issue.


 -E
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
