Re: [Jprogramming] RFC: unicode

Elijah Stone Sat, 19 Mar 2022 14:47:37 -0700

Thank you.

I specifically object to this:
:: Therefore I
:: suggest an alternate solution, at least for the interim: foreigns (scary
:: and obscure, per above) that will _intentionally misinterpret_ data from
:: the outside world as 'UCS-1' and represent it compactly (or do the
:: opposite).
I think that this statement is too broad. As I think you previously pointed out, interpretation is the domain of the programmer, not the language.

I did not mean to imply that it was _solely_ the domain of the programmer. It is a delicate balancing act. J places more of the burden on the programmer than many other programming languages, and I do not think that is a bad thing.

But there are limits. I do not need to carefully track which words are integers and which are floats. I can imagine a version of j in which this were the case; in which, for instance, + would interpret its arguments as integers and add them, and +. would interpret its arguments as floats and add them, and you would have to be very careful to always choose the correct addition routine.

That language would be more expressive than the one we have, but I think it would be easy to write buggy code, and it would not be much fun to use. Having to manually track how characters are encoded seems not too different from having to manually track how numbers are encoded.

And it is possible to make a software floating-point implementation in j, effectively interpreting integers as floating-point numbers. That is effectively what I propose to permit, but as a performance hack rather than for any desirable semantic properties.
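
To make the analogy concrete, here is a minimal sketch (in Python rather than j, with hypothetical helper names) of treating an integer's bit pattern as a floating-point number. It assumes 64-bit IEEE 754 doubles and is only an illustration, not a proposed implementation:

```python
import struct


def bits_to_float(n: int) -> float:
    """Reinterpret a 64-bit unsigned integer's bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]


def float_to_bits(f: float) -> int:
    """Inverse of bits_to_float: expose a double's bit pattern as an integer."""
    return struct.unpack("<Q", struct.pack("<d", f))[0]


# The same 64 bits, read as an integer or as a float.
as_int = 0x3FF8000000000000
print(bits_to_float(as_int))            # 1.5

# Integer addition on those bits is not float addition:
# the programmer has to pick the routine that matches the intended reading.
print(as_int + as_int == float_to_bits(1.5 + 1.5))  # False
```

The data is identical in both readings; only the interpretation differs, which is the bookkeeping the programmer would have to do by hand.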

I also object to any proposal to retract language primitives where we do not have demonstrated working replacements whose merits we are judging. (I could go into more detail on how I think these issues should be approached, if you like.)


Please do.

Specifically, removing u: does not seem like it would solve any of the
problems you highlighted with your examples involving x, y and z. (I
think bill lam's post --
http://www.jsoftware.com/pipermail/programming/2022-March/060355.html

-- gives a succinct description of the *language* (as opposed to foreigns or the operating system).)


I agree that removing u: is not sufficient.  But:

1. I do think that removing u: is necessary.

2. Insofar as we are considering only language issues (rather than
   issues of interoperability), I do not see why it is desirable to
   promote as primary routines for translating between encodings.

3. The language is not consistent; ": treats text as if it were
   utf-encoded, but promotion (} , " etc.) treats text as if it were
   ucs-encoded.

4. I think that issues of interoperability _are_ important, and still
   worthy of discussion.  For instance, more consistent display (and this
   is arguably not a foreign issue, but ":'s fault) would make clear what
   was happening with x, y, and z.

Taking a step back to your opening example in this thread:
Your z was not a valid utf-32 sequence. It would represent a valid utf-8 sequence, *if* the programmer treated it as such. When treated as a utf-32 sequence, the result would be invalid.

That's part of the problem. z is a perfectly good, valid utf-32 sequence. When interpreted as a utf-32 sequence, it does not represent the same thing as x does when interpreted as a utf-8 sequence. But there is nothing wrong with aÃ³b, nor with a utf-32 representation of it.
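
A small sketch of the distinction, in Python rather than j. It assumes (this is inferred from the text above, not quoted from the earlier messages) that x held the utf-8 encoding of 'aób' and that z held the same unit values widened to 4-byte characters:

```python
# x: the utf-8 encoding of 'aób' (assumed content of x), four bytes: 61 c3 b3 62
x = "aób".encode("utf-8")

# z: each byte of x taken as a code point in its own right (the 'UCS-1' reading)
z = "".join(chr(b) for b in x)
print(z)                          # aÃ³b

# z round-trips through utf-32 without error, so it is a valid utf-32 sequence...
z_utf32 = z.encode("utf-32-le")
print(z_utf32.decode("utf-32-le") == z)   # True

# ...but it does not denote the same text that x denotes when read as utf-8.
print(x.decode("utf-8") == z)             # False
```

Nothing in z marks it as "really" utf-8; both readings are well-formed, so only the programmer (or a consistent convention in the language) can say which one was meant.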

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
