Thank you.
I specifically object to this:
:: Therefore I
:: suggest an alternate solution, at least for the interim: foreigns (scary
:: and obscure, per above) that will _intentionally misinterpret_ data from
:: the outside world as 'UCS-1' and represent it compactly (or do the
:: opposite).
I think that this statement is too broad. As I think you previously
pointed out, interpretation is the domain of the programmer, not the
language.
I did not mean to imply that it was _solely_ the domain of the programmer.
It is a delicate balancing act. J places more of the burden on the
programmer than many other programming languages, and I do not think that
is a bad thing.
But there are limits. I do not need to carefully track which words are
integers and which are floats. I can imagine a version of J in which this
were the case; in which, for instance, + would interpret its arguments as
integers and add them, and +. would interpret its arguments as floats and
add them, and you would have to be very careful to always choose the
correct addition routine.
That language would be more expressive than the one we have, but I think
it would be easy to write buggy code in it, and it would not be much fun
to use.
Having to manually track how characters are encoded seems not too
different from having to manually track how numbers are encoded.
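To make the hypothetical concrete, here is a rough sketch in Python rather
than J. The helper names (add_as_int, add_as_float and so on) are made up
for illustration, and the 64-bit word size is an assumption. It shows the
same two words giving different answers depending on which addition
routine the programmer picks.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def float_to_bits(value: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def add_as_int(a_bits: int, b_bits: int) -> int:
    """Hypothetical '+': treat both words as integers (wrapping at 64 bits)."""
    return (a_bits + b_bits) & MASK64

def add_as_float(a_bits: int, b_bits: int) -> int:
    """Hypothetical '+.': treat both words as doubles, return the sum's bits."""
    return float_to_bits(bits_to_float(a_bits) + bits_to_float(b_bits))

one = float_to_bits(1.0)   # 0x3FF0000000000000
two = float_to_bits(2.0)   # 0x4000000000000000

right = add_as_float(one, two)   # the bits of 3.0
wrong = add_as_int(one, two)     # 0x7FF0000000000000, the bits of +infinity
```

Nothing flags the wrong choice: both routines accept the same words and
both return a well-formed word.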
And it is possible to write a software floating-point implementation in J,
in effect interpreting integers as floating-point numbers. That is
essentially what I propose to permit, but as a performance hack rather
than for any desirable semantic properties.
I also object to any proposal to retract language primitives where we do
not have demonstrated working replacements whose merits we are judging.
(I could go into more detail on how I think these issues should be
approached, if you like.)
Please do.
Specifically, removing u: does not seem like it would solve any of the
problems you highlighted with your examples involving x, y and z. (I
think bill lam's post --
http://www.jsoftware.com/pipermail/programming/2022-March/060355.html
-- gives a succinct description of the *language* (as opposed to
foreigns or the operating system).)
I agree that removing u: is not sufficient. But:
1. I do think that removing u: is necessary.
2. Insofar as we are considering only language issues (rather than
issues of interoperability), I do not see why it is desirable to
promote as primary routines for translating between encodings.
3. The language is not consistent; ": treats text as if it were
utf-encoded, but promotion (} , " etc.) treats text as if it were
ucs-encoded.
4. I think that issues of interoperability _are_ important, and still
worthy of discussion. For instance, more consistent display (and this
is arguably not a foreign issue, but ":'s fault) would make clear what
was happening with x, y, and z.
Taking a step back to your opening example in this thread:
Your z was not a valid utf-32 sequence. It would represent a valid utf-8
sequence, *if* the programmer treated it as such. When treated as a
utf-32 sequence, the result would be invalid.
That's part of the problem. z is a perfectly good, valid utf-32 sequence.
When interpreted as a utf-32 sequence, it does not represent the same
thing as x does when interpreted as a utf-8 sequence. But there is
nothing wrong with aób, nor with a utf-32 representation of it.
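
A small sketch of the distinction, in Python rather than J. It assumes
that x is the utf-8 byte sequence from the earlier example and that z
holds those same byte values, each widened to one 32-bit code point; the
variable names mirror the thread, but the construction is my assumption.

```python
# Assumption: x is the UTF-8 encoding of a three-character string,
# and z is built by widening each byte of x to its own code point.
x = "a\u00f3b".encode("utf-8")        # 4 bytes: 61 c3 b3 62
z = "".join(chr(b) for b in x)        # 4 code points: U+0061 U+00C3 U+00B3 U+0062

# Every element of z is a valid Unicode scalar value, so z round-trips
# through UTF-32 without error.
utf32 = z.encode("utf-32-le")
round_trip = utf32.decode("utf-32-le")

# x read as UTF-8 is 3 characters; z read as UTF-32 is 4 different ones.
x_text = x.decode("utf-8")
same_text = (x_text == z)             # False
```

Both interpretations succeed; they just yield different text.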