The section I quoted comes from the u: page, not the ": page.
https://www.jsoftware.com/docs/help807/dictionary/duco.htm
It may be more pure, or not, to implement ": and u: as foreigns, but we
still have to decide how they should behave. And I find the current
behaviour (as well as that of e.g. 1!:1) problematic.
On Sun, 20 Mar 2022, bill lam wrote:
If ": and u: were implemented using the foreign conjunction, J would be more
pure.
The original J dictionary said nothing about Unicode at all. How to handle
Unicode in ": is implementation dependent.
https://www.jsoftware.com/docs/help807/dictionary/d602.htm
On Sun, Mar 20, 2022 at 1:56 PM Raul Miller <[email protected]> wrote:
I think a point has been lost here (partially because of hasty
statements I made, where I was not considering all of the details of
how ": works) on why getting rid of u: would not change anything about
the initial example in this thread:
   #x=: 8 u: 97 243 98
4
   datatype x
literal
   #z=: 10 u: 97 195 179 98
4
   datatype z
unicode4
   datatype x,z
unicode4
   #x,z
8
When displayed, x is rendered as utf-8. This is largely due to
properties of the host environment and the operating system. Here, x
is treated as an array of utf-8 octets.
When we combine x and z into an array, x is not treated as an array of
octets. It is instead treated as a utf-32 sequence, with each octet
becoming one code point. Discarding u: would not change this, because
u: was not involved in that operation.
Most likely, the operation you were looking for was something like
   #x,&":z
10
or
   #x,&(8 u: ]) z
10
Here, we are not treating x as a utf-32 array -- we are instead first
representing z as utf-8.
And, again, discarding u: would not change this aspect of J (except to
cause an error for the x,&(8 u: ]) z example).
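
For anyone who wants to check the counts outside J, here is a rough
Python sketch of the two ways of combining x and z. It is only an
analogy: it models x as raw bytes and z as a list of code points, the
variable names are made up for illustration, and nothing in it reflects
how J is implemented.

```python
# Python analogy only; this is not J and says nothing about J internals.
x = "a\u00f3b".encode("utf-8")   # bytes 97 195 179 98, like 8 u: 97 243 98
z = [97, 195, 179, 98]           # four code points, like 10 u: 97 195 179 98

# Promotion: every byte of x becomes its own code point, as in x,z
promoted = list(x) + z
print(len(promoted))             # 8

# Encode first: represent z as utf-8, then join the bytes, as in x,&":z
z_utf8 = "".join(chr(c) for c in z).encode("utf-8")
joined = x + z_utf8
print(len(z_utf8), len(joined))  # 6 10
```

The code points 195 and 179 each need two bytes in utf-8, which is why
z grows from 4 items to 6 and the encoded total is 10 rather than 8.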
Thanks,
--
Raul
On Sun, Mar 20, 2022 at 1:10 AM Raul Miller <[email protected]> wrote:
>
> On Sat, Mar 19, 2022 at 8:34 PM Elijah Stone <[email protected]> wrote:
> > I think a deprecation period would probably be a good idea.
>
> I think we would need to complete the preceding steps before we
> attempted such a thing.
>
> Deprecation based on something which has not been implemented is bad
> news.
>
> > Per the dictionary:
> >
> > > ": converts literal2 and literal4 to U8 encoded 1-byte char
>
> Yes, I realized that after I hit send on that message.
>
> > Not specified is whether literal2 is interpreted as ucs-2 or utf-16.
> > Experimentally, it is utf-16.
>
> It's my understanding that ucs-2 is a subset of utf-16.
>
> > > ; verb each sequence
> >
> > I don't understand the significance of this.
>
> Generally speaking, when you are working with text, you are working
> with arbitrary length sequences. So, boxing intermediate results and
> razing the boxes is a frequently used idiom.
>
>    ;(# ":)each 1 2 3
> 122333
>
> > > Generally speaking, if you want an unambiguous representation of your
> > > data, you should use something like {{ 5!:5<'y' }} rather than ":
> >
> > I don't need unambiguous. I'll take non-obfuscatory. And, as mentioned,
> > the behaviour of ": here is inconsistent with other primitives.
>
> Every primitive is in some sense "inconsistent" with other primitives,
> because every primitive accomplishes something different.
>
> The ": primitive is about formatting text for display. That is going
> to have to be different from an operation like addition.
>
> > > it is not being displayed correctly.
> >
> > The display seems correct to me.
>
> Ah, that was my browser / email client messing up.
>
> Thanks,
>
> --
> Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm