Re: [Jprogramming] RFC: unicode

Elijah Stone Sat, 19 Mar 2022 23:37:28 -0700

The section I quoted comes from the u: page, not the ": page.
https://www.jsoftware.com/docs/help807/dictionary/duco.htm

It may be more pure, or not, to implement ": and u: as foreigns, but we still have to decide how they should behave. And I find the current behaviour (as well as that of e.g. 1!:1) problematic.


On Sun, 20 Mar 2022, bill lam wrote:

if ": and u: were implemented using foreign conjunction, J would be more
pure.
The original J dictionary said nothing about unicode at all. How to handle
unicode in ": is implementation dependent.

https://www.jsoftware.com/docs/help807/dictionary/d602.htm

On Sun, Mar 20, 2022 at 1:56 PM Raul Miller <[email protected]> wrote:

I think a point has been lost here (partially because of hasty
statements I made, where I was not considering all of the details of
how ": works) on why getting rid of u: would not change anything about
the initial example in this thread:

   #x=: 8 u: 97 243 98
4
   datatype x
literal
   #z=: 10 u:  97 195 179 98
4
   datatype z
unicode4
   datatype x,z
unicode4
   #x,z
8

When displayed, x is displayed as utf-8. This is largely due to
properties of the host environment and the operating system. Here, x
is treated as an array of unicode octets.

When we combine x and z into an array, x is not treated as an array of
octets. It is, instead, treated as a utf-32 sequence. Discarding u:
would not change this, because u: was not involved in that operation.
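
To make the widening concrete, here is a rough Python model of what x,z does. It is Python rather than J, purely as an illustration; the names x_bytes, z_points and widen_concat are made up for this sketch, and it assumes each literal byte is promoted to the code point with the same numeric value, which is what the session above suggests.

```python
# x: the four UTF-8 bytes of "aób", i.e. 97 195 179 98 (J: 8 u: 97 243 98)
x_bytes = "a\u00f3b".encode("utf-8")
# z: four code points with the same numeric values (J: 10 u: 97 195 179 98)
z_points = [97, 195, 179, 98]

def widen_concat(byte_seq, points):
    """Model of x,z: each byte is widened as-is, with no UTF-8 decoding."""
    return [b for b in byte_seq] + list(points)

combined = widen_concat(x_bytes, z_points)
combined_text = "".join(chr(p) for p in combined)
print(len(combined))   # 8
print(combined_text)   # aÃ³baÃ³b
```

Under that assumption the two-byte encoding of ó is never reassembled; it stays as the two separate code points 195 and 179.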

Most likely, the operation you were looking for was something like

   #x,&":z
10

or

   #x,&(8 u: ]) z
10

Here, we are not treating x as a utf-32 array -- we are instead first
representing z as utf-8.

And, again, discarding u: would not change this aspect of J (except to
cause an error for the x,&(8 u: ]) z example).
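
The same arithmetic can be checked outside J. The following is a small Python sketch, not J, of the "encode z as utf-8 first" approach; to_utf8, x_bytes and z_points are hypothetical names used only for this illustration.

```python
# x: the UTF-8 bytes 97 195 179 98 (what 8 u: 97 243 98 produces)
x_bytes = "a\u00f3b".encode("utf-8")
# z: the code points 97 195 179 98 (what 10 u: 97 195 179 98 holds)
z_points = [97, 195, 179, 98]

def to_utf8(points):
    """Encode a list of code points as UTF-8 bytes (the role of 8 u: ] above)."""
    return "".join(chr(p) for p in points).encode("utf-8")

joined = x_bytes + to_utf8(z_points)
print(len(to_utf8(z_points)))   # 6: code points 195 and 179 each take two bytes
print(len(joined))              # 10
print(joined.decode("utf-8"))   # aóbaÃ³b
```

The count of 10 matches the J results above: 4 bytes from x plus 6 bytes for z once it is encoded.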

Thanks,

--
Raul

On Sun, Mar 20, 2022 at 1:10 AM Raul Miller <[email protected]> wrote:
>
> On Sat, Mar 19, 2022 at 8:34 PM Elijah Stone <[email protected]> wrote:
> > I think a deprecation period would probably be a good idea.
>
> I think we would need to complete the preceding steps before we
> attempted such a thing.
>
> Deprecation based on something which has not been implemented is bad
> news.
>
> > Per the dictionary:
> >
> > > ": converts literal2 and literal4 to U8 encoded 1-byte char
>
> Yes, I realized that after I hit send on that message.
>
> > Not specified is whether literal2 is interpreted as ucs-2 or utf-16.
> > Experimentally, it is utf-16.
>
> It's my understanding that ucs-2 is a subset of utf-16.
>
> > >   ; verb each sequence
> >
> > I don't understand the significance of this.
>
> Generally speaking, when you are working with text, you are working
> with arbitrary length sequences. So, boxing intermediate results and
> razing the boxes is a frequently used idiom.
>
>    ;(# ":)each 1 2 3
> 122333
>
> > > Generally speaking, if you want an unambiguous representation of your
> > > data, you should use something like {{ 5!:5<'y' }} rather than ":
> >
> > I don't need unambiguous.  I'll take non-obfuscatory.  And, as
> > mentioned, the behaviour of ": here is inconsistent with other
> > primitives.
>
> Every primitive is in some sense "inconsistent" with other primitives,
> because every primitive accomplishes something different.
>
> The ": primitive is about formatting text for display. That is going
> to have to be different from an operation like addition.
>
> > > it is not being displayed correctly.
> >
> > The display seems correct to me.
>
> Ah, that was my browser / email client messing up.
>
> Thanks,
>
> --
> Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
