I think a point has been lost here (partly because of hasty statements I made, where I was not considering all of the details of how ": works), namely why getting rid of u: would not change anything about the initial example in this thread:

   #x=: 8 u: 97 243 98
4
   datatype x
literal
   #z=: 10 u: 97 195 179 98
4
   datatype z
unicode4
   datatype x,z
unicode4
   #x,z
8

When displayed, x is displayed as utf-8. This is largely due to
properties of the host environment and the operating system. Here, x
is treated as an array of unicode octets.

When we combine x and z into an array, x is not treated as an array of
octets. It is, instead, treated as a utf-32 sequence.

Discarding u: would not change this, because u: was not involved in
that operation.

Most likely, the operation you were looking for was something like

   #x,&":z
10

or

   #x,&(8 u: ]) z
10

Here, we are not treating x as a utf-32 array -- we are instead first
representing z as utf-8.

And, again, discarding u: would not change this aspect of J (except to
cause an error for the x,&(8 u: ]) z example).

Thanks,

--
Raul

On Sun, Mar 20, 2022 at 1:10 AM Raul Miller <[email protected]> wrote:
>
> On Sat, Mar 19, 2022 at 8:34 PM Elijah Stone <[email protected]> wrote:
> > I think a deprecation period would probably be a good idea.
>
> I think we would need to complete the preceding steps before we
> attempted such a thing.
>
> Deprecation based on something which has not been implemented is bad news.
>
> > Per the dictionary:
> >
> > > ": converts literal2 and literal4 to U8 encoded 1-byte char
>
> Yes, I realized that after I hit send on that message.
>
> > Not specified is whether literal2 is interpreted as ucs-2 or utf-16.
> > Experimentally, it is utf-16.
>
> It's my understanding that ucs-2 is a subset of utf-16.
>
> > > ; verb each sequence
> >
> > I don't understand the significance of this.
>
> Generally speaking, when you are working with text, you are working
> with arbitrary length sequences. So, boxing intermediate results and
> razing the boxes is a frequently used idiom.
>
>    ;(# ":)each 1 2 3
> 122333
>
> > > Generally speaking, if you want an unambiguous representation of your
> > > data, you should use something like {{ 5!:5<'y' }} rather than ":
> >
> > I don't need unambiguous. I'll take non-obfuscatory. And, as mentioned,
> > the behaviour of ": here is inconsistent with other primitives.
>
> Every primitive is in some sense "inconsistent" with other primitives,
> because every primitive accomplishes something different.
>
> The ": primitive is about formatting text for display. That is going
> to have to be different from an operation like addition.
>
> > > it is not being displayed correctly.
> >
> > The display seems correct to me.
>
> Ah, that was my browser / email client messing up.
>
> Thanks,
>
> --
> Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

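For readers who want to check the counts in this message outside of J, here is a rough sketch in Python (not J). It only mirrors the arithmetic of the example: the names x, z, widened, z_utf8 and joined are illustrative, and the claim that J's x,z widens each octet of x to one code point is taken from the session results in this message rather than from the interpreter source.

```python
# x: UTF-8 octets for code points 97 243 98 (like  8 u: 97 243 98  in the message)
x = "".join(map(chr, [97, 243, 98])).encode("utf-8")
print(list(x), len(x))            # [97, 195, 179, 98] 4

# z: four code points (like  10 u: 97 195 179 98 )
z = [97, 195, 179, 98]

# Analogue of  x,z : every octet of x is widened to its own code point,
# so the result has 4 + 4 = 8 items and no UTF-8 decoding takes place.
widened = list(x) + z
print(len(widened))               # 8

# Analogue of  x,&(8 u: ]) z : encode z as UTF-8 first, then join octets.
# Code points 195 and 179 each need two octets, so z becomes 6 octets.
z_utf8 = "".join(map(chr, z)).encode("utf-8")
joined = x + z_utf8
print(list(z_utf8), len(joined))  # [97, 195, 131, 194, 179, 98] 10
```

The 8 versus 10 difference comes entirely from whether the second argument is re-encoded as UTF-8 before joining, which matches the point that u: plays no part in x,z itself.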