That may be done, but I fear the shape will be inconsistent:

   7 u: char atom => atom
   7 u: lit2 atom => atom
but
   7 u: int atom  => atom or list, depending on the range of the int.
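To illustrate the range dependence mentioned above, here is a minimal sketch of the standard UTF-16 encoding rule in plain J arithmetic. It does not call u: itself, and the name utf16units is hypothetical, introduced only for this illustration. Code points below 65536 fit in one 16-bit unit, while larger ones need a surrogate pair, so an atomic integer argument cannot always give an atomic UTF-16 result.

```j
NB. Hypothetical helper (not part of J): UTF-16 code units for one code point y.
NB. Assumes y is a valid Unicode scalar value (not itself in 55296..57343).
utf16units =: 3 : 0
if. y < 65536 do.
  , y                                        NB. one unit, returned as a 1-item list
else.
  v =. y - 65536                             NB. 20-bit offset above the 16-bit range
  (55296 + <. v % 1024) , 56320 + 1024 | v   NB. high surrogate , low surrogate
end.
)

utf16units 3101      NB. one unit: a 1-item list holding 3101
utf16units 128512    NB. two units: 55357 56832, matching 3 u: 7 u: 128512 in the quoted session
```

Returning a list in both branches keeps the result at rank 1 whatever the argument, which mirrors the rank-1 choice discussed in this thread.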
In general, J expects the shape of a result to be consistent, so that automatic assembly of results and the rank conjunction work. Making the result of 7 u: int always rank-1 may not be the best choice, but it is still better than a shape that depends on the range of the int. 4 u: int is different: its argument is always restricted in range, and within that range an atomic result is guaranteed.

On 2 Apr, 2017 9:30 am, "robert therriault" <[email protected]> wrote:

Thanks Bill,

You are right, I was confusing U16 with literal2. Part of the reason for that is that

   datatype 7 u: 3101
unicode
   datatype 4 u: 3101
unicode
   datatype u: 3101
unicode

I guess that there is not really a way to distinguish the fact that 7 u: 3101 returns U16 instead of literal2 without inventing a separate J datatype. It is nice that this allows the 7 u: to deal with unicode4 arguments rather seamlessly.

   datatype 9 u: 128512
unicode4
   7 u: 128512
😀
   datatype 7 u: 128512
unicode
   3 u: 7 u: 128512
55357 56832

But I do wonder, since

   7 u: 3101
ఝ
   {. 7 u: 3101
ఝ
   $ {. 7 u: 3101

   # $ {. 7 u: 3101
0

could the single non-surrogate U16 act a bit more like the ASCII cases do, or would that break the U16 by being non-standard?

   7 u: 'a'
a
   $ 7 u: 'a'

   # $ 7 u: 'a'
0

Cheers, bob

> On Apr 1, 2017, at 5:54 PM, bill lam <[email protected]> wrote:
>
> the right argument 30101 is an integer, not literal2.
>
> 7 u: returns utf16 not literal2. utf16 has surrogate pairs so that result
> must be rank-1. utf16 is not a J data type.
>
> 4 u: returns literal2 (a J data type) in which the concept of surrogate
> pairs does not apply. literal2 has atom.
>
> try 7 u: 128512 to confirm the result is a surrogate pair. Also 9 u: 128512
> is a literal4 atom.
>
> pre-j805, 7 u: integer is a domain error, behavior of j805 is incompatible.
> there will be a global parameter to restore the domain error so that it
> becomes compatible again. the same applies to 8 u: integer.
>
> Pre-j805 only supports literal2.
> Utf16 was first introduced in j805. Your confusion might come from mixing
> up literal2 and utf16.
>
> On 2 Apr, 2017 12:55 am, "robert therriault" <[email protected]> wrote:
>
>    u: 30101
> 疕
>    datatype u: 30101
> unicode
>    $ u: 30101
>
>    #$ u: 30101
> 0 NB. unicode (literal2) atom as expected
>
>    4 u: 30101
> 疕
>    datatype 4 u: 30101
> unicode
>    $ 4 u: 30101
>
>    #$ 4 u: 30101
> 0 NB. unicode (literal2) atom as expected
>
>    7 u: 30101
> 疕
>    datatype 7 u: 30101
> unicode
>    $ 7 u: 30101
> 1 NB. unicode (literal2) list of length 1 is unexpected
>    #$ 7 u: 30101
> 1 NB. rank 1 is unexpected
>
>
> The dictionary suggests that with a right argument of literal2, then if all
> values <128, convert to ASCII, otherwise as is. [0]
> I believe that since the argument is > 128 the 'as is' case would apply and
> that no change in shape should occur, but Unicode is a tricky beast and I
> welcome enlightenment.
>
> Cheers, bob
>
> [0] http://www.jsoftware.com/help/dictionary/duco.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
