That could be done, but I fear the result shape would then be inconsistent:
7 u: char atom => atom
7 u: lit2 atom => atom
but 7 u: int atom => atom or list, depending on the range of the int.

In general J expects the shape of results to be consistent, so that
automatic assembly of results and the rank conjunction work.

Having 7 u: int always return a rank-1 result may not be the best choice,
but it is still better than a shape that depends on the range of the int.

4 u: int is different: its argument is always restricted in range, and
within this range an atomic result is guaranteed.
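
To make the range dependence concrete, here is a minimal sketch of UTF-16
encoding for a single code point. It is written in Python rather than J, it
is not the interpreter's implementation, and the function name utf16_units
is hypothetical. It only illustrates why the number of code units depends on
the integer, and why always returning a list keeps the shape uniform.

```python
def utf16_units(code_point: int) -> list:
    """Encode one Unicode code point as a list of UTF-16 code units.

    Illustrative sketch only; the name and error handling are assumptions.
    The result is always a list, of length 1 or 2.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


print(utf16_units(3101))    # [3101]
print(utf16_units(128512))  # [55357, 56832]
```

Both calls return a list, so callers never have to branch on whether they
received one value or a pair. That corresponds to the rank-1 choice for
7 u: int described above.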




On 2 Apr, 2017 9:30 am, "robert therriault" <[email protected]> wrote:

Thanks Bill,

You are right, I was confusing U16 with literal2. Part of the reason for
that is that,

   datatype 7 u: 3101
unicode
   datatype 4 u: 3101
unicode
   datatype u: 3101
unicode

I guess that there is not really a way to distinguish the fact that
7 u: 3101 returns U16 instead of literal2 without inventing a separate J
datatype. It is nice that this allows 7 u: to deal with unicode4 arguments
rather seamlessly.

   datatype 9 u: 128512
unicode4
   7 u: 128512
😀
   datatype 7 u: 128512
unicode
   3 u: 7 u: 128512
55357 56832

But I do wonder if since

   7 u: 3101
ఝ
   {. 7 u: 3101
ఝ
   $ {. 7 u: 3101

   # $ {. 7 u: 3101
0

Could the single non-surrogate U16 act a bit more like the ASCII cases do,
or would that break the U16 by being non-standard?

   7 u: 'a'
a
   $ 7 u: 'a'

   # $ 7 u: 'a'
0

Cheers, bob

> On Apr 1, 2017, at 5:54 PM, bill lam <[email protected]> wrote:
>
> the right argument 30101 is an integer, not literal2.
>
> 7 u: returns utf16 not literal2. utf16 has surrogate pairs so that result
> must be rank-1. utf16 is not a J data type.
>
> 4 u: returns literal2 (a J data type), in which the concept of surrogate
> pairs does not apply. literal2 has atoms.
>
> try 7 u: 128512 to confirm the result is a surrogate pair. Also,
> 9 u: 128512 is a literal4 atom.
>
> pre-j805, 7 u: integer is a domain error, so the behavior of j805 is
> incompatible. there will be a global parameter to restore the domain error
> so that it becomes compatible again. the same applies to 8 u: integer.
>
> Pre-j805 only supports literal2.
> Utf16 was first introduced in j805. Your confusion might come from mixing
> up literal2 and utf16.
>
> On 2 Apr, 2017 12:55 am, "robert therriault" <[email protected]> wrote:
>
>     u: 30101
> 疕
> datatype u: 30101
> unicode
> $ u: 30101
>
> #$ u: 30101
> 0 NB. unicode (literal2) atom as expected
>
> 4 u: 30101
> 疕
> datatype 4 u: 30101
> unicode
> $ 4 u: 30101
>
> #$ 4 u: 30101
> 0 NB. unicode (literal2) atom as expected
>
> 7 u: 30101
> 疕
> datatype 7 u: 30101
> unicode
> $ 7 u: 30101
> 1       NB. unicode (literal2) list of length 1 is unexpected
> #$ 7 u: 30101
> 1 NB. rank 1 is unexpected
>
>
> The dictionary suggests that, with a right argument of literal2, if all
> values are <128 it converts to ASCII, and otherwise leaves them as is. [0]
> I believe that since the argument is > 128 the 'as is' case would apply and
> that no change in shape should occur, but Unicode is a tricky beast and I
> welcome enlightenment.
>
> Cheers, bob
>
> [0] http://www.jsoftware.com/help/dictionary/duco.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
