Re: [Jbeta] U8 and unicode

Don Guinn Wed, 29 May 2013 10:30:32 -0700

I don't think so. But maybe. The thing is that everything in J assumes that
literal (char) is U8, with two exceptions: the monadic u: default for char,
and concatenation of literal with unicode (wchar). Just make everything
assume that literal is U8. I can't think of a case where one would want it
otherwise if the char were really text. If there is a case where one wants
to copy each byte into the low byte of a wchar and zero the high byte, apply
2&u: . Does anybody now use _128{.a. characters for anything other than U8?
If not, such a change should not affect anyone. Char data that contains no
_128{.a. characters would not be affected.
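
A minimal sketch of the difference between the two conversions, assuming u:
behaves as described in this thread. Building the U8 literal with 8 u: is
only for illustration; the numbers agree with the session quoted further
down.

```j
   s =. 8 u: u: 16b2211     NB. U8 literal for the summation sign (3 bytes)
   3!:0 s                   NB. 2, i.e. literal (char)
   3 u: 2 u: s              NB. each byte zero-extended: 226 136 145
   3 u: 7 u: s              NB. decoded as U8: 8721
```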


Right now, if an application has both wchar and U8, care must be taken to
run any char data that might contain U8 through 7&u: before concatenating it
to wchar. An optimization might also convert wchar to char unbeknownst to the
programmer. Probably not now, but who knows in the future.

If I combine an integer with a real, I expect the integer to be converted to
real before combining. I do not have to convert the integer myself. Why not
have concatenation of char and wchar work the same way? As I showed,
z,":z gave unexpected results because the result of ":z is U8.
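
A sketch of the analogy, assuming z is defined as in the quoted session
below. The numeric lines show the promotion J already does; the last two
contrast what concatenation does today with what a U8-aware promotion
would give.

```j
   3!:0 ] 1 2               NB. 4, integer
   3!:0 ] 1 2 , 2.5         NB. 8, the integers were promoted to floating point
   z =. u: 16b2211
   3!:0 z , ":z             NB. 131072, wchar, but the bytes of ":z were only zero-extended
   3 u: z , 7 u: ":z        NB. 8721 8721, the result one would expect
```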

Before Unicode, _128{.a. was needed for non-ASCII characters. Not any more.
Do away with the idea that literal and U8 are different.


On Wed, May 29, 2013 at 10:11 AM, bill lam wrote:

> I do not quite understand your suggestion. Did you mean to change
> the semantics of u: ?  I hope not, since it would break existing
> code, and I consider the existing behaviour OK.
>
> 7&u: does not always result in a wide character, whereas monad u:
> always does.
>
> Wed, 29 May 2013, Don Guinn wrote:
> > The default conversion of literal to unicode (char to wchar) does not work
> > as one would expect. Simply setting the high-order byte to zero is like
> > converting integer to floating point by simply copying the bits as is.
> >
> > J favors representing Unicode as U8, not unicode (wchar). Not a problem.
> > And normally, if a literal contains characters in the range _128{.a., they
> > represent U8 characters. Why not make the default conversion of literal to
> > unicode treat the literal as if it might be U8? Make the default of monadic
> > u: be 7&u: when applied to a literal, and when concatenating literal and
> > unicode, assume that the literal is U8.
> >
> > That monadic u: defaults to 2&u: is not really a problem, but the
> > concatenation can easily result in errors.
> >
> >    z=.u:16b2211
> >    3!:0 z
> > 131072
> >    3!:0 ":z
> > 2
> >    z,":z
> > ∑âˆ‘
> >    3 u: z,":z
> > 8721 226 136 145
> >    3 u: z,7 u: ":z
> > 8721 8721
> >
> >
> > It seems to me that the last statement is what one would expect for
> > concatenation.
> >
> >
> > J8 is soon to be released. Although the way J handles U8 and unicode has
> > been around quite a while, this might be a good time to change the
> > defaults, if you agree that the change would be good.
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3