J considers the literals _128{.a. to be valid characters. Those bytes are not
valid on their own in UTF-8, because UTF-8 uses them only inside multi-byte
sequences that represent Unicode values above 127. Representing those
characters (values 128-255) in UTF-8 requires two literals each. This is not
an oddness in the UTF-8 specification; it is the confusion caused by treating
UTF-8 as literal.
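To make the byte arithmetic concrete, here is a small sketch in Python rather than J. The helper names `utf8_two_byte` and `is_valid_utf8` are hypothetical, written only for illustration. The first encodes a code point in the two-byte range by hand and is checked against Python's own UTF-8 codec. For thorn (U+00FE, value 254) it yields 195 190, the same pair Raul gets below from `a.i.8 u: 7 u: 'þ'`. A strict UTF-8 decoder rejects the lone byte 254 and also a lone continuation byte such as 190.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (hypothetical helper)."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers only U+0080..U+07FF")
    lead = 0xC0 | (cp >> 6)     # 110xxxxx: the top 5 bits of the code point
    cont = 0x80 | (cp & 0x3F)   # 10xxxxxx: the low 6 bits of the code point
    return bytes([lead, cont])

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

thorn = ord("þ")                                    # 254, i.e. U+00FE
print(list(utf8_two_byte(thorn)))                   # [195, 190]
print(utf8_two_byte(thorn) == "þ".encode("utf-8"))  # True
print(is_valid_utf8(bytes([254])))                  # False: 0xFE is never valid UTF-8
print(is_valid_utf8(bytes([190])))                  # False: continuation byte with no lead byte
print(is_valid_utf8(bytes([195, 190])))             # True
```

Any byte from 128 to 255 can only appear as part of such a sequence, which is why a single element of _128{.a. is not a character in UTF-8.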


On Thu, Feb 27, 2014 at 7:39 AM, Raul Miller <[email protected]> wrote:

> What you are seeing here is an oddness in the unicode specification.
>
>    a.i.8 u: 7 u: 'þ'
> 195 190
>
> 7 u: gives you the UTF-16 character representation, and 8 u: gives you
> the UTF-8 character representation.
>
> It just happens that the UTF-16 value of thorn (þ) is less than 256.
> But if you do not make a careful distinction between "literals" and
> "characters", you can confuse yourself by expecting the wrong thing
> here.
>
> Thanks,
>
> --
> Raul
>
> On Thu, Feb 27, 2014 at 7:03 AM, Björn Helgason <[email protected]> wrote:
> > There are a lot of strange things happening regarding national
> > characters.
> >
> > þ is within the 256 chars but behaves strangely regarding a.
> >
> >    7 u: 'þ'
> > þ
> >    a. i. 7 u: 'þ'
> > 254
> >    254 { a.
> > �
> >    7 u: 254 { a.
> > |domain error
> > |   7     u:254{a.
> >    3 u: 254 { a.
> > 254
> >    'þ' = 254 { a.
> > 0 0
> >    (7 u: 'þ') = 254 { a.
> > 1
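Assuming the script in this session is stored as UTF-8 (which matches Raul's `195 190` result), the `0 0` above follows from 'þ' being two literals, each compared with the single byte 254. Decoding first, roughly what 7 u: does, leaves one character whose value is 254. The Python analogy below is not J, and the variable names are illustrative:

```python
# 'þ' typed into a UTF-8 source is two bytes, i.e. two J literals.
literal_bytes = list("þ".encode("utf-8"))       # [195, 190]
print([int(b == 254) for b in literal_bytes])   # [0, 0], like 'þ' = 254 { a.

# Decoding the bytes to one character first gives a single value.
code_point = ord(bytes(literal_bytes).decode("utf-8"))
print(int(code_point == 254))                   # 1, like (7 u: 'þ') = 254 { a.
```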
> >
> >
> > -
> > Björn Helgason
> > gsm:6985532
> > skype:gosiminn
> > On 26.2.2014 14:36, "Raul Miller" <[email protected]> wrote:
> >
> >> a. is just 256 literal characters; it is a noun.
> >>
> >> I expect u: might have been what you were thinking about? It's a verb.
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >> On Wed, Feb 26, 2014 at 2:19 AM, Björn Helgason <[email protected]> wrote:
> >> > Actually I want a. back as it was.
> >> >
> >> > Giving me two or three numbers is wrong and is confusing at best.
> >> >
> >> > It should return the Unicode number, and only one number per
> >> > character.
> >> >
> >> > a. is the atomic vector, and this way the atomic vector has grown to
> >> > include all of Unicode.
> >> >
> >> > On 25.2.2014 16:10, "Björn Helgason" <[email protected]> wrote:
> >> >
> >> >> a. and especially i. a. (looking up character indexes) used to be
> >> >> useful.
> >> >>
> >> >> It is not as easy anymore.
> >> >>
> >> >> The national characters are often not in there as a single number.
> >> >>
> >> >> Sometimes they take two or three.
> >> >>
> >> >> Files that are read also sometimes come with Unicode markings.
> >> >>
> >> >> On 25.2.2014 14:03, "Don Guinn" <[email protected]> wrote:
> >> >>
> >> >>> I tried that a while back. I extended the table for ;: to treat the
> >> >>> bytes _128{.a. as letters, which made all multi-byte UTF-8 characters
> >> >>> be treated as alphas. Statements were broken into tokens properly. But
> >> >>> then I found that the interpreter used the top half of a. internally. I
> >> >>> mentioned that in the forum a while back, when someone noticed that
> >> >>> some character in there acted weird. Roger said that could be changed
> >> >>> if needed. It might be easy for Roger to change, but it didn't look so
> >> >>> easy to me.
> >> >>>
> >> >>> I looked at the tables for Unicode (wide characters) and in the form of
> >> >>> UTF-8 and couldn't see any easy way to distinguish the category of a
> >> >>> character. Those that one would consider an alpha were mixed in with
> >> >>> graphics and controls. APL characters were not grouped together but
> >> >>> scattered all over the place.
> >> >>>
> >> >>> Trying it out to see how it would work shouldn't be too difficult, but
> >> >>> there are a lot of questions to answer before making it a production
> >> >>> tool.
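Don's experiment can be illustrated outside the interpreter. The Python sketch below is not J's ;: state table; `byte_class` and `scan_names` are hypothetical and much simpler than the real tokenizer. As in Don's change, every byte in _128{.a. (values 128-255) is classified as a letter, so a multi-byte UTF-8 sequence stays inside one name instead of being split.

```python
def byte_class(b: int) -> str:
    """Classify one byte; bytes 128..255 count as letters (the assumption being tested)."""
    if b >= 128 or chr(b).isalpha() or b == ord("_"):
        return "alpha"
    if chr(b).isdigit():
        return "digit"
    if chr(b).isspace():
        return "space"
    return "other"

def scan_names(src: str) -> list[str]:
    """Return runs of alpha/digit bytes that begin with an alpha byte (hypothetical scanner)."""
    data = src.encode("utf-8")
    names, start = [], None
    for i, b in enumerate(data + b" "):   # the trailing space flushes the last run
        cls = byte_class(b)
        if start is None:
            if cls == "alpha":
                start = i                 # a name starts at an alpha byte
        elif cls not in ("alpha", "digit"):
            names.append(data[start:i].decode("utf-8"))
            start = None
    return names

print(scan_names("þorn =: αβ + x1"))   # ['þorn', 'αβ', 'x1']
```

Because every byte of a multi-byte sequence is 128 or above, the whole sequence lands in one run, and the run always decodes back to whole characters.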
> >> >>>
> >> >>> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <[email protected]> wrote:
> >> >>>
> >> >>> > This seems simpler. The first thing to do is build a prototype
> >> >>> > implementation, and then we can see what other problems are out
> >> >>> > there.
> >> >>> >
> >> >>> > Mon, 24 Feb 2014, Don Guinn wrote:
> >> >>> > > A middle ground might be to allow some Unicode (UTF-8) characters to
> >> >>> > > be considered letters like a-z,A-Z. Then one could use APL iota as a
> >> >>> > > name for something like i. . In addition, it would allow non-English
> >> >>> > > languages not to be restricted to ASCII characters for names. Greek
> >> >>> > > letters in mathematics could be used as names, making statements
> >> >>> > > look a little more like traditional mathematics. It would be simpler
> >> >>> > > to allow all Unicode characters to be considered letters, but that
> >> >>> > > might lead to other problems.
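One way to decide which characters count as letters, without relying on code point ranges (which Don found mixed together), is the Unicode General Category property. The Python sketch below uses the standard `unicodedata` module. The naming rule itself (a letter first, then letters, decimal digits or `_`) and the helpers `is_letter` and `is_name` are hypothetical assumptions for illustration, not J's actual naming rule.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    """True for any Unicode letter (General Category Lu, Ll, Lt, Lm or Lo)."""
    return unicodedata.category(ch).startswith("L")

def is_name(s: str) -> bool:
    """Hypothetical rule: a letter first, then letters, decimal digits (Nd) or '_'."""
    if not s or not is_letter(s[0]):
        return False
    return all(is_letter(c) or unicodedata.category(c) == "Nd" or c == "_"
               for c in s[1:])

for candidate in ["þorn", "αβ", "x_1", "1x", "⍳"]:
    # Show the verdict and the category of the first character.
    print(candidate, is_name(candidate), unicodedata.category(candidate[0]))
```

Under this rule Greek and Icelandic letters qualify, but APL iota (⍳, U+2373) is in category So (Symbol, other), so a letters-only rule would not by itself let iota be a name; it would have to be added explicitly.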
> >> >>> > >
> >> >>> > > ----------------------------------------------------------------------
> >> >>> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >> >>> >
> >> >>> > --
> >> >>> > regards,
> >> >>> > ====================================================
> >> >>> > GPG key 1024D/4434BAB3 2008-08-24
> >> >>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >> >>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> >> >>> >