J considers the literals _128{.a. valid. Those bytes are not valid characters
in UTF-8 on their own, because UTF-8 uses them to encode Unicode values above
127. Representing those characters in UTF-8 requires two literals each. It is
not an oddness in the UTF-8 specification. It is the confusion caused by
treating UTF-8 text as if each literal were a character.

On Thu, Feb 27, 2014 at 7:39 AM, Raul Miller <[email protected]> wrote:
> What you are seeing here is an oddness in the unicode specification.
>
>    a.i.8 u: 7 u: 'þ'
> 195 190
>
> 7 u: gives you utf-16 character representation, and 8 u: gives you
> utf-8 character representation.
>
> It just happens to be the case that the character value in the utf-16
> representation of thorn (þ) is less than 256. But if you do
> not make a careful distinction between "literals" and "characters" you
> can confuse yourself by expecting the wrong thing here.
>
> Thanks,
>
> --
> Raul
>
> On Thu, Feb 27, 2014 at 7:03 AM, Björn Helgason <[email protected]> wrote:
> > There are a lot of strange things happening regarding national
> > characters.
> >
> > þ is within the 256 chars but behaves strangely regarding a.
> >
> >    7 u: 'þ'
> > þ
> >    a. i. 7 u: 'þ'
> > 254
> >    254 { a.
> > �
> >    7 u: 254 { a.
> > |domain error
> > |   7 u:254{a.
> >    3 u: 254 { a.
> > 254
> >    'þ' = 254 { a.
> > 0 0
> >    (7 u: 'þ') = 254 { a.
> > 1
> >
> > -
> > Björn Helgason
> > gsm:6985532
> > skype:gosiminn
> > On 26.2.2014 14:36, "Raul Miller" <[email protected]> wrote:
> >
> >> a. is just 256 literal characters, it is a noun.
> >>
> >> I expect u: might have been what you were thinking about? It's a verb.
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >> On Wed, Feb 26, 2014 at 2:19 AM, Björn Helgason <[email protected]> wrote:
> >> > Actually I want a. back as it was.
> >> >
> >> > Giving me two or three numbers is wrong and is confusing at best.
> >> >
> >> > It should return the digital number for Unicode and only one number
> >> > per char.
> >> >
> >> > a. is the atomic vector and this way the atomic vector has grown to
> >> > include all of Unicode.
> >> >
> >> > -
> >> > Björn Helgason
> >> > gsm:6985532
> >> > skype:gosiminn
> >> > On 25.2.2014 16:10, "Björn Helgason" <[email protected]> wrote:
> >> >
> >> >> a. and especially i. a. - looking up character indexes used to be useful.
> >> >>
> >> >> It is not as easy anymore.
> >> >>
> >> >> The national chars are often not in there with a single number.
> >> >>
> >> >> Sometimes two or three.
> >> >>
> >> >> Reading files is also affected; they sometimes have unicode markings.
> >> >>
> >> >> -
> >> >> Björn Helgason
> >> >> gsm:6985532
> >> >> skype:gosiminn
> >> >> On 25.2.2014 14:03, "Don Guinn" <[email protected]> wrote:
> >> >>
> >> >>> I tried that a while back. I extended the table for ;: to treat the
> >> >>> bytes _128{.a. as letters, so that all multi-byte UTF-8 characters
> >> >>> were treated as alphas. Statements were broken into tokens properly.
> >> >>> But then I found that the interpreter used the top half of a.
> >> >>> internally. I mentioned that in the forum a while back when someone
> >> >>> noticed that some character in there acted weird. Roger said that
> >> >>> could be changed if needed. It might be easy for Roger to change
> >> >>> that, but it didn't look so easy to me.
> >> >>>
> >> >>> I looked at the tables for Unicode (wide characters) and in the form
> >> >>> of UTF-8 and couldn't see any easy way to distinguish the category of
> >> >>> a character. Those that one would consider an alpha were mixed in
> >> >>> with graphics and controls. APL characters were not grouped together
> >> >>> but scattered all over the place.
> >> >>>
> >> >>> Trying it out to see how it would work shouldn't be too difficult,
> >> >>> but there are a lot of questions to answer before making it a
> >> >>> production tool.
> >> >>>
> >> >>> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <[email protected]> wrote:
> >> >>>
> >> >>> > This seems simpler. The first thing to do is build a prototype
> >> >>> > implementation, and then we can see what other problems are out
> >> >>> > there.
> >> >>> >
> >> >>> > Mon, 24 Feb 2014, Don Guinn wrote:
> >> >>> > > A middle ground might be to allow some Unicode (UTF-8) characters
> >> >>> > > to be considered letters like a-z,A-Z. Then one could use APL iota
> >> >>> > > as a name, assigning it something like i. . In addition, it would
> >> >>> > > allow non-English languages not to be restricted to ASCII
> >> >>> > > characters for names. Greek letters in mathematics could be used
> >> >>> > > as names, making statements look a little more like traditional
> >> >>> > > mathematics. It would be simpler to allow all Unicode characters
> >> >>> > > to be considered letters, but that might lead to other problems.
> >> >>> > > ----------------------------------------------------------------------
> >> >>> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >> >>> >
> >> >>> > --
> >> >>> > regards,
> >> >>> > ====================================================
> >> >>> > GPG key 1024D/4434BAB3 2008-08-24
> >> >>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >> >>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
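For illustration, the two bytes in Raul's 195 190 result can be derived by
hand from the UTF-8 rule (RFC 3629) for code points 128 through 2047: such a
code point c is stored as a lead byte of the form 110xxxxx holding its high 5
bits, followed by a continuation byte of the form 10xxxxxx holding its low 6
bits. Worked for thorn (U+00FE, code point 254):

```latex
\begin{aligned}
c &= 254 = 3 \cdot 64 + 62 \\
\text{lead byte} &= 192 + \lfloor c / 64 \rfloor = 192 + 3 = 195 = \texttt{110}\,\texttt{00011}_2 \\
\text{continuation byte} &= 128 + (c \bmod 64) = 128 + 62 = 190 = \texttt{10}\,\texttt{111110}_2
\end{aligned}
```

The same arithmetic is why the bytes in _128{.a. need two literals each: every
code point from 128 to 255 becomes a lead byte of 194 or 195 followed by a
continuation byte between 128 and 191, so none of those 128 bytes is a complete
UTF-8 character by itself.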
