There are a variety of different kinds of Unicode "characters":
http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters

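To make those "kinds" concrete: every Unicode code point carries a
two-letter "general category" (Lu/Ll/Lo for letters, Nd for decimal
digits, Zs for spaces, Sm/So for symbols, and so on). The following is
only a rough sketch, in Python rather than J, of folding those
categories into four coarse classes. The name char_class and the
particular folding are assumptions made for illustration; nothing here
is what the J interpreter does.

```python
import unicodedata


def char_class(ch):
    """Fold the Unicode general category of one character into a coarse class.

    Hypothetical helper: returns 'number', 'letter', 'space' or 'other'.
    """
    cat = unicodedata.category(ch)  # e.g. 'Ll', 'Nd', 'Zs', 'Sm', 'So'
    if cat == "Nd":                 # decimal digits
        return "number"
    if cat.startswith("L"):         # any letter category (Lu, Ll, Lt, Lm, Lo)
        return "letter"
    if cat == "Zs" or ch in "\t\n\r":
        return "space"
    return "other"                  # symbols, punctuation, controls, ...


if __name__ == "__main__":
    for ch in "a7 ←⍵ö中":
        print(repr(ch), unicodedata.category(ch), char_class(ch))
```

With this folding, accented and Chinese characters come out as letters,
while the APL glyphs land in "other", which is roughly the split the
rest of this thread argues about.
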
Looking at the ;: monad, the general classes of characters which one
would expect ;: to recognize are:

   Numbers
   Letters
   Spaces
   Other (and "spelling error")

So the first step to implementing Unicode support would be to
implement a ;: workalike (in user space) which classifies Unicode
characters into the above groups (use the "spelling error" category for
anything that is too messy to deal with - errors like that can go away
in later implementations).

If this model were then incorporated into the interpreter you'd have a
fair bit more work to do. But that also could be modeled in user space.

Thanks,

--
Raul

On Wed, Feb 26, 2014 at 11:39 PM, bill lam <[email protected]> wrote:
> If we simply allow Unicode characters in names, we can assign
> ASCII primitives to APL-like names, but that will require space
> around each of those names, so that instead of writing
>
> life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}
>
> we need to write
>
> life ← {↑ 1 ⍵ ∨ . ∧ 3 4=+/,¯1 0 1 ∘ .⊖ ¯1 0 1 ∘ . ⌽ ⊂ ⍵}
>
> IMO this is painful to write. Instead, we could modify monad ;: to let
> each Unicode character above code point 127 (U+007F) parse as a token
> of its own.
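
The rule just quoted (every character above code point 127 is a token
of its own) can be modeled outside the interpreter. What follows is a
minimal sketch in Python, not J, and not a reimplementation of ;: . The
function name tokenize is hypothetical, and the ASCII rules are
simplified assumptions: names are letter runs, adjacent numbers merge
into one numeric list, and a trailing . or : is glued onto an ASCII
word. Quoted strings and NB. comments are not handled.

```python
def tokenize(line):
    """Split a line into word-like tokens (simplified model, see assumptions above)."""
    tokens = []
    i, n = 0, len(line)
    num_start = None  # start index of the numeric list being built, if any
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
            continue
        j = i + 1
        numeric = False
        if ord(ch) > 127:
            pass  # non-ASCII: always a one-character token
        elif ch.isdigit() or (ch == "_" and j < n and line[j].isdigit()):
            numeric = True
            while j < n and ord(line[j]) < 128 and (line[j].isalnum() or line[j] in "_."):
                j += 1
        else:
            if ch.isalpha():  # ASCII name: letters, digits, underscore
                while j < n and ord(line[j]) < 128 and (line[j].isalnum() or line[j] == "_"):
                    j += 1
            while j < n and line[j] in ".:":  # inflections such as =: or i.
                j += 1
        if numeric and num_start is not None:
            tokens[-1] = line[num_start:j]  # extend the list, keeping original spacing
        else:
            tokens.append(line[i:j])
            num_start = i if numeric else None
        i = j
    return tokens


if __name__ == "__main__":
    print(tokenize("life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}"))
```

On the life example this yields the same 25 words as the underscore
version of the boxed output in this message; with ¯ in place of _ the
sign becomes a separate token, as in the first boxed output.
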
> (beware of line wrapping)
>
> ;:'life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}'
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+--+-----+---+-+---+--+-----+---+-+---+---+---+-+
> |life|←|{|↑|1|⍵|∨|.|∧|3 4|=|+|/|,|¯|1 0 1|∘|.|⊖|¯|1 0 1|∘|.|⌽|⊂|⍵|}|
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+--+-----+---+-+---+--+-----+---+-+---+---+---+-+
>
> The upper negative sign is troublesome; it may be easier if it is
> replaced by an underscore
>
> ;:'life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}'
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+------+---+-+---+------+---+-+---+---+---+-+
> |life|←|{|↑|1|⍵|∨|.|∧|3 4|=|+|/|,|_1 0 1|∘|.|⊖|_1 0 1|∘|.|⌽|⊂|⍵|}|
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+------+---+-+---+------+---+-+---+---+---+-+
>
> We can either map a specific set of APL characters to the internal
> representation of primitives, or allow a name to be a single
> Unicode character above code point 127 (U+007F), in addition to the
> current definition of names.
>
> On Wed, 26 Feb 2014, Don Guinn wrote:
>> The discussion keeps coming back to APL characters. For now, why not
>> just look at using international characters (Unicode/UTF-8) as letters in
>> names? One can assign some of those as letters for primitives if he wants.
>> Then see where things go.
>>
>> A problem with accepting Unicode/UTF-8 characters is that there are so many,
>> and there does not seem to be any pattern as to which are letters that can
>> be included in words and which should be considered words in themselves,
>> the way + is recognized as a word without needing spaces around it.
>> What to do with Chinese characters?
>>
>> I have found it easier when dealing with UTF-8 to convert it to unicode, do
>> my processing, then convert it back to UTF-8. Then everything works as
>> before. Character counts are correct. Searching for characters works as
>> before. No games are needed because this way no characters take more than
>> one atom. The only thing I have to do differently is I don't use a.
>> for converting between characters and numbers.
>>
>> I think that it would enhance J to allow international characters to be
>> accepted in J statements. The question of APL characters then becomes
>> easier to address. This is a low priority issue right now, but perhaps
>> later.
>>
>>
>> On Wed, Feb 26, 2014 at 12:19 AM, Björn Helgason <[email protected]> wrote:
>>
>> > Actually I want a. back as it was.
>> >
>> > Giving me two or three numbers is wrong and is confusing at best.
>> >
>> > It should return the digital number for Unicode and only one number per
>> > char.
>> >
>> > a. is the atomic vector and this way the atomic has grown to include all of
>> > Unicode.
>> >
>> > -
>> > Björn Helgason
>> > gsm:6985532
>> > skype:gosiminn
>> > On 25.2.2014 16:10, "Björn Helgason" <[email protected]> wrote:
>> >
>> > > a. and especially i. a. - looking up chars' indexes used to be useful.
>> > >
>> > > It is not as easy anymore.
>> > >
>> > > The national chars are often not in there with a single number.
>> > >
>> > > Sometimes two or three.
>> > >
>> > > Reading files also sometimes with unicode markings.
>> > >
>> > > -
>> > > Björn Helgason
>> > > On 25.2.2014 14:03, "Don Guinn" <[email protected]> wrote:
>> > >
>> > >> I tried that a while back. I extended the table for ;: so that the
>> > >> bytes in _128{.a. are treated as letters, which made all multi-byte
>> > >> UTF-8 characters be treated as alphas. Statements were broken into
>> > >> tokens properly. But then I found that the interpreter used the top
>> > >> half of a. internally. I mentioned that in the forum a while back
>> > >> when someone noticed that some character in there acted weird. Roger
>> > >> said that could be changed if needed. Might be easy for Roger to
>> > >> change that but it didn't look so easy to me.
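
Both the "two or three numbers" effect and the trick of treating the
top half of a. as letters follow from how UTF-8 encodes characters: an
ASCII character is a single byte below 128, and every byte of a
multi-byte character is 128 or above. A small illustration in Python
(not J; the helper name describe is hypothetical):

```python
def describe(ch):
    """Return (code point, list of UTF-8 byte values) for a single character."""
    return ord(ch), list(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in "aö←":
        print(ch, describe(ch))

    name = "Björn"
    raw = name.encode("utf-8")   # the byte view: what indexing a 256-entry table sees
    print(len(raw), len(name))   # byte count versus character count

    # Decode, work on whole characters, then encode again.
    reversed_name = raw.decode("utf-8")[::-1].encode("utf-8")
    print(reversed_name.decode("utf-8"))
```

Here ö has the single code point 246 but the two bytes 195 182, and ←
has the code point 8592 but the three bytes 226 134 144. Working on the
decoded characters and encoding again afterwards is the convert,
process, convert back approach described earlier in this thread.
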
>> > >>
>> > >> I looked at the tables for Unicode (wide characters) and in the form of
>> > >> UTF-8 and couldn't see any easy way to distinguish the category of a
>> > >> character. Those that one would consider an alpha were mixed in with
>> > >> graphics and controls. APL characters were not grouped together but
>> > >> scattered all over the place.
>> > >>
>> > >> Trying it out to see how it would work shouldn't be too difficult,
>> > >> but there are a lot of questions to answer before making
>> > >> it a production tool.
>> > >>
>> > >>
>> > >> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <[email protected]> wrote:
>> > >>
>> > >> > This seems simpler. The first thing to do is build a prototype
>> > >> > implementation, and then we can see what other problems are out there.
>> > >> >
>> > >> > On Mon, 24 Feb 2014, Don Guinn wrote:
>> > >> > > A middle ground might be to allow for some Unicode (UTF-8) to be
>> > >> > > considered letters like a-z,A-Z. Then one could name APL iota to
>> > >> > > something like i. . In addition, it would allow non-English
>> > >> > > languages not to be restricted to ASCII characters for names. Greek
>> > >> > > letters in mathematics could be used as names, making statements
>> > >> > > look a little more like traditional mathematics. It would be
>> > >> > > simpler to allow all Unicode characters to be considered letters,
>> > >> > > but that might lead to other problems.
>> > >> > > ----------------------------------------------------------------------
>> > >> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> > >> >
>> > >> > --
>> > >> > regards,
>> > >> > ====================================================
>> > >> > GPG key 1024D/4434BAB3 2008-08-24
>> > >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
