There are many different kinds of Unicode "characters".

http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters

Looking at the ;: monad, the general classes of characters which one
would expect ;: to recognize are:

Numbers
Letters
Spaces
Other
(and "spelling error").

So the first step to implementing Unicode support would be to
implement a ;: workalike (in user space) which classifies Unicode
characters into the above groups (use the "spelling error" category for
anything that is too messy to deal with; errors like that can go away
in later implementations).
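A rough Python model of that classification step, assuming Unicode general categories (from Python's unicodedata module) as the basis. The category-to-class mapping below is a hypothetical choice, not something J defines, and a real workalike would live in J user space:

```python
import unicodedata


def classify(ch: str) -> str:
    """Put one character into a ;:-style class (hypothetical mapping)."""
    cat = unicodedata.category(ch)  # general category, e.g. 'Ll', 'Nd', 'Zs', 'So'
    if cat == "Nd":                 # decimal digits
        return "Numbers"
    if cat.startswith("L"):         # Lu, Ll, Lt, Lm, Lo
        return "Letters"
    if cat.startswith("Z") or ch in "\t\n\r":
        return "Spaces"
    if cat[0] in "PSN":             # punctuation, symbols, other numeric forms
        return "Other"
    return "spelling error"         # controls, combining marks, unassigned, ...


print([classify(c) for c in "x1 ⍵"])  # ['Letters', 'Numbers', 'Spaces', 'Other']
```

Note that APL's ⍵ is a symbol (category So), not a letter, so this mapping files it under Other; combining marks such as U+0301 fall into the "spelling error" bucket, which is where the messy cases are meant to go.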

If this model were then incorporated into the interpreter, you'd have a
fair bit more work to do. But that could also be modeled in user
space.

Thanks,

-- 
Raul

On Wed, Feb 26, 2014 at 11:39 PM, bill lam <[email protected]> wrote:
> If we simply allow Unicode characters in names, we can assign
> ASCII primitives to APL-like names, but that will require spaces
> around each of those names, so that instead of writing
>
> life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}
>
> we need to write
>
> life ← {↑ 1 ⍵ ∨ . ∧ 3 4=+/,¯1 0 1 ∘ .⊖ ¯1 0 1 ∘ . ⌽ ⊂ ⍵}
>
> IMO this is painful to write. Suppose we modified monad ;: to let
> each character above U+007F (i.e. every non-ASCII character) parse
> as a token of its own (beware of line wrapping):
>
>    ;:'life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}'
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+--+-----+---+-+---+--+-----+---+-+---+---+---+-+
> |life|←|{|↑|1|⍵|∨|.|∧|3 4|=|+|/|,|¯|1 0 1|∘|.|⊖|¯|1 0 1|∘|.|⌽|⊂|⍵|}|
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+--+-----+---+-+---+--+-----+---+-+---+---+---+-+
>
> The APL high minus (¯) is troublesome; it may be easier to replace
> it with underscore:
>
>    ;:'life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}'
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+------+---+-+---+------+---+-+---+---+---+-+
> |life|←|{|↑|1|⍵|∨|.|∧|3 4|=|+|/|,|_1 0 1|∘|.|⊖|_1 0 1|∘|.|⌽|⊂|⍵|}|
> +----+---+-+---+-+---+---+-+---+---+-+-+-+-+------+---+-+---+------+---+-+---+---+---+-+
>
> We can either map a specific set of APL characters to the internal
> representation of primitives, or allow a name to be a single
> Unicode character above U+007F in addition to the current
> definition of names.
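A minimal Python sketch of the proposed rule, assuming a much-simplified subset of J's word formation: ASCII names, integer lists with _ as the negative sign, and ASCII primitives with optional . or : inflections. Every character above U+007F becomes a token of its own, and anything unmatched is reported as a spelling error. The token patterns and the tokenize helper are hypothetical, not the interpreter's actual tables:

```python
import re

# Hypothetical, simplified token rules; tried left to right at each position.
TOKEN = re.compile(
    r"(?P<num>_?[0-9]+(?: +_?[0-9]+)*)"       # numeric list; _ marks a negative
    r"|(?P<name>[A-Za-z][A-Za-z0-9_]*[.:]*)"  # ASCII name, or a primitive like i.
    r"|(?P<prim>[!-~][.:]*)"                  # ASCII graphic plus inflections
    r"|(?P<space>[ \t]+)"                     # blanks only separate tokens
    r"|(?P<wide>[^\x00-\x7f])"                # any non-ASCII char: its own token
)


def tokenize(sentence: str) -> list[str]:
    tokens, pos = [], 0
    while pos < len(sentence):
        m = TOKEN.match(sentence, pos)
        if m is None:
            # Nothing above applies (e.g. an ASCII control character).
            raise ValueError(f"spelling error at position {pos}")
        if m.lastgroup != "space":
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize("life←{↑1 ⍵∨.∧3 4=+/,_1 0 1∘.⊖_1 0 1∘.⌽⊂⍵}"))
print(tokenize("life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}"))
```

With these rules the underscore version splits into the same 25 words as the second ;: display, while the high-minus version yields ¯ as a separate word ahead of 1 0 1, as in the first display.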
>
> Wed, 26 Feb 2014, Don Guinn wrote:
>> The discussion keeps coming back to APL characters. For now, why not
>> just look at using international characters (Unicode/UTF-8) as letters in
>> names? One can assign some of those as names for primitives if one wants.
>> Then see where things go.
>>
>> A problem with accepting Unicode/UTF-8 characters is that there are so many,
>> and there does not seem to be any pattern as to which are letters that can
>> be included in words and which should be words in themselves, the way +
>> is recognized as a word without needing spaces around it. What to do with
>> Chinese characters?
>>
>> I have found it easier when dealing with UTF-8 to convert it to Unicode, do
>> my processing, then convert it back to UTF-8. Then everything works as
>> before: character counts are correct, and searching for characters works as
>> before. No games are needed, because this way no character takes more than
>> one atom. The only thing I have to do differently is that I don't use a. for
>> converting between characters and numbers.
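A Python analogy of this decode, process, re-encode workflow, assuming Python's str (one code point per element) as the stand-in for "one character per atom"; J's own conversion verbs are not shown. The hypothetical helper reverses text, a task that goes wrong on raw UTF-8 bytes:

```python
def reverse_text(data: bytes) -> bytes:
    """Reverse UTF-8 encoded text character by character."""
    text = data.decode("utf-8")       # UTF-8 -> code points: one element per character
    processed = text[::-1]            # do the processing on whole characters
    return processed.encode("utf-8")  # convert back to UTF-8 for output


raw = "Björn".encode("utf-8")
print(len(raw), len(raw.decode("utf-8")))  # 6 5  (bytes vs characters)
print(reverse_text(raw).decode("utf-8"))   # nröjB
```

Reversing raw directly (raw[::-1]) would swap the two bytes of ö, and the result is no longer valid UTF-8.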
>>
>> I think that it would enhance J to allow international characters to be
>> accepted in J statements. Then the question of APL characters becomes
>> easier to address. This is a low priority issue right now, but perhaps
>> later.
>>
>>
>> On Wed, Feb 26, 2014 at 12:19 AM, Björn Helgason <[email protected]> wrote:
>>
> >> > Actually I want a. back as it was.
> >> >
> >> > Giving me two or three numbers per character is wrong and confusing at best.
> >> >
> >> > It should return the Unicode number (code point), and only one number per
> >> > character.
> >> >
> >> > a. is the atomic vector, and this way the atomic vector grows to include
> >> > all of Unicode.
>> >
>> > -
>> > Björn Helgason
>> > gsm:6985532
>> > skype:gosiminn
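The two views contrasted here can be sketched in Python, assuming the "one number per character" wanted is the Unicode code point: ord gives one number per character, while the UTF-8 encoding gives two or three bytes for many national characters. The helper names are hypothetical:

```python
def code_points(text: str) -> list[int]:
    return [ord(c) for c in text]      # one number per character


def utf8_bytes(text: str) -> list[int]:
    return list(text.encode("utf-8"))  # one number per UTF-8 byte


print(code_points("Þór"))                     # [222, 243, 114]
print(utf8_bytes("Þór"))                      # [195, 158, 195, 179, 114]
print(utf8_bytes("€"))                        # [226, 130, 172]
print("".join(map(chr, code_points("Þór"))))  # Þór
```

The code-point view inverts cleanly (chr undoes ord), which is the property a per-character index lookup relies on.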
>> > On 25.2.2014 16:10, "Björn Helgason" <[email protected]> wrote:
>> >
> >> > > a., and especially i. a. for looking up character indexes, used to be
> >> > > useful.
> >> > >
> >> > > It is not as easy anymore.
> >> > >
> >> > > National characters are often not in there as a single number;
> >> > > sometimes they take two or three.
> >> > >
> >> > > Files that are read in sometimes also carry Unicode markings.
>> > >
>> > > -
>> > > Björn Helgason
>> > > gsm:6985532
>> > > skype:gosiminn
>> > > On 25.2.2014 14:03, "Don Guinn" <[email protected]> wrote:
>> > >
> >> > >> I tried that a while back. I extended the table for ;: to treat the
> >> > >> bytes in _128{.a. as letters, which made all multi-byte UTF-8
> >> > >> characters count as alphas. Statements were broken into tokens
> >> > >> properly. But then I found that the interpreter used the top half of a.
> >> > >> internally. I mentioned that in the forum a while back when someone
> >> > >> noticed that some character in there acted weird. Roger said that could
> >> > >> be changed if needed. It might be easy for Roger to change, but it
> >> > >> didn't look so easy to me.
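A Python model of this byte-table experiment, assuming only what is described here: every byte from 128 to 255 is classed as a letter, like A-Z. It is not the actual ;: table edit. The regex and the names_in helper are hypothetical, and the example shows the catch: UTF-8 APL symbols such as ← also become name characters, so they need spaces around them, which is the inconvenience bill describes earlier in the thread:

```python
import re

# Hypothetical model: bytes 0x80-0xFF count as letters alongside A-Z and a-z.
NAME = re.compile(rb"[A-Za-z\x80-\xff][A-Za-z0-9_\x80-\xff]*")


def names_in(sentence: str) -> list[str]:
    """Return the runs that such a byte table would treat as names."""
    data = sentence.encode("utf-8")
    # Runs begin and end at ASCII bytes or at the ends of the data,
    # so each match holds whole UTF-8 sequences and decodes cleanly.
    return [m.group().decode("utf-8") for m in NAME.finditer(data)]


print(names_in("λx←α+1"))    # ['λx←α']
print(names_in("λx ← α+1"))  # ['λx', '←', 'α']
```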
>> > >>
> >> > >> I looked at the tables for Unicode (wide characters) and in the form of
> >> > >> UTF-8 and couldn't see any easy way to distinguish the category of a
> >> > >> character. Those that one would consider alphas were mixed in with
> >> > >> graphics and controls. APL characters were not grouped together but
> >> > >> scattered all over the place.
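The scattering is easy to see with Python's unicodedata module. This small hypothetical helper lists the code point, general category and Unicode name for a few of the symbols used in the APL example earlier in the thread:

```python
import unicodedata


def describe(chars: str) -> list[tuple[str, str, str]]:
    """Code point, general category and Unicode name for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in chars]


for row in describe("¯×←∨⍳⍵"):
    print(*row)
```

Only ⍳ and ⍵ come from the APL symbols in the Miscellaneous Technical block; the others sit in Latin-1 Supplement, Arrows and Mathematical Operators, and their categories range over Sk, Sm and So. None of them is a letter, so a category-based rule would need an explicit list for APL.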
>> > >>
> >> > >> Trying it out to see how it would work shouldn't be too difficult,
> >> > >> but there are a lot of questions to answer before making it a
> >> > >> production tool.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <[email protected]> wrote:
>> > >>
> >> > >> > This seems simpler. The first thing to do is build a prototype
> >> > >> > implementation, and then we can see what other problems are out there.
>> > >> >
> >> > >> > Mon, 24 Feb 2014, Don Guinn wrote:
> >> > >> > > A middle ground might be to allow some Unicode (UTF-8) characters
> >> > >> > > to be considered letters like a-z,A-Z. Then one could use APL iota
> >> > >> > > as a name for something like i. . In addition, it would allow
> >> > >> > > non-English languages not to be restricted to ASCII characters for
> >> > >> > > names. Greek letters in mathematics could be used as names, making
> >> > >> > > statements look a little more like traditional mathematics. It
> >> > >> > > would be simpler to allow all Unicode characters to be considered
> >> > >> > > letters, but that might lead to other problems.
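One way to picture this middle ground, sketched in Python under the assumption that "letter" means any character in a Unicode Letter category: a name starts with a letter and continues with letters, ASCII digits or underscore. The is_name rule is hypothetical, not J's current definition of names:

```python
def is_name(s: str) -> bool:
    """Hypothetical name rule: a Unicode letter first, then letters, digits or _."""
    if not s or not s[0].isalpha():
        return False
    # str.isalpha is true for characters in the Unicode categories Lu, Ll, Lt, Lm, Lo.
    return all(c.isalpha() or c in "0123456789_" for c in s[1:])


for candidate in ["x_1", "αβ2", "Þór", "中文", "⍳", "2x"]:
    print(candidate, is_name(candidate))
```

Under this rule Greek and Icelandic letters qualify, and so do Chinese characters (category Lo), which is one possible answer to the earlier question about them; APL symbols such as ⍳ (category So) stay excluded and would need separate treatment.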
>> > >> > >
>> > ----------------------------------------------------------------------
>> > >> > > For information about J forums see
>> > >> http://www.jsoftware.com/forums.htm
>> > >> >
>> > >> > --
>> > >> > regards,
>> > >> > ====================================================
>> > >> > GPG key 1024D/4434BAB3 2008-08-24
>> > >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> > >> > ----------------------------------------------------------------------
>> > >> > For information about J forums see
>> > http://www.jsoftware.com/forums.htm
>> > >> ----------------------------------------------------------------------
>> > >> For information about J forums see http://www.jsoftware.com/forums.htm
>> > >
>> > >
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm