Recently there was a discussion about distinguishing APL characters from other
characters in ";:". I had wanted to try the boxed form of "mj" in ";:" instead
of the literal form, so I gave it a try. Attached is a script containing an
"sj" and "mj" that recognize APL characters in U16 and U32 text. It also
extends ";:" to handle U16 and U32 arguments.
Here are a few examples:
NB. Normal J Statement
words 'sum=. (i.3 4)+/ .*0j4+pru 4'
+---+--+-+--+---+-+-+-+-+-+---+-+---+-+
|sum|=.|(|i.|3 4|)|+|/|.|*|0j4|+|pru|4|
+---+--+-+--+---+-+-+-+-+-+---+-+---+-+
NB. An equivalent APL statement
words U16 'sum← (⍳3 4)+.×0j4+pru 4'
+---+-+-+-+---+-+--+-+---+-+---+-+
|sum|←|(|⍳|3 4|)|+.|×|0j4|+|pru|4|
+---+-+-+-+---+-+--+-+---+-+---+-+
NB. Support for unicode
words '¿Qué tan difícil es aprender el lenguaje de programación J?'
+----+---+-------+--+--------+--+--------+--+------------+-+-+
|¿Qué|tan|difícil|es|aprender|el|lenguaje|de|programación|J|?|
+----+---+-------+--+--------+--+--------+--+------------+-+-+
NB. A mixture of types
,.<;.2 words U16 {{)n
x u^:(v0`v1`v2)y NB. <==> (x v0 y)u^:(x v1 y) (x v2 y)
(~R∊R∘.×R)/R←1↓ιR ⍝ Find all primes from 1 to R
Hvað er með alla umlóta?
The amount is £100
1 3 2j1+2 ¯3 2r7
}}
+-------------------------------------------------------------------+
|+-+-+--+-+--+-+--+-+--+-+-+-------------------------------------+-+|
||x|u|^:|(|v0|`|v1|`|v2|)|y|NB. <==> (x v0 y)u^:(x v1 y) (x v2 y)| ||
|+-+-+--+-+--+-+--+-+--+-+-+-------------------------------------+-+|
+-------------------------------------------------------------------+
|+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-----------------------------+-+|
||(|~|R|∊|R|∘|.|×|R|)|/|R|←|1|↓|ι|R|⍝ Find all primes from 1 to R| ||
|+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-----------------------------+-+|
+-------------------------------------------------------------------+
|+----+--+---+----+------+-+-+                                      |
||Hvað|er|með|alla|umlóta|?| |                                      |
|+----+--+---+----+------+-+-+                                      |
+-------------------------------------------------------------------+
|+---+------+--+----+-+                                             |
||The|amount|is|£100| |                                             |
|+---+------+--+----+-+                                             |
+-------------------------------------------------------------------+
|+-------+-+----+---+-+                                             |
||1 3 2j1|+|2 ¯3|2r7| |                                             |
|+-------+-+----+---+-+                                             |
+-------------------------------------------------------------------+
NB. A speed test
ts=:6!:2,7!:2@]
y=: 1e6$' fourscore and seven years ago, our fathers'
ts ';: y'
0.0211792 4.26892e7
ts 'words y'
0.022107 2.8022e7
(;:-:words)y
1
I found the boxed form of "mj" quite workable. In addition, it can handle a
wider range of data types.
As expected, it is a little slower than the standard ";:" (about 4% in the test
above), because it forces ";:" to convert "literal" to "literal2". But I was
surprised by the drastic reduction in the memory needed to run the modified
version of ";:": about 2.8e7 bytes against 4.3e7.
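To put numbers on the comparison, here is a small sketch that takes the two "ts" results reported above and computes the ratios. The names "std", "new" and "ratio" are mine and only hold the posted figures; timings will differ on another machine.

```j
NB. Figures copied from the session above: seconds , bytes
std =: 0.0211792 4.26892e7     NB. result of  ts ';: y'
new =: 0.022107 2.8022e7       NB. result of  ts 'words y'
ratio =: new % std             NB. time ratio , space ratio
ratio
```

The first ratio comes out a little above 1 (slightly slower) and the second at roughly two thirds (less memory).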
Note 'Sequential Machine Test'
This creates definitions of "mj" and "sj" for the sequential machine that
support unicode and APL characters. It also adds support for U16 and U32.
Text consisting of legitimate ASCII characters in U8 or "literal" gives results
identical to what ";:" gives now. Other characters are treated like "alp-num"
(letters), so multi-byte U8 sequences are copied as-is into the result and no
invalid character symbols appear in it.
U16 and U32 arguments get additional checking for particular unicode characters.
The result is simultaneous support for both J syntax and APL syntax. APL symbols
are treated like J primitives in that each one is boxed separately. The APL
overbar "¯" is treated like the J underbar "_". The APL lamp "⍝" is treated like
"NB.".
This uses the boxed representation of "mj" instead of a character vector, so
only the necessary characters are specified. Any unrecognized character gets
class (#mj), which selects the last column of "sj" (U); from most states that
leads to the last row (state 16, Unicode), which treats the unrecognized
characters as text (alp-num). The character types include those in the current
";:" plus three new ones: APL, lamp and unicode. "sj" is extended by three rows
and three columns to handle APL characters and all other unicode characters.
The characters used for creating J primitives are put into the "other" box in
"mj". That is just the name used for them in J help; it really should be changed
to "J".
Okay, what is an APL character? I copied the list of symbols from Dyalog's
language bar. See the end of this script for the details of building the name
"APL".
In summary:
for U8:
  No real change, as J primitives etc. are boxed as ";:" does now, but U8 start
  and continue bytes are kept together as text.
for U16 and U32:
  J primitives and APL characters (other than those specified below) are put
  into separate boxes.
  The APL overbar (¯) is treated like the J underbar (_) and is included as
  part of a number.
  The APL lamp (⍝) starts a comment just like the J "NB.".
  All other unicode characters are treated like "alp-num" and included in the
  same box with ASCII "alp-num". In other words, text words can include unicode
  characters.
This only makes APL characters be treated like J primitives. I have not tested
"sj" thoroughly. See the Note near the bottom of this script on a problem with J
using types "literal", "literal2" and "literal4" when dealing with U8.
)
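The class lookup described in the Note can be emulated directly. This is only a sketch of the rule as stated there (the class is the index of the box holding the character, or the number of boxes if no box holds it). "classof" and "demo" are hypothetical names, and this is not a claim about how ";:" is implemented internally.

```j
NB. Hypothetical helper, not part of the script.
NB. x is a character list, y is a list of boxes (shaped like mj).
NB. For each character: which boxes contain it, then index of the first hit.
classof =: (i.&1)@:(e.&>)"0 _

demo =: 'ab';'01';' '          NB. toy mapping: letters, digits, space
'a1 ?' classof demo            NB. '?' is in no box, so it gets class #demo
```

With this toy mapping the four characters get classes 0 1 2 3, the last one being #demo.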
NB. Definition of mj as list of boxes including APL characters
other =: '=<>+*-%^$~|,;#!/\[]{}"`@&%?()' NB. First of primitives & punctuation
space =: ' ' NB. Space and tab
letters =: 'NB'-.~a.{~(,(a.i.'Aa')+/i.26)
letterN =: 'N'
letterB =: 'B'
digits =: '0123456789_¯'
decimal =: '.'
colon =: ':'
quote =: ''''
leftcurly =: '{'
linefeed =: LF
rightcurly=: '}'
lamp =: 7 u: '⍝' NB. Start APL comment
APL =: 7 u: '←×÷⍟⌹○⌈⌊⊥⊤⊣⊢≠≤≥≡≢∨∧⍲⍱↑↓⊂⊃⊆⌷⍋⍒⍳⍸∊⍷∪∩⌿⍀⍪⍴⌽⊖⍉¨⍨⍣∘⍤⍥⍞⎕⍠⌸⌺⌶⍎⍕⋄→⍵⍺∇⍬ι'
mj=:<other;space;letters;letterN;letterB;digits;decimal;colon;quote;leftcurly;linefeed;rightcurly;APL;lamp
NB. Definition of sj including APL and unicode support
sj=: <.0 10#:10*}.".;._2(0 :0)
' X S A N B 9 . : Q { LF } APL Lamp U ']0
1.1 0.0 2.1 3.1 2.1 6.1 1.1 1.1 7.1 11.1 10.1 12.1 15.1 9.1 16.1 NB. 0 space
1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 11.2 10.2 12.2 15.2 9.2 16.2 NB. 1 other
1.2 0.3 2.0 2.0 2.0 2.0 1.0 1.0 7.2 11.2 10.2 12.2 15.2 9.2 16.0 NB. 2 alp/num
1.2 0.3 2.0 2.0 4.0 2.0 1.0 1.0 7.2 11.2 10.2 12.2 15.2 9.2 16.0 NB. 3 N
1.2 0.3 2.0 2.0 2.0 2.0 5.0 1.0 7.2 11.2 10.2 12.2 15.2 9.2 16.0 NB. 4 NB
9.0 9.0 9.0 9.0 9.0 9.0 1.0 1.0 9.0 9.0 10.2 9.0 9.2 9.2 16.0 NB. 5 NB.
1.4 0.5 6.0 6.0 6.0 6.0 6.0 1.0 7.4 11.4 10.2 12.4 12.2 9.2 16.0 NB. 6 num
7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 8.0 7.0 7.0 7.0 7.0 7.0 7.0 NB. 7 '
1.2 0.3 2.2 3.2 2.2 6.2 1.2 1.2 7.0 11.2 10.2 12.2 15.2 0.2 0.2 NB. 8 ''
9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 10.2 9.0 9.0 9.0 9.0 NB. 9 comment
1.2 0.2 2.2 3.2 2.2 6.2 1.2 1.2 7.2 11.2 10.2 12.2 12.2 9.2 16.2 NB. 10 LF
1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 13.0 10.2 1.2 1.2 9.2 16.2 NB. 11 {
1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 1.2 10.2 14.0 14.2 9.2 16.2 NB. 12 }
1.2 0.3 2.2 3.2 2.2 6.2 1.7 1.7 7.2 1.2 10.2 1.2 1.2 9.2 16.2 NB. 13 {{
1.2 0.3 2.2 3.2 2.2 6.2 1.7 1.7 7.2 1.2 10.2 1.2 1.2 9.2 16.2 NB. 14 }}
1.2 0.2 2.2 3.2 2.2 6.2 1.2 1.0 7.2 11.2 10.2 12.2 15.2 9.2 16.2 NB. 15 APL
1.0 0.3 2.0 3.2 3.0 6.0 1.0 1.0 7.2 11.2 10.2 12.2 16.0 9.2 16.0 NB. 16 Unicode
)
words=:(0;sj;mj)&;:
U8 =: 8&u: NB. Convert to UTF-8
U16 =: 7&u: NB. Convert to UTF-16
U32 =: 9&u: NB. Convert to UTF-32
Upoint=: 3 u: 9 u: ] NB. Get numeric unicode point
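As a quick check of what these conversions do to one non-ASCII character, here is a small sketch. The three definitions are repeated from the script so the block runs on its own; "iota" is a name I made up, and the character is built from its code point so the result does not depend on how this text is encoded.

```j
NB. Same definitions as in the script, repeated so this runs on its own
U8  =: 8&u:
U16 =: 7&u:
U32 =: 9&u:

iota =: U8 4 u: , 953          NB. Greek iota (code point 953) as UTF-8
# iota                         NB. 2 : two bytes in U8
# U16 iota                     NB. 1 : one item in U16
3!:0&> iota ; (U16 iota) ; U32 iota   NB. 2 131072 262144 : internal types
3 u: U16 iota                  NB. 953 : back to the code point
```

The type codes 2, 131072 and 262144 correspond to what the Notes here call "literal", "literal2" and "literal4".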
Note 'A few examples'
NB. Normal J Statement
words 'sum=. (i.3 4)+/ .*0j4+pru 4'
NB. An equivalent APL statement
words U16 'sum← (⍳3 4)+.×0j4+pru 4'
NB. Support for unicode
words '¿Qué tan difícil es aprender el lenguaje de programación J?'
NB. A mixture of types
,.<;.2 words U16 {{)n
x u^:(v0`v1`v2)y NB. <==> (x v0 y)u^:(x v1 y) (x v2 y)
(~R∊R∘.×R)/R←1↓ιR ⍝ Find all primes from 1 to R
Hvað er með alla umlóta?
The amount is £100
1 3 2j1+2 ¯3 2r7
}}
NB. A speed test
ts=:6!:2,7!:2@]
y=: 1e6$' fourscore and seven years ago, our fathers'
ts ';: y'
ts 'words y'
(;:-:words)y
)
Note 'An interesting problem'
When the right argument of ";:" is "literal", ";:" converts it to "literal2"
when using the "mj" defined here, because as far as ";:" is concerned, "mj" is
boxed "literal2". This does not convert U8 to U16. There are three APL
characters which fall in "a.": (°×÷) or (176 215 247{a.). Many unicode
characters, when represented as U8, include one or more of these byte values.
If ";:" is given such a character as U8, the matching byte will be interpreted
as an APL character.
In the examples I have tried this does not seem to be a problem. But if the
right argument of ";:" is "literal" that is really U8, converting it to U16 or
U32 first avoids this possible problem.
)
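To make the byte overlap concrete, here is a small sketch that lists the UTF-8 byte values of two characters and checks them against the three values named in the Note. "utf8bytes" and "suspect" are hypothetical names, and the two code points are just examples I picked (eth occurs in the "Hvað" example above).

```j
NB. Hypothetical helper: UTF-8 byte values of the given code points
utf8bytes =: 3 : 'a. i. 8 u: 4 u: , y'

suspect =: 176 215 247         NB. the three byte values named in the Note

utf8bytes 240                  NB. 195 176 : eth (U+00F0)
(utf8bytes 240) e. suspect     NB. 0 1     : second byte collides with 176
utf8bytes 1488                 NB. 215 144 : Hebrew alef (U+05D0)
(utf8bytes 1488) e. suspect    NB. 1 0     : first byte collides with 215
```

As far as I can tell, 247 never appears in valid UTF-8 (it would only be a lead byte for code points beyond the Unicode range), so in practice the overlap comes from 176 and 215.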
Note 'Building APL characters from those in Dyalog APL'
I built a list of APL characters from Dyalog's language bar, named "dyalog".
This includes many characters already included in J primitives and in the other
boxes defining "mj".
The final list of APL characters is built by removing from "dyalog" the
duplicate characters, that is, those already handled by other boxes in "mj".
This is done at the end of this script so as not to complicate the explanations
in the first part. Copied Dyalog_APL back to the name "APL" at the top of this
script.
Dyalog uses unicode point 9075 for iota, where other places use 953. Both look
about the same. I switched to 953 for iota as it looks a little better. One
solution would be to put both into APL. But not for now.
U16 32 9075 32 953
⍳ ι
)
NB. APL characters vary slightly depending on the source.
NB. Including all variations.
NB. Doesn't really matter as only going to box them.
build_APL=: 3 : 0
NB. From the Dyalog APL app
dyalog=.7 u:{{)n
←
+-×÷*⍟⌹○!?
|⌈⌊⊥⊤⊣⊢
=≠≤<>≥≡≢
∨∧⍲⍱
↑↓⊂⊃⊆⌷⍋⍒
⍳⍸∊⍷∪∩~
/\⌿⍀
,⍪⍴⌽⊖⍉
¨⍨⍣.∘⍤⍥@
⍞⎕⍠⌸⌺⌶⍎⍕
⋄⍝→⍵⍺∇&
¯⍬
}}
NB. Found in an APL script, a different iota
other_APL=.7 u:{{)n
ι
}}
l=.7 u:; other;space;letters;letterN;letterB;digits;decimal;colon;quote;leftcurly;linefeed;rightcurly;lamp
APL=.~.(dyalog,other_APL)-.l
)