[Jchat] J subset compilation

2014-02-27 Thread Joe Bogner
I started to toy with the idea of working on a subset of J that can be
compiled. The idea has been brought up before. Yes, I realize it's a
path fraught with peril and a likely premature demise. Still, as a toy,
I figure it would be fun.

Options for intermediate language:

1. GCC RTL - It looks very challenging and, as far as I can tell, would
require recompiling GCC, which is something I don't want to do.

2. LLVM IR - I cobbled together a small example that emits IR which
can then be compiled. Hypothetically, we could use the LLVM C bindings
to generate IR from J words directly from J. For my proof of concept,
I used fsharp bindings since that seemed to be the easiest path on
Windows. Julia, Rust, and Haskell have LLVM targets.

3. C - Either clang, gcc or tinycc. Generating C doesn't seem as
stable or professional-grade as the earlier options. However, I have
come across decent languages that do it (Chicken Scheme and others).
ELI does it, http://fastarray.appspot.com/compile.html. APEX also does
it, http://www.snakeisland.com/apexup.htm. It could also be used to
JIT code from J for small functions.

I'm currently edging towards the C path. Maybe that makes more sense
as a little proof of concept since it seems like it'd be quicker to
get up and running. My vision here is to implement a subset of verbs
in C ({ i. # $) .. really basic .. and then have a parser that takes a
series of J expressions and translates them into the corresponding
running list of C function calls - all in a single main().
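That running list of C calls might be sketched as follows. This is a Python toy (the translator language is an arbitrary choice, not part of the proposal), handling only flat monadic expressions over the subset; the runtime names j_int, j_iota, j_tally, j_shape, j_from are invented for illustration:

```python
# Sketch: translate a flat monadic J expression over the subset
# { i. # $ into a running list of C function calls for a single main().
# The j_* runtime functions are hypothetical names, not a real library.

VERBS = {'i.': 'j_iota', '#': 'j_tally', '$': 'j_shape', '{': 'j_from'}

def translate(expr):
    """Translate e.g. '# i. 5' into a list of C statements,
    applying verbs right to left, J-style."""
    toks = expr.split()
    lines = [f"J t0 = j_int({toks[-1]});"]   # rightmost token: a numeric noun
    for n, verb in enumerate(reversed(toks[:-1]), start=1):
        lines.append(f"J t{n} = {VERBS[verb]}(t{n - 1});")
    return lines

def emit_main(expr):
    """Wrap the translated statements in a C main()."""
    body = "\n".join("    " + ln for ln in translate(expr))
    return "int main(void) {\n%s\n    return 0;\n}" % body
```

So '# i. 5' becomes j_int(5), then j_iota, then j_tally, mirroring J's right-to-left evaluation.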

Some things I'm excited about trying out:

1. clang/llvm vectorization -
http://blog.llvm.org/2013/05/llvm-33-vectorization-improvements.html

2. parallel computation and concurrency -
http://opensource.mlba-team.de/xdispatch/docs/current/index.html
--
For information about J forums see http://www.jsoftware.com/forums.htm


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread chris burke
> The problem I am trying to point out is that the characters in _128{.a
fall
> in a no-man's land. They are ambiguous. Sometimes they are treated like
> 8-bit extended ASCII. Sometimes they are treated like UTF-8 compression
> characters.

I don't agree that _128{.a. fall into a no-man's land.

J text is utf8, so _128{.a. are ordinary bytes. They are not any kind of
characters, since they are not valid utf8.

Earlier versions of J (J5 and earlier?) treated them as 8-bit extended
ASCII, but this is no longer the case.
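The claim that _128{.a. are not valid UTF-8 on their own can be checked outside J. A Python illustration (not J; Python's decode plays the role of UTF-8 validation here):

```python
def is_valid_utf8(bs: bytes) -> bool:
    """True if bs is a well-formed UTF-8 byte sequence."""
    try:
        bs.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# 254{a. in J is the single byte 0xFE; on its own it is not UTF-8
assert not is_valid_utf8(bytes([254]))
# No lone byte in _128{.a. (values 128..255) is valid UTF-8:
# they are all continuation bytes or multi-byte lead bytes
assert not any(is_valid_utf8(bytes([b])) for b in range(128, 256))
# Plain 7-bit ASCII is unchanged and always valid
assert is_valid_utf8(b'abc')
```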


On Thu, Feb 27, 2014 at 3:14 PM, Don Guinn  wrote:

> Everybody wants to talk about handling APL characters. I'm for that too,
> but first we need to make it clear on how to handle UTF-8 or UTF-whatever.
> The problem I am trying to point out is that the characters in _128{.a. fall
> in a no-man's land. They are ambiguous. Sometimes they are treated like
> 8-bit extended ASCII. Sometimes they are treated like UTF-8 compression
> characters.
>
>    u,U
> þþ
> shows how display got confused. Is it supposed to display UTF-8? Or is it
> supposed to display 8-bit extended ASCII? Looks like it ran into an error
> attempting to display it as UTF-8 so it switched to 8-bit extended ASCII.
> ": output is always literal. So
>    #":u,U
> 6
>    a.i.":u,U
> 195 131 194 190 195 190
> switched all the 8-bit extended ASCII to UTF-8. But sometimes it just puts
> in � when it can't figure out what to do. Maybe it should have displayed
> the 8-bit extended ASCII instead. The trouble is that the character þ is
> ambiguous.
>
> The reason why 7 u: 254{a. is an error is that 7 u: specifically has
> UTF-8 or ASCII as a right argument. 254{a. is neither. It is what I have
> been calling 8-bit extended ASCII.
>
> Before we can even hope to effectively deal with APL characters we need to
> be very clear on how to handle UTF-8.

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Roger Hui
The URL got lost:  *My Favorite APL Symbol*



On Thu, Feb 27, 2014 at 3:54 PM, Roger Hui wrote:

> > That brings up an interesting question... How DID the APL character set
> get
> > designed? Some IBM graphic designers? Ken? Who?
>
> I believe it was Ken Iverson.  It cannot have been a mere graphic
> designer because the design is too exquisitely good for that.  Some
> discussions on the topic:
>
> *The Design of APL* (section 2).
>
> *APL\360 History*.  Search for "design of the typ" [sic], two occurrences.
>
> *APL Quotations and Anecdotes*,
> starting at the exchange between Brooker and Iverson.
>
> *My Favorite APL Symbol*
>
>
>
>
> On Thu, Feb 27, 2014 at 3:25 PM, Skip Cave wrote:
>
>> As Eric has pointed out, J has carefully picked related pairs of ASCII
>> characters which graphically show the connection between related
>> functions.
>> APL did this as well. The problem arises when we realize that there are
>> many J primitives in related groups which don't have any APL characters
>> that would fit, and neither are there any sets of unicode glyphs which
>> have
>> the appropriate graphical characteristics that would suggest that
>> relatedness.
>>
>> It is clear. To do the J-to-single-glyph conversion right we would need
>> some new glyphs. That would likely require an expert graphical designer
>> who
>> was also either a mathematician or a programmer, who could express the
>> functionality AND the relatedness of related primitives in a single glyph.
>> Good luck with that.
>>
>> It could be done by someone with the right skill set, but who (or what
>> group of people) would that be?
>>
>> That brings up an interesting question... How DID the APL character set
>> get
>> designed? Some IBM graphic designers? Ken? Who?
>>
>> Skip
>>
>>
>> Skip Cave
>> Cave Consulting LLC
>>
>

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Eric Iverson
The APL glyphs, down to the finest details, were designed by a very
small group (not graphic designers, but perfectionists in their own
way) in a small room with tight deadlines. I doubt the quality can be
duplicated. We should focus on programming, which is what we do, and
use the excellent tools at hand. If you require APL glyphs to be
happy, there are excellent tools that support them. If ASCII is good
enough for you, then J is a treat in its own way.

On Thu, Feb 27, 2014 at 6:25 PM, Skip Cave  wrote:
> As Eric has pointed out, J has carefully picked related pairs of ASCII
> characters which graphically show the connection between related functions.
> APL did this as well. The problem arises when we realize that there are
> many J primitives in related groups which don't have any APL characters
> that would fit, and neither are there any sets of unicode glyphs which have
> the appropriate graphical characteristics that would suggest that
> relatedness.
>
> It is clear. To do the J-to-single-glyph conversion right we would need
> some new glyphs. That would likely require an expert graphical designer who
> was also either a mathematician or a programmer, who could express the
> functionality AND the relatedness of related primitives in a single glyph.
> Good luck with that.
>
> It could be done by someone with the right skill set, but who (or what
> group of people) would that be?
>
> That brings up an interesting question... How DID the APL character set get
> designed? Some IBM graphic designers? Ken? Who?
>
> Skip
>
>
> Skip Cave
> Cave Consulting LLC
>
>
> On Thu, Feb 27, 2014 at 4:49 PM, PMA  wrote:
>
>> I'm most concerned, that the integrity of J's vocabulary's
>> internal relationships (cited yesterday by I-forget-whom)
>> not be compromised.
>>
>> P
>>
>>
>> Eric Iverson wrote:
>>
>>> It is trivial to display/enter fancy single glyphs for J. At least as
>>> trivial (which I personally don't consider to be trivial) as it is for
>>> APL.
>>>
>>> Those interested (and I am definitely not) should ignore this trivial
>>> aspect of the problem and focus on the hard part which is the mapping
>>> of single unicode glyphs to J primitives.
>>>
>>> What are the glyphs for =. =: +. *. *: =: ? ?. etc.?
>>>
>>> Perhaps if I saw a complete proposal for this glyph to J mapping that
>>> had some community consensus, I wouldn't be so profoundly negative on
>>> discussions like this.
>>>
>>> Keep in mind that the APL folk paid enormous attention to the
>>> appearance of the APL glyphs. A hodge podge of unicode glyphs from
>>> random parts of unicode fonts would not please anyone who loved APL.
>>>
>>> We have beaten this dead horse every year for 25 years. When will it
>>> be out of its misery?
>>>
>>>
>>> Something that people
>>>
>>> On Thu, Feb 27, 2014 at 5:10 PM, Joe Bogner  wrote:
>>>
 On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave
  wrote:

> Just my two cents worth...
>
> As an old APL (occasional) programmer, I always wanted a way to flip a
> switch in the J editor and turn J's 2-character primitives into APL
> characters (where appropriate), and either leave J's unique verbs alone,
> have the community decide on an appropriate single glyph, or let me
> pick a
> symbol for those myself. Then I could always flip that switch in the
> editor
> back, and see the actual J code, any time I wanted.
>


 It seems like it would be straightforward to create a JHS editor and
 viewer that toggles between the character sets. Could it just parse
 the words and replace with the APL character from a lookup table? The
 editor would replace back to ASCII before evaluation.


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Roger Hui
> That brings up an interesting question... How DID the APL character set
get
> designed? Some IBM graphic designers? Ken? Who?

I believe it was Ken Iverson.  It cannot have been a mere graphic designer
because the design is too exquisitely good for that.  Some discussions on
the topic:

*The Design of APL* (section 2).

*APL\360 History*.  Search for "design of the typ" [sic], two occurrences.

*APL Quotations and Anecdotes*,
starting at the exchange between Brooker and Iverson.

*My Favorite APL Symbol*




On Thu, Feb 27, 2014 at 3:25 PM, Skip Cave  wrote:

> As Eric has pointed out, J has carefully picked related pairs of ASCII
> characters which graphically show the connection between related functions.
> APL did this as well. The problem arises when we realize that there are
> many J primitives in related groups which don't have any APL characters
> that would fit, and neither are there any sets of unicode glyphs which have
> the appropriate graphical characteristics that would suggest that
> relatedness.
>
> It is clear. To do the J-to-single-glyph conversion right we would need
> some new glyphs. That would likely require an expert graphical designer who
> was also either a mathematician or a programmer, who could express the
> functionality AND the relatedness of related primitives in a single glyph.
> Good luck with that.
>
> It could be done by someone with the right skill set, but who (or what
> group of people) would that be?
>
> That brings up an interesting question... How DID the APL character set get
> designed? Some IBM graphic designers? Ken? Who?
>
> Skip
>
>
> Skip Cave
> Cave Consulting LLC
>

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Skip Cave
As Eric has pointed out, J has carefully picked related pairs of ASCII
characters which graphically show the connection between related functions.
APL did this as well. The problem arises when we realize that there are
many J primitives in related groups which don't have any APL characters
that would fit, and neither are there any sets of unicode glyphs which have
the appropriate graphical characteristics that would suggest that
relatedness.

It is clear that to do the J-to-single-glyph conversion right, we would need
some new glyphs. That would likely require an expert graphical designer who
was also either a mathematician or a programmer, who could express the
functionality AND the relatedness of related primitives in a single glyph.
Good luck with that.

It could be done by someone with the right skill set, but who (or what
group of people) would that be?

That brings up an interesting question... How DID the APL character set get
designed? Some IBM graphic designers? Ken? Who?

Skip


Skip Cave
Cave Consulting LLC


On Thu, Feb 27, 2014 at 4:49 PM, PMA  wrote:

> I'm most concerned, that the integrity of J's vocabulary's
> internal relationships (cited yesterday by I-forget-whom)
> not be compromised.
>
> P
>
>
> Eric Iverson wrote:
>
>> It is trivial to display/enter fancy single glyphs for J. At least as
>> trivial (which I personally don't consider to be trivial) as it is for
>> APL.
>>
>> Those interested (and I am definitely not) should ignore this trivial
>> aspect of the problem and focus on the hard part which is the mapping
>> of single unicode glyphs to J primitives.
>>
>> What are the glyphs for =. =: +. *. *: =: ? ?. etc.?
>>
>> Perhaps if I saw a complete proposal for this glyph to J mapping that
>> had some community consensus, I wouldn't be so profoundly negative on
>> discussions like this.
>>
>> Keep in mind that the APL folk paid enormous attention to the
>> appearance of the APL glyphs. A hodge podge of unicode glyphs from
>> random parts of unicode fonts would not please anyone who loved APL.
>>
>> We have beaten this dead horse every year for 25 years. When will it
>> be out of its misery?
>>
>>
>> Something that people
>>
>> On Thu, Feb 27, 2014 at 5:10 PM, Joe Bogner  wrote:
>>
>>> On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave
>>>  wrote:
>>>
 Just my two cents worth...

 As an old APL (occasional) programmer, I always wanted a way to flip a
 switch in the J editor and turn J's 2-character primitives into APL
 characters (where appropriate), and either leave J's unique verbs alone,
 have the community decide on an appropriate single glyph, or let me
 pick a
 symbol for those myself. Then I could always flip that switch in the
 editor
 back, and see the actual J code, any time I wanted.

>>>
>>>
>>> It seems like it would be straightforward to create a JHS editor and
>>> viewer that toggles between the character sets. Could it just parse
>>> the words and replace with the APL character from a lookup table? The
>>> editor would replace back to ASCII before evaluation.


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Don Guinn
Everybody wants to talk about handling APL characters. I'm for that too,
but first we need to make it clear how to handle UTF-8 or UTF-whatever.
The problem I am trying to point out is that the characters in _128{.a. fall
in a no-man's land. They are ambiguous. Sometimes they are treated like
8-bit extended ASCII. Sometimes they are treated like UTF-8 compression
characters.

   u,U
þþ
shows how display got confused. Is it supposed to display UTF-8? Or is it
supposed to display 8-bit extended ASCII? Looks like it ran into an error
attempting to display it as UTF-8 so it switched to 8-bit extended ASCII.
": output is always literal. So
   #":u,U
6
   a.i.":u,U
195 131 194 190 195 190
switched all the 8-bit extended ASCII to UTF-8. But sometimes it just puts
in � when it can't figure out what to do. Maybe it should have displayed
the 8-bit extended ASCII instead. The trouble is that the character þ is
ambiguous.

The reason why 7 u: 254{a. is an error is that 7 u: specifically has
UTF-8 or ASCII as a right argument. 254{a. is neither. It is what I have
been calling 8-bit extended ASCII.

Before we can even hope to effectively deal with APL characters we need to
be very clear on how to handle UTF-8.
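The byte values in the session above can be reproduced outside J. A Python illustration (not J) of þ in its two representations, matching the 195 190, 254, and 195 131 194 190 numbers seen above:

```python
s = 'þ'                                  # U+00FE
utf8 = s.encode('utf-8')                 # UTF-8 bytes 195 190, like a. i. u
latin1 = s.encode('latin-1')             # one extended-ASCII byte 254, like 254{a.
assert list(utf8) == [195, 190]
assert list(latin1) == [254]

# Treating the already-UTF-8 bytes as extended ASCII and encoding them
# again gives the double-encoded 195 131 194 190 seen in a.i.":u,U
double = utf8.decode('latin-1').encode('utf-8')
assert list(double) == [195, 131, 194, 190]
```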

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread PMA

I'm most concerned that the integrity of J's vocabulary's
internal relationships (cited yesterday by I-forget-whom)
not be compromised.

P

Eric Iverson wrote:

It is trivial to display/enter fancy single glyphs for J. At least as
trivial (which I personally don't consider to be trivial) as it is for
APL.

Those interested (and I am definitely not) should ignore this trivial
aspect of the problem and focus on the hard part which is the mapping
of single unicode glyphs to J primitives.

What are the glyphs for =. =: +. *. *: =: ? ?. etc.?

Perhaps if I saw a complete proposal for this glyph to J mapping that
had some community consensus, I wouldn't be so profoundly negative on
discussions like this.

Keep in mind that the APL folk paid enormous attention to the
appearance of the APL glyphs. A hodge podge of unicode glyphs from
random parts of unicode fonts would not please anyone who loved APL.

We have beaten this dead horse every year for 25 years. When will it
be out of its misery?


Something that people

On Thu, Feb 27, 2014 at 5:10 PM, Joe Bogner  wrote:

On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave  wrote:

Just my two cents worth...

As an old APL (occasional) programmer, I always wanted a way to flip a
switch in the J editor and turn J's 2-character primitives into APL
characters (where appropriate), and either leave J's unique verbs alone,
have the community decide on an appropriate single glyph, or let me pick a
symbol for those myself. Then I could always flip that switch in the editor
back, and see the actual J code, any time I wanted.



It seems like it would be straightforward to create a JHS editor and
viewer that toggles between the character sets. Could it just parse
the words and replace with the APL character from a lookup table? The
editor would replace back to ASCII before evaluation.


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Eric Iverson
It is trivial to display/enter fancy single glyphs for J. At least as
trivial (which I personally don't consider to be trivial) as it is for
APL.

Those interested (and I am definitely not) should ignore this trivial
aspect of the problem and focus on the hard part which is the mapping
of single unicode glyphs to J primitives.

What are the glyphs for =. =: +. *. *: =: ? ?. etc.?

Perhaps if I saw a complete proposal for this glyph to J mapping that
had some community consensus, I wouldn't be so profoundly negative on
discussions like this.

Keep in mind that the APL folk paid enormous attention to the
appearance of the APL glyphs. A hodge podge of unicode glyphs from
random parts of unicode fonts would not please anyone who loved APL.

We have beaten this dead horse every year for 25 years. When will it
be out of its misery?


Something that people

On Thu, Feb 27, 2014 at 5:10 PM, Joe Bogner  wrote:
> On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave  wrote:
>> Just my two cents worth...
>>
>> As an old APL (occasional) programmer, I always wanted a way to flip a
>> switch in the J editor and turn J's 2-character primitives into APL
>> characters (where appropriate), and either leave J's unique verbs alone,
>> have the community decide on an appropriate single glyph, or let me pick a
>> symbol for those myself. Then I could always flip that switch in the editor
>> back, and see the actual J code, any time I wanted.
>
>
> It seems like it would be straightforward to create a JHS editor and
> viewer that toggles between the character sets. Could it just parse
> the words and replace with the APL character from a lookup table? The
> editor would replace back to ASCII before evaluation.


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread PMA

A beautiful two cents worth!

Pete


Skip Cave wrote:

Just my two cents worth...

As an old APL (occasional) programmer, I always wanted a way to flip a
switch in the J editor and turn J's 2-character primitives into APL
characters (where appropriate), and either leave J's unique verbs alone,
have the community decide on an appropriate single glyph, or let me pick a
symbol for those myself. Then I could always flip that switch in the editor
back, and see the actual J code, any time I wanted.

For me, it was never about how many characters I had to type. It was about
what I saw, when I looked at the code. IMHO, the APL single glyphs just
made the functionality of programs much easier to grasp as I read through
them.

  If I am entering code and the switch was in APL mode, I could just type
the actual J 2-character primitives, and the one-character APL symbol would
appear on the screen.

When sending code around, I can always send the normal ASCII J
representation (like sending the compiled binaries of a program), and the
receiver of the code would have the option of looking at the J code in its
native form, or viewing the APL-like symbols.

I'm sure this plan has many (undiscovered by me) flaws, but it is my
dream...

Skip



Skip Cave
Cave Consulting LLC


On Thu, Feb 27, 2014 at 1:03 PM, Don Guinn  wrote:


This discussion started out on using APL characters as executable in J. I'm
not sure I would want to make many equivalences between APL symbols and J
primitives; however, representing APL characters and international
characters gets into the way J handles these characters with the character
types literal, unicode and UTF-8.

Those not interested should bail out now, as the rest is kind of boring,
but it's my soap-box.

About the time minicomputers and personal computers became common, 7-bit
ASCII was a well-established standard, but by then computers had
standardized on 8 bits per character. The extra bit allowed international
characters to be supported while still fitting in a byte. In addition,
APL used those extra characters to support APL symbols. But this led to
confusion, since those characters varied between countries and systems.

Unicode was created to attempt to clean this mess up. It took 7-bit
ASCII plus a fairly widely accepted 8-bit extended-ASCII set and widened
the code space up to 32 bits. Now there is all kinds of room to
support many languages in a compatible manner.

Enter UCS Transformation Format, in particular UTF-8. Plain Unicode has
problems: it makes ASCII files much larger and slower to send over slow
communication lines, and there is the endian issue between different
computers. UTF-8 is an ingenious technique for encoding Unicode in a
manner that is completely compatible with 7-bit ASCII, and the endian
problem is eliminated. It is not, however, compatible with 8-bit ASCII
extensions. 7-bit ASCII text looks identical to UTF-8 text; 8-bit
extended-ASCII text does not, as those characters become two bytes each
under the UTF-8 encoding.

J converts literal to unicode by simply putting a zero byte in front,
extending it to the 16-bit version of Unicode implemented in Windows
and Unix. This is perfectly valid, as the numeric values of the first 256
Unicode code points match the 8-bit ASCII extension. UTF-8 assumes that
_128{.a. characters in literal are used in the encoding, and that
they do not represent extended ASCII. But J stores UTF-8 as literal, making
it impossible to tell whether those characters represent extended ASCII or
UTF-8 byte sequences.

UTF-8 is a compressed version of Unicode that J fits in literal. J treats
literal as 8-bit extended ASCII when combining and converting to/from
unicode (wide). It treats literal as UTF-8 when entered from the keyboard
and displayed. Got a bit of an inconsistency here.

   U =: 7 u: u =: 'þ'

   3!:0 u   NB. u is literal
2
   3!:0 U   NB. U is unicode
131072
   #u   NB. u takes 2 atoms
2
   #U   NB. U takes 1 atom
1
   'abc',u  NB. ASCII literals catenate with UTF-8
abcþ
   'abc',U  NB. ASCII literals catenate with unicode
abcþ
   u,U  NB. UTF-8 literals do not catenate well with unicode
þþ
   a.i.u,U  NB. Here we have þ in two forms
195 190 254

So, when programming in J one must never mix UTF-8 and unicode without
being extremely careful and aware of what can happen. It is easiest to use
ASCII and UTF-8 together. Not a problem as one cannot get any unicode into
J without specifically converting to unicode using u: .

The alternative is to make sure all text that might contain UTF-8 is
converted to unicode. That can be difficult at times.

The trouble with mixing ASCII and UTF-8 is that J primitives work on the
atoms of literal, and any UTF-8 is treated as 8-bit extended ASCII. Counting
characters and reshaping fail with UTF-8, and searching for UTF-8 characters
is harder. One example of character counting failing with UTF-8 is the
display of boxed literals.

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Joe Bogner
On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave  wrote:
> Just my two cents worth...
>
> As an old APL (occasional) programmer, I always wanted a way to flip a
> switch in the J editor and turn J's 2-character primitives into APL
> characters (where appropriate), and either leave J's unique verbs alone,
> have the community decide on an appropriate single glyph, or let me pick a
> symbol for those myself. Then I could always flip that switch in the editor
> back, and see the actual J code, any time I wanted.


It seems like it would be straightforward to create a JHS editor and
viewer that toggles between the character sets. Could it just parse
the words and replace with the APL character from a lookup table? The
editor would replace back to ASCII before evaluation.
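The lookup-table idea could be prototyped along these lines. This Python sketch uses an invented, incomplete J-to-APL table (any real mapping would need the community consensus discussed elsewhere in this thread); a production version should tokenize with J's ;: rather than plain substring replacement, which can corrupt the contents of strings and comments:

```python
# Hypothetical J-to-APL glyph table; the entries are illustrative only.
J2APL = {'i.': '⍳', '#': '≢', '$': '⍴', '%': '÷'}
APL2J = {apl: j for j, apl in J2APL.items()}

def to_apl(src):
    """Display transform: J spellings -> single APL glyphs."""
    # replace longer J spellings first so 'i.' wins over shorter tokens
    for j, apl in sorted(J2APL.items(), key=lambda kv: -len(kv[0])):
        src = src.replace(j, apl)
    return src

def to_j(src):
    """Inverse transform back to plain ASCII J before evaluation."""
    for apl, j in APL2J.items():
        src = src.replace(apl, j)
    return src
```

The round trip is the key property: toggling to APL glyphs and back must reproduce the original J source exactly.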


Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Raul Miller
I've seen about two dozen schemes for representing APL as ASCII. I
imagine we could do the same thing in reverse?

Thanks,

-- 
Raul

On Thu, Feb 27, 2014 at 4:56 PM, Skip Cave  wrote:
> Just my two cents worth...
>
> As an old APL (occasional) programmer, I always wanted a way to flip a
> switch in the J editor and turn J's 2-character primitives into APL
> characters (where appropriate), and either leave J's unique verbs alone,
> have the community decide on an appropriate single glyph, or let me pick a
> symbol for those myself. Then I could always flip that switch in the editor
> back, and see the actual J code, any time I wanted.
>
> For me, it was never about how many characters I had to type. It was about
> what I saw, when I looked at the code. IMHO, the APL single glyphs just
> made the functionality of programs much easier to grasp as I read through
> them.
>
>  If I am entering code and the switch was in APL mode, I could just type
> the actual J 2-character primitives, and the one-character APL symbol would
> appear on the screen.
>
> When sending code around, I can always send the normal ASCII J
> representation (like sending the compiled binaries of a program), and the
> receiver of the code would have the option of looking at the J code in its
> native form, or viewing the APL-like symbols.
>
> I'm sure this plan has many (undiscovered by me) flaws, but it is my
> dream...
>
> Skip
>
>
>
> Skip Cave
> Cave Consulting LLC
>
>
> On Thu, Feb 27, 2014 at 1:03 PM, Don Guinn  wrote:
>
>> This discussion started out on using APL characters as executable in J. I'm
>> not sure I would want to make many equivalences between APL symbols and J
>> primitives; however, representing APL characters and international
>> characters gets into the way J handles these characters with the character
>> types literal, unicode and UTF-8.
>>
>> Those not interested bail out now as the rest is kind of boring, but my
>> soap-box.
>>
>> About the time minicomputers and personal computers became common, 7-bit
>> ASCII was a well-established standard, but by then computers had
>> standardized on 8 bits per character. The extra bit allowed international
>> characters to be supported while still fitting in a byte. In addition,
>> APL used those extra characters to support APL symbols. But this led to
>> confusion, since those characters varied between countries and systems.
>>
>> Unicode was created to attempt to clean this mess up. It took the 7-bit
>> ASCII and a fairly accepted version of the 8-bit version of extended ASCII
>> and added leading zeros up to 32 bits. Now there is all kinds of room to
>> support many languages in a compatible manner.
>>
>> Enter UCS Transformation Format, in particular UTF-8. There are many
>> problems with Unicode as it made ASCII files much larger and take longer to
>> send over slow communications lines. And there is the endian issue between
>> different computers. UTF-8 is an ingenious technique to compress unicode in
>> a manner that is completely compatible with 7-bit ASCII. The endian problem
>> is eliminated. It is not compatible with 8-bit ASCII extensions. 7-bit
>> ASCII text looks identical to UTF-8 text. The 8-bit ASCII extensions text
>> does not. Those characters become two bytes each using the UTF-8
>> compression algorithm.
>>
>> J converts literal to unicode by simply putting a zero byte in front
>> extending it to the 16-bit version of Unicode implemented in Windows
>> and Unix. This is perfectly valid as the numeric values of the first 256
>> Unicode letters match the 8-bit ASCII extension. UTF-8 assumes that
>> _128{.a. characters in literal are used in the compression algorithm. That
>> they do not represent extended ASCII. But J treats UTF-8 as literal making
>> it impossible to tell if those characters represent extended ASCII or UTF-8
>> compression.
>>
>> UTF-8 is a compressed version of Unicode that J fits in literal. J treats
>> literal as 8-bit extended ASCII when combining and converting to/from
>> unicode (wide). It treats literal as UTF-8 when entered from the keyboard
>> and displayed. Got a bit of an inconsistency here.
>>
>>U =: 7 u: u =: 'þ'
>>
>>3!:0 u   NB. u is literal
>>
>> 2
>>
>>3!:0 U   NB. U is unicode
>>
>> 131072
>>
>>#u   NB. u takes 2 atoms
>>
>> 2
>>
>>#U   NB. U takes 1 atom
>>
>> 1
>>
>>'abc',u  NB. ASCII literals catenate with UTF-8
>>
>> abcþ
>>
>>'abc',U  NB. ASCII literals catenate with unicode
>>
>> abcþ
>>
>>u,U  NB. UTF-8 literals do not catenate well with unicode
>>
>> þþ
>>
>>a.i.u,U  NB. Here we have þ in two forms
>>
>> 195 190 254
>>
>> So, when programming in J one must never mix UTF-8 and unicode without
>> being extremely careful and aware of what can happen. It is easiest to use
>> ASCII and UTF-8 together. Not a problem as one cannot get any unicode into
>> J without specifically converting to unicode using u: .
>>
>> The alternative is to make sure all text tha

Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Skip Cave
Just my two cents worth...

As an old (occasional) APL programmer, I always wanted a way to flip a
switch in the J editor and turn J's 2-character primitives into APL
characters (where appropriate), and either leave J's unique verbs alone,
have the community decide on an appropriate single glyph, or let me pick
symbols for those myself. Then I could flip that switch back and see the
actual J code any time I wanted.

For me, it was never about how many characters I had to type. It was about
what I saw, when I looked at the code. IMHO, the APL single glyphs just
made the functionality of programs much easier to grasp as I read through
them.

If I am entering code with the switch in APL mode, I could just type the
actual J 2-character primitives, and the one-character APL symbol would
appear on the screen.

When sending code around, I could always send the normal ASCII J
representation (like sending the compiled binaries of a program), and the
receiver of the code would have the option of looking at the J code in its
native form, or viewing the APL-like symbols.

I'm sure this plan has many (undiscovered by me) flaws, but it is my
dream...

Skip



Skip Cave
Cave Consulting LLC



Re: [Jchat] APL character support (moved to Chat)

2014-02-27 Thread Raul Miller
On Thu, Feb 27, 2014 at 2:03 PM, Don Guinn  wrote:
> J converts literal to unicode by simply putting a zero byte in front
> extending it to the the 16-bit version of Unicode implemented in Windows
> and Unix.

Not always. For example:

   7 u: 254{a.
|domain error

http://www.jsoftware.com/help/dictionary/duco.htm is fairly specific
about what conversions are supported. The conversion you mention is
one of them, but it's not the only supported conversion.
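The same constraint can be checked outside J. As a hedged illustration in
Python (an analogy, not J itself): the byte 254 (0xFE) can never begin a
valid UTF-8 sequence, so any conversion that interprets it as UTF-8 has to
reject it, much like the domain error above.

```python
# Byte 254 (0xFE) is never a legal start byte in UTF-8, so decoding it
# as UTF-8 fails -- analogous to J's domain error for 7 u: 254{a.
try:
    bytes([254]).decode("utf-8")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)
```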

Thanks,

-- 
Raul
--
For information about J forums see http://www.jsoftware.com/forums.htm


[Jchat] APL character support (moved to Chat)

2014-02-27 Thread Don Guinn
This discussion started out on using APL characters as executable in J. I'm
not sure I would want to make many equivalences between APL symbols and J
primitives; however, representing APL characters and international
characters gets into the way J handles these characters with the character
types literal, unicode and UTF-8.

Those not interested bail out now as the rest is kind of boring, but my
soap-box.

About the time minicomputers and personal computers became common, 7-bit
ASCII was a well-established standard, but by then computers had
standardized on 8 bits per character. The extra bit allowed international
characters to be supported while still fitting in a byte. In addition, APL
used those extra characters for its symbols. But this led to confusion,
since the extended characters varied between countries and systems.

Unicode was created to clean this mess up. It took the 7-bit ASCII set and
a widely accepted 8-bit extended-ASCII set (Latin-1) and added leading
zeros, up to 32 bits. Now there is all kinds of room to support many
languages in a compatible manner.

Enter the UCS Transformation Formats, in particular UTF-8. Plain Unicode
has problems: it makes ASCII files much larger and slower to send over slow
communication lines, and there is the endian issue between different
computers. UTF-8 is an ingenious technique for encoding Unicode in a manner
that is completely compatible with 7-bit ASCII, and the endian problem is
eliminated. It is not, however, compatible with the 8-bit ASCII extensions:
7-bit ASCII text looks identical as UTF-8, but text using the 8-bit
extensions does not, because those characters become two bytes each under
the UTF-8 encoding.
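As a hedged illustration in Python (an analogy, not J): pure ASCII text is
byte-for-byte identical under UTF-8, while an extended-ASCII character such
as þ is one byte in Latin-1 but two bytes in UTF-8.

```python
# 7-bit ASCII is unchanged by UTF-8; an extended-ASCII (Latin-1)
# character becomes two bytes.
ascii_text = "abc"
extended = "þ"  # U+00FE, code point 254, in the 8-bit extension range

assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")
assert len(extended.encode("latin-1")) == 1  # one byte as extended ASCII
assert len(extended.encode("utf-8")) == 2    # two bytes under UTF-8
print(list(extended.encode("utf-8")))        # [195, 190]
```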

J converts literal to unicode by simply putting a zero byte in front of
each character, extending it to the 16-bit form of Unicode implemented on
Windows and Unix. This is perfectly valid, as the numeric values of the
first 256 Unicode code points match the 8-bit ASCII extension. UTF-8, on
the other hand, assumes that the _128{.a. characters in a literal are part
of multi-byte sequences, not extended ASCII. But J stores UTF-8 in
literals, making it impossible to tell whether those characters represent
extended ASCII or UTF-8 sequences.
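This ambiguity is easy to demonstrate in Python (again an analogy, not J):
the same two bytes are valid both as one UTF-8 character and as two
extended-ASCII characters, so the byte string alone cannot tell you which
was meant.

```python
# The identical byte string decodes two different ways depending on
# which interpretation you assume.
data = bytes([195, 190])        # what J would hold in a literal
print(data.decode("utf-8"))     # þ   (one character, UTF-8 reading)
print(data.decode("latin-1"))   # Ã¾  (two characters, extended-ASCII reading)
```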

UTF-8 is an encoded form of Unicode that J fits in literal. J treats
literal as 8-bit extended ASCII when combining and when converting to/from
unicode (wide), but treats literal as UTF-8 when text is entered from the
keyboard and displayed. Got a bit of an inconsistency here.

   U =: 7 u: u =: 'þ'
   3!:0 u   NB. u is literal
2
   3!:0 U   NB. U is unicode
131072
   #u   NB. u takes 2 atoms
2
   #U   NB. U takes 1 atom
1
   'abc',u  NB. ASCII literals catenate with UTF-8
abcþ
   'abc',U  NB. ASCII literals catenate with unicode
abcþ
   u,U  NB. UTF-8 literals do not catenate well with unicode
þþ
   a.i.u,U  NB. Here we have þ in two forms
195 190 254

So, when programming in J one must never mix UTF-8 and unicode without
being extremely careful and aware of what can happen. It is easiest to use
ASCII and UTF-8 together. Not a problem as one cannot get any unicode into
J without specifically converting to unicode using u: .

The alternative is to make sure all text that might contain UTF-8 is
converted to unicode. That can be difficult at times.

The trouble with mixing ASCII and UTF-8 is that J primitives work on the
atoms of a literal, and any UTF-8 bytes are treated as 8-bit extended
ASCII. Counting characters and reshaping fail with UTF-8, and searching for
UTF-8 characters is harder. An example of character counting failing with
UTF-8 is the display of boxed literals.
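The counting failure can be sketched in Python (an analogy, not J): when
text containing a UTF-8 character is handled as raw bytes, the atom count
no longer matches the character count, which is exactly what throws off
width calculations such as boxed display.

```python
# Counting bytes of the UTF-8 form overstates the character count.
s = "þorn"                 # 4 characters
raw = s.encode("utf-8")    # 5 bytes: þ takes two
print(len(s), len(raw))    # 4 5
```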
