vote no - Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-06 Thread David Dyck

The first message had many of the following characters viewable in my
telnet window, but the repost introduced a 0xC2 prefix to the 0xA7 character.
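Below is a minimal Python 3 sketch (standard library only) that shows where that 0xC2 comes from. U+00A7 SECTION SIGN is the single byte 0xA7 in Latin-1 but the two bytes 0xC2 0xA7 in UTF-8. A terminal that reads the UTF-8 bytes as Latin-1 therefore shows a stray "Â" in front of the "§".

```python
# Encode the section sign as UTF-8 and inspect the bytes.
section = "\u00a7"  # SECTION SIGN, Latin-1 code 167 (0xA7)
utf8_bytes = section.encode("utf-8")
print(utf8_bytes.hex(" "))  # c2 a7

# A Latin-1 terminal maps each byte to one character,
# so the UTF-8 lead byte 0xC2 shows up as an extra "Â".
as_seen_in_latin1 = utf8_bytes.decode("latin-1")
print(as_seen_in_latin1)  # Â§
```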

I have this feeling that many people would vote against posting all these
funny characters, as it does make reading the perl6 mailing lists difficult
in some contexts.  Ever since these UTF-8 characters above 127 were
introduced into this mailing list, I can never be sure of what the posting
author intended to send.  I'm all for supporting UTF-8 characters in strings,
and perhaps even in variable names, but do we really have to have
perl6 programs with core operators in UTF-8?  I'd like to see all
the perl6 code that has UTF-8 operators start with "use non_portable_utf8_operators;".

As it stands now, I'm going to have to find new tools for my Linux platform,
which has been performing fine since 1995 (perl5.9 still supports libc5!),
and I don't yet know how I am going to be able to telnet in from Win98.
I'll bet that the DOS Kermit I use when I dial up won't support UTF-8
characters either.

 David

P.S.

I just read how many people will need to upgrade their operating systems
if they want to upgrade to MS Word 11.

Do we want to require operating systems and/or many support tools to
be upgraded before we can share perl6 scripts via email?


On Tue, 5 Nov 2002 at 09:56 -0800, Michael Lazzaro [EMAIL PROTECTED] wrote:

  Code  Symbol  Comment
  167 §  Could be used
  169 ©  Could be used
  171 «  May well be used
  172 ¬  Not?
  174 ®  Could be used
  176 °  Could be used
  177 ±  Introduces an interesting level of uncertainty?  Useable
  181 µ  Could be used
  182 ¶  Could be used
  186 º  Could be used (but I dislike it as it is alphabetic)
  187 »  May well be used
  191 ¿  Could be used




Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Dan Kogai
On Tuesday, Nov 5, 2002, at 04:58 Asia/Tokyo, Larry Wall wrote:
> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil,
> though...

Which 'yen'?  I believe you already know that \ (U+005C REVERSE SOLIDUS)
is printed as a yen figure on most Japanese platforms, so yen is
already everywhere :)

One big problem with introducing Unicode operators is that there are too
many symbols that look the same but have different code points (the Unicode
Consortium did so to keep its capitalist members happy, so that the
proprietary symbols in their legacy codes are preserved).  Therefore I
object to the idea of making Unicode operators "standard", however
advanced a particular operator would be.  At the same time, things
like "use (more) operators => taste;" are very welcome, i.e.

	use operators => "smooth";
	$hashref = ♀%hash;          # U+2640 FEMALE SIGN
	$value   = $hashref♂{key};  # U+2642 MALE SIGN
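Python's standard `unicodedata` module makes the look-alike problem concrete. The character pairs below are illustrative picks, not ones named in the thread. The sketch shows that each pair consists of distinct code points, that NFKC normalization folds these particular pairs together, and that the "yen backslash" is a font convention that normalization does not touch.

```python
import unicodedata

# Pairs of characters that look alike in many fonts but are distinct
# code points (illustrative picks, not taken from the thread).
LOOKALIKES = [("\u00a5", "\uffe5"), ("\u2126", "\u03a9")]

for a, b in LOOKALIKES:
    print(f"U+{ord(a):04X} {unicodedata.name(a)}  vs  "
          f"U+{ord(b):04X} {unicodedata.name(b)}")
    # Plain string comparison treats them as different characters.
    print("  equal as strings:", a == b)
    # NFKC normalization maps both members of each pair to one code point.
    print("  equal after NFKC:",
          unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b))

# The Japanese "yen" backslash is a font convention, not a code point
# issue: normalization leaves U+005C alone.
print(unicodedata.normalize("NFKC", "\\") == "\u00a5")  # False
```

A language that accepted such operators would have to choose whether to compare them raw or after normalization. The example shows that either choice leaves some look-alikes unresolved.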

> People who believe slippery slope arguments should never go skiing.

I don't want perl6 to be as "tough" as skiing, though.

> On the other hand, even the useful slippery slopes have "beginner"
> slopes.  I think one advantage of using Unicode for advanced features
> is that it *looks* scary.  So in general we should try to keep the
> basic features in ASCII, and only use Unicode where there be dragons.

Heck.  We already have source filters in perl5, and I'm pretty sure
someone will just invent yet another 'use operators => "ascii";' kind
of stuff in perl6.  I thought "use English" was already enough.

> It will certainly be possible to write APL in Perl, but if you do,
> you'll get what you deserve.

And even APL has J.  Methinks the question now is whether you make APL
out of J or J out of APL.

弾 the ♂ with Too Many Symbols to Deal With

P.S.  Here is an even wilder idea than Unicode operators.  Why don't we
just make perl6 XML-based and allow inline objects to be operators?

<perl>
$two = $one <operator src="plus.png"> $one;
</perl>

Yuck!


Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Richard Proctor
This UTF discussion has got silly.

I am sitting at a computer that is operating in native Latin-1 and is
quite happy - there is no likelihood that UTF* will ever reach it.

The guillemets are coming through fine, but most of the other hieroglyphs
leave a lot to be desired.

Let's consider the coding comparisons.

Chars in the range 128-159 are not defined in Latin-1 (issue 1) and are
used differently by Windows than by Latin-1 (later issues), so they should
be avoided.

Chars in the range 160-191 (which include the guillemets) are coming through
fine if encoded by the sender as UTF-8.

Anything in the range 192-255 is encoded differently and thus should be
avoided.

Therefore the only additional characters that could be used, and that will
work under UTF-8, Latin-1 and Windows, are:

Code Symbol Comment
160 Non-breaking space (map to normal whitespace)
161 ¡   Could be used
162 ¢   Could be used
163 £   Could be used
164 ¤   Could be used
165 ¥   Could be used
166 ¦   Could be used
167 §   Could be used
168 ¨   Could be used, though risks confusion with
169 ©   Could be used
170 ª   Could be used (but I dislike it as it is alphabetic)
171 «   May well be used
172 ¬   Not?
173 ­   Nonbreaking - treat as the same
174 ®   Could be used
175 ¯   May cause confusion with _ and -
176 °   Could be used
177 ±   Introduces an interesting level of uncertainty?  Useable
178 ²   To the power of 2 (squaring?) Otherwise best avoided
179 ³   Cubing? Otherwise best avoided
180 ´   Too confusing with ' and `
181 µ   Could be used
182 ¶   Could be used
183 ·   Dot Product? though likely to be confused with .
184 ¸   treat as ,
185 ¹   To the power 1? Probably best avoided
186 º   Could be used (but I dislike it as it is alphabetic)
187 »   May well be used
188 ¼   Could be used
189 ½   Could be used
190 ¾   Could be used
191 ¿   Could be used
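One plausible reading of Richard's split follows from how UTF-8 encodes U+00A0 to U+00FF. Every such character becomes two bytes. For codes 160 to 191 the second byte equals the Latin-1 code, so a Latin-1 reader sees the intended glyph behind a stray lead byte (the 0xC2 David noticed). For 192 to 255 the second byte differs. Here is a small Python 3 check of that claim, using only built-in codecs:

```python
# For each Latin-1 code in 160-255, compare the last UTF-8 byte of that
# character with its single Latin-1 byte.
def utf8_tail_matches_latin1(code: int) -> bool:
    """True if the final UTF-8 byte equals the Latin-1 byte for `code`."""
    encoded = chr(code).encode("utf-8")
    return encoded[-1] == code

matching = [c for c in range(160, 256) if utf8_tail_matches_latin1(c)]
print(matching[0], matching[-1], len(matching))  # 160 191 32

# Show the UTF-8 bytes and how a Latin-1 viewer would render them.
for code in (167, 171, 187, 192, 233):
    raw = chr(code).encode("utf-8")
    print(code, raw.hex(" "), repr(raw.decode("latin-1")))
```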

Richard 

-- 
Personal     [EMAIL PROTECTED]  http://www.waveney.org
Telecoms     [EMAIL PROTECTED]  http://www.WaveneyConsulting.com
Web services [EMAIL PROTECTED]  http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst  Services




Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Michael Lazzaro
Thanks, I've been hoping for someone to post that list.  Taking it one 
step further, we can assume that the only chars that can be used are 
those which:

-- don't have an obvious meaning that needs to be reserved
-- appear decently on all platforms
-- are distinct and recognizable in the tiny font sizes
 used when programming

Comparing your list with mine, with some subjective editing based on my 
small courier font, that chops the list of usable operators down to 
only a handful:

Code	Symbol	Comment
167	§	Could be used
169	©	Could be used
171	«	May well be used
172	¬	Not?
174	®	Could be used
176	°	Could be used
177	±	Introduces an interesting level of uncertainty?  Useable
181	µ	Could be used
182	¶	Could be used
186	º	Could be used (but I dislike it as it is alphabetic)
187	»	May well be used
191	¿	Could be used


That's all.  A shame, because some of the others have very interesting 
possibilities:

   • ≠ ø † ∑ ∂ ƒ ∆ ≤ ≥ ∫ ≈ Ω ‡ ± ˇ ∏ Æ

But if Windows can't easily do them, that's a pretty big problem.  
Thanks for the list.

MikeL



Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Jonathan Scott Duff

I'm all for one or two Unicode operators if they're chosen properly
(and I trust Larry to do that since he's done a stellar job so far),
but what's the mechanism to generate Unicode operators if you don't
have access to a Unicode-aware editor/terminal/font/etc.?  Is the only
recourse to use the named versions?  Or will there be some sort of
digraph/trigraph/whatever sequence that always gives us the operator
we need?  Something like \x[263a], but in regular code and not just
quote-ish contexts:

$campers = $a \x[263a] $b   # make $a and $b happy
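Nothing like this was specified in the thread. The following is a purely hypothetical sketch of the kind of source-level rewrite Scott describes: it turns `\x[263a]` into U+263A before the code is parsed. The function name `expand_hex_escapes` is our own. This naive version also rewrites escapes inside strings and comments, which a real implementation would have to handle.

```python
import re

# Hypothetical preprocessor: rewrite \x[HEX] escapes that appear in
# ordinary code (not just in strings) into the characters they name.
ESCAPE = re.compile(r"\\x\[([0-9A-Fa-f]+)\]")

def expand_hex_escapes(source: str) -> str:
    """Replace each \\x[HEX] with the character chr(int(HEX, 16))."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

line = r"$campers = $a \x[263a] $b   # make $a and $b happy"
print(expand_hex_escapes(line))  # the escape becomes U+263A WHITE SMILING FACE
```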

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Smylers
Dan Kogai wrote:

> We already have source filters in perl5 and I'm pretty much sure
> someone will just invent yet another 'use operators => ascii;' kind
> of stuff in perl6.

I think it's backwards to have operators be funny characters by
default while requiring an explicit declaration to use well-known Ascii
characters.

Doing it t'other way round would mean that you can always write fully
portable code fragments in pure Ascii, something that'd be helpful on
mailing lists and the like.

There could be an alias syntax for people in an environment where they'd
prefer to have a non-Ascii character in place of a conglomerate of Ascii
symbols, maybe:

  treat '»...«' as '[...]';

That has the documentational advantage that any non-Ascii character used
in code must be declared earlier in that file.  And even if the
non-Ascii character gets warped in the post and displays oddly for you,
you can still see what the author intended it to do.
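Smylers' declaration idea amounts to a per-file lookup table. Here is a minimal sketch that assumes the declarations have already been collected into a dictionary. `ALIASES`, `to_ascii` and `undeclared` are hypothetical names, not a proposed Perl 6 API, and `»+«` is only an example spelling. A tool built this way could rewrite the fancy spelling to ASCII and flag any non-ASCII character that was never declared.

```python
# Hypothetical alias table in the spirit of  treat '»...«' as '[...]';
# Each entry maps a non-ASCII spelling to its ASCII equivalent.
ALIASES: dict[str, str] = {
    "»": "[",
    "«": "]",
}

def to_ascii(source: str, aliases: dict[str, str]) -> str:
    """Rewrite declared non-ASCII operator characters to ASCII."""
    out = source
    for fancy, plain in aliases.items():
        out = out.replace(fancy, plain)
    return out

def undeclared(source: str, aliases: dict[str, str]) -> set[str]:
    """Non-ASCII characters used in `source` without a declaration."""
    return {ch for ch in source if ord(ch) > 127 and ch not in aliases}

code = "@sums = @a »+« @b;"
print(to_ascii(code, ALIASES))          # @sums = @a [+] @b;
print(undeclared("$x ± $y", ALIASES))   # {'±'}
```

Because every replacement is plain ASCII, the result is the fully portable form that Smylers wants to be able to post to a mailing list.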

This has the risk that Damian described of everybody defining their own
operators, but I think that's unlikely.  There's likely to be a
convention used by many people, at least those who operate in a given
character set.  This way also permits those who live in a Latin 2 (or
whatever) world to have their own convention using characters that make
sense to them.

Smylers



Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Smylers
Richard Proctor wrote:

> I am sitting at a computer that is operating in native Latin-1 and is
> quite happy - there is no likelyhood that UTF* is ever likely to reach
> it.
>
> ... Therefore the only addition characters that could be used, that
> will work under UTF8 and Latin-1 and Windows ...

What about people who don't use Latin-1, perhaps because their native
language uses Latin-2 or some other character set mutually exclusive
with Latin-1?

I don't have a Latin-2 ('Central and East European languages') typeface
handy, but its manpage includes:

  Oct   Dec   Hex Description
  253   171   AB LATIN CAPITAL LETTER T WITH CARON
  273   187   BB LATIN SMALL LETTER T WITH CARON

Caron is sadly missing from my dictionary so I'm not sure what those
would look like, but I suspect they wouldn't be great symbols for vector
operators.
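Python's built-in codecs reproduce the clash. The bytes that Latin-1 shows as guillemets decode under Latin-2 (ISO 8859-2) as T and t with caron, the small "v"-shaped accent ˇ. `glyph` is just a local helper name for this sketch.

```python
import unicodedata

def glyph(byte: int, codec: str) -> str:
    """Decode a single byte using the given 8-bit codec."""
    return bytes([byte]).decode(codec)

# Same bytes (decimal 171 and 187), two ISO 8859 parts.
for codec in ("iso8859_1", "iso8859_2"):
    for byte in (0xAB, 0xBB):
        ch = glyph(byte, codec)
        print(f"{codec}  0x{byte:02X}  {ch}  {unicodedata.name(ch)}")
```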

> 171   «   May well be used

Also, I wonder how similar guillemets would look to doubled less-than or
greater-than signs.  In this font they're fine, but I'm concerned about
my ability to make them sufficiently distinguishable on a whiteboard,
and whether publishers will cope with them (compare a recent discussion
on 'use Perl' regarding curly quotes and fi ligatures appearing in
code samples).

Smylers



Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Richard Proctor
On Tue 05 Nov, Smylers wrote:
> Richard Proctor wrote:
>
>> I am sitting at a computer that is operating in native Latin-1 and is
>> quite happy - there is no likelyhood that UTF* is ever likely to reach
>> it.
>>
>> ... Therefore the only addition characters that could be used, that
>> will work under UTF8 and Latin-1 and Windows ...
>
> What about people who don't use Latin-1, perhaps because their native
> language uses Latin-2 or some other character set mutually exclusive
> with Latin-1?


Once you go beyond Latin-1 there is nothing in common anyway.  The guillemets
become T and t with inverted hats under Latin-2, oe and G with an inverted
hat under Latin-3, oe and G with a squiggle under it under Latin-4, no
meaning and a stylised K for Latin-5 (can't find Latin-6), guillemets under
Latin-7, and nothing under Latin-8.

Richard





Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]

2002-11-05 Thread Michael Lazzaro

As one of the instigators of this thread, I submit that we've probably 
argued about the Unicode stuff enough.  The basic issues are now known, 
and it's known that there's no general agreement on any of this stuff, 
nor will there ever be.  To wit:

-- Extended glyphs might be extremely useful in extending the operator
table in non-ambiguous ways, especially for advanced things like «op».

-- Many people loathe the idea, and predict newcomers will too.

-- Many mailers and older platforms tend to react badly, both for viewing
and for inputting.

-- If extended characters are used at all, the decision needs to be
made whether they shall be least-common-denominator Latin-1, UTF-8, or
full Unicode, and whether there will be backup spellings so that everyone
can play.

It's up to Larry, and he knows where we're all coming from.  Unless 
anyone has any _new_ observations, I propose we pause the debate until 
a decision is reached?

MikeL