vote no - Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
The first message had many of the following characters viewable in my telnet window, but the repost introduced a 0xC2 prefix to the 0xA7 character. I have a feeling that many people would vote against posting all these funny characters, as it does make reading the perl6 mailing lists difficult in some contexts. Ever since these UTF-8 characters above 127 were introduced into this mailing list, I can never be sure what the posting author intended to send.

I'm all for supporting UTF-8 characters in strings, and perhaps even in variable names, but do we really have to have perl6 programs with core operators in UTF-8? I'd like to see all perl6 code that has UTF-8 operators start with 'use non_portable_utf8_operators;'.

As it stands now, I'm going to have to find new tools for my Linux platform, which has been performing fine since 1995 (perl5.9 still supports libc5!). I don't yet know how I am going to be able to telnet in from Win98, and I'll bet that the DOS Kermit I use when I dial up won't support UTF-8 characters either.

David

ps. I just read how many people will need to upgrade their operating systems if they want to upgrade to MS Word 11. Do we want to require the operating system and/or many support tools to be upgraded before we can share perl6 scripts via email?

On Tue, 5 Nov 2002 at 09:56 -0800, Michael Lazzaro [EMAIL PROTECTED] wrote:
> Code  Symbol  Comment
> 167   §       Could be used
> 169   ©       Could be used
> 171   «       May well be used
> 172   ¬       Not?
> 174   ®       Could be used
> 176   °       Could be used
> 177   ±       Introduces an interesting level of uncertainty? Usable
> 181   µ       Could be used
> 182   ¶       Could be used
> 186   º       Could be used (but I dislike it as it is alphabetic)
> 187   »       May well be used
> 191   ¿       Could be used
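The 0xC2 prefix David saw is how UTF-8 encodes every character from U+0080 to U+00BF: two bytes, with 0xC2 as the lead byte. A terminal that assumes Latin-1 shows that lead byte as a stray "Â". A minimal sketch using only Python's built-in codecs:

```python
# Sketch: the SECTION SIGN (U+00A7) as Latin-1 vs UTF-8 bytes, and what a
# terminal that assumes Latin-1 makes of the UTF-8 form.
section = "\u00a7"  # the section sign

latin1_bytes = section.encode("latin-1")  # single byte 0xA7
utf8_bytes = section.encode("utf-8")      # two bytes: 0xC2 0xA7

# A Latin-1 terminal maps each byte to one character, so the UTF-8
# lead byte 0xC2 appears as a stray U+00C2 in front of the section sign.
as_seen_on_latin1_terminal = utf8_bytes.decode("latin-1")

print(latin1_bytes.hex(), utf8_bytes.hex())  # a7 c2a7
```

The same doubling happens for every non-ASCII character in a post, which is why a Latin-1-only toolchain sees "extra" characters rather than simply missing ones.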
Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tuesday, Nov 5, 2002, at 04:58 Asia/Tokyo, Larry Wall wrote:
> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil,
> though...

Which 'yen'? I believe you already know that \ (U+005C REVERSE SOLIDUS) is printed as a yen figure on most Japanese platforms, so yen is already everywhere :)

One big problem with introducing Unicode operators is that there are too many symbols that look the same but have different code points (the Unicode Consortium did this to keep its capitalist members happy, so that the proprietary symbols in their legacy codes are preserved). Therefore I object to the idea of making any Unicode operator "standard", however advanced that particular operator would be. At the same time, things like "use (more) operators = taste;" are very welcome, i.e.

    use operators = "smooth";
    $hashref = ♀%hash;          # U+2640 FEMALE SIGN
    $value   = $hashref♂{key};  # U+2642 MALE SIGN

> People who believe slippery slope arguments should never go skiing.

I don't want perl6 to be as "tough" as skiing, though.

> On the other hand, even the useful slippery slopes have "beginner"
> slopes. I think one advantage of using Unicode for advanced features
> is that it *looks* scary. So in general we should try to keep the
> basic features in ASCII, and only use Unicode where there be dragons.

Heck. We already have source filters in perl5, and I'm pretty much sure someone will just invent yet another 'use operators = "ascii";' kind of stuff in perl6. I thought "use English" was already enough.

> It will certainly be possible to write APL in Perl, but if you do,
> you'll get what you deserve.

And even APL has J. Methinks the question now is whether you make APL out of J or J out of APL.

弾 the ♂ with Too Many Symbols to Deal With

P.S. Here is an even wilder idea than Unicode operators. Why don't we just make perl6 XML-based and allow inline objects to be operators?

<perl>
$two = $one <operator src="plus.png"> $one;
</perl>

Yuck!
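Dan's point about look-alike code points is easy to demonstrate with Python's standard `unicodedata` module. The yen sign exists at least twice in Unicode (U+00A5 and the full-width U+FFE5 kept for East Asian legacy sets), and the two only compare equal after compatibility normalization (NFKC). Japanese mail of the time was typically sent as ISO-2022-JP, so the sketch also decodes the escape sequence for the female sign. Whether a Perl 6 parser would normalize source text at all is an open assumption here, not something this thread settled.

```python
import unicodedata

# Two distinct code points that both render as a yen sign.
yen = "\u00a5"            # YEN SIGN (also byte 0xA5 in Latin-1)
fullwidth_yen = "\uffe5"  # FULLWIDTH YEN SIGN (from East Asian legacy sets)

for ch in (yen, fullwidth_yen, "\\"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# They compare unequal until compatibility normalization folds them.
print(yen == fullwidth_yen)                                 # False
print(unicodedata.normalize("NFKC", fullwidth_yen) == yen)  # True

# ISO-2022-JP: ESC $ B switches to JIS X 0208, "!j" encodes the female
# sign, and ESC ( B switches back to ASCII.
raw = b"\x1b$B!j\x1b(B"
female = raw.decode("iso2022_jp")
print(f"U+{ord(female):04X} {unicodedata.name(female)}")
```

A reader whose mail client skips the ISO-2022-JP decoding step sees the escape bytes as literal text, which is one more way the same operator can arrive looking different.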
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
This UTF discussion has got silly. I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* will ever reach it. The guillemets are coming through fine, but most of the other hieroglyphs leave a lot to be desired.

Let's consider the coding comparisons. Chars in the range 128-159 are not defined in Latin-1 (issue 1) and are used differently by Windows than by Latin-1 (later issues), so they should be avoided. Chars in the range 160-191 (which include the guillemets) are coming through fine if encoded by the sender as UTF8. Anything in the range 192-255 is encoded differently and thus should be avoided.

Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows, are:

Code  Symbol  Comment
160           Non-breaking space (map to normal whitespace)
161   ¡       Could be used
162   ¢       Could be used
163   £       Could be used
164   ¤       Could be used
165   ¥       Could be used
166   ¦       Could be used
167   §       Could be used
168   ¨       Could be used, though risks confusion with
169   ©       Could be used
170   ª       Could be used (but I dislike it as it is alphabetic)
171   «       May well be used
172   ¬       Not?
173           Nonbreaking - (treat as the same)
174   ®       Could be used
175   ¯       May cause confusion with _ and -
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Usable
178   ²       To the power of 2 (squaring?) Otherwise best avoided
179   ³       Cubing? Otherwise best avoided
180   ´       Too confusing with ' and `
181   µ       Could be used
182   ¶       Could be used
183   ·       Dot product? Though likely to be confused with .
184   ¸       Treat as ,
185   ¹       To the power 1? Probably best avoided
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
188   ¼       Could be used
189   ½       Could be used
190   ¾       Could be used
191   ¿       Could be used

Richard

-- 
Personal      [EMAIL PROTECTED] http://www.waveney.org
Telecoms      [EMAIL PROTECTED] http://www.WaveneyConsulting.com
Web services  [EMAIL PROTECTED] http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst Services
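Richard's three byte ranges can be checked against Python's built-in codecs. The sketch assumes cp1252 is what he means by "Windows", and it reads "encoded differently" one possible way: under UTF-8, Latin-1 characters 160 to 191 share the lead byte 0xC2, while 192 to 255 switch to 0xC3. It checks code pages only, not what any particular mail client displays.

```python
# Sketch: compare Latin-1 (ISO-8859-1) with cp1252 byte by byte, and
# look at the UTF-8 lead byte for each Latin-1 character.

def agrees(byte):
    """True if the byte decodes to the same character in Latin-1 and cp1252."""
    raw = bytes([byte])
    try:
        return raw.decode("latin-1") == raw.decode("cp1252")
    except UnicodeDecodeError:  # Python's cp1252 leaves a few bytes undefined
        return False

def utf8_lead_byte(byte):
    """First byte of the UTF-8 encoding of Latin-1 character number `byte`."""
    return chr(byte).encode("utf-8")[0]

c1_range = range(128, 160)
low_range = range(160, 192)
high_range = range(192, 256)

print("128-159 any agree:", any(agrees(b) for b in c1_range))          # False
print("160-255 all agree:", all(agrees(b) for b in range(160, 256)))   # True
print("UTF-8 lead, 160-191:", {hex(utf8_lead_byte(b)) for b in low_range})   # {'0xc2'}
print("UTF-8 lead, 192-255:", {hex(utf8_lead_byte(b)) for b in high_range})  # {'0xc3'}
```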
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Thanks, I've been hoping someone would post that list. Taking it one step further, we can assume that the only chars that can be used are those which:

-- don't have an obvious meaning that needs to be reserved
-- appear decently on all platforms
-- are distinct and recognizable in the tiny font sizes used when programming

Comparing your list with mine, and doing some subjective editing based on my small Courier font, chops the list of usable operators down to only a handful:

Code  Symbol  Comment
167   §       Could be used
169   ©       Could be used
171   «       May well be used
172   ¬       Not?
174   ®       Could be used
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Usable
181   µ       Could be used
182   ¶       Could be used
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
191   ¿       Could be used

That's all. A shame, because some of the others have very interesting possibilities:

• ≠ ø † ∑ ∂ ƒ ∆ ≤ ≥ ∫ ≈ Ω ‡ ± ˇ ∏ Æ

But if Windows can't easily do them, that's a pretty big problem. Thanks for the list.

MikeL
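Part of the Windows question can be checked mechanically by asking which of these glyphs fit a single-byte Latin-1 or cp1252 (Windows Western) code page at all; anything that fits neither has to travel as UTF-8 or some other multi-byte encoding. The sketch uses Python's built-in codecs and says nothing about fonts or input methods, which were the other half of the complaint.

```python
# Sketch: which "interesting" glyphs fit in Latin-1 or cp1252?
wishlist = "•≠ø†∑∂ƒ∆≤≥∫≈Ω‡±ˇ∏Æ"

def fits(ch, codec):
    """True if `ch` can be encoded as a single byte in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for ch in wishlist:
    print(f"U+{ord(ch):04X} latin-1={fits(ch, 'latin-1')} cp1252={fits(ch, 'cp1252')}")
```

Only a few (such as ø, ± and Æ) are in Latin-1; cp1252 adds a handful more (•, †, ‡, ƒ), and the mathematical ones like ≠ or ∑ fit neither.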
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
I'm all for one or two Unicode operators if they're chosen properly (and I trust Larry to do that, since he's done a stellar job so far), but what's the mechanism for generating Unicode operators if you don't have access to a Unicode-aware editor/terminal/font/etc.? Is the only recourse to use the named versions? Or will there be some sort of digraph/trigraph/whatever sequence that always gives us the operator we need? Something like \x[263a], but in regular code and not just quote-ish contexts:

    $campers = $a \x[263a] $b   # make $a and $b happy

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]
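Scott's escape idea can be prototyped as a plain text transformation. The sketch below is hypothetical: it assumes a `\x[hex]` spelling is allowed anywhere in source, which is not an actual Perl 6 feature, and the helper names `expand_escapes` and `ascii_escape` are made up for illustration. It shows that such a spelling round-trips cleanly through pure ASCII.

```python
import re

# Hypothetical "\x[hex]" escape usable anywhere in source text.
ESCAPE = re.compile(r"\\x\[([0-9A-Fa-f]+)\]")

def expand_escapes(source: str) -> str:
    """Replace each \\x[HEX] with the code point it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

def ascii_escape(source: str) -> str:
    """Inverse: spell every non-ASCII character as \\x[hex] (lowercase hex)."""
    return "".join(ch if ord(ch) < 128 else f"\\x[{ord(ch):x}]" for ch in source)

line = r"$campers = $a \x[263a] $b"
unicode_line = expand_escapes(line)        # contains U+263A WHITE SMILING FACE
print(ascii_escape(unicode_line) == line)  # True
```

A round-trip like this is what would let someone on a Latin-1 or ASCII terminal read and post code that uses Unicode operators.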
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Dan Kogai wrote:
> We already have source filters in perl5 and I'm pretty much sure
> someone will just invent yet another 'use operators = "ascii";' kind
> of stuff in perl6.

I think it's backwards to have operators be funny characters by default while requiring an explicit declaration to use well-known Ascii characters. Doing it t'other way round would mean that you can always write fully portable code fragments in pure Ascii, something that'd be helpful on mailing lists and the like.

There could be an alias syntax for people in an environment where they'd prefer to have a non-Ascii character in place of a conglomerate of Ascii symbols, maybe:

    treat '»...«' as '[...]';

That has the documentational advantage that any non-Ascii character used in code must be declared earlier in that file. And even if the non-Ascii character gets warped in the post and displays oddly for you, you can still see what the author intended it to do.

This has the risk that Damian described of everybody defining their own operators, but I think that's unlikely. There's likely to be a convention used by many people, at least those who operate in a given character set. This way also permits those who live in a Latin 2 (or whatever) world to have their own convention using characters that make sense to them.

Smylers
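One way to picture the proposed `treat ... as ...;` declaration is as a source-to-source rewrite. The sketch below is a hypothetical preprocessor, not a Perl feature: it assumes a declaration containing "..." defines a circumfix pair (the text before "..." on each side maps together, and likewise the text after), and it applies only aliases declared earlier in the file, matching the rule suggested above.

```python
import re

# Hypothetical declaration syntax: treat 'FANCY' as 'PLAIN';
DECL = re.compile(r"^\s*treat '(.+?)' as '(.+?)';\s*$")

def apply_aliases(source: str) -> str:
    table = []  # (fancy, plain) pairs declared so far
    out = []
    for line in source.splitlines():
        m = DECL.match(line)
        if m:
            fancy, plain = m.groups()
            if "..." in fancy and "..." in plain:
                # Circumfix pair: map opening and closing parts separately.
                f_open, f_close = fancy.split("...", 1)
                p_open, p_close = plain.split("...", 1)
                table += [(f_open, p_open), (f_close, p_close)]
            else:
                table.append((fancy, plain))
            continue  # the declaration itself is dropped from the output
        for fancy, plain in table:
            line = line.replace(fancy, plain)
        out.append(line)
    return "\n".join(out)

src = "treat '»...«' as '[...]';\n@c = @a »+« @b;"
print(apply_aliases(src))  # @c = @a [+] @b;
```

Because the declaration sits in the file, a reader whose mailer mangles the guillemets can still recover the intended Ascii form.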
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Richard Proctor wrote:
> I am sitting at a computer that is operating in native Latin-1 and is
> quite happy - there is no likelihood that UTF* will ever reach it.
> ...
> Therefore the only additional characters that could be used, that will
> work under UTF8 and Latin-1 and Windows ...

What about people who don't use Latin-1, perhaps because their native language uses Latin-2 or some other character set mutually exclusive with Latin-1? I don't have a Latin-2 ('Central and East European languages') typeface handy, but its manpage includes:

    Oct   Dec   Hex   Description
    253   171   AB    LATIN CAPITAL LETTER T WITH CARON
    273   187   BB    LATIN SMALL LETTER T WITH CARON

Caron is sadly missing from my dictionary, so I'm not sure what those would look like, but I suspect they wouldn't be great symbols for vector operators.

> 171 « May well be used

Also, I wonder how similar guillemets would look to doubled less-than or greater-than signs. In this font they're fine, but I'm concerned about my ability to make them sufficiently distinguishable on a whiteboard, and whether publishers will cope with them (compare a recent discussion on 'use Perl' regarding curly quotes and fi ligatures appearing in code samples).

Smylers
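The manpage lines can be confirmed with Python's `iso8859_1` and `iso8859_2` codecs: the same two bytes that are guillemets in Latin-1 are T and t with a caron (the small v-shaped háček, ˇ) in Latin-2. The last line of the sketch makes the whiteboard point in code: a guillemet is one character, while the doubled angle bracket is two.

```python
import unicodedata

# The same two bytes, read as Latin-1 (ISO-8859-1) and Latin-2 (ISO-8859-2).
for byte in (0xAB, 0xBB):
    raw = bytes([byte])
    for codec in ("iso8859_1", "iso8859_2"):
        ch = raw.decode(codec)
        print(f"{byte} 0x{byte:02X} {codec}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# A guillemet is a single character; "<<" is two separate ones.
print(len("\u00ab"), len("<<"))  # 1 2
```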
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tue 05 Nov, Smylers wrote:
> Richard Proctor wrote:
> > I am sitting at a computer that is operating in native Latin-1 and
> > is quite happy - there is no likelihood that UTF* will ever reach it.
> > ...
> > Therefore the only additional characters that could be used, that
> > will work under UTF8 and Latin-1 and Windows ...
>
> What about people who don't use Latin-1, perhaps because their native
> language uses Latin-2 or some other character set mutually exclusive
> with Latin-1?

Once you go beyond Latin-1 there is nothing common anyway. The guillemets become T and t with inverted hats under Latin-2, oe and G with an inverted hat under Latin-3, oe and G with a squiggle under it under Latin-4, no meaning and a stylised K for Latin-5 (can't find Latin-6), guillemets under Latin-7, and nothing under Latin-8.

Richard
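A survey like this can be rechecked against Python's ISO-8859 codecs. One caution when comparing: Python names the codecs by ISO-8859 part number, and the "Latin-N" names only line up with those part numbers for N up to 4. Latin-5, for example, is ISO-8859-9, while parts 5 to 8 are the Cyrillic, Arabic, Greek and Hebrew sets. The sketch prints what bytes 0xAB and 0xBB become in parts 1 to 8.

```python
import unicodedata

def describe(byte, codec):
    """Code point and Unicode name for `byte` in `codec`, or "(undefined)"."""
    try:
        ch = bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return "(undefined)"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"

# Bytes 0xAB and 0xBB (the Latin-1 guillemets) in ISO-8859 parts 1 to 8.
for part in range(1, 9):
    codec = f"iso8859_{part}"
    print(codec, "|", describe(0xAB, codec), "|", describe(0xBB, codec))
```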
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
As one of the instigators of this thread, I submit that we've probably argued about the Unicode stuff enough. The basic issues are now known, and it's known that there's no general agreement on any of this stuff, nor will there ever be. To wit:

-- Extended glyphs might be extremely useful in extending the operator table in non-ambiguous ways, especially for advanced things like «op».
-- Many people loathe the idea, and predict newcomers will too.
-- Many mailers and older platforms tend to react badly, for both viewing and inputting.
-- If extended characters are used at all, a decision needs to be made whether they shall be least-common-denominator Latin-1, UTF-8, or full Unicode, and whether there will be backup spellings so that everyone can play.

It's up to Larry, and he knows where we're all coming from. Unless anyone has any _new_ observations, I propose we pause the debate until a decision is reached.

MikeL