[Patch] Re: Unicode Operators cheatsheet, please!
Rob Kinyon wrote:
> On 5/31/05, Sam Vilain [EMAIL PROTECTED] wrote:
>> Rob Kinyon wrote:
>>> I would love to see a document (one per editor) that describes the Unicode characters in use and how to make them. The Set implementation in Pugs uses (at last count) 20 different Unicode characters as operators.
>> I have updated the unicode quickref, and started a Perlmonks discussion node for this to be explored - see http://www.perlmonks.org/index.pl?node_id=462246
> As I replied on Perlmonks, it would be more helpful if the Compose keys were listed and not just the ASCII versions. Plus, a quick primer on how to enable Unicode in your favorite editor. I don't know about Emacs, but the Vim documentation on multibyte is difficult to work with, at best.

Well, :help digraph isn't particularly bad, though the included table only covers latin-1. The canonical source is RFC1345. But I've attached a patch for the set symbols that have them.

> Thanks,
> Rob

Index: docs/quickref/unicode
===
--- docs/quickref/unicode (revision 4305)
+++ docs/quickref/unicode (working copy)
@@ -21,6 +21,10 @@
 Note that the compose combinations here are an X11R6 standard, and do not necessarily correspond to the compose combinations available when you use your compose key.
+
+The digraphs used in vim come from Character Mnemonics & Character Sets,
+RFC1345 (http://www.ietf.org/rfc/rfc1345.txt). After doing :set digraph,
+the digraph ^k A B may also be entered as A <BS> B.
 
 Unicode  ASCII     key sequence
 char     fallback  Vim  Emacs  Unix Compose Key combination
@@ -30,22 +34,22 @@
 ¥ Y ^k Y e C-x 8 Y Compose Y =
 
 Set.pm operators (included for reference):
-≠ !=
-∩ *
-∪ +
+≠ != ^k ! =
+∩ * ^k ( U
+∪ + ^k ) U
 ∖ -
-⊂
-⊃
-⊆ =
-⊇ =
-⊄ !( $a $b )
+⊂ ^k ( C
+⊃ ^k ) C
+⊆ = ^k ( _
+⊇ = ^k ) _
+⊄ !( $a $b )
 ⊅ !( $a $b )
 ⊈ !( $a = $b )
 ⊉ !( $a = $b )
-⊊
+⊊
 ⊋
-∋/∍ $a.includes($b)
-∈/∊ $b.includes($a)
+∋/∍ $a.includes($b) ^k ) -
+∈/∊ $b.includes($a) ^k ( -
 ∌ !$a.includes($b)
 ∉ !$b.includes($a)
@@ -58,20 +62,20 @@
 So, these *might* be considered not too awful;
 
-× *
-¬ !
+× * ^k * X
+¬ ! ^k N O
 ∕ /
 ≡ =:=
 ≔ :=
 ⩴ or ≝ ::=
- ≈ or ≊ ~~
+ ≈ or ≊ ~~ ^k ? 2
 … ...
-√ sqrt()
-∧
-∨ ||
+√ sqrt() ^k R T
+∧ ^k A N
+∨ || ^k O R
 ∣ mod (? bit of a stretch, perhaps)
- ⌈$x⌉ ceil($x)
- ⌊$x⌋ floor($x)
+ ⌈$x⌉ ceil($x) ^k / 7
+ ⌊$x⌋ floor($x) ^k 7 / 7
 
 However I think it is a BAD idea that the following unicode characters
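For readers who want to check which character each digraph in the patch is meant to produce, the pairs can be written down as a small lookup table. The sketch below is only an illustration in Python: the name PATCH_DIGRAPHS and the helper describe are made up for this example, and the pairs are copied from the patch as posted rather than verified against any particular Vim build.

```python
import unicodedata

# Digraph -> character pairs copied from the patch as posted.
# In Vim each one is typed as CTRL-K followed by the two keys.
# PATCH_DIGRAPHS and describe() are hypothetical names for this sketch.
PATCH_DIGRAPHS = {
    "!=": "\u2260",  # ≠
    "(U": "\u2229",  # ∩
    ")U": "\u222A",  # ∪
    "(C": "\u2282",  # ⊂
    ")C": "\u2283",  # ⊃
    "(_": "\u2286",  # ⊆
    ")_": "\u2287",  # ⊇
    "(-": "\u2208",  # ∈
    ")-": "\u220B",  # ∋
    "*X": "\u00D7",  # ×
    "NO": "\u00AC",  # ¬
    "?2": "\u2248",  # ≈
    "RT": "\u221A",  # √
    "AN": "\u2227",  # ∧
    "OR": "\u2228",  # ∨
}


def describe(digraph):
    """Return one cheatsheet row: key sequence, code point, glyph, name."""
    ch = PATCH_DIGRAPHS[digraph]
    keys = f"^k {digraph[0]} {digraph[1]}"
    return f"{keys}  U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}"


if __name__ == "__main__":
    for digraph in PATCH_DIGRAPHS:
        print(describe(digraph))
```

Printing the official character name next to each row makes it easy to spot a digraph that was transcribed against the wrong symbol.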
Re: Unicode Operators cheatsheet, please!
On 5/31/05, Sam Vilain [EMAIL PROTECTED] wrote:
> Rob Kinyon wrote:
>> I would love to see a document (one per editor) that describes the Unicode characters in use and how to make them. The Set implementation in Pugs uses (at last count) 20 different Unicode characters as operators.
> I have updated the unicode quickref, and started a Perlmonks discussion node for this to be explored - see http://www.perlmonks.org/index.pl?node_id=462246

As I replied on Perlmonks, it would be more helpful if the Compose keys were listed and not just the ASCII versions. Plus, a quick primer on how to enable Unicode in your favorite editor. I don't know about Emacs, but the Vim documentation on multibyte is difficult to work with, at best.

Thanks,
Rob
Re: Unicode Operators cheatsheet, please!
Rob Kinyon wrote:
> I would love to see a document (one per editor) that describes the Unicode characters in use and how to make them. The Set implementation in Pugs uses (at last count) 20 different Unicode characters as operators.

I have updated the unicode quickref, and started a Perlmonks discussion node for this to be explored - see http://www.perlmonks.org/index.pl?node_id=462246

Sam.
Unicode Operators cheatsheet, please!
I would love to see a document (one per editor) that describes the Unicode characters in use and how to make them. The Set implementation in Pugs uses (at last count) 20 different Unicode characters as operators. While I'm sure these documents exist on the web somewhere, since P6 is the first time most of us will be using these operators, it'd be nice if P6 provided a nice cheatsheet for them. Thanks, Rob
Re: Unicode Operators cheatsheet, please!
On Fri, May 27, 2005 at 10:29:39AM -0400, Rob Kinyon wrote:
> I would love to see a document (one per editor) that describes the Unicode characters in use and how to make them. The Set implementation in Pugs uses (at last count) 20 different Unicode characters as operators.

Good idea. A modest start is at docs/quickref/unicode.

--
Gaal Yahas [EMAIL PROTECTED]
http://gaal.livejournal.com/
Re: Unicode operators
Flaviu Turean wrote:
> [...]
> 5. if you want to wait for the computing platforms before programming in p6, then there is quite a wait ahead. how about platforms which will never catch up? VMS, anyone?

Not to start an OS war thread or anything, but why do people still have this mistaken impression of VMS? We have compilers and hard drives and networking and everything. We even have color monitors. Sure, we lack a decent c++ compiler, but we consider that a feature. :-)

brad
Re: Unicode operators
At 1:27 PM -0800 11/6/02, Brad Hughes wrote:
> Flaviu Turean wrote:
>> [...]
>> 5. if you want to wait for the computing platforms before programming in p6, then there is quite a wait ahead. how about platforms which will never catch up? VMS, anyone?
> Not to start an OS war thread or anything, but why do people still have this mistaken impression of VMS? We have compilers and hard drives and networking and everything. We even have color monitors. Sure, we lack a decent c++ compiler, but we consider that a feature. :-)

Lacking a decent C++ compiler isn't necessarily a strike against VMS--to be a strike against, there'd actually have to *be* a decent C++ compiler...

--
Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: Unicode operators
On Nov 07, Dan Sugalski wrote:
> Lacking a decent C++ compiler isn't necessarily a strike against VMS--to be a strike against, there'd actually have to *be* a decent C++ compiler...

Doesn't VMS have a /bin/false?

- Kurt
vote no - Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
The first message had many of the following characters viewable in my telnet window, but the repost introduced a 0xC2 prefix to the 0xA7 character.

I have this feeling that many people would vote against posting all these funny characters, as it does make reading the perl6 mailing lists difficult in some contexts. Ever since these UTF-8 >127 characters were introduced into this mailing list, I can never be sure of what the posting author intended to send. I'm all for supporting UTF-8 characters in strings, and perhaps even in variable names, but do we really have to have perl6 programs with core operators in UTF-8? I'd like to see all the perl6 code that had UTF-8 operators start with use non_portable_utf8_operators.

As it stands now, I'm going to have to find new tools for my linux platform that has been performing fine since 1995 (perl5.9 still supports libc5!), and I don't yet know how I am going to be able to telnet in from win98, and I'll bet that the dos kermit that I use when I dial up won't support UTF-8 characters either.

David

ps. I just read how many people will need to upgrade their operating systems if they want to upgrade to MS Word11. Do we want to require operating system and/or many support tools to be upgraded before we can share perl6 scripts via email?

On Tue, 5 Nov 2002 at 09:56 -0800, Michael Lazzaro [EMAIL PROTECTED]:
> Code  Symbol  Comment
> 167   §       Could be used
> 169   ©       Could be used
> 171   «       May well be used
> 172   ¬       Not?
> 174   ®       Could be used
> 176   °       Could be used
> 177   ±       Introduces an interesting level of uncertainty? Useable
> 181   µ       Could be used
> 182   ¶       Could be used
> 186   º       Could be used (but I dislike it as it is alphabetic)
> 187   »       May well be used
> 191   ¿       Could be used
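The 0xC2 prefix described above is what one would expect if the repost was encoded as UTF-8 and then displayed by a terminal that assumes one byte per character. A minimal Python sketch of that effect follows; the variable names are illustrative only, and the assumption that the telnet window was decoding as Latin-1 is a guess, not something the post states.

```python
# The section sign as a Unicode string.
section = "\u00a7"  # §

# Latin-1 stores it in a single byte, 0xA7.
latin1_bytes = section.encode("latin-1")

# UTF-8 stores it in two bytes: the lead byte 0xC2, then 0xA7.
utf8_bytes = section.encode("utf-8")

# Assumption for this sketch: the receiving terminal decodes every byte
# as Latin-1, so the UTF-8 lead byte shows up as a character of its own.
misread = utf8_bytes.decode("latin-1")

if __name__ == "__main__":
    print(latin1_bytes.hex(), utf8_bytes.hex(), repr(misread))
```

Under that assumption the reader sees an extra character in front of the intended one, which matches the symptom reported for the repost.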
Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tuesday, Nov 5, 2002, at 04:58 Asia/Tokyo, Larry Wall wrote:
> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

Which 'yen'? I believe you already know \ (U+005c - REVERSE SOLIDUS) is printed as a yen figure on most Japanese platforms, so yen is already everywhere :)

One big problem with introducing Unicode operators is that there are too many symbols that look the same but have different code points (the Unicode consortium has done so to make its capitalist members happy, so that the proprietary symbols in their legacy codes are preserved). Therefore I object to the idea of making a Unicode operator "standard", however advanced that particular operator would be. At the same time, things like "use (more) operators = taste;" are very welcome. i.e.

  use operators = "smooth";
  $hashref = ♀%hash         # U+2640 FEMALE SIGN
  $value = $hashref♂{key};  # U+2642 MALE SIGN

> People who believe slippery slope arguments should never go skiing.

I don't want perl6 to be as "tough" as skiing, though.

> On the other hand, even the useful slippery slopes have "beginner" slopes. I think one advantage of using Unicode for advanced features is that it *looks* scary. So in general we should try to keep the basic features in ASCII, and only use Unicode where there be dragons.

Heck. We already have source filters in perl5 and I'm pretty much sure someone will just invent yet another 'use operators = "ascii";' kind of stuff in perl6. I thought "use English" was already enough.

> It will certainly be possible to write APL in Perl, but if you do, you'll get what you deserve.

And even APL has j. Methinks the question is now whether you make APL out of j or j out of APL.

弾 the ♂ with Too Many Symbols to Deal With

P.S. Here is an even wilder idea than Unicode operators. Why don't we just make perl6 XML-based and allow inline objects to be operators?

  <perl>
  $two = $one <operator src="plus.png"> $one;
  </perl>

. Yuck!
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
This UTF discussion has got silly. I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it. The guillemets are coming through fine, but most of the other hieroglyphs leave a lot to be desired.

Let's consider the coding comparisons. Chars in the range 128-159 are not defined in Latin-1 (issue 1) and are used differently by Windows than by Latin-1 (later issues), so should be avoided. Chars in the range 160-191 (which include the guillemets) are coming through fine if encoded by the sender as UTF8. Anything in the range 192-255 is encoded differently and thus should be avoided.

Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows, are:

Code  Symbol  Comment
160           Non-breaking space (map to normal whitespace)
161   ¡       Could be used
162   ¢       Could be used
163   £       Could be used
164   ¤       Could be used
165   ¥       Could be used
166   ¦       Could be used
167   §       Could be used
168   ¨       Could be used though risks confusion with "
169   ©       Could be used
170   ª       Could be used (but I dislike it as it is alphabetic)
171   «       May well be used
172   ¬       Not?
173           Nonbreaking - treat as the same
174   ®       Could be used
175   ¯       May cause confusion with _ and -
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Useable
178   ²       To the power of 2 (squaring?) Otherwise best avoided
179   ³       Cubing? Otherwise best avoided
180   ´       Too confusing with ' and `
181   µ       Could be used
182   ¶       Could be used
183   ·       Dot Product? though likely to be confused with .
184   ¸       treat as ,
185   ¹       To the power 1? Probably best avoided
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
188   ¼       Could be used
189   ½       Could be used
190   ¾       Could be used
191   ¿       Could be used

Richard

--
Personal      [EMAIL PROTECTED] http://www.waveney.org
Telecoms      [EMAIL PROTECTED] http://www.WaveneyConsulting.com
Web services  [EMAIL PROTECTED] http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst Services
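One way to see why 160-191 behaves differently from 192-255 is to compare the bytes directly. The Python sketch below is an illustration added for this purpose, not part of the original post; the helper name utf8_tail_matches_latin1 is hypothetical. It checks, for each code point, whether the last byte of the UTF-8 encoding is the same byte Latin-1 would use.

```python
def utf8_tail_matches_latin1(code_point):
    """True if the final UTF-8 byte equals the single Latin-1 byte."""
    ch = chr(code_point)
    return ch.encode("utf-8")[-1:] == ch.encode("latin-1")


# Code points in 160-255 whose Latin-1 byte survives as the UTF-8 tail byte.
same = [cp for cp in range(160, 256) if utf8_tail_matches_latin1(cp)]

if __name__ == "__main__":
    print(min(same), max(same), len(same))
    # A guillemet keeps its byte after the 0xC2 lead byte ...
    print(chr(0xAB).encode("utf-8").hex())
    # ... while an accented letter from the upper range does not.
    print(chr(0xE9).encode("utf-8").hex())
```

For 160-191 a Latin-1 display shows the intended glyph preceded by one stray character, whereas for 192-255 the second byte is a different value altogether, so the intended glyph is lost.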
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Thanks, I've been hoping for someone to post that list. Taking it one step further, we can assume that the only chars that can be used are those which:

-- don't have an obvious meaning that needs to be reserved
-- appear decently on all platforms
-- are distinct and recognizable in the tiny font sizes used when programming

Comparing your list with mine, with some subjective editing based on my small courier font, that chops the list of usable operators down to only a handful:

Code  Symbol  Comment
167   §       Could be used
169   ©       Could be used
171   «       May well be used
172   ¬       Not?
174   ®       Could be used
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Useable
181   µ       Could be used
182   ¶       Could be used
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
191   ¿       Could be used

That's all. A shame, because some of the others have very interesting possibilities:

• ≠ ø † ∑ ∂ ƒ ∆ ≤ ≥ ∫ ≈ Ω ‡ ± ˇ ∏ Æ

But if Windows can't easily do them, that's a pretty big problem. Thanks for the list.

MikeL
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
I'm all for one or two unicode operators if they're chosen properly (and I trust Larry to do that since he's done a stellar job so far), but what's the mechanism to generate unicode operators if you don't have access to a unicode-aware editor/terminal/font/etc.? Is the only recourse to use the named versions? Or will there be some sort of digraph/trigraph/whatever sequence that always gives us the operator we need? Something like \x[263a] but in regular code and not just quote-ish contexts:

  $campers = $a \x[263a] $b # make $a and $b happy

-Scott
--
Jonathan Scott Duff [EMAIL PROTECTED]
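To make the idea concrete, here is a rough Python sketch of a pre-processing pass that expands \x[HEX] sequences anywhere in a line of source, not only inside quotes. It is purely illustrative: the function name expand_escapes is invented for this example, and nothing here reflects how Perl 6 itself was or would be specified to handle such escapes.

```python
import re

# Matches \x[263a]-style escapes with one to six hex digits.
ESCAPE = re.compile(r"\\x\[([0-9A-Fa-f]{1,6})\]")


def _replace(match):
    value = int(match.group(1), 16)
    # Leave anything outside the Unicode range untouched.
    return chr(value) if value <= 0x10FFFF else match.group(0)


def expand_escapes(source):
    """Replace every \\x[HEX] escape in source with the named character."""
    return ESCAPE.sub(_replace, source)


if __name__ == "__main__":
    print(expand_escapes(r"$campers = $a \x[263a] $b"))
```

The ASCII spelling stays typeable on any keyboard, and the expanded form is what a Unicode-aware parser would see.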
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Dan Kogai wrote:
> We already have source filters in perl5 and I'm pretty much sure someone will just invent yet another 'use operators = ascii;' kind of stuff in perl6.

I think that's backwards to have operators being funny characters by default but requiring explicit declaration to use well-known Ascii characters. Doing it t'other way round would mean that you can always write fully portable code fragments in pure Ascii, something that'd be helpful on mailing lists and the like.

There could be an alias syntax for people in an environment where they'd prefer to have a non-Ascii character in place of a conglomerate of Ascii symbols, maybe:

  treat '»...«' as '[...]';

That has the documentational advantage that any non-Ascii character used in code must be declared earlier in that file. And even if the non-Ascii character gets warped in the post and displays oddly for you, you can still see what the author intended it to do.

This has the risk that Damian described of everybody defining their own operators, but I think that's unlikely. There's likely to be a convention used by many people, at least those who operate in a given character set. This way also permits those who live in a Latin 2 (or whatever) world to have their own convention using characters that make sense to them.

Smylers
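The declare-before-use property of such an alias scheme can be sketched in a few lines of Python. This is only a thought experiment: ALIASES and to_ascii are hypothetical names, and the mapping simply mirrors the treat example as it appears in the message.

```python
# Hypothetical alias declarations, mirroring: treat '»...«' as '[...]';
ALIASES = {"»": "[", "«": "]"}


def to_ascii(source, aliases):
    """Rewrite declared non-ASCII aliases to ASCII; reject undeclared ones."""
    undeclared = sorted(
        {ch for ch in source if ord(ch) > 127 and ch not in aliases}
    )
    if undeclared:
        raise ValueError(f"undeclared non-ASCII characters: {undeclared}")
    return source.translate(str.maketrans(aliases))


if __name__ == "__main__":
    print(to_ascii("@sum = @a »+« @b;", ALIASES))
```

Because every non-ASCII character must appear in the alias table first, a reader whose mail client mangles the glyph can still look up what it stood for.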
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Richard Proctor wrote:
> I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it.
> ...
> Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows ...

What about people who don't use Latin-1, perhaps because their native language uses Latin-2 or some other character set mutually exclusive with Latin-1?

I don't have a Latin-2 ('Central and East European languages') typeface handy, but its manpage includes:

  253 171 AB LATIN CAPITAL LETTER T WITH CARON
  273 187 BB LATIN SMALL LETTER T WITH CARON

Caron is sadly missing from my dictionary so I'm not sure what those would look like, but I suspect they wouldn't be great symbols for vector operators.

> 171 « May well be used

Also I wonder how similar to doubled less-than or greater-than signs guillemets would look. In this font they're fine, but I'm concerned at my abilities to make them sufficiently distinguishable on a whiteboard, and whether publishers will cope with them (compare a recent discussion on 'use Perl' regarding curly quotes and fi ligatures appearing in code samples).

Smylers
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tue 05 Nov, Smylers wrote:
> Richard Proctor wrote:
>> I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it.
>> ...
>> Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows ...
> What about people who don't use Latin-1, perhaps because their native language uses Latin-2 or some other character set mutually exclusive with Latin-1?

Once you go beyond Latin-1 there is nothing common anyway. The guillemets become T and t with inverted hats under Latin-2, oe and G with an inverted hat under Latin-3, oe and G with a squiggle under it under Latin-4, no meaning and a stylised K for Latin-5, (can't find Latin-6), guillemets under Latin-7, nothing under Latin-8.

Richard

--
Personal      [EMAIL PROTECTED] http://www.waveney.org
Telecoms      [EMAIL PROTECTED] http://www.WaveneyConsulting.com
Web services  [EMAIL PROTECTED] http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst Services
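Anyone wanting to check what the two guillemet bytes (171 and 187) turn into under a given character set can ask a codec library. The Python sketch below does that for a few ISO 8859 parts; the codec list and the helper name names_for are choices made for this illustration, and the results for any particular part should be read off the output rather than taken from the descriptions in this thread.

```python
import unicodedata

# ISO 8859 parts 1-4 as named by Python's standard codecs.
CODECS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]


def names_for(byte_value, codec):
    """Decode one byte under codec and return (character, Unicode name)."""
    ch = bytes([byte_value]).decode(codec, errors="replace")
    return ch, unicodedata.name(ch, "(unnamed)")


if __name__ == "__main__":
    for codec in CODECS:
        for byte_value in (0xAB, 0xBB):
            ch, name = names_for(byte_value, codec)
            print(f"{codec}  0x{byte_value:02X}  {ch}  {name}")
```

The same byte value yields a quotation mark under one part and a letter under another, which is the portability problem being discussed.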
Re: Unicode operators
one more data point from a person who lived, travelled and used computers in a few countries (Romania, France, Germany, Belgium, UK, Canada, US, Holland, Italy).

paraphrasing:
rule 1: if it's not on my keyboard, it doesn't exist;
rule 2: if it's not on everybody's keyboard, it doesn't exist.

long, windy argument:

1. enter an internet cafe in Amsterdam, read your account in the web browser. you get a window, it's hard to guess which OS is underneath. all you get is a browser window, full screen. you are on the perl6-language mailing list. before even contributing to the list you need to configure your keyboard, and you have to figure out how. and you have to trust the OS and browser installation to correctly transfer the funnies;

2. different keyboards have different symbols on them. did you know that the UK keyboard is different from the US one? Belgium has two national keyboards (Walloon and Flemish), the Walloon one is different from the one used in France (and from the one used in Quebec), the Flemish one different from the one used in Holland, and so on;

3. backquote is not on all keyboards, similarly the curlies. some have a funny quote (oblique), which doesn't transfer/translate well, and which, visually, seems fine until you run it through the interpreter;

4. everybody is doing it! first one is free! actually, it is like the other favourite pastime: everybody is doing it, but the first time hurts the most (of the people ;-) setting it up is difficult, afterwards yes, it may come up fine for more symbols;

5. if you want to wait for the computing platforms before programming in p6, then there is quite a wait ahead. how about platforms which will never catch up? VMS, anyone?

6. they'll catch up with p6 and employ Unicode, or they'll die or the other way 'round;

7. I type this on a Solaris box, telnet'd into a Linux box, I run pine (please _do_not_ ask people to change application so that they become worthy of reading your messages!). accented letters don't go through;

8. and are not exactly common in non-Latin scripts. one more alien symbol to learn for those who started their lives in scripts like Chinese, Japanese, Hindi, Arabic, etc.;

9. now you have the set-up of a six-year old Swiss; can the six-year old explain how he did it?

10. fearless leaders listen to their constituency and act accordingly, this is the only way they can remain fearless.

still reading?

flaviu
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
As one of the instigators of this thread, I submit that we've probably argued about the Unicode stuff enough. The basic issues are now known, and it's known that there's no general agreement on any of this stuff, nor will there ever be. To wit:

-- Extended glyphs might be extremely useful in extending the operator table in non-ambiguous ways, especially for advanced things like «op».
-- Many people loathe the idea, and predict newcomers will too.
-- Many mailers and older platforms tend to react badly for both viewing and inputting.
-- If extended characters are used at all, the decision needs to be made whether they shall be least-common-denominator Latin1, UTF-8, or full Unicode, and whether there are backup spellings so that everyone can play.

It's up to Larry, and he knows where we're all coming from. Unless anyone has any _new_ observations, I propose we pause the debate until a decision is reached?

MikeL