Re: is sigil user - extensible ? Was: UTF-8 and Unicode FAQ, demos
Arcadi asked:

> is it possible to extend the perl sigil behaviour.

Yes.

> that is, one day somebody decides it needs ¢ as sigil for certain class of variables. will it be possible to do. (without rewriting the whole perl)

Yes. Just inherit the standard Perl grammar, extend the C<var> rule and install the derived grammar as the caller's parser.

Damian
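As a rough illustration of the "inherit the grammar, extend the var rule" idea, here is a minimal Python sketch. It is not Perl 6 and not Damian's code: the class names, the `sigils` table and `parse_vars` are all hypothetical, and the "grammar" is just a regular expression built from a sigil table.

```python
import re


class BaseGrammar:
    """Toy stand-in for a 'standard grammar'; not the real Perl 6 grammar."""

    # Hypothetical table: each sigil maps to the container type it implies.
    sigils = {"$": "Scalar", "@": "Array", "%": "Hash"}

    def var_rule(self):
        # The 'var' rule: one known sigil followed by an identifier.
        sigil_class = "".join(re.escape(s) for s in self.sigils)
        return re.compile(rf"([{sigil_class}])([A-Za-z_]\w*)")

    def parse_vars(self, source):
        # Return (variable, implied type) for every variable in the source.
        return [(m.group(0), self.sigils[m.group(1)])
                for m in self.var_rule().finditer(source)]


class CentGrammar(BaseGrammar):
    # Derived grammar: inherit everything, extend only the sigil table.
    sigils = {**BaseGrammar.sigils, "¢": "Cent_sigil_type"}


source = "my $a = 1; my ¢b = 2;"
print(BaseGrammar().parse_vars(source))
print(CentGrammar().parse_vars(source))
```

Only the derived grammar recognises `¢b`; code parsed with the base grammar is unaffected, which is the point of installing the derived grammar for the caller only.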
is sigil user - extensible ? Was: UTF-8 and Unicode FAQ, demos
Larry Wall writes:

> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...
>
> > C'mon, everybody's doing it! First one's free, kid... ;-)
>
> People who believe slippery slope arguments should never go skiing.

just (re)reading *old* threads: is it possible to extend the perl sigil behaviour? that is, one day somebody decides it needs ¢ as sigil for a certain class of variables. will it be possible to do (without rewriting the whole perl)? e.g.

    my ¢a = ... ;

and this being the same as

    my ¢a is Cent_sigil_type ;

like

    my $a ;

is the same as

    my $a is Scalar ;

(as I understand, perl knows what to do with $a not because it notices '$' at the beginning every time but because it notices that the compile-time property of that variable is Scalar)

I am not sure if that is *all* a sigil is about in perl, but if yes then adding a new sigil will be doable: just add one more property to all variables starting with ¢ (and provide the corresponding functionality -- that is a black hole!). so it seems that the sigil *is* extensible (at least through some sort of filtering). e.g. I can force all variables starting with 'A' to be constant. now 'A' is a special sigil. (can I???) (probably this is something perl should avoid somehow)

arcadi .
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Larry Wall writes:

> But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this:
>
>     for parallel(@x, @y, @z) -> $x, $y, $z { ... }
>
> the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. That's entirely specified on the left.

if I understand correctly, the main problems with the Apocalypse version of for are:

* need for a special meaning of ; in the block signature
* need to specify unifying/intersection/other behaviour
* not everybody is happy with the stream vs block arguments alignment possibilities

one solution, to which the thread converged (??), is to essentially give simple ways to weave many streams into a single one, and for for to become always single-stream. this is essentially the old @a ^| @b proposal written in english. it does not solve the alignment problem.

also, it seems (but maybe I am wrong) that there is run-time overhead, since weaving, if done explicitly, takes additional time ~ length of the arrays. this will not happen if for notices one of the weaving functions and optimizes it away. So that means that we will have to have a standard set of weaving functions recognizable by for.

so possibly we can revive the multistream for if we wrap this behaviour around loop and given, something like this:

    loop {
      given each @a -> $x {
      given each @b -> $y {
      given each @c -> $z {
        last loop if undef $x|$y|$z
        ...
      }}}
    }

this is already valid perl6 syntax if array @a has an iterator method similar to hash's (maybe it is called @a.next or @a.iter), and if each notices how many arguments the closure expects. as it is, it looks weird, and we lose the fact that the for loop is a *single* topicalizer scope (here we have to break out of 3 of them to get out). also the topic inside the ... is the *last* argument $z and not the first, as it would be for the usual for.

so strictly speaking, this is not wrapping around -- this is just valid (??) syntax. but maybe it *is* possible to somehow wrap the multistream behaviour around the loop - given pair. I don't know. maybe a new keyword stream:

    loop {
      stream @a -> $x,$y {
      stream @b -> $z {
      stream @c -> $alpha,$beta {
        last loop if undef $x|$y|$z
        ...
      }}}
    }

and stream does not set the topicalizer scope. it seems that stream is just a function, and then it does not automatically create a topicalizer scope. or maybe each is sort of redundant inside given and we have

    loop {
      given @a -> [$x,$y] {
      given @b -> [$z] {
      given @c -> [$alpha,$beta] {
        last loop if undef $x|$y|$z
        ...
      }}}
    }

but then @a will have to remember its current index, and given will have to be aware of it. maybe it's too much for given.

>     for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }
>
> Fooling around with signature syntax for that rare case is not worth it. This way, the C<for> won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception.

and for this type of thing there is always the weaving possibility.

arcadi .
vote no - Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
The first message had many of the following characters viewable in my telnet window, but the repost introduced a 0xC2 prefix to the 0xA7 character.

I have this feeling that many people would vote against posting all these funny characters, as it does make reading the perl6 mailing lists difficult in some contexts. Ever since these UTF-8 >127 characters were introduced into this mailing list, I can never be sure of what the posting author intended to send. I'm all for supporting UTF-8 characters in strings, and perhaps even in variable names, but do we really have to have perl6 programs with core operators in UTF-8? I'd like to see all the perl6 code that has UTF-8 operators start with use non_portable_utf8_operators.

As it stands now, I'm going to have to find new tools for my linux platform that has been performing fine since 1995 (perl5.9 still supports libc5!), and I don't yet know how I am going to be able to telnet in from win98, and I'll bet that the dos kermit that I use when I dial up won't support UTF-8 characters either.

David

ps. I just read how many people will need to upgrade their operating systems if they want to upgrade to MS Word11. Do we want to require operating systems and/or many support tools to be upgraded before we can share perl6 scripts via email?

On Tue, 5 Nov 2002 at 09:56 -0800, Michael Lazzaro [EMAIL PROTECTED]:

> Code  Symbol  Comment
> 167   §       Could be used
> 169   ©       Could be used
> 171   «       May well be used
> 172   ¬       Not?
> 174   ®       Could be used
> 176   °       Could be used
> 177   ±       Introduces an interesting level of uncertainty? Useable
> 181   µ       Could be used
> 182   ¶       Could be used
> 186   º       Could be used (but I dislike it as it is alphabetic)
> 187   »       May well be used
> 191   ¿       Could be used
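The 0xC2 prefix described above is what one would expect if the repost was sent as UTF-8 but viewed as a single-byte character set. A small Python sketch of that assumption (the variable names are illustrative only):

```python
section = "\u00a7"  # SECTION SIGN, code 167

latin1_bytes = section.encode("latin-1")  # one byte: 0xA7
utf8_bytes = section.encode("utf-8")      # two bytes: 0xC2 0xA7
print(latin1_bytes.hex(), utf8_bytes.hex())

# Reading the UTF-8 bytes as Latin-1 shows the prefix as an extra character.
misread = utf8_bytes.decode("latin-1")
print(misread)
```

A Latin-1 terminal shows the extra byte as a separate character in front of the intended one, so the reader cannot tell what the author actually typed.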
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On Mon, Nov 04, 2002 at 07:27:56PM -0800, Brian Ingerson wrote:
: Mutt?
:
: I'm using mutt and I still haven't had the privilege of correctly viewing one
: of these unicode characters yet. I'm gonna be really mad if you say you're
: also using an OS X terminal. I suspect that it's my horrific OS X termcap
: that's misbehaving here.
:
: Aargh!

I'm using mutt version 1.4i. The stock mutt on my RedHat wasn't new enough.

Larry
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On Tue, Nov 05, 2002 at 11:36:45AM -0500, Ken Fox wrote:
: Jonathan Scott Duff wrote:
: : Um ... could we have a zip functor as well? I think the common case
: : will be to pull N elements from each list rather than N from one, M
: : from another, etc. So, in the spirit of timtowtdi:
: :
: :     for zip(@a,@b,@c) -> $x,$y,$z { ... }
:
: sub zip (\@:ref repeat{1,}) {
:     my $max = max(map { $_.length } @_);
:     my $i = 0;
:     while ($i < $max) {
:         for (@_) {
:             yield $_[$i]
:         }
:         ++$i
:     }
:     return ( )
: }
:
: That prototype syntax is probably obsolete, but I'm not sure
: what the current proposal is. It might be better to force scalar
: context on the args so that both arrays and array refs can be
: zipped.

You never have to put \ into a signature anymore--that's the default. You only get list context (and flattening) when you use the splat. For a recurring scalar context, you want something like:

    sub zip (@refs is repeatedly (Array)) {

The exact syntax is subject to change, of course.

: I really like the idea of using generic iterators instead of
: special syntax. Sometimes it seems like we're discussing 6.x
: instead of just 6.0.
:
: This iterator is nice too:
:
: sub pairs (\@a, \@b) {
:     my $max = max(@a.length, @b.length);
:     my $i = 0;
:     while ($i < $max) {
:         yield @a[$i] => @b[$i];
:         ++$i
:     }
:     return ( )
: }
:
: for pairs (@a, @b) {
:     print .x, .y
: }

Neither of these work on arrays which have a finite but unknown length.

Larry
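One way around the "finite but unknown length" objection is to never ask for a length and simply pull from each source until all are exhausted. A minimal Python sketch of that idea follows; `lazy_zip` and `countdown` are hypothetical names, and padding with `None` (standing in for undef) is an assumption rather than anything the thread settled on.

```python
def lazy_zip(*sources):
    """Interleave sources one element at a time without asking for a length."""
    iterators = [iter(s) for s in sources]
    missing = object()  # sentinel marking an exhausted source
    while True:
        row = [next(it, missing) for it in iterators]
        if all(item is missing for item in row):
            return
        for item in row:
            # Pad exhausted sources with None (standing in for undef).
            yield None if item is missing else item


def countdown(n):
    # A generator: finite, but its length is not known in advance.
    while n > 0:
        yield n
        n -= 1


print(list(lazy_zip(countdown(3), "ab")))
```

Because the sketch only ever calls `next`, it works equally well on arrays, generators and other streams.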
Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tuesday, Nov 5, 2002, at 04:58 Asia/Tokyo, Larry Wall wrote:

> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil,
> though...

Which 'yen'? I believe you already know \ (U+005c - REVERSE SOLIDUS) is printed as a yen figure on most Japanese platforms, so yen is already everywhere :)

One big problem with introducing Unicode operators is that there are too many symbols that look the same but have different code points (the Unicode consortium has done so to make its capitalist members happy, so that their proprietary symbols in their legacy codes are preserved). Therefore I object to the idea of making a Unicode operator "standard", however advanced that particular operator would be. At the same time, things like "use (more) operators => taste;" are very welcome. i.e.

    use operators => "smooth";
    $hashref = ♀%hash           # U+2640 FEMALE SIGN
    $value = $hashref♂{key};    # U+2642 MALE SIGN

> People who believe slippery slope arguments should never go skiing.

I don't want perl6 to be as "tough" as skiing, though.

> On the other hand, even the useful slippery slopes have "beginner"
> slopes. I think one advantage of using Unicode for advanced features
> is that it *looks* scary. So in general we should try to keep the
> basic features in ASCII, and only use Unicode where there be dragons.

Heck. We already have source filters in perl5 and I'm pretty much sure someone will just invent yet another 'use operators => "ascii";' kind of stuff in perl6. I thought "use English" was already enough.

> It will certainly be possible to write APL in Perl, but if you do,
> you'll get what you deserve.

And even APL has j. Methinks the question is now whether you make APL out of j or j out of APL.

弾 the ♂ with Too Many Symbols to Deal With

P.S. Here is an even wilder idea than Unicode operators. Why don't we just make perl6 XML-based and allow inline objects to be operators?

    <perl>
    $two = $one <operator src="plus.png"> $one;
    </perl>

Yuck!
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
This UTF discussion has got silly. I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it. The guillemets are coming through fine, but most of the other hieroglyphs leave a lot to be desired.

Let's consider the coding comparisons. Chars in the range 128-159 are not defined in Latin-1 (issue 1) and are used differently by Windows to Latin-1 (later issues), so should be avoided. Chars in the range 160-191 (which include the guillemets) are coming through fine if encoded by the sender as UTF8. Anything in the range 192-255 is encoded differently and thus should be avoided.

Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows, are:

Code  Symbol  Comment
160           Non-breaking space (map to normal whitespace)
161   ¡       Could be used
162   ¢       Could be used
163   £       Could be used
164   ¤       Could be used
165   ¥       Could be used
166   ¦       Could be used
167   §       Could be used
168   ¨       Could be used though risks confusion with "
169   ©       Could be used
170   ª       Could be used (but I dislike it as it is alphabetic)
171   «       May well be used
172   ¬       Not?
173           Nonbreaking - treat as the same
174   ®       Could be used
175   ¯       May cause confusion with _ and -
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Useable
178   ²       To the power of 2 (squaring?) Otherwise best avoided
179   ³       Cubing? Otherwise best avoided
180   ´       Too confusing with ' and `
181   µ       Could be used
182   ¶       Could be used
183   ·       Dot Product? though likely to be confused with .
184   ¸       treat as ,
185   ¹       To the power 1? Probably best avoided
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
188   ¼       Could be used
189   ½       Could be used
190   ¾       Could be used
191   ¿       Could be used

Richard

--
Personal      [EMAIL PROTECTED]  http://www.waveney.org
Telecoms      [EMAIL PROTECTED]  http://www.WaveneyConsulting.com
Web services  [EMAIL PROTECTED]  http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst Services
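The three ranges described above can be checked mechanically. The following Python sketch is only an illustration of the byte-level reasoning; it relies on Python's own `latin-1` and `cp1252` codecs as stand-ins for "Latin-1" and "Windows", which is an assumption about what the message meant.

```python
# Code points 160-191: UTF-8 is 0xC2 followed by the same byte value as Latin-1.
same_trailing_byte = all(
    chr(cp).encode("utf-8") == b"\xc2" + bytes([cp]) for cp in range(160, 192)
)

# Code points 192-255: UTF-8 is 0xC3 followed by a different byte (cp - 64).
shifted_trailing_byte = all(
    chr(cp).encode("utf-8") == b"\xc3" + bytes([cp - 64]) for cp in range(192, 256)
)

# Bytes 160-255 mean the same thing in Latin-1 and Windows-1252 ...
high_half_agrees = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in range(160, 256)
)


def cp1252_or_none(b):
    # Decode one byte as Windows-1252; None if the codec leaves it unassigned.
    try:
        return bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        return None


# ... but bytes 128-159 do not (Windows-1252 puts printable characters there).
low_half_differs = any(
    cp1252_or_none(b) != bytes([b]).decode("latin-1") for b in range(128, 160)
)

print(same_trailing_byte, shifted_trailing_byte, high_half_agrees, low_half_differs)
```

This is why a character in 160-191 sent as UTF-8 still shows its intended glyph on a Latin-1 display (after a stray prefix character), while one in 192-255 shows a different glyph altogether.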
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On Tue, Nov 05, 2002 at 03:21:54PM +1100, Damian Conway wrote:

Larry wrote:

But let's keep it out of the signature, I think. In other words, if something like

    for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

is to work, then

    @result = @x ∥ @y ∥ @z;

has to interleave @x, @y, and @z. It's not special to the C<for>.

Very nice. The n-ary zip operator.

Um ... could we have a zip functor as well? I think the common case will be to pull N elements from each list rather than N from one, M from another, etc. So, in the spirit of timtowtdi:

    for zip(@a,@b,@c) -> $x,$y,$z { ... }      # one at a time
    for zip(@a,@b,@c,3) -> $x,$y,$z { ... }    # three at a time

zip() would interleave its array arguments one at a time by default and N at a time if the last argument is a number. Then the RHS of the arrow just tells perl (and us) how many things to pull from the resultant list. This would, of course, lead to strange things like this though:

    for zip(@a,@b,2) -> $x,$y,$z { ... }

but perl is always giving us enough rope. Besides ... someone may want/need those semantics.

Or perhaps just:

    sub take(int $n, *@from) {
        yield splice @from, 0, $n while @from > $n;
        return ( @from, undef xx ($n-@from) )
    }

    three = take.assuming(n=>3);

    for three(@x), three(@y), three($z) -> $x, $y, $z { ... }

Or if we generalized zip() a little:

    for weave(@a,2,@b,1) -> $x,$y,$z { ... }

Which would take 2 elements from @a, and one from @b, until both arrays were exhausted.

I'm just casting for alternatives to the punctuative versions in case I hit something that's really good :-)

-Scott
--
Jonathan Scott Duff [EMAIL PROTECTED]
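The idea that the right-hand side of the arrow only decides how many items to pull from an already interleaved list can be sketched in Python as follows. This is loosely modelled on the `take` sub above, not a translation of it; `take` and `interleave` here are hypothetical helpers, and `None` stands in for undef.

```python
def take(n, items):
    """Yield successive n-element chunks of items, padding the last with None."""
    items = list(items)
    for start in range(0, len(items), n):
        chunk = items[start:start + n]
        yield chunk + [None] * (n - len(chunk))


def interleave(*arrays):
    """One element from each array in turn (arrays assumed equal length here)."""
    return [item for row in zip(*arrays) for item in row]


# The loop header plays the role of '-> $x, $y, $z': it pulls three at a time.
for x, y, z in take(3, interleave([1, 2], "ab", [True, False])):
    print(x, y, z)
```

The interleaving and the chunk size are independent, which is also why something like pulling three at a time from a two-array zip is legal, if strange.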
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote: Of course, I also think I'm allowed to be a little inconsistent in forcing things like ?op? on people. After all, there's gotta be some advantage to being the Fearless Leader... Which kind of begs the question: Who are you? And can you authenticate that which you just implicitly claimed? (See quote header, above, if you don't understand my question) That message got cc:'ed to me, and according to the headers I got, somebody either cracked 'wall.org' or that's the real Larry. Looks like he just switched to mutt and has a little bit of config tweaking yet to do. ;) -- Matt Matthew Zimmerman Interdisciplinary Biophysics, University of Virginia http://www.people.virginia.edu/~mdz4c/
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Jonathan Scott Duff wrote:

> Um ... could we have a zip functor as well? I think the common case will be to pull N elements from each list rather than N from one, M from another, etc. So, in the spirit of timtowtdi:
>
>     for zip(@a,@b,@c) -> $x,$y,$z { ... }

    sub zip (\@:ref repeat{1,}) {
        my $max = max(map { $_.length } @_);
        my $i = 0;
        while ($i < $max) {
            for (@_) {
                yield $_[$i]
            }
            ++$i
        }
        return ( )
    }

That prototype syntax is probably obsolete, but I'm not sure what the current proposal is. It might be better to force scalar context on the args so that both arrays and array refs can be zipped.

I really like the idea of using generic iterators instead of special syntax. Sometimes it seems like we're discussing 6.x instead of just 6.0.

This iterator is nice too:

    sub pairs (\@a, \@b) {
        my $max = max(@a.length, @b.length);
        my $i = 0;
        while ($i < $max) {
            yield @a[$i] => @b[$i];
            ++$i
        }
        return ( )
    }

    for pairs (@a, @b) {
        print .x, .y
    }

- Ken
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Thanks, I've been hoping for someone to post that list. Taking it one step further, we can assume that the only chars that can be used are those which:

-- don't have an obvious meaning that needs to be reserved
-- appear decently on all platforms
-- are distinct and recognizable in the tiny font sizes used when programming

Comparing your list with mine, with some subjective editing based on my small courier font, that chops the list of usable operators down to only a handful:

Code  Symbol  Comment
167   §       Could be used
169   ©       Could be used
171   «       May well be used
172   ¬       Not?
174   ®       Could be used
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Useable
181   µ       Could be used
182   ¶       Could be used
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
191   ¿       Could be used

That's all. A shame, because some of the others have very interesting possibilities:

    • ≠ ø † ∑ ∂ ƒ ∆ ≤ ≥ ∫ ≈ Ω ‡ ± ˇ ∏ Æ

But if Windows can't easily do them, that's a pretty big problem. Thanks for the list.

MikeL
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
I'm all for one or two unicode operators if they're chosen properly (and I trust Larry to do that since he's done a stellar job so far), but what's the mechanism to generate unicode operators if you don't have access to a unicode-aware editor/terminal/font/etc.? Is the only recourse to use the named versions? Or will there be some sort of digraph/trigraph/whatever sequence that always gives us the operator we need? Something like \x[263a] but in regular code and not just quote-ish contexts:

    $campers = $a \x[263a] $b    # make $a and $b happy

-Scott
--
Jonathan Scott Duff [EMAIL PROTECTED]
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Dan Kogai wrote:

> We already have source filters in perl5 and I'm pretty much sure someone will just invent yet another 'use operators => "ascii";' kind of stuff in perl6.

I think that's backwards: to have operators being funny characters by default but requiring explicit declaration to use well-known Ascii characters. Doing it t'other way round would mean that you can always write fully portable code fragments in pure Ascii, something that'd be helpful on mailing lists and the like.

There could be an alias syntax for people in an environment where they'd prefer to have a non-Ascii character in place of a conglomerate of Ascii symbols, maybe:

    treat '»...«' as '[...]';

That has the documentational advantage that any non-Ascii character used in code must be declared earlier in that file. And even if the non-Ascii character gets warped in the post and displays oddly for you, you can still see what the author intended it to do.

This has the risk that Damian described of everybody defining their own operators, but I think that's unlikely. There's likely to be a convention used by many people, at least those who operate in a given character set. This way also permits those who live in a Latin 2 (or whatever) world to have their own convention using characters that make sense to them.

Smylers
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
Richard Proctor wrote:

> I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it. ...
>
> Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows ...

What about people who don't use Latin-1, perhaps because their native language uses Latin-2 or some other character set mutually exclusive with Latin-1?

I don't have a Latin-2 ('Central and East European languages') typeface handy, but its manpage includes:

    253 171 AB LATIN CAPITAL LETTER T WITH CARON
    273 187 BB LATIN SMALL LETTER T WITH CARON

Caron is sadly missing from my dictionary so I'm not sure what those would look like, but I suspect they wouldn't be great symbols for vector operators.

> 171 « May well be used

Also I wonder how similar to doubled less-than or greater-than signs guillemets would look. In this font they're fine, but I'm concerned about my ability to make them sufficiently distinguishable on a whiteboard, and whether publishers will cope with them (compare a recent discussion on 'use Perl' regarding curly quotes and fi ligatures appearing in code samples).

Smylers
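The manpage entries quoted above can be reproduced by decoding the same two bytes under different single-byte character sets. A small Python sketch, using Python's `iso8859_1` and `iso8859_2` codecs as stand-ins for Latin-1 and Latin-2:

```python
import unicodedata

for byte in (0xAB, 0xBB):
    raw = bytes([byte])
    for codec in ("iso8859_1", "iso8859_2"):
        char = raw.decode(codec)
        # Same byte, different character depending on the assumed charset.
        print(f"{byte} 0x{byte:02X} {codec}: {char} {unicodedata.name(char)}")
```

So a guillemet sent as a raw Latin-1 byte arrives as a T with caron for a Latin-2 reader, with no error to signal that anything went wrong.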
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Scott Duff wrote:

Very nice. The n-ary zip operator.

Um ... could we have a zip functor as well?

Yes, I expect so. Much as C<|>, C<&>, and C<^> will be operator versions of C<any>, C<all>, and C<one>.

And I'd suggest that it be implemented something like:

    sub zip(ARRAY *@sources; $by = 1) {
        if exists $by && all(@sources).isa(PAIR) {
            warn "Useless 'by' argument (every array already has a count)";
        }
        else {
            for @sources { $_ = $_=>$by unless .isa(PAIR) }
        }
        my @zipped;
        while any(@sources).key {
            push @zipped, splice(.key, 0, .value) for @sources;
        }
        return @zipped;
    }

So, in the spirit of timtowtdi:

    for zip(@a,@b,@c) -> $x,$y,$z { ... }      # one at a time
    for zip(@a,@b,@c,3) -> $x,$y,$z { ... }    # three at a time

As implied above, I think the N-at-a-time behaviour would be better mediated by an optional named parameter. So that second one should be:

    for zip(@a,@b,@c,by=>3) -> $x,$y,$z { ... }    # three at a time

Or if we generalized zip() a little:

    for weave(@a,2,@b,1) -> $x,$y,$z { ... }

Which would take 2 elements from @a, and one from @b, until both arrays were exhausted.

As Buddha Buck suggested elsewhere, and as I have coded above, I would imagine that this functionality would be mediated by pairs and merged into a single C<zip> function. So that last example is just:

    for zip(@a=>2,@b=>1) -> $x,$y,$z { ... }

Damian
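For readers who want to experiment with the semantics of the proposed zip, here is a rough Python rendering. It is a sketch under stated assumptions, not the Perl 6 design: `zip_sources` is a hypothetical name, a `(list, count)` tuple stands in for a pair, plain sources must be lists, and a count below 1 is rejected (a guard added here to avoid looping forever).

```python
import warnings


def zip_sources(*sources, by=None):
    """Each source is a list or a (list, count) tuple; `by` is the default count."""
    if by is not None and sources and all(isinstance(s, tuple) for s in sources):
        warnings.warn("Useless 'by' argument (every array already has a count)")
    default = 1 if by is None else by
    # Normalise every source to a [remaining_items, count] entry.
    queues = [
        [list(s[0]), s[1]] if isinstance(s, tuple) else [list(s), default]
        for s in sources
    ]
    if any(count < 1 for _, count in queues):
        raise ValueError("counts must be at least 1")
    zipped = []
    while any(items for items, _ in queues):
        for entry in queues:
            items, count = entry
            zipped.extend(items[:count])  # like splice(.key, 0, .value)
            entry[0] = items[count:]
    return zipped


print(zip_sources([1, 2, 3], ["a", "b", "c"]))
print(zip_sources([1, 2, 3, 4], ["a", "b", "c", "d"], by=2))
print(zip_sources(([1, 2, 3, 4], 2), (["a", "b"], 1)))
```

As in the original, short arrays are not padded: once a source runs dry it simply contributes nothing to later rounds.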
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
On Tue 05 Nov, Smylers wrote:

> Richard Proctor wrote:
>
> > I am sitting at a computer that is operating in native Latin-1 and is quite happy - there is no likelihood that UTF* is ever likely to reach it. ...
> >
> > Therefore the only additional characters that could be used, that will work under UTF8 and Latin-1 and Windows ...
>
> What about people who don't use Latin-1, perhaps because their native language uses Latin-2 or some other character set mutually exclusive with Latin-1?

Once you go beyond Latin-1 there is nothing common anyway. The guillemets become T and t with inverted hats under Latin-2, oe and G with an inverted hat under Latin-3, oe and G with a squiggle under it under Latin-4, no meaning and a stylised K for Latin-5, (can't find Latin-6), guillemets under Latin-7, nothing under Latin-8.

Richard
Re: Unicode operators [Was: Re: UTF-8 and Unicode FAQ, demos]
As one of the instigators of this thread, I submit that we've probably argued about the Unicode stuff enough. The basic issues are now known, and it's known that there's no general agreement on any of this stuff, nor will there ever be. To wit:

-- Extended glyphs might be extremely useful in extending the operator table in non-ambiguous ways, especially for advanced things like «op».
-- Many people loathe the idea, and predict newcomers will too.
-- Many mailers and older platforms tend to react badly, for both viewing and inputting.
-- If extended characters are used at all, the decision needs to be made whether they shall be least-common-denominator Latin1, UTF-8, or full Unicode, and whether there are backup spellings so that everyone can play.

It's up to Larry, and he knows where we're all coming from. Unless anyone has any _new_ observations, I propose we pause the debate until a decision is reached?

MikeL
Re: UTF-8 and Unicode FAQ, demos
Damian Conway wrote:

> Larry Wall wrote:
>
> > That suggests to me that the circumlocution could be <<*>>.
>
> A five character multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode.

Unless this is subtle humor, the Huffman encoding idea is getting seriously out of hand. That 5 char ASCII sequence is *identically* encoded when read by the human eye. Humans can probably type the 5 char sequence faster too. How does Unicode win here?

I know I'm just another sample point in a sea of samples, but my embedded symbol parser seems optimized for alphabetic symbols. The cool non-alphabetic Unicode symbols are beautiful to look at, but they don't help me read or write faster. There are rare exceptions (like grouping) where I strongly prefer non-alphabetics, but otherwise alphabetics help me get past the "what is this code?" phase and into the "what does this code do?" phase as quickly as possible.

(I just noticed that all the non-alphabetic symbols (except '?') in the previous paragraph are used for grouping. Weird.)

- Ken
RE: UTF-8 and Unicode FAQ, demos
Ken Fox wrote:

> Damian Conway wrote:
>
> > Larry Wall wrote:
> >
> > > That suggests to me that the circumlocution could be <<*>>.
> >
> > A five character multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode.
>
> Unless this is subtle humor, the Huffman encoding idea is getting seriously out of hand. That 5 char ASCII sequence is *identically* encoded when read by the human eye. Humans can probably type the 5 char sequence faster too. How does Unicode win here?
>
> I know I'm just another sample point in a sea of samples

Can't we have our cake and eat it too? Give ASCII digraph or trigraph alternatives for the incoming tide of Perl6 Unicode? Allow both <<*>> and »*«? Or something similar '*', [*], etc...

--
Garrett Goebel
IS Development Specialist
ScriptPro               Direct: 913.403.5261
5828 Reeds Road         Main:   913.384.1008
Mission, KS 66202       Fax:    913.384.2180
www.scriptpro.com       [EMAIL PROTECTED]
RE: UTF-8 and Unicode FAQ, demos
Garrett Goebel: # Ken Fox wrote: # Unless this is subtle humor, the Huffman encoding idea is getting # seriously out of hand. That 5 char ASCII sequence is *identically* # encoded when read by the human eye. Humans can probably type the 5 # char sequence faster too. How does Unicode win here? # # Can't we have our cake and eat it too? Give ASCII digraph or # trigraph alternatives for the incoming tide of Perl6 Unicode? The Unicode version is more typing than the non-Unicode version, so what's the advantage? It's prettier? --Brent Dax [EMAIL PROTECTED] @roles=map {Parrot $_} qw(embedding regexen Configure) Wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. And radio operates exactly the same way. The only difference is that there is no cat. --Albert Einstein (explaining radio)
Re: UTF-8 and Unicode FAQ, demos
On Monday, November 4, 2002, at 08:55 AM, Brent Dax wrote:

> # Can't we have our cake and eat it too? Give ASCII digraph or
> # trigraph alternatives for the incoming tide of Perl6 Unicode?
>
> The Unicode version is more typing than the non-Unicode version, so what's the advantage? It's prettier?

Well, yes! :-)... but also because they are unique characters compared to all the other existing prefix/postfix/binary/quotelike operators, so there's pretty much zero chance of ambiguity. Using just a few Unicode symbols would seriously open up the range of possible sensible operators, without causing the kind of mind-numbing ambiguities and subtle no-not-this-I-mean-that we've seen in the whole xor/hyper discussions.

UTF-8 «op» representations have the advantage of trivially not conflicting with _any_ existing operators, and being visually distinct from all of them. There may be a few other things in easy-to-find-and-type Latin1, like one or two of these:

    • ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

That could maybe fill in for ';' in the cases where ';' has been given a sneaky meaning, or represent some infrequent but terrifically useful unary or binary op, etc.

C'mon, everybody's doing it! First one's free, kid... ;-)

MikeL
Re: UTF-8 and Unicode FAQ, demos
On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:

> Matthew Zimmerman wrote in perl.perl6.language :
>
> > So let me make my original question a little more general: are Perl 6 source files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of translation mechanism, like specifying the charset on the command line?
>
> I expect probably something similar to Perl 5's encoding pragma. (But hopefully lexically scoped.)

Okay, but what will the default be? UTF-8? iso-8859-1? My current locale? Am I going to have to put

    use encoding 'utf8'; # or whatever the P6 syntax will be

at the beginning of every program that might get distributed outside of my home country to make sure it'll run? Are we going to tell newbies to make sure they have '-w' and 'use strict' *and* 'use encoding' at the beginning of their programs?

I'm just worried about the possibility of writing Perl 6 programs and then sending them to friends in other parts of the world and having them fail in subtle ways because my Perl 6 expects 0xAB and theirs expects 0xC2AB (or vice versa). Or if I post a code sample to CLPM that runs on my machine but doesn't compile from the posting because my news client automatically converts charsets.

Undoubtedly the Perl 6 parser will be smart enough to figure out all of this, and I'm making a mountain out of a molehill. But I just want to make sure that one of the people in authority here either is or will be thinking about this.

--
Matt
Matthew Zimmerman
Interdisciplinary Biophysics, University of Virginia
http://www.people.virginia.edu/~mdz4c/
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 10:19:55AM -0800, Michael Lazzaro wrote:

> UTF-8 «op» representations have the advantage of trivially not conflicting with _any_ existing operators, and being visually distinct from all of them. There may be a few other things in easy-to-find-and-type Latin1, like one or two of these:
>
>     • ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a replacement for ~~ someday in the distant future. I suppose it could be argued that we should use ≅ (U+2245 APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to represent, after all...

> That could maybe fill in for ';' in the cases where ';' has been given a sneaky meaning, or represent some infrequent but terrifically useful unary or binary op, etc.

You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity...

Even if we limit ourselves to Latin1 for now, there's things like the broken pipe ¦ and logical not ¬ and such that look useful. I'd avoid using standard signs like multiply × and divide ÷ for non-standard purposes though. (Not that we can exactly use multiply even for its standard purpose--there's an awfully heavy resemblance between × and x, at least in the typical sans serif font.)

It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

> C'mon, everybody's doing it! First one's free, kid... ;-)

People who believe slippery slope arguments should never go skiing.

On the other hand, even the useful slippery slopes have beginner slopes. I think one advantage of using Unicode for advanced features is that it *looks* scary. So in general we should try to keep the basic features in ASCII, and only use Unicode where there be dragons.

It will certainly be possible to write APL in Perl, but if you do, you'll get what you deserve. In fact, the problem with APL is not that it's possible to write APL in it, but that it is impossible not to... :-)

Larry
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 11:27:16AM -0800, Austin Hastings wrote:

> --- Matthew Zimmerman [EMAIL PROTECTED] wrote:
>
> > On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
> >
> > > Matthew Zimmerman wrote in perl.perl6.language :
> > >
> > > > So let me make my original question a little more general: are Perl 6 source files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of translation mechanism, like specifying the charset on the command line?
> > >
> > > I expect probably something similar to Perl 5's encoding pragma. (But hopefully lexically scoped.)
> >
> > Okay, but what will the default be? UTF-8? iso-8859-1? My current locale? Am I going to have put
> >
> >     use encoding 'utf8'; # or whatever the P6 syntax will be
> >
> > at the beginning of every program that might get distributed outside of my home country to make sure it'll run?
>
> 8859-1 will be the default.

Actually, Unicode will be the default. 8859-1 can probably also be handled without declaration.

> If you want trigraph support, you'll have to put
>
>     use encoding 'ugly-american';
>
> at the top of your files. ;-) ;-) ;-)
>
> Otherwise, it'll be one-character ?fancyops? all the way.

Mmm, I view one-character Unicode operators as more of an escape hatch for the future, not as something to be made mandatory. But then, I'm one of those ugly Americans.

Of course, I also think I'm allowed to be a little inconsistent in forcing things like »op« on people. After all, there's gotta be some advantage to being the Fearless Leader...

Larry
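One plausible reading of "Unicode will be the default, 8859-1 can probably also be handled without declaration" is a decoder that tries UTF-8 first and falls back to Latin-1. The Python sketch below illustrates that reading only; `read_source` is a hypothetical helper, and nothing in the thread says the Perl 6 parser would work this way.

```python
def read_source(raw: bytes) -> str:
    """Decode source bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so this cannot fail.
        return raw.decode("latin-1")


program = "my @a = @b «+» @c;"
utf8_program = program.encode("utf-8")      # « is sent as 0xC2 0xAB
latin1_program = program.encode("latin-1")  # « is sent as 0xAB

print(read_source(utf8_program) == read_source(latin1_program))
```

The fallback works here because a lone 0xAB byte is not valid UTF-8. The heuristic is not watertight: Latin-1 text that happens to form valid UTF-8 sequences would be silently decoded as UTF-8.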
Re: UTF-8 and Unicode FAQ, demos
--- [EMAIL PROTECTED], UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR. wrote: Mmm, I view one-character Unicode operators as more of an escape hatch for the future, not as something to be made mandatory. But then, I'm one of those ugly Americans. EBCDIC didn't support brackets, originally, so ANSI included trigraphs called ??( and ??) for [ and ], respectively. But the fact of the matter is that about epsilon (which is to say, really close to zero) people wrote trigraphs. So, yeah, include trigraph sequences if it will make happy the people on the list who can't be bothered to read the documentation for their own keyboard IO system. But don't expect the rest of us to use them. In short: 1- « and » are really useful in my context. 2- I can make my work environment generate them in one (modified) keystroke. 3- I can make my home environment do likewise. 4- The ascii-only version isn't faster and easier, nor more morally pure. 5- There is no differently keyboard abled market out there which has engaged my sympathy, ascii-operator wise. Ergo, 6- my @a = @b «+» @c; Of course, I also think I'm allowed to be a little inconsistent in forcing things like »op« on people. After all, there's gotta be some advantage to being the Fearless Leader... Which kind of begs the question: Who are you? And can you authenticate that which you just implicitly claimed? (See quote header, above, if you don't understand my question) Larry =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
On 2002-11-04 at 12:26:56, Austin Hastings wrote: 1- ? and ? are really useful in my context. Okay. Now can you get your mailer to send them properly? :)
Re: UTF-8 and Unicode FAQ, demos
After all, there's gotta be some advantage to being the Fearless Leader... Larry Thousands will cry for the blood of the Perl 6 design team. As Leader, you can draw their ire. Because you are Fearless, you won't mind... -- ralph
Re: UTF-8 and Unicode FAQ, demos
Ken Fox wrote: I know I'm just another sample point in a sea of samples, but my embedded symbol parser seems optimized for alphabetic symbols. The cool non-alphabetic Unicode symbols are beautiful to look at, but they don't help me read or write faster. Once again: we're only talking about « and ». There are rare exceptions (like grouping) E.g. « and » ;-) where I strongly prefer non-alphabetics, but otherwise alphabetics help me get past the what is this code? phase and into the what does this code do? phase as quickly as possible. Interestingly, I find it just the opposite. The use of symbolic operators makes it easier for me to differentiate the nouns, verbs, and punctuation of a piece of code. Damian
Re: UTF-8 and Unicode FAQ, demos
Garrett Goebel wrote: Can't we have our cake and eat it too? Give ASCII digraph or trigraph alternatives for the incoming tide of Perl6 Unicode? Allow both <<*>> and »*«? I'd really prefer we didn't. I'd much rather keep << and >> for other things. Or something similar '*', [*], etc... Much as I hate the notion of di- and trigraphs, this is a possibility. Though I'd much rather we just allowed POD escapes (e.g. E<laquo> and E<raquo>) in code. And, yes, I'm aware that makes E<laquo>*E<raquo> incredibly ugly. I'm rather *counting* on it, in fact ;-) Damian
Re: UTF-8 and Unicode FAQ, demos
people on the list who can't be bothered to read the documentation for their own keyboard IO system. Most of this discussion seems to focus on keyboarding. But that's of little consequence. This will always be spotted before it does much harm and will affect just one person and their software at a time. Errors in encoding during transmission is a whole lot more problematic. This will almost always be spotted after the fact, and may affect many people at a time and require fixes to multiple systems not controlled by the sender or receiver. -- ralph
Re: UTF-8 and Unicode FAQ, demos
--- Me [EMAIL PROTECTED] wrote: people on the list who can't be bothered to read the documentation for their own keyboard IO system. Most of this discussion seems to focus on keyboarding. But that's of little consequence. This will always be spotted before it does much harm and will affect just one person and their software at a time. Good. Counting Damian, that makes three of us. Welcome aboard, ralph. :-) Errors in encoding during transmission is a whole lot more problematic. This will almost always be spotted after the fact, and may affect many people at a time and require fixes to multiple systems not controlled by the sender or receiver. I disagree (slightly). I get emailed powerpoint files, jpeg images, and tens of other binary formats every day, and they consistently come through correctly. The transmission network is working fine. What we've got is an encoding problem at the MUA level. Mark Reed says my mailer (Yahoo!) tagged a message containing high-bit characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. There are problems, and this kind of change will create a demand to get them fixed. Those products that satisfy the demand will survive. The others won't. Up until now, though, everyone's been lax about making the encoding stuff strack. But this is a language widely regarded as a huge player, and when a huge player says You need to take care of (something), then it gets done. Perl6 will do more to address the real technical issues of electronic communication between Americans and French-speakers than anything else. (Primarily because Perl hackers want to talk to each other, but no French-speaker wants to talk to an American ;-) =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
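The mislabeling described above (high-bit characters sent with the wrong charset tag) can be reproduced in a few lines. This is a minimal sketch in Python rather than Perl, added for illustration and not something posted to the list; the sample string is Austin's own example, and the choice of a lenient decoder that substitutes a replacement character is an assumption about how a typical mail reader behaves.

```python
# Illustration only: how « and » (U+00AB, U+00BB) degrade when the
# charset label on a message does not match the bytes actually sent.

text = "my @a = @b «+» @c;"

utf8_bytes = text.encode("utf-8")      # each guillemet becomes two bytes
latin1_bytes = text.encode("latin-1")  # each guillemet is one high-bit byte

# Case 1: UTF-8 bytes displayed by a reader that assumes Latin-1.
# Every two-byte sequence shows up as two separate characters.
as_latin1 = utf8_bytes.decode("latin-1")

# Case 2: Latin-1 bytes tagged as US-ASCII. The high-bit bytes are not
# valid ASCII, so a lenient reader (assumed here) substitutes U+FFFD.
as_ascii = latin1_bytes.decode("ascii", errors="replace")

print(as_latin1)
print(as_ascii)
```

In both cases the bytes arrive intact; only the label is wrong, which is the distinction Austin draws between the transmission network and the MUA.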
Re: UTF-8 and Unicode FAQ, demos
I'm having trouble this is even being considered. At all. And especially for these operators... So, yeah, include trigraph sequences if it will make happy the people on the list who can't be bothered to read the documentation for their own keyboard IO system. But don't expect the rest of us to use them. So you're one of the very few people who bothered to set up unicode, and now you want to force the rest of us into your own little leet group. Given the choice between learning how to reconfigure their keyboard, editor, terminal, fonts, and everything else, or just not learning perl6, I bet you'd have a LOT of people who get scared away. Face it, too many people think perl is linenoise heavy and random already. Which brings me to my real question: why these operators? It's not as if they're even particularly intuitive for this context. They're quotes. They don't mean vector anything, and never have. I could almost see if the characters in question just screamed the function in question (sqrt, not equals, not, sum, almost anything like that), but these are just sort of random. Given how crazy this is all getting, is it absolutely certain that we're better off not just making vector operations work without modifiers? I reread the apocalypse just now, and I don't really see the problem. The main argument against seems to be perl5 people expect it to be scalar, but perl5 people will have to get used to a lot. I think the operators should just be list based, and if you want otherwise you can specify scalar:op or convert both sides to scalars manually (preferably with .length, so it's absolutely clear what's meant). -- Adam Lopresto ([EMAIL PROTECTED]) http://cec.wustl.edu/~adam/ Who are you and what have you done with reality? --Jamin Gray
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote: In short: 1- ? and ? are really useful in my context. 2- I can make my work environment generate them in one (modified) keystroke. 3- I can make my home environment do likewise. 4- The ascii-only version isn't faster and easier, nor more morally pure. 5- There is no differently keyboard abled market out there which has engaged my sympathy, ascii-operator wise. Ergo, 6- my @a = @b ?+? @c; It's a great argument. I know how to type funny characters too. I can even read some of the ones some people send. Just don't expect me to be able to understand any Perl 6 you mail me. Whether the problem is at your end, my end or somewhere in the middle is moot. On the other hand, maybe all these issues will be sorted out before we can start writing Perl 6 in earnest. In one way I hope that is true. In another I hope it isn't ;-) -- Paul Johnson - [EMAIL PROTECTED] http://www.pjcj.net
Re: UTF-8 and Unicode FAQ, demos
Austin Hastings wrote in perl.perl6.language : What we've got is an encoding problem at the MUA level. Mark Reed says my mailer (Yahoo!) tagged a message containing high-bit characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode identifiers (Java) or markup. That's because source code is handled by parsers, documentation extractors, pretty printers, diff(1), patch(1), version control software, and (you said it) various internet clients. That's why some people may still prefer to continue using pure ascii even though then think that unicode operators are cool. (Esp. if they are under the influence of FUD : use PHP ! it's ascii compliant !) Perl6 will do more to address the real technical issues of electronic communication between Americans and French-speakers than anything else. (Primarily because Perl hackers want to talk to each other, but no French-speaker wants to talk to an American ;-) You're Italian, aren't you ?
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Damian Conway) writes: Or something similar '*', [*], etc... Much as I hate the notion of di- and trigraphs, this is a possibility. I do like this too, because it reminds me of C trigraphs, which had precisely the same purpose - allow people with old-fashioned sub-standard character sets to come and play with the big boys. And eventually, the old trigraphs died out because everyone caught up with the decent (for the era) character sets. That's assuming we have to have Unicode operators. I would, however, like to hear a passionate argument in favour of this, because we've seen plenty of arguments against (encoding, transmission, keyboarding, etc.) but not all that many in favour, so a nice definitive one would be helpful. -- evilPetey I often think I'd get better throughput yelling at the modem.
Re: UTF-8 and Unicode FAQ, demos
--- Rafael Garcia-Suarez [EMAIL PROTECTED] wrote: Austin Hastings wrote in perl.perl6.language : What we've got is an encoding problem at the MUA level. Mark Reed says my mailer (Yahoo!) tagged a message containing high-bit characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode identifiers (Java) or markup. That's because source code is handled by parsers, documentation extractors, pretty printers, diff(1), patch(1), version control software, and (you said it) various internet clients. That's why some people may still prefer to continue using pure ascii even though then think that unicode operators are cool. (Esp. if they are under the influence of FUD : use PHP ! it's ascii compliant !) Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so we've got the kings of FUD on our side for a change. Joy. Perl6 will do more to address the real technical issues of electronic communication between Americans and French-speakers than anything else. (Primarily because Perl hackers want to talk to each other, but no French-speaker wants to talk to an American ;-) You're Italian, aren't you ? Actually, an American who's been ignored in many places. :-) =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Austin Hastings) writes: Yeah, but ActiveState does Perl, and Microsoft owns ActiveState To what extent are *either* of those statements true? :) -- All the good ones are taken.
Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote: You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Or more than one type of supercomma, e.g: for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... } to mean: for @x ; @y ; @z -> $x ; $y ; $z { ... } - vs - for @x § @y § @z -> $x § $y § $z { ... } to mean: for @x -> $x { for @y -> $y { for @z -> $z { ... } } } ;-) MikeL
Re: UTF-8 and Unicode FAQ, demos
--- Simon Cozens [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] (Austin Hastings) writes: Yeah, but ActiveState does Perl, and Microsoft owns ActiveState To what extent are *either* of those statements true? :) Hmm. Well, last time I checked you could still download a perl binary from ActiveState.com. And, in fact, check out the motivation behind the agreement: http://www.dnjonline.com/newsreel/index.html Microsoft buys into Perl Aug 6 - Microsoft has hired ActiveState Tool Corporation to improve the Windows functionality of the Open Source scripting language Perl. This agreement reinforces a long-term relationship between Microsoft and Perl, stemming from 1993 when Microsoft funded the first port of Perl 5 to the Windows platform. ActiveState develops and distributes the popular Windows version, called ActivePerl. Our mission is to make Perl as popular as possible, said Dick Hardt, chief executive of ActiveState. The monetary details of the deal weren't revealed - in fact there was no mention of it anywhere on the Microsoft web site. Instead the impetus seems to have come from Microsoft India where the main aim is to improve Perl's support for non-Roman character-sets through Unicode. As part of the agreement, ActiveState will add features previously missing from Windows ports of Perl, as well as full support for Unicode - a key feature to users dealing with Asian character sets. blah blah blah ... =Austin
Re: UTF-8 and Unicode FAQ, demos
--- Adam D. Lopresto [EMAIL PROTECTED] wrote: I'm having trouble this is even being considered. At all. And especially for these operators. Heute vektoren, morgen das welt! Uniperl, Uniperl uber alles, Uber alles in der welt! With hyper-states through choose and true(); Masterfully golf scorin' script, Von der bis an die all(), Von der any() bis an den - Uniperl, Uniperl uber alles, Uber alles in der welt! So you're one of the very few people who bothered to set up unicode, and now you want to force the rest of us into your own little leet group. Nerp. Hadn't given it a second thought until the whining started about It's so hard... I had actually figured that I'd be able to set a keystroke in my editor and that would be the end of it. But then, for no good reason that I can think of, I tried microsoft's help site and found it in about thirty seconds. No need to set up a keyboard macro -- it's part of the OS. I did BBS, though not as a warez d00d. It's L33t. Given the choice between learning how to reconfigure their keyboard, editor, terminal, fonts, and everything else, or just not learning perl6, I bet you'd have a LOT of people who get scared away. That sounds a lot like what I said (and to a certain extent still fear) back when - was first going away. It didn't work then, either. Face it, too many people think perl is linenoise heavy and random already. Which is why adding a single character with a single meaning that can be covered in chapter 14 instead of chapter 3 is a workable idea, and why creating an operator called Jesus, it looks like an ASCII-art version of a dancing penguin in high heels isn't. Bow-tie operator, indeed! If @a [*=] @b; doesn't scan like rats chewing their way into your cable, what does? Which brings me to my real question: why these operators? It's not as if they're even particularly intuitive for this context. They're quotes. They don't mean vector anything, and never have. 
I could almost see if the characters in question just screamed the function in question (sqrt, not equals, not, sum, almost anything like that), but these are just sort of random. Simple answer: Larry suggested them. And was willing to sacrifice qw functionality to this. Also, I suppose, because of the map() suggestion a while back -- this operation is going to wind up taking a huge range of parameters in some not-too-distant future. And @a = @b sub @c; will read a lot better, when sub is 8 lines long. Given how crazy this is all getting, is it absolutely certain that we're better off not just making vector operations work without modifiers? I reread the apocalypse just now, and I don't really see the problem. The main argument against seems to be perl5 people expect it to be scalar, but perl5 people will have to get used to a lot. I think the operators should just be list based, and if you want otherwise you can specify scalar:op or convert both sides to scalars manually (preferably with .length, so it's absolutely clear what's meant). It's not absolutely certain. But this discussion was destined to happen, since we're just about out of line noise, but we're nowhere close to being out of clever ideas. =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
On 04/11/02 14:09 -0800, Austin Hastings wrote: --- Rafael Garcia-Suarez [EMAIL PROTECTED] wrote: Austin Hastings wrote in perl.perl6.language : What we've got is an encoding problem at the MUA level. Mark Reed says my mailer (Yahoo!) tagged a message containing high-bit characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode identifiers (Java) or markup. That's because source code is handled by parsers, documentation extractors, pretty printers, diff(1), patch(1), version control software, and (you said it) various internet clients. That's why some people may still prefer to continue using pure ascii even though then think that unicode operators are cool. (Esp. if they are under the influence of FUD : use PHP ! it's ascii compliant !) Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so we've got the kings of FUD on our side for a change. Joy. Speaking of FUD, that's simply not true, nor tasteful IMO. AS has done a handful of short-term contacts for MS, and that's the extent of their relationship. FWIW, AS also does as much or more Unix development as Windows development. They also employ some people who have individually advanced Perl more than you'll ever know. Perl6 will do more to address the real technical issues of electronic communication between Americans and French-speakers than anything else. (Primarily because Perl hackers want to talk to each other, but no French-speaker wants to talk to an American ;-) You're Italian, aren't you ? Actually, an American who's been ignored in many places. :-) =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Austin Hastings) writes: If @a [*=] @b; doesn't scan like rats chewing their way into your cable, what does? This is why God gave us functions as well as operators. -- I _am_ pragmatic. That which works, works, and theory can go screw itself. - Linus Torvalds
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
[Note to all: yes, this is me, despite the weirdities of the quoting and headers. This is how it looks when I'm using mutt out of the box, because I haven't yet customized it like I have pine. But I do like being able to see my own Unicode characters, not to mention everyone else's. If you don't believe this is me, well, I'll just tell you that I live on a tropical island near Antarctica, my social security number is 987-65-4321, and my mother's maiden name was the same as my maternal grandfather's maiden name. Or something like that... --Ed] On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote: On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote: You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Or more than one type of supercomma, e.g: for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... } to mean: for @x ; @y ; @z -> $x ; $y ; $z { ... } That almost works visually. - vs - for @x § @y § @z -> $x § $y § $z { ... } to mean: for @x -> $x { for @y -> $y { for @z -> $z { ... } } } ;-) Glad you put the smiley. I think the latter is much clearer. But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this: for parallel(@x, @y, @z) -> $x, $y, $z { ... } the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. That's entirely specified on the left. The natural processing of lists says that serial is specified like this: for @a, @b, @c -> $x, $y, $z { ... } Of course, parallel() is a rotten thing to have to say unless you're into readability. So we could still have some kind of parallelizing supercomma, maybe even ∥ (U+2225 PARALLEL TO). But let's keep it out of the signature, I think. In other words, if something like for @x ∥ @y ∥ @z -> $x, $y, $z { ... } is to work, then @result = @x ∥ @y ∥ @z; has to interleave @x, @y, and @z. It's not special to the C<for>. In the case of C<for>, of course, the compiler should feel free to optimize out the actual construction of an interleaved array. I suppose it could be argued that ∥ is really spelled »,« or some such. However, @result = @x »,« @y »,« @z; just doesn't read quite as well for some reason. A slightly better case could be made for @result = @x `|| @y `|| @z; The reason we originally munged with the signature was so that we could do weird things with differing numbers of streams on the left and the right. But if you really want a way to take 3 from @x, then 3 from @y, then 3 from @z, there should be something equivalent to: for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... } Fooling around with signature syntax for that rare case is not worth it. This way, the C<for> won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception. Ed, er, Larry
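The proposal above, where the weaving happens entirely on the left and the loop just consumes a flat list three items at a time, can be sketched outside Perl. The following is an illustrative Python sketch, not anything from the thread: `parallel` and `round_robin_by` borrow Larry's hypothetical names, `in_groups` is a helper invented here to stand in for a three-scalar signature, and stopping at the shortest stream is an assumption, since the thread does not settle what happens with unequal lengths.

```python
from itertools import chain, islice

def parallel(*streams):
    """Interleave element by element: x0, y0, z0, x1, y1, z1, ...
    Assumption: stops at the shortest stream (zip semantics)."""
    return list(chain.from_iterable(zip(*streams)))

def round_robin_by(n, *streams):
    """Take n items from each stream in turn until every stream is empty."""
    iterators = [iter(s) for s in streams]
    out = []
    while True:
        progressed = False
        for it in iterators:
            chunk = list(islice(it, n))  # up to n items from this stream
            if chunk:
                progressed = True
            out.extend(chunk)
        if not progressed:
            return out

def in_groups(flat, n):
    """Hypothetical stand-in for a block signature expecting n scalars."""
    return [tuple(flat[i:i + n]) for i in range(0, len(flat), n)]

xs, ys, zs = [1, 2, 3], [10, 20, 30], [100, 200, 300]

# The loop body sees one item from each stream per iteration.
for x, y, z in in_groups(parallel(xs, ys, zs), 3):
    print(x, y, z)

# Two at a time from each stream, no signature syntax involved.
print(round_robin_by(2, [1, 2, 3, 4], [10, 20, 30, 40]))
```

The consumer never learns how the list was woven, which is the point of keeping the supercomma out of the signature. A compiler could still skip building the intermediate list, as the message suggests.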
RE: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Larry Wall: # for @x ∥ @y ∥ @z -> $x, $y, $z { ... } Even if you decide to use UTF-8 operators (which I am Officially Recommending Against), *please* don't use this one. This shows up as a box in the Outlook UTF-8 font. --Brent Dax [EMAIL PROTECTED] @roles=map {"Parrot $_"} qw(embedding regexen Configure) Wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. And radio operates exactly the same way. The only difference is that there is no cat. --Albert Einstein (explaining radio)
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On 04/11/02 17:52 -0800, [EMAIL PROTECTED] wrote: [Note to all: yes, this is me, despite the weirdities of the quoting and headers. This is how it looks when I using mutt out of the box, because I haven't yet customized it like I have pine. But I do like being able to see my own Unicode characters, not to mention everyone else's. If you don't believe this is me, well, I'll just tell you that I live on a tropical island near Antarctica, my social security number is 987-65-4321, and my mother's maiden name was the same as my maternal grandfather's maiden name. Or something like that... --Ed] Mutt? I'm using mutt and I still haven't had the privledge of correctly viewing one of these unicode characters yet. I'm gonna be really mad if you say you're also using an OS X terminal. I suspect that it's my horrific OS X termcap that's misbehaving here. Aargh! Brian On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote: On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote: You know, separate streams in a for loop are not going to be that common in practic, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Or more than one type of supercomma, e.g: for @x I @y I @z - $x I $y I $z { ... } to mean: for @x ; @y ; @z - $x ; $y ; $z { ... } That almost works visually. - vs - for @x § @y § @z - $x § $y § $z { ... } to mean: for @x - $x { for @y - $y { for @z - $z { ... } } } ;-) Glad you put the smiley. I think the latter is much clearer. But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this: for parallel(@x, @y, @z) - $x, $y, $z { ... } the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. 
That's entirely specified on the left. The natural processing of lists says that serial is specified like this: for @a, @b, @c - $x, $y, $z { ... } Of course, parallel() is a rotten thing to have to say unless you're into readability. So we could still have some kind of parallizing supercomma, mabye even P (U+2225 PARALLEL TO). But let's keep it out of the signature, I think. In other words, if something like for @x P @y P @z - $x, $y, $z { ... } is to work, then @result = @x P @y P @z; has to interleave @x, @y, and @z. It's not special to the Cfor. In the case of Cfor, of course, the compiler should feel free to optimize out the actual construction of an interleaved array. I suppose it could be argued that P is really spelled »,« or some such. However, @result = @x »,« @y »,« @z; just doesn't read quite as well for some reason. A slightly better case could be made for @result = @x `|| @y `|| @z; The reason we originally munged with the signature was so that we could do weird things with differing numbers of streams on the left and the right. But if you really want a way to take 3 from @x, then 3 from @y, then 3 from @z, there should be something equivalent to: for round_robin_by_3s(@x, @y, @z) - $x, $y, $z { ... } Fooling around with signature syntax for that rare case is not worth it. This way, the Cfor won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception. Ed, er, Larry
Re: UTF-8 and Unicode FAQ, demos
Larry wrote: I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a replacement for ~~ someday in the distant future. I suppose it could be argued that we should use ≅ (U+2245 APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to represent, after all... Yeah, either of those work. But neither is entirely satisfactory, since there's nothing almost or approximate about the matching the operator does. We obviously need a unicode IS LIKE UNTO codepoint. ;-) You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Amen. Even if we limit ourselves to Latin1 for now, Which I suspect we should seriously consider. Maybe leave 9+ bit operators to Perl 7. ;-) I'd avoid using standard signs like multiply × and divide ÷ for non-standard purposes though. (Not that we can exactly use multiply even for its standard purpose--there's an awfully heavy resemblance between × and x, at least in the typical sans serif font.) That's why I semi-seriously suggested replacing C<x> by C<×>. For some reason alphabetic operators (at least, those that are pretending to be symbols) really bug me. It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though... H. Given that a pound is worth more than a dollar, maybe £ is the sigil for pairs. ;-) Damian
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Larry wrote: But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this: for parallel(@x, @y, @z) -> $x, $y, $z { ... } the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. That's entirely specified on the left. The natural processing of lists says that serial is specified like this: for @a, @b, @c -> $x, $y, $z { ... } Of course, parallel() is a rotten thing to have to say unless you're into readability. So we could still have some kind of parallelizing supercomma, maybe even ∥ (U+2225 PARALLEL TO). I'd rather we not use that. I found it surprisingly hard to distinguish ∥ from ||. May I suggest that this might be the opportunity to deploy ¦ (i.e. E<brvbar>). But let's keep it out of the signature, I think. In other words, if something like for @x ∥ @y ∥ @z -> $x, $y, $z { ... } is to work, then @result = @x ∥ @y ∥ @z; has to interleave @x, @y, and @z. It's not special to the C<for>. Very nice. The n-ary zip operator. I suppose it could be argued that ∥ is really spelled »,« or some such. However, @result = @x »,« @y »,« @z; just doesn't read quite as well for some reason. Agreed. A slightly better case could be made for @result = @x `|| @y `|| @z; Except by those who suffer FIABCB (font-induced apostrophe/backtick character blindness). The reason we originally munged with the signature was so that we could do weird things with differing numbers of streams on the left and the right. But if you really want a way to take 3 from @x, then 3 from @y, then 3 from @z, there should be something equivalent to: for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... } Or perhaps just: sub take(int $n, *@from) { yield splice @from, 0, $n while @from > $n; return ( @from, undef xx ($n-@from) ) } &three = &take.assuming(n=>3); for three(@x), three(@y), three(@z) -> $x, $y, $z { ... } ??? Fooling around with signature syntax for that rare case is not worth it. This way, the C<for> won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception. ...and reinstituted the previous exception that a semicolon in a parameter list marks the start of optional parameters! :-) Damian
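Damian's take() combines two ideas: hand back fixed-size chunks one at a time, and pad the final short chunk with undef. A rough Python analogue of just that chunk-and-pad behaviour is sketched below. It is an illustration under assumptions, not a translation: a Python generator stands in for the Perl 6 coroutine, None stands in for undef, functools.partial stands in for .assuming, and how the three coroutines would cooperate inside the for loop is left out because the message itself marks that part with "???".

```python
from functools import partial

def take(n, items):
    """Yield successive chunks of n items; pad the last chunk with None."""
    remaining = list(items)
    # Mirrors 'yield splice @from, 0, $n while @from > $n'
    while len(remaining) > n:
        yield remaining[:n]
        remaining = remaining[n:]
    # Mirrors 'return ( @from, undef xx ($n-@from) )'
    yield remaining + [None] * (n - len(remaining))

# Analogue of '&three = &take.assuming(n=>3)'
three = partial(take, 3)

for chunk in three([1, 2, 3, 4, 5]):
    print(chunk)
```

Because the loop condition is strictly "more than n left", a list whose length is an exact multiple of n ends with a full chunk rather than a chunk of padding.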
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Matthew Zimmerman) writes: Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop, (corresponding to the Unicode characters 0x00AB and 0x00BB) More and more conversations like this, (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact, anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. -- Irrigation of the land with seawater desalinated by fusion power is ancient. It's called 'rain'. -- Michael McClary, in alt.fusion
Re: UTF-8 and Unicode FAQ, demos
On 2 Nov 2002 at 0:06, Simon Cozens wrote: [EMAIL PROTECTED] (Matthew Zimmerman) writes: Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop, (corresponding to the Unicode characters 0x00AB and 0x00BB) More and more conversations like this, (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact, anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. It may seem idiotic to the egocentric people who only need chars a-z in their language. But for all others (think about Chinese), Unicode is a real asset. -- Markus Laire 'malaire' [EMAIL PROTECTED]
Re: UTF-8 and Unicode FAQ, demos
From: Markus Laire Date: Sat, 02 Nov 2002 14:44:39 +0200 On 2 Nov 2002 at 0:06, Simon Cozens wrote: More and more conversations like this (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. It may seem idiotic to the egocentric people who only needs chars a-z in his language. But for all others (think about Chinese), Unicode is real asset.

I don't think anyone's arguing that Unicode shouldn't be in the language. I am all for allowing people to define their own Unicode operators and such. I just don't think it should be in the core.

I do most of my work over an ssh connection to my favorite server, through gnome-terminal. gnome-terminal does not support Unicode, so this whole thread has been filled with ?'s and \251's. I can't see a thing... And I'm a mostly typical geek. I _finally_ got Unicode working in Emacs, though it was not easy. I still haven't any idea how to type these things, only how to look at them. Think about how much trouble a less-geeky-than-I person would have.

We _want_ the world to be Unicode compatible, for sure. But putting a useful operator in Unicode isn't quite the answer. Rather than fixing their boxes to work with Unicode, as we on this list would, most people simply wouldn't use the operator. I don't think that is the desired effect.

I'm fine with having tolerable synonyms. Vector plus shouldn't be `[+] but I'm okay with it being ^[+] or some such. The only thing to think about there is what will happen when someone writes in Unicode and someone else comes along in maintenance without a Unicode-compatible editor. It will surely be perplexing to see vector plus written ?+?. Of course, this is equivalent to the problem of Unicode variable names, so the point is moot.

Luke
Re: UTF-8 and Unicode FAQ, demos
On Sat, Nov 02, 2002 at 06:07:34AM -0700, Luke Palmer wrote: I do most of my work over an ssh connection to my favorite server, through gnome-terminal. gnome-terminal does not support unicode, so this whole thread has been filled with ?'s and \251's. I can't see a thing...

gnome-terminal does support unicode.

For the gnome1 version:
- select a font in iso10646-1 encoding
- set at least LC_CTYPE to something like en_US.UTF-8. At least in Debian GNU/Linux you might also have to dpkg-reconfigure locales to actually enable that locale to be generated
- echo -n ^[%G inside the terminal, where ^[ is a literal escape character (type it as Control-V Control-[)

For the gnome2 version:
- set at least LC_CTYPE to something like en_US.UTF-8.
- start a new gnome terminal. If you already have one running with a different locale setting, you might have to run it as gnome-terminal --disable-factory

This is enough to run mutt and (with the right font, like misc-fixed) read almost any correctly tagged Asian spam!

-- Bart.
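Both recipes hinge on LC_CTYPE ending up as a UTF-8 locale. As a rough illustration of how that setting is resolved, the Python sketch below applies the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) to a dictionary of environment variables. The function names are made up for this example, and the sketch only inspects variable values; it does not check that the locale has actually been generated on the system, nor anything about fonts.

```python
import os
from typing import Mapping, Optional


def effective_ctype(env: Mapping[str, str]) -> Optional[str]:
    """Return the locale name that governs LC_CTYPE, using POSIX precedence."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return None


def is_utf8_locale(env: Mapping[str, str]) -> bool:
    """True if the effective LC_CTYPE locale names a UTF-8 codeset."""
    name = effective_ctype(env)
    if name is None:
        return False
    # Locale names look like language_TERRITORY.codeset@modifier
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    print(effective_ctype(os.environ), is_utf8_locale(os.environ))
```

Note that a stray LC_ALL=C overrides an otherwise correct LC_CTYPE, which is one way a terminal can still show ?'s after following the steps.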
Re: UTF-8 and Unicode FAQ, demos
On Friday, November 1, 2002, at 04:06 PM, Simon Cozens wrote: More and more conversations like this (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. You keep saying or suggesting that the idea of using Unicode operators is idiotic. Perhaps you could make an argument in support of that assertion (as Luke and Paul have done). I for one would be interested to hear your reasoning. Regards, David -- David Wheeler AIM: dwTheory ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e
Re: UTF-8 and Unicode FAQ, demos
On Sat, Nov 02, 2002 at 12:06:07AM +, Simon Cozens wrote: More and more conversations like this (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. There will certainly be some pain in breaking out of ASCII. It might well be idiotic now, but I don't think it will be idiotic in ten years. And I am quite willing to deal with a certain amount of short-term crap on behalf of the future. Larry
Re: UTF-8 and Unicode FAQ, demos
On 2002.11.01 19:06 Simon Cozens wrote: More and more conversations like this (and how many have we seen here already?) about character sets, encodings, mail quoting issues, in fact anything other than Perl, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. I don't really want Unicode operators either, but if it is decided that there will be such operators, I would still _want_to_know_how_to_use_them_. So let me make my original question a little more general: will Perl 6 source files be encoded in Latin-1 or in UTF-8, or will Perl 6 provide some sort of translation mechanism, such as specifying the charset on the command line? -- Matt Matthew Zimmerman Interdisciplinary Biophysics, University of Virginia http://www.people.virginia.edu/~mdz4c/
Re: UTF-8 and Unicode FAQ, demos
Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop (corresponding to the Unicode characters 0x00AB and 0x00BB), which is consistent with the iso-8859-1 encoding (despite the fact that my mailserver or his mailer insists on labelling those messages as UTF-8). However, the UTF-8 encoding of those Unicode characters is actually 0xC2AB op 0xC2BB. As far as I understand it, the UTF-8 encoding only allows single-byte representations of characters if they fall in the 0x00 to 0x7F range. So the question is, if I'm writing a program and I actually want to use one of these ops, do I put 0xAB op 0xBB or 0xC2AB op 0xC2BB? -- Matt, who never thought he'd have to do hex dumps to debug his Perl programs ;) -- Matthew Zimmerman Interdisciplinary Biophysics, University of Virginia http://www.people.virginia.edu/~mdz4c/
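The byte values in the question are easy to confirm. The following Python sketch (Python only because it makes the bytes easy to inspect; it says nothing about what a Perl 6 parser would accept) encodes the two guillemets under both encodings and shows that a bare 0xAB or 0xBB byte is not valid UTF-8. The helper names are invented for this example.

```python
LEFT, RIGHT = "\u00ab", "\u00bb"  # the two French-quote characters


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for ch in (LEFT, RIGHT):
        print(
            f"U+{ord(ch):04X}  "
            f"latin-1: {hex_bytes(ch, 'latin-1')}  "
            f"utf-8: {hex_bytes(ch, 'utf-8')}"
        )
    # A file containing the single bytes AB ... BB is Latin-1, not UTF-8.
    print(is_valid_utf8(b"\xab+\xbb"), is_valid_utf8(b"\xc2\xab+\xc2\xbb"))
```

So a message carrying the single bytes 0xAB and 0xBB but labelled UTF-8 is mislabelled, which matches the observation about the mail headers above.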
UTF-8 and Unicode FAQ, demos
Here is an extensive FAQ for Unicode and UTF-8:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

and here is a test file that will show you how many of the most common glyphs (WGL4, via Microsoft) you are capable of displaying in your current setup:

http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.txt

A reduced list of interesting characters is as follows. Note that I may not be sending them all correctly, as not all of them are available on OSX. And that not all interesting characters are a part of the WGL4 set.

00AB # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB # » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
00AC # ¬ NOT SIGN
2202 # ∂ PARTIAL DIFFERENTIAL
2206 # ∆ INCREMENT
220F # ∏ N-ARY PRODUCT
2211 # ∑ N-ARY SUMMATION
2219 # ∙ BULLET OPERATOR
221A # √ SQUARE ROOT
221E # ∞ INFINITY
221F # ∟ RIGHT ANGLE
2229 # ∩ INTERSECTION
222B # ∫ INTEGRAL
2248 # ≈ ALMOST EQUAL TO
2260 # ≠ NOT EQUAL TO
2261 # ≡ IDENTICAL TO
2264 # ≤ LESS-THAN OR EQUAL TO
2265 # ≥ GREATER-THAN OR EQUAL TO
00D7 # × MULTIPLICATION SIGN
00F7 # ÷ DIVISION SIGN
00B0 # ° DEGREE SIGN
00B1 # ± PLUS-MINUS SIGN
00B5 # µ MICRO SIGN
00B6 # ¶ PILCROW SIGN
2020 # † DAGGER
2021 # ‡ DOUBLE DAGGER
2022 # • BULLET
2026 # … HORIZONTAL ELLIPSIS
2030 # ‰ PER MILLE SIGN
00A1 # ¡ INVERTED EXCLAMATION MARK
00A2 # ¢ CENT SIGN
00A3 # £ POUND SIGN
00A4 # ¤ CURRENCY SIGN
00A5 # ¥ YEN SIGN
00A6 # ¦ BROKEN BAR
00A7 # § SECTION SIGN
00A8 # ¨ DIAERESIS
00A9 # © COPYRIGHT SIGN
00AA # ª FEMININE ORDINAL INDICATOR
00AD # ­ SOFT HYPHEN
00AE # ® REGISTERED SIGN
00AF # ¯ MACRON
00B2 # ² SUPERSCRIPT TWO
00B3 # ³ SUPERSCRIPT THREE
00B7 # · MIDDLE DOT
00B8 # ¸ CEDILLA
00BA # º MASCULINE ORDINAL INDICATOR
00BF # ¿ INVERTED QUESTION MARK
203C # ‼ DOUBLE EXCLAMATION MARK
02C7 # ˇ CARON
02D8 # ˘ BREVE
02D9 # ˙ DOT ABOVE
02DA # ˚ RING ABOVE
2122 # ™ TRADE MARK SIGN
2126 # Ω OHM SIGN
2190 # ← LEFTWARDS ARROW
2191 # ↑ UPWARDS ARROW
2192 # → RIGHTWARDS ARROW
2193 # ↓ DOWNWARDS ARROW
2194 # ↔ LEFT RIGHT ARROW
2195 # ↕ UP DOWN ARROW
0391 # Α GREEK CAPITAL LETTER ALPHA
0392 # Β GREEK CAPITAL LETTER BETA
0393 # Γ GREEK CAPITAL LETTER GAMMA
0394 # Δ GREEK CAPITAL LETTER DELTA
0395 # Ε GREEK CAPITAL LETTER EPSILON
0396 # Ζ GREEK CAPITAL LETTER ZETA
0397 # Η GREEK CAPITAL LETTER ETA
0398 # Θ GREEK CAPITAL LETTER THETA
0399 # Ι GREEK CAPITAL LETTER IOTA
039A # Κ GREEK CAPITAL LETTER KAPPA
039B # Λ GREEK CAPITAL LETTER LAMDA
039C # Μ GREEK CAPITAL LETTER MU
039D # Ν GREEK CAPITAL LETTER NU
039E # Ξ GREEK CAPITAL LETTER XI
039F # Ο GREEK CAPITAL LETTER OMICRON
03A0 # Π GREEK CAPITAL LETTER PI
03A1 # Ρ GREEK CAPITAL LETTER RHO
03A3 # Σ GREEK CAPITAL LETTER SIGMA
03A4 # Τ GREEK CAPITAL LETTER TAU
03A5 # Υ GREEK CAPITAL LETTER UPSILON
03A6 # Φ GREEK CAPITAL LETTER PHI
03A7 # Χ GREEK CAPITAL LETTER CHI
03A8 # Ψ GREEK CAPITAL LETTER PSI
03A9 # Ω GREEK CAPITAL LETTER OMEGA
03B1 # α GREEK SMALL LETTER ALPHA
03B2 # β GREEK SMALL LETTER BETA
03B3 # γ GREEK SMALL LETTER GAMMA
03B4 # δ GREEK SMALL LETTER DELTA
03B5 # ε GREEK SMALL LETTER EPSILON
03B6 # ζ GREEK SMALL LETTER ZETA
03B7 # η GREEK SMALL LETTER ETA
03B8 # θ GREEK SMALL LETTER THETA
03B9 # ι GREEK SMALL LETTER IOTA
03BA # κ GREEK SMALL LETTER KAPPA
03BB # λ GREEK SMALL LETTER LAMDA
03BC # μ GREEK SMALL LETTER MU
03BD # ν GREEK SMALL LETTER NU
03BE # ξ GREEK SMALL LETTER XI
03BF # ο GREEK SMALL LETTER OMICRON
03C0 # π GREEK SMALL LETTER PI
03C1 # ρ GREEK SMALL LETTER RHO
03C2 # ς GREEK SMALL LETTER FINAL SIGMA
03C3 # σ GREEK SMALL LETTER SIGMA
03C4 # τ GREEK SMALL LETTER TAU
03C5 # υ GREEK SMALL LETTER UPSILON
03C6 # φ GREEK SMALL LETTER PHI
03C7 # χ GREEK SMALL LETTER CHI
03C8 # ψ GREEK SMALL LETTER PSI
03C9 # ω GREEK SMALL LETTER OMEGA

MikeL
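A list in this "code point # glyph NAME" shape can be regenerated from the Unicode database rather than typed by hand, which sidesteps the risk of glyphs being mangled in transit. The Python sketch below does that for a handful of the code points above using the standard unicodedata module. The SAMPLE selection and the helper names are arbitrary choices for this example, and the encodability check only says whether an encoding can represent a character, not whether a given font can draw it.

```python
import sys
import unicodedata

# A few code points taken from the list above (hex, as written there).
SAMPLE = ["00AB", "00BB", "2211", "221A", "039B", "03C0"]


def describe(hex_code: str) -> str:
    """Format one entry as 'XXXX # glyph OFFICIAL NAME'."""
    ch = chr(int(hex_code, 16))
    name = unicodedata.name(ch, "<unnamed>")
    return f"{hex_code} # {ch} {name}"


def encodable(ch: str, encoding: str) -> bool:
    """True if ch can be represented in the given encoding."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    out = sys.stdout.encoding or "ascii"
    for code in SAMPLE:
        ch = chr(int(code, 16))
        if encodable(ch, out):
            print(describe(code))
        else:
            print(f"{code} # ? (not encodable in {out})")
```

The spelling LAMDA in the Greek entries is the official Unicode character name, so the database returns it exactly as it appears in the list.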
Re: UTF-8 and Unicode FAQ, demos
And if you really want to drool at all the neat glyphs that the wonderful, magical world of math has given us, check out: http://www.unicode.org/charts/PDF/U2A00.pdf now *there's* some brackets! MikeL
Re: UTF-8 and Unicode FAQ, demos
Date: Thu, 31 Oct 2002 10:11:00 -0800 From: Michael Lazzaro And if you really want to drool at all the neat glyphs that the wonderful, magical world of math has given us, check out: http://www.unicode.org/charts/PDF/U2A00.pdf now *there's* some brackets! Ooh! Let's use 2AF7 and 2AF8 for qw! MikeL
Re: UTF-8 and Unicode FAQ, demos
--- Luke Palmer wrote: And if you really want to drool at all the neat glyphs that the wonderful, magical world of math has given us, check out: http://www.unicode.org/charts/PDF/U2A00.pdf now *there's* some brackets! Ooh! Let's use 2AF7 and 2AF8 for qw! Frankly, I don't know HOW we've lived for so long without larger than and smaller than operators. =Austin