Re: Semantics of vector operations
Luke Palmer wrote:
> But I'm still sure that the unicode-deficient would rather write:

I suspect the unicode-deficient would rather write Ruby.

Adding unicode operators to Perl will just reinforce its reputation as a line noise language. I know it has been said before, and I'm sure it will be said again, but this is a really bad idea, IMHO.

Sure, make Perl Unicode compliant, right down to variable and operator names. But don't make people spend an afternoon messing around with mutt, vim, emacs and all the other tools they use, just so that they can read, write, email and print Perl programs correctly.

A
Compiler writing tools
I've been writing a lot of compilers recently, and figuring as how Perl 6 is aiming to replace yacc, I think I'll share some of my positive and negative experiences. Perhaps Perl 6 can adjust itself to help me out a bit. :-)

=over

=item * RegCounter

I have a class called RegCounter which is of immense use, but could possibly be more elegant. It's a tied hash that, upon access, generates a new name and stores it in a table for later retrieval under the same name.

It has a method called C<next> that returns a new RegCounter that shares the same counter, and puts whatever was in that one's "ret" slot into whatever argument was given to C<next>, by default "next".

The first [^a-z] characters in the name are passed along to the generated register name, defaulting to a target-specific string (for instance, I use $P for Parrot programs).

So I can do, for instance:

    method if_statement::code($rc) {   # $rc is the regcounter
        self.item[0].code($rc.next('condition'))
      ~ "unless $rc{condition}, $rc{Lfalse}\n"
      ~ self.item[1].code($rc.next)
      ~ "$rc{Lfalse}:\n"
    }

=item * Concatenations

The code example you just saw gets much, much uglier if there is added complexity. One of my compilers returns lists of lines, the other concatenates strings, and they're both pretty hard to read -- especially when there are heredocs all over the place (which happens frequently).

I think $() will help somewhat, as will interpolating method calls, but for a compiler, I'd really like PHP-like parse switching. That is, I could do something like this (I'll use <$ and $> for <? and ?>):

    method logical_or_expression::code($rc) {
        <<EOC;
            null $rc{ret}
        <$ for @($self.item[0]) -> $item { $>
            $item.code($rc.next)
            if $rc{next}, $rc{Ldone}
        <$ } $>
        $rc{Ldone}:
        EOC
    }

For this case, I think it would also be a good idea to have a string implementation somewhere that stores things as ropes (a list of strings), so that immense copying isn't necessary.
=item * Comments

We've already gone over this, but it'd be good to have the ability for parsers to (somehow) feed into one another, so that you can do comments without putting a comment in between every grammar rule (or mangling things to do that somehow), or search and replace, which has the disadvantage of being unable to disable comments during parts of the parse. $Parse::RecDescent::skip works well, but I don't think it's general enough.

=item * Line Counting

It is I<essential> that the regex engine is capable (perhaps off by default) of keeping track of your line number.

=back

Luke
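For readers who want to experiment with the RegCounter idea in today's Perl 5, here is a minimal sketch. It is not Luke's class: the package names RegCounter and RegCounter::Tie, the counter starting at 0, and the emit_if helper (a stand-in for the if_statement example, with code refs in place of the child nodes) are assumptions made for illustration, and it needs Perl 5.10 or later for the // operator. Reading a key allocates a register name once and then reuses it; next returns a child that shares the counter and whose "ret" register is the parent's register named by the argument.

```perl
use strict;
use warnings;

# Hypothetical sketch of RegCounter; not Luke's actual implementation.
package RegCounter::Tie;

sub TIEHASH {
    my ($class, %args) = @_;
    my $start = 0;
    return bless {
        counter => $args{counter} // \$start,  # scalar ref shared by all children
        prefix  => $args{prefix}  // '$P',     # target-specific default prefix
        names   => {},                         # key => generated register name
    }, $class;
}

# Reading a key allocates a register name the first time, then reuses it.
sub FETCH {
    my ($self, $key) = @_;
    return $self->{names}{$key} //= $self->fresh($key);
}

sub STORE  { my ($self, $key, $value) = @_; $self->{names}{$key} = $value }
sub EXISTS { my ($self, $key) = @_; exists $self->{names}{$key} }
sub DELETE { my ($self, $key) = @_; delete $self->{names}{$key} }

# The leading run of non-lowercase characters ("L" in "Lfalse") becomes the
# prefix; keys without such a run ("condition") get the default prefix.
sub fresh {
    my ($self, $key) = @_;
    my ($lead) = $key =~ /^([^a-z]*)/;
    my $prefix = length $lead ? $lead : $self->{prefix};
    return $prefix . ${ $self->{counter} }++;
}

package RegCounter;

sub new {
    my ($class, %args) = @_;
    tie my %regs, 'RegCounter::Tie', %args;
    return bless \%regs, $class;
}

# Child counter sharing the numbering; its "ret" register is the parent's
# register named $slot (default "next").
sub next {
    my ($self, $slot) = @_;
    $slot //= 'next';
    my $tie   = tied %$self;
    my $child = (ref $self)->new(counter => $tie->{counter}, prefix => $tie->{prefix});
    $child->{ret} = $self->{$slot};
    return $child;
}

package main;

# Hypothetical Perl 5 rendering of the if_statement::code example; the two
# code refs stand in for the child nodes' code generators.
sub emit_if {
    my ($rc, $cond_code, $body_code) = @_;
    return join '',
        $cond_code->($rc->next('condition')),
        "unless $rc->{condition}, $rc->{Lfalse}\n",
        $body_code->($rc->next),
        "$rc->{Lfalse}:\n";
}
```

Sharing one scalar counter between a parent and its children is what keeps register names unique across the whole tree, while each child still gets its own name table.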
Re: Semantics of vector operations
[EMAIL PROTECTED] (Andy Wardley) writes:
> Sure, make Perl Unicode compliant, right down to variable and operator
> names. But don't make people spend an afternoon messing around with mutt,
> vim, emacs and all the other tools they use, just so that they can read,
> write, email and print Perl programs correctly.

To be honest, I don't think that'll be a problem, but only because by the time Perl 6 is widely deployed, people will have got themselves sorted out as far as Unicode's concerned. I suspect similar things were said when C decided to use 7-bit characters.

That doesn't mean I think Unicode operators are a good idea, of course.

-- 
When in doubt, print 'em out.
    -- Karl's Programming Proverb 0x7
Re: Compiler writing tools
Luke Palmer wrote:
> I think $() will help somewhat, as will interpolating method calls, but
> for a compiler, I'd really like PHP-like parse switching. That is, I
> could do something like (I'll use <$ and $> for <? and ?>):

Check out the new scanner module for Template Toolkit v3. It does exactly that. It allows you to specify as many different tag styles as you like and uses a composite regex to locate them in a source document. It extracts the intervening text, and then calls back to your code to do whatever you like with them. It takes care of the surrounding text and handles things like counting line numbers so that you don't have to worry about it.

The code is still in development so you'll need to get it from CVS. See:

  http://tt3.template-toolkit.org/code.html

Everything is raw and undocumented, but examples/scanner.pl shows an example of what you want to do. Be warned that I'm working on this right now, so things are changing often. Having said that, the scanner is pretty much stable, although the handler object that it interacts with isn't.

A
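To make the composite-regex approach concrete, here is a small hypothetical Perl 5 sketch. It is not the Template Toolkit v3 scanner or its API (which the message above describes as still in flux); the function scan_tags and its callback arguments are invented here. Each tag style is an [open, close] delimiter pair, all styles are joined into one regex, intervening text and tag bodies go to two callbacks, and newlines are counted along the way so each callback receives the line it starts on. It assumes Perl 5.10 or later for the (?|...) branch-reset group and the // operator.

```perl
use strict;
use warnings;

# Hypothetical multi-style tag scanner with line counting. This is NOT the
# Template Toolkit v3 scanner API; scan_tags and its callbacks are made up.
#   $styles  - array ref of [open, close] delimiter pairs
#   $on_text - called as $on_text->($text, $line) for text between tags
#   $on_tag  - called as $on_tag->($body, $open, $line) for each tag
# Returns the line number reached at the end of the source.
sub scan_tags {
    my ($source, $styles, $on_text, $on_tag) = @_;

    # One branch per style. Inside (?|...) every branch reuses the same
    # capture numbers, so the opening delimiter and the body always land in
    # the same two groups, whichever style matched.
    my $branches = join '|',
        map { '(' . quotemeta($_->[0]) . ')(.*?)' . quotemeta($_->[1]) } @$styles;
    my $tag_re = qr/(?|$branches)/s;

    my $line = 1;
    while ($source =~ /\G(.*?)$tag_re/gcs) {
        my ($text, $open, $body) = ($1, $2, $3);
        if (length $text) {
            $on_text->($text, $line);
            $line += ($text =~ tr/\n//);    # count newlines without modifying
        }
        $on_tag->($body, $open, $line);
        $line += ($body =~ tr/\n//);
    }

    # /c keeps pos() after the final failed match: the rest is plain text.
    my $rest = substr($source, pos($source) // 0);
    $on_text->($rest, $line) if length $rest;
    return $line;
}
```

Because every style lives in one alternation, the source is walked once no matter which style comes next; the order of the styles only matters when one opening delimiter is a prefix of another.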
Re: Semantics of vector operations
--- Andy Wardley [EMAIL PROTECTED] wrote:
> Adding unicode operators to Perl will just reinforce
> its reputation as
> a line noise language.

Perl6, the language with *real* runes.

Come to think of it, some of the ogham runes would
look more in character as a 'distribute' operator than
guillemets... :-)

More seriously, what about things like 'combining
characters', e.g. U+20D0 (vector indication);
U+0307 (derivative)?

Alex
Re: Semantics of vector operations
On Mon, Feb 02, 2004 at 09:59:50AM +, Simon Cozens wrote:
> [EMAIL PROTECTED] (Andy Wardley) writes:
> > Sure, make Perl Unicode compliant, right down to variable and operator
> > names. But don't make people spend an afternoon messing around with mutt,
> > vim, emacs and all the other tools they use, just so that they can read,
> > write, email and print Perl programs correctly.
>
> To be honest, I don't think that'll be a problem, but only because by the
> time Perl 6 is widely deployed, people will have got themselves sorted out
> as far as Unicode's concerned. I suspect similar things were said when C
> decided to use 7 bit characters.

Don't be so sure. I've been seeing the « and »
characters properly sometimes, as ??? sometimes,
and I think there were some other variants (maybe for
other extended characters) - depending upon whether
I'm reading the messages locally at home or remotely
through a terminal emulator. Those emulators are
not about to be replaced for any other reason in the
near future.

I'll be able to work it out if I have to, but it'll
be an annoyance, and probably one that shows up
many times with different bits of software, and
often those bits will not be under my control and
will have to be worked around rather than fixed.
(In the canine-ical sense, it is the current software
that is fixed, i.e. it has limited functionality.)

> That doesn't mean I think Unicode operators are a good idea, of course.

They will cause problems for sure.
Re: Semantics of vector operations
Alex Burr writes:
> --- Andy Wardley [EMAIL PROTECTED] wrote:
> > Adding unicode operators to Perl will just reinforce
> > its reputation as
> > a line noise language.
>
> Perl6, the language with *real* runes.
>
> Come to think of it, some of the ogham runes would
> look more in character as a 'distribute' operator than
> guillemets... :-)
>
> More seriously, what about things like 'combining
> characters', e.g. U+20D0 (vector indication);
> U+0307 (derivative)?

Those are fair game for modules, but they won't be in the core because they're not in Latin-1.

Luke
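The Latin-1 rule can be checked mechanically: a character is in Latin-1 exactly when its code point is U+00FF or below. A small Perl 5 sketch follows; the function name non_latin1 is made up for illustration. The guillemets « and » (U+00AB, U+00BB) and × (U+00D7) pass, while the combining marks mentioned above do not.

```perl
use strict;
use warnings;

# List the characters of $str that fall outside Latin-1 (above U+00FF),
# formatted as U+XXXX. An empty list means the string is pure Latin-1.
sub non_latin1 {
    my ($str) = @_;
    return map { sprintf 'U+%04X', ord($_) } grep { ord($_) > 0xFF } split //, $str;
}

# Usage: classify a few candidate operator spellings.
binmode STDOUT, ':encoding(UTF-8)';
for my $op ("\x{AB}", "\x{BB}", "\x{D7}", "v\x{20D0}", "f\x{307}") {
    my @bad = non_latin1($op);
    printf "%s\t%s\n", $op, @bad ? "not Latin-1: @bad" : 'Latin-1';
}
```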
Re: Semantics of vector operations
On Mon, Feb 02, 2004 at 01:14:48PM -0500, John Macdonald wrote:
: On Mon, Feb 02, 2004 at 09:59:50AM +, Simon Cozens wrote:
: > [EMAIL PROTECTED] (Andy Wardley) writes:
: > > Sure, make Perl Unicode compliant, right down to variable and operator
: > > names. But don't make people spend an afternoon messing around with mutt,
: > > vim, emacs and all the other tools they use, just so that they can read,
: > > write, email and print Perl programs correctly.
: >
: > To be honest, I don't think that'll be a problem, but only because by the
: > time Perl 6 is widely deployed, people will have got themselves sorted out
: > as far as Unicode's concerned. I suspect similar things were said when C
: > decided to use 7 bit characters.
:
: Don't be so sure. I've been seeing the « and »
: characters properly sometimes, as ??? sometimes,
: and I think there were some other variants (maybe for
: other extended characters) - depending upon whether
: I'm reading the messages locally at home or remotely
: through a terminal emulator. Those emulators are
: not about to be replaced for any other reason in the
: near future.

Well, sure. But what we're trying to optimize here is specifically not the near future.

: I'll be able to work it out if I have to, but it'll
: be an annoyance, and probably one that shows up
: many times with different bits of software, and
: often those bits will not be under my control and
: will have to be worked around rather than fixed.
: (In the canine-ical sense, it is the current software
: that is fixed, i.e. it has limited functionality.)
:
: > That doesn't mean I think Unicode operators are a good idea, of course.
:
: They will cause problems for sure.

No question about that. But Unicode is addressing (or attempting to address) a basic irreducible complexity of the world, and I'm not willing to sweep that complexity under someone else's carpet for the purposes of short-term anaesthesia.
I expect that over the long term people will learn to use Unicode in moderation, after a short period of (over)exuberant experimentation. As a temporary measure (where temporary is measured in years), I'd suggest Unicode declarations include an C<is ASCII('[EMAIL PROTECTED]')> trait.

Larry
Re: Semantics of vector operations
On Mon, Feb 02, 2004 at 11:44:17AM -0700, Luke Palmer wrote:
: Alex Burr writes:
: > --- Andy Wardley [EMAIL PROTECTED] wrote:
: > > Adding unicode operators to Perl will just reinforce
: > > its reputation as
: > > a line noise language.
: >
: > Perl6, the language with *real* runes.
: >
: > Come to think of it, some of the ogham runes would
: > look more in character as a 'distribute' operator than
: > guillemets... :-)
: >
: > More seriously, what about things like 'combining
: > characters', e.g. U+20D0 (vector indication);
: > U+0307 (derivative)?
:
: Those are fair game for modules, but they won't be in the core because
: they're not in Latin-1.

Yes, that's the policy, at least for 6.0.0. Once everyone's on the Unicode bandwagon (I realize we're talking years here), we can think about relaxing that.

That being said, we can potentially use × U+00D7 MULTIPLICATION SIGN. (Though my vim can't seem to decide whether it's a single-width or a double-width character, urgh...)

By the way, here's a program called uni that greps the Unicode characters:

    #!/usr/bin/perl

    binmode STDOUT, ":utf8";
    $pat = "@ARGV";
    @names = split /^/, do 'unicore/Name.pl';
    for (@names) {
        if (/$pat/io) {
            $hex = hex($_);
            print chr($hex), "\t", $_;
        }
    }

Sorry if I posted that before, but it's a really useful little beastie.

Larry
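The uni script reads unicore/Name.pl, an internal data file of the Perl core whose layout is not a documented interface and may differ between Perl releases. A sketch of the same search built on the documented charnames::viacode function instead, assuming Perl 5.10 or later; the function name uni_grep and its range arguments are illustrative additions, not part of the original script. Asking for a name for every code point is simple but can take a while over the full range.

```perl
use strict;
use warnings;
use charnames ();

# Return "CHAR<tab>U+XXXX NAME" lines for the code points in $lo..$hi whose
# Unicode name matches $pattern (case-insensitively). uni_grep is a made-up
# name; charnames::viacode is the core module's documented lookup function.
sub uni_grep {
    my ($pattern, $lo, $hi) = @_;
    my $re = qr/$pattern/i;
    my @hits;
    for my $cp ($lo .. $hi) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogate code points
        my $name = charnames::viacode($cp);
        next unless defined $name && $name =~ $re;
        push @hits, sprintf "%s\tU+%04X %s", chr($cp), $cp, $name;
    }
    return @hits;
}

# Run as a script, e.g.: perl uni.pl multiplication sign
if (!caller && @ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for uni_grep("@ARGV", 0, 0x10FFFF);
}
```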
Re: Semantics of vector operations
On Feb 2, 2004, at 5:20 PM, Larry Wall wrote:
> That being said, we can potentially use × U+00D7 MULTIPLICATION SIGN.
> (Though my vim can't seem to decide whether it's a single-width or a
> double-width character, urgh...)

I realize this is a tad OT, but can anyone tell me how I can get Emacs to properly display Unicode characters? I expect that others on the list could benefit, too.

Cheers,

David
Re: Compiler writing tools
On Mon, Feb 02, 2004 at 02:09:33AM -0700, Luke Palmer wrote:
: I've been writing a lot of compilers recently, and figuring as how Perl
: 6 is aiming to replace yacc, I think I'll share some of my positive and
: negative experiences. Perhaps Perl 6 can adjust itself to help me out
: a bit. :-)

Perl 6 is designed to be adjusted, but it would be quite an AI feat for it to adjust itself. :-)

: =over
:
: =item * RegCounter
:
: I have a class called RegCounter which is of immense use, but could
: possibly be more elegant. It's a tied hash that, upon access, generates a
: new name and stores it in a table for later retrieval under the same
: name.
:
: It has a method called C<next> that returns a new RegCounter that shares
: the same counter, and puts whatever was in that one's "ret" slot into
: whatever argument was given to C<next>, by default "next".
:
: The first [^a-z] characters in the name are passed along to the
: generated register name, defaulting to a target-specific string (for
: instance, I use $P for Parrot programs).
:
: So I can do, for instance:
:
:     method if_statement::code($rc) {   # $rc is the regcounter
:         self.item[0].code($rc.next('condition'))
:       ~ "unless $rc{condition}, $rc{Lfalse}\n"
:       ~ self.item[1].code($rc.next)
:       ~ "$rc{Lfalse}:\n"
:     }

What do you want Perl 6 to do for you here?

: =item * Concatenations
:
: The code example you just saw gets much, much uglier if there is added
: complexity. One of my compilers returns lists of lines, the other
: concatenates strings, and they're both pretty hard to read -- especially
: when there are heredocs all over the place (which happens frequently).
:
: I think $() will help somewhat, as will interpolating method calls, but
: for a compiler, I'd really like PHP-like parse switching. That is, I
: could do something like (I'll use <$ and $> for <? and ?>):
:
:     method logical_or_expression::code($rc) {
:         <<EOC;
:             null $rc{ret}
:         <$ for @($self.item[0]) -> $item { $>
:             $item.code($rc.next)
:             if $rc{next}, $rc{Ldone}
:         <$ } $>
:         $rc{Ldone}:
:         EOC
:     }

This seems to me to fall into the category of useful language warpings, but not necessarily for mandatory public consumption. String literals are parsed by the main parser in Perl 6, unlike in Perl 5. So a grammatical munging should be doable. All is fair if you predeclare and all that...

By the way, the first production language I ever wrote was an inside-out language where control commands were embedded in text that was to be output by default. So I'm not knocking your proposal.

: For this case, I think it would also be a good idea to have a string
: implementation somewhere that stores things as ropes, a list of
: strings, so that immense copying isn't necessary.

Well, I suggested something like this early in the design of Parrot, but it doesn't seem to have flown in the general case. On the other hand, the string abstraction ought to be big enough to hide alternate implementations behind it. The whole "is from" notion is built on that idea.

: =item * Comments
:
: We've already gone over this, but it'd be good to have the ability for
: parsers to (somehow) feed into one another, so that you can do
: comments without putting a comment in between every grammar rule (or
: mangling things to do that somehow), or search and replace, which has
: the disadvantage of being unable to disable comments during parts of the
: parse. $Parse::RecDescent::skip works well, but I don't think it's
: general enough.

Agreed. I do think you want the comments in the grammar, if for no other reason than it provides a hook to do something with the comment if you retarget the grammar from normal compilation to, say, code translation. I don't think it's out of the realm of possibility for Perl 6 to support strings with embedded objects as funny characters. In the limit, a string could be composed of nothing but a stream of objects. (As a hack, one can embed illegal Unicode characters (above U+10FFFF) that map an integer to an array of objects, but maybe we can do better from a GC perspective.)

: =item * Line Counting
:
: It is I<essential> that the regex engine is capable (perhaps off by
: default) of keeping track of your line number.

By all means! A compiler must absolutely never emit an inaccurate line number if it can help it. Few things are as irritating as "...bailing out near line 100." If we don't provide an explicit lexical analysis pass that handles this, then the regex engine must somehow. Though I haven't really thought much about the *how* part of the somehow.

Larry
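The embedded-object hack described above can be tried out in Perl 5, which tolerates code points above U+10FFFF inside its strings. A rough sketch under these assumptions: objects live in a side table (an ordinary array), each one is stood in for by the code point 0x110000 plus its index, and the names embed and split_objects are hypothetical. It assumes a Perl recent enough to know the non_unicode warnings category (5.14 or later). Such strings cannot be written out as standard UTF-8, and the side table keeps every object alive until the table itself is freed, which is one way to see the GC concern.

```perl
use strict;
use warnings;
no warnings 'non_unicode';    # code points above U+10FFFF are deliberate here

use constant BASE => 0x110000;    # first code point past the Unicode range

# Hypothetical helper: store $obj in the side table and return a
# one-character placeholder string that refers to it by index.
sub embed {
    my ($table, $obj) = @_;
    push @$table, $obj;
    return chr(BASE + $#$table);
}

# Hypothetical helper: turn a string back into a list of plain-text runs
# and the objects its placeholder characters refer to.
sub split_objects {
    my ($str, $table) = @_;
    my @parts;
    my $text = '';
    for my $i (0 .. length($str) - 1) {
        my $ch = substr($str, $i, 1);
        my $cp = ord $ch;
        if ($cp >= BASE) {
            push @parts, $text if length $text;
            push @parts, $table->[$cp - BASE];
            $text = '';
        }
        else {
            $text .= $ch;
        }
    }
    push @parts, $text if length $text;
    return @parts;
}
```

Each embedded object occupies exactly one character, so ordinary string operations such as length and substr keep working on the mixed stream.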