Re: Semantics of vector operations

2004-02-02 Thread Andy Wardley
Luke Palmer wrote:
> But I'm still sure that the unicode-deficient would rather write:

I suspect the unicode-deficient would rather write Ruby.

Adding unicode operators to Perl will just reinforce its reputation as
a line noise language.

I know it has been said before, and I'm sure it will be said again,
but this is a really bad idea, IMHO.  

Sure, make Perl Unicode compliant, right down to variable and operator 
names.  But don't make people spend an afternoon messing around with mutt, 
vim, emacs and all the other tools they use, just so that they can read, 
write, email and print Perl programs correctly.

A



Compiler writing tools

2004-02-02 Thread Luke Palmer
I've been writing a lot of compiler code recently, and figuring as how Perl
6 is aiming to replace yacc, I think I'll share some of my positive and
negative experiences.   Perhaps Perl 6 can adjust itself to help me out
a bit.  :-)

=over

=item * RegCounter

I have a class called RegCounter which is of immense use, but could be
possibly more elegant.  It's a tied hash that, upon access, generates a
new name and stores it in a table for later retrieval under the same
name.  

It has a method called C<next> that returns a new RegCounter that shares
the same counter, and puts whatever was in that one's "ret" slot into
whatever argument was given to C<next>, by default "next".

The first [^a-z] characters in the name are passed along to the
generated register name, defaulting to a target-specific string (for
instance, I use $P for Parrot programs).

So I can do, for instance:

method if_statement::code($rc) { # $rc is the regcounter
  self.item[0].code($rc.next('condition'))
~ "unless $rc{condition}, $rc{Lfalse}\n"
~ self.item[1].code($rc.next)
~ "$rc{Lfalse}:\n"
}

=item * Concatenations

The code example you just saw gets much, much uglier if there is added
complexity.  One of my compilers returns lists of lines, the other
concatenates strings, and they're both pretty hard to read -- especially
when there are heredocs all over the place (which happens frequently).

I think $() will help somewhat, as will interpolating method calls, but
for a compiler, I'd really like PHP-like parse switching.  That is, I
could do something like (I'll use $< and >$ for <? and ?>):

method logical_or_expression::code($rc) {
<<EOC;
null $rc{ret}
$< for @($self.item[0]) -> $item { >$
  $item.code($rc.next)
  if $rc{next}, $rc{Ldone}
$< } >$
$rc{Ldone}:
EOC
}

For this case, I think it would also be a good idea to have a string
implementation somewhere that stores things as ropes, a list of
strings, so that immense copying isn't necessary.

=item * Comments

We've already gone over this, but it'd be good to have the ability for
parsers to (somehow) feed into one another, so that you can do
comments without putting a comment in between every grammar rule (or
mangling things to do that somehow), or search and replace, which has
the disadvantage of being unable to disable comments during parts of the
parse.  $Parse::RecDescent::skip works well, but I don't think it's
general enough.

=item * Line Counting

It is I<essential> that the regex engine is capable (perhaps off by
default) of keeping track of your line number.

=back
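
A minimal Perl 5 sketch of the RegCounter idea from the first item, under one reading of the description: the child returned by next() shares the counter, and its "ret" slot names the same register as the parent's slot passed to next(). The constructor options (counter, default, table) and all internals are assumptions made for illustration, not the original class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package RegCounter;    # hypothetical sketch, not the original class

sub TIEHASH {
    my ($class, %opt) = @_;
    my $start = 0;
    return bless {
        counter => $opt{counter} || \$start,    # scalar ref, shared by children
        default => defined $opt{default} ? $opt{default} : '$P',
        table   => $opt{table} || {},           # name => generated register
    }, $class;
}

# On first access, generate a register name and remember it.
sub FETCH {
    my ($self, $name) = @_;
    my $table = $self->{table};
    unless (exists $table->{$name}) {
        # Leading non-lowercase characters become the register prefix.
        my ($prefix) = $name =~ /^([^a-z]*)/;
        $prefix = $self->{default} unless length $prefix;
        $table->{$name} = $prefix . ${ $self->{counter} }++;
    }
    return $table->{$name};
}

# Return a child that shares the counter; its "ret" slot names the same
# register as this object's $slot (default "next").
sub next {
    my ($self, $slot) = @_;
    $slot = 'next' unless defined $slot;
    tie my %child, ref($self),
        counter => $self->{counter},
        default => $self->{default},
        table   => { ret => $self->FETCH($slot) };
    return \%child;
}

package main;

tie my %rc, 'RegCounter';
my $child = tied(%rc)->next('condition');
print "unless $rc{condition}, $rc{Lfalse}\n";    # unless $P0, L1
print "child ret is $child->{ret}\n";            # child ret is $P0
```

Only TIEHASH and FETCH are defined because the sketch never stores, deletes, or iterates; a fuller version would implement the rest of the tie interface.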

Luke


Re: Semantics of vector operations

2004-02-02 Thread Simon Cozens
[EMAIL PROTECTED] (Andy Wardley) writes:
> Sure, make Perl Unicode compliant, right down to variable and operator
> names.  But don't make people spend an afternoon messing around with mutt,
> vim, emacs and all the other tools they use, just so that they can read,
> write, email and print Perl programs correctly.

To be honest, I don't think that'll be a problem, but only because by the
time Perl 6 is widely deployed, people will have got themselves sorted out
as far as Unicode's concerned. I suspect similar things were said when C
decided to use 7 bit characters.

That doesn't mean I think Unicode operators are a good idea, of course.

-- 
When in doubt, print 'em out.
-- Karl's Programming Proverb 0x7


Re: Compiler writing tools

2004-02-02 Thread Andy Wardley
Luke Palmer wrote:
> I think $() will help somewhat, as will interpolating method calls, but
> for a compiler, I'd really like PHP-like parse switching.  That is, I
> could do something like (I'll use $< and >$ for <? and ?>):

Check out the new scanner module for Template Toolkit v3.  It does
exactly that.  It allows you to specify as many different tag styles as
you like and uses a composite regex to locate them in a source document.  
It extracts the intervening text, and then calls back to your code to do 
whatever you like with them.  It takes care of the surrounding text and 
handles things like counting line numbers so that you don't have to worry 
about it.
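
A rough Perl 5 sketch of the general technique described above (a composite regex over several tag styles, callbacks for text and tags, and line counting). This is not the Template Toolkit v3 scanner or its API; the tag styles, the scan() function and its callback signature are all made up for illustration. It relies on the branch-reset group (?|...), so it assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tag styles: [ open, close ] pairs.
my @styles = ( [ '[%', '%]' ], [ '<?', '?>' ] );

# Build one alternation; (?|...) makes every branch capture into $1.
my $alts   = join '|',
             map { quotemeta($_->[0]) . '(.*?)' . quotemeta($_->[1]) } @styles;
my $tag_re = qr/(?|$alts)/s;

# Call $on_text->($text, $line) for literal text and
# $on_tag->($body, $line) for each tag, where $line is the
# 1-based line on which that piece starts.
sub scan {
    my ($source, $on_text, $on_tag) = @_;
    my ($pos, $line) = (0, 1);
    while ($source =~ /$tag_re/g) {
        # Save match data before any callback can clobber it.
        my ($body, $start, $end) = ($1, $-[0], $+[0]);
        my $text = substr $source, $pos, $start - $pos;
        $on_text->($text, $line) if length $text;
        $line += ($text =~ tr/\n//);    # tr with no replacement only counts
        $on_tag->($body, $line);
        $line += ($body =~ tr/\n//);
        $pos = $end;
    }
    my $rest = substr $source, $pos;
    $on_text->($rest, $line) if length $rest;
    return;
}

my @events;
scan("a\n[%x%]b\n<?y?>\n",
     sub { push @events, sprintf 'text@%d', $_[1] },
     sub { push @events, sprintf 'tag@%d:%s', $_[1], $_[0] });
print "@events\n";    # text@1 tag@2:x text@2 tag@3:y text@3
```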

The code is still in development so you'll need to get it from CVS.  See:

  http://tt3.template-toolkit.org/code.html

Everything is raw and undocumented, but examples/scanner.pl shows an 
example of what you want to do.  Be warned that I'm working on this
right now, so things are changing often.  Having said that, the scanner 
is pretty much stable, although the handler object that it interacts
with isn't.

A



Re: Semantics of vector operations

2004-02-02 Thread Alex Burr

--- Andy Wardley [EMAIL PROTECTED] wrote:

> Adding unicode operators to Perl will just reinforce
> its reputation as
> a line noise language.

Perl6, the language with *real* runes.

Come to think of it, some of the ogham runes would
look more in character as a 'distribute' operator than
guillemets... :-)

More seriously, what about things like 'combining
characters', e.g. U+20D0 (vector indication);
U+0307 (derivative)?

Alex




Re: Semantics of vector operations

2004-02-02 Thread John Macdonald
On Mon, Feb 02, 2004 at 09:59:50AM +, Simon Cozens wrote:
> [EMAIL PROTECTED] (Andy Wardley) writes:
> > Sure, make Perl Unicode compliant, right down to variable and operator
> > names.  But don't make people spend an afternoon messing around with mutt,
> > vim, emacs and all the other tools they use, just so that they can read,
> > write, email and print Perl programs correctly.
>
> To be honest, I don't think that'll be a problem, but only because by the
> time Perl 6 is widely deployed, people will have got themselves sorted out
> as far as Unicode's concerned. I suspect similar things were said when C
> decided to use 7 bit characters.

Don't be so sure.  I've been seeing the « and »
characters properly sometimes, as ??? sometimes,
and I think there were some other variants (maybe for
other extended characters) - depending upon whether
I'm reading the messages locally at home or remotely
through a terminal emulator.  Those emulators are
not about to be replaced for any other reason in the
near future.

I'll be able to work it out if I have to, but it'll
be an annoyance, and probably one that shows up
many times with different bits of software, and
often those bits will not be under my control and
will have to be worked around rather than fixed.
(In the canine-ical sense, it is the current software
that is fixed, i.e.  it has limited functionality.)

> That doesn't mean I think Unicode operators are a good idea, of course.

They will cause problems for sure.


Re: Semantics of vector operations

2004-02-02 Thread Luke Palmer
Alex Burr writes:
> --- Andy Wardley [EMAIL PROTECTED] wrote:
>
> > Adding unicode operators to Perl will just reinforce
> > its reputation as
> > a line noise language.
>
> Perl6, the language with *real* runes.
>
> Come to think of it, some of the ogham runes would
> look more in character as a 'distribute' operator than
> guillemets... :-)
>
> More seriously, what about things like 'combining
> characters', e.g. U+20D0 (vector indication);
> U+0307 (derivative)?

Those are fair game for modules, but they won't be in the core because
they're not in latin-1. 

Luke


Re: Semantics of vector operations

2004-02-02 Thread Larry Wall
On Mon, Feb 02, 2004 at 01:14:48PM -0500, John Macdonald wrote:
: On Mon, Feb 02, 2004 at 09:59:50AM +, Simon Cozens wrote:
: > [EMAIL PROTECTED] (Andy Wardley) writes:
: > > Sure, make Perl Unicode compliant, right down to variable and operator
: > > names.  But don't make people spend an afternoon messing around with mutt,
: > > vim, emacs and all the other tools they use, just so that they can read,
: > > write, email and print Perl programs correctly.
: >
: > To be honest, I don't think that'll be a problem, but only because by the
: > time Perl 6 is widely deployed, people will have got themselves sorted out
: > as far as Unicode's concerned. I suspect similar things were said when C
: > decided to use 7 bit characters.
: 
: Don't be so sure.  I've been seeing the « and »
: characters properly sometimes, as ??? sometimes,
: and I think there were some other variants (maybe for
: other extended characters) - depending upon whether
: I'm reading the messages locally at home or remotely
: through a terminal emulator.  Those emulators are
: not about to be replaced for any other reason in the
: near future.

Well, sure.  But what we're trying to optimize here is specifically
not the near future.

: I'll be able to work it out if I have to, but it'll
: be an annoyance, and probably one that shows up
: many times with different bits of software, and
: often those bits will not be under my control and
: will have to be worked around rather than fixed.
: (In the canine-ical sense, it is the current software
: that is fixed, i.e.  it has limited functionality.)
: 
: > That doesn't mean I think Unicode operators are a good idea, of course.
: 
: They will cause problems for sure.

No question about that.  But Unicode is addressing (or attempting
to address) a basic irreducible complexity of the world, and I'm not
willing to sweep that complexity under someone else's carpet for
the purposes of short-term anaesthesia.  I expect that over the long
term people will learn to use Unicode in moderation, after a short
period of (over)exuberant experimentation.

As a temporary measure (where temporary is measured in years), I'd
suggest Unicode declarations include an C<is ASCII('[EMAIL PROTECTED]')> trait.

Larry


Re: Semantics of vector operations

2004-02-02 Thread Larry Wall
On Mon, Feb 02, 2004 at 11:44:17AM -0700, Luke Palmer wrote:
: Alex Burr writes:
: > --- Andy Wardley [EMAIL PROTECTED] wrote:
: >
: > > Adding unicode operators to Perl will just reinforce
: > > its reputation as
: > > a line noise language.
: >
: > Perl6, the language with *real* runes.
: >
: > Come to think of it, some of the ogham runes would
: > look more in character as a 'distribute' operator than
: > guillemets... :-)
: >
: > More seriously, what about things like 'combining
: > characters', e.g. U+20D0 (vector indication);
: > U+0307 (derivative)?
: 
: Those are fair game for modules, but they won't be in the core because
: they're not in latin-1. 

Yes, that's the policy, at least for 6.0.0.  Once everyone's on the
Unicode bandwagon (I realize we're talking years here), we can think
about relaxing that.

That being said, we can potentially use ×  U+00D7 MULTIPLICATION SIGN.
(Though my vim can't seem to decide whether it's a single-width or a
double-width character, urgh...)

By the way here's a program called uni that greps the Unicode characters:

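#!/usr/bin/perl

# uni: print every Unicode character whose name matches the given pattern.
binmode STDOUT, ":utf8";
$pat = "@ARGV";

# Assumes unicore/Name.pl returns one line per character that starts
# with the hex code point, followed by the character name.
@names = split /^/, do 'unicore/Name.pl';
for (@names) {
    if (/$pat/io) {
        $hex = hex($_);    # leading hex digits give the code point
        print chr($hex), "\t", $_;
    }
}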

Sorry if I posted that before, but it's a really useful little beastie.

Larry


Re: Semantics of vector operations

2004-02-02 Thread David Wheeler
On Feb 2, 2004, at 5:20 PM, Larry Wall wrote:

> That being said, we can potentially use ×  U+00D7 MULTIPLICATION SIGN.
> (Though my vim can't seem to decide whether it's a single-width or a
> double-width character, urgh...)

I realize this is a tad OT, but can anyone tell me how I can get Emacs 
to properly display Unicode characters? I expect that others on the 
list could benefit, too.

Cheers,

David



Re: Compiler writing tools

2004-02-02 Thread Larry Wall
On Mon, Feb 02, 2004 at 02:09:33AM -0700, Luke Palmer wrote:
: I've been writing a lot of compiler code recently, and figuring as how Perl
: 6 is aiming to replace yacc, I think I'll share some of my positive and
: negative experiences.   Perhaps Perl 6 can adjust itself to help me out
: a bit.  :-)

Perl 6 is designed to be adjusted, but it would be quite an AI feat
for it to adjust itself.  :-)

: =over
: 
: =item * RegCounter
: 
: I have a class called RegCounter which is of immense use, but could be
: possibly more elegant.  It's a tied hash that, upon access, generates a
: new name and stores it in a table for later retrieval under the same
: name.  
: 
: It has a method called C<next> that returns a new RegCounter that shares
: the same counter, and puts whatever was in that one's "ret" slot into
: whatever argument was given to C<next>, by default "next".
: 
: The first [^a-z] characters in the name are passed along to the
: generated register name, defaulting to a target-specific string (for
: instance, I use $P for Parrot programs).
: 
: So I can do, for instance:
: 
: method if_statement::code($rc) { # $rc is the regcounter
:   self.item[0].code($rc.next('condition'))
: ~ "unless $rc{condition}, $rc{Lfalse}\n"
: ~ self.item[1].code($rc.next)
: ~ "$rc{Lfalse}:\n"
: }

What do you want Perl 6 to do for you here?

: =item * Concatenations
: 
: The code example you just saw gets much, much uglier if there is added
: complexity.  One of my compilers returns lists of lines, the other
: concatenates strings, and they're both pretty hard to read -- especially
: when there are heredocs all over the place (which happens frequently).
: 
: I think $() will help somewhat, as will interpolating method calls, but
: for a compiler, I'd really like PHP-like parse switching.  That is, I
: could do something like (I'll use $< and >$ for <? and ?>):
: 
: method logical_or_expression::code($rc) {
: <<EOC;
: null $rc{ret}
: $< for @($self.item[0]) -> $item { >$
:   $item.code($rc.next)
:   if $rc{next}, $rc{Ldone}
: $< } >$
: $rc{Ldone}:
: EOC
: }

This seems to me to fall into the category of useful language warpings,
but not necessarily for mandatory public consumption.  String literals
are parsed by the main parser in Perl 6, unlike in Perl 5.  So a
grammatical munging should be doable.  All is fair if you predeclare and
all that...

By the way, the first production language I ever wrote was an
inside-out language where control commands were embedded in text that
was to be output by default.  So I'm not knocking your proposal.

: For this case, I think it would also be a good idea to have a string
: implementation somewhere that stores things as ropes, a list of
: strings, so that immense copying isn't necessary.

Well, I suggested something like this early in the design of Parrot,
but it doesn't seem to have flown in the general case.  On the other
hand, the string abstraction ought to be big enough to hide alternate
implementations behind it.  The whole C<is from> notion is built on that
idea.
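
To make the rope idea concrete, here is a small Perl 5 sketch: concatenation builds a tree node in constant time instead of copying, and the pieces are only joined when the final string is needed. The Rope class and its method names are hypothetical; this is neither Parrot's string implementation nor anything proposed for Perl 6.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Rope;    # hypothetical illustration only

# A leaf holds a reference to one string chunk.
sub new {
    my ($class, $str) = @_;
    return bless { leaf => \$str }, $class;
}

# Concatenation creates an inner node pointing at both halves;
# neither half is copied.
sub concat {
    my ($left, $right) = @_;
    $right = Rope->new($right) unless ref $right;
    return bless { kids => [ $left, $right ] }, ref $left;
}

# Walk the tree left to right and join the chunks once.
sub flatten {
    my ($self) = @_;
    my @stack = ($self);
    my $out   = '';
    while (@stack) {
        my $node = pop @stack;
        if ($node->{leaf}) {
            $out .= ${ $node->{leaf} };
        }
        else {
            # Push right then left so the left child is popped first.
            push @stack, reverse @{ $node->{kids} };
        }
    }
    return $out;
}

package main;

my $code = Rope->new("null \$P0\n");
$code = $code->concat("if \$P1, L2\n")->concat(Rope->new("L2:\n"));
print $code->flatten;
```

A compiler pass could return such objects from each code() method and flatten once at the end, which is the copying saving the quoted paragraph asks for.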

: =item * Comments
: 
: We've already gone over this, but it'd be good to have the ability for
: parsers to (somehow) feed into one another, so that you can do
: comments without putting a comment in between every grammar rule (or
: mangling things to do that somehow), or search and replace, which has
: the disadvantage of being unable to disable comments during parts of the
: parse.  $Parse::RecDescent::skip works well, but I don't think it's
: general enough.

Agreed.  I do think you want the comments in the grammar, if for no
other reason than it provides a hook to do something with the comment
if you retarget the grammar from normal compilation to, say, code
translation.  I don't think it's out of the realm of possibility for
Perl 6 to support strings with embedded objects as funny characters.
In the limit, a string could be composed of nothing but a stream
of objects.  (As a hack, one can embed illegal Unicode characters
(above U+10FFFF) that map an integer to an array of objects, but
maybe we can do better from a GC perspective.)

: =item * Line Counting
: 
: It is I<essential> that the regex engine is capable (perhaps off by
: default) of keeping track of your line number.

By all means!  A compiler must absolutely never emit an inaccurate line
number if it can help it.  Few things are as irritating as "...bailing
out near line 100."  If we don't provide an explicit lexical analysis
pass that handles this, then the regex engine must somehow.  Though I
haven't really thought much about the *how* part of the somehow.
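
For comparison, this is roughly how line tracking has to be done by hand in Perl 5 today: a \G-anchored lexer that bumps a counter whenever it consumes a newline. It is only a sketch of the manual approach, not a suggestion for how the Perl 6 regex engine should do it; the tokenize() function and its token layout are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of [ word, line ] pairs; die with an accurate
# line number on the first character that cannot be lexed.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    my $line = 1;
    for ($text) {    # alias $_ so the /gc matches share one pos()
        while (1) {
            if    (/\G\n/gc)      { $line++ }
            elsif (/\G[ \t]+/gc)  { }    # skip blanks
            elsif (/\G#[^\n]*/gc) { }    # skip comment to end of line
            elsif (/\G(\w+)/gc)   { push @tokens, [ $1, $line ] }
            elsif (/\G\z/gc)      { last }
            else {
                die sprintf "unexpected character '%s' at line %d\n",
                    substr($_, pos($_) || 0, 1), $line;
            }
        }
    }
    return @tokens;
}

my @tokens = tokenize("foo bar\n# note\nbaz\n");
print "$_->[0] at line $_->[1]\n" for @tokens;
```

Every rule that can consume a newline has to remember to count it, which is exactly the bookkeeping one would like the engine to take over.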

Larry