[perl #32667] [TODO] IMCC - documentation needs updating

2007-01-17 Thread Allison Randal via RT
On Tue Jan 16 12:05:17 2007, [EMAIL PROTECTED] wrote:
> Attached patch adds new syntax documentation to docs/imcc/syntax.pod and
> fixes some typos there. It now also indicates where various flags are
> explained.

Applied in r16678.

> Is the shorthand syntax for function calls ("($P0, a :slurpy) = foo(3, b
> :flat)") clear, or can we better use examples there?

The shorthand directives section could probably use a (short)
description for each different syntax.

Thanks!
Allison


Re: [perl #41237] [TODO] PMC Class name IDs will require a dot in front

2007-01-17 Thread Allison Randal

Matt Diephouse wrote:

> I actually prefer the dot. I don't like the possible ambiguity between
> types and local variables:
>
>    .local string MyClass
>    MyClass = '...'
>    $P0 = new MyClass # is this a type or a string?


At that point, what we're really talking about is sigils. So, why put 
sigils on types instead of putting them on variables? And is dot really 
the best sigil for types?



> Capitalized variable names may be rare and/or bad practice, but it's
> bound to happen. There was talk on #parrot about how this still
> conflicts with macros, but those are rarer than variables.


If we're setting up a system to remove ambiguity, better to remove 
ambiguity entirely than move to a slightly less common ambiguity.



> Also, if we decide that anything starting with a dot that doesn't have
> parens is a type, I could write:
>
>    $I0 = typeof $P0
>    if $I0 == .Foo goto bar


You can do that already.


Klaas-Jan Stol wrote:
> A dot also indicates that this is not pure PASM, but rather PIR.


Except that the dot is required in PASM. Removing the dot was an added
bit of PIR syntactic sugar, intended to make it more human-readable (and
human-writable).


> The dot
> implies the token is pre-processed by IMCC (the type is looked up during
> parsing, IIRC), which is, IMHO, more consistent with the other
> dot-prefixed tokens from PIR.


Except it's not consistent. To a certain extent type IDs act like 
constants. You can:


  print .String

Or, you can create your own constant and print it (PASM here):

  .constant Foo 2
  print .Foo

But if you try to create a constant with the same name as a type ID, it 
is simply ignored:


  .constant String 1
  print .String

Prints "33" instead of the constant value "1".

It's an unfortunate conflict. (Not quite as unfortunate as the 
variablename/methodname conflict, but still pretty awful.)


Allison


Re: I/O PDD - ready for implementation

2007-01-17 Thread Nicholas Clark
On Tue, Jan 09, 2007 at 09:33:52AM -0800, Larry Wall wrote:

> The Perl 6 perspective on this is that error values should be allowed to
> be as "interesting" as you like.  The lower level routine goes ahead and
> pregenerates the exception object but returns it as an interesting
> error value instead of throwing it.  Then the calling code can just
> decide to throw the unthrown exception, or it can generate its own
> exception (that perhaps includes the unthrown exception).  In any case,
> you get a better error message if you include all the relevant facts
> from the lower-level routine, and those tend to get lost with scalar
> error values.  By returning an object you still have a simple test to see
> whether there's an exception, but you're not limiting the information
> flow by assuming all the information passes through whatever scalar
> is functioning as the boolean value of "oops".
> 
> In any case, this would certainly make it easier to put Perl 6 on top.  :)

It would actually also make it easier to put Perl 5 done-with-hindsight on
top :-)

One of the issues with writing IO layers in Perl 5 is that the existing
interface is defined in terms of Perl builtins that return undef on failure,
and set C<$!>, and in turn C<$!> is only allowed to hold a small vocabulary
of integer codes (typically around 100) which are defined by the operating
system, for use in reporting operating system level errors.

This works well on regular IO, talking direct to the operating system, but
goes pear shaped when you write an IO layer, and want to report an error
condition. You're forced to make a lossy mapping of your true error
condition (such as detecting an invalid character encoding or corrupt
compressed data) into the least inappropriate errno value. It would be much
nicer to have the option of returning true objects.
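
A minimal sketch of the "return a true object" idea, written in Python
rather than Perl or Parrot code. The names Failure and decode_layer are
hypothetical, chosen only for illustration: the layer hands back an
unthrown exception that is false in boolean context but keeps all the
detail of the original error.

```python
class Failure:
    """An unthrown exception returned as an 'interesting' error value."""

    def __init__(self, exception):
        self.exception = exception

    def __bool__(self):
        # A failure is false, so callers keep a simple "did it work?" test.
        return False

    def throw(self):
        # The caller decides whether (and when) to raise the stored exception.
        raise self.exception


def decode_layer(raw: bytes):
    """Hypothetical IO layer: decode UTF-8, returning a Failure on bad input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Keep the full exception instead of mapping it onto an errno code.
        return Failure(exc)


good = decode_layer(b"hello")
bad = decode_layer(b"\xff\xfe")
```

Since an empty string is also false in Python, a caller of this sketch
would test isinstance(result, Failure) rather than plain truthiness.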

On Wed, Jan 10, 2007 at 09:26:22AM -0800, Larry Wall wrote:

> Possible optimization: for those success values that are sufficiently
> "uninteresting" maybe they could just be refs to constant shared
> objects so you avoid allocating them every time.  Even if you have to
> return some integer like a number of characters read, this is usually
> the same number till the last block of the file, so that could be
> factored out.

If most IO operations are actually returning the OS error code, then having
around 128 cached shared objects for boxing up each errno value seems feasible
to me.
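
As a rough illustration (Python, with hypothetical names BoxedErrno and
box_errno), the cache could be built once from the platform's errno table,
so that the common case never allocates:

```python
import errno
import os


class BoxedErrno:
    """A small object wrapping one operating-system error code."""

    def __init__(self, code):
        self.code = code
        self.name = errno.errorcode.get(code, "UNKNOWN")
        self.message = os.strerror(code)


# One shared, preallocated object per errno value known on this platform.
_CACHE = {code: BoxedErrno(code) for code in errno.errorcode}


def box_errno(code):
    """Return the shared object for a known code; allocate only for unknown ones."""
    cached = _CACHE.get(code)
    return cached if cached is not None else BoxedErrno(code)
```

The number of cached objects depends on the platform; the sketch simply
uses whatever errno.errorcode lists.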

> Or it could be an out-of-band thing like errno, but it would just
> happen to be an out-of-band object instead of an integer.  I can imagine
> various states in between where it looks out-of-band but really comes
> through the return interface for cleanness.

Out of band things feel bad. I'm not sure how parrot will implement
concurrency, but C's return -1 with out-of-band errno feels like a mistake
to avoid. It ends up with C implementations having to use icky hacks to make
something that feels like it's

extern int errno;

but is actually thread local (whilst still being read/write).

The Linux Kernel made a nicer decision to change from -1 to a negative value
(just as efficient to check) where the negative value happens to be the
errno value. The POSIX threads API avoids conflating value returns with
error returns by specifying that the return value is
success-or-positive-errno, but again it's avoiding anything out of band,
or seemingly-out-of-band.
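
The two in-band conventions can be sketched like this (Python; the function
names and the conditions that trigger the errors are made up for
illustration, only the return-value conventions come from the text above):

```python
import errno


def kernel_style_read(fd_is_valid: bool, nbytes: int) -> int:
    """Linux-kernel style: non-negative result on success, -errno on failure."""
    if not fd_is_valid:
        return -errno.EBADF
    return nbytes


def pthread_style_lock(already_locked: bool) -> int:
    """POSIX-threads style: 0 on success, a positive errno value on failure."""
    return errno.EBUSY if already_locked else 0


result = kernel_style_read(False, 10)
if result < 0:
    # The error code travels in the return value; no global errno is consulted.
    failure_code = -result
```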

Nicholas Clark


Re: repository open for commits

2007-01-17 Thread Leopold Toetsch
Am Mittwoch, 17. Januar 2007 02:07 schrieb jerry gay:
> i never officially closed the repo to commits, but for those of you
> awaiting parrot's release, it's now complete. you may commit freely.
> thanks for your patience.

Congrats. Well done.

> ~jerry

Thanks
leo


Re: Numeric Semantics

2007-01-17 Thread TSa

HaloO,

Luke Palmer wrote:

> That is, is 1 different from 1.0?


I opt for 1 being Int and 1.0 being Num. But for the
latter a test .does(Int) might succeed on the footing
that the fractional part is zero, that is 1.0%1 == 0.
Note that 1/3*3 does not necessarily equal 1.0 for
floating point math. It's more like 0.999... which raises the
question of how the construction of an Int from a Num works.
Do we truncate, floor or round? Would (1/3*3).does(Int) be
true?
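
To make the three options concrete, here is a small Python sketch (not
Perl 6; it needs Python 3.9 or later for math.nextafter). It uses the
largest double below 1.0 as a stand-in for a result like 0.999..., since
whether 1/3*3 itself lands exactly on 1.0 depends on rounding:

```python
import math

# The largest double strictly below 1.0, standing in for a result such as 0.999...
almost_one = math.nextafter(1.0, 0.0)

truncated = int(almost_one)        # truncation toward zero
floored = math.floor(almost_one)   # floor
rounded = round(almost_one)        # round to nearest

# A "fractional part is zero" test, analogous to 1.0 % 1 == 0.
one_is_integral = (1.0).is_integer()
almost_one_is_integral = almost_one.is_integer()
```

Truncation and floor both give 0 here while rounding gives 1, so the choice
of conversion decides whether such a value would count as the Int 1.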



> Should 10**500 be infinity or a 1 with 500 zeroes after it?


IEEE 754 distinguishes Overflow from Inf. Both are outside the
order of the valid numbers. Should we have these exceptional values
comparable? E.g. code like

# some calculation involving $x
if $x == Overflow {...}

seems reasonable. And I think that Inf == Inf and things
like $x < Inf should be valid syntax. I think Overflow < Inf
makes sense as well.
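
For comparison, this is how one existing language answers these questions.
The sketch below is plain Python, shown only as a data point and not as a
proposal for Perl 6: integers are exact, conversion to a float that does
not fit raises an error, and float arithmetic overflows quietly to Inf,
which compares like any other value.

```python
import math

exact = 10 ** 500              # Python ints are arbitrary precision
digits = len(str(exact))       # a 1 followed by 500 zeroes

try:
    float(exact)               # does not fit in an IEEE 754 double
    overflowed = False
except OverflowError:
    overflowed = True

inf_from_arithmetic = 1e308 * 10   # float multiplication overflows quietly to inf

inf_equals_inf = math.inf == math.inf
finite_below_inf = 1e308 < math.inf
```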



> Should 10**10**6 run out of memory?  Should
> "say (1/3)**500" print a bunch of digits to the screen or print 0?


Are we silently underflowing to zero? Or should it return an
Underflow? In particular I would expect Underflow to carry a
sign and Underflow.abs > 0.

Also Num needs to support signed zero e.g. for handling 1/(1/$x)
for $x == +/-Inf to end up with the correct sign of Inf. But of
course +0 == -0.


A somewhat tangential idea of mine is types like int31 that do
signed 32 bit arithmetic modulo the Mersenne prime 2**31 - 1. That
would guarantee two things. First $x * $y > 0 whenever $x > 0 &&
$y > 0, that is there are no divisors of zero. Second -2**31 is
available to encode "infinity" or NaN. In int61 arithmetic with
modulus 2**61 - 1 in a 64 bit value we have the two spare bit 
combinations 01 and 10 after the sign to encode special numbers.
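
A quick Python sketch of the "no divisors of zero" property. The helper
mul_mod is a hypothetical name, and the example uses non-negative residues
rather than a signed representation:

```python
M31 = 2**31 - 1   # the Mersenne prime 2147483647


def mul_mod(x: int, y: int, modulus: int) -> int:
    """Multiply two residues and reduce the product modulo `modulus`."""
    return (x * y) % modulus


# Ordinary 32-bit wraparound has zero divisors: 2**16 * 2**16 == 2**32 == 0 (mod 2**32).
wraps_to_zero = mul_mod(2**16, 2**16, 2**32)

# Modulo a prime, a product of nonzero residues is never zero.
prime_product = mul_mod(2**16, 2**16, M31)

# Every nonzero residue also has a multiplicative inverse (Python 3.8+ pow).
inverse = pow(12345, -1, M31)
```

Because the modulus is prime, the nonzero residues form a field, which is
what rules out zero divisors.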


Regards, TSa.
--


[svn:perl6-synopsis] r13526 - doc/trunk/design/syn

2007-01-17 Thread larry
Author: larry
Date: Wed Jan 17 10:56:32 2007
New Revision: 13526

Modified:
   doc/trunk/design/syn/S13.pod

Log:
Replaced "is commutative" with a more general multisig syntax.


Modified: doc/trunk/design/syn/S13.pod
==
--- doc/trunk/design/syn/S13.pod (original)
+++ doc/trunk/design/syn/S13.pod Wed Jan 17 10:56:32 2007
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall <[EMAIL PROTECTED]>
   Date: 2 Nov 2004
-  Last Modified: 18 Aug 2006
+  Last Modified: 17 Jan 2007
   Number: 13
-  Version: 5
+  Version: 6
 
 =head1 Overview
 
@@ -62,21 +62,6 @@
 multi sub *infix:<~>(Str $s1, ArabicStr $s2) {...}
 multi sub *infix:<~>(ArabicStr $s1, Str $s2) {...}
 
-Binary operators may be declared as commutative:
-
-multi sub infix:<+> (Us $us, Them $them) is commutative { myadd($us,$them) }
-
-That's equivalent to:
-
-multi sub infix:<+> (Us $us, Them $them) { myadd($us,$them) }
-multi sub infix:<+> (Them $them, Us $us) { myadd($us,$them) }
-
-Note the lack of C<*> on those definitions.  That means this definition
-of addition is only in effect within the scope of the package in which
-C<< infix:<+> >> is defined.  Similar constraints apply to lexically scoped
-multi subs.  Generally you want to put your multi subs into the C<*>
-space, however, so that they work everywhere.
-
 The C<use overload> syntax had one benefit over Perl 6's syntax in that
 it was easy to alias several different operators to the same service
 routine.  This can easily be handled with Perl 6's aliasing:
@@ -87,6 +72,46 @@
 &infix:<*> ::= &unimpl;
 &infix: ::= &unimpl;
 
+That's one solution, but often your alternatives all have the same
+name, and vary instead in their signature.  Some operators are
+commutative, or can otherwise take their arguments in more than
+one order.  Perl allows you to declare multiple signatures for a
+given body, and these will be pattern matched as if you had declared
+separate multi entries.  If you say:
+
+multi sub infix:<+> (Us $us, Them $them) |
+(Them $them, Us $us) { myadd($us,$them) }
+
+that's equivalent to:
+
+multi sub infix:<+> (Us $us, Them $them) { myadd($us,$them) }
+multi sub infix:<+> (Them $them, Us $us) { myadd($us,$them) }
+
+except that there really is only one body.  If you declared a C<state>
+variable within the body, for instance, there would only be one
+of them.
+
+Note the lack of C<*> on the definitions above.  That means this definition
+of addition is syntactically in effect only within the scope in which
+C<< infix:<+> >> is defined or imported.  Similar constraints apply
+to lexically scoped multi subs.  Generally you want to put your multi
+subs into the C<*> space, however, so that they work everywhere.
+
+When you use the multiple signature syntax, the alternate signatures
+must all bind the same set of formal variable names, though they
+are allowed to vary in any other way, such as by type, or by which
+parameters are considered optional or named-only or slurpy.  In other
+words, the compiler is allowed to complain if any of the alternatives
+omits any of the variable names.  This is intended primarily to catch
+editing errors.
+
+Conjectural: If the first parameter to a multi signature is followed
+by an invocant colon, that signature represents two signatures, one
+for an ordinary method definition, and one for the corresponding multi
+definition that has a comma instead of the colon.  This form is legal
+only where the standard method definition would be legal, and only
+if any declared type of the first parameter is consistent with $?CLASS.
+
 =head1 Fallbacks
 
 Dispatch is based on a routine's signature declaration without regard
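
The multiple-signature rule added in this revision can be imitated in
Python to show the intended behaviour. This is only a sketch: the multisig
decorator and the Us/Them classes are hypothetical, and it models just
positional type matching and the same-names check, not Perl 6 dispatch.

```python
class Us: ...
class Them: ...


def multisig(*signatures):
    """Register one body under several signatures (hypothetical helper).

    Each signature is a tuple of (name, type) pairs in positional order.
    """
    names = {frozenset(name for name, _ in sig) for sig in signatures}
    if len(names) != 1:
        # Mirrors the rule that all alternatives bind the same variable names.
        raise TypeError("all signatures must bind the same parameter names")

    def decorate(body):
        def dispatch(*args):
            for sig in signatures:
                if len(sig) == len(args) and all(
                    isinstance(arg, typ) for arg, (_, typ) in zip(args, sig)
                ):
                    # Bind positional arguments to the matching signature's names.
                    return body(**{name: arg for arg, (name, _) in zip(args, sig)})
            raise TypeError("no matching signature")
        return dispatch
    return decorate


@multisig(
    (("us", Us), ("them", Them)),
    (("them", Them), ("us", Us)),
)
def add(us, them):
    # There is only one body, whichever argument order matched.
    return ("myadd", type(us).__name__, type(them).__name__)
```

Calling add(Us(), Them()) or add(Them(), Us()) reaches the same body with
the same name bindings, which is the effect the two-signature declaration
is meant to have.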


Re: Numeric Semantics

2007-01-17 Thread Jonathan Lang

TSa wrote:

> Luke Palmer wrote:
> > That is, is 1 different from 1.0?
>
> I opt for 1 being Int and 1.0 being Num. But for the
> latter a test .does(Int) might succeed on the footing
> that the fractional part is zero, that is 1.0%1 == 0.


I'm very leery of the idea that "A.does(B)" ever returns true when
role A does not compose role B; and my understanding has been that Int
does Num, not the other way around.

--
Jonathan "Dataweaver" Lang


Re: The S13 "is commutative" trait

2007-01-17 Thread Larry Wall
On Tue, Jan 16, 2007 at 01:41:30PM -0800, Jonathan Lang wrote:
: Luke Palmer wrote:
: >Seems reasonable.  My generality alarm goes off when I realize that
: >you can't specify commutativity for two of the three args, but that's
: >fine because it's definitely a cpanable feature.
: 
: IIRC, it's possible to embed one signature within another one; if the
: embedded signature has two parameters and "is commutative" while the
: embedding signature is not commutative and has a third arg, wouldn't
: that produce commutativity for two out of the three, as long as
: they're adjacent?
: 
: >> Does the trait only apply within one region of the arglist, or can I
: >> create a 1-arg method that is commutative between the "self" arg and its
: >> data arg? (I assume not -- I can't quite work out what that would mean)
: >
: >That's CPAN's job, I think.
: 
: IMHO, "is commutative" should only apply to positional args: named
: args have this behavior automatically, and trying to include the
: invocant would tend to interfere with the self-contained nature of
: classes and roles - that is, it would allow role A to define a method
: for role B.

I've decided "is commutative" must die of ill-definedness.  See instead
the recent S13 change to support multiple signatures on a single body.

This approach seems to be both more general and better defined.
I like that, up to a point...

Larry


Re: [svn:perl6-synopsis] r13526 - doc/trunk/design/syn

2007-01-17 Thread Jonathan Lang

[EMAIL PROTECTED] wrote:

+Conjectural: If the first parameter to a multi signature is followed
+by an invocant colon, that signature represents two signatures, one
+for an ordinary method definition, and one for the corresponding multi
+definition that has a comma instead of the colon.  This form is legal
+only where the standard method definition would be legal, and only
+if any declared type of the first parameter is consistent with $?CLASS.


Should "...and one for the corresponding multi definition..." read
"...and one for the corresponding sub definition..."?  Or is there
something about methods vs. multis that I'm missing?  And does this
dual declaration have to be restricted to multi signatures?  Why not
say that a method or submethod signature with an explicit invocant
effectively doubles as a corresponding sub definition, with the
invocant prepended as the first positional parameter?

--
Jonathan "Dataweaver" Lang


[svn:perl6-synopsis] r13527 - doc/trunk/design/syn

2007-01-17 Thread larry
Author: larry
Date: Wed Jan 17 11:50:09 2007
New Revision: 13527

Modified:
   doc/trunk/design/syn/S03.pod

Log:
Revised reduce semantics to allow list infixes to work correctly.


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod (original)
+++ doc/trunk/design/syn/S03.pod Wed Jan 17 11:50:09 2007
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall <[EMAIL PROTECTED]>
   Date: 8 Mar 2004
-  Last Modified: 8 Jan 2007
+  Last Modified: 17 Jan 2007
   Number: 3
-  Version: 88
+  Version: 89
 
 =head1 Changes to Perl 5 operators
 
@@ -1373,8 +1373,17 @@
 (And, in fact, the latter are already easy to express anyway,
 and more obviously nonsensical.)
 
-A reduce operator returns only a scalar result regardless of context.
-To return all intermediate results, backslash the operator:
+Most reduce operators return a simple scalar value, and hence do not care
+whether they are evaluated in item or list context.  However, as with
+other list operators and functions, a reduce operator may return a list
+that will automatically be interpolated into list context, so you may
+use it on infix operators that operate over lists as well as scalars:
+
+my ($min, $max) = [minmax] @minmaxpairs;
+
+A variant of the reduction metaoperator is pretty much guaranteed
+to produce a list; to lazily generate all intermediate results along
+with the final result, you can backslash the operator:
 
 say [\+] 1..*  #  (1, 3, 6, 10, 15, ...)
 
@@ -1990,7 +1999,7 @@
 !!! ...  ???
 [+] [*] [<] [\+] [\*] etc.
 (also = as list assignment)
-list infix  ¥ <== ==> X XX X~X X*X XeqvX etc.
+list infix  ¥ <== ==> minmax X XX X~X X*X XeqvX etc.
 loose and   and
 loose or    or xor err
 expr terminator ; {} as control block, statement modifiers
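
For readers more used to other languages, the revised reduce semantics have
rough Python analogues, sketched below. The minmax function here is a
hypothetical stand-in that merges (min, max) pairs; the exact semantics of
the Perl 6 minmax infix are an assumption.

```python
from functools import reduce
from itertools import accumulate, count, islice
import operator

total = reduce(operator.add, [1, 2, 3, 4, 5])   # like [+] 1..5

# Lazily yields every intermediate result, like [\+] 1..*
running = accumulate(count(1), operator.add)
first_five = list(islice(running, 5))


def minmax(a, b):
    """Hypothetical list infix: merge two (min, max) pairs into one."""
    return (min(a[0], b[0]), max(a[1], b[1]))


# A reduction whose result is itself a list, unpacked by list assignment.
lo, hi = reduce(minmax, [(3, 7), (1, 4), (5, 9)])
```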


[svn:perl6-synopsis] r13528 - doc/trunk/design/syn

2007-01-17 Thread larry
Author: larry
Date: Wed Jan 17 14:05:20 2007
New Revision: 13528

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarify how C<||> limits longest-token semantics.


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Wed Jan 17 14:05:20 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
-   Last Modified: 16 Jan 2007
+   Last Modified: 17 Jan 2007
Number: 5
-   Version: 43
+   Version: 44
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -73,14 +73,18 @@
 (You may now use C<||> to indicate the old temporal alternation.  That is,
 C<|> and C<||> now work within regex syntax much the same as they
 do outside of regex syntax, where they represent junctional and
-short-circuit OR.)  Every regex in Perl 6 is required to be able to
+short-circuit OR.  This includes the fact that C<|> has tighter
+precedence than C<||>.)  Every regex in Perl 6 is required to be able to
 return its list of initial constant strings (transitively including the
 initial constant strings of any initial subrule called by that regex).
 A logical alternation using C<|> then takes two or more of these lists
 and dispatches to the alternative that advertises the longest matching
 prefix, not necessarily to the alternative that comes first lexically.
 (However, in the case of a tie between alternatives, the earlier
-alternative does take precedence.)
+alternative does take precedence.)  Backtracking into a constant prefix
+(or into a :: that would backtrack over a constant prefix) causes
+the next longest match to be attempted, even if that is specified
+in a different subrule.
 
 Initial constants must take into account case sensitivity (or any other
 canonicalization primitives) and do the right thing even when propagated
@@ -90,6 +94,18 @@
 say, a trie, the trie must continue to have the appropriate semantics
 for the originating rule.
 
+The C<||> form has the old short-circuit semantics, and will not
+attempt to match its right side unless all possibilities (including
+all C<|> possibilities) are exhausted on its left.  The first C<||>
+in a regex makes constant strings on its left available to the
+outer longest-token matcher, but hides any subsequent tests from
+longest-token matching.  Every C<||> establishes a new longest-token
+table.  That is, if you use C<|> on the right side of C<||>, that
+right side establishes a new top level for longest-token processing
+for this subexpression and any called subrules.  The right side's
+longest-token list is invisible to the left of the C<||> or outside
+the regex containing the C<||>.
+
 =head1 Modifiers
 
 =over
@@ -506,11 +522,13 @@
 C<&> and C<&&> forms.  The C<&> form allows the compiler and/or the
 run-time system to decide which parts to evaluate first, and it is
 erroneous to assume either order happens consistently.  The C<&&>
-form short-circuits, and backtracking makes the right argument vary
-faster than the left.
+form guarantees left-to-right order, and backtracking makes the right
+argument vary faster than the left.
 
-The C<&> and C<&&> operators are list associative like C<|> and C<||>,
-but have tighter precedence.
+The C<&> operator is list associative like C<|>, but has slightly
+tighter precedence.  Likewise C<&&> has slightly tighter precedence
+than C<||>.  As with the normal junctional and short-circuit operators,
+C<&> and C<|> are both tighter than C<&&> and C<||>.
 
 =back
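
The difference between the two alternation forms can be sketched in Python.
This is a simplification: it compares the full match length of each
alternative instead of declared constant prefixes, and both function names
are hypothetical.

```python
import re


def ordered_alternation(alternatives, text):
    """Like ||: return the first alternative that matches at the start."""
    for pattern in alternatives:
        m = re.match(pattern, text)
        if m:
            return m.group()
    return None


def longest_token(alternatives, text):
    """Like |: return the longest match; ties go to the earlier alternative."""
    best = None
    for pattern in alternatives:
        m = re.match(pattern, text)
        # Strict > keeps the earlier alternative when lengths are equal.
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best


first = ordered_alternation(["foo", "foobar"], "foobar")
longest = longest_token(["foo", "foobar"], "foobar")
```

With the alternatives "foo" and "foobar", the ordered form stops at "foo"
while the longest-token form selects "foobar", regardless of the order in
which they were written.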
 


[svn:perl6-synopsis] r13529 - doc/trunk/design/syn

2007-01-17 Thread larry
Author: larry
Date: Wed Jan 17 18:24:48 2007
New Revision: 13529

Modified:
   doc/trunk/design/syn/S05.pod

Log:
We now analyze regex expressions as pattern/action pairs and grammars as
collections of those pairs.  The initial-constant-strings approach
is now generalized to initial DFAable patterns as suggested by luqui++.
The idea of "token" is now defined a little more rigorously, and its
epiphenomenal nature explicated, especially with respect to C.


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Wed Jan 17 18:24:48 2007
@@ -16,7 +16,7 @@
Date: 24 Jun 2002
Last Modified: 17 Jan 2007
Number: 5
-   Version: 44
+   Version: 45
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -68,43 +68,7 @@
 =back
 
 While the syntax of C<|> does not change, the default semantics do
-change slightly.   Instead of representing temporal alternation, C<|>
-now represents logical alternation with longest-token semantics.
-(You may now use C<||> to indicate the old temporal alternation.  That is,
-C<|> and C<||> now work within regex syntax much the same as they
-do outside of regex syntax, where they represent junctional and
-short-circuit OR.  This includes the fact that C<|> has tighter
-precedence than C<||>.)  Every regex in Perl 6 is required to be able to
-return its list of initial constant strings (transitively including the
-initial constant strings of any initial subrule called by that regex).
-A logical alternation using C<|> then takes two or more of these lists
-and dispatches to the alternative that advertises the longest matching
-prefix, not necessarily to the alternative that comes first lexically.
-(However, in the case of a tie between alternatives, the earlier
-alternative does take precedence.)  Backtracking into a constant prefix
-(or into a :: that would backtrack over a constant prefix) causes
-the next longest match to be attempted, even if that is specified
-in a different subrule.
-
-Initial constants must take into account case sensitivity (or any other
-canonicalization primitives) and do the right thing even when propagated
-up to rules that don't have the same canonicalization.  That is, they
-must continue to represent the set of matches that the lower rule would
-match.  If and when the optimizer turns such a list of prefixes into,
-say, a trie, the trie must continue to have the appropriate semantics
-for the originating rule.
-
-The C<||> form has the old short-circuit semantics, and will not
-attempt to match its right side unless all possibilities (including
-all C<|> possibilities) are exhausted on its left.  The first C<||>
-in a regex makes constant strings on its left available to the
-outer longest-token matcher, but hides any subsequent tests from
-longest-token matching.  Every C<||> establishes a new longest-token
-table.  That is, if you use C<|> on the right side of C<||>, that
-right side establishes a new top level for longest-token processing
-for this subexpression and any called subrules.  The right side's
-longest-token list is invisible to the left of the C<||> or outside
-the regex containing the C<||>.
+change slightly.  See the section below on "Longest-token matching".
 
 =head1 Modifiers
 
@@ -654,13 +618,12 @@
 as a literal, or fails if no key matches.  (A C<""> key will match
 anywhere, provided no longer key matches.)
 
-In a context requiring a set of initial constant strings, the keys
-of the hash comprise that set of strings, and any subsequent matching
-performed by the hash values is not considered a part of those strings,
-even if that subsequent match begins by matching more constant string.
-The keys are considered to be canonicalized in the same way as any
-surrounding context, so for instance within a case-insensitive context
-the hash keys must match insensitively also.
+In a context requiring a set of initial token patterns, the initial
+token patterns are taken to be each key plus any initial token pattern
+matched by the corresponding value (if the value is a string or regex).
+The token patterns are considered to be canonicalized in the same way
+as any surrounding context, so for instance within a case-insensitive
+context the hash keys must match insensitively also.
 
 Subsequent matching depends on the hash value:
 
@@ -1534,6 +1497,107 @@
 
 =back
 
+=head1 Longest-token matching
+
+Instead of representing temporal alternation, C<|> now represents
+logical alternation with longest-token semantics.  (You may now use
+C<||> to indicate the old temporal alternation.  That is, C<|> and
+C<||> now work within regex syntax much the same as they do outside
+of regex syntax, where they represent junctional and short-circuit OR.
+This includes the fact that C<|> has tighter precedence than C<||>.)
+
+Hi