RFC 331 (v2) Consolidate the $1 and C<\1> notations

2000-10-01 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Consolidate the $1 and C<\1> notations

=head1 VERSION

  Maintainer: David Storrs [EMAIL PROTECTED]
  Date: 28 Sep 2000
  Last Modified: 30 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 331
  Version: 2
  Status: Frozen

=head1 ABSTRACT

Currently, C<\1> and $1 have only slightly different meanings within a
regex.  It is possible to consolidate them without losing any
functionality and, in the process, we gain intuitiveness.

=head1 CHANGES

v1-v2:  
A major rewrite:

=over 4

=item *
Reformatted the argument into "The Problem" and "The Solution" sections

=item *
Added "Some Examples" section

=item *
Added "Why do this?" section

=item *
Added "P526 migration" section

=item *
Proposed the @/ variable

=item *
Various trivial edits and typo fixes

=back


=head1 DESCRIPTION

Note:  For convenience, I am going to talk about C<\1> and $1 in this RFC.
In actuality, these notations extend indefinitely:  C<\1..\n> and
C<$1..$n>.  Take it as read that anything which applies to $1 also applies
to C<$2>, C<$3>, etc.


=head2 The Problem

In current versions of Perl, C<\1> and C<$1> mean different things.
Specifically, C<\1> means "whatever was matched by the first set of
grouping parens I<in this regex match>."  $1 means "whatever was matched
by the first set of grouping parens I<in the previously-run regex match>."
For example:

=over 4

=item *
C</(foo)_$1_bar/>

=item *
C</(foo)_\1_bar/>

=back

the second will match 'foo_foo_bar', while the first will match
'foo_[SOMETHING]_bar' where [SOMETHING] is whatever was captured in the
B<previous> match...which could be a long, long way away, possibly even in
some module that you didn't even realize you were including (because it
was included by a module that was included by a module that was included
by a...).

The primary reason for this distinction is s///, in which the left hand
side is a pattern while the right hand side is a string (assuming no 'e'
modifier).  Therefore:

=over 4

=item *
C<s/(foo)$1/$1bar/> # changes "foo???" to "foobar" where ??? is from the
last match

=item *
C<s/(foo)\1/$1bar/> # changes "foofoo" to "foobar"

=back

Note that, in the first example, the two $1s refer to different things,
whereas in the second example, $1 and C<\1> refer to the same thing.  This
is counterintuitive and non-Perlish; Perl should be intuitive and DWIMish.
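
A minimal sketch of the present-day (Perl 5) behaviour described above. The
strings and variable names are illustrative assumptions, not taken from the
RFC; the point is only that $1 inside a pattern is interpolated from an
earlier match, while \1 refers to the current one.

```perl
use strict;
use warnings;

# An unrelated earlier match leaves 'y' in $1.
"xyz" =~ /(y)/;

# $1 inside the pattern is interpolated before matching, so the pattern
# actually compiled here is /(foo)y/, not /(foo)foo/.
my $first = "fooy";
$first =~ s/(foo)$1/$1bar/;

# \1 refers to the group captured in this same match.
my $second = "foofoo";
$second =~ s/(foo)\1/$1bar/;

print "$first $second\n";
```

Both strings end up as "foobar", but the first only because the earlier,
unrelated match happened to capture 'y'.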

A separate, though less important, problem with the way backreferences are
currently implemented is that it is difficult for a human to tell at a
glance whether \10 means "escape character 10" or "backreference 10".  The
only way to tell is to count the sets of capturing parens: if there are
at least ten of them, \10 is a backreference; otherwise it is an escape
character.  In practice this is rarely a problem, because most patterns
don't have ten sets of capturing parens.
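
The ambiguity can be seen in a short sketch. The test strings are made up for
illustration; the behaviour shown is that of Perl 5, where \10 is an octal
escape unless at least ten groups have been opened.

```perl
use strict;
use warnings;

# One capture group: \10 is read as the octal escape for chr(8).
my $is_octal = ("a\x08" =~ /(a)\10/) ? 1 : 0;

# Ten capture groups: \10 is now a backreference to group 10 ('j').
my $is_backref =
    ("abcdefghijj" =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10/) ? 1 : 0;

print "octal=$is_octal backref=$is_backref\n";
```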


=head2 The Solution

Ok, so the problem is that $1 and C<\1> are counterintuitive.  How do we
make them intuitive without losing any functionality?

First, let's get rid of the C<\1> form for backreferences.

Second, let's say that $n refers to the nth captured subelement of the
pattern match which occurred in this B<statement>--note that this is
distinct from "in this pattern match."  That means that, in
C<s/(foo)$1/$1bar/>, both $1s refer to the same thing (the string 'foo'),
even though one of them occurred inside a pattern and one occurred inside a
string.  (See note [1] in the IMPLEMENTATION section.)

Third, let's create a new special variable, @/ (mnemonic: the / is the
default delimiter for a pattern match; if the English module remains
extant, then @/ could have the long name of @LAST_MATCH, but there are
currently several threads concerning removal of the English module). Much
like the current C<$1>, C<$2>... variables, this array will only be created
(and hence, the speed price will only be paid), if you access its members.
The 0th element of @/ will contain the qr()d form of the last pattern
match, while successive elements refer to the captured subelements.

Fourth, let's change when we update the variables which store the captures
(the current C<$1>, C<$2>, etc).  @/ will only be updated when the entire
statement which contains a pattern match has finished running (e.g., when
the entire s/// is completed), rather than as soon as the pattern match is
done (and therefore before the substitution happens).  


=head2 Some Examples

=over 4

=item 1
If you did the following:

C<"Bilbo Baggins" =~ /((\w+)\s+(\w+))/>

Then @/ would contain the following:

C<$/[0]> the compiled equivalent of C</((\w+)\s+(\w+))/>,

C<$/[1]> the string "Bilbo Baggins"

C<$/[2]> the string "Bilbo"

C<$/[3]> the string "Baggins"

Note that after the match, C<$/[1]>, C<$/[2]>, and C<$/[3]> contain
exactly what C<$1>, C<$2>, and C<$3> would contain with present-day syntax.
Furthermore, the compiled form of the match is available so if you want to
repeat the match later (or insert it into a larger regex), you can 

RFC 347 (v2) Remove long-deprecated $* (aka $MULTILINE_MATCHING)

2000-10-01 Thread Perl6 RFC Librarian

=head1 TITLE

Remove long-deprecated $* (aka $MULTILINE_MATCHING)

=head1 VERSION

  Maintainer: Hugo van der Sanden [EMAIL PROTECTED]
  Date: 29 Sep 2000
  Last Modified: 30 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 347
  Version: 2
  Status: Frozen

=head1 ABSTRACT

The magic $* variable (known in English as $MULTILINE_MATCHING)
has been deprecated for years. It is time to kill it.

=head1 DESCRIPTION

In days of yore, you would set $* to 1 to achieve in all regexps
the same as you can now achieve on a per-regexp basis with the
/m flag. Nowadays, when most perl programmers have never heard
of it, it is an accident waiting to happen and requires ugly
additional cruft for the defensive programmer to avoid.

The particular danger of $* is its 'action at a distance' effect:
as a global variable, its effect reaches into and out of scopes
that we normally expect to protect us.
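
For comparison, a small sketch of the per-regexp /m flag that replaces $*.
The sample text and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the very start of the string.
my $plain = () = $text =~ /^\w+/g;

# With /m, ^ also matches after each embedded newline.
my $multiline = () = $text =~ /^\w+/mg;

print "plain=$plain multiline=$multiline\n";
```

Because the flag lives on the individual regexp, its effect cannot leak into
other scopes the way a global variable can.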

=head1 MIGRATION

The long deprecation cycle helps here. p52p6 should complain and
die if it sees any attempt to set $* or $MULTILINE_MATCHING to a
non-zero value, or any attempt to alias it other than in English.
It should silently (or maybe with a warning) ignore any attempt to
set it to a zero value, and silently (or maybe with a warning)
replace any attempt to read it with a constant undef.

=head1 IMPLEMENTATION

This only simplifies the regexp engine, and should help fix some
longstanding bugs in the scope of /m. There is a bit of work to
do to extricate it, but nothing seriously difficult.

=head1 REFERENCES

perlvar manpage for discussion of $*




RFC 360 (v1) Allow multiply matched groups in regexes to return a listref of all matches

2000-10-01 Thread Perl6 RFC Librarian

=head1 TITLE

Allow multiply matched groups in regexes to return a listref of all matches

=head1 VERSION

  Maintainer: Kevin Walker [EMAIL PROTECTED]
  Date: 30 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 360
  Version: 1
  Status: Developing

=head1 DESCRIPTION

Since the October 1 RFC deadline is nigh, this will be pretty informal.

Suppose you want to parse text which looks like:

 name: John Abajace
 children: Tom, Dick, Harry
 favorite colors: red, green, blue

 name: I. J. Reilly
 children: Jane, Gertrude
 favorite colors: black, white
 
 ...

Currently, this takes two passes:

 while ($text =~ /name:\s*(.*?)\n\s*
                  children:\s*(.*?)\n\s*
                  favorite\ colors:\s*(.*?)\n/sigx) {
     # now second pass for $2 ( = "Tom, Dick, Harry") and $3, yielding
     # list of children and favorite colors
 }
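
A runnable sketch of the two-pass approach, assuming the sample records above
are held in $text. The @people structure and the use of split for the second
pass are illustrative choices, not part of the RFC.

```perl
use strict;
use warnings;

my $text = <<'END';
name: John Abajace
children: Tom, Dick, Harry
favorite colors: red, green, blue

name: I. J. Reilly
children: Jane, Gertrude
favorite colors: black, white
END

my @people;
while ($text =~ /name:\s*(.*?)\n\s*
                 children:\s*(.*?)\n\s*
                 favorite\ colors:\s*(.*?)\n/sigx) {
    # First pass: the regex yields one string per field.
    my ($name, $children, $colors) = ($1, $2, $3);

    # Second pass: split the comma-separated fields into lists.
    push @people, {
        name     => $name,
        children => [ split /\s*,\s*/, $children ],
        colors   => [ split /\s*,\s*/, $colors ],
    };
}

print scalar(@people), " records parsed\n";
```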

If we introduce a new construction, (?@ ... ), which means "spit out a
list ref of all matches, not just the last match", then this could be
done in one pass:

 while ($text =~ /name:\s*(.*?)\n\s*
                  children:\s*(?:(?@\S+)[, ]*)*\n\s*
                  favorite\ colors:\s*(?:(?@\S+)[, ]*)*\n/sigx) {
     # now we have:
     #  $1 = "John Abajace";
     #  $2 = ["Tom", "Dick", "Harry"]
     #  $3 = ["red", "green", "blue"]
 }

Although the above example is contrived, I have very often felt the need
for this feature in real-world projects.

=head1 IMPLEMENTATION

Unknown.

=head1 REFERENCES

None.




RFC 112 (v4) Assignment within a regex

2000-10-01 Thread Perl6 RFC Librarian

=head1  TITLE

Assignment within a regex

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 1 Oct 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 112
  Version: 4
  Status: Frozen

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find the
ones you wish to pick out can be messy.  It is also prone to maintainability
problems if and when you wish to add to the expression.  Using (?:) to
suppress capturing by some brackets helps, but it still gets "complex".
I would sometimes rather just pick out the bits I want within the regex itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched by
the pattern ... to $foo when the pattern matches.  These assignments would be
made left to right after the match has succeeded but before processing a
replacement or other results (or prior to any (?{...}) or (??{...})
code).  There may be whitespace between the $foo and the "=".

Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... ),
likewise the '=' could be any assignment operator.

The camel and the docs include this example:

   if (/Time: (..):(..):(..)/) {
       $hours = $1;
       $minutes = $2;
       $seconds = $3;
   }

This then becomes:
 
  /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to understand
for a complex regex.  And one does not have to worry about the scope of $1
etc.
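
For comparison, a sketch of two ways to avoid the explicit $1, $2, $3 copies
in Perl 5 itself. Neither is the syntax proposed here: the first uses list
assignment from a match, the second uses the named captures and %+ hash that
later Perl 5 releases (5.10 and up) provide. The sample input is an
assumption.

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";

# List assignment: the match in list context returns the captures.
my ($hours, $minutes, $seconds) = $line =~ /Time: (..):(..):(..)/;

# Named captures (Perl 5.10+): results are looked up by name in %+.
my %time;
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    %time = %+;
}

print "$hours:$minutes:$seconds\n";
```

In both cases the names are attached to the captures without counting
brackets at the point of use, which is the maintainability gain this RFC is
after.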

=head2 When does the assignment actually happen?

In general all assignments should wait to the very end, and then assign
them all.  However before code callouts (?{...}) and friends, the named
assignments that are currently defined should be made so that
the code can refer to them by name.

It may be appropriate for any assignments made before a code callout
to be localised so they can be unrolled should the expression finally fail.

=head2 Named Backrefs

The first versions of this RFC did not allow for backrefs.  I now think this
was a shortcoming.  It can be done with (??{quotemeta $foo}), but I find this
clumsy; a better way of using a named backref might be (?\$foo).
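
A sketch of the existing (??{ quotemeta $foo }) form mentioned above, using
made-up strings. It only shows how the contents of $foo are matched
literally; the proposed (?\$foo) syntax does not exist and is not used here.

```perl
use strict;
use warnings;

my $foo = "a.b";

# quotemeta makes the '.' in $foo match only a literal dot.
my $same  = ("a.b=a.b" =~ /^(.+)=(??{ quotemeta $foo })\z/) ? 1 : 0;
my $other = ("a.b=axb" =~ /^(.+)=(??{ quotemeta $foo })\z/) ? 1 : 0;

print "same=$same other=$other\n";
```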

=head2 Scoping

The question of scoping for these assignments has been raised, but I don't
currently have a feel for the "best" way to handle this.  Input welcome.

Hugo: I think it should be defined to act the same as in (??{...}), whenever
we get around to defining that.

=head2 Brackets

Using this method for capturing wanted content, it might be desirable to stop
ordinary brackets capturing without needing to use (?:...).  I therefore
suggest, as an enhancement to regexes, a /b (bracket?) flag under which
ordinary brackets just group, without capture - in effect they all behave as
(?:...).

=head1 CHANGES

V3 - added bit about backrefs, and brackets.

V4 - Clarified a few things and froze

=head1 IMPLEMENTATION

Currently all $scalars in regexes are expanded before the main regex compiler
gets to analyse the syntax.  This problem also affects several other RFCs
(166 for example).  For these (and other) RFCs, the expansion of variables
in regexes needs to be driven from within the regex compiler, so that the
regex can expand them as and where appropriate.  Changing this should not
affect any existing behaviour.

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the
noise...

RFC 166 

Perlstorm #0040




RFC 166 (v4) Alternative lists and quoting of things

2000-10-01 Thread Perl6 RFC Librarian

=head1  TITLE

Alternative lists and quoting of things

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 1 Oct 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 166
  Version: 4
  Status: Frozen

=head1 ABSTRACT

Expand Alternate Lists from Arrays and Quote the contents of things 
inside regexes.

=head1 DESCRIPTION

These are a couple of constructs to make it easy to build up regexes
from other things.

=head2 Alternative Lists from arrays

The basic idea is to expand an array as a list of alternatives.  There
are two possible syntaxes: (?@foo) and just plain @foo.  Plain @foo might
already have existing uses (just), therefore I prefer the (?@foo) syntax.

(?@foo) is just syntactic sugar for (?:(??{ join('|',@foo) })): a bracketed
list of alternatives.  But as it is built at regex compile time, maybe it is
@{[ join('|',@foo) ]}.

=head2 Quoting the contents of things

If a regex uses $foo or @bar there are problems if the contents of
the variables contain special characters.  What is needed is a way
of \Quoting the content of scalars $foo or arrays (?@foo).

Suggested syntax:

(?Q$foo) Quotes the contents of the scalar $foo - equivalent to
(??{ quotemeta $foo }).

(?Q@foo) Quotes each item in a list (as above); this is equivalent to
(?:(??{ join ('|', map quotemeta, @foo)})).
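
A sketch of what the two expansions amount to in present-day Perl, building
the alternation by hand. The array contents and variable names are
assumptions; (?@foo) and (?Q@foo) themselves are proposals and are not used.

```perl
use strict;
use warnings;

my @foo = ('a.c', 'x+y', 'plain');

# Roughly what (?@foo) would expand to: items joined as-is.
my $unquoted = join '|', @foo;

# Roughly what (?Q@foo) would expand to: each item quoted first.
my $quoted = join '|', map { quotemeta($_) } @foo;

my $loose = qr/^(?:$unquoted)\z/;
my $exact = qr/^(?:$quoted)\z/;

print "abc matches loosely\n" if 'abc' =~ $loose;
print "abc does not match exactly\n" unless 'abc' =~ $exact;
```

The unquoted form treats '.' and '+' as metacharacters, so 'abc' matches and
the literal string 'x+y' does not; the quoted form matches only the literal
items.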

In this syntax the Q is used as it represents a more intelligent \Quot\E.

It is recognised that (?Q$foo) is equivalent to \Q$foo\E, but that does not
make it a bad idea to add it at the same time as (?Q@foo), for
reasons of symmetry and perl DWIM.

It is recognised that (?Q might be reserved for control of a hypothetical
Q flag, but this does feel "appropriate" as it's about \Quoting.

=head2 Comments

Hugo:
 (?@foo) and (?Q@foo) are both things I've wanted before now. I'm
 not sure if this is the right syntax, particularly if RFC 112 is
 adopted: it would be confusing for (?@foo) to have so
 different a meaning from (?$foo=...), and even more so if the
 latter is ever extended to allow (?@foo=...).
 I see no reason that implementation should cause any problems
 since this is purely a regexp-compile time issue.

Me: I can't see any reasonable meaning for (?@foo=...), so this seems an
appropriate syntax, but I am open to other suggestions.

=head1 CHANGES

V1 of this RFC had three ideas, one has been dropped, the other is now part
of RFC 198.

V2 Expands the list expansion and quoting with quoting of scalars and
Implementation issues.

V3 Through an error, what should have been 165 V2 was issued as 166 V2, so
this is V3 with a change in (?Q$foo).  This is in a pre-frozen state.

V4 Added a couple of minor changes from Hugo and frozen.

=head1 MIGRATION

As (?@foo) and (?Q...) are additions, there are no compatibility issues.

The option of just @foo for list expansion might represent a small problem if
people already use the construct.

=head1 IMPLEMENTATION

Both of these changes are regex compile time issues.

Generating lists from arrays almost works by localising $" as '|' for the 
regex and just using @foo.
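
The $" approach can be sketched as follows, with an assumed array of plain
words. It relies on Perl joining an interpolated array with $", and it only
"almost works" because the elements are not quoted.

```perl
use strict;
use warnings;

my @foo = qw(cat dog bird);

# Interpolating an array into a pattern joins its elements with $",
# so localising $" to '|' turns @foo into a list of alternatives.
my $re = do { local $" = '|'; qr/^(?:@foo)\z/ };

print "dog matches\n" if 'dog' =~ $re;
```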

MJD has demonstrated implementing (?@foo) as (?\@foo) by means of an overload
of regexes; this slight change was necessary because of the expansion of
@foo - see below.

Both of these changes are currently affected by the expansion of variables in
the regex before the regex compiler gets to work on the regex.  This problem also
affects several other RFCs.  For these (and other) RFCs, the expansion of
variables in regexes needs to be driven from within the regex compiler, so
that the regex can expand them as and where appropriate.  Changing this
should not affect any existing behaviour.

=head1 REFERENCES

RFC 198: Boolean Regexes





RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Perl6 RFC Librarian

=head1 TITLE

Ban Perl hooks into regexes

=head1 VERSION

  Maintainer: Simon Cozens [EMAIL PROTECTED]
  Date: 25 Sep 2000 
  Mailing List: [EMAIL PROTECTED]
  Number: 308
  Version: 1
  Status: Developing

=head1 ABSTRACT

Remove C<?{ code }>, C<??{ code }> and friends.

=head1 DESCRIPTION

The regular expression engine may well be rewritten from scratch or
borrowed from somewhere else. One of the scarier things we've seen
recently is that Perl's engine casts back its Kraken tentacles into Perl
and executes Perl code. This is spooky, tangled, and incestuous.
(Although admittedly fun.)

It would be preferable to keep the regular expression engine as
self-contained as possible, if nothing else to enable it to be used
either outside Perl or inside standalone translated Perl programs
without a Perl runtime.

To do this, we'll have to remove the bits of the engine that call 
Perl code. In short: C<?{ code }> and C<??{ code }> must die.
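
For readers who have not met these constructs, a small sketch of the kind of
hook this RFC wants removed. The counter and input string are illustrative
assumptions.

```perl
use strict;
use warnings;

my $count = 0;

# The (?{ ... }) block is ordinary Perl, run by the regex engine each
# time matching passes that point in the pattern.
"aaa" =~ /^(?:a(?{ $count++ }))*\z/;

print "code block ran $count times\n";
```

The engine has to call back into the Perl interpreter on every iteration,
which is exactly the coupling that prevents the engine from standing alone.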

=head1 IMPLEMENTATION

It's more of an unimplementation really.

=head1 REFERENCES

None.




RFC 317 (v1) Access to optimisation information for regular expressions

2000-09-25 Thread Perl6 RFC Librarian

=head1 TITLE

Access to optimisation information for regular expressions

=head1 VERSION

  Maintainer: Hugo van der Sanden ([EMAIL PROTECTED])
  Date: 25 September 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 317
  Version: 1
  Status: Developing

=head1 ABSTRACT

Currently you can see optimisation information for a regexp only
by running with -Dr in a debugging perl and looking at STDERR.
There should be an interface that allows us to read this information
programmatically and possibly to alter it.

=head1 DESCRIPTION

At its core, the regular expression matcher knows how to check
whether a pattern matches a string starting at a particular location.
When the regular expression is compiled, perl may also look for
optimisation information that can be used to rule out some or all
of the possible starting locations in advance.

Currently you can find out about the optimisation information
captured for a particular regexp only in a perl built with
DEBUGGING, by turning on -Dr:

  % perl -Dr -e 'qr{test.*pattern}'
  Compiling REx `test.*pattern'
  size 8 first at 1
  rarest char p at 0
  rarest char s at 2
     1: EXACT <test>(3)
     3: STAR(5)
     4:   REG_ANY(0)
     5: EXACT <pattern>(8)
     8: END(0)
  anchored `test' at 0 floating `pattern' at 4..2147483647 (checking floating) minlen 11
  Omitting $` $& $' support.
  
  EXECUTING...
  
  Freeing REx: `test.*pattern'
  %

For some purposes it would help to be able to get at this information
programmatically: the test suite could take advantage of this (to test
that optimisations occur as expected), and it could also be useful for
enhanced development tools, such as a graphical regexp debugger.

Additionally there are times that the programmer is able to supply
optimisation that the regexp engine cannot discover for itself. While
we could consider making it possible to modify these values, it is
important to remember that these are only hints: the regexp engine
is free to ignore them. So there is a danger that people will misuse
writable optimisation information to move part of the logic out of
the regexp, and then blame us when it breaks.

Suggested example usage:

  % perl -wl
  use re;
  $a = qr{test.*pattern};
  print join ':', $a->fixed_string, $a->floating_string, $a->minlen;
  __END__
  test:pattern:11
  %

.. but perhaps a single new method returning a hashref would be
cleaner and more extensible:

  $opt = $a->optimisation;
  print join ':', @$opt{qw/ fixed_string floating_string minlen /};
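
The methods above are proposals. As a point of comparison only, later Perl 5
releases (5.10 and up) ship a regmust function in the re module that exposes
part of this information; the sketch below assumes such a perl and is not the
interface this RFC suggests.

```perl
use strict;
use warnings;
use re qw(regmust);

my $re = qr{test.*pattern};

# regmust returns the longest anchored and longest floating strings
# that the optimiser has determined must appear in any match.
my ($anchored, $floating) = regmust($re);

print join(':', $anchored, $floating), "\n";
```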

=head1 IMPLEMENTATION

Straightforward: add interface functions within the perl core to give
access to read and/or write the optimisation values; add methods in
re.pm that use XS code to reach the internal functions.

=head1 REFERENCES

Prompted by discussion of RFC 72:

RFC 72: Variable-length lookbehind: the regexp engine should also go backward.




RFC 276 (v1) Localising Paren Counts in qr()s.

2000-09-24 Thread Perl6 RFC Librarian

=head1  TITLE

Localising Paren Counts in qr()s.

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 24 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 276
  Version: 1
  Status: Developing

=head1 ABSTRACT

The Paren Counts and backreferences should be localised in each qr(), to
prevent surprises when qr()s are used in combination.

=head1 DESCRIPTION

TomC's Perlstorm #0040 has:

 Figure out way to do 
 
 /$e1 $e2/
 
 safely, where $e1 might have '(foo) \1' in it. 
 and $e2 might have '(bar) \1' in it.  Those won't work.

=head2 DISCUSSION

Me: If e1 and e2 are qr// type things the answer might be to localise 
the backref numbers in each qr// expression.   Use of assignment in a regex
and named backrefs (RFC 112) would make this a lot safer.


Hugo: 
I think it is reasonable to ask whether the current handling of qr{}
subpatterns is correct:

 perl -wle '$a=qr/(a)\1/; $b=qr/(b).*\1/; /$a($b)/g and print join ":", $1, pos for "aabbac"'
 a:5

I'm tempted to suggest it isn't; that the paren count should be local
to each qr{}, so that the above prints 'bb:4'. I think that most people
currently construct their qr{} patterns as if they are going to be
handled in isolation, without regard to the context in which they are
embedded - why else do they override the embedder's flags if not to
achieve that?
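
The same experiment written out as a script, with assumed variable names. It
shows the current numbering: the \1 inside the second qr// refers to the
first group of the combined pattern, not to the (b) group next to it.

```perl
use strict;
use warnings;

my $first  = qr/(a)\1/;
my $second = qr/(b).*\1/;
my $string = "aabbac";

my ($captured, $end);
if ($string =~ /$first($second)/g) {
    # Groups are numbered across the whole combined pattern:
    # 1 = (a), 2 = the outer parens, 3 = (b).
    ($captured, $end) = ($1, pos($string));
}

print "$captured:$end\n";
```

With paren counts localised to each qr//, the second subpattern would instead
match "bb" on its own and the match would end at position 4.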

The problem then becomes: do we provide a mechanism to access the
nested backreferences outside of the qr{} in which they were referenced,
and if so what syntax do we offer to achieve that? I don't have an answer
to the latter, which tempts me to answer 'no' to the former for all the
wrong reasons. I suspect (and suggest) that complication is the only
reason we don't currently have the behaviour I suggest the rest of the
semantics warrant - that backreferences are localised within a qr().

I lie: the other reason qr{} currently doesn't behave like that is that
when we interpolate a compiled regexp into a context that requires it be
recompiled, we currently ignore the compiled form and act only on the
original string. Perhaps this is also an insufficiently intelligent thing
to do.

MJD:
Interpolated qr() items shouldn't be recompiled anyway.  They should
be treated as subroutine calls.  Unfortunately, this requires a
reentrant regex engine, which Perl doesn't have.  But I think it's the
right way to go, and it would solve the backreference problem, as well
as many other related problems.

Me: You can access the nested backreferences outside of the qr{} in which 
they were referenced by use of the named backref see RFC 112.

=head2 AGREEMENTS

The paren count in each qr() is localised to each qr().

There is no way to access the nested backreferences outside of the qr() by
number; they may be accessed by name (see RFC 112).

The regex engine must be made re-entrant.

The regex compiler should not need to recompile qr()s when used as part of
another regex.

=head1 IMPLEMENTATION

The Regex engine must be made re-entrant.

The expansion of variables in regexes must be driven by the regex compiler
(Same problem as for RFCs 112, 166 ...)

=head1 REFERENCES

Perlstorm #0040 from TomC.

RFC 112: Assignment within a regex




RFC 112 (v3) Assignment within a regex

2000-09-23 Thread Perl6 RFC Librarian

=head1  TITLE

Assignment within a regex

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 23 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 112
  Version: 3
  Status: Developing

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find the
ones you wish to pick out can be messy.  It is also prone to maintainability
problems if and when you wish to add to the expression.  Using (?:) to
suppress capturing by some brackets helps, but it still gets "complex".
I would sometimes rather just pick out the bits I want within the regex itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched by
the pattern ... to $foo when the pattern matches.  These assignments would be
made left to right after the match has succeeded but before processing a
replacement or other results (or prior to any (?{...}) or (??{...})
code).  There may be whitespace between the $foo and the "=".

Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... ),
likewise the '=' could be any assignment operator.

The camel and the docs include this example:

   if (/Time: (..):(..):(..)/) {
       $hours = $1;
       $minutes = $2;
       $seconds = $3;
   }

This then becomes:
 
  /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to understand
for a complex regex.  And one does not have to worry about the scope of $1 etc.

=head2 Named Backrefs

The first versions of this RFC did not allow for backrefs.  I now think this
was a shortcoming.  It can be done with (??{quotemeta $foo}), but I find this
clumsy; a better way of using a named backref might be (?\$foo).

=head2 Scoping

The question of scoping for these assignments has been raised, but I don't
currently have a feel for the "best" way to handle this.  Input welcome.

=head2 Brackets

Using this method for capturing wanted content, it might be desirable to
stop ordinary brackets capturing without needing to use (?:...).  I therefore
suggest, as an enhancement to regexes, a /b (bracket?) flag under which
ordinary brackets just group, without capture - in effect they all behave as
(?:...).

=head1 CHANGES

V3 - added bit about backrefs, and brackets.

=head1 IMPLEMENTATION

Currently all $scalars in regexes are expanded before the main regex compiler
gets to analyse the syntax.  This problem also affects several other RFCs
(166 for example).  For these (and other) RFCs, the expansion of variables
in regexes needs to be driven from within the regex compiler, so that the
regex can expand them as and where appropriate.  Changing this should not
affect any existing behaviour.

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...

RFC 166: Alternative lists and quoting of things

Perlstorm #0040




RFC 158 (v3) Regular Expression Special Variables

2000-09-22 Thread Perl6 RFC Librarian

=head1 TITLE

Regular Expression Special Variables

=head1 VERSION

  Maintainer: Uri Guttman [EMAIL PROTECTED]
  Date: 25 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 158
  Version: 3
  Status: Frozen
  Frozen since: v2

=head1 ABSTRACT

This RFC addresses ways to make the regex special variables $`, $& and
$' no longer the pariahs they are now.

=head1 CHANGES

I dropped the local scoping of $`, $& and $' as they are already
localized now.

=head1 DESCRIPTION

$`, $& and $' are useful variables which are never used by any
experienced Perl hacker since they have well known problems with
efficiency. Since they are globals, any use of them anywhere in your
code forces all regexes to copy their data for potential later
referencing by one of them. I will describe some ideas to make this
issue go away and return these variables back into the toolbox where
they belong.

=head1 IMPLEMENTATION

The copy all regex data problem is solved by a new modifier k (for
keep). This tells the regex to do the copy so the 3 vars will work
properly. So you would use code like this:

    $str = 'prefoopost' ;

    if ( $str =~ /foo/k ) {
        print "pre is [$`]\n" ;
        print "match is [$&]\n" ;
        print "post is [$']\n" ;
    }
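
Until something like /k exists, the same three pieces can be recovered
without touching $`, $& or $' by using the @- and @+ offset arrays. This is a
sketch of that workaround, not part of the proposal; the variable names are
assumptions.

```perl
use strict;
use warnings;

my $str = 'prefoopost';
my ($pre, $match, $post);

if ($str =~ /foo/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $pre   = substr($str, 0, $-[0]);
    $match = substr($str, $-[0], $+[0] - $-[0]);
    $post  = substr($str, $+[0]);
}

print "pre is [$pre]\nmatch is [$match]\npost is [$post]\n";
```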

=head1 IMPACT

None

=head1 UNKNOWNS

None

=head1 REFERENCES

None.





RFC 165 (v3) Allow Variables in tr///

2000-09-22 Thread Perl6 RFC Librarian

=head1  TITLE

Allow Variables in tr///

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 165
  Version: 3
  Status: Frozen

=head1 ABSTRACT

Allow variables in a tr///.  At present the only way to do a tr/$foo/$bar/
is to wrap it up in an eval.  I don't like using evals for this sort of thing.

=head1 DESCRIPTION

Suggested syntax: tr/$foo/$bar/e

With a /e, tr will expand both the LHS and RHS of the translate function.
Either or both could be variables. I am suggesting /e as it is sort of like
/e for s///e.
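
The eval workaround that tr///e would replace looks roughly like this. The
variable names and sample lists are assumptions made for illustration.

```perl
use strict;
use warnings;

my ($from, $to) = ('a-c', 'A-C');
my $str = "abcabd";

# tr/// does not interpolate, so the statement is built as a string
# and compiled at run time.  The trailing 1 makes a successful eval
# return true even if no characters were translated.
eval "\$str =~ tr/$from/$to/; 1" or die $@;

print "$str\n";
```

Besides being clumsy, this recompiles the statement on every call and treats
the contents of $from and $to as code, which is why an eval-free form is
attractive.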

These words from MJD:

The way tr/// works is that a 256-byte table is constructed at compile
time that says for each input character what output character is
produced.  Then when it's time to apply the tr/// to a string, Perl
iterates over the string one character at a time, looks up each
character in the table, and replaces it with the corresponding
character from the table.
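
The table-driven scheme described here can be sketched in Perl itself. This
is an illustration of the idea, not the core implementation: build_table and
translate are hypothetical names, and the sketch ignores ranges such as a-z
and the /c, /d and /s modifiers.

```perl
use strict;
use warnings;

# Build a 256-entry table mapping each byte to its replacement.
sub build_table {
    my ($from, $to) = @_;
    my @table   = map { chr($_) } 0 .. 255;    # start as the identity map
    my @search  = split //, $from;
    my @replace = split //, $to;
    @replace = @search unless @replace;        # empty RHS changes nothing
    my %seen;
    for my $i (0 .. $#search) {
        next if $seen{ $search[$i] }++;        # first occurrence wins
        # A short replacement list is padded with its last character.
        my $out = $i <= $#replace ? $replace[$i] : $replace[-1];
        $table[ ord $search[$i] ] = $out;
    }
    return \@table;
}

# Translate a byte string one character at a time via the table.
sub translate {
    my ($table, $string) = @_;
    return join '', map { $table->[ ord $_ ] } split //, $string;
}

my $table  = build_table('abc', 'ABC');
my $result = translate($table, 'abcabd');
print "$result\n";
```

With tr///e the equivalent of build_table would have to run whenever the
variables change, rather than once at compile time.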

With tr///e, you would have to generate the table at run-time.

This would suggest that you want the same sorts of optimizations that
Perl applies when it encounters a regex that contains variables:

1. Perl should examine the strings to see if they have changed
   since the last time it executed the code

2. It should rebuild the tables only if the strings changed

3. There should be a /o modifier that promises Perl that the
   variables will never change.

The implementation could be analogous to the way m/.../o is
implemented, with two separate op nodes: One that tells Perl
'construct the tables' and one that tells Perl 'transform the
string'.  The 'construct the tables' node would remove itself from the
op tree if it saw that the tr//o modifier was used.

Hugo wrote:
 Definitely. Should be easy to implement. There is a potential for
 confusion, since it makes the tr/ lists look even more like
 m/ and s/ patterns, but I think it can only be less confusion than
 the current state of affairs. It is tempting to make it the default,
 and have a flag to turn it off (or just backwhack the dagnabbed
 dollar), and auto-translation of existing scripts would be pretty
 easy, except that it would presumably fail exactly where people
 are using the current workaround, by way of eval.
 

Comments by me:

Therefore tr///o might be a good idea as well.  

If Hugo's idea of making this the normal behaviour is adopted, the problem of
existing evals is avoided by p52p6 changing the eval to a perl5_eval
which acts accordingly.  (One of MJD's ideas).

=head1 IMPLEMENTATION

Hugo:  Should be easy to implement.  

Me: Should not be too complicated, this is just a case of doing existing
things in a different context.

=head1 CHANGES

V2 - Added words from MJD and Hugo - This is hopefully in a pre-freeze state.

V3 - reissued due to an error in posting V2, and now frozen

=head1 REFERENCES

None yet.





RFC 166 (v3) Alternative lists and quoting of things

2000-09-22 Thread Perl6 RFC Librarian

=head1  TITLE

Alternative lists and quoting of things

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 166
  Version: 3
  Status: Developing

=head1 ABSTRACT

Expand Alternate Lists from Arrays and Quote the contents of things 
inside regexes.


=head1 DESCRIPTION

These are a couple of constructs to make it easy to build up regexes
from other things.

=head2 Alternative Lists from arrays

The basic idea is to expand an array as a list of alternatives.  There
are two possible syntaxes: (?@foo) and just plain @foo.  Plain @foo might (just)
have existing uses, therefore I prefer the (?@foo) syntax.

(?@foo) is just syntactic sugar for (?:(??{ join('|',@foo) })), a bracketed
list of alternatives.
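
As a rough illustration of what (?@foo) would stand for, the same effect can
be had in Perl 5 by joining the array by hand.  The array contents and
variable names below are made up for the example.

```perl
use strict;
use warnings;

my @foo = ('cat', 'dog', 'b.rd');   # illustrative data

# Build the alternation once; the elements are treated as regex fragments,
# so the '.' in 'b.rd' still matches any character.
my $alt = join '|', @foo;
my $re  = qr/\A(?:$alt)\z/;

for my $word ('dog', 'bird', 'cow') {
    printf "%-4s %s\n", $word, ($word =~ $re ? 'matches' : 'no match');
}
```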

=head2 Quoting the contents of things

If a regex uses $foo or @bar there are problems if the contents of
the variables contain special characters.  What is needed is a way
of \Quoting the content of scalars $foo or arrays (?@foo).

Suggested syntax:

(?Q$foo) Quotes the contents of the scalar $foo - equivalent to
(??{ quotemeta $foo }).

(?Q@foo) Quotes each item in a list (as above) this is equivalent to
(?:(??{ join ('|', map quotemeta, @foo)})).

In this syntax the Q is used as it represents a more intelligent \Quot\E.

It is recognised that (?Q$foo) is equivalent to \Q$foo\E, but that does not
make it a bad idea to add it at the same time as (?Q@foo), for
reasons of symmetry and perl DWIM.
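
The two quoting forms can be approximated in Perl 5 as follows.  This is a
sketch of the intended behaviour, not of the proposed syntax; the data is
invented for the example.

```perl
use strict;
use warnings;

my $foo = 'a.b';
my @foo = ('a.b', 'c+d', '(e)');

# Scalar case: \Q...\E makes the metacharacters in $foo literal.
my $scalar_re = qr/\A\Q$foo\E\z/;

# Array case: quote each element, then join into one non-capturing group.
my $alt      = join '|', map { quotemeta } @foo;
my $array_re = qr/\A(?:$alt)\z/;

print "a.b vs scalar: ", ('a.b' =~ $scalar_re ? 'match' : 'no match'), "\n";
print "axb vs scalar: ", ('axb' =~ $scalar_re ? 'match' : 'no match'), "\n";
print "c+d vs array:  ", ('c+d' =~ $array_re  ? 'match' : 'no match'), "\n";
print "ccd vs array:  ", ('ccd' =~ $array_re  ? 'match' : 'no match'), "\n";
```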

=head2 Comments

Hugo:
 (?@foo) and (?Q@foo) are both things I've wanted before now. I'm
 not sure if this is the right syntax, particularly if RFC 112 is
 adopted: it would be confusing to have (?@foo) to have so
 different a meaning from (?$foo=...), and even more so if the
 latter is ever extended to allow (?@foo=...).
 I see no reason that implementation should cause any problems
 since this is purely a regexp-compile time issue.

Me: I can't see any reasonable meaning for (?@foo=...), so this seems an appropriate
syntax, but I am open to other suggestions.

=head1 CHANGES

V1 of this RFC had three ideas, one has been dropped, the other is now part
of RFC 198.

V2 Expands the list expansion and quoting with quoting of scalars and 
implementation issues.

V3 In error, what should have been 165 V2 was issued as 166 V2, so this is V3
with a change in (?Q$foo).  This is in a pre-frozen state.

=head1 MIGRATION

As (?@foo) and (?Q...), these are additions without any compatibility issues.

The option of just @foo for list expansion might represent a small problem if
people already use the construct.

=head1 IMPLEMENTATION

Both of these changes are regex compile-time issues.

Generating lists from arrays almost works by localising $" as '|' for the 
regex and just using @foo.
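
A minimal Perl 5 sketch of that trick, with an invented array, is shown here.
It only "almost works" because the elements are neither quoted nor grouped
unless the pattern does so itself.

```perl
use strict;
use warnings;

my @foo = qw(red green blue);   # illustrative data

my $re = do {
    local $" = '|';        # list separator used when @foo is interpolated
    qr/\A(?:@foo)\z/;      # pattern becomes \A(?:red|green|blue)\z
};

print "green: ", ('green' =~ $re ? 'match' : 'no match'), "\n";
```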

MJD has demonstrated implementing (?@foo) as (?\@foo) by means of an overload
of regexes; this slight change was necessary because of the expansion of
@foo - see below.

Both of these changes are currently affected by the expansion of variables in
the regex before the regex compiler gets to work on the regex.  This problem also
affects several other RFCs.  For these (and other RFCs), the expansion of variables
in regexes needs to be driven from within the regex compiler so
that the regex can expand as and where appropriate.  Changing this should not
affect any existing behaviour.

=head1 REFERENCES

RFC 198





RFC 198 (v2) Boolean Regexes

2000-09-22 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

Boolean Regexes

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 6 Sep 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 198
  Version: 2
  Status: Developing

=head1 ABSTRACT

This is a development of the proposal for the "not a pattern" concept in RFC
166 V1.  Looking deeper into the handling of advanced regexes, there are
potential needs for many other concepts, to allow a regex to extract
information directly from a complex file in one go, rather than a mixture
of splits and nested regexes as is typically needed today.  With these,
parsing data should become easier (in some cases). 

=head1 CHANGES

V2 - Changed the "Fail Pattern", enhanced the wording for many things.

=head1 DESCRIPTION

It would be nice (in my opinion) to be able to build more elaborate regexes
allowing data to be mined out of a string in one go.  These ideas allow
you to apply several patterns to one substring (each must match), to
fail a match from within, to look for patterns that do not contain other
patterns, and to handle looking for cases such as (foo.*bar)|(bar.*foo) in
a more general way of saying "A substring that contains both foo and bar".
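
For comparison, the closest Perl 5 idiom for "contains both foo and bar" is a
pair of lookaheads anchored at the same point.  This is a sketch of existing
behaviour, not of the syntax proposed below.

```perl
use strict;
use warnings;

# Each lookahead scans the whole string independently, so the order in
# which foo and bar appear does not matter.
my $both = qr/\A(?=.*foo)(?=.*bar)/s;

for my $text ('bar then foo', 'foo then bar', 'foo only') {
    printf "%-14s %s\n", $text, ($text =~ $both ? 'both' : 'not both');
}
```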

These are ideas, at present with some proposed syntax.  The ideas are more
important than the exact syntax at this stage.  This is very much work in
progress.

I have called these boolean regexes as they bring the concepts of and (&&),
or (||) and not (!) into the realm of regexes.

Within a boolean regex (or the boolean part of a regex), several new
symbols have meanings, and some have enhanced meanings.

=head2 The Ideas

Are these part of a boolean (?&...) construct within an existing regex, or
is the advanced syntax (and meaning of &|!^$) invoked by a new flag such
as /B?

These can look like line noise so the use of white space with /x is used
throughout, and it might be appropriate to enforce (or assume) /x within
(?&...).

=head3 Boolean construct

(?&...) grabs a substring, and applies one or more tests to the substring.

=head3 Substring matching multiple patterns (&&)

(?& pattern1 && pattern2 && pattern3 )

A substring is defined that matches each pattern.

For example, the first pattern may specify a substring of at least
30 chars, the next two have a foo and a bar.

=head3 Substring matching alternative patterns (||)

(? pattern1 || pattern2 || pattern3)

This is similar to the existing alternative syntax "|" but the
alternatives to "|" behave as /^pattern$/ rather than /pattern/ (^ and $
taken as referring to the substring in this case - see below).

(pattern1 || pattern2 || pattern3) can be mixed in with the && case above to
build up more advanced cases.  && and || operators can be nested with brackets
in normal ways.

=head3 Brackets within boolean regexes

Within a complex boolean regex there are likely to be lots and lots of
brackets to nest and control the behaviour of the regex.  Rather than having
to sprinkle the regex with (?:) line noise, it would be nicer to just use
ordinary brackets () and only support capturing of elements by using one of
the (?$=) or (?%=) constructs that have been proposed elsewhere (RFC 112
and RFC 150).  There might be some case for this as a general capability
using some flag /b = brackets? 

=head3 Substring not matching a pattern

In RFC 166 I originally proposed (?^ pattern ).  This proposal replaces that,
though it could be used as well outside of the (?&) construct.

!pattern matches anything that does not match the pattern.  On its own it is
not terribly useful, but in conjunction with && and || one can do things
such as /(?& img && ! alt=)/ ie does it have an image and not have an alt.
 
! is chosen as it has the same basic meaning outside of regexes.

!pattern is a non greedy construct that matches any string/substring that
does not match the pattern.  
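
In Perl 5 the "image without an alt" test is usually written with a negative
lookahead.  The following is a sketch using invented sample tags, and is not a
robust HTML parser.

```perl
use strict;
use warnings;

# Match an <img ...> tag only if no alt= attribute appears before its '>'.
my $img_no_alt = qr/<img\b(?![^>]*\balt=)[^>]*>/i;

for my $tag ('<img src="a.png">', '<img src="a.png" alt="A">') {
    print $tag, ($tag =~ $img_no_alt ? ' lacks alt' : ' is fine'), "\n";
}
```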

=head3 Meaning of $ and ^ inside a boolean regex

^ and $ are taken to mean the beginning and end of the substring, not the beginning
and end of the line/string, from within a boolean regex.

=head3 Greediness

Should the (?&...) construct be greedy or nongreedy?  To some extent this
depends on the elements it contains.  If all the matching set of patterns are
greedy then it will be greedy, if they are not greedy then it will not be. 
This might or might not be sufficient.

If the situation is ambiguous (or might be), the boolean can be expressed as
(?&? ...) to force non greediness. 

=head3 Delivering a substring to some code that generates a pass/fail

(?*{code}) delivers a substring to the code, which returns with success
or failure.  The code sees the substring as $_.  This is not dependent on the
Boolean regex concept and could be used for other things, though it is most 
useful in this context.  
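
Something similar can be approximated in Perl 5.10 or later with a code
condition and (*FAIL).  This is only a sketch: small_enough() is a
hypothetical pass/fail test, and the substring is passed as $1 rather than $_.

```perl
use strict;
use warnings;

# Hypothetical pass/fail test applied to the captured substring.
sub small_enough { return $_[0] <= 255 }

# The code condition runs after (\d+) has captured; if the test fails,
# (*FAIL) forces the engine to backtrack and look elsewhere.
my $re = qr/\b(\d+)\b(?(?{ !small_enough($1) })(*FAIL))/;

my ($first_ok) = '300 42 7' =~ $re;
print "first acceptable number: $first_ok\n";
```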

This is sort of equivalent to (?: (.*)(??{$_ = $1; code})) ie it matches an
arbitrarily long substring and delivers it to the code.  But not dependent on
how many brackets have been 

RFC 274 (v1) Generalised Additions to Regexs

2000-09-22 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

Generalised Additions to Regexs

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 274
  Version: 1
  Status: Developing

=head1 ABSTRACT

This proposes a way for generalised additions to regex capabilities.

=head1 DESCRIPTION

Given that expansion of regexes could include (+...) and (*...) I have
been thinking about providing a general purpose way of adding
functionality.  Hence I propose that the entire (+...) syntax is
kept free from formal specification for this. (+ = addition)

A module or anything that wants to support some enhanced syntax
registers something that handles "regex enhancements".

At regex compile time, if and when (+foo) is found perl calls
each of the registered regex enhancements in turn, these:

1) Are passed the foo string as a parameter exactly as is.  (There is
an issue of actually finding the end of the generic foo.)

2) The regex enhancement can either recognise the content or not.

3) If not the enhancement returns undef and perl goes to the next regex
enhancement (Does it handle the enhancements as a stack (Last checked
first) or a list (First checked first?) how are they scoped?  Job here
for the OO/scoping fanatics)

4) If perl runs out of registered regex enhancements it reports an error.  

5) If an enhancement recognises the content it could do either of:

a) Return a replacement expanded regex using existing capabilities; perl will
then pass this back through the regex compiler.

b) Return a coderef that is called at run time when the regex gets to this
point.  The referenced code needs to have enough access to the regex internals
to be able to see the current sub-expression, request more characters, and access
relevant flags and visibility of greediness.  It may also need a coderef
that is similarly called when the regex is being unwound when it backtracks.
These features would also be of interest to the existing code inside regexes
as well.
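
Steps 1 to 5(a) can be mocked up in Perl 5 as a preprocessor over the pattern
string.  This is a sketch only: register_enhancement() and
expand_enhancements() are hypothetical names, handlers are checked last
registered first, and the body of (+...) is assumed not to contain
parentheses.

```perl
use strict;
use warnings;

# Hypothetical registry of "regex enhancements": each handler receives the
# text found inside (+...) and returns a replacement regex or undef.
my @enhancements;
sub register_enhancement { unshift @enhancements, shift }   # last registered is checked first

sub expand_enhancements {
    my ($pattern) = @_;
    # Simplification: the body of (+...) may not contain parentheses.
    $pattern =~ s{\(\+([^()]*)\)}{
        my $body = $1;
        my $replacement;
        for my $handler (@enhancements) {
            $replacement = $handler->($body);
            last if defined $replacement;
        }
        die "No regex enhancement recognises (+$body)\n" unless defined $replacement;
        "(?:$replacement)";
    }ge;
    return qr/$pattern/;
}

register_enhancement(sub { $_[0] eq 'hex' ? '[0-9A-Fa-f]+' : undef });
register_enhancement(sub { $_[0] eq 'ip4' ? '\d{1,3}(?:\.\d{1,3}){3}' : undef });

my $re = expand_enhancements('\Aid=(+hex)\z');
print "id=ff09 ", ('id=ff09' =~ $re ? 'matches' : 'does not match'), "\n";
```

Case (b), returning a coderef to be run at match time, has no direct
equivalent here and is left out of the sketch.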


Thinking from that - the last case should be generalised (it is sort of
like my (?*{...}) from RFC 198 or an enhancement to (??{...})).  If so, cases
(a) and (b) are the same, as case (b) is just a case of returning (?*{...}) with the
appropriate code.  

Following on, if (?{...}) etc code is evaluated
in forward match, it would be a good idea to likewise support some
code block that is ignored on a forward match but is executed when the
code is unwound due to backtracking.  Thus (?{ foo })(?\{ bar })
executes foo on the forward case and bar if it unwinds.  I don't
care at the moment what the syntax is - what about the concepts?
Think about foo putting something on a stack (eg the bracket to match
[RFC 145]) and bar taking it off for example.
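
Perl 5 already offers a limited form of this: a variable that is localised
inside (?{...}) has its old value restored when the engine backtracks past
that block.  The sketch below follows the counting example in the perlre
documentation; it is not the (?\{ bar }) syntax proposed here.

```perl
use strict;
use warnings;

our ($cnt, $res);   # package variables, so that local() can be used

'aaaaaaaa' =~ m<
    (?{ $cnt = 0 })                          # initialise
    (?: a (?{ local $cnt = $cnt + 1 }) )*    # count each 'a'; undone on backtrack
    aaaa
    (?{ $res = $cnt })                       # copy out the surviving count
>x;

print "$res\n";
```

The greedy group first takes all eight characters and then gives four back so
that the trailing aaaa can match, which is why only four increments survive.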

Note:

I don't consider this RFC complete, but after posting this on the regex list
to no effect I am making it an RFC to see if it gets a little more feedback...

=head1 MIGRATION

This is a new feature - no compatibility problems

=head1 IMPLEMENTATION

This has not been looked at in detail, but the description above provides
some views as to how it may operate.

=head1 REFERENCES

RFC 145 - Bracket matching

RFC 198 - Boolean Regexes






RFC 110 (v6) counting matches

2000-09-20 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

counting matches

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 20 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 110
  Version: 6
  Status: Frozen

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?  s///g 
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g but this is wasteful, or by putting some 
counting loop round a m//g.  But this all seems rather messy. 

TomC (and a couple of others) have said that it can also be done as :
$count = () = $string =~ /pattern/g;

However many people do not like this construct, here are a couple of quotes:

jhi: Which I find cute as a demonstration of the Perl's context concept,
but ugly as hell from usability viewpoint.  

Bart Lateur: '()=' is not perfect. It is also butt ugly. It is a "dirty hack".

This construct is also likely to be inefficient as perl will have to
build up a list of all the matches, store them somewhere, count them, then
throw them away.
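
For reference, the three existing Perl 5 ways of counting mentioned above can
be written as follows.  The sample string and variable names are illustrative
only.

```perl
use strict;
use warnings;

my $string = 'one two three four';

# 1. List assignment evaluated in scalar context yields the number of matches.
my $count_list = () = $string =~ /\w+/g;

# 2. An explicit loop round a scalar-context m//g.
my $count_loop = 0;
$count_loop++ while $string =~ /\w+/g;

# 3. s///g returns the number of substitutions; done on a copy so that
#    $string is left alone.
my $count_subst = (my $copy = $string) =~ s/\w+/$&/g;

print "$count_list $count_loop $count_subst\n";
```

All three give the same answer; the proposed /t flag is meant to replace them
with something more direct.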

Therefore I would like a way of counting matches.

=head2 Proposal

m//gt (or m//t see below) would be defined to do the match, and return the
count of matches, this leaves all existing uses consistent and unaffected.
/t is suggested for "counT", as /c is already taken.

Relationship of m//t and m//g - there are three possibilities, my original:

m//gt, where /t adds counting to a group match (/t without /g would just
return 0 or 1).  However \G loses its meaning.

The Alternative By Uri :

m//t and m//g are mutually exclusive and m//gt should be regarded as an error.

Hugo:

 I like this too. I'd suggest /t should mean a) return a scalar of
 the number of matches and b) don't set any special variables. Then
 /t without /g would return 0 or 1, but be faster since no extra
 information need be captured (except internally for (.)\1 type
 matching - compile time checks could determine if these are needed,
 though (?{..}) and (??{..}) patterns would require disabling of
 that optimisation). /tg would give a scalar count of the total
 number of matches. \G would retain its meaning.

I think Hugo's wording about the relationship makes the best sense, and
this is the suggested way forward.

=head1 CHANGES

RFC110 V1 - Original posting to perl6-language

RFC110 V2 - Reposted to perl6-language-regex

RFC110 V3 - Added Uri's alternative m//t

RFC110 V4 - Added notes about $count = () = $string =~ /pattern/g

RFC110 V5 - Added Hugo's wording about /g and /t relationship, suggested this
is the way forward.

RFC110 V6 - Frozen

=head1 IMPLEMENTATION

Hugo:
 Implementation should be fairly straightforward,
 though ensuring that optimisations occurred precisely when they
 are safe would probably involve a few bug-chasing cycles.


=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...






RFC 93 (v3) Regex: Support for incremental pattern matching

2000-09-18 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Regex: Support for incremental pattern matching

=head1 VERSION

  Maintainer: Damian Conway [EMAIL PROTECTED]
  Date: 11 Aug 2000
  Last Modified: 18 Sep 2000
  Number: 93
  Version: 3
  Mailing List: [EMAIL PROTECTED]
  Status: Frozen

=head1 ABSTRACT

This RFC proposes that, in addition to strings, subroutine references may be
bound (with =~ or !~ or implicitly) to a regular expression.

=head1 DESCRIPTION

It is proposed that the Perl 6 regular expression engine be extended to
allow a regex to match against an incremental character source, rather than
only against a fixed string.

Specifically, it is proposed that a subroutine reference could be bound
to a regular expression:

    sub {...} =~ /pattern/;

As the regular expression is matched, it would make calls to the subroutine
to request additional characters to match, or (after it has matched) to 
return any unused characters.

When the regex engine requires additional characters to match, the
subroutine would be called with a single argument, and would be expected
to return a character string containing the extra characters. The single
argument would specify how many characters should be returned (typically
this would be 1, unless internal analysis by the regex engine can deduce
that more than one character will be required). Returning fewer than the
requested number of characters would typically indicate a premature
end-of-string and would probably trigger backtracking and/or failure to
match.

When the match is finished, the subroutine would be called one final time,
and passed two arguments: a string containing the "unused" characters (what
would be $' for a fixed string), and a flag set to 1. The subroutine
could use this call to push-back (or cache) unused data. In the case of
a failure to match (or success of the !~ operator), every character requested
during the match would be sent back.

A typical structure for a subroutine against which a regex was matched
would therefore be:

    sub s {
        if ($_[1]) {    # "putback unused data" request
            recache($_[0]);
        }
        else {          # "send more data" request
            return get_chars(max=>$_[0])
        }
    }
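
The two-call protocol can be tried out in today's Perl 5 with an ordinary
closure and a naive driver loop.  This is only a sketch under assumptions:
make_source() and match_source() are hypothetical names, and the driver simply
re-runs the pattern after each extra character instead of being integrated
with the regex engine.

```perl
use strict;
use warnings;

# Hypothetical character source following the RFC's two-call protocol:
#   $src->($n)        returns up to $n more characters
#   $src->($str, 1)   puts the unused characters back
sub make_source {
    my ($data) = @_;
    return sub {
        my ($arg, $putback) = @_;
        if ($putback) {
            $data = $arg . $data;
            return;
        }
        return substr($data, 0, $arg, '');
    };
}

# Naive driver: keep requesting characters until the pattern matches
# or the source runs dry, then return the unused tail to the source.
sub match_source {
    my ($src, $re) = @_;
    my $seen = '';
    while (1) {
        if ($seen =~ $re) {
            my ($matched, $rest) = ($&, $');
            $src->($rest, 1);
            return $matched;
        }
        my $more = $src->(1);
        last if $more eq '';
        $seen .= $more;
    }
    $src->($seen, 1);   # failure: send everything back
    return;
}

my $src  = make_source('let x = 42;');
my $word = match_source($src, qr/^\w+\s/);
my $rest = $src->(100);
print "matched '$word', source still holds '$rest'\n";
```

A real implementation would let the engine decide how many characters to
request and when a greedy pattern can no longer be extended.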


=head2 Examples

The most obvious example would be matching against an input stream:

    sub { $_[1] ? $fh->pushback($_[0]) : $fh->getn($_[0]) } =~ /pat/;

which could also be written:

    ^1 ? $fh->pushback(^0) : $fh->getn(^0)  =~ /pat/;

Of course, it would often be useful to have a subroutine that returns a 
closure on a particular filehandle:

    sub fhmatch { ^1 ? $_[0]->pushback(^0) : $_[0]->getn(^0) }

    fhmatch($fh) =~ /pat/
    fhmatch(\*STDIN) =~ /pat/
    # etc.

In fact, this might be so commonly useful that matching against a
file handle should be made to work directly. That is:

    $fh =~ /pat/
    \*STDIN =~ /pat/

One could then do interactive lexing cleanly:

    until (eof $fh) {
        switch ($fh) {
            /^\s*/; # skip leading whitespace
            case /^(lexeme1)/   { push @tokens, $1=>LEX1 }
            case /^(lexeme2)/   { interact_somehow }
            case /^(lexeme3)/   { push @tokens, $1=>LEX3 }
            # etc.
        }
    }

Note the use of the proposed PAIR data structure to store tokens
in the above example.

Because the character source is a subroutine, one could also match against 
data coming out of a socket:

    my $cache = "";

    sub matching_socks {
        if ($_[1]) { $cache .= $_[0]; return }  # putback
        if (length($cache) < $_[0]) {           # not enough cached
            my $extra;                          # so get some more
            recv(SOCKET, $extra, $_[0]-length($cache));
            $cache .= $extra;
        }
        return substr($cache,0,$_[0],"");
    }

    switch (\&matching_socks) {
        case /pat1/ { action1() }
        case /pat2/ { action1() }
        case /pat3/ { action1() }
        #etc.
    }


or any other source:

    sub mega_ape {
        return join "", map {['a'..'z',(' ')x6]->[rand 32]} (1..$_[0])
            unless $_[1]
    }

    \&mega_ape =~ /Now is the Winter of our discontent.../i;

    print "Art after ", length($`), "chars\n";


=head1 IMPLEMENTATION

Dammit, Jim, I'm a doctor, not a magician!

Probably needs to be integrated with IO disciplines too.


=head1 REFERENCES

RFC 22: Builtin switch statement 

RFC 23: Higher order functions 

RFC 84: Replace => (stringifying comma) with => (pair constructor) 




RFC 170 (v2) Generalize =~ to a special apply-to assignment operator

2000-09-16 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Generalize =~ to a special "apply-to" assignment operator

=head1 VERSION

   Maintainer: Nathan Wiger [EMAIL PROTECTED]
   Date: 29 Aug 2000
   Last-Modified: 16 Sep 2000
   Mailing List: [EMAIL PROTECTED]
   Number: 170
   Version: 2
   Status: Frozen

=head1 ABSTRACT

Currently, C<=~> is only available for use in specific builtin pattern
matches. This is too bad, because it's really a neat operator.

This RFC proposes a simple way to make it more general-purpose.

=head1 NOTES ON FREEZE

Probably the only way this could be implemented is if B<RFC 164> was
also implemented, freeing C<=~> for use as a more general-purpose
operator. Indeed, a main point of this RFC is to provide a means for a
backwards-compatible syntax for regex's.

Unlike B<RFC 164>, most people I heard from liked this. Some criticized
it as being too sugary, since this:

   $string =~ quotemeta;    # $string = quotemeta $string;

Is not as clear as the original. However, there is fairly similar
precedent in:

   $x += 5;                 # $x = $x + 5;

And to me it seems to be quite clear that C<quotemeta> is acting on
C<$string> in the above example, even when you take into account C<=~>'s
current binding meaning (perhaps more so, in fact).

=head1 DESCRIPTION

First off, this assumes RFC 164. Second, it requires you drop any
knowledge of how C<=~> currently works. Finally, it runs directly
counter to RFC 139, which proposes another application for C<=~>.

This RFC proposes a simple use for C<=~> as a generic "apply-to"
operator. When used, any values on the left side of the expression are
implicitly passed to the end of the right-side expression. What this
means is that an expression such as this:

   $value = dostuff($arg1, $arg2, $value);

Could now be rewritten as:

   $value =~ dostuff($arg1, $arg2);

And C<$value> would be implicitly transferred over to the right side as
the last argument. It's simple, but it makes what is being operated on
quite obvious.
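
The intended semantics can be mimicked in Perl 5 with a small helper.  This is
a sketch only: apply_to() is a hypothetical name, and dostuff() here is an
invented stand-in for whatever function is being applied.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the proposed  $value =~ func(@args):
# call $func with @args followed by the current value, then store the
# result back through the reference.
sub apply_to {
    my ($ref, $func, @args) = @_;
    $$ref = $func->(@args, $$ref);
    return $$ref;
}

# Invented example function: the value to act on arrives as the last argument.
sub dostuff {
    my ($prefix, $suffix, $value) = @_;
    return "$prefix$value$suffix";
}

my $value = 'core';
apply_to(\$value, \&dostuff, '<', '>');
print "$value\n";
```

The proposal would make the reference-taking and the trailing argument
implicit, which is the whole of its appeal.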

This enables us to rewrite the following constructs:

   $string = quotemeta($string);
   @array = reverse @array;
   ($name) = split /\s+/, $name;
   @vals = sort { $a <=> $b } @vals;
   @file = grep !/^#/, @file;

   $string = s/\s+/SPACE/, $string;    # RFC 164
   $matches = m/\w+/, $string;         # RFC 164
   @strs = s/foo/bar/gi, @strs;        # RFC 164

As the shorter and more readable:

   $string =~ quotemeta;
   @array =~ reverse;
   ($name) =~ split /\s+/;
   @vals =~ sort { $a <=> $b };
   @file =~ grep !/^#/;

   $string =~ s/\s+/SPACE/;    # looks familiar
   $string =~ m/\w+/;          # this too [1]
   @strs =~ s/foo/bar/gi;      # cool extension

It's a simple solution, true, but it has a good amount of flexibility
and brevity. It could also be the case that multiple values could be
called and returned, so that:

   ($name, $email) = special_parsing($name, $email);

Becomes:

   ($name, $email) =~ special_parsing;

Again, it's simple, but seems to have useful applications. One nice
thing is that in many (most?) situations it appears to be working very
much like C<=~> currently works with regex's (from a user perspective).

Finally, note this can only work with functions and function-like
constructs. An attempt to do something like this:

   $x =~ 5 +;

Should I<definitely> remain a syntax error.

=head2 Possible addition of C<~=> operator

A symmetric operator, C<~=>, was proposed informally on the list which
would left-pad the argument list:

   $stuff =~ dojunk(@args);   # $stuff = dojunk(@args, $stuff);
   $stuff ~= dojunk(@args);   # $stuff = dojunk($stuff, @args);

but the consensus that I received was about 50/50: half liked it, half
thought it was too confusing. Even though we don't have a C<bitnot=>
operator currently, creating something that looks like one but acts
completely differently is probably not a good idea.

If something like this was included, it would probably be best to go
with another operator, like C<=^>:

   $stuff =~ dojunk(@args);   # $stuff = dojunk(@args, $stuff);
   $stuff =^ dojunk(@args);   # $stuff = dojunk($stuff, @args);

But that's awfully close to C<^=>. Hmmm. Regardless, this operator is
unlikely to be used nearly as widely since Perl functions usually take
the argument to act on in the last position.

=head1 IMPLEMENTATION

Simplistic (hopefully). Should just involve stacking values onto a
function's argument list.

=head1 MIGRATION

This introduces new functionality, which allows backwards compatibility
for regular expressions. As such, it should require no special
translation of code. This RFC assumes RFC 164 will be adopted (which it
may not be) for changes to regular expressions.

=head1 NOTES

[1] That m// one doesn't quite work right, but that's a special case
that I would suggest should be caught by some other part of the grammar
to maintain backwards compatibility (like bare //).

=head1 REFERENCES

RFC 164: Replace =~, !~, m//, s///, 

RFC 166 (v2) Alternative lists and quoting of things

2000-09-15 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

Alternative lists and quoting of things

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 15 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 166
  Version: 2
  Status: Developing

=head1 ABSTRACT

Expand Alternate Lists from Arrays and Quote the contents of things 
inside regexes.

=head1 DESCRIPTION

These are a couple of constructs to make it easy to build up regexes
from other things.

=head2 Alternative Lists from arrays

The basic idea is to expand an array as a list of alternatives.  There
are two possible syntaxes: (?@foo) and just plain @foo.  Plain @foo might (just)
have existing uses, therefore I prefer the (?@foo) syntax.

(?@foo) is just syntactic sugar for (?:(??{ join('|',@foo) })), a bracketed
list of alternatives.

=head2 Quoting the contents of things

If a regex uses $foo or @bar there are problems if the contents of
the variables contain special characters.  What is needed is a way
of \Quoting the content of scalars $foo or arrays (?@foo).

Suggested syntax:

(?Q$foo) Quotes the contents of the scalar $foo - equivalent to
(??{ quotemeta $foo }).

(?Q@foo) Quotes each item in a list (as above) this is equivalent to
(?:(??{ join ('|', map quotemeta, @foo)})).

In this syntax the Q is used as it represents a more intelligent \Quot\E.

=head2 Comments

Hugo:
 (?@foo) and (?Q@foo) are both things I've wanted before now. I'm
 not sure if this is the right syntax, particularly if RFC 112 is
 adopted: it would be confusing to have (?@foo) to have so
 different a meaning from (?$foo=...), and even more so if the
 latter is ever extended to allow (?@foo=...).
 I see no reason that implementation should cause any problems
 since this is purely a regexp-compile time issue.

Me: I can't see any reasonable meaning for (?@foo=...), so this seems an appropriate
syntax, but I am open to other suggestions.

=head1 CHANGES

RFC 166, v1 was entitled "Additions to regexes".

V1 of this RFC had three ideas, one has been dropped, the other is now part
of RFC 198.

V2 Expands the list expansion and quoting with quoting of scalars and 
implementation issues.


=head1 MIGRATION

As (?@foo) and (?Q...), these are additions without any compatibility issues.

The option of just @foo for list expansion might represent a small problem if
people already use the construct.

=head1 IMPLEMENTATION

Both of these changes are regex compile-time issues.

Generating lists from arrays almost works by localising $" as '|' for the 
regex and just using @foo.

MJD has demonstrated implementing (?@foo) as (?\@foo) by means of an overload
of regexes; this slight change was necessary because of the expansion of
@foo - see below.

Both of these changes are currently affected by the expansion of variables in
the regex before the regex compiler gets to work on the regex.  This problem also
affects several other RFCs.  For these (and other RFCs), the expansion of variables
in regexes needs to be driven from within the regex compiler so
that the regex can expand as and where appropriate.  Changing this should not
affect any existing behaviour.

=head1 REFERENCES

RFC 198: Boolean Regexes





RFC 110 (v5) counting matches

2000-09-12 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

counting matches

=head1 VERSION

Maintainer: Richard Proctor [EMAIL PROTECTED]
Date: 16 Aug 2000
Last Modified: 12 Sep 2000
Mailing List: [EMAIL PROTECTED]
Number: 110
Version: 5
Status: Developing

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?  s///g 
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g but this is wasteful, or by putting some 
counting loop round a m//g.  But this all seems rather messy. 

TomC (and a couple of others) have said that it can also be done as :
$count = () = $string =~ /pattern/g;

However many people do not like this construct, here are a couple of quotes:

jhi: Which I find cute as a demonstration of the Perl's context concept,
but ugly as hell from usability viewpoint.  

Bart Lateur: '()=' is not perfect. It is also butt ugly. It is a "dirty hack".

This construct is also likely to be inefficient as perl will have to
build up a list of all the matches, store them somewhere, count them, then
throw them away.

Therefore I would like a way of counting matches.

=head2 Proposal

m//gt (or m//t see below) would be defined to do the match, and return the
count of matches, this leaves all existing uses consistent and unaffected.
/t is suggested for "counT", as /c is already taken.

Relationship of m//t and m//g - there are three possibilities, my original:

m//gt, where /t adds counting to a group match (/t without /g would just
return 0 or 1).  However \G loses its meaning.

The Alternative By Uri :

m//t and m//g are mutually exclusive and m//gt should be regarded as an error.

Hugo:

 I like this too. I'd suggest /t should mean a) return a scalar of
 the number of matches and b) don't set any special variables. Then
 /t without /g would return 0 or 1, but be faster since no extra
 information need be captured (except internally for (.)\1 type
 matching - compile time checks could determine if these are needed,
 though (?{..}) and (??{..}) patterns would require disabling of
 that optimisation). /tg would give a scalar count of the total
 number of matches. \G would retain its meaning.

I think Hugo's wording about the relationship makes the best sense, and
this is the suggested way forward.

=head1 CHANGES

RFC110 V1 - Original posting to perl6-language

RFC110 V2 - Reposted to perl6-language-regex

RFC110 V3 - Added Uri's alternative m//t

RFC110 V4 - Added notes about $count = () = $string =~ /pattern/g

RFC110 V5 - Added Hugo's wording about /g and /t relationship, suggested this
is the way forward.

Unless any significant discussion takes place this RFC will move to frozen
within a week.

=head1 IMPLEMENTATION

Hugo:
 Implementation should be fairly straightforward,
 though ensuring that optimisations occurred precisely when they
 are safe would probably involve a few bug-chasing cycles.


=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...






RFC 197 (v1) Numberic Value Ranges In Regular Expressions

2000-09-06 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Numberic Value Ranges In Regular Expressions

=head1 VERSION

  Maintainer: David Nicol [EMAIL PROTECTED]
  Date: 5 September 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 197
  Status: Developing

=head1 ABSTRACT

Round and square brackets mated around two optional comma-separated numbers
match iff a gobbled number is within the described range.

=head1 DESCRIPTION

=head2 the syntax of the numeric range regex element

Given a passage of regex text matching

    ($B1,$N1,$N2,$B2) = /(\[|\()(\-?\d*\.?\d*),(\-?\d*\.?\d*)(\]|\))/
    and ($N1 <= $N2 or $N1 eq '' or $N2 eq '')

we've got something we hereinafter call a "range."

=head2 what the range matches

A range matches, in the target string, a passage C<(\-?\d*\.?\d*)>,
also known as a
"number", if and only if the number is within the range, in the normal algebraic sense.

=head2 "within the range"

A square bracket means that end of the range may include the range-specifying
number, and a round parenthesis means that end of the range includes numbers of value 
up to (or down to) the number but not equal to it.
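
The membership rule can be written out as a plain Perl 5 function.  This is a
sketch under assumptions: in_range() is a hypothetical helper that takes the
range as a string, and an empty bound is treated as unbounded, as described in
the next section.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a range such as "[37,)" or "(37.3,200)" and
# report whether $number falls inside it.  An empty bound means unbounded.
sub in_range {
    my ($range, $number) = @_;
    my ($open, $low, $high, $close) =
        $range =~ /\A(\[|\()(-?\d*\.?\d*),(-?\d*\.?\d*)(\]|\))\z/
        or die "Not a range: $range\n";

    if ($low ne '') {
        # '[' admits the bound itself, '(' excludes it.
        return 0 if $open eq '[' ? $number < $low : $number <= $low;
    }
    if ($high ne '') {
        # ']' admits the bound itself, ')' excludes it.
        return 0 if $close eq ']' ? $number > $high : $number >= $high;
    }
    return 1;
}

for my $case (['(37.3,200)', 100], ['(37.3,200)', 200], ['[37,)', 37], ['[-35,9)', 9]) {
    my ($range, $n) = @$case;
    printf "%-12s %5s  %s\n", $range, $n, (in_range($range, $n) ? 'inside' : 'outside');
}
```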

=head2 infinity

In the event that one or the other of the range-specifying numbers
is the empty string, that end of the range is unbounded.  In the further event
that we have defined infinity and negative infinity on our numbers, the
square/round distinction will come into play.


=head1 COMPATIBILITY

To disambiguate ranges from character sets indluding
digits, commas, and parentheses, either put a backslash on the right
parentheses, or the comma, or
arrange things so the left hand side of the comma is greater than the
right hand side, that way this special case will not apply:

   /(37.3,200)/;   # matches any number x, 37.3 < x < 200
   /([37,))/;      # matches and saves any number >= 37.
   /(37\,200)/;    # matches and saves the literal text '37,200'
   /[-35,9)]/;     # matches any number x, -35 <= x < 9; followed by a ]
   /[3-5,9)]/;     # matches a string containing any of 3,4,5,,,9 or )

=head1 IMPLEMENTATION

When applying regular expressions to numeric
data, ranges may optimize away all of the digit lookahead we must currently
indulge in to implement them in perl5.
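
To make the perl5 cost concrete, here is a small sketch of what has to be written today for the integer range [37,200). The pattern and the helper name C<by_compare> are illustrative assumptions, not anything taken from an existing module.

```perl
use strict;
use warnings;

# Integers in [37,200), spelled out digit by digit: 37-39, 40-99, 100-199.
my $by_digits = qr/^(?:3[7-9]|[4-9]\d|1\d\d)$/;

# The same test written as "match a number, then compare it in code".
sub by_compare {
    my ($s) = @_;
    return $s =~ /^(\d+)$/ && $1 >= 37 && $1 < 200;
}

# Check that the two approaches agree on every integer from 0 to 250.
my @disagree = grep {
    ($_ =~ $by_digits ? 1 : 0) != (by_compare($_) ? 1 : 0)
} 0 .. 250;

print @disagree
    ? "mismatch: @disagree\n"
    : "both approaches agree on 0..250\n";
```

The digit-by-digit pattern has to be rederived for every new pair of bounds, which is the work a range element in the regex could take over.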

If we have infinity defined, we'll have to recognize it in strings.

=head1 BUT WAIT THERE'S MORE

It is possible that the syntax described
in this document may help slice multidimensional
containers. (RFC 191)

=head1 REFERENCES

high school algebra





RFC 110 (v4) counting matches

2000-09-04 Thread Perl6 RFC Librarian

=head1  TITLE

counting matches

=head1 VERSION

Maintainer: Richard Proctor [EMAIL PROTECTED]
Date: 16 Aug 2000
Last Modified: 2 Sep 2000
Version: 4
Mailing List: [EMAIL PROTECTED]
Number: 110
Status: Developing

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?  s///g
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g, but this is wasteful, or by putting some
counting loop round a m//g.  But this all seems rather messy.

TomC (and a couple of others) have said that it can also be done as:

   $count = () = $string =~ /pattern/g;

However, many people do not like this construct; here are a couple of quotes:

jhi: Which I find cute as a demonstration of the Perl's context concept,
but ugly as hell from usability viewpoint.  

Bart Lateur: '()=' is not perfect. It is also butt ugly. It is a "dirty hack".

This construct is also likely to be inefficient as perl will have to
build up a list of all the matches, store them somewhere, count them, then
throw them away.
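
For comparison, the three existing ways of counting mentioned above look roughly like this in current Perl. The sample string and pattern are made up for the illustration.

```perl
use strict;
use warnings;

my $string  = 'the cat sat on the mat';
my $pattern = qr/\b\w*at\b/;

# 1. The "() =" idiom: a list-context match assigned to an empty list,
#    with that assignment evaluated in scalar context to count elements.
my $count_list = () = $string =~ /$pattern/g;

# 2. An explicit counting loop around a scalar-context m//g.
my $count_loop = 0;
$count_loop++ while $string =~ /$pattern/g;

# 3. s///g returns the number of substitutions made; work on a copy
#    and replace every match with itself.
my $count_subst = (my $copy = $string) =~ s/$pattern/$&/g;

print "$count_list $count_loop $count_subst\n";
```

All three give the same count, but each does extra work (building a list, looping in Perl code, or rewriting a copy of the string) just to throw the by-product away.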

Therefore I would like a way of counting matches.

=head2 Proposal

m//gt (or m//t, see below) would be defined to do the match and return the
count of matches. This leaves all existing uses consistent and unaffected.
/t is suggested for "counT", as /c is already taken.

Relationship of m//t and m//g - there are two possibilities, my original:

m//gt, where /t adds counting to a group match (/t without /g would just
return 0 or 1).  However \G loses its meaning.

The alternative by Uri:

m//t and m//g are mutually exclusive and m//gt should be regarded as an error.

I have no preference.

=head1 CHANGES

RFC110 V1 - Original posting to perl6-language

RFC110 V2 - Reposted to perl6-language-regex

RFC110 V3 - Added Uri's alternative m//t

RFC110 V4 - Added notes about $count = () = $string =~ /pattern/g

=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...





RFC 170 (v1) Generalize =~ to a special-purpose assignment operator

2000-08-29 Thread Perl6 RFC Librarian

=head1 TITLE

Generalize =~ to a special-purpose assignment operator

=head1 VERSION

   Maintainer: Nathan Wiger [EMAIL PROTECTED]
   Date: 29 Aug 2000
   Mailing List: [EMAIL PROTECTED]
   Version: 1
   Number: 170
   Status: Developing
   Requires: RFC 164

=head1 ABSTRACT

Currently, C<=~> is only available for use in specific builtin pattern
matches. This is too bad, because it's really a neat operator.

This RFC proposes a simple way to make it more general-purpose.

=head1 DESCRIPTION

First off, this assumes RFC 164. Second, it requires you drop any
knowledge of how C<=~> currently works. Finally, it runs directly
counter to RFC 139, which proposes another application for C<=~>.

This RFC proposes a simple use for C<=~>: as a last-argument rvalue
duplicator. What this means is that an expression such as this:

   $value = dostuff($arg1, $arg2, $value);

Could now be rewritten as:

   $value =~ dostuff($arg1, $arg2);

And C<$value> would be implicitly transferred over to the right side as
the last argument. It's simple, but it makes what is being operated on
very obvious.
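
The intended semantics can be imitated in Perl 5 with an ordinary function. This is only a sketch of the idea: C<apply_last> is a hypothetical helper name, and the body of C<dostuff> is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper emulating "$value =~ dostuff($arg1, $arg2)":
# call $code with the given arguments plus the current value as the
# last argument, then store the result back into the variable.
sub apply_last {
    my ($ref, $code, @args) = @_;
    $$ref = $code->(@args, $$ref);
    return $$ref;
}

# Stand-in for the RFC's dostuff(); its behaviour here is made up.
sub dostuff {
    my ($prefix, $suffix, $value) = @_;
    return "$prefix$value$suffix";
}

my $value = 'perl';
apply_last(\$value, \&dostuff, '<', '>');
print "$value\n";
```

The proposal would let the operator do what the reference-passing does here, so the variable is named once and the call reads left to right.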

This enables us to rewrite the following constructs:

   ($name) = split /\s+/, $name;
   $string = quotemeta($string);
   @array = reverse @array;
   @vals = sort { $a <=> $b } @vals;

   $string = s/\s+/SPACE/, $string;    # RFC 164
   $matches = m/\w+/, $string;         # RFC 164
   @strs = s/foo/bar/gi, @strs;        # RFC 164

As the shorter and more readable:

   ($name) =~ split /\s+/;
   $string =~ quotemeta;
   @array =~ reverse;
   @vals =~ sort { $a <=> $b };

   $string =~ s/\s+/SPACE/;    # looks familiar
   $string =~ m/\w+/;          # this too [1]
   @strs =~ s/foo/bar/gi;      # cool extension

It's a simple solution, true, but it has a good amount of flexibility
and brevity. It could also be the case that multiple values could be
called and returned, so that:

   ($name, $email) = special_parsing($name, $email);

Becomes:

   ($name, $email) =~ special_parsing;

Again, it's simple, but seems to have useful applications.

=head1 IMPLEMENTATION

Simplistic (hopefully).

=head1 MIGRATION

This introduces new functionality, which allows backwards compatibility
for regular expressions. As such, it should require no special
translation of code. This RFC assumes RFC 164 will be adopted (which it
may not be) for changes to regular expressions.

True void contexts may also render some parts of this moot, in which
case coming up with a more advanced use for C<=~> may be desirable.

=head1 NOTES

[1] That m// one doesn't quite work right, but that's a special case
that I would suggest should be caught by some other part of the grammar
to maintain backwards compatibility (like bare //).

=head1 REFERENCES

RFC 164: Replace =~, !~, m//, and s/// with match() and subst()

RFC 139: Allow Calling Any Function With A Syntax Like s///




RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Perl6 RFC Librarian

=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

   Maintainer: Nathan Wiger [EMAIL PROTECTED]
   Date: 27 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix some stuff with the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving C<$string> alone. (Unless they are called in a void
context, in which case they act on and modify C<$_> consistent with
current behavior).

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                    # all defaults (pattern is /\w+/?)
   match /pat/;              # match $_
   match /pat/, $str;        # match $str
   match /pat/, @strs;       # match any of @strs

   subst;                    # like s///, pretty useless :-)
   subst /pat/new/;          # sub on $_
   subst /pat/new/, $str;    # sub on $str
   subst /pat/new/, @strs;   # return array of modified strings
 
These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on.
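
A rough Perl 5 approximation of the two functions might look like the following. It is a sketch under several assumptions: the pattern is passed as a qr// object rather than as /pattern/flags, the replacement is a plain string, only the first match is replaced, and the void-context behaviour on C<$_> is not emulated.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 stand-ins for the proposed builtins.
sub subst {
    my ($pattern, $new, @strings) = @_;
    @strings = ($_) unless @strings;    # default to $_
    my @result = @strings;              # copies: the inputs are left alone
    s/$pattern/$new/ for @result;       # first match only in this sketch
    return wantarray ? @result : $result[0];
}

sub match {
    my ($pattern, @strings) = @_;
    @strings = ($_) unless @strings;    # default to $_
    # Number of strings that match, so "any of @strs" is simply true.
    return scalar grep { /$pattern/ } @strings;
}

my $str = 'hello world';
my $new = subst qr/hello/i, 'X', $str;    # $str itself is unchanged
my @new = subst qr/hello/i, 'X', 'Hello a', 'hello b';
print "$new\n@new\n" if match qr/\w+/, $str;
```

Flags such as /i travel with the qr// object here; how /g and /e would be passed is left open, since that depends on the syntax the RFC finally settles on.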

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                             Perl 6
   --------------------------------   -----------------------------------
   if ( /\w+/ ) { }                   if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );     die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;               ($res) = match #^(.*)$#g;

   next if /\s+/ || /\w+/;            next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||         next if match /\s+/, $str or
           ($str =~ /\w+/)                    match /\w+/, $str;
   next unless $str =~ /^N/;          next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;             $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/func/ge;      $str = subst /\d+/func/ge;
   s/\w+/this/;                       subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                   @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                   print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier, is more easily extensible:

   callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise. However, it requires a little too
much typing. See below.

=head2 Concerns

This should be carefully considered. It's good because it gets rid of
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent. However, it also causes everyone to
relearn how to match and substitute patterns. This must be a careful,
conscious decision, lest we really screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, hence the reason I decided to RFC it. So it's an
option, it just has to be powerful enough for people to see the "big
win".

Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:

   $new = subst /old/new/i, $old;   ==   $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date. Like I said, this
is not a simple decision. But if there are obvious increases in power, I
think people will appreciate the change, not dread it. At the very least
it makes Perl much more consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C.

RFC 138: Eliminate =~ operator. 

RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.




RFC 165 (v1) Allow Variables in tr///

2000-08-27 Thread Perl6 RFC Librarian

=head1  TITLE

Allow Variables in tr///

=head1 VERSION

Maintainer: Richard Proctor [EMAIL PROTECTED]
Date: 27 Aug 2000
Mailing List: [EMAIL PROTECTED]
Version: 1
Number: 165

=head1 ABSTRACT

Allow variables in a tr///.  At present the only way to do a tr/$foo/$bar/
is to wrap it up in an eval.  I don't like using evals for this sort of thing.

=head1 DESCRIPTION

Suggested syntax: tr/$foo/$bar/e

With a /e, tr will expand both the LHS and RHS of the translate function.
Either or both could be variables. I am suggesting /e as it is sort of like
/e for s///e.
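
For reference, the eval workaround that the proposed /e would replace looks something like this in Perl 5. The C<translate> wrapper is a hypothetical name, and the escaping is deliberately minimal (it only protects the delimiter).

```perl
use strict;
use warnings;

# tr/// builds its translation table at compile time, so variables in
# it are not interpolated. The usual workaround is a string eval.
sub translate {
    my ($string, $from, $to) = @_;

    # Escape the delimiter so it cannot end the tr/// lists early.
    # Range hyphens such as "a-y" are deliberately left alone.
    s{/}{\\/}g for $from, $to;

    # The trailing "1" keeps the eval true even when tr changes nothing.
    eval "\$string =~ tr/$from/$to/; 1" or die "bad tr spec: $@";
    return $string;
}

print translate('hello world', 'a-y', 'b-z'), "\n";
```

Every call compiles a fresh piece of code, and any character in the variables that the escaping misses becomes part of that code, which is the main reason to want this built into tr itself.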

=head1 IMPLEMENTATION

No idea, but should be straightforward.

=head1 REFERENCES

None yet.