Re: is \1 vs $1 a necessary distinction?

2000-09-28 Thread Bart Lateur

On Wed, 27 Sep 2000 10:34:48 -0500, Jonathan Scott Duff wrote:

>If $1 could be made to work properly on the LHS of s///, I'd vote for
>that being The Way.

I disagree, because \1 is different from a variable $foo in at least two
ways:

 * $foo is compiled into /$foo/ before anything is matched. \1 is a
repetition of what was just matched; this is dynamic interpolation
instead of static.

 * If $foo contains metacharacters, they are treated as metacharacters.
For example, if $foo is "a.b", then /$foo/ can match "axb". /\1/, OTOH,
can only match the LITERAL string that $1 captured. With $foo='a.b',

/($foo)!$foo/

and

/($foo)!\1/

will not match the same set of things.

"\1" is more nearly equivalent to "\Q$1\E". Therefore, I don't want $1 on
the LHS to be the standard syntax.
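
A minimal sketch of the difference described above, assuming Perl 5
semantics. The test strings are made up for illustration:

```perl
use strict;
use warnings;

my $foo = 'a.b';

# $foo is interpolated before compilation, so its "." is a metacharacter
# both inside and outside the capture group.
my $interp  = ('axb!ayb' =~ /($foo)!$foo/) ? 1 : 0;

# \1 must repeat the exact text the group captured ("axb"), so "ayb" fails.
my $backref = ('axb!ayb' =~ /($foo)!\1/) ? 1 : 0;

# An exact repetition of the captured text satisfies \1.
my $repeat  = ('axb!axb' =~ /($foo)!\1/) ? 1 : 0;

print "interpolated: $interp, backref: $backref, repeat: $repeat\n";
```

The first pattern accepts "axb!ayb" because each copy of $foo matches
independently; the second rejects it because \1 is tied to what the group
actually captured.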

-- 
Bart.



Re: is \1 vs $1 a necessary distinction?

2000-09-28 Thread Piers Cawley

Dave Storrs <[EMAIL PROTECTED]> writes:

> On 27 Sep 2000, Piers Cawley wrote:
> 
> > >   Do we *want* to maintain \1?  Why have two notations to do the
> > 
> > I'm kind of curious about what happens when you want to do, say:
> > 
> >   if (m/(\S+)/) {
> >       $reg = qr{<(em|i|b)>($1)};
> >   }
> > 
> > where the $1 in the regex quote is referring to $1 from the previous
> > regex match.
> 
>   Well, how about this:
> 
>   $reg = qr{<(em|i|b)>(${P1})};
> NOTE:                    ^
> 
>   If you assume that $1 and ${1} are equivalent (which makes it
> possible to have as many backrefs as you want), then you could say that,
> if the first character after the { is a P, it means "in the previous regex
> match."

Oh good ghod. That is *vile*.

-- 
Piers




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Richard Proctor

On Wed 27 Sep, Dave Storrs wrote:
> 
> 
> On Wed, 27 Sep 2000, Richard Proctor wrote:
> > > Both \1 and $1 refer to what is matched by the first set of parens in a
> > > regex.  AFAIK, the only difference between these two notations is that
> > > \1 is used within the regex itself and $1 is used outside of the
> > > regex.  Is there any reason not to standardize these down to one
> > > notation (i.e., eliminate one or the other)?
> > 
> > I think this is fixable.  
> 
>   The way you phrase that makes it sound that other people perceive
> this as a problem as well, which gives me all sorts of warm fuzzies. :>
> 
> > The only real need for this at the moment is to overcome limitations in
> > the order of expansion of regexes.  RFCs 112, 166, 276... all depend on
> > fixing this.  
> 
>   Ok, here's another question.  How the _HELL_ does everyone else on
> this bloody list keep track of every detail in every frigging RFC?  Some
> random comment comes up, and someone will go, "Oh, the third paragraph of
> the second section in RFC 0x97A already mentioned this as a parenthetical
> aside, despite the fact that its title and primary topic had no relation
> to the issue."  I still have (mumble-mumble) RFCs that I haven't even had
> time to *read*, let alone memorize every detail of!

In this context I was the author of, guess what, 112, 166 and 276 (though
I admit to having to look up the number of the last one).

> 
>   Grr*grumble, grumble, moan, whinge*
> 
>   Ok, back to rationality now.
> 
> > If the regex compiler gets in before the expansion of the variables to
> > make these work, it could handle $1 in all cases; \1 can be retained for
> > compatibility.
> 
>   Do we *want* to maintain \1?  Why have two notations to do the
> same thing when one is clearly superior?  (\1 can only go up to \9 while
> the other could theoretically go to ${...}.)  Perl6 is breaking
> backwards compatibility and eliminating all deprecated features...let's
> get rid of \n as backreference notation.
> 

The principal issue would be what to do about uses of $1 on the LHS that
rely on its current meaning.  That meaning is rather good for obfuscated
code, but not terribly kind to normal programming.

Note that RFC 112 covers assignment within a regex, naming rather than
numbering the brackets one wishes to capture; it also covers named back
references.

Currently $1 is expanded by the quoting before the regex compiler gets to
play; the regex compiler sees the \1 and knows what to do.  I am reasonably
happy with \ meaning "refer back"; I am not happy with the numbers.

Richard

-- 

[EMAIL PROTECTED]




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs



On 27 Sep 2000, Piers Cawley wrote:

> > Do we *want* to maintain \1?  Why have two notations to do the
> 
> I'm kind of curious about what happens when you want to do, say:
> 
>   if (m/(\S+)/) {
>       $reg = qr{<(em|i|b)>($1)};
>   }
> 
> where the $1 in the regex quote is referring to $1 from the previous
> regex match.

Well, how about this:

  $reg = qr{<(em|i|b)>(${P1})};
NOTE:                    ^

If you assume that $1 and ${1} are equivalent (which makes it
possible to have as many backrefs as you want), then you could say that,
if the first character after the { is a P, it means "in the previous regex
match."

Dave





Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Randal L. Schwartz

> "Jonathan" == Jonathan Scott Duff <[EMAIL PROTECTED]> writes:

Jonathan> On Wed, Sep 27, 2000 at 08:15:53AM -0700, Dave Storrs wrote:
>> Both \1 and $1 refer to what is matched by the first set of parens in a
>> regex.  AFAIK, the only difference between these two notations is that \1
>> is used within the regex itself and $1 is used outside of the regex.  Is
>> there any reason not to standardize these down to one notation (i.e.,
>> eliminate one or the other)?

Jonathan> \1 can be used on the LHS of a s/// whereas $1 there probably won't do
Jonathan> what you expect.  Also, \1, \2, \3 only takes you as far as \9 ;-)

Wrong.  If you have at least 10 parens visible so far, \10 works just fine.
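
A quick check of the \10 point, as a sketch assuming Perl 5 behaviour. The
ten single-letter groups are only there to reach the tenth capture:

```perl
use strict;
use warnings;

# Ten capture groups, then \10 refers back to the tenth one ("j").
my $re = qr/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10/;

my $ten_ok  = ('abcdefghijj' =~ $re) ? 1 : 0;   # tenth group repeated
my $ten_bad = ('abcdefghija' =~ $re) ? 1 : 0;   # "a" is not group 10's text

print "ten_ok=$ten_ok ten_bad=$ten_bad\n";
```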

Jonathan> If $1 could be made to work properly on the LHS of s///, I'd vote for
Jonathan> that being The Way.

It can't ever.  It means $1 from the previous match.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Piers Cawley

Dave Storrs <[EMAIL PROTECTED]> writes:

> On Wed, 27 Sep 2000, Richard Proctor wrote:
> > > Both \1 and $1 refer to what is matched by the first set of parens in a
> > > regex.  AFAIK, the only difference between these two notations is that \1
> > > is used within the regex itself and $1 is used outside of the regex.  Is
> > > there any reason not to standardize these down to one notation (i.e.,
> > > eliminate one or the other)?
> > 
> > I think this is fixable.  
> 
>   The way you phrase that makes it sound that other people perceive
> this as a problem as well, which gives me all sorts of warm fuzzies. :>
> 
> >The only real need for this at the moment is to
> > overcome limitations in the order of expansion of regexes.  RFCs 112, 166,
> > 276... all depend on fixing this.  
>
> [...]
> 
> >If the regex compiler gets in before the
> > expansion of the variables to make these work, it could handle $1 in all cases;
> > \1 can be retained for compatibility.
> 
>   Do we *want* to maintain \1?  Why have two notations to do the
> same thing when one is clearly superior?  (\1 can only go up to \9 while
> the other could theoretically go to ${...}.)  Perl6 is breaking
> backwards compatibility and eliminating all deprecated features...let's
> get rid of \n as backreference notation.

I'm kind of curious about what happens when you want to do, say:

  if (m/(\S+)/) {
      $reg = qr{<(em|i|b)>($1)};
  }

  while (<>) {
      next unless m{$reg};
      ...
  }

where the $1 in the regex quote is referring to $1 from the previous
regex match.
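
For reference, a sketch of how current Perl 5 treats that snippet: $1 is
interpolated as pattern text at the moment qr{} is evaluated. The input
strings here are hypothetical:

```perl
use strict;
use warnings;

my $reg;
if ('hello world' =~ m/(\S+)/) {
    # $1 is "hello" here; it is interpolated as pattern text when qr{} runs.
    $reg = qr{<(em|i|b)>($1)};
}

# A later, unrelated match overwrites $1 but does not change $reg.
'other' =~ m/(\w+)/;

my @hit  = ('<b>hello</b>' =~ $reg);   # list of captures
my @miss = ('<b>other</b>' =~ $reg);   # empty list on failure

print "hit: @hit\n";
print "miss count: ", scalar(@miss), "\n";
```

So today the compiled $reg keeps the old match's text frozen into it. Any
scheme that makes $1 mean "group 1 of this regex" would need another way to
say this.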

-- 
Piers




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Uri Guttman

> "DS" == Dave Storrs <[EMAIL PROTECTED]> writes:

  DS> Both \1 and $1 refer to what is matched by the first set of parens
  DS> in a regex.  AFAIK, the only difference between these two notations
  DS> is that \1 is used within the regex itself and $1 is used outside
  DS> of the regex.  Is there any reason not to standardize these down
  DS> to one notation (i.e., eliminate one or the other)?


Because $1, having been set previously, will be interpolated INTO the new
regex. So you have to have another notation to refer to grabbed stuff
from the current regex.
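
A small sketch of that interpolation order, assuming Perl 5 semantics and
made-up test strings:

```perl
use strict;
use warnings;

# First match: $1 becomes "ab".
'ab' =~ /(\w+)/ or die 'setup match failed';

# $1 is expanded to "ab" BEFORE the next regex is compiled, so the
# pattern actually compiled is /(\w+)-ab/.
my $with_var = ('cd-ab' =~ /(\w+)-$1/) ? 1 : 0;

# \1 refers to group 1 of the regex currently matching, so it needs a
# repeat of whatever (\w+) captured in this match.
my $with_backref = ('cd-ab' =~ /(\w+)-\1/) ? 1 : 0;
my $repeat       = ('cd-cd' =~ /(\w+)-\1/) ? 1 : 0;

print "var=$with_var backref=$with_backref repeat=$repeat\n";
```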

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  ---  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  --  http://www.northernlight.com



Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs



On Wed, 27 Sep 2000, Richard Proctor wrote:
> > Both \1 and $1 refer to what is matched by the first set of parens in a
> > regex.  AFAIK, the only difference between these two notations is that \1
> > is used within the regex itself and $1 is used outside of the regex.  Is
> > there any reason not to standardize these down to one notation (i.e.,
> > eliminate one or the other)?
> 
> I think this is fixable.  

The way you phrase that makes it sound that other people perceive
this as a problem as well, which gives me all sorts of warm fuzzies. :>

>The only real need for this at the moment is to
> overcome limitations in the order of expansion of regexes.  RFCs 112, 166,
> 276... all depend on fixing this.  

Ok, here's another question.  How the _HELL_ does everyone else on
this bloody list keep track of every detail in every frigging RFC?  Some
random comment comes up, and someone will go, "Oh, the third paragraph of
the second section in RFC 0x97A already mentioned this as a parenthetical
aside, despite the fact that its title and primary topic had no relation
to the issue."  I still have (mumble-mumble) RFCs that I haven't even had
time to *read*, let alone memorize every detail of!

Grr*grumble, grumble, moan, whinge*

Ok, back to rationality now.

>If the regex compiler gets in before the
> expansion of the variables to make these work, it could handle $1 in all cases;
> \1 can be retained for compatibility.

Do we *want* to maintain \1?  Why have two notations to do the
same thing when one is clearly superior?  (\1 can only go up to \9 while
the other could theoretically go to ${...}.)  Perl6 is breaking
backwards compatibility and eliminating all deprecated features...let's
get rid of \n as backreference notation.

Dave




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs



On Wed, 27 Sep 2000, Jonathan Scott Duff wrote:

> If $1 could be made to work properly on the LHS of s///, I'd vote for
> that being The Way.

That was pretty much my thought.




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Michael Maraist


From: "Dave Storrs" <[EMAIL PROTECTED]>

> Both \1 and $1 refer to what is matched by the first set of parens in a
> regex.  AFAIK, the only difference between these two notations is that \1
> is used within the regex itself and $1 is used outside of the regex.  Is
> there any reason not to standardize these down to one notation (i.e.,
> eliminate one or the other)?

\1 came from sed and friends.  I think an early driving force was
maintaining familiarity with things like awk and sed.  Even today there are
still people that switch to and from other reg-ex languages.  Emacs is the
most common for me (though I still dabble with awk).  I don't see a real
advantage in taking out \1, and it is very likely to needlessly break legacy
code, and additionally confuse various developers that have a habit of
using \1.

On the other hand, the use of $1 with substitutions is important for
consistency.  When you write s/../.../e, you're going to need to use a
substitution variable; "\1" just doesn't fit.
s/(...)/pre\1post/;  works fine
s/(...)/pre$1post/; is the question. I tend to use it only because I
sometimes switch to:
s/(...)/func() . "$1post"/e;  for various reasons.  I just try to
standardize on $1, but that's just me.

Additionally the use of $1 in the matching reg-ex is ambiguous as in:
m/(...).*?$1/;
Does it refer to the internal set of (..), or does it mean the previous
value of $1 before this match?  This becomes non-obvious to the observer in
the following case:
m/($keyword).*?$1/;
Here, our mindset is substitution of external variables, the casual
(non-seasoned) observer might not understand that it really means:
m/($keyword).*?\1/;

My argument is that both \1 and $1 have their places, and limiting to one
type can be troublesome.  Plus, TMTOWTDI. :)
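
To make the replacement-side forms concrete, here is a minimal sketch
assuming Perl 5. The func() below is a hypothetical stand-in for the one
mentioned above, and the input string is made up:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the func() mentioned above.
sub func { return 'PRE' }

my $simple = 'abcdef';
$simple =~ s/(...)/pre$1post/;            # plain interpolating replacement

my $evaled = 'abcdef';
$evaled =~ s/(...)/func() . "$1post"/e;   # replacement evaluated as Perl code

print "$simple\n$evaled\n";
```

With /e the replacement is ordinary Perl code, so only the variable form
$1 is meaningful there.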

-Michael




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Richard Proctor



Dave,

> Both \1 and $1 refer to what is matched by the first set of parens in a
> regex.  AFAIK, the only difference between these two notations is that \1
> is used within the regex itself and $1 is used outside of the regex.  Is
> there any reason not to standardize these down to one notation (i.e.,
> eliminate one or the other)?

I think this is fixable.  The only real need for this at the moment is to
overcome limitations in the order of expansion of regexes.  RFCs 112, 166,
276... all depend on fixing this.  If the regex compiler gets in before the
expansion of the variables to make these work, it could handle $1 in all cases;
\1 can be retained for compatibility.

Richard





Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Jonathan Scott Duff

On Wed, Sep 27, 2000 at 08:15:53AM -0700, Dave Storrs wrote:
> Both \1 and $1 refer to what is matched by the first set of parens in a
> regex.  AFAIK, the only difference between these two notations is that \1
> is used within the regex itself and $1 is used outside of the regex.  Is
> there any reason not to standardize these down to one notation (i.e.,
> eliminate one or the other)?

\1 can be used on the LHS of a s/// whereas $1 there probably won't do
what you expect.  Also, \1, \2, \3 only takes you as far as \9 ;-)

If $1 could be made to work properly on the LHS of s///, I'd vote for
that being The Way.

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs

Both \1 and $1 refer to what is matched by the first set of parens in a
regex.  AFAIK, the only difference between these two notations is that \1
is used within the regex itself and $1 is used outside of the regex.  Is
there any reason not to standardize these down to one notation (i.e.,
eliminate one or the other)?

Dave