Re: RFC 331 (v2) Consolidate the $1 and C<\1> notations

2000-10-03 Thread Dave Storrs



On Mon, 2 Oct 2000, Bart Lateur wrote:

 On Mon, 2 Oct 2000 12:46:06 -0700 (PDT), Dave Storrs wrote:
 
  Well, the main reason is that @/ worked best for my particular
 brain.
 
 But you cannot use it in an ordinary regex, can you? There's no way you
 can put $/[1] between slashes in s/.../.../. Backslashing it doesn't
 work.

True...which means that either perl does Deep Magic to allow it (a
solution I don't like) or (the solution I DO like) the programmer uses
different delimiters on pattern matches that will contain the @/
variable...which is a good hint that something unusual is happening, which
is a good thing.

  @& wouldn't be quite the right match...after all, $& contains the _string_
 
 No, but it's closer. $& is closer in meaning to $1 than is, for example,
 $/. *Much* closer.

Hmmm...I see your point.  I've frozen the RFC as per
deadline...Nate, is it too late for me to make a minor semantic change and
rename a proposed variable?

Dave




Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-30 Thread Dave Storrs



On Sat, 30 Sep 2000, Bart Lateur wrote:

 I wrote this before, but apparently you didn't hear it. Let me repeat:

You're right, I missed your email when I was incorporating things
into the new version.  Apologies.


 $foo on the LHS allows metacharacter matching, for example "a.*b" can
 match "a foo b". But \1 only allows literal strings. If $1 captured

I don't believe it matters...my version of $1 works exactly like
the current \1 and my $/[1] works exactly like the current $1.  
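
For readers following along, here is a minimal sketch of the distinction Bart describes, as it behaves in Perl 5 today. It does not show the RFC's proposed syntax, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# An interpolated variable is compiled as pattern text, so the
# metacharacters it contains stay live.
my $foo    = 'a.*b';
my $interp = ('a foo b' =~ /\A$foo\z/) ? 1 : 0;

# A backreference matches the captured text literally. Group 1 captures
# the four characters "a.*b"; \1 then demands those same four characters.
my $backref_meta    = ('a.*b a foo b' =~ /\A(a\.\*b) \1\z/) ? 1 : 0;
my $backref_literal = ('a.*b a.*b'    =~ /\A(a\.\*b) \1\z/) ? 1 : 0;

print "interpolated 'a.*b' matches 'a foo b': $interp\n";
print "\\1 holding 'a.*b' matches 'a foo b':   $backref_meta\n";
print "\\1 holding 'a.*b' matches 'a.*b':      $backref_literal\n";
```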

Dave




Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-29 Thread Dave Storrs



On Thu, 28 Sep 2000, Hugo wrote:

 :=item *
 :/(foo)_C<\1>_bar/
 
 Please don't do this: write C</(foo)_\1_bar/> or /(foo)_\1_bar/, but
 don't insert C<> in the middle: that makes it much more difficult to
 read.

Sorry; that was a global-replace error that I missed on
proofreading.

 
 :mean different things:  the second will match 'foo_foo_bar', while the
 :first will match 'foo[SOMETHING]bar' where [SOMETHING] is whatever was
 
 should be: foo_[SOMETHING]_bar

Um, yeah, it should...(jeez...I proofed this like three times,
honest!)  *blush*

 
 :captured in the B<previous> match...which could be a long, long way away,
 
 This seems a bit unfair. It is just another variable. Any variable
 you include in a pattern, you are assumed to know that it contains
 the intended value - there is nothing special about $1 in this regard.

Fair enough; the point I was trying to make was that \1 was
captured right here, while $1 was captured long, long ago in a pattern
match far, far away. The visual/cognitive difference is small, but the
programming difference is huge.
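
A small sketch of that difference in current Perl 5 (not the RFC's proposed syntax) may help. The strings are made up for illustration, and ${1} is written with braces only to keep it visually separate from the following "_bar".

```perl
use strict;
use warnings;

# An earlier, unrelated match leaves "quux" in $1.
'quux' =~ /(\w+)/ or die "setup match failed";

# $1 is interpolated before the pattern is compiled, so both patterns
# below are really /(foo)_quux_bar/.
my $dollar_foo  = ('foo_foo_bar'  =~ /(foo)_${1}_bar/) ? 1 : 0;  # fails, $1 stays "quux"
my $dollar_quux = ('foo_quux_bar' =~ /(foo)_${1}_bar/) ? 1 : 0;  # matches

# \1 refers to group 1 of this very pattern, captured during this match.
my $backref_foo  = ('foo_foo_bar'  =~ /(foo)_\1_bar/) ? 1 : 0;
my $backref_quux = ('foo_quux_bar' =~ /(foo)_\1_bar/) ? 1 : 0;

print "\$1 form: foo_foo_bar=$dollar_foo foo_quux_bar=$dollar_quux\n";
print "\\1 form: foo_foo_bar=$backref_foo foo_quux_bar=$backref_quux\n";
```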


 :=item *
 :${P1} means what $1 currently means (first match in last regex)
 
 Do you understand that this is the same variable as $P1? Traditionally,
 perl very rarely coopts variable names that start with alphanumerics,
 and (off the top of my head) all the ones it does so coopt are letters
 only (ARGV, AUTOLOAD, STDOUT etc). I think we need better reasons to
 extend that to all $P1-style variables.

I do understand that, and I agree with your concern.  Actually, I
didn't think that ${P1} was a particularly good notation even as I was
suggesting it...I just wanted to get the RFC up there before the deadline
so that people could discuss it.

Having now thought about it more, I think that (?P1) is
better...in other words, make references to the previous pattern match be
a regex _extension_, not a core feature (if that's a valid way to phrase
the distinction).


 What is the migration path for existing uses of $P1-style variables?

Wherever p526 sees a pattern that contains a $1, it should replace
it with (?P1).

 

 :=item *
 :s/(bar)(bell)/${P1}$2/   # changes "barbell" to "foobell"
 
 Note that in the current regexp engine, ${P1} has disappeared by the
 time matching starts. Can you explain why we need to change this?
 Note also that if you are sticking with ${P1} either we need to
 rename all existing user variables of this form, or we can no longer
 use the existing 'interpolate this string' (or eval, double-eval etc)
 routines, and have to roll our own for this (these) as well.

I'm a bit confused by the way this came out but, if I understand
what you're asking, then I believe your concerns are solved by the new
proposed syntax.  Am I right?


 :This may require significant changes to the regex engine, which is a topic
 :on which I am not qualified to speak.  Could someone with more
 :knowledge/experience please chime in?
 
 Currently the regexp compiler is handed a string in which $variables
 have already been interpolated. [...]

I know there are certain exceptions to this...my Camel III says
(something to the effect of--I don't have it in front of me) "if there is
any doubt as to whether something should be interpolated or left for the
Engine, it will be left for the Engine."

In any case, I don't think this needs to change.  I'm simply
changing what the names of the variables and backreferences are...\1
becomes (the new) $1, and (the current) $1 becomes (?P1)

 Changing the lifetime of backreferences feels likely to be difficult,
 but it isn't clear to me what you are trying to achieve here. I think
 you at least need to add an example of how it would act under s///g
 and s///ge.

Good point.  I'll do that.
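
As a baseline for that discussion, this sketch shows how captures behave under s///g and s///ge in Perl 5 as it stands. It makes no claim about how the RFC's renamed variables would behave, and the sample strings are invented.

```perl
use strict;
use warnings;

my $text = 'a1 b22 c333';

# Under /g, $1 and $2 are refreshed for every successful match.
(my $swapped = $text) =~ s/([a-z])(\d+)/$2$1/g;

# Under /ge, the replacement is evaluated as code once per match.
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

# \1 in the pattern and $1 in the replacement refer to the same group
# of the same match.
(my $squeezed = 'aabbcd') =~ s/(\w)\1/$1/g;

print "$swapped\n$doubled\n$squeezed\n";
```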

 :RFC 276: Localising Paren Counts in qr()s.
 
 I didn't see a mention of these in the body of the proposal.

276 is rather tangentially related, I grant.  However, I felt that
if my proposal went forward, it could impact on how 276 was implemented,
so I crossreferenced to it.

Dave 




Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-29 Thread Dave Storrs



On Fri, 29 Sep 2000, Hildo Biersma wrote:

  Currently, C<\1> and $1 have only slightly different meanings within a
  regex.  Let's consolidate them together, eliminate the differences, and
  settle on $1 as the standard.
 
 Sigh.  That would remove functionality from the language.
 
 The reason why you need \1 in a regular expression is that $1, $2, ...
 are interpolated from the previous regular expression.  This allows me
 to do a pattern match that captures variables, then use the results of
 that to create a second regular expression. (Remember: A regexp
 interpolates first, then compiles the pattern).


Umm...with all due respect, did you read the RFC?  Because what I
proposed does not eliminate any functionality.  

Dave




is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs

Both \1 and $1 refer to what is matched by the first set of parens in a
regex.  AFAIK, the only difference between these two notations is that \1
is used within the regex itself and $1 is used outside of the regex.  Is
there any reason not to standardize these down to one notation (i.e.,
eliminate one or the other)?
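
To make the two notations concrete, here is a short sketch of the usual Perl 5 idiom, with \1 inside the pattern and $1 outside it. The sample sentence is made up.

```perl
use strict;
use warnings;

my $line = 'the the cat';

# Inside the pattern, \1 means "whatever group 1 just captured".
# After a successful match, $1 holds that same text.
my $dup = ($line =~ /\b(\w+) \1\b/) ? $1 : '';

# The same split applies to s///: \1 on the left, $1 on the right.
(my $fixed = $line) =~ s/\b(\w+) \1\b/$1/;

print "duplicated word: $dup\n";
print "fixed line:      $fixed\n";
```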

Dave




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs



On Wed, 27 Sep 2000, Jonathan Scott Duff wrote:

 If $1 could be made to work properly on the LHS of s///, I'd vote for
 that being The Way.

That was pretty much my thought.




Re: is \1 vs $1 a necessary distinction?

2000-09-27 Thread Dave Storrs



On 27 Sep 2000, Piers Cawley wrote:

  Do we *want* to maintain \1?  Why have two notations to do the
 
 I'm kind of curious about what happens when you want to do, say:
 
   if (m/(\S+)/) {
       $reg = qr{(em|i|b)($1)/\1};
   }
 
 where the $1 in the regex quote is referring to $1 from the previous
 regex match.

Well, how about this:

  $reg = qr{(em|i|b)(${P1})/\1};
NOTE:                  ^

If you assume that $1 and ${1} are equivalent (which makes it
possible to have as many backrefs as you want), then you could say that,
if the first character after the { is a P, it means "in the previous regex
match."
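
For comparison, this is roughly how Piers' example behaves in Perl 5 today, where $1 is interpolated from the previous match and \1 is left for the engine. It is only a sketch: the ${P1} form above is a proposal and is not used here, and $tag_text and the test strings are hypothetical.

```perl
use strict;
use warnings;

my $reg;
my $tag_text = 'text';    # hypothetical input for the first match

if ($tag_text =~ m/(\S+)/) {
    # $1 ("text") is interpolated when qr{} builds the pattern;
    # \1 is left alone and refers to group 1 of $reg itself.
    $reg = qr{(em|i|b)($1)/\1};
}

my $same_tag   = ('emtext/em'  =~ $reg) ? 1 : 0;  # \1 must repeat "em"
my $other_tag  = ('emtext/b'   =~ $reg) ? 1 : 0;  # "b" is not what group 1 captured
my $other_text = ('emother/em' =~ $reg) ? 1 : 0;  # "other" is not the interpolated "text"

print "emtext/em:  $same_tag\n";
print "emtext/b:   $other_tag\n";
print "emother/em: $other_text\n";
```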

Dave