Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>Nice summary, but I'm not buying what you're selling in the elaboration.

Then you lose, because I am not allowed to disagree with you anymore.
And everyone else has already written you off.

And the answer to "what breaks if minimal matching is overall but
maximal matching is local"--or even, "if we change it all"-- is
a zillion programs, including just about any progressive match:

    while (/.*?(\w+)=(\S+)/g) {
        push @{ $h{$1} }, $2;
    }

I can't wait for that to match the rightmost one and then fail.  Bah.
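
For anyone who wants to watch the progressive match at work, here is a minimal sketch.  The input string is made up for illustration; the loop is the one above.

```perl
use strict;
use warnings;

# Hypothetical input: key=value pairs separated by arbitrary junk.
my $line = "junk a=1 more b=2 then a=3";
my %h;

# Each iteration resumes where the previous match ended, so the
# pairs are collected strictly left to right.
for ($line) {
    while (/.*?(\w+)=(\S+)/g) {
        push @{ $h{$1} }, $2;
    }
}

for my $key (sort keys %h) {
    print "$key => @{ $h{$key} }\n";
}
```

Under the current leftmost rule this collects both values for "a" and the one for "b", in order.  A rule that let a later, shorter candidate win would change what the first iteration consumes.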

>>/dev/null

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>Take.  It.  To.  Private.  Email.  Please.

I'm going to do better.  I'm taking it to /dev/null.
It's not worth my wasting my life over.  Nobody
agrees with this guy, so it doesn't matter.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

And while I'm at it, consider /(.*)(.*)(.*)/, which we'll call
/ABC/.  You need to be able to say all of these independently
and in conjunction with one another:

whether segment A   is longest or shortest overall 
whether segment  B  is longest or shortest overall 
whether segment   C is longest or shortest overall 

whether segment AB  is longest or shortest overall 
whether segment  BC is longest or shortest overall 

whether segment ABC is longest or shortest overall 

Imagine wanting, in /ABC/, A and B to be minimal, C to be maximal,
AB to be maximal, BC to be minimal, and ABC to be maximal.

Does this not strike fear into your heart?  The very notation we'd
have to devise should itself be plenty sufficient to give you serious
pause--and that's not even considering the heat-death problem of
guaranteed worst-case behavior that the word "overall" mandates.

Be very afraid.
--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>At worst, this should take no more than double the amount of time that the
>single pass did, probably less.  Hardly a cause to concern ourselves with
>the heat death of the universe.

Oh really?  We have shown that for the kind of global overall
analysis that you are asking for, in the general case, all
possible paths must be taken.  You cannot short-circuit, because
you must first consider all possibilities and then weigh each valid
result against each other valid result.

Consider something like /.*/ or /.*?/.  For a string of length N,
there are

    (N+1) (N+2)
    -----------
         2

substrings that it matches.  That means that an 80-byte string
has some 3321 possible substrings, all of which must be considered.

In the short-circuiting version, the Engine need consider but one
single solitary case for each of those.  3321 is not the double of 1.

Consider now something like /(.*)(.*)/ or /(.*?)(.*?)/ or /(.*)(.*?)/
or /(.*?)(.*)/.  You now have

                   2
    ( (N+1) (N+2) )
    ----------------
           4

cases to consider, or, in the case of an 80-byte string, some
11,029,041 possible choices.  
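
The arithmetic is easy to check.  The following is only a sketch: the helper names span_count and brute_span_count are invented here, and the brute-force loop simply counts every (start, end) pair to confirm the closed form.

```perl
use strict;
use warnings;

# Closed form from the text: the number of (start, end) pairs
# in a string of length N, empty spans included.
sub span_count {
    my ($n) = @_;
    return ($n + 1) * ($n + 2) / 2;
}

# Brute force: every start offset paired with every end offset
# at or after it.
sub brute_span_count {
    my ($n) = @_;
    my $count = 0;
    for my $start (0 .. $n) {
        for my $end ($start .. $n) {
            $count++;
        }
    }
    return $count;
}

my $one_wildcard  = span_count(80);
my $two_wildcards = $one_wildcard * $one_wildcard;

print "one wildcard:  $one_wildcard\n";     # 3321
print "two wildcards: $two_wildcards\n";    # 11029041
print "brute force agrees\n" if brute_span_count(80) == $one_wildcard;
```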

And with the current, normal, standard, short-circuiting system, 
the Engine has to consider, hm... could it be just one possibility?
And that's just with two wildcards.  People are often writing more
than that.

Can you now see why this would be a problem?  And how even in the
cases where it didn't actually break old programs (many of which it
would!) that it would cause many, many of them to apparently hang,
racing for electron death?

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>That would be a strange regexp, but I never suggested it.  I suggested the
>regexp /b.*?d/ and pointed out that I believe "bd" is a more intuitive
>match than "d".  That was the matching text, not the regexp, sorry
>if I didn't make that clear.

Fine.  What you said is 

first
find a b
then 
find any non-newline, repeated 0 to N times
then 
find a d

What part of "first find a b" do you expect a randomizing solution to?
That's very clear.
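
A minimal sketch of those three steps in action.  The subject string is an assumption made up for this illustration, since the example text under discussion is not reproduced here.

```perl
use strict;
use warnings;

# Assumed subject string, for illustration only.
my $text = "aabbbbccccd";

my ($start, $matched);
if ($text =~ /(b.*?d)/) {
    $start   = $-[0];    # offset at which the overall match begins
    $matched = $1;
}

print "matched '$matched' at offset $start\n";
```

The match begins at the first "b" the Engine can find, at offset 2, and the minimal quantifier then takes only as much as it must to reach a "d" from there.  The shorter candidate that starts at the last "b" is never considered, because an answer has already been found.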

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>You can't explain why "d" matches without making reference to the
>absolute priority of the leftmost rule.  "bd" would still make sense
>(locally) without reference to that rule.

Nope.  Nope, nope, and nope.

This /d/ thing, which is completely unrealistic and
non-real-worldly, says:

find a 
b
such that this is immediately followed by 
b
such that this is immediately followed by 
b
such that this is immediately followed by 
b
such that this is immediately followed by 
c
such that this is immediately followed by 
c
such that this is immediately followed by 
c
such that this is immediately followed by 
c
such that this is immediately followed by 
d

If you think that people expect "find a b" to suddenly mean
something stochastic, you know different people than I do.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>On Fri, 15 Dec 2000, Tom Christiansen wrote:

>> >As for special-case rules, I believe that my proposed modification would
>> >REMOVE a special-case semantic rule, at the cost of added complexity at the
>> >implementation level.  
>> 
>> What is this alleged "special-case rule" you are talking about?
>> There is no such thing.  None.  When you write /pat/, it means to
>> find the first such pattern.  There is no special case here.

>The special case is "as long as it has the earliest starting position".

>There may be many, many possible matches for a regexp in a given string,
>especially with an expression as inclusive as ".*".  

You want to change things from "find a match", which has the obviously
deterministic semantics of finding the first match, and alter that
to mean "find all possible matches; now, amongst those...".  This
is much more complicated, at many levels.

You have yet to address my long mail to you.

You have yet to read MRE.

>So, you have to apply some disambiguating rules to identify which matches
>are "interesting" enough to be worth paying attention to.  

There is no ambiguity.  Short-circuiting is not ambiguity.  Stopping when
you have an answer is not ambiguity.  You are mistaken.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>We may have to "agree to disagree".  

I shan't be doing that.

>I understand why people believe in
>the current semantics, but I've seen no indication that anyone else
>understands why I believe in these alternative semantics, or has tried.
>(Disagreeing with my conclusion doesn't preclude understanding where I'm
>coming from, but nobody seems to.)

You have not addressed the heat death of the universe as I and
others have illustrated.  Finding all possible matches is very often
completely infeasible.  Please solve the electron decay problem
before continuing.

>Well, obviously we could.  Maybe we shouldn't, but we could do it.  Many,
>many existing programs depended on Perl 4's magic behavior with @'s in
>double-quoted strings, yet Perl 5 broke them all with a fatal error during
>the compile phase.  People survived.  They adapted and moved on.  

Red herring.

>Unlike
>that incompatibility, this one would probably affect few programs.

You're wrong.  Incredibly wrong.  

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>Really?  I haven't taken a survey, but I did ask one co-worker for his
>first impression of what the regexp (from my example) would match.  Not
>being an experienced Perl programmer, but being familiar with regular
>expressions, he believed he understood the idea of non-greedy matching.
>His expectation?  That would match "bd", not "d".

I'm sure you invalidated the test results by giving the wrong set up.
Listen very closely:

PERL DOES NOT HAVE GREEDY MATCHING.

Got that?  Neither does it have stingy matching.  Only the quantifiers
have such a property.  NOT THE MATCH ITSELF.  Wait, let me say it again:

PERL DOES NOT HAVE GREEDY MATCHING.

There is no global greed, only local greed.  And greed is a misleading
term.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>Have you thought it through NOW, on a purely semantic level (in isolation
>from implementation issues and historical precedent), 

I've said it before, and I'll say it again: you keep using 
the word "semantic", but I do not think you know what that word means.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>> More generally, it seems to me that you're hung up on the description 
>> of "*?" as "shortest possible match".  That's an ambiguous 

>Yup, that's a bit confusing.  It's really "start matching as soon as
>possible, and stop matching as soon as possible".  (The usual greedy
>one is, of course, "keep matching as long as possible".)  The initial
>invariant part, "start as soon as possible", is the de facto and de
>jure (at least POSIX 1003.2, but probably also Single Unix)
>definition, and therefore rather non-negotiable.

It's like people who write /^.*fred/ instead of /.*fred/.  They
are forgetting something critical: where the Engine starts the search.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>Actually, I'm not sure -- it's conceivable that the ending point would ALSO
>move inward for a different starting point within the original match.  But
>the ending point should NEVER be advanced further -- that's where the
>"leftmost over nongreedy" rule should apply instead...

Please show us your implementation for a pattern matching engine
that lets the current end-point vary.  This is very exciting,
because now you can relax the restriction that lookbehinds
must be constant width.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>I want the maximum level of OVERALL consistency for regular expressions as

We're there, thank you very much.  "Find a match" is the over-riding
sentiment, the principal semantic.  It is completely consistent with
this.  You've got greed/nongreed very wrong.

>a whole, rather than immutable adherence to the "leftmost trumps nongreedy"
>rule currently in place.  Most of the time, I agree with the precedence of
>leftmost over nongreedy.  The example I gave is a case where I believe the
>strict adherence to the leftmost rule actually introduces complexity and
>makes the regular expression system less self-consistent.

You have yet to provide a concrete, real-world example of this allegation.  To
the contrary, you give unrealworld examples.  

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>I meant that I've never seen
>a concrete, realistic example where the current behavior is more beneficial
>to the programmer than my proposed behavior.  

Absence of evidence is hardly evidence of absence.

   `cat /vmunix` =~ /\w+/

I just love guaranteed worst-case behavior.  NOT.
It is good to short circuit.  Very good.

>(I imagine in most cases, it
>will be a moot point, since the match will usually be the same.)

Then why the bloody blazes are you arguing about this so vociferously?

>Strange argument.  Greedy matching was once considered fundamental to the
>design of regex, and the "leftmost" behavior is 100% consistent with greedy
>matching.  

Nope.  These are orthogonal, unrelated concepts.

>Yet Perl 5 added non-greedy modifiers, changing a fundamental
>aspect of every preceding regex system, and still called it a regex...

Whether a match should be minimal or maximal in no way changes
whether the language is to be deemed "regular" by the proper
definition of that term.  Back-references, which have been in Perl
since its inception, suffice to disqualify the language from that
category, but minimal and maximal alternation do not.  But this
doesn't matter.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>As for special-case rules, I believe that my proposed modification would
>REMOVE a special-case semantic rule, at the cost of added complexity at the
>implementation level.  

What is this alleged "special-case rule" you are talking about?
There is no such thing.  None.  When you write /pat/, it means to
find the first such pattern.  There is no special case here.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen

>I made a mistake in phrasing it this way, because it seemed to suggest that
>I thought it was an implementation bug that it returns "d" instead
>of "bd".  I didn't make it clear that I was trying to approach this as
>a purely SEMANTIC question, considered in isolation from the implementation
>of the system.  

You keep using "semantic".  However, I do not think that that word
means what you think it means.

>The question is, "what interpretation makes the most sense,
>at a high level", not "why does the current behavior make sense".

These are all three of them different things.

>It's not that there aren't justifications for the current behavior.  It's a
>question of perspective -- from one perspective (mine), "bd" makes more
>sense semantically.  

No, sir.  You cannot use the S word for that.

Here are the *SEMANTICS* of pattern matching in Perl:

When there's more than one match, the first match found (that is,
the leftmost) is the winner, with ties being resolved in favor of
the longer string for maximal matches and the shorter string for
minimal matches.

This is *not* an "implementational detail".  These *are* the
semantics.  You are asking for *different* semantics.

What you are doing is simply an attempt to impose a sloppy
English-language description on the behavior of the code.  Just
because you should happen to understand the English does not mean
that this describes the code.

It's like people thinking /<.*?>/ will find a tag because they are
thinking in English, not Perl.  Of course it won't.  
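
To make that concrete, here is a small sketch.  The HTML fragment is invented for the purpose; any tag with a ">" inside a quoted attribute will do.

```perl
use strict;
use warnings;

# Hypothetical markup with a ">" inside an attribute value.
my $html = '<img alt="a > b" src="x.png"> trailing text';

# "Minimal" only means: stop at the first ">" that lets the
# pattern succeed.  It knows nothing about quoting.
my ($tag) = $html =~ /(<.*?>)/;

print "$tag\n";
```

The capture ends at the ">" inside the alt attribute, so what comes back is a fragment of the tag, not the tag.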

>I believe it is more intuitive, at the highest level.

"Intuitive" is another one of those words frequently bandied 
about that is nearly always misapplied.

WRONG:
The frobnitz interface is more intuitive.

RIGHT:
The nipple is the only intuitive human interface.

CORRECTION: 
From my own historical experiences and resulting biases, 
the frobnitz interface would have been more what
I personally without regard to anyone else would have
been expecting.

>From a different (more implementation-oriented) perspective, the current

No, this is not "implementation-oriented".  It is merely the semantics.

>Hopefully, we can have a rational discussion about whether this semantic
>anomaly is real or imagined, what impact "fixing" it would have on the
>implementation (if it's deemed real), and whether it's worth "fixing".

I do not expect you to be rational, because I do not think we can
agree to your terms.  There is no semantic anomaly, any more than
thinking that <.*> or <.*?> finds an HTML tag is some sort of
"semantic anomaly".   It is the result of your mistranslating between
English and code.  

>Here's where I see the disconnect happening.  I'm approaching this from a
>semantic perspective, asking myself "what should this match (ideally)?"

No, you're not.  Please stop abusing the S word.  It places you 
on no moral high ground whatsoever.

--tom



Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-14 Thread Tom Christiansen

>No question that's how it's been implemented.  But WHY would anyone want
>such behavior?  When is it beneficial?

It is beneficial because this is how it's always been, because it
is faster, because it is more expressive, because it is more powerful,
because it is more intuitive, and because it is more perlian.

In elaboration:

0) All NFAs before POSIX acted this way.  It is historically 
   consistent and perfectly expected.

1) It is obviously faster to come to an answer earlier on in the
   execution than it would be to come to an answer later.  It's
   like an expression whose evaluation short-circuits.  Also, when
   the matching semantics permit back tracking and back references,
   the combinatoric possibilities can easily explode into virtual
   unsolvability as the 2**N algorithm loses its race to the heat
   death of the universe.  Yes, if Perl did overall-longest or
   overall-shortest, this would produce a more predictable time;
   however, as we see with DFAs and POSIX NFAs, this prediction
   plays out as guaranteed *WORST-CASE* time.  It is not acceptable
   to make everyone pay the worst-case time.  Never penalize
   the whole world for the needs or desires of the few.

2) Consider the simple case, /A|B/.  In your overall longest/shortest,
   guaranteed worst-case time, both submatch A and submatch B must
   be calculated, and then the lengths of their matches both be compared.
   Perl, fortunately, does not do that.  Rather, the first one in that
   sequence wins.  That means that under the current scheme, the 
   patterns /A|B/ and /B|A/ have different semantics.  Under your 
   worst-case scheme, they do not.  Because /A|B/ and /B|A/ mean
   something different, more expressivity is provided.  This is the
   same scenario, albeit expressed slightly differently, as your
   situation.  The issues manifest in both are equivalent.

3) This leads to increased power.  It's like the difference between
   a short-circuiting "or" and one that blindly plods ahead trying
   to figure something out even when all is for naught.  Compare A&&B
   with A&B, for example.  If A is 0, then B need not be computed, 
   yet in the second version, one runs subexpression B nevertheless.
   If according to the rules of one particular system, patX and
   patY mean different things, whereas in a second system, they are
   completely interchangeable, then the first system can express
   nuances that the second one cannot.  When you have more nuances,
   more expressivity, then you have more power, because you can say
   things you could not otherwise say.  Why do C and its derivatives
   such as Perl have short-circuiting Boolean operators?  Because
   in older languages, such as Fortran and Pascal, where you did
   not have them, one quickly found that this was cumbersome and
   annoying.

4) It is more intuitive to the reader and the writer to minimize
   strange action at a distance.  It's more to remember; or, perhaps
   better phrased, more to forget.  That's why we don't like 
   variables set in one place magically affecting innocent code
   elsewhere.  Maybe it's more applicable here to say that that's
   why having mixed precedences and associativities confuses people.
   If in an expression like A->B->C->D, you had to know a priori when
   evaluating A that D was going to be coming up, it would require 
   greater look-ahead, more mental storage.  Even if a computer could
   do it, people would find it harder.  That's why we don't write

       &{&{$fnctbl{expr}}(arg1)}(arg2)

   when we can simply write

       $fnctbl{expr}->(arg1)->(arg2)

   It is not intuitive to people to have to do too much look-ahead,
   or too much storage.  Having distant items interact with one
   another is confusing, and we've already got that situation with
   backreferences, as in /(\w+)(\w+)\s+\2(\w+)/, which depending on
   how you start weighting those +'s into +?'s, can really move
   matters around.  Let's not exacerbate the counterintuitiveness.

5) It is more Perlian because of the principle that things that look
   different should actually *be* different.  /A|B/ and /B|A/ look
   quite different.  Thus, they should likewise *be* different.
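
Points 2 and 5 can be seen in a few lines.  This is a sketch only; "foo" and "foobar" stand in for the abstract A and B.

```perl
use strict;
use warnings;

my $text = "foobar";

# Same two alternatives, opposite order.  The first alternative
# that succeeds at the current position wins; lengths are never
# compared.
my ($first)  = $text =~ /(foo|foobar)/;
my ($second) = $text =~ /(foobar|foo)/;

print "foo|foobar captured '$first'\n";
print "foobar|foo captured '$second'\n";
```

Under an overall-longest rule both patterns would capture "foobar", and the writer would lose the ability to say which alternative is preferred.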

>I didn't need the long-winded explanation, and I don't need help with
>understanding how that regexp matches what it does.  I understand it
>perfectly well already.  I'm no neophyte with regular expressions, even if
>Perl 5 does offer some regexp features I've never bothered to exploit...

All NFAs prior to POSIX behaved in the fashion that Perl's continue
to behave in.  I am surprised that over the long course of your 
experiences with regexes, that you never noticed this fundamental
principle before.

>My point is that the current behavior, while reasonable, isn't quite right.

You're wrong.  Don't call it "not right".  It's perfectly correct
and consistent.  It follows directly from historical behavior of
these things, and quite simply, it's in the rules.  It's

Re: Perl 5's "non-greedy" matching can be TOO greedy!

2000-12-14 Thread Tom Christiansen

>Does anyone disagree with the premise, and believe that "d" is the
>CORRECT match for the non-greedy regexp above?

Yes.  The Camel's regex chapter reads:

You might say that eagerness holds priority over greed (or thrift).

>For what it's worth, here's a quote from a Perl 5.005_03 "perlre" manpage:

> By default, a quantified subpattern is "greedy", that is, it will
> match as many times as possible (given a particular starting
> location) while still allowing the rest of the pattern to match.
> If you want it to match the minimum number of times possible,
> follow the quantifier with a "?".  Note that the meanings don't 
> change, just the "greediness":

>I don't believe that ".*?" matching "bbb" above qualifies as "to match
>the minimum number of times possible", when it is possible only to match
>the "" and still match the full regexp.  Since the documentation makes
>no mention of earliest-match in this paragraph, I can only assume this is
>unintended behavior, but I'm asking to check my assumptions.  Any devil's
>advocates out there who want to argue for the current behavior?

The simple story is this:

Rule 1: Given two matches at *different* starting points, the
one that occurs earlier wins.

*OTHERWISE*

Rule 2: Given two matches at the *same* starting points, the
one that is longer wins.

Or, at greater length:

Given the opportunity to match something a variable number of
times, maximal quantifiers will elect to maximize the repeat
count.  So when we say "as many times as you'd like", the greedy
quantifier interprets this to mean "as many times as you can
possibly get away with", constrained only by the requirement
that this not cause specifications later in the match to fail.
If a pattern contains two open-ended quantifiers, then obviously
both cannot consume the entire string: characters used by one
part of the match are no longer available to a later part.  Each
quantifier is greedy at the expense of those that follow it,
reading the pattern left to right.

That's the traditional behavior of quantifiers in regular
expressions.  However, Perl permits you to reform the behavior
of its quantifiers: by placing a C<?> after that quantifier,
you change it from maximal to minimal.  That doesn't mean that
a minimal quantifier will always match the smallest number of
repetitions allowed by its range, any more than a maximal
quantifier must always match the greatest number allowed in its
range.  The overall match must still succeed, and the minimal
match will take as much as it needs to succeed, and no more.
(Minimal quantifiers value contentment over greed.)

For example, in the match:

"exasperate" =~ /e(.*)e/    #  $1 now "xasperat"

the C<.*> matches "C<xasperat>", the longest possible way for
it to match.  (It also stores that value in C<$1>, as described
below under "Capturing and Clustering".)  Although there was a
shorter match available, a greedy match doesn't care.  Given
two choices at the same starting point, it always returns the
I<longer> of the two.

Contrast this with this:

"exasperate" =~ /e(.*?)e/   #  $1 now "xasp"

Here, the minimal matching version, C<.*?>, is used.  Adding
the C<?> to C<*> makes C<*?> take on the opposite behavior: Now
given two choices at the same starting point, it always returns
the I<shorter> of the two.

Although you could read C<*?> as saying to match zero or more
of something but preferring zero, that doesn't mean it will
always match zero characters.  If it did so here, for example,
and left C<$1> set to C<"">, then the second "C<e>" wouldn't
be found, since it doesn't immediately follow the first one.

You might also wonder why, in minimally matching C</e(.*?)e/>,
Perl didn't stick "C<rat>" into C<$1>.  After all, "C<rat>"
also falls between two C<e>'s, and is shorter than "C<xasp>".
In Perl, the minimal/maximal choice applies only when selecting
the shortest or longest from among several matches that all
have the same starting point.  If two possible matches exist,
but these start at different offsets in the string, then their
lengths don't matter--and neither does whether you've used a
minimal quantifier or a maximal one.  The earliest of several
valid matches always wins out over all latecomers.  It's only
when multiple possible matches start at the same point that you
use minimal or maximal matching to break the tie.  If the
starting points differ, there's no tie to break.  Perl's matching
is normally I<leftmost longest>; with minimal matching, it
becomes I<leftmost shortest>.  But the "leftmost" part never
varies, and is the dominant criterion. Not all regex
engines work this way.  Some believe in overall greed, in which
the longest match always wins, even if it shows up later.  Perl
isn't that way.  You might say that eagerness holds 

Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-28 Thread Tom Christiansen

>I consider recursive regexps very useful:
>
> $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };

Yes, they're "useful", but darned tricky sometimes, and in
ways other than simple regex-related stuff.  For example,
consider what happens if you do

my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) };

That doesn't work due to differing scopings on either side
of the assignment.  And clearly a non-regex approach could
be more legible for recursive parsing.
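
One way around the scoping problem, sketched here under the assumption of a reasonably modern perl, is to declare the variable first and assign to it in a separate statement.  The name $balanced, the /x layout, and the enclosing (?: ... )* are illustrative choices, not part of the pattern quoted above.

```perl
use strict;
use warnings;

# Declare first, assign second, so the name is already in scope
# when the deferred subpattern is compiled.
my $balanced;
$balanced = qr{
    (?:
        (?> [^()]+ )                # a run of non-parentheses, no backtracking
      |
        \( (??{ $balanced }) \)     # or a parenthesized group, recursively
    )*
}x;

my %ok;
for my $s ("a(b(c)d)e", "()", "a(b", ")(") {
    $ok{$s} = $s =~ /\A$balanced\z/ ? 1 : 0;
    print "$s: ", ($ok{$s} ? "balanced" : "not balanced"), "\n";
}
```

With "my $balanced = qr{...}" written as a single statement, the $balanced inside the pattern would not refer to the variable being declared, which is the trap described above.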

--tom

Visit our website at http://www.ubswarburg.com

This message contains confidential information and is intended only 
for the individual named.  If you are not the named addressee you 
should not disseminate, distribute or copy this e-mail.  Please 
notify the sender immediately by e-mail if you have received this 
e-mail by mistake and delete this e-mail from your system.

E-mail transmission cannot be guaranteed to be secure or error-free 
as information could be intercepted, corrupted, lost, destroyed, 
arrive late or incomplete, or contain viruses.  The sender therefore 
does not accept liability for any errors or omissions in the contents 
of this message which arise as a result of e-mail transmission.  If 
verification is required please request a hard-copy version.  This 
message is provided for informational purposes and should not be 
construed as a solicitation or offer to buy or sell any securities or 
related financial instruments.




RFC 198 (v2) Boolean Regexes

2000-09-27 Thread Tom Christiansen

This seems very complicated.  Did you look at the Ram:6 recipe on
expressing AND, OR, and NOT  in a regex?  For example, to do
/FOO/ && /BAR/ you need not write /FOO.*BAR|BAR.*FOO/ -- and in 
fact, should not, as it doesn't work properly on some pairs!
For example, /CAN/ && /ANAL/ can't be written /CAN.*ANAL|ANAL.*CAN/
if you expect to match "CANAL".   Overlaps bite you.  You really
need /(?=.*CAN)(?=.*ANAL)/ instead, which permits multiple assertions.
Please check out the recipe I'm talking about.
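
A short sketch of the overlap problem, using only the patterns mentioned above:

```perl
use strict;
use warnings;

my $word = "CANAL";

# The alternation needs the two words one after the other,
# so it cannot see them when they overlap.
my $alternation = $word =~ /CAN.*ANAL|ANAL.*CAN/ ? 1 : 0;

# Each lookahead is tested from the same position and consumes
# nothing, so overlapping words are no obstacle.
my $lookaheads = $word =~ /(?=.*CAN)(?=.*ANAL)/ ? 1 : 0;

print "alternation: $alternation\n";
print "lookaheads:  $lookaheads\n";
```

The alternation fails on "CANAL" while the pair of lookaheads succeeds.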

--tom, from a strange place

PS: NB -- I cannot access my mail spool.  And the mailing list
archives are 4 days behind on the website, so there is no 
hope of me participating in real-time, nor in seeing any 
private replies.





Re: \z vs \Z vs $

2000-09-21 Thread Tom Christiansen

>I gather you're talking about //s making perl ignore the setting of $*.
>You're right, I didn't know that. But I doubt if it's that important,
>this variable already has been marked as deprecated since Perl5 came
>out, about 5 years ago. It's a good candidate to be removed in Perl6.

Agreed.

>My point is: to most people, //s already mostly means "treat \n as an
>ordinary character". Let's draw this through, and make //s remove all
>special meanings of "\n", in particular WRT /$/.

>Then, there's the matter of combining //m and //s. It would have no
>effect in that case, because //m makes /$/ behave like /\n|\z/. //ms
>wouldn't change that.

Er, not quite.  It's a lookahead.

/foo$/   is /foo(?=\n?\z)/
/foo$/m  is /foo(?=\n|\z)/

or some such.
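
Here is a rough check of those two equivalences.  The test strings are made up, and agreement on four strings is of course not a proof.

```perl
use strict;
use warnings;

my $all_agree = 1;
for my $s ("foo", "foo\n", "foo\nbar", "foo\n\n") {
    my $plain    = $s =~ /foo$/         ? 1 : 0;
    my $plain_la = $s =~ /foo(?=\n?\z)/ ? 1 : 0;
    my $multi    = $s =~ /foo$/m        ? 1 : 0;
    my $multi_la = $s =~ /foo(?=\n|\z)/ ? 1 : 0;
    $all_agree = 0 if $plain != $plain_la or $multi != $multi_la;

    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-10s  \$=%d lookahead=%d   \$ with /m=%d lookahead=%d\n",
        $shown, $plain, $plain_la, $multi, $multi_la;
}
print $all_agree ? "all agree\n" : "mismatch\n";
```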

>p.s. The mnemonic of //s (single line) would not make any sense any
>more. It never really did work.

No, it never did.  Camel-3 doesn't use it much/really.

Modifier  Meaning
--------  -------
C</i>     Ignore alphabetic case distinctions (case insensitive).
C</s>     Let C<.> match newline and ignore deprecated C<$*>.
C</m>     Let C<^> and C<$> match next to embedded C<\n>.
C</x>     Ignore (most) whitespace and permit comments in pattern.
C</o>     Compile pattern once only.

--tom



Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen

>Tom Christiansen wrote:
>> Don't forget /s's other meaning.

>Do you enjoy making people ask what you're talking about?  

Of course not.  I enjoy giving people enough pointers to help them 
learn things for themselves.

>What other
>meaning did you have in mind, overriding $*?

Yes.

--tom



Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen

>That was my second thought. I kinda like it, because //s would have two
>effects:

> + let . match a newline too (current)

> + let /$/ NOT accept a trailing newline (new)

Don't forget /s's other meaning.

--tom



Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen

>>>>>> "TC" == Tom Christiansen <[EMAIL PROTECTED]> writes:

>>> Could you explain what the problem is?

>TC> /$/ does not only match at the end of the string.
>TC> It also matches one character fewer.  This makes
>TC> code like $path =~ /etc$/ "wrong".

>Sorry, I'm missing it.

I know.  

On your "longest match", you are committing the classic error of thinking
greed more important than eagerness.  It's not.

This is unrelated to /m.

Go back and read all the insanities we (mostly gbacon and yours
truly) went through to fix the 5.6 release's modules.  People coded
them *WRONG*.  Wrong means incorrect behaviour.  Sometimes this
even leads to security foo.

BOTTOM LINE: You cannot use /foo$/ to say "does the string end in `foo'?".
You can't do that.  You can't even use /s to fix it.  It doesn't fix it.

This is an annoying gotcha.  Larry once said that he wished he had made  \Z
do what \z now does.  One would like $ to (be able to) mean "ONLY AT END OF
STRING".
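
A minimal sketch of the gotcha.  The path value is invented; the point is only the trailing newline.

```perl
use strict;
use warnings;

# Hypothetical value that ends in a newline, not in "etc".
my $path = "/etc\n";

my $dollar   = $path =~ /etc$/  ? 1 : 0;   # $ also matches before a final newline
my $dollar_s = $path =~ /etc$/s ? 1 : 0;   # /s changes ".", not "$"
my $slash_z  = $path =~ /etc\z/ ? 1 : 0;   # \z matches only at the very end

print "\$: $dollar  \$ with /s: $dollar_s  \\z: $slash_z\n";
```

Only \z rejects the string.  Both $ forms accept it, which is exactly the kind of mistake that had to be hunted down in the modules.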

--tom

EXAMPLE 1:

--- /usr/local/lib/perl5/5.00554/File/Basename.pm   Mon Jan  4 13:00:53 1999
+++ /usr/local/lib/perl5/5.6.0/File/Basename.pm Sun Mar 12 22:24:29 2000
@@ -37,10 +37,10 @@
 "VMS", "MSDOS", "MacOS", "AmigaOS" or "MSWin32", the file specification 
 syntax of that operating system is used in future calls to 
 fileparse(), basename(), and dirname().  If it contains none of
-these substrings, UNIX syntax is used.  This pattern matching is
+these substrings, Unix syntax is used.  This pattern matching is
 case-insensitive.  If you've selected VMS syntax, and the file
 specification you pass to one of these routines contains a "/",
-they assume you are using UNIX emulation and apply the UNIX syntax
+they assume you are using Unix emulation and apply the Unix syntax
 rules instead, for that function call only.
 
 If the argument passed to it contains one of the substrings "VMS",
@@ -73,7 +73,7 @@
 
 =head1 EXAMPLES
 
-Using UNIX file syntax:
+Using Unix file syntax:
 
 ($base,$path,$type) = fileparse('/virgil/aeneid/draft.book7',
'\.book\d+');
@@ -102,7 +102,7 @@
 The basename() routine returns the first element of the list produced
 by calling fileparse() with the same arguments, except that it always
 quotes metacharacters in the given suffixes.  It is provided for
-programmer compatibility with the UNIX shell command basename(1).
+programmer compatibility with the Unix shell command basename(1).
 
 =item C
 
@@ -111,8 +111,8 @@
 second element of the list produced by calling fileparse() with the same
 input file specification.  (Under VMS, if there is no directory information
 in the input file specification, then the current default device and
-directory are returned.)  When using UNIX or MSDOS syntax, the return
-value conforms to the behavior of the UNIX shell command dirname(1).  This
+directory are returned.)  When using Unix or MSDOS syntax, the return
+value conforms to the behavior of the Unix shell command dirname(1).  This
 is usually the same as the behavior of fileparse(), but differs in some
 cases.  For example, for the input file specification F, fileparse()
 considers the directory name to be F, while dirname() considers the
@@ -124,12 +124,22 @@
 
 
 ## use strict;
-use re 'taint';
+# A bit of juggling to insure that C<use re 'taint';> always works, since
+# File::Basename is used during the Perl build, when the re extension may
+# not be available.
+BEGIN {
+  unless (eval { require re; })
+{ eval ' sub re::import { $^H |= 0x0010; } ' }
+  import re 'taint';
+}
+
+
 
+use 5.005_64;
+our(@ISA, @EXPORT, $VERSION, $Fileparse_fstype, $Fileparse_igncase);
 require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw(fileparse fileparse_set_fstype basename dirname);
-use vars qw($VERSION $Fileparse_fstype $Fileparse_igncase);
 $VERSION = "2.6";
 
 
@@ -162,23 +172,23 @@
   if ($fstype =~ /^VMS/i) {
 if ($fullname =~ m#/#) { $fstype = '' }  # We're doing Unix emulation
 else {
-  ($dirpath,$basename) = ($fullname =~ /^(.*[:>\]])?(.*)/);
+  ($dirpath,$basename) = ($fullname =~ /^(.*[:>\]])?(.*)/s);
   $dirpath ||= '';  # should always be defined
 }
   }
   if ($fstype =~ /^MS(DOS|Win32)/i) {
-($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/);
-$dirpath .= '.\\' unless $dirpath =~ /[\\\/]$/;
+($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/s);
+$dirpath .= '.\\' unless $dirpath =~ /[\\\/]\z/;
   }
-  elsif ($fstype =~ /^MacOS/i) {
-($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/);
+  elsif ($fstype =~ /^MacOS/si) {
+($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/s);
   }
   elsif ($fstype =~ /^AmigaOS

\z vs \Z vs $

2000-09-19 Thread Tom Christiansen

What can be done to make $ work "better", so we don't have to
make people use /foo\z/ to mean /foo$/?  They'll keep writing
the $ for things that probably oughtn't abide optional newlines.

Remember that /$/ really means /(?=\n?\z)/. And likewise with \Z.
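
As a rough check of that equivalence (the test strings here are invented
for illustration), one can run both spellings side by side:

```perl
use strict;
use warnings;

# Hypothetical test strings: no newline, one trailing newline, two
# trailing newlines, and a newline in the middle.
my @strings = ("etc", "etc\n", "etc\n\n", "etc\nx");

my @agree;
for my $s (@strings) {
    my $dollar  = ($s =~ /etc$/)          ? 1 : 0;   # the usual spelling
    my $spelled = ($s =~ /etc(?=\n?\z)/)  ? 1 : 0;   # what it really means
    my $strict  = ($s =~ /etc\z/)         ? 1 : 0;   # only at end of string
    push @agree, $dollar == $spelled;
    (my $shown = $s) =~ s/\n/\\n/g;                  # make newlines visible
    print "$shown: \$=$dollar spelled-out=$spelled \\z=$strict\n";
}
```

The first two columns should agree on every line; \z differs from them
only for the string with exactly one trailing newline.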

--tom



Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Tom Christiansen

The phrase "die a horrible death" clearly reads that something was
a bletcherous botch--a terribly brain-damaged mistake, if you
would--and so must necessarily be expurgated from the language.

For example, when Larry said, "...this does not mean that some of
us should not want, in a rather dispassionate sort of way, to put
a bullet through csh's head," *that* was the sort of thing that
might be described as something he wanted to die a horrible death.
Yet note how mildly worded even this is.

While others sometimes say this about various elements of Perl,
Larry seldom states matters so strongly, as you did when you portrayed
him as having said that it should die a horrible death.  After all,
if he *really* felt that strongly about some (mis)feature (and yes,
this sometimes happens), then said misfeature would almost certainly
be long dead already.  Think about it. :-)

That's why I thought clarification was in order.

--tom



Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Tom Christiansen

>   2. Many people - including Larry - have voiced their desire
>  to see =~ die a horrible death

Please provide a look-up-able reference to Larry's saying that he
wanted =~ to die a horrible death.  That's very strongly worded
for him.  Are you sure this tale hasn't merely grown in the telling?

--tom



Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Tom Christiansen

>Can be rewritten as the shorter and more readable:

>   ($name) =~ split /\s+/;
>   $string =~ quotemeta;
>   @array =~ reverse;
>   @vals =~ sort { $a <=> $b };
>   $string =~ s/\s+/SPACE/;# looks familiar
>   $string =~ m/\w+/;  # this too 
>   @strs =~ m/\w+/;# cool extension
>   @strs =~ s/foo/bar/gi;  # ditto

Which can of course be written in an immeasurably more legible fashion
using current Perl, a little-known language:

($name) =  split /\s+/, $name;
$string =  quotemeta($string);
@array  =  reverse @array;
@vals   =  sort { $a <=> $b } @vals;
$string =~ s/\s+/SPACE/;
$string =~ /\w+/;

map { m/\w+/ } @strs;
s/foo/bar/gi for @strs;

Although the invention of redundant and obfuscated syntactic
alternatives for operations which are not only perfectly feasible
already but also more readable in their current incarnations seems
to be a not infrequent theme in these documents, one must always
carefully consider whether any scant benefit these cutesinesses
might provide can be truly worth further exacerbating the rampant
inscrutability problems (stemming mainly from punctuation in lieu
of alphabetics and from magically implicit targets, arguments, and
side-effects) for which Perl is already soundly--and not always
undeservedly--derided.

Explicitly saying precisely what you mean is perfectly acceptable--and
usually desirable.  Inventing subtleties merely to avoid typing, however,
may not be.

--tom



Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Tom Christiansen

>But you said "lists" up there and that sparked an idea in me ...  What
>does 

>   @a =~ /pattern/;

>currently do?  AFAICT, nothing useful.  But it could be a syntactic
>shorcut for a pattern matching grep()

That changes semantics in places you might not expect.   What does

fn() =~ /pattern/

currently do?  It calls fn() in scalar context, of course.
But with your suggestion, the =~ operator is no longer a scalar
operator, so this changes.
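
A small sketch of the current behaviour, with a stand-in fn() that
records the context it was called in (the sub body and variable names
are illustrative only):

```perl
use strict;
use warnings;

my $context;

# Stand-in for fn(): remembers its calling context, returns some text.
sub fn {
    $context = wantarray ? "list" : "scalar";
    return "text containing pattern";
}

my $matched = fn() =~ /pattern/;
print "fn() was called in $context context\n";   # scalar
```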

--tom



Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-06 Thread Tom Christiansen

>I am working on an RFC
>to allow boolean logic ( && and || and !) to apply a number of patterns to
>the same substring to allow easier mining of information out of such
>constructs. 

What, you don't like: :-)

$pattern = $conjunction eq "AND"
? join(''  => map { "(?=.*$_)" } @patterns)
: join("|" => @patterns);
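
Wrapped in a sub, that might be exercised roughly as follows.  The name
combine() and the sample patterns are assumptions made for the sketch;
note that the AND form wants an anchored match with /s so the
lookaheads scan the whole string from the start:

```perl
use strict;
use warnings;

# Hypothetical helper: build one pattern string out of several.
sub combine {
    my ($conjunction, @patterns) = @_;
    return $conjunction eq "AND"
        ? join(''  => map { "(?=.*$_)" } @patterns)
        : join("|" => @patterns);
}

my $and = combine(AND => qw(ALPHA BETA));   # (?=.*ALPHA)(?=.*BETA)
my $or  = combine(OR  => qw(ALPHA BETA));   # ALPHA|BETA

print "AND pattern: $and\n";
print "OR pattern:  $or\n";
print "BETALPHA has both\n"  if "BETALPHA"  =~ /^$and/s;
print "ALPHA lacks BETA\n"   if "ALPHA"     !~ /^$and/s;
print "ALPHA has either\n"   if "ALPHA"     =~ /$or/;
```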

--tom



Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-06 Thread Tom Christiansen

>...My point is that I think we're approaching this
>the wrong way.  We're trying to apply more and more parser power into what
>classically has been the lexer / tokenizer, namely our beloved
>regular-expression engine.

>A great deal of string processing is possible with perls enhanced NFA
>engine, but at some point we're looking at perl code that is inside out: all
>code embedded within a reg-ex.  That, boys and girls, is a parser, and I'm
>not convinced it's the right approach for rapid design, and definately not
>for large-scale robust design.

What you say has, I think, a great deal of sense.  While Jon and
I--with Nathan, actually (see inside page credits)--were trying to
figure out how to go about presenting all this wacky stuff for the
final section of the new regex chapter in the Camel:

Fancy Patterns
Lookaround Assertions
Non-Backtracking Subpatterns
Programmatic Patterns
Generated patterns
Substitution evaluations
Match-time code evaluation
Match-time pattern interpolation
Conditional interpolation
Defining Your Own Assertions

We kept coming back to sentiments remarkably similar to those you
yourself have just expressed: although I think we managed to put a
decently positive shine on the matter for the print version, it
still really seems that the inside-outness of this is very
hard on your brain, and of remarkably abstruse appeal to the
incredibly few.  (Names of the usual suspects omitted to avoid using
four-letter words in public forums. :-)

I would welcome a less inside-out approach, as well as one that
were more procedural--or at least more symbolic and less punctuational.

--tom



Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-08-31 Thread Tom Christiansen

>actually it is more like which code refers to $& and which regex that
>caem from. the problem stems from $& being a global and not local like
>$1. 

Say what?  They scope the same!

sub foo { /./ }
$_ = "stuff";
/.../;
foo();
print $&;
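
A slightly expanded sketch of the same point, with capture groups added
so that $1 and $& can be compared directly (the strings are
illustrative):

```perl
use strict;
use warnings;

# The inner match sets $& and $1 only for the duration of foo().
sub foo {
    "inner" =~ /(.)/;
    return "$&/$1";
}

"stuff" =~ /(..)./;
my $inside  = foo();      # values seen inside the sub
my $outside = "$&/$1";    # values seen again after it returns

print "inside:  $inside\n";    # i/i
print "outside: $outside\n";   # stu/st
```

Both variables come back to the outer match once foo() returns.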

--tom



Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-08-31 Thread Tom Christiansen

Whenever I seem to have this "search backwards" urge (not very often,
I admit), I without much thought just throw memory at it with 

reverse($str) =~ /pat/

Or, if that's not the "search backwards" sense intended, then maybe
I'll throw time at it:

$str =~ /.*pat/

Sometimes I've also done

($str . $str) =~ /pat/

to effect a search that wraps around--kinda.
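
Roughly, with made-up data, those three tricks look like this.  One
assumption worth flagging: when the string is reversed, the pattern has
to be written backwards too.

```perl
use strict;
use warnings;

my $str = "pat1 then pat2 then pat3";

# Throw time at it: greedy .* pushes the match as far right as it can.
my ($last) = $str =~ /.*(pat\d)/s;

# Throw memory at it: reverse the string and match a reversed pattern.
my $reversed = reverse $str;
my ($found)  = $reversed =~ /(\dtap)/;
my $last_rev = reverse $found;

# Wrap-around: doubling the string lets a match span the end and start.
my $ring  = "ld, hello wor";
my $wraps = ($ring . $ring) =~ /world/ ? 1 : 0;

print "rightmost via .*:      $last\n";       # pat3
print "rightmost via reverse: $last_rev\n";   # pat3
print "wrap-around match:     $wraps\n";      # 1
```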

--tom



Re: copying and s/// (was Re: Overlapping RFCs 135 138 164)

2000-08-30 Thread Tom Christiansen

>Uri Guttman wrote:
>> 
>>   TC> ($this = $that) =~ s/foo/bar/;
>>   TC> for (@these = @those) { s/foo/bar/ }
>> 
>>   TC> You can't really do those in one step without it.

>RFC 164 v2 has a new syntax that lets you do the above or, if you want:

>   $this = s/foo/bar/, $that;
>   @these = s/foo/bar/, @those;

Those really aren't any more obvious to the reader than what we
already have.  Less so, in fact, since you can understand what the
current ones are doing based on simple operators and precedences.

--tom



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-30 Thread Tom Christiansen

>For the record, the UTF8 version of tr/// does not use a plain 256K
>table.  It uses a data strcuture called a 'swash'; this is also the
>data structure that is used by the UTF8 versions of 'uc' etc., the
>\p{...} regex escapes, and the others.  The swash is based on a hash,
>and the code is in utf8.c.

And is connected to a "swatch":

/usr/local/src/perl/utf8.c:/* a "swash" is a swatch hash */

--tom



Re: Overlapping RFCs 135 138 164

2000-08-30 Thread Tom Christiansen

>I was referring to the visual similarity of = and =~, when in fact they
>have nothing to do with one another. The expression I picked is just a
>frequently encountered idiom that puts the two in close proximity. Your
>proposed ~ thing would make it much rarer, but I still think =~ looks
>like something to do with assignment.

Well, with a /match/, it's read-only, true, and thus nothing
like "an assignment".  But with either s/ubs/titut/e or
a tr/ansli/terate/, you do (potentially) change the variable.

--tom



Re: copying and s/// (was Re: Overlapping RFCs 135 138 164)

2000-08-30 Thread Tom Christiansen


>  TC> ($this = $that) =~ s/foo/bar/;
>  TC> for (@these = @those) { s/foo/bar/ } 

>  TC> You can't really do those in one step without it.

>but do they really need to be combined into one step? i sometimes prefer
>the separate assignment statement for clarity. other times i feel i am
>in a compressing mood.

It's like why you can say

while ( (ch = getc()) != EOF ) { ... } 

With assignment an expression, not a statement, you can use
a larger expression.  Python people hate this. :-)

--tom



Re: copying and s/// (was Re: Overlapping RFCs 135 138 164)

2000-08-30 Thread Tom Christiansen

>RFC 164 v2 has a new syntax that lets you do the above or, if you want:

>   $this = s/foo/bar/, $that;
>   @these = s/foo/bar/, @those;

>Consistent with split, join, splice, etc, etc.

That looks tremendously *IN*consistent, since now you must alter
the laws of precedence! :-(

% perl -MO=Deparse,-p -e '$this = s/foo/bar/, $that;'
(($this = s/foo/bar/), $that);

>> but we need a better syntax for s/// that doesn't modify its string but
>> returns a copy which has had the substitution applied to it.

>See RFC 164 v2, all this is supported, as well as this:

>   @str =~ s/foo/bar/;

>Which has been a pipe dream for some time.

I can't imagine that the number of elements in @str contains
the string "foo".  Or has one decided that @array in scalar
context no longer returns that?  

Anyway, this is nothing we don't have, or which is broken.
We already have the highly readable:

for (@str) { s/foo/bar/ } 

Why do you want to magically use scalar operators on arrays?  This
was suggested back as early as perl1 or so because people wanted
to write @a + @b and even @a + 2, and Larry wasn't interested in
doing that.  That's a big new world.

--tom



Re: copying and s/// (was Re: Overlapping RFCs 135 138 164)

2000-08-30 Thread Tom Christiansen

I keep noticing the connection between 

$foo =~ /whatever/;
$foo->whatever;
for ($foo) { whatever } 

They're all topicalizers.

--tom



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-30 Thread Tom Christiansen

>Lightning flashed, thunder crashed and Tom Christiansen 
>m> whispered:
>| >Even if I only do something like tr/a/A/?
>| >And, it is going to get worse for UTF8/UTF16?  
>| 
>| Use the Source.

>If we all always used the source, we wouldn't need books and trainers.
>Where would you and I be then?

What, you can't read C code?  Try it.



Re: RFC 165: Allow variables in a tr///

2000-08-30 Thread Tom Christiansen

>| >A lot of
>| >non-gurus 
>| 
>| So what?

>There are far more non-gurus using perl than there are gurus.  If all we
>cared about was the gurus, we wouldn't need Perl.

Wrong.  And irrelevant.

>| Pick your own quotes is a perl thing.  Let them learn this concept.  
>| If they can't, you made a bad hiring choice.

>It may be a perl thing, but it isn't a Perl thing, at least not until
>"recently".  

You're completely full of ... wrongness.  Again.

% perl1 -e '$_ = "fred"; s#d#e#; print;'
free

>make it difficult on them in the first place?  Remember the easy things
>easy, etc.

Catering to people who don't know Perl in such a way that it hamstrings
those who *do* is brain-dead.  Ease-of-long-term is more important than 
ease-of-learning.  You're only a beginner once--or, if you would,
ignorance is merely an ephemeral state.  Well, for most people;
as for those who are permanently ignorant, you can't fix them,
so don't even try.

We don't write Greek using Latin letters just because more people
know Latin letters!  It's a minor point.  It's part of the language.
Pick-your-own-quotes is part of what makes Perl useful and easy to
write and easy to read.  Just as Greek would be *harder* to read
transliterated into Latin script, so too would Perl be harder to
read if you had to go shoving a backslash up the frontside of every
slash.  Teepees are *not* harder to read.  It's too much to factor
out, and makes no sense.  Just because the smallminded blow a fuse
doesn't mean we should screw Perl--those fuses were meant to be
fried.

>| Transforming everything that's syntactically distinctive in Perl a
>| simple C-looking function will homogenize it into the same boring
>| sameness (and thence to illegibility) as the proverbial fingernail
>| clippings stranded in a bowl of oatmeal.

>This is also an opinion.  Homogeneity isn't necessarily boring.  In fact,
>it can often be quite liberating and allow one far more flexibility and
>creativity than previously.

It may be just an opinion, but it is the PERL OPINION (read: part
of what makes perl, perl) and if you don't like it, go play with a
different language, or write your own.  Different things are supposed
to look different.  Different things are *not* supposed to look the
same.  You haven't read enough of Larry's writings about this.  

--tom



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-30 Thread Tom Christiansen

>Lightning flashed, thunder crashed and Mark-Jason Dominus <[EMAIL PROTECTED]
>pered:
>| > > The way tr/// works is that a 256-byte table is constructed at compile
>| > > time that say for each input character what output character is
>| > 
>| > Speaking of which, what's going to happen when there are more than 256
>| > values to map?
>| 
>| It's already happened, but I forget the details.

>Let me see if I understand this correctly.  For every tr/// in a program,
>256 bytes have to be allocated?  

Yes, once upon a time.

>Even if I only do something like tr/a/A/?
>And, it is going to get worse for UTF8/UTF16?  

Use the Source.

>Is this really the optimal
>solution for this (sorry, this is probably going into -internals space).
>Seems to me that we could very quickly end up with a really large memory
>image.

Memory usage is irrelevant compared with speed.

--tom



Re: RFC 165: Allow variables in a tr///

2000-08-30 Thread Tom Christiansen

>Personally, I would say that q/.../ and friends were a bad idea.  

That's one opinion.  As Piers points out, it's hardly universal.
Go read what I just wrote Uri.

>A lot of
>non-gurus 

So what?

>see /.../ (whatever comes before it) and their first impression
>is that it has something to do with regex.  I would suggest that anything
>that isn't a regex should not use /.../.  Make q, qq, etc use matched
>pairs.  

Pick your own quotes is a perl thing.  Let them learn this concept.  
If they can't, you made a bad hiring choice.

>Make tr look like a regular function and do 
>tr(SEARCH, REPLACE, MOD, STR).  It just seems more orthagonal to me.

Transforming everything that's syntactically distinctive in Perl into a
simple C-looking function will homogenize it into the same boring
sameness (and thence to illegibility) as the proverbial fingernail
clippings stranded in a bowl of oatmeal.

Don't even dream of it.  This is part of what makes Perl, Perl, you know.
Not everything looks like an import from libc.  And shouldn't.

--tom



Re: Overlapping RFCs 135 138 164

2000-08-30 Thread Tom Christiansen

>  TC> ($foo += 3) *= 2;

>that is way too many assignment ops. better is the normalized

>   $foo = ($foo + 3) * 2;

>  TC> $n = select($rout=$rin, $wout=$win, $eout=$ein, 2.5);

>who uses select directly anymore? use a module! :)

I see the smiley, but one must be exceedingly careful not to enshrine
one's own personal preferences and predilections--one's own small
choices of style and nuance--into laws inviolate, and then to further
go on to hold others accountable for not having followed those
choices that one has made for oneself and then dictated to others.

There's plenty that is convenient--not to mention familiar, reasonable,
and perhaps even idiomatically comforting--about changing en
passant via assignment's lvaluability:

($this = $that) =~ s/foo/bar/;

or for a whole bunch of them:

for (@these = @those) { s/foo/bar/ } 

You can't really do those in one step without it.
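
A quick sketch of both idioms with throwaway data, showing that the
originals are left alone:

```perl
use strict;
use warnings;

my $that = "foo fighters";
(my $this = $that) =~ s/foo/bar/;       # copy first, then edit the copy

my @those = ("foo", "food", "bar");
my @these;
for (@these = @those) { s/foo/bar/ }    # same thing for a whole list

print "that:  $that\n";      # foo fighters
print "this:  $this\n";      # bar fighters
print "those: @those\n";     # foo food bar
print "these: @these\n";     # bar bard bar
```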

I have in passing proposed a form of s/// that acts upon a temporary
not the original and returns the new value not the success status.
This would employ the previously unused binary ~ operator (I mean
binary as in two operands; the unary ~ is bitwise, but I don't mean
that kind of binary.)  Were this around, one could write that first
one as

$this = $that ~ s/foo/bar/;

Because the right side of the assignment is the string resulting
from that substitute, without harming $that.

By extension, the array case could be

@these = map { $_ ~ s/foo/bar/ } @those

Which is still not very appealing, actually.  Hm... 

--tom



Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Tom Christiansen

>What about these, which are much the same thing in that they all
>use the lvaluability of assignment:

And don't forget:

for (@new = @old) { s/foo/bar/ } 

--tom



Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Tom Christiansen

>($foo = $bar) =~ s/x/y/; will never make much sense to me. 

What about these, which are much the same thing in that they all
use the lvaluability of assignment:

chomp($line = <STDIN>);
($foo = $bar) += 10;
($foo += 3) *= 2;
func($diddle_me = $protect_me);
$n = select($rout=$rin, $wout=$win, $eout=$ein, 2.5);

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>I think what Tom means is that (for example)
>print "${\(localtime())}\n";
>does not produce "Tue Aug 29 19:15:55 2000".

Yup.  You are hereby appointed tchrist-to-lateur translator. :-)

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>>>p.s. Has anybody already suggested that we ought to have a nicer
>>>solution to execute perl code inside a string, replacing "${\(...)}" and
>>>"@{[...]}", which also won't ever win a beauty contest?  Oops, wrong
>>>mailing list.
>>
>>The first one doesn't work, and never did.  You want 
>>@{[]} and @{[scalar ]} instead.

>"Doesn't work"?

>   print "The sum of 1 + 2 is ${\(1+2)}.\n";
>-->
>   The sum of 1 + 2 is 3.

>I'm surprised your wouldn't have known this. The principle is the same:
>"${...}" expects a scalar reference inside the block, and '\' provides
>one. Of course, there shouldn't be a real multi-element list inside the
>parens, but just one scalar. And often, the parens aren't needed.

I'm surprised that you still don't understand.  Notice what I showed
you for the replacement above: @{[scalar ]}.

Using ${\(...)} doesn't work in the sense that contrary to popular
belief, it fails to provide a scalar context to the contents of
those parens.  Thus ${ \( fn() ) } is still calling fn() in list
context, not scalar context.  Witness:

sub fn { sprintf "called in %s context", wantarray ? "list" : "scalar" } 

print "Test 1: ";
print "@{ [fn()] }\n";

print "Test 2: ";
print "${ \(fn()) }\n";

print "Test 3: ";
print "@{ [scalar fn()] }\n";

That, when executed, yields:

Test 1: called in list context
Test 2: called in list context
Test 3: called in scalar context

*That's* why test 2 "doesn't work".

--tom



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Tom Christiansen

>tr///e is the same as s///g:
>
>tr/$foo/$bar/e  ==  s/$foo/$bar/g

I suggest you read up on tr///, sir.  You are completely wrong.

--tom



Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Tom Christiansen

>Building a tr/// table is much much simpler and much less work than
>compiling a regex, but we don't make people write

>eval " \$s =~ m/$pat/ "

>when they want to interpolate a string into a regex at run time.
>Instead, we take care of it transparently.  tr/// could easily be made
>to work the exact same way.

One thing to be careful of there is thread safety.  You can't hang
the data off the syntax node (the one with the tr op on it), because
tr/$foo/$bar/ wouldn't work for several threads in it at the same
time then.

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>p.s. Has anybody already suggested that we ought to have a nicer
>solution to execute perl code inside a string, replacing "${\(...)}" and
>"@{[...]}", which also won't ever win a beauty contest?  Oops, wrong
>mailing list.

The first one doesn't work, and never did.  You want 
@{[]} and @{[scalar ]} instead.

And I can't see you coming up with anything that's "better" than
this, since this already works and follows directly from understanding
of Perl.  Too often on these lists anything that "follows directly"
one seeks to special-case with brand-new syntax.  This is a poor
general principle.

This has nothing to do with regexes (although it could if we had
@foo normally interpolate into patterns with $" = '|' instead, which
would break that), so when you find a better list to discuss it on,
I'll mumble again.

--tom



Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Tom Christiansen

Perl has always excelled at convenience.  Look at this code:

while (<>) {
    for (split) {
        s/foo/bar/g;
        next if /glarch/i;
        tr/aeiou/eioua/s;
        print;
    }
}

There is *nothing*wrong* with any of them, and to suggest breaking
them is extremely demoralizing.  Don't you people have anything
that's *broken* to fix?   Sheesh.

I fully expect to see an RFC for each and every lovely Perlism 
that isn't in C, Python, and Java.  Well, Perl *isn't* C, Python,
or Java, and there's no need to freak out just because of this!!

--tom



Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Tom Christiansen

>But I think this is worth discussing further, because it neatly
>accomplishes the goal of the RFC in a straightforward way:

>tr('a-z', 'A-Z', $str)

>replaces a-z with A-Z, and

>tr($foo, $bar, $str)

>replaces the characters from $foo with the characters from $bar.
>No special syntax is necessary.

When does the structure get built?  That's why eg. tr[a-z][A-Z] 
brooks no variables, for it is solely at compile time that these
things occur, and why you must resort to delayed compilation via
eval qq/.../ to prod the compiler into building you a new one.
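
One way to sketch that delayed compilation is to build a closure with a
string eval.  The name make_translator() is made up for the example,
and it assumes the character lists contain no "/" and come from a
trusted source, since they are pasted straight into code:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a tr/// for run-time character lists.
# The table is built once, when the eval runs, not on every call.
sub make_translator {
    my ($from, $to) = @_;
    my $code = eval "sub { (my \$s = shift) =~ tr/$from/$to/; \$s }";
    die "bad tr spec: $@" unless $code;
    return $code;
}

my $upcase = make_translator('a-z', 'A-Z');
my $result = $upcase->("hello, world");
print "$result\n";    # HELLO, WORLD
```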

Maybe you want qt/.../.../ or something.

--tom



Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Tom Christiansen

>Would there be any interest in adding these two ideas to this RFC:


>1) tr is not regex function, so it should be regularized to

>   tr(SEARCH, REPLACE, MOD, STR)

>The // tend to confuse people and make them expect tr to operate as a
>regular expression.

So what?  q/.../ is not a "regex function" either.   These are all 
pick-your-own-quotes functions.  This makes no sense.

--tom





Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>And hashes are assembled just like lists anyways:

>   %hash = list get_key_vals;
>   %hash = (key, val, key2, val2);  # same thing

Eh?  List context is conferred by the % on the LHS.  You need
no redundant listification redundancy there.

>But no, I certainly wouldn't suggest going down the path of 1000
>explicit contexts. Bad. Implicit context good! But a "list" helper
>function like a "scalar" helper function would solve a lot of common
>problems.

No, a list helper function would *not* solve a lot of *common* problems:

There's no C<list> function corresponding to C<scalar> since,
in practice, one never needs to force evaluation in a list
context.  That's because any operation that wants I<LIST> already
provides a list context to its list arguments for free.

It's not a "common problem".

Now, you *can* force list context, but I (and Larry, one of whose
text I just quoted) don't see it as common, so it's not worth the
word.  But it's not impossible, either, as you can use either the
construct @{ [ ... ] } if you're in a string and trying to interpolate
some function call, or simply through ()=... otherwise.
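
For instance, a sketch of ()=... used to count matches (the sample text
is invented): the empty-list assignment puts the match in list context,
and the outer scalar assignment then sees how many items came back.

```perl
use strict;
use warnings;

my $text = "a1 b22 c333";

# List assignment in scalar context yields the number of items on its
# right-hand side, here the number of matches.
my $count = () = $text =~ /\d+/g;

# Without ()=, the match runs in scalar context and finds just one.
my $once = $text =~ /\d+/g;

print "matches counted: $count\n";   # 3
print "scalar result:   $once\n";    # 1
```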

Education is a wonderful thing.

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>For me, yeah. But I can name at least 30 people in my building alone
>that have been hacking Perl for years who wouldn't get this. And a "well
>they don't know what's going on" argument doesn't work. Not everyone is
>a Perl expert.

I will always find this argument specious.  Some people "hack on X" for
years but never do more than scratch the surface.  There are various
reasons for this, depending.  But you won't fix it--especially by adding
more crudola for them to scratch up.

>Besides, you're telling me this:

>   foo(list bar())

>is *LESS* intuitive? I really don't buy that. 

Noting that we use [] for an anon array and {} for an anon hash,
not ARRAY or array and HASH or hash, it seems to follow to use ()
for the list.

It's not my fault that people don't know this.  I've certainly
explained it.

% tcgrep '^\s.*\(\s*\)\s*=' ~/cookbook/*.pod
/home/tchrist/cookbook/chap10.pod:() = some_function();

% tcgrep '^\s.*\(\s*\)\s*=' ~/camel/*.pod
/home/tchrist/camel/200lexical.pod:() = funkshun();
/home/tchrist/camel/200lexical.pod:$x = ( () = funk() );   # also set $x to funk()'s return count
/home/tchrist/camel/290subs.pod:canmod() = 5;   # Assigns to $val.
/home/tchrist/camel/290subs.pod:nomod()  = 5;   # ERROR
/home/tchrist/camel/650threads.pod:$t1->tid() == $td->tid()

--tom



Re: RFC 166 (does-not-match)

2000-08-29 Thread Tom Christiansen

>I can tighten the definition up.  If there have been calls for a 
>(?^baz) type construct before, there will be again.  It is a matter of
>getting the definition straightforward and useable.


Are you really just wanting !/BAD/ there?  That is, something
that isn't matched by /BAD/?  One would, of course, normally
simply write !/BAD/, or perhaps !~ /BAD/.  However, if reading
a config file of patterns, you can't go invert the sense of the
match.

Well, easily, that is.

The Perl Cookbook, in Chapter 6, has these solutions:

 *  True if either C<ALPHA> or C<BETA> matches, like C</ALPHA/ || /BETA/>:

/ALPHA|BETA/

 *  True if both C<ALPHA> and C<BETA> match, but may overlap, meaning
that C<"BETALPHA"> should be ok, like C</ALPHA/ && /BETA/>:

/^(?=.*ALPHA)(?=.*BETA)/s

 *  True if both C<ALPHA> and C<BETA> match, but may not overlap,
meaning that C<"BETALPHA"> should fail:

/ALPHA.*BETA|BETA.*ALPHA/s

 *  True if pattern C<PAT> does not match, like C<$var !~ /PAT/>:

/^(?:(?!PAT).)*$/s

 *  True if pattern C<BAD> does not match, but pattern C<GOOD> does:

/(?=^(?:(?!BAD).)*$)GOOD/s

I suspect the penultimate is just what you're looking for.  
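
A small sketch exercising the first four of those patterns against a
few invented strings (the hash keys are labels made up for this
example):

```perl
use strict;
use warnings;

my %pat = (
    either          => qr/ALPHA|BETA/,
    both_overlap_ok => qr/^(?=.*ALPHA)(?=.*BETA)/s,
    both_no_overlap => qr/ALPHA.*BETA|BETA.*ALPHA/s,
    not_bad         => qr/^(?:(?!BAD).)*$/s,
);

# Return 1 or 0 so the results print cleanly.
sub hit { my ($name, $str) = @_; return $str =~ $pat{$name} ? 1 : 0 }

print "either, ALPHA only:       ", hit(either          => "ALPHA"),          "\n";  # 1
print "overlap ok, BETALPHA:     ", hit(both_overlap_ok => "BETALPHA"),       "\n";  # 1
print "no overlap, BETALPHA:     ", hit(both_no_overlap => "BETALPHA"),       "\n";  # 0
print "no overlap, ALPHA...BETA: ", hit(both_no_overlap => "ALPHA and BETA"), "\n";  # 1
print "not BAD, clean string:    ", hit(not_bad         => "all good here"),  "\n";  # 1
print "not BAD, has BAD:         ", hit(not_bad         => "some BAD here"),  "\n";  # 0
```

Because the inverted pattern is an ordinary regex, it can sit in a
config file of patterns alongside the rest.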

Or shall I go back and deepread the whole thread? :-(

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>But, for "crying out loud!", then what the hell do we need "scalar" for?
>You can accomplish the same thing like this:

>   $num = @array;
>   print "Got $num elements";

Wrong.  You just wasted a scalar needlessly, which ()= doesn't
do.  Of course, you *don't* need scalar() there.

print "Got " . @array . " elements";

>"scalar" makes things easy. So does something like "list". This

>   $stuff = () = $r =~ /crap/shit/;

>Doesn't make anything easy.

Goodness, it certainly does.  It's loads easier than learning a new buzz^Wkeyword
or a new switch, because you already know it.

>> Perl does context.  Perl does *IMPLICIT* context.  Cope.

>Great. Then let's drop "scalar" to be consistent. This can be done
>completely implicitly, right?

There are no anonymous scalars.  You'd at best have to write

foo(scalar bar())

as something more like

foo(do { my $x = bar() })

which is lame.   However, if foo($) is thus "prototyped",
you need but write

foo( () = bar() )

to get bar() to be called in list context.  This is wholly intuitive.
If it isn't, you need to review how 

my($x) 

works--once again.

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>   $count = () = /PATTERN/g;

>With a keyword forcing a list context, this new option is unnecessary.

We already *HAVE* a token set that forces list context, thank you 
very much.  It's called "()=".  I'm glad you like it.

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>While I agree that /l is bad, I think going through the crap of "= () ="
>is even worse. Does it work? Yes. But is it easily usable and fun, even
>for non-experts? No.

Oh, for crying out loud--at some point, you have to stop tossing
rotting fish for the starving ignorant and actually get them to 
LEARN something.  Or let them die of starvation.

Note the difference between

my $var = func();

and

my($var) = func();

Those are completely different in that they call func() in scalar
and list contexts.  Why?  Because of the presence or absence of (),
of course.  If they can't learn that adding () to the LHS of an
assignment makes it list context, then they will be forever miserable.
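
For a concrete case with a builtin, the sketch below uses gmtime(0) in place of the generic func(); the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

# No parentheses on the left: scalar context, so gmtime returns a string.
my $stamp = gmtime(0);

# Parentheses on the left: list context, so gmtime returns its nine
# fields and the first one (seconds) lands in $sec.
my ($sec) = gmtime(0);

# An array on the left is list context as well.
my @fields = gmtime(0);

print "$stamp\n";                  # Thu Jan  1 00:00:00 1970
print scalar(@fields), "\n";       # 9
```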

Perl does context.  Perl does *IMPLICIT* context.  Cope.

--tom



Re: RFC 110 (v2) counting matches

2000-08-29 Thread Tom Christiansen

>If we want to use uppercase, make these unique as well. That gives us
>many more combinations, and is not necessarily confusing:

>   m//f  -  fast match
>   m//F  -  first match
>   m//i  -  case-insensitive
>   m//I  -  ignore whitespace
>   
>And so on. This seems like a much more productive use, otherwise we're
>just wasting characters.

Larry's on record as preferring not to have us going down the road
of using distinct upper and lower case regex switches.  The distance
between //c and //C, say, is far too narrow.

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>That empty list to force the proper context irks me.  How about a
>modifier to the RE that forces it (this would solve the "counting matches"
>problem too).

>   $string =~ m{
>   (\d\d) - (\d\d) - (\d\d)
>   (?{ push @dates, makedate($1,$2,$3) })
>   }gxl;

>   $count = $string =~ m/foo/gl;   # always list context

The reason not to is that you're adding a special-case hack to
one particular place, rather than promoting a general mechanism
that can be used everywhere.  

Tell me: which is better and why.

1) A regex switch to specify scalar context, as in a mythical /r:

push(@got, /bar/r)

2) A general mechanism, say for example, "scalar":

push(@got, scalar /bar/)

Obviously the "scalar" is better, because it does not require that
a new switch be learnt, nor is its use restricted to pattern matching.
Furthermore, it's inarguably more mnemonic for the sense of "match this
scalarishly".
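
A small sketch of why the context matters inside push(); the two strings being tested are invented for the example.

```perl
use strict;
use warnings;

my (@list_ctx, @scalar_ctx);

for (qw(bar baz)) {
    # List context: a failed match returns the empty list, so nothing
    # at all is pushed for "baz".
    push @list_ctx, /bar/;

    # Scalar context: every match pushes exactly one true or false value.
    push @scalar_ctx, scalar /bar/;
}

print scalar(@list_ctx), "\n";     # 1
print scalar(@scalar_ctx), "\n";   # 2
```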

Likewise, to force list context (a far less common operation, mind
you), it is a bad idea to have what amounts to a special argument
to just one function to do this.  What happens to the next function you
want to do this to?  How about if I want to force getpwnam() into list
context and get back a scalar result?

$count = getpwnam("tchrist")/l;
$count = getpwnam("tchrist", LIST);
$count = getpwnam("tchrist")->as_list;

All of those, frankly, suck.  This is much better:

$count = () = getpwnam("tchrist");

It's better because 

  * You don't have to invent anything new, whether syntactically
or mnemonically.  The sucky solutions all require modification
of Perl's very syntax.  With the list assignment, you just need
to learn how to use what you *already have*.  I could say as
much for (?{...}).  Think how many of the suggestions on these
lists can be dealt with simply through using existing features
that the suggesting party was unaware of.

  * It's a general mechanism that isn't tailored for this particular
function call.  Special-purpose solutions are often inferior
to general-purpose ones, because the latter are more likely to 
be creatively usable in a fashion unforeseen by the author.

  * What could possibly be more intuitive for the action of acting
as though one were assigning to a list than doing that very
thing itself?  Since () is the canonical list (it's empty, after
all), this follows directly and requires no special knowledge
whatsoever.
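
To show that the idiom is not tied to any one function, here is a sketch using a hypothetical pets() routine; getpwnam() is avoided only because its result depends on the local password database.

```perl
use strict;
use warnings;

# Hypothetical list-returning function, standing in for any call.
sub pets { my @p = qw(cat dog fish); return @p }

# Assign to the empty list, then read that assignment in scalar
# context: the result is the number of elements on the right.
my $count = () = pets();

# The same answer the long way round, through a temporary array.
my @tmp = pets();
my $long_way = @tmp;

print "$count $long_way\n";        # 3 3
```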

--tom



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

>much possible action at a distance. I'm not seeing a nicely-parseable, 
>easily-understandable way of doing this. Would this be a possible:

>   $string =~ /(\d\d)-(\d\d)-(\d\d)?&{push @list,makedate(\1,\2,\3)}/g;

>Or is that just too ugly and nasty for words?

Yes, passing a reference to the numbers 1, 2, and 3 is clearly too ugly.

But you'll find we've already got that, I think.

sub makedate {
    my($dd,$mm,$yy) = @_;
    warn "Just got a date for @_\n";
    return "[$yy/$mm/$dd]";
}

$string = "22-33-44 and 55-66-77 are ok";
@dates = ();

() = $string =~ m{
    (\d\d) - (\d\d) - (\d\d)
    (?{ push @dates, makedate($1,$2,$3) })
}gx;

print "Now the dates are: @dates\n";

Running that yields: 

Just got a date for 22 33 44
Just got a date for 55 66 77
Now the dates are: [44/33/22] [77/66/55]

--tom



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

>But for style, I don't see why
>the interpreter can't also check for various non-obscure syntaxes / styles.

(You mean "compiler", not interpreter.)

You have to be quite careful there: Perl is so humungous that what's
obscure to one person is well-known to the next.  For example, $#foo
is verging on the obscure for many these days, who would surely pause
at reading

$#foo /= 2;
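
For anyone who does pause at it, a short sketch of what that statement does; the nine-element array is an assumption chosen so the division comes out whole.

```perl
use strict;
use warnings;

my @foo = (1 .. 9);    # nine elements, so $#foo (the last index) is 8

# Halving the last index truncates the array in place.
$#foo /= 2;            # $#foo is now 4

print scalar(@foo), "\n";   # 5
print "@foo\n";             # 1 2 3 4 5
```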

I don't mean to suggest that $#foo should be "preserved"; I'm just
pointing out that in many places, "obscure" is a judgment call, and
suggesting that we should avoid being too judgmental.

--tom, who is about ready to give up on this lame American habit
   of writing "judgment" and "acknowledgment" without their e's!



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

>The compatibility path for perl5 to perl6 is via a translator.  It
>is not expected that perl6 will run perl5 programs unchanged.  The
>complexity of the translator and the depth of the changes will be
>decided by the decisions Larry makes.

This becomes not merely 

"It is not expected that perl6 will run perl5 programs unchanged."

but also 

"It is not expected that perl6 will run perl4 programs unchanged."
"It is not expected that perl6 will run perl3 programs unchanged."
"It is not expected that perl6 will run perl2 programs unchanged."
"It is not expected that perl6 will run perl1 programs unchanged."

This has never been the case before, at least, not so dramatically.

Sure, the edges have been dodgy, like what happened with "[EMAIL PROTECTED]".
But if *MOST* perl1 .. perl5 programs aren't going to work unchanged,
that means that most people's existing perl knowledge-base will no
longer be valid.  That probably means that they aren't going to be
able to just type in the Perl that they already know, either, since
that Perl will no longer be valid.  And in my ever so humble opinion,
that's when one should consider dropping the name "perl".  

This is *not* a bad thing; think of it as much the same as occurred
when people stopped calling their improved version of Lisp "Lisp"
and started calling it Scheme, or how "C with Classes" eventually
took on a different name as well.  Names--or, I suppose, "branding",
if you truly must--are important things.  If the perl6:perl5
relationship is similar in breadth to what we saw in the perl5:perl4
one, then perhaps, maybe even probably, one will get away with it.
However, if the stretch is appreciably further, I don't think one
will.  

And I do fear the negative public image ramifications to Perl.  This
will have to be handled gently and sensitively lest the public lose
faith.  (No, I didn't really *say* "spin control" there--you just
read it.)  A new dialect name might save some public confusion.

--tom



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

>Simple solution.

>If you want to require formats such as m/.../ (which I actually think is a
>good idea), then make it part of -w, -W, -ww, or -WW, which would be a perl6
>enhancement of strictness.

That's like having "use strict" enable mandatory perlstyle compliance
checks, and rejecting the program otherwise.  Doesn't seem sensible.

--tom



Re: RFC 110 (v3) counting matches

2000-08-28 Thread Tom Christiansen

>Have you ever wanted to count the number of matches of a pattern?  s///g 
>returns the number of matches it finds.  m//g just returns 1 for matching.
>Counts can be made using s//$&/g but this is wasteful, or by putting some 
>counting loop round a m//g.  But this all seems rather messy. 

It's really much easier than all that:

$count = () = $string =~ /pattern/g;
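
A worked instance, with a sample string invented for the purpose, set beside the counting loop that the RFC calls messy:

```perl
use strict;
use warnings;

my $string = "one fish two fish red fish blue fish";   # sample data

# m//g in list context returns every match; assigning that list to ()
# and reading the assignment in scalar context gives the match count.
my $count = () = $string =~ /fish/g;

# The explicit loop: scalar-context m//g steps through matches one by one.
my $loop_count = 0;
$loop_count++ while $string =~ /fish/g;

print "$count $loop_count\n";      # 4 4
```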

--tom



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

>> It's nearly part of Perl's language signature.  I wouldn't count
>> on this going away if you still think to call this "Perl".  It is
>> of course much more likely in the renamed "Frob" language, however.

>First off, this argument is just a little too grandiose, because if we
>can't change anything because of precedent, then we're stuck and Perl 6
>should just be Perl 5.9 instead.

How nice of you to put words in my mouth.  Please cite me the precise
message ID, date, and appropriate text in which I said "we can't
change anything because of precedent".  



Right.  I didn't say that.  So don't *you* go saying that I did say
it, or pretend that I did, or allege that I did, or infer that I
did.  It's deceptive, misleading, and flat-out wrong, and I'll thank
you not to repeat the error.  Yes, it's a hot button, so don't push
it.

Here's something you can quote, however: You cannot hope to just
mutate absolutely *everything* willy-nilly and still expect that
the language should keep the same name.  It's not fair to anyone.
If you want to make a language with a similar relationship to Perl
as Scheme has to Lisp, then by all means do so, but note the wisdom
of the name-change that the lispers pulled.  Thus, there *is*
fundamental merit in respecting and understanding the appeal of
precedent.  That is a *long* way from saying "never change anything".

Where are the reasonable boundaries here?  Well, although it's hard
to say with inerrant precision, it's trivially easy to make a good
stab at it.  You just look at usage--how much has this feature been
used?  For how long (eg been there since perl1 vs just got added
in perl5.003)?  What is its prevalence in Perl scripts?  Is it a
rare feature (like formats) or a ubiquitous one (like hashes)?

While there are other criteria one can apply, such as whether its
presence necessarily dead-ends some other desired functionality,
this has to be taken in the light of understanding what's indispensable
because of its longevity and ubiquity--not to mention convenience.

If you look at the perl1 manpage, then consider usage over time,
you'll get a good feel for these fundamentals, which range from
single- vs double- vs back-quote distinctions to if/unless variances,
from pick-your-own-quotes features to automatic memory management.
Almost all of these in turn have their ancestral roots as well, like
dollar signs for variables inside of interpolated strings.

Perl is easy to learn because you don't need to know much of it,
and also because the parts you do need to know you're apt to already
know from Perl's parents.  These two features are also critical.

>That being said, I don't see why this wouldn't work still. As I noted in
>an email to Scott, at the very least this will work:

>   next if m/\s+/ || m/\w+/;

Having to write m// is needlessly burdensome, flying in the face
of thirteen years of experience and millions of users.  I guarantee
you that there are more people who know about

if (/foo/) 

and about

if ($var =~ /foo/)

than there are people who know that you can use m// for the same things.

Have you ever noticed how most of the drastic changes suggested here seem to 

I have a long list of changes, things I'd like to see *fixed* in
Perl, but virtually none of which anybody here has ever even managed
to mention.  You're all too busy giving the baby a brain-transplant
to trim his toenails.  (And you forget that the baby
is rather grown up now.)   An exception is Mark's addressing of the
empty regex problem, which was on my list of niggles.  Another of
my fix-the-pointy-edges niggles is the way wait() and waitpid()
have the wrong semantics for a syscall, since they should always
be writable as syscall() || die, just like the rest of them.  Then
there's the "1;" at the end of a require()d file, or that index/rindex
don't grok negative offsets the way the rest of the language does.
I think there might have been something about $/ and its currently
all-filehandle nature, but certainly that's there, too.  There are
plenty more where that came from, but they're all easy and obvious
changes that could almost be called fixes to simple design oversights.
Nearly none of them seem to be being addressed.
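
The index/rindex niggle can be seen in a few lines. This is only a sketch of the Perl 5 behavior, using a made-up string:

```perl
use strict;
use warnings;

my $s = "hello world";

# substr counts a negative offset back from the end of the string...
my $tail = substr($s, -5);          # "world"

# ...and so do array subscripts.
my @chars = split //, $s;
my $last  = $chars[-1];             # "d"

# index does not: a negative starting position is treated as 0, so the
# search begins at the front and finds the "o" in "hello".
my $pos = index($s, "o", -5);       # 4, not 7

print "$tail $last $pos\n";         # world d 4
```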

--tom



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

>> >next if /\s+/ || /\w+/;  next if match /\s+/ or match /\w+/;
>> 
>> Gosh this is annoying.  I *really* don't want to have to type "match"
>> all the time.  And now I have to use C<or> rather than C<||>, which is
>> already ingrained in my head  (I rarely use "or" or "and")

There are thirteen years of precedent, not to mention the millions of users,
who are completely accustomed to writing expressions like

next if /\s+/ || /\w+/;  

It's nearly part of Perl's language signature.  I wouldn't count
on this going away if you still think to call this "Perl".  It is
of course much more likely in the renamed "Frob" language, however.

--tom



Re: RFC 145 (v2) Brace-matching for Perl Regular Expressions

2000-08-25 Thread Tom Christiansen

>All in all, though, you're right that neither set of features is particularly
>well-known/used outside of p5p followers. At least from what I've seen.
>Virtually every person I've worked with since 5.6 came out has been surprised
>and amazed at the REx eval stuff.

The completely reworked regex chapter in Camel III explains and demos all the
new 5.6 features.  I do not believe they will long remain the Cabal's secret.

--tom



Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Tom Christiansen

>> There's also long been talk/thought about making $& and $1 
>> and friends magic aliases into the original string, which would
>> save that cost.

>Please correct me if I'm mistaken, but I believe that that's the way
>they are implemented now.  A regex match populates the ->startp and
>->endp parts of the regex structure, and the elements of these items
>are byte offsets into the original string.  

I haven't looked at it at all, and perhaps that's something Ilya
did when creating @+ etc.  So you might be right.  

Yet if so, I don't see the great fears of massive copies
for once-ever use of $` and all, since I should have thought
that that would have addressed it.

--tom



Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Tom Christiansen

>those early perl3 scripts by lwall floating around in /etc were poorly
>written. i am glad they are finally out of the distribution.

Those weren't the scripts I was thinking about, and it is *NOT*
ipso facto true that something which uses $& or $` is poorly
written.

--tom



Re: RFC 138 (v1) Eliminate =~ operator.

2000-08-25 Thread Tom Christiansen

>Solve the larger problem: permit method calls in qq() strings.

You mean outside of @{[ ... ]}, eh? :=},

I think Larry *might* have said something about making this work.

I'm just a bit concerned with the general notion that functions would under
some circumstances trigger in qq guys.  It's a bit odd to explain that 
things like abs() for $n+3 won't work, but $o->foo() would.  Then again,
it's already curious with $a[$n+3]. :-)
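
The asymmetry is easy to see in a short sketch; the variable names and values here are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $n     = -2;
my @items = (10, 20, 30, 40, 50);

# A function name inside a double-quoted string is just text; only
# the variable interpolates.
my $literal = "abs($n)";                        # "abs(-2)"

# The @{[ ... ]} trick builds an anonymous array from an arbitrary
# expression and interpolates it.
my $called = "abs is @{[ abs($n) ]}";           # "abs is 2"

# A subscript expression, however, is evaluated as code.
my $subscripted = "element is $items[$n+3]";    # "element is 20"

print "$literal\n$called\n$subscripted\n";
```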

--tom



Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Tom Christiansen

>$`, $& and $' are useful variables which are never used by any
>experienced Perl hacker since they have well known problems with
>efficiency. 

That's hardly true.  I could show you plenty of code from
inexperienced Perl hackers like lwall that use them.  But
the cost is understood.  :-)

The rest of what you said probably is reasonable, however.

The (.*?)(blah)(.*) solution kind of works sometimes, but is 
hardly pleasant.  Likewise the @+ and @- stuff.

There's also long been talk/thought about making $& and $1 
and friends magic aliases into the original string, which would
save that cost.
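
For comparison, here is a sketch of both workarounds on a made-up string. Each recovers what $`, $& and $' would hold without mentioning those variables.

```perl
use strict;
use warnings;

my $str = "say blah now";          # sample data

# Workaround 1: the offsets in @- and @+ plus substr.
my ($pre, $match, $post);
if ($str =~ /blah/) {
    $pre   = substr($str, 0, $-[0]);                # like $`
    $match = substr($str, $-[0], $+[0] - $-[0]);    # like $&
    $post  = substr($str, $+[0]);                   # like $'
}

# Workaround 2: capture all three pieces explicitly.
my ($p, $m, $q) = $str =~ /^(.*?)(blah)(.*)$/s;

print "[$pre][$match][$post]\n";   # [say ][blah][ now]
print "[$p][$m][$q]\n";            # [say ][blah][ now]
```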

--tom



Re: RFC 145 (v1) Brace-matching for Perl Regular Expressions

2000-08-24 Thread Tom Christiansen

>How about \p and \P  ("P" for "pairwise groupings" or just "pairs")?

I'm afraid those are taken, too.

Symbol          Atomic  Meaning
------          ------  -------
C<\0>           yes     Match the null character (ASCII NUL).
C<\I<NNN>>      yes     Match the character given in octal, up to C<\377>.
C<\I<n>>        yes     Match I<n>th previously captured string (decimal).
C<\a>           yes     Match the alarm character (BEL).
C<\A>           no      True at beginning of string.
C<\b>           yes     Match the backspace character (BS).
C<\b>           no      True at word boundary.
C<\B>           no      True when not at word boundary.
C<\cI<X>>       yes     Match the control character Control-I<X> (C<\cZ>, C<\c[>).
C<\C>           yes     Match one byte (C C<char>) even in utf8 (dangerous).
C<\d>           yes     Match any digit character.
C<\D>           yes     Match any non-digit character.
C<\e>           yes     Match the escape character (ASCII ESC, not backslash).
C<\E>           --      End case (C<\L>, C<\U>) or metaquote (C<\Q>).
C<\f>           yes     Match the form feed character (FF).
C<\G>           no      True at end-of-match position of prior C<m//g>.
C<\l>           --      Lowercase next character only.
C<\L>           --      Lowercase till C<\E>.
C<\n>           yes     Match the newline character (NL, CR on Macs).
C<\N{I<NAME>}>  yes     Match the named char (C<\N{greek:Sigma}>).
C<\p{I<PROP>}>  yes     Match any character with named property.
C<\P{I<PROP>}>  yes     Match any character without named property.
C<\Q>           --      Quote (de-meta) metacharacters till C<\E>.
C<\r>           yes     Match the return character (CR, NL on Macs).
C<\s>           yes     Match any whitespace character.
C<\S>           yes     Match any non-whitespace character.
C<\t>           yes     Match the tab character (HT).
C<\u>           --      Titlecase next character only.
C<\U>           --      Uppercase (not titlecase) till C<\E>.
C<\w>           yes     Match any "word" character (alphanums plus "_").
C<\W>           yes     Match any non-word character.
C<\x{abcd}>     yes     Match the character given in hexadecimal.
C<\X>           yes     Match "combining character sequence" string.
C<\z>           no      True at end of string only.
C<\Z>           no      True at end of string or before optional newline.



Re: RFC 144 (v1) Behavior of empty regex should be simple

2000-08-24 Thread Tom Christiansen

>Thanks, I will add this to the next version.  I did consider that, and
>I rejected it.  Here's my thinking: s/successful// does make the
>feature somewhat more useful, but (a) all those uses are more easily
>accomplished with qr() these days, and (b) it's still an
>action-at-a-distance effect, which means that it's fragile and that
>the behavior of working code can change suddenly and surprisingly when
>it is modified.

I agree with your reasoning there.  I just thought it should be
spelt out in the document, since it's a common first thought that
we've all had, but which we've not necessarily taken to its
conclusions.

thanks,

--tom



Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns

2000-08-24 Thread Tom Christiansen

This is useful in that it would stop being number dependent.
For example, you can't now safely say

/$var (foo) \1/

and guarantee for arbitrary contents of $var that you have
the right backref number anymore.  
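
A sketch of how that goes wrong, with test strings invented for the example. The final match uses the named-capture syntax that Perl 5.10 and later provide, shown only as one way a name-based scheme sidesteps the renumbering; it is not the syntax the RFC proposes.

```perl
use strict;
use warnings;

# With no parentheses in $var, \1 refers to (foo) as intended.
my $var = "a";
my $ok_plain = ("a foo foo" =~ /$var (foo) \1/) ? 1 : 0;       # 1

# Once $var brings its own group, (foo) becomes group 2 and \1 now
# means whatever $var captured.
$var = "(a)";
my $ok_shifted = ("a foo foo" =~ /$var (foo) \1/) ? 1 : 0;     # 0
my $ok_wrong   = ("a foo a"   =~ /$var (foo) \1/) ? 1 : 0;     # 1

# A named group does not care how many groups precede it (Perl 5.10+).
my $ok_named =
    ("a foo foo" =~ /$var (?<word>foo) \k<word>/) ? 1 : 0;     # 1

print "$ok_plain $ok_shifted $ok_wrong $ok_named\n";           # 1 0 1 1
```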

If I recall correctly, the Python folks addressed this.  One
might check that.

--tom



Re: RFC 145 (v1) Brace-matching for Perl Regular Expressions

2000-08-24 Thread Tom Christiansen

>=head1 ABSTRACT

>It is quite difficult to match paired characters in Perl 5 regular
>expressions. A solution is proposed, using new \g (match opening grouping
>character) and \G (match closing grouping character) metacharacters.
>Two new special variables, @^g and @^G control which strings are 
>considered grouping characters and what their complement is.

What about the meaning that \G already holds?

Wasn't one going to avoid using any more cryptic variables?

You can't use $^g for a variable name, because you're pretending
it's different than $^G.  But notice that you can't use a lower
case letter there.

--tom



Re: RFC 144 (v1) Behavior of empty regex should be simple

2000-08-24 Thread Tom Christiansen

>I propose that this 'last successful match' behavior be discarded
>entirely, and that an empty pattern always match the empty string.

I don't see a consideration for simply s/successful// above, which
has also been talked about.  That would also match expected usage
based upon existing editors.
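
For reference, a sketch of the Perl 5 behavior under discussion, with strings made up for the example: an empty pattern reuses the last pattern that matched successfully.

```perl
use strict;
use warnings;

# A successful match; /l+/ becomes the "last successful pattern".
my $primed = ("hello" =~ /l+/) ? 1 : 0;

# The empty pattern now behaves like /l+/, not like a pattern that
# matches the empty string.
my $matched = ("world" =~ //) ? $& : undef;     # "l"

# So it can fail, which a true empty pattern never would.
my $miss = ("xyz" =~ //) ? 1 : 0;               # 0

print "$matched $miss\n";                       # l 0
```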

--tom



Re: RFC 138 (v1) Eliminate =~ operator.

2000-08-23 Thread Tom Christiansen

>But I agree that such examples would certainly make a better argument.
>The only concrete thing I can come up with is that I and several other
>perl coders I know had a lot of trouble remembering the =~ symbol. I've
>been a very frequent perl user for about 8 years, and after I didn't use
>perl for about a month (2 week vacation + intense pressure at work,
>it'll never happen again, I promise!), I couldn't for the life of me
>remember whether it was ~= or =~. I've also observed one perl beginner
>look up the symbol in a book every single time she needed it for a new
>program.

Changing anything that has ever or shall ever confuse anyone is a
task without end: you will never be done, so don't even start.

The =~ operator is perfectly obvious to csh programmers, of course,
which is where it came from.  There can be no ~= operator because
that is obviously construing a binary ~ operator, which currently
remains nonexistent.  The ~ operator is unary only, thus far.

Whether there *should* be a binary ~ operator, one related to pattern
matching, is a different question.  The awkers expect there to be,
but you can't please all the people all the time.  They've got
their !~, so hopefully this will semiappease them.

The occasionally proposed use for a binary ~ is mostly in conjunction
with a substitute or translit, so that it returns the new string
rather than the success count.  The original would be unchanged.

# eg "fred & barney" or "wilma and barney"
if ( "barney" eq ($var ~ s/.*\W//) ) { 

Or thus

$alteration = ($original ~ s/old/new/);

Which is really just like 

($alteration = $original) =~ s/old/new/;

but might be better optimized.  Sure would be nice if these en-passant
things would be better optimized anyway, though.
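
The en-passant form can be tried as it stands; the sketch below uses a made-up string. Its last statement relies on the /r modifier found in Perl 5.14 and later, which returns the altered copy much as the proposed binary ~ would. It is included only for comparison and is not something this thread specifies.

```perl
use strict;
use warnings;

my $original = "the old way";      # sample data

# En passant: copy first, then substitute on the copy.
(my $alteration = $original) =~ s/old/new/;

# Perl 5.14+ only: /r leaves $original alone and returns the new string.
my $returned = $original =~ s/old/new/r;

print "$original\n";               # the old way
print "$alteration\n";             # the new way
print "$returned\n";               # the new way
```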

Seems to me that if ~ did this, then ~= might be a reasonable thing!

$foo = $foo ~ s/old/new/

could then of course be written

$foo ~= s/old/new/

which would in fact be the same as 

$foo =~ s/old/new/

Whether this would actually be desirable is highly open to debate. :-)

--tom