See my comments below.  I'm not sure if I helped or not, but hopefully I
at least clarified the '?'.

 

Matt

 

 

 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2008 13:39
Cc: Perl-Unix-Users@listserv.activestate.com
Subject: [Perl-unix-users] '?' on a {} char length specification

 

Hi

 

We received some code to split up long non-whitespace strings; it has:

$text =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;

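[MAS] The ? here doesn't mean "match 0 or 1 times"; placed right after a
quantifier such as {80,}, it makes that quantifier non-greedy (match as
few characters as possible).  That is why * gives an error in your
statement below: a * directly after {80,} is a quantifier on a
quantifier, which has no meaning in this context, so Perl rejects it as
a nested quantifier.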
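To see the two meanings of ? side by side, here is a small sketch.  The
sample strings are made up, and the threshold is shrunk from 80 to 3 so
the output stays short:

```perl
use strict;
use warnings;

# ? after an ordinary atom: "zero or one" of that atom
my $optional = ('color' =~ /^colou?r$/) ? 1 : 0;

# ? after a quantifier: the quantifier becomes non-greedy
# (made-up sample string; {3,} stands in for {80,})
my $s = 'aaaa,bbbb,cccc';
my ($greedy) = $s =~ /(\S{3,},)/;    # grabs as much as possible, backs off to the LAST comma
my ($lazy)   = $s =~ /(\S{3,}?,)/;   # takes as little as possible, stops at the FIRST comma

print "optional: $optional\n";   # optional: 1
print "greedy:   $greedy\n";     # greedy:   aaaa,bbbb,
print "lazy:     $lazy\n";       # lazy:     aaaa,
```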

 

That is, if you've got 80+ non-whitespace characters in a row (the '?'
supposedly meaning 'zero or one'), followed by any of ,.; and then
another non-whitespace character, insert a blank.
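A minimal check of that description, assuming two made-up inputs: one
run of 85 x's followed by ",yyy" (long enough to qualify) and one short
string that should be left alone:

```perl
use strict;
use warnings;

# Made-up inputs: one run long enough to qualify, one far too short
my $long  = ('x' x 85) . ',yyy';
my $short = 'short,text';

# The loop variable is an alias, so $long and $short are edited in place
for my $text ($long, $short) {
    # 80+ non-whitespace chars, then one of , ; . , then a non-whitespace
    # char: put a blank between the punctuation and that next character
    $text =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;
}

print "$long\n";    # 85 x's followed by ", yyy"
print "$short\n";   # short,text (unchanged)
```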

This works but behaves the same as:

$text =~ s/(\S{80,}[,;.])(\S)/$1 $2/g;

 

[MAS] This is a greedy match, so in theory it should match fewer times
than the statement above, potentially placing only one space per run,
after the last [,;.].
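One way to test the "fewer times" prediction is to count the spaces each
version inserts.  This sketch uses made-up data: a single 199-character
run with a comma every 10 characters.

```perl
use strict;
use warnings;

# Made-up test data: 20 nine-letter chunks joined by commas (199 chars, no whitespace)
my $run = join ',', ('abcdefghi') x 20;

(my $lazy   = $run) =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;   # with ?: non-greedy
(my $greedy = $run) =~ s/(\S{80,}[,;.])(\S)/$1 $2/g;    # without ?: greedy

# tr/ // with an empty replacement list only counts the spaces
my $lazy_spaces   = ($lazy   =~ tr/ //);
my $greedy_spaces = ($greedy =~ tr/ //);

print "non-greedy: $lazy_spaces space(s)\n";   # non-greedy: 2 space(s)
# greedy: 1 space(s), at offset 190 (just after the last comma)
print "greedy:     $greedy_spaces space(s), at offset ", index($greedy, ' '), "\n";
```

With the ? each match stops at the first comma past the 80-character
mark, so a long run gets broken repeatedly; without it the single match
runs on to the last comma.  The two only differ when a run has more than
one such punctuation mark past the 80-character point, which may be why
they looked identical on other data.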

 

that is, w/o the '?' after the char spec but different than:

$text =~ s/((?:\S{80,})?[,;.])(\S)/$1 $2/g;

 

[MAS] I'm not sure why this works for you.  It didn't for me when I
created some test data and tried it.  Mine worked the same as the greedy
match and only inserted one space, at the last [,;.].  In this context
the ?: is for grouping purposes, such that (?:a|b|c) is like (a|b|c)
except that $1 doesn't get assigned anything, because the ?: tells Perl
not to keep a backreference for this group.  However, you have it nested
inside an outer pair of parentheses, which is an ordinary capturing
group, and that outer group is what sets $1.  Here is a snippet from
O'Reilly under regular expression extensions:

    (?:...)

        This groups things like "(...)" but doesn't make backreferences
        like "(...)" does. So:

        split(/\b(?:a|b|c)\b/)

        is like:

        split(/\b(a|b|c)\b/)

        but doesn't actually save anything in $1, which means that the
        first split doesn't spit out extra delimiter fields as the second
        one does.
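The split example from that quote can be run directly; this sketch feeds
both patterns the same made-up input string:

```perl
use strict;
use warnings;

my $str = 'x a y b z';   # made-up input

# Non-capturing group: split returns only the text between delimiters
my @plain = split /\b(?:a|b|c)\b/, $str;
# Capturing group: split also returns each captured delimiter
my @extra = split /\b(a|b|c)\b/, $str;

# The spaces around | are the spaces from the input string
print join('|', @plain), "\n";   # x | y | z
print join('|', @extra), "\n";   # x |a| y |b| z
```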

I did get the result expected from the first expression when I moved the
second ? inside the nested (?:...) group, making it (?:\S{80,}?).
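Moving the ? changes what it applies to: outside the group it makes the
whole 80+ run optional (and leaves \S{80,} greedy); inside the group it
makes \S{80,} non-greedy.  A sketch comparing the two on made-up data (a
199-character run with a comma every 10 characters, not the test data
mentioned above):

```perl
use strict;
use warnings;

# Made-up test data: 20 nine-letter chunks joined by commas (199 chars, no whitespace)
my $run = join ',', ('abcdefghi') x 20;

# ? outside the group: the group is optional, but when it does match,
# \S{80,} is greedy and runs on to the LAST comma in the run
(my $outer = $run) =~ s/((?:\S{80,})?[,;.])(\S)/$1 $2/g;
# ? moved inside the group: \S{80,}? is non-greedy, like the first expression
(my $inner = $run) =~ s/((?:\S{80,}?)[,;.])(\S)/$1 $2/g;
(my $first = $run) =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;

my $outer_spaces = ($outer =~ tr/ //);
my $inner_spaces = ($inner =~ tr/ //);

print "? outside: $outer_spaces space(s)\n";   # ? outside: 1 space(s)
print "? inside:  $inner_spaces space(s)\n";   # ? inside:  2 space(s)
print "inside == first: ", ($inner eq $first ? "yes" : "no"), "\n";   # inside == first: yes
```

On a single long run the "? outside" version behaves like the greedy
match, which is consistent with the one-space result described above.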

 

where the '?' seems active and every ,.; gets a space after it.
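With the ? outside, the whole (?:\S{80,}) group is optional, so a lone
[,;.] followed by a non-whitespace character is enough for a match; that
would account for every ,.; getting a space.  A sketch with a made-up
line of short words, none anywhere near 80 characters:

```perl
use strict;
use warnings;

my $text = 'a,b c;d e.f';   # made-up input: short words only

# First expression: needs a real 80+ run, so nothing here qualifies
(my $first = $text) =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;
# Third expression: the outer ? makes (?:\S{80,}) optional, so a bare
# [,;.] followed by a non-whitespace character is enough
(my $outer = $text) =~ s/((?:\S{80,})?[,;.])(\S)/$1 $2/g;

print "first: $first\n";   # first: a,b c;d e.f
print "outer: $outer\n";   # outer: a, b c; d e. f
```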

Thinking about the first one, this is how that one *should* behave (I
think), but apparently the '?' isn't really allowing zero of those.
Trying:

$text =~ s/(\S{80,}*[,;.])(\S)/$1 $2/g;

 

gets a nested quantifier RE error.  Just curious if there's an
explanation for this.
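The error can be reproduced without stopping the script by compiling the
patterns at run time inside eval.  This sketch (the pattern strings are
illustrative) shows that ? right after {80,} is accepted, * right after
it is rejected, and * becomes legal again once the 80+ run is wrapped in
a group:

```perl
use strict;
use warnings;

# Patterns are kept in single-quoted strings so they are compiled at run
# time inside eval, instead of stopping the whole script at compile time.
my %patterns = (
    lazy    => '\S{80,}?[,;.]',       # ? after a quantifier: non-greedy, legal
    nested  => '\S{80,}*[,;.]',       # * after a quantifier: rejected
    grouped => '(?:\S{80,})*[,;.]',   # * applied to a group: legal
);

my %error;
for my $name (sort keys %patterns) {
    my $pat = $patterns{$name};
    my $re  = eval { qr/$pat/ };      # undef if the pattern fails to compile
    $error{$name} = defined $re ? '' : $@;
    print "$name: ", (defined $re ? "compiles\n" : $@);
}
```

The exact wording of $@ can vary between Perl versions, but it should
mention "Nested quantifiers".  If "zero or more" of the 80+ run is really
what's wanted, the group has to be explicit, as in the grouped pattern;
with ? in place of *, that is the third expression above.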

 

Thanks.

 

a

-------------------

Andy Bach

Systems Mangler

Internet: [EMAIL PROTECTED]

Voice: (608) 261-5738 Fax: 264-5932

 

It's is not its, it isn't ain't, and it's it's, not its, if you mean it
is.  If you don't, it's its.  Then too, it's hers.  It isn't her's.
It isn't our's either.  It's ours, and likewise yours and theirs.

         -- Oxford University Press, Edpress News

 

_______________________________________________

Perl-Unix-Users mailing list

Perl-Unix-Users@listserv.ActiveState.com

To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

 

