See my comments below. I'm not sure if I helped or not, but hopefully I at least clarified the '?'.
Matt

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, December 10, 2008 13:39
Cc: Perl-Unix-Users@listserv.activestate.com
Subject: [Perl-unix-users] '?' on a {} char length specification

Hi

We received some code to split up long non-white space strings - it has:

    $text =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;

[MAS] The ? here doesn't mean "match 0 or 1 times"; after a {} quantifier it means non-greedy matching. That is why * gives an error in your statement below: it has no meaning in this context.

That is, if you've got 80+ non-whitespaces in a row (supposedly 'zero or one'), followed by any of (,.;) and then another non-whitespace, insert a blank. This works but behaves the same as:

    $text =~ s/(\S{80,}[,;.])(\S)/$1 $2/g;

[MAS] This is a greedy match, so in theory it should match fewer times than the statement above, potentially placing a space only after the last [,;.].

that is, w/o the '?' after the char spec, but different than:

    $text =~ s/((?:\S{80,})?[,;.])(\S)/$1 $2/g;

[MAS] I'm not sure why this works for you. It didn't for me when I created some test data and tried it. Mine worked the same as the greedy match and inserted only one space, at the last [,;.]. In this context the ?: is for grouping purposes, so that (?:a|b|c) is the same as (a|b|c) except that $1 doesn't get assigned anything, because the ?: tells it not to capture a backreference for that group. However, you have it nested in another group that does capture; otherwise $1 would not be set. Here is a snippet from O'Reilly under regular expression extensions:

    (?:...)
    This groups things like "(...)" but doesn't make backreferences like "(...)" does. So:
        split(/\b(?:a|b|c)\b/)
    is like:
        split(/\b(a|b|c)\b/)
    but doesn't actually save anything in $1, which means that the first split doesn't spit out extra delimiter fields as the second one does.

I did get the expected result for the first expression when I moved the second ? inside the nested () of this expression.

where here the '?' seems active and every ,.; has a space after it. Thinking about the first one, this is how that one *should* behave (I think) but apparently the '?' isn't really allowing zero of those. Trying:

    $text =~ s/(\S{80,}*[,;.])(\S)/$1 $2/g;

gets a nested quantifier RE error. Just curious if there's an explanation for this.

Thanks.

a
-------------------
Andy Bach
Systems Mangler
Internet: [EMAIL PROTECTED]
Voice: (608) 261-5738  Fax: 264-5932

It's is not its, it isn't ain't, and it's it's, not its, if you mean it is. If you don't, it's its. Then too, it's hers. It isn't her's. It isn't our's either. It's ours, and likewise yours and theirs.
-- Oxford University Press, Edpress News

_______________________________________________
Perl-Unix-Users mailing list
Perl-Unix-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
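For anyone who wants to reproduce the comparison discussed above, here is a minimal Perl sketch. The test strings ($long, $short) and the helper names (lazy, greedy, optional_group, count_spaces) are made up for illustration and are not from the original code, so real data may well behave differently.

```perl
use strict;
use warnings;

# Hypothetical test data (not from the original post):
# eight 50-character chunks joined by commas, i.e. one 407-character
# run with no whitespace, plus a short string well under 80 characters.
my $long  = join ',', ('x' x 50) x 8;
my $short = 'a,b;c.d';

# Non-greedy: {80,}? takes as few characters as possible (at least 80),
# so each match stops at the first qualifying [,;.] after 80 characters.
sub lazy {
    my ($text) = @_;
    $text =~ s/(\S{80,}?[,;.])(\S)/$1 $2/g;
    return $text;
}

# Greedy: {80,} takes as many characters as possible, then backtracks
# to the last [,;.] that is followed by a non-whitespace character.
sub greedy {
    my ($text) = @_;
    $text =~ s/(\S{80,}[,;.])(\S)/$1 $2/g;
    return $text;
}

# Optional group: (?:...)? makes the whole 80+ run optional, so a bare
# [,;.] can also match when no 80+ run precedes it.
sub optional_group {
    my ($text) = @_;
    $text =~ s/((?:\S{80,})?[,;.])(\S)/$1 $2/g;
    return $text;
}

# Count the blanks a substitution inserted (tr with an empty
# replacement list only counts, it does not modify the string).
sub count_spaces {
    my ($text) = @_;
    return ($text =~ tr/ //);
}

printf "lazy, long run:           %d space(s)\n", count_spaces(lazy($long));
printf "greedy, long run:         %d space(s)\n", count_spaces(greedy($long));
printf "optional group, long run: %d space(s)\n", count_spaces(optional_group($long));
print  "optional group, short:    ", optional_group($short), "\n";

# Build the fourth pattern at run time so the compile error can be caught.
my $bad      = '\S{80,}*';
my $compiled = eval { qr/$bad/ };
my $error    = $@;
print "compile error: $error" unless defined $compiled;
```

On this made-up data the non-greedy form inserts three spaces, while the greedy form and the optional-group form each insert only one, after the last comma of the long run. On the short string the optional-group form puts a space after every , ; and . because the 80+ run can be skipped entirely. That may be why the two of us saw different results: it seems to depend on whether 80 or more non-whitespace characters come before the punctuation in the test data.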