Re: RFC 145 (alternate approach)
I think it's cool too, I don't like the @^g and @^G either. But I worry about the double meaning of the []'s in your solution, and the fact that these:

    /\m[...]...\M/;
    /\d[...]...\D/;

will work so differently. Maybe another character like ()'s that takes a list:

    /\m(,[).*?\M(,])/;

That solves the multiple characters problem at least. However, we still have a \M and \m, which isn't consistent if they're going to take arguments.

But, how about a new ?m operator?

    /(?m|[).*?(?M|])/;

Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets, consistent with other regex patterns.

-Nate

David Corbin wrote:
> I never saw one comment on this, and the more I think about it, the more I like it. So, I thought I'd throw it back out one more time... (If I get no comments this time, I'll be quiet :)
>
> David Corbin wrote:
> > I haven't given this a WHOLE lot of thought, so please, shoot it full of holes.
> >
> > I certainly like the goal of this RFC, but I dislike the idea that the specification for what characters are going to match is specified outside of the RE.
Re: RFC 145 (alternate approach)
Nathan Wiger wrote:
> I think it's cool too, I don't like the @^g and @^G either. But I worry about the double meaning of the []'s in your solution, and the fact that these:
>
>     /\m[...]...\M/;
>     /\d[...]...\D/;

Well, it's not really a double meaning. It's a set of characters, just like '[]' always means. Granted, the meaning between upper- and lowercase characters is not the same here, but I don't think it always is the same currently (positive/negative).

> Will work so differently. Maybe another character like ()'s that takes a list:
>
>     /\m(,[).*?\M(,])/;

If you don't want to use [] (which limits it to single character "para-brace-ets"), then I'd suggest using {}, as that is already established for use with \?-type escapes. Maybe:

    m/\m{()|(\[)}.*?\M{()|(])}/;

Essentially everything inside the {} is in fact another pattern, and the back-references within match "1-for-1". Of course, with this syntax you'd have to escape actual braces, m{\{}, which I don't much care for...

> That solves the multiple characters problem at least. However, we still have a \M and \m, which isn't consistent if they're going to take arguments.

I'm not sure I understand your point here.

> But, how about a new ?m operator?
>
>     /(?m|[).*?(?M|])/;

Let's combine your operator with my example from above, where everything inside the (?m) or the (?M) fits the syntax of a RE.

    /(?m()|\[).*?(?M()|(\]))

> Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets, consistent with other regex patterns.

You can do that, or you can say it's done with backreferences (as noted above).

> -Nate
>
> David Corbin wrote:
> > I never saw one comment on this, and the more I think about it, the more I like it. So, I thought I'd throw it back out one more time... (If I get no comments this time, I'll be quiet :)
> >
> > David Corbin wrote:
> > > I haven't given this a WHOLE lot of thought, so please, shoot it full of holes.
> > >
> > > I certainly like the goal of this RFC, but I dislike the idea that the specification for what characters are going to match is specified outside of the RE.

--
David Corbin
Mach Turtle Technologies, Inc.
http://www.machturtle.com
[EMAIL PROTECTED]
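For comparison, the non-nested part of the "closer is chosen by whichever opener matched" idea can already be approximated in Perl 5 with the existing (?(condition)yes-pattern|no-pattern) construct. The following is only a sketch of that approximation, not the syntax proposed in this thread; the names $paired and inner are made up for illustration, and it does not handle nesting.

```perl
use strict;
use warnings;

# Group 1 is set only when the opener was "(", so the conditional
# (?(1)...|...) requires ")" in that case and "]" otherwise.
# Group 2 captures the text between opener and closer (non-greedy).
my $paired = qr/(?:(\()|\[)(.*?)(?(1)\)|\])/;

# Hypothetical helper: return the enclosed text, or undef if no pair is found.
sub inner {
    my ($s) = @_;
    return $s =~ $paired ? $2 : undef;
}

print inner('(a]b)'), "\n";   # the "]" is skipped because the opener was "("
print inner('[a)b]'), "\n";   # the ")" is skipped because the opener was "["
```

This only scales to two bracket kinds comfortably; each additional pair needs another capture group and a nested conditional, which is part of what the proposals here try to avoid.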
Re: RFC 145 (alternate approach)
Richard Proctor wrote:
> No ?] should match the closest ?[ it should nest the ?[s bound by any brackets in the regex and act accordingly.

Good point.

> Also this does not work as a definition of simple bracket matching as you need ( to match ) not ( to match (. A ?[ list should specify for each element what the matching element is perhaps

Actually, it should with some simple precedence rules. If ?] reverses the ordering of ?[, *and* we define "reversing" for bracketed pairs consistent with the current Perl definition in other contexts, then this is all automatic:

    "normal"    "reversed"
    --------    ----------
    103301      99aa99
    ((          ))
    +           +
    {{[!_       _!]}}
    {__A1(      )A1__}

That is, when a bracket is encountered, the "reverse" of that is automatically interpreted as its closing counterpart. This is the same reason why qq// and qq() and qq{} all work without special notation.

So we can replace @^g and @^G with simple precedence rules, the same that are actually invoked automatically throughout Perl already.

> (?[( = ),{ = }, 01 = 10) sort of hashish in style.

I actually think this is redundant, for the reasons I mentioned above. I'm not striking it down outright, but it seems simple rules could make all this unnecessary.

-Nate
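One possible reading of the "reversing" rule is: reverse the character order of the opener, and swap each opening bracket for its partner. The sketch below implements just that reading; the %MIRROR table and the closing_for name are assumptions made for illustration, and the alphanumeric rows in the examples above are not covered by it.

```perl
use strict;
use warnings;

# Assumed bracket table, the same four pairs Perl's quote-like operators use.
my %MIRROR = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

# Hypothetical helper: derive a closer from an opener by reversing the
# characters and mirroring any opening bracket; other characters are kept.
sub closing_for {
    my ($opener) = @_;
    my $closer = '';
    for my $ch (reverse split //, $opener) {
        $closer .= exists $MIRROR{$ch} ? $MIRROR{$ch} : $ch;
    }
    return $closer;
}

print closing_for('(('),    "\n";
print closing_for('{{[!_'), "\n";
print closing_for('+'),     "\n";
```

Under this reading a non-bracket delimiter such as + closes itself, just as qq// does.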
Re: RFC 145 (alternate approach)
David Corbin wrote:
> I've got some vague ideas on solving all of these, I'll go into them if people like the basic concept enough.

Not just in regexes, but in general, a way to extend the set of bratches that Perl knows about would be very nice. For instance, it is very difficult for people using European keyboards to produce curlies. If it was possible to say that Q is the opening brace and it matches against q later, or any arbitrary characters, such as the single-character versions of and which I am not capable of producing; if it was possible to specify this in the code somewhere, for instance

    $CORE::BRATCH{'Q'} = 'q';

(or maybe lexically scoped), after that one could say

    $isafromline = qrQ^Fromq;

for instance.

--
David Nicol 816.235.1187 [EMAIL PROTECTED]
perl -e'@w=;for(;;){sleep print[rand@w]}' /usr/dict/words
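To make the idea concrete, here is a small sketch of what a user-extensible bracket table could do. Nothing below is an existing Perl facility: %BRATCH is an ordinary lexical hash standing in for the proposed $CORE::BRATCH, and extract_bratched is a hypothetical helper that handles single-character delimiters only.

```perl
use strict;
use warnings;

# Stand-in for the proposed table: opener => closer.
my %BRATCH = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');
$BRATCH{'Q'} = 'q';    # the user-defined pair from the message above

# Return the text between the opener at the start of $src and its closer,
# counting nested occurrences of the same pair. Returns undef when the
# first character is not a known opener or the pair never closes.
sub extract_bratched {
    my ($src) = @_;
    my $open  = substr($src, 0, 1);
    my $close = $BRATCH{$open};
    return undef unless defined $close;
    my $depth = 0;
    for my $i (0 .. length($src) - 1) {
        my $c = substr($src, $i, 1);
        if ($c eq $open) {
            $depth++;
        }
        elsif ($c eq $close) {
            return substr($src, 1, $i - 1) if --$depth == 0;
        }
    }
    return undef;
}

print extract_bratched('Q^Fromq rest'), "\n";   # the body of qrQ^Fromq
```

A real implementation would live in the tokenizer rather than in user code, and would have to decide whether the table is global or lexically scoped, as the message notes.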
Re: RFC 145 (alternate approach)
On Tue 05 Sep, David Corbin wrote:
> Nathan Wiger wrote:
> > But, how about a new ?m operator?
> >
> >     /(?m|[).*?(?M|])/;
>
> Let's combine your operator with my example from above, where everything inside the (?m) or the (?M) fits the syntax of a RE.
>
>     /(?m()|\[).*?(?M()|(\]))
>
> > Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets, consistent with other regex patterns.

There already is a (?m

The whole (?x set of thingies is getting complicated...

The list of what is used at present (and in current suggestions) is:

Current use in perl5

    (?#          comment
    (?imsx       flags
    (?-imsx      flags
    (?:          subexpression without bracket capture
    (?=          zero-width positive look ahead
    (?!          zero-width negative look ahead
    (?<=         zero-width positive look behind
    (?<!         zero-width negative look behind
    (?{code}     execute code
    (??{code}    execute code and use result as pattern
    (?>          independent subexpression
    (?(condition)yes-pattern
    (?(condition)yes-pattern|no-pattern

Suggested in RFCs either current or in development

    (?$foo=      suggested for assignment (RFC 112)
    (?%foo=      suggested for hash assignment (RFC 150?)
    (?@foo       suggested list expansion (?:$foo[0] | $foo[1] | ...) ? (RFC 166)
    (?Q@foo)     quote each item of lists (RFC 166)
    (?^pattern)  matches anything that does not match pattern (RFC 166 but will be somewhere else on next rewrite [1])
    (?F          failure tokens (RFC in development by me [1])
    (?r),(?f)    suggested in Direction Control RFC 1
    (?           Boolean regexes (RFC in development [1])
    (?*{code})   execute code with pass/fail result (RFC in development [1])

[1] These will all be in an RFC which will probably be out in a day or so.

Unused (? sequences

    a,b,c,d,e, ,g,h, ,j,k,l, ,n,o,p,q, , ,t,u,v,w,x,y,z
    A,B,C,D,E, ,G,H,I,J,K,L,M,N,O,P, ,R,S,T,U,V,W,X,Y,Z
    0,1,2,3,4,5,6,7,8,9
    `_,."+[];'~)

(If I have forgotten any, do tell and I will try and keep this list up to date.)

Richard
--
[EMAIL PROTECTED]
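For readers who want to see the clash, this short example shows what (?m already does in Perl 5: it is the inline form of the /m flag, which lets ^ and $ match at embedded newlines. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# Without /m, ^ anchors only at the very start of the string.
my $plain = ($text =~ /^second/)     ? 1 : 0;

# With the inline (?m) flag, ^ also matches just after each newline.
my $multi = ($text =~ /(?m)^second/) ? 1 : 0;

print "plain=$plain multi=$multi\n";
```

Since (?m...) already parses as a flag group, reusing it for bracket matching would change the meaning of existing patterns.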
Re: RFC 145 (alternate approach)
I think David's on to something good here. A major problem with holding the bracket-matching possibilities in a special variable (or a pair of them) is that one can't figure out what the RE is going to do just by looking at it -- you have to look elsewhere.

Nathan Wiger wrote:
> I think it's cool too, I don't like the @^g and @^G either. But I worry about the double meaning of the []'s in your solution, and the fact that these:
>
>     /\m[...]...\M/;
>     /\d[...]...\D/;
>
> Will work so differently.

Yes. Things that look similar should act similar. Things that act differently should look different.

> But, how about a new ?m operator?
>
>     /(?m|[).*?(?M|])/;
>
> Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets, consistent with other regex patterns.

Ah, this is a neat idea! Unfortunately, as Richard Proctor pointed out, ?m is taken. Perhaps

    (?[list|of|openers) and (?]list|of|closers) ?

Does that look too bizarre, with the lone square bracket in each? Or does that serve to make it mnemonic (which is my intention)?

And --- can-of-worms time --- we're only intending the list elements to be constant characters, but that syntax *looks* like it can take a regular expression for any of the list elements, so people are going to try to do that someday. I cannot imagine why someone would want to use a regexp in such a construct, but abuses of the language are not limited to *my* imagination :-)

(?[list|of|openers) would match any expression in the alternation list. Subsequently, (?]list|of|closers) would match the *corresponding* expression, but would keep track of the nesting level of the originally-matching open-bracket expression.

Sound about right?

--
Eric J. Roode, [EMAIL PROTECTED]           print  scalar  reverse  sort
Senior Software Engineer                'tona ', 'reh', 'ekca', 'lre',
Myxa Corporation                        '.r', 'h ', 'uj', 'p ', 'ts';
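The semantics described above (match any opener, then find the corresponding closer while tracking the nesting level of that same pair) can be emulated in plain Perl 5 as a rough sketch. The match_paired function and its parallel-list calling convention are assumptions made for illustration, not part of any RFC; list elements are treated as literal strings, and a pair whose opener equals its closer is not handled.

```perl
use strict;
use warnings;

# Hypothetical emulation of (?[openers) ... (?]closers).
# $openers and $closers are array refs of literal strings, paired by index.
# Returns the text enclosed by the first opener found in $text and its
# corresponding closer, or undef if no opener is found or it never closes.
sub match_paired {
    my ($text, $openers, $closers) = @_;
    my $any_open = join '|', map { quotemeta($_) } @$openers;
    return undef unless $text =~ /($any_open)/;
    my $found = $1;
    my $start = $+[0];                        # offset just past the opener
    my ($idx) = grep { $openers->[$_] eq $found } 0 .. $#$openers;
    my $open  = quotemeta($openers->[$idx]);
    my $close = quotemeta($closers->[$idx]);
    my $depth = 1;
    pos($text) = $start;
    while ($text =~ /($open)|$close/g) {
        $depth += defined $1 ? 1 : -1;        # same opener nests, closer unnests
        return substr($text, $start, $-[0] - $start) if $depth == 0;
    }
    return undef;
}

my @open  = ('(', '[');
my @close = (')', ']');
print match_paired('x [a [b] c] y', \@open, \@close), "\n";
print match_paired('(a]b)',         \@open, \@close), "\n";
```

In the second call the stray ] is ignored because only the closer that corresponds to the matched opener is considered, which is the "corresponding expression" behaviour described above.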