Re: RFC 145 (alternate approach)

2000-09-05 Thread Nathan Wiger

I think it's cool too, I don't like the @^g and @^G either. But I worry
about the double-meaning of the []'s in your solution, and the fact that
these:

   /\m[...]...\M/;
   /\d[...]...\D/;

Will work so differently. Maybe another character like ()'s that takes a
list:

   /\m(,[).*?\M(,])/;

That solves the multiple characters problem at least. However, we still
have a \M and \m, which isn't consistent if they're going to take
arguments.

But, how about a new ?m operator?

   /(?m|[).*?(?M|])/;

Then the ?M matches pairs with the previous ?m, if there was one that
was matched. The | character separates or'ed sets consistent with other
regex patterns.

-Nate


David Corbin wrote:
 
> I never saw one comment on this, and the more I think about it, the more
> I like it. So, I thought I'd throw it back out one more time... (If I get
> no comments this time, I'll be quiet :)
>
> David Corbin wrote:
>
> > I haven't given this a WHOLE lot of thought, so please, shoot it full
> > of holes.
> >
> > I certainly like the goal of this RFC, but I dislike the idea that the
> > specification of which characters are going to match lives outside of
> > the RE.



Re: RFC 145 (alternate approach)

2000-09-05 Thread David Corbin

Nathan Wiger wrote:
 
> I think it's cool too, I don't like the @^g and @^G either. But I worry
> about the double-meaning of the []'s in your solution, and the fact that
> these:
>
>    /\m[...]...\M/;
>    /\d[...]...\D/;

Well, it's not really a double meaning.  It's a set of characters, just
like '[]' always means.  Granted, the meaning between upper and lower case
characters is not the same here, but I don't think it always is the same
currently (positive/negative).

 
> Will work so differently. Maybe another character like ()'s that takes a
> list:
>
>    /\m(,[).*?\M(,])/;
 
If you don't want to use [] (which limits it to single-character
"para-brace-ets"), then I'd suggest using {}, as that is already
established for use with \?-type escapes.

Maybe:  m/\m{()|(\[)}.*?\M{()|(])}/;

Essentially everything inside the {} is in fact another pattern, and the
back-references within match "1-for-1".  Of course, with this syntax you'd
have to escape actual braces, m{\{}, which I don't much care for...
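
For what it's worth, Perl 5's existing (?(condition)yes-pattern|no-pattern)
construct can already tie a closer to whichever opener matched, which is
roughly the "1-for-1" idea, though it does nothing about nesting. A minimal
sketch, where $pair and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Group 1 captures only when the opener was "(".  The conditional then
# demands ")" if group 1 matched and "]" otherwise.
my $pair = qr/(?:(\()|\[)(.*?)(?(1)\)|\])/;

for my $s ('f(a]b)c', 'x[1)2]y') {
    # $2 holds the text between the opener and its corresponding closer.
    print "$s => $2\n" if $s =~ $pair;
}
```

This only scales to a handful of hard-coded pairs and cannot count nested
openers, which is the gap the proposals in this thread are trying to fill.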

> That solves the multiple characters problem at least. However, we still
> have a \M and \m, which isn't consistent if they're going to take
> arguments.

I'm not sure I understand your point here.


 
> But, how about a new ?m operator?
>
>    /(?m|[).*?(?M|])/;
 

Let's combine your operator with my example from above, where everything
inside the (?m) or the (?M) fits the syntax of an RE.

/(?m()|\[).*?(?M()|(\]))

> Then the ?M matches pairs with the previous ?m, if there was one that
> was matched. The | character separates or'ed sets consistent with other
> regex patterns.

You can do that, or you can say it's done with backreferences (as noted
above)
 
> -Nate
>
> David Corbin wrote:
>
> > I never saw one comment on this, and the more I think about it, the more
> > I like it. So, I thought I'd throw it back out one more time... (If I get
> > no comments this time, I'll be quiet :)
> >
> > David Corbin wrote:
> >
> > > I haven't given this a WHOLE lot of thought, so please, shoot it full
> > > of holes.
> > >
> > > I certainly like the goal of this RFC, but I dislike the idea that the
> > > specification of which characters are going to match lives outside of
> > > the RE.

-- 
David Corbin
Mach Turtle Technologies, Inc.
http://www.machturtle.com
[EMAIL PROTECTED]



Re: RFC 145 (alternate approach)

2000-09-05 Thread Nathan Wiger

Richard Proctor wrote:
 
> No ?] should match the closest ?[ it should nest the ?[s bound by any
> brackets in the regex and act accordingly.

Good point.
 
> Also this does not work as a definition of simple bracket matching as you
> need ( to match ) not ( to match (.  A ?[ list should specify for each
> element what the matching element is perhaps

Actually, it should with some simple precedence rules. If ?] reverses
the ordering of ?[, *and* we define "reversing" for bracketed pairs
consistent with the current Perl definition in other contexts, then this
is all automatic:

   "normal"       "reversed"
   --------       ----------
   103            301
   99aa           aa99
   ((             ))
   +              +
   {{[!_          _!]}}
   {__A1(         )A1__}

That is, when a bracket is encountered, the "reverse" of that is
automatically interpreted as its closing counterpart. This is the same
reason why qq// and qq() and qq{} all work without special notation. 

So we can replace @^g and @^G with simple precedence rules, the same
that are actually invoked automatically throughout Perl already.
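
As a rough illustration of that rule, here is a sketch in Perl 5. The name
closing_for and the %MIRROR table are assumptions for the example, not
anything perl provides, and it reads "reversing" as strict character
reversal plus bracket mirroring, which is only one possible reading:

```perl
use strict;
use warnings;

# Hypothetical table of characters that flip when reversed.
my %MIRROR = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

# Hypothetical helper: derive the closer from the opener by reversing the
# characters and mirroring any bracket among them.
sub closing_for {
    my ($open) = @_;
    return join '', map { exists $MIRROR{$_} ? $MIRROR{$_} : $_ }
                    reverse split //, $open;
}

print "$_ closes with ", closing_for($_), "\n" for '103', '99aa', '((', '{{[!_';
```

Under this strict reading an opener of {__A1( would close with )1A__},
so keeping a token like A1 intact would need a token-aware rule rather
than plain character reversal.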

>    (?[( = ),{ = }, 01 = 10)
>
> sort of hashish in style.

I actually think this is redundant, for the reasons I mentioned above.
I'm not striking it down outright, but it seems simple rules could make
all this unnecessary. 

-Nate



Re: RFC 145 (alternate approach)

2000-09-05 Thread David L. Nicol

David Corbin wrote:

> I've got some vague ideas on solving all of these, I'll go into if
> people like the basic concept enough.

Not just in regexes, but in general, a way to extend the set of bratches
that Perl knows about would be very nice.  For instance, it is very difficult
for people using European keyboards to produce curlies. It would help if it
were possible to say that Q is the opening brace and it matches against q
later, or any arbitrary characters, such as the single-character versions
of << and >> (which I am not capable of producing), and to specify this in
the code somewhere, for instance

$CORE::BRATCH{'Q'} = 'q';

(or maybe lexically scoped)

after that one could say 

$isafromline = qrQ^Fromq;

for instance.


-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
   perl -e'@w=;for(;;){sleep print[rand@w]}' /usr/dict/words



Re: RFC 145 (alternate approach)

2000-09-05 Thread Richard Proctor

On Tue 05 Sep, David Corbin wrote:
> Nathan Wiger wrote:
> >
> > But, how about a new ?m operator?
> >
> >    /(?m|[).*?(?M|])/;
> >
>
> Let's combine your operator with my example from above, where everything
> inside the (?m) or the (?M) fits the syntax of an RE.
>
>    /(?m()|\[).*?(?M()|(\]))
>
> > Then the ?M matches pairs with the previous ?m, if there was one that
> > was matched. The | character separates or'ed sets consistent with other
> > regex patterns.

There already is a (?m

The whole (?x set of thingies is getting complicated...  The list of what is
used at present (and in current suggestions) is:

Current Use in perl5

(?#          comment
(?imsx       flags
(?-imsx      flags
(?:          subexpression without bracket capture
(?=          zero-width positive look ahead
(?!          zero-width negative look ahead
(?<=         zero-width positive look behind
(?<!         zero-width negative look behind
(?{code})    Execute code
(??{code})   Execute code and use result as pattern
(?>          Independent subexpression
(?(condition)yes-pattern
(?(condition)yes-pattern|no-pattern

Suggested in RFCs either current or in development

(?$foo= suggested for assignment (RFC 112)
(?%foo= suggested for hash assignment (RFC 150?)

(?@foo  suggested list expansion (?:$foo[0] | $foo[1] | ...) ? (RFC 166)
(?Q@foo) Quote each item of lists (RFC 166)
(?^pattern) matches anything that does not match pattern (RFC 166 but will
            be somewhere else on next rewrite [1])
(?F Failure tokens (RFC in development by me [1])
(?r),(?f)   Suggested in Direction Control RFC 1
(? Boolean regexes (RFC in development [1])
(?*{code})  Execute code with pass/fail result (RFC in development [1])

[1] these will all be in an RFC which will probably be out in a day or so.

Unused (? sequences

a,b,c,d,e, ,g,h, ,j,k,l, ,n,o,p,q, , ,t,u,v,w,x,y,z
A,B,C,D,E, ,G,H,I,J,K,L,M,N,O,P, ,R,S,T,U,V,W,X,Y,Z
0,1,2,3,4,5,6,7,8,9
`_,."+[];'~)

(If I have forgotten any, do tell and I will try and keep this list up to
date.)

Richard


-- 

[EMAIL PROTECTED]




Re: RFC 145 (alternate approach)

2000-09-05 Thread Eric Roode

I think David's on to something good here. A major problem with 
holding the bracket-matching possibilities in a special variable
(or a pair of them) is that one can't figure out what the RE is
going to do just by looking at it -- you have to look elsewhere.

Nathan Wiger wrote:
> I think it's cool too, I don't like the @^g and @^G either. But I worry
> about the double-meaning of the []'s in your solution, and the fact that
> these:
>
>    /\m[...]...\M/;
>    /\d[...]...\D/;
>
> Will work so differently.

Yes. Things that look similar should act similar. Things that act
differently should look different.

> But, how about a new ?m operator?
>
>    /(?m|[).*?(?M|])/;
>
> Then the ?M matches pairs with the previous ?m, if there was one that
> was matched. The | character separates or'ed sets consistent with other
> regex patterns.

Ah, this is a neat idea! 

Unfortunately, as Richard Proctor pointed out, ?m is taken. Perhaps
(?[list|of|openers)  and  (?]list|of|closers)   ?

Does that look too bizarre, with the lone square bracket in each?
Or does that serve to make it mnemonic (which is my intention)?

And --- can-of-worms time --- we're only intending the list elements
to be constant characters, but that syntax *looks* like it can take a
regular expression for any of the list elements, so people are going
to try to do that someday. I cannot imagine why someone would want
to use a regexp in such a construct, but abuses of the language are
not limited to *my* imagination :-)

(?[list|of|openers) would match any expression in the alternation
list. Subsequently, (?]list|of|closers) would match the *corresponding*
expression, but would keep track of the nesting level of the originally-
matching open-bracket expression. 
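
To make the bookkeeping concrete, here is a rough Perl 5 sketch of that
behaviour using a stack of expected closers. The name match_balanced and
its parallel opener/closer lists are assumptions for illustration only; it
treats every element as a constant string, not a regexp:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from the first opener through its
# corresponding closer, honouring nesting, or undef if nothing balances.
sub match_balanced {
    my ($text, $openers, $closers) = @_;
    my %close_for;
    @close_for{@$openers} = @$closers;      # opener => corresponding closer

    # Try longer delimiters first so "<<" is preferred over "<".
    my $alt = join '|', map { quotemeta($_) }
              sort { length($b) <=> length($a) } @$openers, @$closers;

    my (@stack, $start);
    while ($text =~ /($alt)/g) {
        my $tok = $1;
        if (@stack && $tok eq $stack[-1]) {  # the closer we are waiting for
            pop @stack;
            return substr($text, $start, pos($text) - $start) unless @stack;
        }
        elsif (exists $close_for{$tok}) {    # an opener: remember its closer
            $start = $-[1] unless @stack;
            push @stack, $close_for{$tok};
        }
        else {
            return if @stack;                # wrong closer inside a bracket
            # a stray closer outside any bracket is simply skipped
        }
    }
    return;                                  # text ended with brackets open
}

my $found = match_balanced('f(a[b]c) d)', ['(', '['], [')', ']']);
print defined $found ? "$found\n" : "no match\n";
```

A real (?[...) would live inside the regex engine and backtrack like any
other construct; this loop only shows the nesting logic.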

Sound about right?
 --
 Eric J. Roode,  [EMAIL PROTECTED]       print  scalar  reverse  sort
 Senior Software Engineer                'tona ', 'reh', 'ekca', 'lre',
 Myxa Corporation                        '.r', 'h ', 'uj', 'p ', 'ts';