Re: lex behavior

2002-06-14 Jonathan Scott Duff

On Thu, Jun 13, 2002 at 03:48:25PM -0700, Larry Wall wrote:
> But the most straightforward way to match longest is probably to use
> :any to get a superposition of matches, and then pull out the longest
> match.

So, does :any return a list of the substrings that matched or a list
of match objects?  Or some polymorphic thing that can be either?
Would something like this work?

rule max ($pat) {
    $0 := {
        reduce { length $^a > length $^b ?? $^a :: $^b } :a $pat;
    }
}

"bacamus" =~ m/<max b.*a | b.*s>/;
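Outside Perl 6, the "superposition of matches, then keep the longest" idea can be approximated in Python. This is a minimal sketch, assuming an :any-style match means "every substring starting at the current position that the whole pattern matches"; `all_matches_at` and `longest` are hypothetical helpers, and `Pattern.fullmatch` with `pos`/`endpos` stands in for the `:any` modifier.

```python
import re
from functools import reduce
from typing import List, Optional

def all_matches_at(pattern: str, text: str, start: int = 0) -> List[str]:
    """Emulate an :any-style match (hypothetical): every substring beginning
    at `start` that the pattern matches in full."""
    rx = re.compile(pattern)
    return [text[start:end]
            for end in range(start, len(text) + 1)
            if rx.fullmatch(text, start, end)]

def longest(matches: List[str]) -> Optional[str]:
    """Same shape as the reduce in the rule above: keep a only if strictly longer."""
    if not matches:
        return None
    return reduce(lambda a, b: a if len(a) > len(b) else b, matches)

candidates = all_matches_at(r"b.*a|b.*s", "bacamus")
print(candidates)           # ['ba', 'baca', 'bacamus']
print(longest(candidates))  # bacamus
```

Trying every end position costs at least one match attempt per character, which is the kind of work a built-in :longest could prune.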

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



lex behavior

2002-06-13 Luke Palmer


I'm still unclear as to how you implement lex-like longest token rule with 
P6 regexes.  If the | operator grabs the first one it matches, how do I 
match "bacamus" out of this?:

"bacamus" =~ / b.*a | b.*s /
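Python's `re` module uses the same first-alternative-wins rule for `|`, so it reproduces the problem directly. A small sketch:

```python
import re

text = "bacamus"

# Ordered alternation: the first alternative that succeeds wins,
# even though a later alternative could match more of the string.
first = re.match(r"b.*a|b.*s", text)
print(first.group(0))    # baca

# Swapping the order changes the result, which a lex-style
# longest-token rule would not do.
swapped = re.match(r"b.*s|b.*a", text)
print(swapped.group(0))  # bacamus
```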


Luke




Re: lex behavior

2002-06-13 Damian Conway

> I'm still unclear as to how you implement lex-like longest token rule with
> P6 regexes.  If the | operator grabs the first one it matches, how do I
> match "bacamus" out of this?:
>
>   "bacamus" =~ / b.*a | b.*s /

Borrow this trick from Parse::RecDescent:

rule max (*@candidates) {{
    my $best;
    my $startpos = .pos;
    for @candidates -> $next {
        .pos = $startpos;
        $best = $0 if /$next/ && (!$best || $0.length > $best.length);
    }
    fail unless $best;
    let $0 := $best;
    .pos = $best.pos;
}}

then:

"bacamus" =~ / <max(/b.*a/, /b.*s/)> /;
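The same Parse::RecDescent-style trick can be sketched in Python, assuming compiled `re` patterns as the candidates and an explicit position argument in place of `.pos`; `match_max` is a hypothetical name. Each candidate restarts from the saved position, the longest success is kept, and the caller resumes from the end of that match.

```python
import re

def match_max(text, pos, *candidates):
    """Try each compiled pattern at `pos` and keep the longest match.

    Returns (match, new_pos), or None when no candidate matches.
    """
    best = None
    for rx in candidates:
        # Every attempt starts from the same saved position (`.pos = $startpos`).
        m = rx.match(text, pos)
        if m and (best is None or m.end() > best.end()):
            best = m
    if best is None:
        return None            # the `fail unless $best` case
    return best, best.end()    # resume after the winner (`.pos = $best.pos`)

best, new_pos = match_max("bacamus", 0, re.compile(r"b.*a"), re.compile(r"b.*s"))
print(best.group(0), new_pos)  # bacamus 7
```

On ties the earlier candidate wins, because only a strictly longer match replaces the current best.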


Damian



RE: lex behavior

2002-06-13 Brent Dax

Damian Conway:
# > I'm still unclear as to how you implement lex-like longest token rule
# > with P6 regexes.  If the | operator grabs the first one it matches,
# > how do I match "bacamus" out of this?:
# >
# >   "bacamus" =~ / b.*a | b.*s /
# 
# Borrow this trick from Parse::RecDescent:
# 
#   rule max (*@candidates) {{
#       my $best;
#       my $startpos = .pos;
#       for @candidates -> $next {
#           .pos = $startpos;
#           $best = $0 if /$next/ && (!$best || $0.length > $best.length);
#       }
#       fail unless $best;
#       let $0 := $best;
#       .pos = $best.pos;
#   }}
# 
# then:
# 
#   "bacamus" =~ / <max(/b.*a/, /b.*s/)> /;

Will that handle captures correctly?  Maybe you should temporize $0...
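The capture concern can be illustrated in Python. This sketch assumes named groups as the "captures" and uses two hypothetical helpers: one writes every attempt's captures into shared state, so a losing candidate's groups survive; the other keeps each attempt's captures in its own match object and commits only the winner's, which is roughly what temporizing `$0` would buy.

```python
import re

def match_max_leaky(text, pos, *candidates):
    """Longest-match helper whose named captures live in one shared dict."""
    captures = {}
    best = None
    for rx in candidates:
        m = rx.match(text, pos)
        if m:
            captures.update(m.groupdict())  # written even if this attempt loses
            if best is None or m.end() > best.end():
                best = m
    return best, captures

def match_max_scoped(text, pos, *candidates):
    """Same search, but each attempt keeps its captures in its own match object."""
    best = None
    for rx in candidates:
        m = rx.match(text, pos)
        if m and (best is None or m.end() > best.end()):
            best = m
    return best, (best.groupdict() if best else {})

with_capture = re.compile(r"b(?P<mid>.*)a")  # shorter match, has a capture
without_capture = re.compile(r"b.*s")        # longer match, no captures

print(match_max_leaky("bacamus", 0, with_capture, without_capture)[1])   # {'mid': 'ac'}
print(match_max_scoped("bacamus", 0, with_capture, without_capture)[1])  # {}
```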

--Brent Dax [EMAIL PROTECTED]
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Early in the series, Patrick Stewart came up to us and asked how warp
drive worked.  We explained some of the hypothetical principles . . .
"Nonsense," Patrick declared.  "All you have to do is say, 'Engage.'"
--Star Trek: The Next Generation Technical Manual




Re: lex behavior

2002-06-13 Luke Palmer

I figured that (I actually did it, in a less-pretty form, in my early 
Perl days when I wrote a syntax highlighter for my website).  So there's 
no elegant way the new regexes support it? That's a shame.

But I see now how state objects are a very cool idea.


Oh, and I just thought I'd let everyone know: I'm writing a vim syntax 
highlighting file for Perl 6 at the moment.  I'll post it when it's in an
acceptable state.


Luke

--
Base 8 is just like base 10 really... if you're missing two fingers.
--Tom Lehrer, New Math




RE: lex behavior

2002-06-13 Larry Wall

On Thu, 13 Jun 2002, David Whipp wrote:
: Second, we should eliminate as much of the syntactic noise as possible:
: 
:   <max b.*a b.*s>
: 
: would be nice -- with parentheses, or the like, needed only when things
: become ambiguous. I think, though am not sure, that having whitespace act as
: an arglist separator in assertions makes it cleaner. There are definitely
: strong counter-arguments. But I would like to minimize the clutter: and the
: baseline is that alternation requires only one character.

That would be problematic as a default rule.  You wouldn't be able
to write assertions like:

<before a | b>

To get more syntactic control would take something like a macro
facility.  But that has its own problems.  Regexes will be tough
to debug even without that.

I think the biggest drawback is that it goes against the shiny new
policy about (in)significant whitespace.

On the other hand, it might be possible with regex introspection
to dissect the alternatives of

<max b.*a | b.*s>

and evaluate them separately.
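A crude form of that dissection can be sketched in Python by splitting the pattern source on top-level `|` and matching each piece on its own. Both helpers are hypothetical, and the splitter is deliberately simplified: it tracks only parentheses, character classes, and backslash escapes, not every corner of regex syntax.

```python
import re

def split_top_level_alternatives(pattern):
    """Split a regex source string on '|' characters that are not escaped
    and not nested inside (...) or [...]."""
    parts, current = [], []
    depth = 0          # nesting level of (...)
    in_class = False   # inside a [...] character class
    escaped = False    # previous character was a backslash
    for ch in pattern:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            continue   # drop the separating '|'
        current.append(ch)
    parts.append("".join(current))
    return parts

def longest_alternative(pattern, text, pos=0):
    """Evaluate each top-level alternative separately and keep the longest match."""
    best = None
    for alt in split_top_level_alternatives(pattern):
        m = re.compile(alt).match(text, pos)
        if m and (best is None or m.end() > best.end()):
            best = m
    return best

print(split_top_level_alternatives(r"b.*a|b.*s"))         # ['b.*a', 'b.*s']
print(longest_alternative(r"b.*a|b.*s", "bacamus").group(0))  # bacamus
```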

But the most straightforward way to match longest is probably to use
:any to get a superposition of matches, and then pull out the longest
match.  Perhaps there could be a :longest that does that internally,
and could optimize away cases that couldn't possibly be longest.
(And possibly even invoke a DFA optimizer to make it one pass, in
the absence of internal captures.)

Larry