Re: lex behavior

2002-06-14 Thread Larry Wall

On Fri, 14 Jun 2002, Jonathan Scott Duff wrote:
: On Thu, Jun 13, 2002 at 03:48:25PM -0700, Larry Wall wrote:
: > But the most straightforward way to match longest is probably to use
: > :any to get a superposition of matches, and then pull out the longest
: > match.  
: 
: So, does :any return a list of the substrings that matched or a list
: of match objects?  Or some polymorphic thing that can be either?
: Would something like this work?
: 
:   rule max ($pat) {
:       $0 := {
:           reduce { length $^a > length $^b ?? $^a :: $^b } <:a $pat>;
:       }
:   }
: 
:   "bacamus" =~ m//;

Yeah, something like that.  Could well be a longest() builtin, or
close-to-builtin, of course.

Larry




Re: lex behavior

2002-06-14 Thread Jonathan Scott Duff

On Thu, Jun 13, 2002 at 03:48:25PM -0700, Larry Wall wrote:
> But the most straightforward way to match longest is probably to use
> :any to get a superposition of matches, and then pull out the longest
> match.  

So, does :any return a list of the substrings that matched or a list
of match objects?  Or some polymorphic thing that can be either?
Would something like this work?

rule max ($pat) {
    $0 := {
        reduce { length $^a > length $^b ?? $^a :: $^b } <:a $pat>;
    }
}

"bacamus" =~ m//;

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: lex behavior

2002-06-14 Thread Damian Conway

Brent Dax asked:

> Will that handle captures correctly?

I believe so. Each (successful) time through the loop we cache
a reference to the candidate's match object, which will successfully
have stored all the captures from the candidate's matching.

Then we reinstate the best candidate, by binding it to $0.
Hence the best candidate (with its stored captures) becomes 
the "official" match of the rule.

> Maybe you should temporize $0...

$0 is lexical to each regex/rule. No need to temporize.

Damian
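
As a rough illustration of why caching the match object (rather than the
matched text) keeps the captures intact, here is a minimal sketch in Python
rather than Perl 6. The function name best_match and the named groups head,
body and tail are hypothetical, chosen only for this example; Python's
re match objects stand in for the rule's match object.

```python
import re

def best_match(candidates, text, start=0):
    """Try each candidate at `start` and keep the longest match object, captures included."""
    best = None
    for pattern in candidates:
        m = re.compile(pattern).match(text, start)
        if m and (best is None or len(m.group(0)) > len(best.group(0))):
            best = m  # cache the whole match object, not just the matched text
    return best

winner = best_match(
    [r"(?P<head>b)(?P<body>.*)a", r"(?P<head>b)(?P<body>.*)(?P<tail>s)"],
    "bacamus",
)
print(winner.group(0))     # bacamus
print(winner.groupdict())  # {'head': 'b', 'body': 'acamu', 'tail': 's'}
```

Because the winner is the stored match object itself, its captures are
exactly those recorded when that candidate matched; nothing from the losing
candidate leaks in.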



RE: lex behavior

2002-06-13 Thread Larry Wall

On Thu, 13 Jun 2002, David Whipp wrote:
: Second, we should eliminate as much of the syntactic noise as possible:
: 
:   
: 
: would be nice -- with parentheses, or the like, needed only when things
: become ambiguous. I think, though am not sure, that having whitespace act as
: an arglist separator in assertions makes it cleaner. There are definitely
: strong counter-arguments. But I would like to minimize the clutter: and the
: baseline is that alternation requires only one character.

That would be problematic as a default rule.  You wouldn't be able
to write assertions like:



To get more syntactic control would take something like a macro
facility.  But that has its own problems.  Regexes will be tough
to debug even without that.

I think the biggest drawback is that it goes against the shiny new
policy about (in)significant whitespace.

On the other hand, it might be possible with regex introspection
to dissect the alternatives of



and evaluate them separately.

But the most straightforward way to match longest is probably to use
:any to get a superposition of matches, and then pull out the longest
match.  Perhaps there could be a :longest that does that internally,
and could optimize away cases that couldn't possibly be longest.
(And possibly even invoke a DFA optimizer to make it one pass, in
the absence of internal captures.)

Larry
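
To make the "superposition, then pull out the longest" idea concrete, here
is a minimal sketch in Python rather than Perl 6. It assumes that the
result of :any can be modelled as the list of every substring an
alternative can match from a fixed start position; the helper names
all_matches and longest_of_any are hypothetical and are not proposed
builtins.

```python
import re

def all_matches(pattern, text, start=0):
    """Return every substring of `text` beginning at `start` that `pattern` matches in full."""
    compiled = re.compile(pattern)
    return [
        text[start:end]
        for end in range(start, len(text) + 1)
        if compiled.fullmatch(text, start, end)
    ]

def longest_of_any(alternatives, text, start=0):
    """Pool the matches of all alternatives and return the longest, or None."""
    pool = [m for alt in alternatives for m in all_matches(alt, text, start)]
    return max(pool, key=len) if pool else None

print(longest_of_any([r"b.*a", r"b.*s"], "bacamus"))  # bacamus
```

This brute-force version enumerates every candidate end position, which is
the kind of work a dedicated :longest could prune or hand to a DFA.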




RE: lex behavior

2002-06-13 Thread David Whipp

Luke Palmer wrote:

> So there's no elegant way the new regexes support it?
> That's a shame.

  

seems fairly elegant to me, with 2 caveats:

First, we need assertions as part of the default library. I.e. we shouldn't
need a C for things like min and max.


Second, we should eliminate as much of the syntactic noise as possible:

  

would be nice -- with parentheses, or the like, needed only when things
become ambiguous. I think, though am not sure, that having whitespace act as
an arglist separator in assertions makes it cleaner. There are definitely
strong counter-arguments. But I would like to minimize the clutter: and the
baseline is that alternation requires only one character.


Dave.



Re: lex behavior

2002-06-13 Thread Luke Palmer

I figured that (I actually did it, in a less-pretty form, in my early 
Perl days when I wrote a syntax highlighter for my website).  So there's 
no elegant way the new regexes support it? That's a shame.

But I see now how state objects are a very cool idea.


Oh, and I just thought I'd let everyone know: I'm writing a vim syntax 
highlighting file for Perl 6 at the moment.  I'll post it when it's in an
acceptable state.

> Borrow this trick from Parse::RecDescent:
> 
>   rule max (*@candidates) {{
>       my $best;
>       my $startpos = .pos;
>       for @candidates -> $next {
>           .pos = $startpos;
>           $best = $0 if /<$next>/ && (!$best || $0.length > $best.length);
>       }
>       fail unless $best;
>       let $0 := $best;
>       .pos = $best.pos;
>   }}
> 
> then:
> 
>   "bacamus" =~ /  /;
> 
> 
> Damian

Luke

--
Base 8 is just like base 10 really... if you're missing two fingers.
--Tom Lehrer, "New Math"




RE: lex behavior

2002-06-13 Thread Brent Dax

Damian Conway:
# > I'm still unclear as to how you implement lex-like longest token rule
# > with P6 regexes.  If the | operator grabs the first one it matches,
# > how do I match "bacamus" out of this?:
# > 
# > "bacamus" =~ / b.*a | b.*s /
# 
# Borrow this trick from Parse::RecDescent:
# 
#   rule max (*@candidates) {{
#       my $best;
#       my $startpos = .pos;
#       for @candidates -> $next {
#           .pos = $startpos;
#           $best = $0 if /<$next>/ && (!$best || $0.length > $best.length);
#       }
#       fail unless $best;
#       let $0 := $best;
#       .pos = $best.pos;
#   }}
# 
# then:
# 
#   "bacamus" =~ /  /;

Will that handle captures correctly?  Maybe you should temporize $0...

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Early in the series, Patrick Stewart came up to us and asked how warp
drive worked.  We explained some of the hypothetical principles . . .
"Nonsense," Patrick declared.  "All you have to do is say, 'Engage.'"
--Star Trek: The Next Generation Technical Manual




Re: lex behavior

2002-06-13 Thread Damian Conway

> I'm still unclear as to how you implement lex-like longest token rule with
> P6 regexes.  If the | operator grabs the first one it matches, how do I
> match "bacamus" out of this?:
> 
> "bacamus" =~ / b.*a | b.*s /

Borrow this trick from Parse::RecDescent:

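rule max (*@candidates) {{
    my $best;
    my $startpos = .pos;
    for @candidates -> $next {
        .pos = $startpos;    # retry every candidate from the same position
        # keep this match if it is the first success or longer than the best so far
        $best = $0 if /<$next>/ && (!$best || $0.length > $best.length);
    }
    fail unless $best;
    let $0 := $best;
    .pos = $best.pos;
}}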

then:

"bacamus" =~ /  /;


Damian
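
For readers who want to experiment with the same trick today, the loop
above can be sketched in Python (not Perl 6) and wrapped in a small
lex-style tokenizer. Everything named here is an assumption made for
illustration: max_rule and tokenize are hypothetical helpers, the token
names are invented, and ties are broken in favour of the earlier rule, as
lex does.

```python
import re

def max_rule(text, pos, rules):
    """Return (name, match) for the longest rule matching at `pos`, or None."""
    best = None
    for name, pattern in rules:
        # Pattern.match(text, pos) starts at `pos`, so every candidate restarts there
        m = re.compile(pattern).match(text, pos)
        # strict > keeps the earlier rule when two matches have equal length
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    return best

def tokenize(text, rules):
    """Split `text` into (name, lexeme) pairs using the longest-match rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best = max_rule(text, pos, rules)
        if best is None or best[1].end() == pos:
            raise ValueError(f"no token matches at position {pos}")
        name, m = best
        tokens.append((name, m.group(0)))
        pos = m.end()  # resume after the winning match, like `.pos = $best.pos`
    return tokens

RULES = [("IF", r"if"), ("IDENT", r"[a-z]+"), ("NUM", r"\d+"), ("WS", r"\s+")]
print(tokenize("iffy if 42", RULES))
# [('IDENT', 'iffy'), ('WS', ' '), ('IF', 'if'), ('WS', ' '), ('NUM', '42')]
```

Note that "iffy" becomes a single IDENT rather than IF followed by "fy",
which is exactly the behaviour a first-match alternation would not give.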