On 2010-05-31, at 5:32 pm, Chris Fields wrote:
> I think, in order to get regexes to work we will need a way of getting the 
> name of the matching regex from the Match object somehow.  Any idea how to do 
> that?


I had started working on this too, and ended up going on quite an adventure... 
my starting idea was something like s/ @from / %to{$/} / to look up the 
replacement based on what was matched, but $/ isn't available in a subst, and 
anyway, as you indicate, $/ would only get the matched text, not the regex's 
name or other identifier.

So I thought of what seemed like a good trick: have each pattern set its own 
replacement through the magic of embedded closures:
        / @from[$n] { $replacement = $to[$n] } /

As long as $replacement had the right scope, it would get the correct value as 
a side-effect of matching, and then could be substituted for the right thing.  
Except it turns out .subst does all the matching first, and then the 
replacement, so everything got replaced with whatever the final match happened 
to be.

However, I was stubborn enough to look at the code for .subst, and saw that 
instead of passing off :g to self.match, I could loop through the individual 
matches manually, and replace each one as it came up, thus using the correct 
value of $replacement.  So I did. 

Of course, I was using :c to continue matching each successive occurrence from 
where the previous one left off, so I had to handle any :c option manually too. 
 And :p.  And :x and :n... well, it started to feel a bit silly, especially 
since .subst is going to change to make $/ work anyway.  But... by then, I had 
it almost working.  So I've also included my mangled version of .subst. (It 
passes the same spectests from subst.t that it did before, except for a couple 
that I'm not sure are right.)

And once .subst was able to re-evaluate the RHS for each match, my .trans 
worked.  Almost.  Interpolating a literal string doesn't seem to work yet, so I 
had to escape all the chars and interpolate it as a rule.  And then there was a 
weird bug where creating regexes in a loop returned a list of copies of the 
same regex.  But it worked spelling them all out without using a loop, so I 
made a long string with all the regexes I needed and then eval()ed, and that 
*did* work.  

But at least the error messages were better, and just since last week!  (Hooray 
for line numbers!)  

As you can see, I didn't pay attention to optimisation, but it does pass the 
spectests that use .trans (rather than tr//); and I didn't do any adverbs.  (I 
reckon the P5 modifiers were dropped because they aren't as useful now: :c and 
:s can be handled with regexes as the search key, and :d is now done simply by 
using an empty string as the replacement value.)



-David

Attachment: trans.p6
Description: Binary data

use v6;
# perl6 subst.pm TEST  # to run the tests
#   tests copied from <http://svn.pugscode.org/pugs/t/spec/S05-substitution/subst.t>
#   with ones that don't parse yet commented out

use MONKEY_TYPING;
augment class Str {

multi method    subst($matcher, $replacement, :$samecase, :$samemark, :$x is copy, :$nth, :g(:$global), :continue(:$c) is copy, *%options) {
# validation for :n or :x or :c or :p <0 ?
    return self if defined $x && $x==0 or defined $nth && $nth==0;      # 0 limits mean don't actually change anything
    
    my $result = '';            # build up the replacement string
    my $match;                  # hold the match(es) as we loop over self
    my @nth = @($nth);          # list context even if just a single value
    my $num = 0;                # number of occurrences (to pick out with nth)
    my $prev = 0;               # ending position of previous match
    
    $c //= 0;                   # start "continuing" from the beginning
    my $times = 0;              # number of matches that count (for :x)
    my $max = $x.max // 1;      # stop after $max matches/1 by default
    
    # We could get all the possible matches at once (:g),
    # but then we have to do all the replacements at once, at the end;
    # so we'll loop manually, and then the replacements can differ
    # according to side-effects of each individual occurrence.
    loop
    {
        $c = $match.?to max $c;             # continue from last match, or :c if already provided
        last if .defined and $_ < $c given %options<p>; # anchored to :p, but already gone past that posn
        
        $match = self.match($matcher, :$c, |%options);  # extra options asssumed to be for m//
        last unless $match ~~ Match and $match;         # stop when we run out of matches
        
        if defined $nth     # then we might want to skip over this occurrence
        {
            $num++;                                     # got another match
            next if $num < @nth[0];                     # skip if not one we want yet
            shift @nth while @nth and $num >= @nth[0];  # one we want, shift it off (and keep shifting in case any subsequent values are smaller, since decreases should be ignored)
        }
        
        # any bits skipped over [didn't match] get added to the result unchanged
        $result ~= self.substr($prev, $match.from - $prev);     
        
        my $real-replacement = ~($replacement ~~ Callable ?? $replacement($match) !! $replacement);
        $real-replacement    = $real-replacement.samecase(~$match) if $samecase;
        $real-replacement    = $real-replacement.samemark(~$match) if $samemark;
        
        $result ~= $real-replacement;
        $prev = $match.to;
        last if defined $nth and not @nth;              # counting nths and ran out
        last unless ++$times < $max or $global or defined $nth && @nth && $max <= 1;
        # Continue if set a max and not there; or if global; or if no max, get as many nths as were specified
    }
    
    return self if defined $x and $times!=any($x);      # no change if not enough matches for :x
    
    $result ~= self.substr($prev);          # append leftovers ($prev will be the end of the final match)
    return $result;
}

}

#------------------------------------------------------------------------------

if @*ARGS ### run tests only if you use a dummy arg
{

use Test;
plan *;

# L<S05/Substitution/>

my $str = 'hello';

is $str.subst(/h/,'f'),       'fello', 'We can use subst';
is $str,                      'hello', '.. withouth side effect';
#?rakudo skip "adverbs on rx// NYI"
skip 1; #is $str.subst(rx:g:i/L/,'p'), 'heppo', '.. with multiple adverbs';

is $str.subst('h','f'),       'fello', '.. or using Str as pattern';
is $str.subst('.','f'),       'hello', '.. with literal string matching';

my $i=0;
is $str.subst(/l/,{$i++}),    'he0lo', 'We can have a closure as replacement';
#?rakudo skip "adverbs on rx// NYI"
skip 1; #is $str.subst(rx:g/l/,{$i++}),'he12o', '.. which act like closure and can be called more then once';
is $str.=subst(/l/,'i'),      'heilo', '.. and with the .= modifier';
is $str,                      'heilo', '.. it changes the receiver';

# not sure about this. Maybe '$1$0' should work.

#?rakudo 3 skip '$/ not involved in .subst yet (unspecced?)'
skip 1; #is 'a'.subst(/(.)/,"$1$0"), '',       '.. and it can not access captures from strings';
skip 1; #is 'a'.subst(/(.)/,{$0~$0}),'aa',     '.. you must wrap it in a closure';
skip 1; #is '12'.subst(/(.)(.)/,{$()*2}),'24', '.. and do nifty things in closures';

{
    is 'a b c d'.subst(/\w/, 'x', :g),      'x x x x', '.subst and :g';
    is 'a b c d'.subst(/\w/, 'x', :global), 'x x x x', '.subst and :global';
    is 'a b c d'.subst(/\w/, 'x', :x(0)),   'a b c d', '.subst and :x(0)';
    is 'a b c d'.subst(/\w/, 'x', :x(1)),   'x b c d', '.subst and :x(1)';
    is 'a b c d'.subst(/\w/, 'x', :x(2)),   'x x c d', '.subst and :x(2)';
    is 'a b c d'.subst(/\w/, 'x', :x(3)),   'x x x d', '.subst and :x(3)';
    is 'a b c d'.subst(/\w/, 'x', :x(4)),   'x x x x', '.subst and :x(4)';
    is 'a b c d'.subst(/\w/, 'x', :x(5)),   'a b c d', '.subst and :x(5)';
    #?rakudo skip ':x(*)'
    skip 1; #is 'a b c d'.subst(/\w/, 'x', :x(*)),   'x x x x', '.subst and :x(*)';

    is 'a b c d'.subst(/\w/, 'x', :x(0..1)), 'x b c d', '.subst and :x(0..1)';
    is 'a b c d'.subst(/\w/, 'x', :x(1..3)), 'x x x d', '.subst and :x(0..3)';
    is 'a b c d'.subst(/\w/, 'x', :x(3..5)), 'x x x x', '.subst and :x(3..5)';
    is 'a b c d'.subst(/\w/, 'x', :x(5..6)), 'a b c d', '.subst and :x(5..6)';
    is 'a b c d'.subst(/\w/, 'x', :x(3..2)), 'a b c d', '.subst and :x(3..2)';

    # string pattern versions
    is 'a a a a'.subst('a', 'x', :g),      'x x x x', '.subst (str pattern) and :g';
    is 'a a a a'.subst('a', 'x', :x(0)),   'a a a a', '.subst (str pattern) and :x(0)';
    is 'a a a a'.subst('a', 'x', :x(1)),   'x a a a', '.subst (str pattern) and :x(1)';
    is 'a a a a'.subst('a', 'x', :x(2)),   'x x a a', '.subst (str pattern) and :x(2)';
    is 'a a a a'.subst('a', 'x', :x(3)),   'x x x a', '.subst (str pattern) and :x(3)';
    is 'a a a a'.subst('a', 'x', :x(4)),   'x x x x', '.subst (str pattern) and :x(4)';
    is 'a a a a'.subst('a', 'x', :x(5)),   'a a a a', '.subst (str pattern) and :x(5)';
    #?rakudo skip ':x(*)'
    skip 1; #is 'a a a a'.subst('a', 'x', :x(*)),   'x x x x', '.subst (str pattern) and :x(*)';

    is 'a a a a'.subst('a', 'x', :x(0..1)), 'x a a a', '.subst (str pattern) and :x(0..1)';
    is 'a a a a'.subst('a', 'x', :x(1..3)), 'x x x a', '.subst (str pattern) and :x(0..3)';
    is 'a a a a'.subst('a', 'x', :x(3..5)), 'x x x x', '.subst (str pattern) and :x(3..5)';
    is 'a a a a'.subst('a', 'x', :x(5..6)), 'a a a a', '.subst (str pattern) and :x(5..6)';
    is 'a a a a'.subst('a', 'x', :x(3..2)), 'a a a a', '.subst (str pattern) and :x(3..2)';
}


{
    is 'a b c d'.subst(/\w/, 'x', :nth(0)), 'a b c d', '.subst and :nth(0)';
    is 'a b c d'.subst(/\w/, 'x', :nth(1)), 'x b c d', '.subst and :nth(1)';
    is 'a b c d'.subst(/\w/, 'x', :nth(2)), 'a x c d', '.subst and :nth(2)';
    is 'a b c d'.subst(/\w/, 'x', :nth(3)), 'a b x d', '.subst and :nth(3)';
    is 'a b c d'.subst(/\w/, 'x', :nth(4)), 'a b c x', '.subst and :nth(4)';
    is 'a b c d'.subst(/\w/, 'x', :nth(5)), 'a b c d', '.subst and :nth(5)';

    # string pattern versions
    is 'a a a a'.subst('a', 'x', :nth(0)), 'a a a a', '.subst (str pattern) and :nth(0)';
    is 'a a a a'.subst('a', 'x', :nth(1)), 'x a a a', '.subst (str pattern) and :nth(1)';
    is 'a a a a'.subst('a', 'x', :nth(2)), 'a x a a', '.subst (str pattern) and :nth(2)';
    is 'a a a a'.subst('a', 'x', :nth(3)), 'a a x a', '.subst (str pattern) and :nth(3)';
    is 'a a a a'.subst('a', 'x', :nth(4)), 'a a a x', '.subst (str pattern) and :nth(4)';
    is 'a a a a'.subst('a', 'x', :nth(5)), 'a a a a', '.subst (str pattern) and :nth(5)';
}

{
    # combining :g and :nth:
    #?rakudo 2 todo 'RT #61130 -- are these tests actually wrong?'
    is 'a b c d'.subst(/\w/, 'x', :nth(1), :g), 'x x x x', '.subst and :g, :nth(1)';
    is 'a b c d'.subst(/\w/, 'x', :nth(2), :g), 'a x c x', '.subst and :g, :nth(2)';
    is 'a b c d'.subst(/\w/, 'x', :nth(3), :g), 'a b x d', '.subst and :g, :nth(3)';
}

{
    # combining :nth with :x
    is 'a b c d e f g h'.subst(/\w/, 'x', :nth(1,2,3,4), :x(3)),
       'x x x d e f g h',
       '.subst with :nth(1,2,3,4)) and :x(3)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :nth(2,4,6,8), :x(2)),
       'a x c x e f g h',
       '.subst with :nth(2,4,6,8) and :x(2)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :nth(2, 4, 1, 6), :x(3)),
       'a x c x e x g h',
       '.subst with :nth(2) and :x(3)';
}

{
    # :p
    is 'a b c d e f g h'.subst(/\w/, 'x', :p(0)),
       'x b c d e f g h',
       '.subst with :p(0)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :p(1)),
       'a b c d e f g h',
       '.subst with :p(1)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :p(2)),
       'a x c d e f g h',
       '.subst with :p(2)';
       
    # :p and :g
    is 'a b c d e f g h'.subst(/\w/, 'x', :p(0), :g),
       'x x x x x x x x',
       '.subst with :p(0) and :g';

    is 'a b c d e f g h'.subst(/\w/, 'x', :p(1), :g),
       'a b c d e f g h',
       '.subst with :p(1) and :g';

    is 'a b c d e f g h'.subst(/\w/, 'x', :p(2), :g),
       'a x x x x x x x',
       '.subst with :p(2) and :g';
}

{
    # :c
    is 'a b c d e f g h'.subst(/\w/, 'x', :c(0)),
       'x b c d e f g h',
       '.subst with :c(0)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :c(1)),
       'a x c d e f g h',
       '.subst with :c(1)';

    is 'a b c d e f g h'.subst(/\w/, 'x', :c(2)),
       'a x c d e f g h',
       '.subst with :c(2)';
       
    # :c and :g
    is 'a b c d e f g h'.subst(/\w/, 'x', :c(0), :g),
       'x x x x x x x x',
       '.subst with :c(0) and :g';

    is 'a b c d e f g h'.subst(/\w/, 'x', :c(1), :g),
       'a x x x x x x x',
       '.subst with :c(1) and :g';

    is 'a b c d e f g h'.subst(/\w/, 'x', :c(2), :g),
       'a x x x x x x x',
       '.subst with :c(2) and :g';

    # :c and :nth(3, 4)
    is 'a b c d e f g h'.subst(/\w/, 'x', :c(0), :nth(3, 4)),
       'a b x x e f g h',
       '.subst with :c(0) and :nth(3, 4)';
    
    is 'a b c d e f g h'.subst(/\w/, 'x', :c(1), :nth(3, 4)),
       'a b c x x f g h',
       '.subst with :c(1) and :nth(3, 4)';
    
    is 'a b c d e f g h'.subst(/\w/, 'x', :c(2), :nth(3, 4)),
       'a b c x x f g h',
       '.subst with :c(2) and :nth(3, 4)';
}

#?rakudo skip 's:global/.../../ NYI'
{
    my $s = "ZBC";
    my @a = ("A", 'ZBC');

    $_ = q{Now I know my abc's};

#    s:global/Now/Wow/;
    is($_, q{Wow I know my abc's}, 'Constant substitution');

#    s:global/abc/$s/;
    is($_, q{Wow I know my ZBC's}, 'Scalar substitution');

#    s:g/BC/@a[]/;
    is($_, q{Wow I know my ZA ZBC's}, 'List substitution');

    dies_ok { 'abc' ~~ s/b/g/ },
            "can't modify string literal (only variables)";
}

# L<S05/Modifiers/The :s modifier is considered sufficiently important>
#?rakudo skip 'ss/.../.../'
{
    given "a\nb\tc d" {
#        ok ss/a b c d/w x y z/, 'successful substitution returns True';
        is $_, "w\nx\ty z", 'ss/.../.../ preserves whitespace';
    }

skip 1; #ok !("abc" ~~ ss/a b c/ x y z/), 'ss/// implies :s (-)';
}

#L<S05/Substitution/As with Perl 5, a bracketing form is also supported>
{
    my $a = 'abc';
    ok $a ~~ s[b] = 'de', 's[...] = ... returns true on success';
    is $a, 'adec', 'substitution worked';

    $a = 'abc';
    #?rakudo todo 's[...] seems to always return true?'
    nok $a ~~ s[d] = 'de', 's[...] = ... returns false on failure';
    is $a, 'abc', 'failed substitutions leaves string unchanged';
}

#?rakudo skip '$_ and s[...] do not work together yet'
{
    given 'abc' {
        ok (s[b] = 'de'), 's[...] = ... returns true on success';
        is $_, 'adec', 'substitution worked';
    }

    given 'abc' {
        s[d] = 'foo';
        is $_, 'abc', 'failed substitutions leaves string unchanged';
    }
}

#?rakudo skip "s:g[] and s[] with $/ NYI"
{
    my $x = 'foobar';
#    ok ($x ~~ s:g[o] = 'u'), 's:g[..] = returns True';
    is $x, 'fuubar', 'and the substition worked';

    given 'a b c' {
        s[\w] = uc($/);
        is $_, 'A b c', 'can use $/ on the RHS';
    }
    given 'a b c' {
        s[(\w)] = uc($0);
        is $_, 'A b c', 'can use $0 on the RHS';
    }

    given 'a b c' {
#        s:g[ (\w) ] = $0 x 2;
        is $_, 'aa bb cc', 's:g[...] and captures work together well';
    }
}

#L<S05/Substitution/Any scalar assignment operator may be used>
#?rakudo skip 's[...] op= RHS'
{
    given 'a 2 3' {
#        ok (s[\d] += 5), 's[...] += 5 returns True';
        is $_, 'a 7 3', 's[...] += 5 gave right result';
    }
    given 'a b c' {
#        s:g[\w] x= 2;
        is $_, 'aa bb cc', 's:g[..] x= 2 worked';
    }
}

#?rakudo skip 's:g[...] ='
{
    multi sub infix:<fromplus>(Match $a, Int $b) {
        $a.from + $b
    }

    given 'a b c' {
#        ok (s:g[\w] fromplus= 3), 's:g[...] customop= returned True';
        is $_, '3 5 7', '... and got right result';
    }
}

# RT #69044
{
    sub s { 'sub s' }
    $_ = "foo";
    #?rakudo skip 'RT 69044'
#    ok s,foo,bar, , 'bare s is always substititution';
    is s(), 'sub s', 'can call sub s as "s()"';
    #?rakudo skip 's () = RHS'
    $_ = "foo";
    skip 1; #ok s (foo) = 'bar', 'bare s is substitution before whitespace then parens';
}

# Test for :samecase
{
    is 'The foo and the bar'.subst('the', 'that', :samecase), 'The foo and that bar', '.substr and :samecase (1)';
    is 'The foo and the bar'.subst('the', 'That', :samecase), 'The foo and that bar', '.substr and :samecase (2)';
    is 'The foo and the bar'.subst(/:i the/, 'that', :samecase), 'That foo and the bar', '.substr (string pattern) and :    samecase (1)';
    is 'The foo and the bar'.subst(/:i The/, 'That', :samecase), 'That foo and the bar', '.substr (string pattern) and :    samecase (2)';
    is 'The foo and the bar'.subst(/:i the/, 'that', :g, :samecase), 'That foo and that bar', '.substr (string pattern)     and :g and :samecase (1)';
    is 'The foo and the bar'.subst(/:i The/, 'That', :g, :samecase), 'That foo and that bar', '.substr (string pattern)     and :g and :samecase (2)';

    my $str = "that";
    is 'The foo and the bar'.subst(/:i the/, {++$str}, :samecase), 'Thau foo and the bar', '.substr and samecase, worked with block replacement';
    is 'The foo and the bar'.subst(/:i the/, {$str++}, :g, :samecase), 'Thau foo and thav bar', '.substr and :g and :samecase, worked with block replacement';
}

done_testing;

}

Reply via email to