Re: PGE tests wanted (was P6GE tests wanted)

2004-12-18 Thread Markus Laire
Patrick R. Michaud wrote:
>> Larry mentioned 're_tests' file from perl5-source. Is anyone working on 
>> it currently? I could make a simple script to convert at least some of 
>> it to this pge-testing format which uses p6rule_*

'simple script' .. it isn't so simple anymore ;)

> I'm not aware of anyone working on it currently, so please go ahead
> and do this!

This test seems to cause an infinite loop
(with parrot_2004-12-16_160001)

p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)');  # infinite loop

I currently have some 400..500 tests autoconverted from 're_tests', but 
quite a few are broken as my script still has a few bugs. I'll send the 
tests here once I get as many tests converted as seems plausible. (The 
rest can then be converted manually, or ignored. Not all of those tests 
are useful for perl6, e.g. many tests for \z \Z $ etc...)


The resulting file 're_tests.t' has the original lines as comments, so if a 
test fails, it's easy to check whether the problem is in the test or in pge.

(Currently I skip all tests for $+ as pge-testing format doesn't support 
this. I'm not sure if these are needed for anything, as it's trivial to 
get endpoint from startpoint and string length.)

== re_tests.t example ==
use Parrot::Test 'no_plan';
use Parrot::Test::PGE;
# Tests from re_tests in perl5-source
# --- re_tests ---
# 1: abc    abc    y    $&       abc
# 2: abc    abc    y    $-[0]    0
# 3: abc    abc    y    $+[0]    3 # SKIP
p6rule_like('abc', 'abc', qr/0: \Qabc\E @ 0/, 're_tests 1 (#1)');
# 4: abc    xbc    n    -        -
p6rule_isnt('xbc', 'abc', 're_tests 2 (#2)');
# 5: abc    axc    n    -        -
p6rule_isnt('axc', 'abc', 're_tests 3 (#3)');
# 6: abc    abx    n    -        -
p6rule_isnt('abx', 'abc', 're_tests 4 (#4)');
# 7: abc    xabcy  y    $&       abc
# 8: abc    xabcy  y    $-[0]    1
# 9: abc    xabcy  y    $+[0]    4 # SKIP
p6rule_like('xabcy', 'abc', qr/0: \Qabc\E @ 1/, 're_tests 5 (#5)');
...
...
== re_tests.t example ==
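
For readers who want to see the shape of such a conversion, here is a
minimal sketch in Python. It is not the script discussed above, which is
not shown in this thread. It assumes each re_tests row is tab-separated
into pattern, subject, result (y/n), expression and expected value, and it
only handles the $& and $-[0] checks seen in the example. The function name
`convert` and the shortened test label are made up for illustration.

```python
def convert(rows):
    """Turn re_tests-style rows into p6rule_* lines (simplified sketch)."""
    groups = []  # consecutive rows sharing the same pattern and subject
    for row in rows:
        fields = row.split("\t")
        if len(fields) < 5:
            continue  # not a complete row; leave for manual conversion
        pattern, subject, result, expr, expected = fields[:5]
        if groups and groups[-1]["key"] == (pattern, subject):
            groups[-1]["checks"][expr] = expected
        else:
            groups.append({"key": (pattern, subject), "result": result,
                           "checks": {expr: expected}})

    out = []
    for n, group in enumerate(groups, 1):
        pattern, subject = group["key"]
        checks = group["checks"]
        label = f"re_tests {n}"
        if group["result"] == "n":
            out.append(f"p6rule_isnt('{subject}', '{pattern}', '{label}');")
        elif "$&" in checks and "$-[0]" in checks:
            # Matched text and start offset fold into one p6rule_like test.
            out.append(f"p6rule_like('{subject}', '{pattern}', "
                       f"qr/0: \\Q{checks['$&']}\\E @ {checks['$-[0]']}/, '{label}');")
        # Anything else ($+[0], captures, ...) is skipped in this sketch.
    return out
```

The sketch does no escaping of quotes or rule metacharacters inside the
pattern or subject, which is one reason a real converter stops being
simple quite quickly.
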
--
Markus Laire


Re: PGE tests wanted (was P6GE tests wanted)

2004-12-18 Thread Patrick R. Michaud
On Sat, Dec 18, 2004 at 12:16:31PM +0200, Markus Laire wrote:
 
> This test seems to cause an infinite loop
> (with parrot_2004-12-16_160001)
>
> p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)');  # infinite loop

So far, repeating a group that matches a zero-length string causes an
infinite loop; I just haven't added the code to detect and avoid that yet.
I'll take care of it today/tomorrow, but it's good to have the test in place.

> I currently have some 400..500 tests autoconverted from 're_tests', but 
> quite a few are broken as my script still has a few bugs. I'll send the 
> tests here once I get as many tests converted as seems plausible. (The 
> rest can then be converted manually, or ignored. Not all of those tests 
> are useful for perl6, e.g. many tests for \z \Z $ etc...)

Wow, excellent!  If you want to go ahead and get them in earlier, you
could send what you have with the unconverted tests commented out.

Pm


Re: PGE tests wanted (was P6GE tests wanted)

2004-12-18 Thread Larry Wall
On Sat, Dec 18, 2004 at 12:16:31PM +0200, Markus Laire wrote:
: Patrick R. Michaud wrote:
: Larry mentioned 're_tests' file from perl5-source. Is anyone working on 
: it currently? I could make a simple script to convert at least some of 
: it to this pge-testing format which uses p6rule_*
: 
: 'simple script' .. it isn't so simple anymore ;)

Sorry.  Well, okay, I'm not really sorry.  :-)

In fact, I might like to look at your 'simple script' when I get further
along on the p5-to-p6 translator...

: I'm not aware of anyone working on it currently, so please go ahead
: and do this!
: 
: This test seems to cause an infinite loop
: (with parrot_2004-12-16_160001)
: 
: p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)');  # infinite loop

Detecting failure to progress can be quite tricky, actually.  It's easy
enough to detect that it *might* be an infinite loop.  But that pattern
would succeed if the string were all a's and b's.  It's not enough to figure
out that you're at the same position or the same state.  You have to figure
out that you're at the same position and the same state, and you may well
have visited different positions in this state, or different states in
this position.  So a naive solution requires N**2 in time or space.

Henry Spencer's original regex routines simply disallowed expressions
that might be infinite.  We tried relaxing that in Perl 5, and got
it wrong more than one way.  I'm not actually sure what approach p5
takes right now, if any.

: (Currently I skip all tests for $+ as pge-testing format doesn't support 
: this. I'm not sure if these are needed for anything, as it's trivial to 
: get endpoint from startpoint and string length.)

The whole notion of string positions as integers is now somewhat
problematic in the Unicode era.  Is a position of 5 to be interpreted
as 5 bytes, 5 codepoints, 5 graphemes, or 5 letters?  String positions
are probably opaque objects that return different integer values
in different contexts.  And there is no such thing as the length
of a string anymore, unless it's another opaque object representing
the position at the end of the string.  And we've outlawed length
as a too-general concept. You have to tell it what units you mean
(.bytes, .codes, .graphs), or maybe use .chars for the default meaning
in the current context, if we decide to allow that.
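
To make the ambiguity concrete, here is a small sketch in Python (used
only because it is easy to run; it is not the proposed Perl 6 API). The
function name `units` is invented, and the grapheme count is only a rough
approximation that folds combining marks into the preceding character;
real grapheme segmentation follows more rules than this.

```python
import unicodedata


def units(s):
    """Count one string in three different units."""
    return {
        "bytes": len(s.encode("utf-8")),
        "codes": len(s),
        # Approximation: a combining mark belongs to the previous character.
        "graphs_approx": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }


word = "re\u0301sume\u0301"  # "résumé" spelled with combining acute accents
print(units(word))  # {'bytes': 10, 'codes': 8, 'graphs_approx': 6}
```

The same string has three different "lengths", so a bare integer position
says nothing until the unit is named.
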

As long as we're banishing .length from strings, we're also banishing
it from arrays.  You have to use .elems for that.  (At least all
this specificity now allows us to ask for the length of an array in
codepoints or graphemes...)

Anyway, sorry about the diatribe, but this is an area where we'll be
battling our own imprecision for years to come, not to mention everyone
else's.

Larry


Re: PGE tests wanted (was P6GE tests wanted)

2004-12-18 Thread hv
Larry Wall wrote:
:Henry Spencer's original regex routines simply disallowed expressions
:that might be infinite.  We tried relaxing that in Perl 5, and got
:it wrong more than one way.  I'm not actually sure what approach p5
:takes right now, if any.

We detect and warn of repeated empty expressions:
  zen% perl -wle 'print "ok" if "x" =~ /()*/'
  ()* matches null string many times in regex; marked by <-- HERE in m/()* <-- HERE / at -e line 1.
  ok
  zen% 

For optionally empty expressions, we don't allow them to match emptily
more than once:
  zen% perl -wle 'while ("baa" =~ /((b??)*a)/g) { print $1 }'
  ba
  a
  zen% 

For optionally empty patterns, we don't allow them to match emptily at
the same location more than once:
  zen% perl -wle 'while ("a" =~ /(a??)/g) { print $1 }'
  
  a
  
  zen% 

This last is achieved by magic on the string to which the pattern is
applied, which can lead to problematic interactions with other magic
(eg tainting) or restoration after local(). In principle it may also
be undesirable if you are parsing a string with a variety of //gc
patterns, and want to allow more than one of them to match an empty
string at the same location.

Hugo


Re: PGE tests wanted (was P6GE tests wanted)

2004-12-18 Thread Patrick R. Michaud
On Sat, Dec 18, 2004 at 08:47:42AM -0800, Larry Wall wrote:
> : This test seems to cause an infinite loop
> : (with parrot_2004-12-16_160001)
> : 
> : p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)');  # infinite loop
> 
> Detecting failure to progress can be quite tricky, actually.  It's easy
> enough to detect that it *might* be an infinite loop.  But that pattern
> would succeed if the string were all a's and b's.  It's not enough to figure
> out that you're at the same position or the same state.  You have to figure
> out that you're at the same position and the same state, and you may well
> have visited different positions in this state, or different states in
> this position.  So a naive solution requires N**2 in time or space.

In PGE I've been thinking this won't be *too* difficult (and I'll fully
admit to the possibility of being naive here).  Our states are actually
encoded into the pattern's subroutine code, with our current state
being held by Parrot's execution pointer and the various stacks, and 
we're already keeping track of the starting and current position of 
each substring being matched by a repeating group.  (Or, if we're 
not, we certainly can keep track of it in a stack of some sort.)

So, when we get to the end of the bracketed group, we look to see 
if the current position has changed at all since we started the group, 
and if not we refuse to repeat the group again but just go on to 
whatever other checks need to be made.  If going on to the remaining 
checks causes a match failure, we're just going to backtrack into the 
group's subexpression anyway, which would then start doing things that
change the current pointer we'll be seeing at the end of the group.
This may not always be the most efficient mechanism -- i.e., we could
find ourselves repeating later matches we've already tried, but at
least it shouldn't infinite loop.
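
The check described above can be sketched in a few lines of Python. This
is only an illustration of the idea, not PGE's generated PIR: the helper
names (`opt_a_opt_b`, `at_end`, `match_star`, `matches`) are invented here,
and the group body a?b? is hard-coded to mirror the failing test
'^[a?b?]*$'.

```python
def opt_a_opt_b(text, pos):
    """Yield each end position where `a?b?` can match at pos, greedy first."""
    starts = ([pos + 1] if text.startswith("a", pos) else []) + [pos]
    for p in starts:
        if text.startswith("b", p):
            yield p + 1
        yield p


def at_end(text, pos):
    """The `$` that follows the group."""
    return pos == len(text)


def match_star(body, text, pos, rest):
    """Match `[body]*` at pos, then `rest`, never repeating after an empty pass."""
    for end in body(text, pos):
        if end == pos:
            # No progress: refuse to repeat, just try what follows the group.
            if rest(text, end):
                return True
            continue
        if match_star(body, text, end, rest):
            return True
    return rest(text, pos)  # zero further iterations


def matches(text):
    """Anchored match of ^[a?b?]*$ against text."""
    return match_star(opt_a_opt_b, text, 0, at_end)
```

With this guard, matches('a--') returns False instead of looping, while
matches('abba') still succeeds. The final call to `rest` can repeat a
check that the empty pass already tried, which is the inefficiency
mentioned above.
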

But rather than trying to explain it all and debate here whether it'll 
work or not, it's probably quicker for me to just implement that section 
of code and let our tests tell the tale.  I'll do that today/tomorrow and
report back on the results.  But if anyone knows of a case where what
I've discussed isn't likely to work or has caused problems in the past, 
let me know so we can code up a test and/or workaround for it and see
what happens.

Pm


Re: PGE tests wanted (was P6GE tests wanted)

2004-12-17 Thread Markus Laire
I'm currently writing a few tests for PGE. So far I've found 2 failing 
tests: (with parrot_2004-12-16_160001.tar.gz)

p6rule_like('abcabbc', 'ab+?bc', qr/0: abbc @ 3/, '');
p6rule_like('abbcabbbc', 'ab+?', qr/0: ab @ 0/, '');

Output from perl t/harness mytests/*.t is attached.

Larry mentioned 're_tests' file from perl5-source. Is anyone working on 
it currently? I could make a simple script to convert at least some of 
it to this pge-testing format which uses p6rule_*


mytests/capture# Failed test (lib/Parrot/Test/PGE.pm at line 73)
#   'error:imcc:parse error, unexpected LABEL, expecting IDENTIFIER or PARROT_OP
# in file 'EVAL_1' line 136
# '
# doesn't match '(?-xism:0: abbc @ 3)'
# '(cd .  ./parrot  /home/malaire/omat/downloads/parrot/mytests/capture_6.imc)' failed with exit code 17
# Failed test (lib/Parrot/Test/PGE.pm at line 73)
#   'error:imcc:parse error, unexpected LABEL, expecting IDENTIFIER or PARROT_OP
# in file 'EVAL_1' line 134
# '
# doesn't match '(?-xism:0: ab @ 0)'
# '(cd .  ./parrot  /home/malaire/omat/downloads/parrot/mytests/capture_7.imc)' failed with exit code 17
# Looks like you failed 2 tests of 7.
dubious
Test returned status 2 (wstat 512, 0x200)
Scalar found where operator expected at (eval 158) line 1, near 'int'  $__val
(Missing operator before   $__val?)
DIED. FAILED tests 6-7
Failed 2/7 tests, 71.43% okay
Failed 1/1 test scripts, 0.00% okay. 2/7 subtests failed, 71.43% okay.
Failed Test       Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
mytests/capture.t    2   512     7    2  28.57%  6-7



Re: PGE tests wanted (was P6GE tests wanted)

2004-12-17 Thread Patrick R. Michaud
On Fri, Dec 17, 2004 at 10:21:40AM +0200, Markus Laire wrote:
> I'm currently writing a few tests for PGE. So far I've found 2 failing 
> tests: (with parrot_2004-12-16_160001.tar.gz)
> 
> p6rule_like('abcabbc', 'ab+?bc', qr/0: abbc @ 3/, '');
> p6rule_like('abbcabbbc', 'ab+?', qr/0: ab @ 0/, '');

Whoops, the compiled output had an extraneous colon in the PIR
code on line 239 of pge_gen.c.  Now fixed in CVS -- it only
would show up on lazy quantifications where the minimum number
of repetitions was greater than zero.

> Larry mentioned 're_tests' file from perl5-source. Is anyone working on 
> it currently? I could make a simple script to convert at least some of 
> it to this pge-testing format which uses p6rule_*

I'm not aware of anyone working on it currently, so please go ahead
and do this!

Many thanks!

Pm