Re: PGE tests wanted (was P6GE tests wanted)
Patrick R. Michaud wrote:
>> Larry mentioned 're_tests' file from perl5-source. Is anyone working on
>> it currently? I could make a simple script to convert at least some of
>> it to this pge-testing format which uses p6rule_*

'simple script' .. it isn't so simple anymore ;)

> I'm not aware of anyone working on it currently, so please go ahead
> and do this!

This test seems to cause an infinite loop
(with parrot_2004-12-16_160001)

p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)'); # infinite loop

I currently have some 400..500 tests autoconverted from 're_tests', but
quite many are broken as my script still has few bugs. I'll send the
tests here once I get as many tests converted as seems plausible. (The
rest can then be converted manually, or ignored. Not all of those tests
have use with perl6, e.g. many tests for \z \Z $ etc...)

Resulting file 're_tests.t' has original lines as comments, so if test
fails, it's easy to check whether problem is in test or in pge.

(Currently I skip all tests for $+ as pge-testing format doesn't support
this. I'm not sure if these are needed for anything, as it's trivial to
get endpoint from startpoint and string length.)

== re_tests.t example ==
use Parrot::Test 'no_plan';
use Parrot::Test::PGE;

# Tests from re_tests in perl5-source

# --- re_tests ---

# 1: abc abc y $& abc
# 2: abc abc y $-[0] 0
# 3: abc abc y $+[0] 3 # SKIP
p6rule_like('abc', 'abc', qr/0: \Qabc\E @ 0/, 're_tests 1 (#1)');

# 4: abc xbc n - -
p6rule_isnt('xbc', 'abc', 're_tests 2 (#2)');

# 5: abc axc n - -
p6rule_isnt('axc', 'abc', 're_tests 3 (#3)');

# 6: abc abx n - -
p6rule_isnt('abx', 'abc', 're_tests 4 (#4)');

# 7: abc xabcy y $& abc
# 8: abc xabcy y $-[0] 1
# 9: abc xabcy y $+[0] 4 # SKIP
p6rule_like('xabcy', 'abc', qr/0: \Qabc\E @ 1/, 're_tests 5 (#5)');

...
...
== re_tests.t example ==

--
Markus Laire
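A minimal sketch of the kind of conversion described above, not the actual script. It assumes each re_tests line holds five tab-separated fields (pattern, subject, y/n result, expression, expected value). The helper names sq and convert_line, the $pos argument carrying the $-[0] offset, and the shortened test names are all hypothetical. No Perl 5 to Perl 6 pattern translation is attempted, which is presumably where the real difficulty lies.

```perl
use strict;
use warnings;

# Quote a string as a Perl single-quoted literal.
sub sq {
    my ($s) = @_;
    $s =~ s/([\\'])/\\$1/g;
    return "'$s'";
}

# Convert one re_tests line into a p6rule_* call, or return nothing to skip it.
# $num is the test number; $pos is the expected match offset, if known.
# Assumption: fields are pattern, subject, result, expression, expected.
sub convert_line {
    my ($line, $num, $pos) = @_;
    my ($pat, $subject, $result, $expr, $expect) = split /\t/, $line;
    my $name = sq("re_tests $num");

    # 'n' lines: the pattern must not match the subject.
    return 'p6rule_isnt(' . join(', ', sq($subject), sq($pat), $name) . ');'
        if $result eq 'n';

    # Only 'y' lines that check the matched text, with a known offset.
    return unless $result eq 'y' && $expr eq '$&' && defined $pos;

    my $re = 'qr/0: \Q' . $expect . '\E @ ' . $pos . '/';
    return 'p6rule_like(' . join(', ', sq($subject), sq($pat), $re, $name) . ');';
}

print convert_line("abc\txbc\tn\t-\t-", 2), "\n";
# prints: p6rule_isnt('xbc', 'abc', 're_tests 2');
print convert_line("abc\txabcy\ty\t\$&\tabc", 5, 1), "\n";
# prints: p6rule_like('xabcy', 'abc', qr/0: \Qabc\E @ 1/, 're_tests 5');
```

Lines whose expression is something other than the matched text (for example the end offset) are skipped here, which mirrors the SKIP comments in the example file.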
Re: PGE tests wanted (was P6GE tests wanted)
On Sat, Dec 18, 2004 at 12:16:31PM +0200, Markus Laire wrote:
> This test seems to cause an infinite loop
> (with parrot_2004-12-16_160001)
>
> p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)'); # infinite loop

So far, repeating groups of zero-length strings cause an infinite loop.
I just haven't added the code to detect and avoid that yet. I'll take care
of it today/tomorrow, but it's good to have the test in place.

> I currently have some 400..500 tests autoconverted from 're_tests', but
> quite many are broken as my script still has few bugs. I'll send the
> tests here once I get as many tests converted as seems plausible. (The
> rest can then be converted manually, or ignored. Not all of those tests
> have use with perl6, e.g. many tests for \z \Z $ etc...)

Wow, excellent! If you want to go ahead and get them in earlier, you
could send what you have with the unconverted tests commented out.

Pm
Re: PGE tests wanted (was P6GE tests wanted)
On Sat, Dec 18, 2004 at 12:16:31PM +0200, Markus Laire wrote:
: Patrick R. Michaud wrote:
: Larry mentioned 're_tests' file from perl5-source. Is anyone working on
: it currently? I could make a simple script to convert at least some of
: it to this pge-testing format which uses p6rule_*
:
: 'simple script' .. it isn't so simple anymore ;)

Sorry. Well, okay, I'm not really sorry. :-)

In fact, I might like to look at your 'simple script' when I get further
along on the p5-to-p6 translator...

: I'm not aware of anyone working on it currently, so please go ahead
: and do this!
:
: This test seems to cause an infinite loop
: (with parrot_2004-12-16_160001)
:
: p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)'); # infinite loop

Detecting failure to progress can be quite tricky, actually. It's easy
enough to detect that it *might* be an infinite loop. But that pattern
would succeed if the string were all a's and b's. It's not enough to
figure out that you're at the same position or the same state. You have
to figure out that you're at the same position and the same state, and
you may well have visited different positions in this state, or different
states in this position. So a naive solution requires N**2 in time or
space.

Henry Spencer's original regex routines simply disallowed expressions
that might be infinite. We tried relaxing that in Perl 5, and got
it wrong more than one way. I'm not actually sure what approach p5
takes right now, if any.

: (Currently I skip all tests for $+ as pge-testing format doesn't support
: this. I'm not sure if these are needed for anything, as it's trivial to
: get endpoint from startpoint and string length.)

The whole notion of string positions as integers is now somewhat
problematic in the Unicode era. Is a position of 5 to be interpreted
as 5 bytes, 5 codepoints, 5 graphemes, or 5 letters? String positions
are probably opaque objects that return different integer values in
different contexts.

And there is no such thing as the length of a string anymore, unless it's
another opaque object representing the position at the end of the string.
And we've outlawed length as a too-general concept. You have to tell it
what units you mean (.bytes, .codes, .graphs), or maybe use .chars for
the default meaning in the current context, if we decide to allow that.

As long as we're banishing .length from strings, we're also banishing
it from arrays. You have to use .elems for that. (At least all this
specificity now allows us to ask for the length of an array in codepoints
or graphemes...)

Anyway, sorry about the diatribe, but this is an area where we'll be
battling our own imprecision for years to come, not to mention everyone
else's.

Larry
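To make the unit question concrete, here is a small Perl 5 sketch that counts the same two-codepoint string three ways. The helper names count_bytes, count_codes and count_graphs are hypothetical and only loosely mirror the .bytes, .codes and .graphs units mentioned above; they are not Perl 6 API. The byte count assumes a UTF-8 encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes, assuming the string is stored as UTF-8.
sub count_bytes {
    my ($str) = @_;
    return length(encode('UTF-8', $str));
}

# Number of Unicode codepoints (what Perl 5's length reports for
# a character string).
sub count_codes {
    my ($str) = @_;
    return length($str);
}

# Number of extended grapheme clusters, counted with \X.
sub count_graphs {
    my ($str) = @_;
    my $n = () = $str =~ /\X/g;
    return $n;
}

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
my $s = "e\x{301}";

printf "bytes=%d codes=%d graphs=%d\n",
    count_bytes($s), count_codes($s), count_graphs($s);
# prints: bytes=3 codes=2 graphs=1
```

An integer position of 1 in this string therefore lands in the middle of a grapheme when read as codepoints, and in the middle of nothing meaningful when read as graphemes, which is the ambiguity an opaque position object would avoid.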
Re: PGE tests wanted (was P6GE tests wanted)
Larry Wall [EMAIL PROTECTED] wrote:
:Henry Spencer's original regex routines simply disallowed expressions
:that might be infinite. We tried relaxing that in Perl 5, and got
:it wrong more than one way. I'm not actually sure what approach p5
:takes right now, if any.

We detect and warn of repeated empty expressions:

zen% perl -wle 'print "ok" if "x" =~ /()*/'
()* matches null string many times in regex; marked by <-- HERE in m/()* <-- HERE / at -e line 1.
ok
zen%

For optionally empty expressions, we don't allow them to match emptily
more than once:

zen% perl -wle 'while ("baa" =~ /((b??)*a)/g) { print $1 }'
ba
a
zen%

For optionally empty patterns, we don't allow them to match emptily
at the same location more than once:

zen% perl -wle 'while ("a" =~ /(a??)/g) { print $1 }'

a

zen%

This last is achieved by magic on the string to which the pattern is
applied, which can lead to problematic interactions with other magic
(eg tainting) or restoration after local(). In principle it may also
be undesirable if you are parsing a string with a variety of //gc
patterns, and want to allow more than one of them to match an empty
string at the same location.

Hugo
Re: PGE tests wanted (was P6GE tests wanted)
On Sat, Dec 18, 2004 at 08:47:42AM -0800, Larry Wall wrote:
> : This test seems to cause an infinite loop
> : (with parrot_2004-12-16_160001)
> :
> : p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)'); # infinite loop
>
> Detecting failure to progress can be quite tricky, actually. It's easy
> enough to detect that it *might* be an infinite loop. But that pattern
> would succeed if the string were all a's and b's. It's not enough to
> figure out that you're at the same position or the same state. You have
> to figure out that you're at the same position and the same state, and
> you may well have visited different positions in this state, or different
> states in this position. So a naive solution requires N**2 in time or
> space.

In PGE I've been thinking this won't be *too* difficult (and I'll fully
admit to the possibility of being naive here). Our states are actually
encoded into the pattern's subroutine code, with our current state
being held by Parrot's execution pointer and the various stacks, and we're
already keeping track of the starting and current position of each
substring being matched by a repeating group. (Or, if we're not, we
certainly can keep track of it in a stack of some sort.)

So, when we get to the end of the bracketed group, we look to see if
the current position has changed at all since we started the group,
and if not we refuse to repeat the group again but just go on to
whatever other checks need to be made. If going on to the remaining
checks causes a match failure, we're just going to backtrack into
the group's subexpression anyway, which would then start doing things
that change the current pointer we'll be seeing at the end of the group.

This may not always be the most efficient mechanism -- i.e., we could
find ourselves repeating later matches we've already tried, but at
least it shouldn't infinite loop.

But rather than trying to explain it all and debate here whether it'll
work or not, it's probably quicker for me to just implement that section
of code and let our tests tell the tale. I'll do that today/tomorrow
and report back on the results. But if anyone knows of a case where
what I've discussed isn't likely to work or has caused problems in the
past, let me know so we can code up a test and/or workaround for it
and see what happens.

Pm
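The idea described above can be sketched with a toy backtracking matcher in Perl 5. This is an illustration under stated assumptions, not PGE's implementation: PGE generates PIR subroutines, whereas this sketch uses closures in continuation-passing style, and the names lit, opt, seq, star and matches_anchored are hypothetical. The guard in star compares the position at the end of the group with the position where that iteration started and, if nothing was consumed, moves on to the rest of the pattern instead of repeating.

```perl
use strict;
use warnings;

# Each matcher is a closure taking ($str, $pos, $k); $k is the continuation,
# called with the new position whenever the matcher succeeds.

sub lit {                               # one literal character
    my ($c) = @_;
    return sub {
        my ($s, $p, $k) = @_;
        return substr($s, $p, 1) eq $c && $k->($p + 1);
    };
}

sub opt {                               # greedy x?
    my ($m) = @_;
    return sub {
        my ($s, $p, $k) = @_;
        return $m->($s, $p, $k) || $k->($p);
    };
}

sub seq {                               # x followed by y
    my ($first, $second) = @_;
    return sub {
        my ($s, $p, $k) = @_;
        return $first->($s, $p, sub { $second->($s, $_[0], $k) });
    };
}

sub star {                              # greedy [x]* with a progress guard
    my ($m) = @_;
    my $loop;
    $loop = sub {
        my ($s, $p, $k) = @_;
        return $m->($s, $p, sub {
            my ($q) = @_;
            # Group consumed nothing: refuse to repeat, go on to the rest.
            return $k->($q) if $q == $p;
            return $loop->($s, $q, $k);
        }) || $k->($p);
    };
    return $loop;
}

sub matches_anchored {                  # behaves like ^ pattern $
    my ($m, $s) = @_;
    return $m->($s, 0, sub { $_[0] == length($s) }) ? 1 : 0;
}

my $rule = star(seq(opt(lit('a')), opt(lit('b'))));    # [a?b?]*

print matches_anchored($rule, 'a--')  ? "match\n" : "no match\n";  # no match
print matches_anchored($rule, 'abba') ? "match\n" : "no match\n";  # match
```

Every recursive call to the loop starts at a strictly later position, so the recursion depth is bounded by the string length. As noted above, the continuation can be tried more than once from the same position, so this trades some repeated work for guaranteed termination.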
Re: PGE tests wanted (was P6GE tests wanted)
I'm currently writing few tests for PGE. So far I've found 2 failing
tests: (with parrot_2004-12-16_160001.tar.gz)

p6rule_like('abcabbc', 'ab+?bc', qr/0: abbc @ 3/, '');
p6rule_like('abbcabbbc', 'ab+?', qr/0: ab @ 0/, '');

output from perl t/harness mytests/*.t is attached.

Larry mentioned 're_tests' file from perl5-source. Is anyone working on
it currently? I could make a simple script to convert at least some of
it to this pge-testing format which uses p6rule_*

mytests/capture
# Failed test (lib/Parrot/Test/PGE.pm at line 73)
# 'error:imcc:parse error, unexpected LABEL, expecting IDENTIFIER or PARROT_OP
# in file 'EVAL_1' line 136
# '
# doesn't match '(?-xism:0: abbc @ 3)'
# '(cd . && ./parrot /home/malaire/omat/downloads/parrot/mytests/capture_6.imc)' failed with exit code 17
# Failed test (lib/Parrot/Test/PGE.pm at line 73)
# 'error:imcc:parse error, unexpected LABEL, expecting IDENTIFIER or PARROT_OP
# in file 'EVAL_1' line 134
# '
# doesn't match '(?-xism:0: ab @ 0)'
# '(cd . && ./parrot /home/malaire/omat/downloads/parrot/mytests/capture_7.imc)' failed with exit code 17
# Looks like you failed 2 tests of 7.
dubious
        Test returned status 2 (wstat 512, 0x200)
Scalar found where operator expected at (eval 158) line 1, near "'int' $__val"
        (Missing operator before $__val?)
DIED. FAILED tests 6-7
        Failed 2/7 tests, 71.43% okay
Failed 1/1 test scripts, 0.00% okay. 2/7 subtests failed, 71.43% okay.
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
---
mytests/capture.t     2   512     7    2  28.57%  6-7
Re: PGE tests wanted (was P6GE tests wanted)
On Fri, Dec 17, 2004 at 10:21:40AM +0200, Markus Laire wrote:
> I'm currently writing few tests for PGE. So far I've found 2 failing
> tests: (with parrot_2004-12-16_160001.tar.gz)
>
> p6rule_like('abcabbc', 'ab+?bc', qr/0: abbc @ 3/, '');
> p6rule_like('abbcabbbc', 'ab+?', qr/0: ab @ 0/, '');

Woops, the compiled output had an extraneous colon in the PIR code
on line 239 of pge_gen.c. Now fixed in CVS -- it would only show
up on lazy quantifications where the minimum number of repetitions
was greater than zero (as with the b+? in both tests above).

> Larry mentioned 're_tests' file from perl5-source. Is anyone working on
> it currently? I could make a simple script to convert at least some of
> it to this pge-testing format which uses p6rule_*

I'm not aware of anyone working on it currently, so please go ahead
and do this! Many thanks!

Pm