Re: S5 updated

Edward Peschko Fri, 24 Sep 2004 17:45:44 -0700

> >>>just like the transformation of a string into a number, and from a 
> >>>number to a string. Two algorithmically different things as well, 
> >>>but they'd damn-well better be exact inverses of the
> >>>other.
> >>
> >>But they're not:
> >>
> >>  "  3 foo" --> 3 --> "3"
> >
> >I'd say that that's a caveat of implementation, sort of a side effect
> >of handling an error condition.
> 
> Nope, I'd call it fundamental semantics--it allows common idioms such 
> as "0 but true" in Perl5, for example. It's just an explicit part of 
> the rule for how Perl (and C's strtol/atoi functions) assign numerical 
> values to strings.


Ok, ok, I'll give you that point... let's call them 'intimately related' and
leave it at that. If you say "3 foo" and your algorithm goes:

        "3 foo" => 3 => "2"

then you know something is desperately wrong.
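For reference, here is a rough Python sketch of the numification rule being argued about: skip leading whitespace, read the longest numeric prefix, and ignore the rest, much as strtol/atoi do. It approximates the rule and is not Perl's actual implementation. `numify` and `stringify` are hypothetical names. The sketch shows the asymmetry: number -> string -> number round-trips exactly, while string -> number -> string drops the trailing text.

```python
import re

# Optional leading whitespace, optional sign, then an integer or decimal.
# This only approximates Perl's (and strtol's) "longest numeric prefix" rule.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))")

def numify(s: str) -> float:
    """Value of the leading numeric prefix of s, or 0 if there is none."""
    m = _NUMERIC_PREFIX.match(s)
    return float(m.group(1)) if m else 0.0

def stringify(n: float) -> str:
    """Render integral values without a trailing '.0', as Perl prints them."""
    return str(int(n)) if n.is_integer() else repr(n)

for s in ["  3 foo", "3 foo", "0 but true", "foo"]:
    n = numify(s)
    print(repr(s), "->", n, "->", repr(stringify(n)))
```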

> Yeah, but when a regex isn't acting how I expected it to, I know that 
> because I've already got in-hand an example of a string it matches 
> which I thought it wouldn't, or one it fails to match which I thought 
> it should. What I want to know is *why*--what part of the regex do I 
> need to change. Generating strings which would have matched, wouldn't 
> seem to help much.

Well, in a few cases, yes, but in most I'd say it would be a great help,
especially for boundary cases. That's what I do: I take a regular
expression and generate the strings that come out of it.
You can then follow along with how the regular expression is working.
Take my original example:

("( <[^\n]>+ | < \\<[^\n]>)+ ")

If you see it generate:

"x"
"\x"
"xx"
"x\x"

you'll see right off that it doesn't generate empty strings. It's got to be the
'+'... change it to

("( <[^\n]>+ | < \\<[^\n]>)* ")

and now it generates:

""
"\x"

...
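Here is a small Python sketch of "running the regex in reverse" for the example rule above. It enumerates every matching string up to a length bound, shortest first. The tuple-based mini-AST and the names `gen` and `show` are hypothetical, not any existing Perl API. To keep the output small, `<[^\n]>` is cut down to the single character "x". Switching the outer repetition from '+' to '*' makes the empty string appear, which is the effect described above.

```python
# Hypothetical mini-AST, just rich enough for the example rule:
#   ("lit", s)        the literal string s
#   ("alt", [a, b])   alternation a | b
#   ("seq", [a, b])   concatenation a b
#   ("rep", a, lo)    a repeated lo or more times (lo=1 is '+', lo=0 is '*')

def gen(node, limit):
    """Return every string matched by node whose length is <= limit."""
    kind = node[0]
    if kind == "lit":
        return {node[1]} if len(node[1]) <= limit else set()
    if kind == "alt":
        out = set()
        for child in node[1]:
            out |= gen(child, limit)
        return out
    if kind == "seq":
        out = {""}
        for child in node[1]:
            out = {a + b for a in out for b in gen(child, limit - len(a))}
        return out
    if kind == "rep":
        child, lo = node[1], node[2]
        pieces = gen(child, limit) - {""}  # skip empty pieces so the loop ends
        out, frontier, count = set(), {""}, 0
        while frontier:
            if count >= lo:
                out |= frontier  # strings built from exactly `count` pieces
            frontier = {a + b for a in frontier for b in pieces
                        if len(a) + len(b) <= limit}
            count += 1
        return out
    raise ValueError(f"unknown node kind: {kind}")

# <[^\n]> is cut down to the single character "x" to keep the output small.
X = ("lit", "x")
ATOM = ("alt", [("rep", X, 1), ("seq", [("lit", "\\"), X])])

def show(lo, limit=3):
    """All strings up to `limit` chars, shortest first, for (ATOM){lo,}."""
    return sorted(gen(("rep", ATOM, lo), limit), key=lambda s: (len(s), s))

print(show(1))  # outer '+': ['x', '\\x', 'xx', '\\xx', 'x\\x', 'xxx']
print(show(0))  # outer '*': same list with '' first
```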

> And you might be underestimating how many strings can be generated from 
> even a simple regex, and how uninformative they could be. For example, 
> the Perl5 regex /[a-z]{10}/ will match 141167095653376 different 
> strings, and it would likely be a very long time before I'd find out if 
> this would match any strings starting with "x". I'd probably be left 
> with the impression that it would only match strings starting with 
> "aaaaa".
> 

Yes, I'm well aware of combinatorial explosion, and the way to get around it is
to have intelligent modifiers for the generation engine. One modifier picks a random
character to show for a character class, one deals with how much 'branching'
a given alternation is given, one deals with permutations (e.g. which
alternative is picked the first time, etc.), and one picks whether
the operator is lazy.

It's a generic way of generating data. So depending on which modifier you
pick for g//, your [a-z]{10} example might show:


aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
aaaaaaaaad

or 

pfghdsifgk
rrtffdsdfw
ffhytweyth
abaweysdfh
xtetwhadsf

or simply 

aaaaaaaaaa
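The three outputs above correspond to three different generation modes. Below is a minimal Python sketch of them for `[a-z]{10}`. The `mode` values and the function name `gen_class_repeat` are hypothetical stand-ins for g// modifiers, which don't exist. The lexicographic mode is lazy (`itertools.islice` over `product`), so it never materializes the 26**10 combinations. The random mode takes a seed so its output is reproducible.

```python
import itertools
import random
import string

def gen_class_repeat(chars, n, mode="lex", count=5, seed=None):
    """Strings matching [chars]{n} under a hypothetical 'mode' modifier.

    mode="lex":    the first `count` strings in lexicographic order
    mode="random": `count` independently random strings
    mode="first":  just the single smallest string
    """
    if mode == "lex":
        combos = itertools.product(chars, repeat=n)  # lazy, never all 26**10
        return ["".join(c) for c in itertools.islice(combos, count)]
    if mode == "random":
        rng = random.Random(seed)
        return ["".join(rng.choice(chars) for _ in range(n)) for _ in range(count)]
    if mode == "first":
        return [chars[0] * n]
    raise ValueError(f"unknown mode: {mode}")

lower = string.ascii_lowercase
print(gen_class_repeat(lower, 10, "lex", count=4))
# ['aaaaaaaaaa', 'aaaaaaaaab', 'aaaaaaaaac', 'aaaaaaaaad']
print(gen_class_repeat(lower, 10, "random", count=5, seed=1))
print(gen_class_repeat(lower, 10, "first"))
# ['aaaaaaaaaa']
```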


In fact, in one incarnation, you could consider g"" a generalization of 
the .. operator:

my @array = ('aa'..'bb');  # == the start of g"[a-z]{2}"

my @array = ????        # want aaa,aab,aba,abb,baa,bab,bba,bbb..  how?

                        # use g"[ab]{3}", or more verbosely
                        # g"[ab][ab][ab]"
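For comparison, both halves of that example can be computed today with a Cartesian product. The Python sketch below uses the hypothetical names `char_class_product` (roughly what g"[ab][ab][ab]" would enumerate) and `lower_range` (mimicking Perl 5's magic-increment range `('aa'..'bb')`). `lower_range` assumes `start` and `end` are lowercase strings of the same length. Under that assumption, magic increment and lexicographic order agree.

```python
from itertools import product, takewhile
from string import ascii_lowercase

def char_class_product(*classes):
    """Every string whose i-th character comes from classes[i], in order.

    Roughly what g"[ab][ab][ab]" (or g"[ab]{3}") would enumerate.
    """
    return ["".join(chars) for chars in product(*classes)]

def lower_range(start, end):
    """Same-length lowercase range, mimicking Perl 5's ('aa'..'bb')."""
    words = ("".join(p) for p in product(ascii_lowercase, repeat=len(start)))
    return [w for w in takewhile(lambda w: w <= end, words) if w >= start]

print(char_class_product("ab", "ab", "ab"))
# ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']
print(lower_range("aa", "bb"))  # 'aa' .. 'az', then 'ba', 'bb': 28 strings
```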

> >Running a regular expression in reverse has IMO the best potential for 
> >making
> >regexes transparent - you graphically see how they work and what they 
> >match.
> 
> How graphically?

Graphically, as in you see a list of strings that come out of a
regular expression. It's still an exercise on the part of the reader to
put the strings together with the regex to see how they match, and so understand
the regular expression.

In fact, you could augment the graphic part with a specific ":debug" modifier,
which prints each string along with a bolded printout of how it matched
the regular expression (i.e. which steps matched where). This'd probably be a
worthwhile modifier for *both* the generating engine and the regular
expression engine.
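Here is a rough idea of what a ":debug" trace could report, sketched with Python's `re` standing in for a Perl 6 rule engine. Each alternative of the example rule becomes a named group, and the string is matched one step at a time. This is a simplified first-match, left-to-right view, not a full backtracking trace. `explain` and both pattern names are hypothetical. The trace reveals something useful about the rule as written: the first branch `<[^\n]>+` also swallows backslashes, so the escape branch never fires here. A variant whose run stops at a backslash behaves differently.

```python
import re

# The two alternatives of the example rule as named groups, so each step
# can be labelled. Python's re stands in for a Perl 6 rule engine here.
AS_WRITTEN = re.compile(r"(?P<run>[^\n]+)|(?P<escape>\\[^\n])")
# Variant where a run may not swallow a backslash.
RUN_STOPS_AT_BACKSLASH = re.compile(r"(?P<run>[^\n\\]+)|(?P<escape>\\[^\n])")

def explain(step_re, s):
    """Match s one step at a time; return (branch, start, end, text) per step."""
    steps, pos = [], 0
    while pos < len(s):
        m = step_re.match(s, pos)
        if m is None:
            raise ValueError(f"no step matches at offset {pos}")
        steps.append((m.lastgroup, m.start(), m.end(), m.group()))
        pos = m.end()
    return steps

print(explain(AS_WRITTEN, "x\\x"))
# [('run', 0, 3, 'x\\x')]
print(explain(RUN_STOPS_AT_BACKSLASH, "x\\x"))
# [('run', 0, 1, 'x'), ('escape', 1, 3, '\\x')]
```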

> >Why shouldn't that be reflected in the language itself?
> 
> Maybe because if it's likely to be used mostly for debugging, and can 
> be implemented in a library, then it doesn't need to be implemented as 
> an operator, and contribute to the general learning curve of the 
> language's syntax.

But that's my main point:

        1) it wouldn't be used mostly for debugging. I can think of several
           other uses for it, ones that I've mentioned both before and above:

           regression testing
           permutations
           combinations
           generalization of ".."
           random file name/string name generation
           test data generation (for databases)
           
        2) there are far, far more esoteric features than this built
           into the language. I mentioned continuations; you could add
           hyper operators, junctions, magical whitespace, etc. I don't mind
           having these there, as long as they give me power... so why should
           I mind this?

        3) As opposed to the new features listed in #2, this fits
           comfortably into the regular expression learning paradigm, i.e.
           once you know regular expressions, you have a good handle on using this.


I don't particularly care if it's a library or an operator, but I do think
that it ultimately belongs in the 'core' of perl6, if only for its
ability to test the regular expression engine (although I will say that
I think 'operator' fits it better because it is so low level).

If anything, I'd love to do:

while ($prog = g:bound:lazy/<perl6_grammar>/) { print $prog }

just to see what it prints out... 

And 

g:bound:match(10000000)/<perl6_grammar>/ ~~ m/<perl6_grammar>/

*would* make a damn good regression test.  ;-)
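Here is a scaled-down, runnable version of that regression idea. It makes two substitutions: a toy grammar stands in for <perl6_grammar>, and Python's `re` stands in for m//. The sketch samples random strings from the grammar, which plays the role of the hypothetical g:bound:match(N). It then compiles the same grammar to a regex and reports any generated string the regex rejects. The mini-AST and the names `to_regex`, `sample`, and `round_trip` are illustrative only.

```python
import random
import re

# Hypothetical mini-AST:
#   ("lit", s) | ("alt", [nodes]) | ("seq", [nodes]) | ("rep", node, lo, hi)

def to_regex(node):
    """Compile the AST into an equivalent Python regex string."""
    kind = node[0]
    if kind == "lit":
        return re.escape(node[1])
    if kind == "alt":
        return "(?:" + "|".join(to_regex(c) for c in node[1]) + ")"
    if kind == "seq":
        return "".join(to_regex(c) for c in node[1])
    if kind == "rep":
        return "(?:%s){%d,%d}" % (to_regex(node[1]), node[2], node[3])
    raise ValueError(f"unknown node kind: {kind}")

def sample(node, rng):
    """Generate one random string that the AST should match."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "alt":
        return sample(rng.choice(node[1]), rng)
    if kind == "seq":
        return "".join(sample(c, rng) for c in node[1])
    if kind == "rep":
        times = rng.randint(node[2], node[3])  # inclusive bounds
        return "".join(sample(node[1], rng) for _ in range(times))
    raise ValueError(f"unknown node kind: {kind}")

def round_trip(node, trials=1000, seed=0):
    """Generate `trials` strings and return those the regex fails to match."""
    rng = random.Random(seed)
    pattern = re.compile(to_regex(node))
    generated = (sample(node, rng) for _ in range(trials))
    return [s for s in generated if not pattern.fullmatch(s)]

# Toy grammar: up to four comma-separated identifiers, e.g. "ab,c_d".
letter = ("alt", [("lit", c) for c in "abcd_"])
ident = ("rep", letter, 1, 4)
grammar = ("seq", [ident, ("rep", ("seq", [("lit", ","), ident]), 0, 3)])

print(round_trip(grammar))  # [] means every generated string matched
```

A non-empty result would point either at the generator or at the matcher, which is exactly the kind of disagreement a regression test is meant to surface.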

Ed
