Hi Lard,
sorry for the late and incomplete answer.
On 04.07.2012 15:09, Lard Farnwell wrote:
Hi Moritz,
Thanks that was interesting. My investigation into grammars took a while but
here are the results thus far:
Grammar rules and regexes are just methods…
I hadn't thought about what a grammar and rule actually was before. This
inspired me to try:
---------------------------
grammar Gram {
    has $.x;
    rule TOP {
        { say $.x }
    }
    method test {
        say $.x
    }
}
my Gram $test .= new(:x("hello"));
$test.parse("ignore this");
$test.test;
say $test.TOP;
---------------------------
which outputs:
Any() #output of TOP in parse
hello #output of test.test
hello #outputted on direct call to rule
Gram.new(x => Any) #the return value of $test.TOP
So rules can't interpolate their grammar's attributes when being called by
'parse' but can when called as a method. Also rules being called directly as
methods return the parent grammar. I'm not sure whether either of these things
are intended…
I'm not sure how it's intended to work either.
Notionally, grammar rules (and other components of regexes) communicate
by passing "cursors" around. A cursor is an immutable object that points
to a location in the string, and additionally keeps track of other
information like captures.
So when you write 'grammar gram { ... }', you are actually inheriting
from class Grammar, which in turn inherits from class Cursor.
When you call the .parse method, a cursor is instantiated automatically,
which explains why its attribute is empty -- it's not the same object as
you created in your code.
I'm not sure if there is a mechanism for passing attributes around -- so
far I always just assumed it would work, but it doesn't.
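One workaround that does not depend on attributes is a dynamic variable,
which is looked up at run time through the callers and so should be
visible from rules run by .parse. A minimal sketch, assuming a Rakudo in
which dynamic variables work inside regex code blocks; the name
$*greeting is made up for this example:

```perl6
grammar Gram {
    # $*greeting is resolved dynamically when the block runs, so it does
    # not matter which cursor object .parse creates internally.
    token TOP { .* { say $*greeting } }
}

my $*greeting = "hello";    # hypothetical name, declared in the caller's scope
Gram.parse("ignore this");
```

The grammar object itself carries no state here; the value lives in the
scope that calls .parse.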
=============================
Also I tried rules with arguments and it worked from grammar->parse but not
from calling directly as a method.
---------------------------
grammar Gram {
    rule TOP {
        <test_rule('hello')>
    }
    rule test_rule($a) {
        $a
    }
}
my Gram $test .= new();
$test.parse("hello");     #returns true
$test.test_rule("hello"); #error
---------------------------
The error is:
Invalid operation on null string
in any !LITERAL at src/stage2/QRegex.nqp:653
in method INTERPOLATE at src/gen/CORE.setting:9731
(at the line where test_rule starts)
=============================
Ok now to try the things you mentioned:
First I tried using a parcel instead of an array as the role prototype (array
resulted in error):
---------------------------
role roley [$foo] {
    token tokeny { $foo }
}
grammar gram {
    token TOP { <tokeny> }
}
my gram $gram .= new does roley[('this','or', 'that')];
$gram.parse('this or that'); #returns true
---------------------------
So parcels get joined with spaces into one token
That's a known not-yet-implemented part of Rakudo :(
=============================
Now to try the roundabout way:
---------------------------
role roley [$foo] {
    token tokeny:sym<dynamic> { $foo }
}
grammar gram {
    token TOP { <tokeny>[\ <tokeny>]* }
    proto token tokeny {*}
}
my gram $gram .= new;
$gram does roley[$_] for <that this>;
$gram.parse('this'); #matches
$gram.parse('that'); #nope
---------------------------
Each iteration overwrites the previous one in terms of what 'tokeny' resolves
to rather than adding it (symmetrically? is that what sym is short for?)
"sym" stands for "symbol", the thing that appears in the name of the
token inside <...>.
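For comparison, here is how several alternatives usually hang off one
proto when they are declared in the grammar itself rather than mixed in
at run time. This is only a sketch; the grammar name Words is invented
for the example:

```perl6
grammar Words {
    token TOP { <tokeny> [ ' ' <tokeny> ]* }
    proto token tokeny {*}
    # Inside each candidate, <sym> matches the text given in :sym<...>.
    token tokeny:sym<this> { <sym> }
    token tokeny:sym<that> { <sym> }
}

say so Words.parse('this');
say so Words.parse('that this');
```

Both calls are expected to succeed, because each :sym<...> candidate has
its own name and so adds to the proto instead of replacing an earlier
candidate.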
============================
One more thing I found which seems to be a bug. I defined my nouns/pronouns
like:
---------------------------
token PN:sym<John> { <.sym> } #The dot should mean it doesn't get captured
token N:sym<ball> { <.sym> }
---------------------------
when my grammar parses this it ends up with a tree like this:
---------------------------
sentence => q[John hit the ball]
 statement => q[John hit the ball]
  NP => q[John]
   PN => q[John]
     => q[John]
  VP => q[hit the ball]
   verb => q[hit]
     => q[hit]
   NP => q[the ball]
    D => q[the]
      => q[the]
    N => q[ball]
      => q[ball]
---------------------------
Notice the empty slots on the left. Rather than not capturing the <sym> the
<.sym> just means it doesn't capture its name :S
I've recently discovered the same bug (and tried to fix it, instead of
submitting it as a bug report; I failed to fix it though :/). Basically
<sym> is special-cased in the compiler, and the . modifier at the start
simply doesn't harmonize with that special case.
============================
So after all this I have a much better understanding of what grammars really
are but I'm still confused about a few things:
grammars are like classes. They are special because they have a method called
'parse' which applies a rule/token definition (regex) called TOP (or whatever
is set by the :rule argument to parse).
Q: Are grammars meant to be able to have attributes like classes and are they
meant to be able to interpolate them into their rules/token?
rules and tokens are just special types of methods whose body is a regex rather
than perl6 code.
See above. The answer is "I'm not quite sure".
Q: What is the meaning of the return values of tokens/rules when called as
methods?
The specification says that a token/rule/regex returns a cursor if there
is one possible match, or a lazy list of cursors with possible matches
if there are multiple ways it can match (for backtracking).
I don't think Rakudo sticks to that calling convention though;
backtracking is somehow managed through a stack of integers (pointing to
positions of the string) inside a capture object. Or so. I'm really not
an expert when it comes to implementation details of the regex engine.
I also don't know exactly which arguments a regex gets passed at invocation.
Q: Is it possible to write a normal method that conforms to the same interface as
rules/tokens (whatever that is). i.e. where we can use <normal_method> in
rules/tokens which is passed arguments and somehow matches and sets position etc.
See above. Tricky right now, because of the mismatched calling
convention between Rakudo and the specification.
Q: Are rules/tokens meant to be able to have arguments like methods and if so
how do they fit in.
grammar A {
    token foo($x) { \' ~ \' $x };
    token TOP { <foo("bar")> }
};
say A.parse(q['bar']); # matches
say A.parse(q['baz']); # no match
grammars don't check whether the things in their tokens/rules like <foo> are
actually defined until it comes time to call them
Q: Is this the way it's meant to be?
Yes. Calls like <foo> are simply method calls, and method calls aren't
easily verifiable at compile time. It's perfectly fine to write
grammar Sentence {
    rule TOP { <subject> <verb> <object> }
}
and require that subclasses implement the rules subject, verb and object.
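A subclass filling in those rules could look roughly like this. The
grammar name English and the three token bodies are invented for the
example:

```perl6
grammar Sentence {
    rule TOP { <subject> <verb> <object> }
}

# Hypothetical subclass supplying the three rules that TOP calls.
grammar English is Sentence {
    token subject { 'John' }
    token verb    { 'hit' }
    token object  { 'the ball' }
}

say so English.parse('John hit the ball');
```

Calling Sentence.parse directly should fail only at run time, when the
method lookup for subject finds nothing.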
I saw your post on doc.perl6.org docs. If I can get my head around all this I
would be happy to help document grammars!
That would be very much appreciated.
I've also asked the author of
https://github.com/perlpilot/perl6-docs/blob/master/intro/p6-grammar-intro.pod
whether we can use that as a base for a regex/grammar tutorial on
doc.perl6.org, and knowing the author I don't think he'll object. (Just
want to make sure you don't duplicate effort in this area).
Cheers,
Moritz