Hi Moritz,

Thanks, that was interesting. My investigation into grammars took a while, but here are the results thus far:

> Grammar rules and regexes are just methods…

I hadn't thought about what a grammar and a rule actually were before. This inspired me to try:

---------------------------
grammar Gram{
    has $.x;
    rule TOP{ {say $.x} }
    method test{ say $.x }
}

my Gram $test .= new(:x("hello"));
$test.parse("ignore this");
$test.test;
say $test.TOP;
---------------------------

which outputs:

    Any()                 #output of TOP in parse
    hello                 #output of test.test
    hello                 #output on direct call to rule
    Gram.new(x => Any)    #the return value of $test.TOP

So rules can't interpolate their grammar's attributes when called by 'parse', but can when called as a method. Also, rules called directly as methods return the parent grammar. I'm not sure whether either of these things is intended…

=============================

I also tried rules with arguments. It worked from grammar->parse but not when calling the rule directly as a method.

---------------------------
grammar Gram{
    rule TOP{ <test_rule('hello')> }
    rule test_rule($a){ $a }
}

my Gram $test .= new();
$test.parse("hello");       #returns true
$test.test_rule("hello");   #error
---------------------------

The error is:

    Invalid operation on null string
      in any !LITERAL at src/stage2/QRegex.nqp:653
      in method INTERPOLATE at src/gen/CORE.setting:9731

(at the line where test_rule starts)

=============================

OK, now to try the things you mentioned. First I tried using a parcel instead of an array as the role parameter (an array resulted in an error):

---------------------------
role roley [$foo]{
    token tokeny { $foo }
}

grammar gram {
    token TOP { <tokeny> }
}

my gram $gram .= new does roley[('this','or', 'that')];
$gram.parse('this or that'); #returns true
---------------------------

So parcels get joined with spaces into one token.

=============================

Now to try the roundabout way:

---------------------------
role roley [$foo]{
    token tokeny:sym<dynamic> { $foo }
}

grammar gram {
    token TOP { <tokeny>[\ <tokeny>]* }
    proto token tokeny {*}
}

my gram $gram .= new;
$gram does roley[$_] for <that this>;
$gram.parse('this'); #matches
$gram.parse('that'); #nope
---------------------------

Each iteration overwrites the previous one in terms of what 'tokeny' resolves to, rather than adding to it (symmetrically? is that what sym is short for?)

=============================

One more thing I found, which seems to be a bug. I defined my nouns/pronouns like:

---------------------------
token PN:sym<John> { <.sym> } #The dot should mean it doesn't get captured
token N:sym<ball> { <.sym> }
---------------------------

When my grammar parses this, it ends up with a tree like this:

---------------------------
sentence => q[John hit the ball]
 statement => q[John hit the ball]
  NP => q[John]
   PN => q[John]
     => q[John]
  VP => q[hit the ball]
   verb => q[hit]
     => q[hit]
   NP => q[the ball]
    D => q[the]
      => q[the]
    N => q[ball]
      => q[ball]
---------------------------

Notice the empty slots on the left. Rather than not capturing the <sym>, the <.sym> just means it doesn't capture its name :S

=============================

So after all this I have a much better understanding of what grammars really are, but I'm still confused about a few things:

Grammars are like classes. They are special because they have a method called 'parse' which applies a rule/token definition (regex) called TOP (or whatever is set by the :rule argument to parse).
Q: Are grammars meant to be able to have attributes like classes, and are they meant to be able to interpolate them into their rules/tokens?

Rules and tokens are just special types of methods whose body is a regex rather than Perl 6 code.
Q: What is the meaning of the return values of tokens/rules when called as methods?
Q: Is it possible to write a normal method that conforms to the same interface as rules/tokens (whatever that is)? I.e. one we can use as <normal_method> in rules/tokens, which is passed arguments and somehow matches and sets the position etc.
Q: Are rules/tokens meant to be able to have arguments like methods, and if so, how do they fit in?

Grammars don't check whether the things in their tokens/rules like <foo> are actually defined until it comes time to call them.
Q: Is this the way it's meant to be?

I saw your post on the doc.perl6.org docs. If I can get my head around all this I would be happy to help document grammars!

Cheers,
Lard

On 27/06/2012, at 12:49 AM, Moritz Lenz wrote:

>
>
> On 06/26/2012 02:04 PM, Lard Farnwell wrote:
>> Hi guys,
>>
>> To understand and play around with perl6 grammars I was trying to do a
>> simple NLP parts of speech parser in perl6 grammars. This is sort of what I
>> did:
>>
>> ---------------------------
>> grammar Sentence{
>>     proto rule VP {*}
>>     proto rule NP {*}
>>
>>     rule sentence {
>>         <imperative>|<statement>
>>     }
>>     rule imperative {<VP>}
>>     rule statement {<NP> <VP>}
>> }
>>
>> grammar VerbPhrase is Sentence{
>>     rule VP:sym<hit> {<sym> <NP>}
>>     rule VP:sym<kill> {<sym> <NP>}
>> }
>>
>> grammar NounPhrase is Sentence{
>>     #define NP:sym etc
>> }
>>
>>
>> grammar English is NounPhrase is VerbPhrase {
>>     rule TOP {
>>         <Sentence>[\. <Sentence>]*
>>     }
>> }
>> ---------------------------
>>
>> So in case you don't get it, a sentence is made up of phrases which in turn
>> can be made up of other phrases. And English is made up of Sentences.
>> This sort of thing works but doesn't make much sense.
>>
>> The obvious problem is that to get the correct definitions of the proto
>> rules in Sentence I have to say "VerbPhrase is Sentence" and then "English
>> is NounPhrase is VerbPhrase etc". This makes me feel like I'm doing it
>> wrong.
>
> Indeed. The intended mechanism for code reuse in object oriented Perl 6
> code is role composition.
>
> Grammar rules and regexes are just methods, so defining them in a role
> and applying it to a class sounds like a good idea to me.
>
>     role VerbPhrase {
>         rule VP { <verb> <NP> }
>         proto token verb {*}
>         token verb:sym<hit> { <sym> }
>         token verb:sym<kill> { <sym> }
>     }
>
> Define NounPhrase in a similar way, leave out the definition of NP and
> VP from Sentence, and then write
>
>     grammar English does NounPhrase does VerbPhrase is Sentence {
>         token TOP { ... }
>     }
>
> Role composition has much more transparent error modes than inheritance,
> and probably works better for you.
>
>
>> How do I build a flexible dynamic grammar in an OO sort of way? For example,
>> how could I do this so:
>>
>> 1) I define all my phrase structures (NP, VP, PP etc) in their own file while
>> still being able to use each other. VPs can be made up of NPs and NPs
>> can be made up of VPs.
>
> See above
>
>> 2) Add to these definitions dynamically. For example, here I have defined
>> "hit and kill" VPs. What if I wanted to add a "dance" VP definition at run
>> time?
>
> In theory you can write
>
>     role VerbPhrases[@verbs] {
>         token verb:sym<dynamic> { @verbs }
>         # note that 'dynamic' has no special meaning here, but since
>         # we don't use <sym> in the regex body, it doesn't matter what
>         # we write
>     }
>
> And then instantiate your grammar as
>
>     my $g = English.new does VerbPhrases[<dance listen juggle ...>];
>     my $match = $g.parse($yourstring);
>
> But Rakudo doesn't yet properly handle array variables in regexes, so
> you have to write something like
>
>     role AdditionalVerbPhrase[$verb] {
>         token verb:sym<dynamic> { $verb };
>     }
>
>     my $g = English.new;
>     $g does AdditionalVerbPhrase[$_] for <dance listen juggle ...>;
>     my $match = $g.parse(...);
>
> I haven't tested it though.
> If you experiment with it, please report your findings here, I'm curious
> about what works right now. If it doesn't work, we can surely find some
> way to make it work by going through the meta object to add methods to
> the grammar.
>
> Cheers,
> Moritz