Re: a smarter form of whitespace
Patrick R. Michaud wrote:
> On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
>> --
>> token start { ^<emptyline>*$ }
>> regex emptyline { ^^ $$ \n }
>> token ws { [<sp> | \t]* }
>> --
>
> The above grammar doesn't have a grammar statement; as a result
> the regexes are being installed into the '' namespace.

The original did have a 'grammar' statement, I just didn't paste it
into the email.

> $ cat xyz.pir
> .sub main :main
>     load_bytecode 'PGE.pbc'
>     load_bytecode 'ar.pir'
>     load_bytecode 'dumper.pbc'
>     load_bytecode 'PGE/Dumper.pbc'
>     $P0 = find_global 'XYZ', 'start'
>     $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')

What the original didn't have is the 'grammar' named argument when
calling the start rule. When I replace the previous line with:

    $P1 = $P0("\n\n\n\n\n\n\n")

then your sample code exhibits the same problem.

I assume this means that the reason overriding ws wasn't working is
that it was calling the default version of ws in the root namespace.
But, if it was defaulting to the root namespace, why was it able to
find any of the rules? Shouldn't it have complained that it couldn't
find emptyline?

Thanks,
Allison
Re: a smarter form of whitespace
On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
> I'm writing a parser for a language that treats a double newline as a
> statement terminator. It works if I make every rule a 'regex' (to turn
> off smart whitespace). But I want spaces and tabs to act as smart
> whitespace, and newlines to act as literal whitespace. I've overloaded
> ws to match only spaces and tabs, but the grammar still consumes
> newlines where it shouldn't consume newlines.
>
> For a simple repeatable example, take the following grammar:

Overloading ws and other builtins was fixed in parrot and pugs
approaching midnight (hackathon time) on 2006-06-29. If your parrot and
pugs are both more recent than that, I'm not sure where the bug is.

-kolibrie
Re: a smarter form of whitespace
On Thu, Jul 06, 2006 at 12:29:12AM -0700, Allison Randal wrote:
> > $ cat xyz.pir
> > .sub main :main
> >     load_bytecode 'PGE.pbc'
> >     load_bytecode 'ar.pir'
> >     load_bytecode 'dumper.pbc'
> >     load_bytecode 'PGE/Dumper.pbc'
> >     $P0 = find_global 'XYZ', 'start'
> >     $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')
>
> What the original didn't have is the 'grammar' named argument when
> calling the start rule. When I replace the previous line with:
>
>     $P1 = $P0("\n\n\n\n\n\n\n")
>
> then your sample code exhibits the same problem.
>
> I assume this means that the reason overriding ws wasn't working is
> because it was calling the default version of ws in the root
> namespace. But, if it was defaulting to the root namespace, why was it
> able to find any of the rules? Shouldn't it have complained that it
> couldn't find emptyline?

At the moment (and this may be incorrect), PGE looks for named rules
via inheritance, and if not found that way it looks in the available
symbol tables using the find_name opcode. So, the match was able to
find the rules because they are in the current namespace, but when it
came time to find the rule for <?ws> there was a ws method available
(the default) and so that one was used.

Again, this may not be the correct behavior; I've been using S12 as the
guide here, in that a method call first considers methods from the
class hierarchy and fails over to subroutine dispatch.

Pm
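The lookup order described above (class hierarchy first, symbol table second) can be modelled in a few lines of Python. This is only an illustrative sketch, not PGE's implementation: the names BuiltinRules, XYZ, namespace_rules and find_rule are all hypothetical, and a plain dict stands in for whatever find_name searches.

```python
class BuiltinRules:
    """Hypothetical stand-in for the base class that supplies a default ws."""

    def ws(self):
        return "default ws"


class XYZ(BuiltinRules):
    """Hypothetical stand-in for a grammar class that overrides ws."""

    def ws(self):
        return "XYZ ws"


# Hypothetical stand-in for rules installed in a plain namespace,
# i.e. compiled without being attached to a grammar class.
namespace_rules = {
    "ws": lambda: "namespace ws",
    "emptyline": lambda: "namespace emptyline",
}


def find_rule(matcher, name, symbols):
    """Look up a rule as a method first, then fall back to the symbol table."""
    method = getattr(matcher, name, None)
    if callable(method):
        return method
    return symbols[name]


# Match object of the base class: the inherited default ws shadows the
# namespace-level ws, while emptyline is only found via the fallback.
print(find_rule(BuiltinRules(), "ws", namespace_rules)())         # default ws
print(find_rule(BuiltinRules(), "emptyline", namespace_rules)())  # namespace emptyline

# Match object of the grammar class: the override is found via inheritance.
print(find_rule(XYZ(), "ws", namespace_rules)())                  # XYZ ws
```

In this model, a ws defined only in the namespace is never reached, because a method with that name always exists on the base class; that mirrors why passing the 'grammar' argument changes which ws gets used.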
Re: a smarter form of whitespace
Nathan Gray wrote:
> Overloading ws and other builtins was fixed in parrot and pugs
> approaching midnight (hackathon time) on 2006-06-29. If your parrot
> and pugs are both more recent than that, I'm not sure where the bug is.

I have the latest checkout of Parrot (I'm not using Pugs).

It may not be a bug. The design question is: should ws match a newline
even when it's been overloaded to match only spaces and tabs? (I'm
thinking "No", but could be wrong.)

Allison
Re: a smarter form of whitespace
On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
> --
> token start { ^<emptyline>*$ }
> regex emptyline { ^^ $$ \n }
> token ws { [<sp> | \t]* }
> --

The above grammar doesn't have a grammar statement; as a result the
regexes are being installed into the '' namespace.

> If I match this against a string of 7 newlines, it returns 7 emptyline
> matches, and each match is a single newline. This is the behavior I
> want for newlines.

I tried it with a grammar statement and it seems to work:

    $ cat ar.pg
    grammar XYZ;
    token start { ^<emptyline>*$ }
    rule emptyline { ^^ $$ \n }
    token ws { [<sp> | \t]* }

    $ ./parrot compilers/pge/pgc.pir ar.pg > ar.pir

    $ cat xyz.pir
    .sub main :main
        load_bytecode 'PGE.pbc'
        load_bytecode 'ar.pir'
        load_bytecode 'dumper.pbc'
        load_bytecode 'PGE/Dumper.pbc'
        $P0 = find_global 'XYZ', 'start'
        $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')
        '_dumper'($P1)
    .end

    $ ./parrot xyz.pir
    "VAR1" => PMC 'XYZ' => "\n\n\n\n\n\n\n" @ 0 {
        <emptyline> => ResizablePMCArray (size:7) [
            PMC 'XYZ' => "\n" @ 0,
            PMC 'XYZ' => "\n" @ 1,
            PMC 'XYZ' => "\n" @ 2,
            PMC 'XYZ' => "\n" @ 3,
            PMC 'XYZ' => "\n" @ 4,
            PMC 'XYZ' => "\n" @ 5,
            PMC 'XYZ' => "\n" @ 6
        ]
    }
    $

- Pm
a smarter form of whitespace
I'm writing a parser for a language that treats a double newline as a
statement terminator. It works if I make every rule a 'regex' (to turn
off smart whitespace). But I want spaces and tabs to act as smart
whitespace, and newlines to act as literal whitespace. I've overloaded
ws to match only spaces and tabs, but the grammar still consumes
newlines where it shouldn't consume newlines.

For a simple repeatable example, take the following grammar:

--
token start { ^<emptyline>*$ }
regex emptyline { ^^ $$ \n }
token ws { [<sp> | \t]* }
--

If I match this against a string of 7 newlines, it returns 7 emptyline
matches, and each match is a single newline. This is the behavior I
want for newlines.

I would like to add smart whitespace matching for spaces and tabs. But,
if I change emptyline to a 'rule' and match it against the same string
of 7 newlines, it returns a single emptyline match and the matched
string is 7 newlines.

I've tried several variations on the ws rule, but it seems to boil down
to: no matter what the ws rule matches, if :sigspace is on, it treats
newlines as ignorable whitespace.

Is this a bug or a feature?

Thanks,
Allison
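The two outcomes described above can be reproduced with a rough model in Python's re module. This is a sketch under stated assumptions, not PGE itself: it assumes :sigspace behaves approximately as if a ws call were placed around every atom, it uses re.MULTILINE so that ^ and $ stand in for ^^ and $$, and it relies on Python's backtracking engine. The helper names expand_sigspace and emptyline_matches are hypothetical.

```python
import re

# Atoms of the emptyline rule: ^^ $$ \n (written with Python's ^ and $).
ATOMS = ["^", "$", r"\n"]

NARROW_WS = r"[ \t]*"  # the overridden ws: spaces and tabs only
BROAD_WS = r"\s*"      # a default-style ws that also accepts newlines


def expand_sigspace(atoms, ws):
    """Place the ws pattern before, between and after the atoms."""
    return ws + ws.join(atoms) + ws


def emptyline_matches(ws, text):
    """Return every successive emptyline match in text for the given ws."""
    pattern = re.compile(expand_sigspace(ATOMS, ws), re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]


text = "\n" * 7
narrow = emptyline_matches(NARROW_WS, text)
broad = emptyline_matches(BROAD_WS, text)

print(len(narrow))  # 7 matches, each a single newline
print(len(broad))   # 1 match covering all 7 newlines
```

With the narrow ws each newline is its own match, which is the behavior asked for. With the broad ws the whitespace pattern swallows the preceding newlines and a single match spans the whole string, which is consistent with the default ws being used instead of the override.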