Re: a smarter form of whitespace

2006-07-06 Thread Allison Randal

Patrick R. Michaud wrote:
> On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
>> --
>>
>> token start { ^<emptyline>*$ }
>>
>> regex emptyline { ^^ $$ \n }
>>
>> token ws { [<sp> | \t]* }
>>
>> --
>
> The above grammar doesn't have a grammar statement; as a result
> the regexes are being installed into the '' namespace.


The original did have a 'grammar' statement, I just didn't paste it into 
the email.



> $ cat xyz.pir
> .sub main :main
>     load_bytecode 'PGE.pbc'
>     load_bytecode 'ar.pir'
>     load_bytecode 'dumper.pbc'
>     load_bytecode 'PGE/Dumper.pbc'
>
>     $P0 = find_global 'XYZ', 'start'
>     $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')


What the original didn't have is the 'grammar' named argument when 
calling the start rule. When I replace the previous line with:


  $P1 = $P0("\n\n\n\n\n\n\n")

then your sample code exhibits the same problem. I assume this means 
that the reason overriding ws wasn't working is because it was calling 
the default version of ws in the root namespace. But, if it was 
defaulting to the root namespace, why was it able to find any of the 
rules? Shouldn't it have complained that it couldn't find emptyline?


Thanks,
Allison


Re: a smarter form of whitespace

2006-07-06 Thread Nathan Gray
On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
> I'm writing a parser for a language that treats a double newline as a
> statement terminator. It works if I make every rule a 'regex' (to turn
> off smart whitespace). But I want spaces and tabs to act as smart
> whitespace, and newlines to act as literal whitespace. I've
> overloaded ws to match only spaces and tabs, but the grammar still
> consumes newlines where it shouldn't consume newlines. For a simple
> repeatable example, take the following grammar:

Overloading ws and other builtins was fixed in parrot and pugs
approaching midnight (hackathon time) on 2006-06-29.  If your parrot
and pugs are both more recent than that, I'm not sure where the bug
is.

-kolibrie



Re: a smarter form of whitespace

2006-07-06 Thread Patrick R. Michaud
On Thu, Jul 06, 2006 at 12:29:12AM -0700, Allison Randal wrote:
>> $ cat xyz.pir
>> .sub main :main
>>     load_bytecode 'PGE.pbc'
>>     load_bytecode 'ar.pir'
>>     load_bytecode 'dumper.pbc'
>>     load_bytecode 'PGE/Dumper.pbc'
>>
>>     $P0 = find_global 'XYZ', 'start'
>>     $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')
>
> What the original didn't have is the 'grammar' named argument when
> calling the start rule. When I replace the previous line with:
>
>   $P1 = $P0("\n\n\n\n\n\n\n")
>
> then your sample code exhibits the same problem. I assume this means
> that the reason overriding ws wasn't working is because it was calling
> the default version of ws in the root namespace. But, if it was
> defaulting to the root namespace, why was it able to find any of the
> rules? Shouldn't it have complained that it couldn't find emptyline?

At the moment (and this may be incorrect), PGE looks for named rules
via inheritance, and if not found that way it looks in the available
symbol tables using the find_name opcode.

So, the match was able to find the rules because they are in the 
current namespace, but when it came time to find the rule for <?ws> 
there was a ws method available (the default) and so that one
was used.

Again, this may not be the correct behavior; I've been using S12 as
the guide here, in that a method call first considers methods from
the class hierarchy and fails over to subroutine dispatch.
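
The lookup order described above can be modelled in a few lines of
Python. This is only a sketch of the described behaviour, not PGE code:
DefaultGrammar, current_namespace and find_rule are hypothetical names
invented for the illustration, and the dictionary merely stands in for
whatever symbol tables find_name would search.

```python
# Hypothetical model of the lookup order described above; none of these
# names are PGE's real API.

class DefaultGrammar:
    """Stands in for the base grammar that ships a default ws."""

    def ws(self):
        return "default ws"


# Stands in for the namespace the rules were compiled into when no
# 'grammar' argument was passed.
current_namespace = {
    "emptyline": lambda: "user emptyline",
    "ws": lambda: "user ws",
}


def find_rule(match_object, name):
    # Step 1: method lookup through the class hierarchy.
    method = getattr(match_object, name, None)
    if method is not None:
        return method
    # Step 2: fall back to a symbol-table lookup.
    return current_namespace[name]


match = DefaultGrammar()
emptyline_result = find_rule(match, "emptyline")()
ws_result = find_rule(match, "ws")()
print(emptyline_result)
print(ws_result)
```

In this model emptyline is found through the namespace fallback, while
ws never reaches the fallback because the inherited default answers
first, which is the effect seen in the thread.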

Pm


Re: a smarter form of whitespace

2006-07-05 Thread Allison Randal
Nathan Gray wrote:
 
> Overloading ws and other builtins was fixed in parrot and pugs
> approaching midnight (hackathon time) on 2006-06-29.  If your parrot
> and pugs are both more recent than that, I'm not sure where the bug
> is.

I have the latest checkout of Parrot (I'm not using Pugs).

It may not be a bug. The design question is: should ws match a newline
even when it's been overloaded to match only spaces and tabs? (I'm
thinking No, but could be wrong.)

Allison


Re: a smarter form of whitespace

2006-07-05 Thread Patrick R. Michaud
On Tue, Jul 04, 2006 at 12:57:16PM -0700, Allison Randal wrote:
> --
>
> token start { ^<emptyline>*$ }
>
> regex emptyline { ^^ $$ \n }
>
> token ws { [<sp> | \t]* }
>
> --

The above grammar doesn't have a grammar statement; as a result
the regexes are being installed into the '' namespace.

> If I match this against a string of 7 newlines, it returns 7 emptyline
> matches, and each match is a single newline. This is the behavior I want
> for newlines.

I tried it with a grammar statement and it seems to work:



$ cat ar.pg
grammar XYZ;

token start { ^<emptyline>*$ }

rule emptyline { ^^ $$ \n }

token ws { [<sp> | \t]* }

$ ./parrot compilers/pge/pgc.pir ar.pg ar.pir
$ cat xyz.pir
.sub main :main
    load_bytecode 'PGE.pbc'
    load_bytecode 'ar.pir'
    load_bytecode 'dumper.pbc'
    load_bytecode 'PGE/Dumper.pbc'

    $P0 = find_global 'XYZ', 'start'
    $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')

    '_dumper'($P1)
.end
$ ./parrot xyz.pir
"VAR1" => PMC 'XYZ' => "\n\n\n\n\n\n\n" @ 0 {
    <emptyline> => ResizablePMCArray (size:7) [
        PMC 'XYZ' => "\n" @ 0,
        PMC 'XYZ' => "\n" @ 1,
        PMC 'XYZ' => "\n" @ 2,
        PMC 'XYZ' => "\n" @ 3,
        PMC 'XYZ' => "\n" @ 4,
        PMC 'XYZ' => "\n" @ 5,
        PMC 'XYZ' => "\n" @ 6
    ]
}
$ 

-

Pm


a smarter form of whitespace

2006-07-04 Thread Allison Randal
I'm writing a parser for a language that treats a double newline as a
statement terminator. It works if I make every rule a 'regex' (to turn
off smart whitespace). But I want spaces and tabs to act as smart
whitespace, and newlines to act as literal whitespace. I've
overloaded ws to match only spaces and tabs, but the grammar still
consumes newlines where it shouldn't consume newlines. For a simple
repeatable example, take the following grammar:

--

token start { ^<emptyline>*$ }

regex emptyline { ^^ $$ \n }

token ws { [<sp> | \t]* }

--

If I match this against a string of 7 newlines, it returns 7 emptyline
matches, and each match is a single newline. This is the behavior I want
for newlines.

I would like to add smart whitespace matching for spaces and tabs. But,
if I change emptyline to a 'rule' and match it against the same string
of 7 newlines, it returns a single emptyline match and the matched
string is 7 newlines. I've tried several variations on the ws rule,
but it seems to boil down to: no matter what the ws rule matches, if
:sigspace is on, it treats newlines as ignorable whitespace.
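
The difference between the two outcomes can be reproduced outside PGE
with Python's re module. This is a simplified model and not how PGE
compiles rules: it assumes :sigspace amounts to a single ws call after
the literal \n, uses ^ and $ under re.MULTILINE as stand-ins for ^^ and
$$, and the names DEFAULT_WS, OVERRIDE_WS and emptyline_matches are made
up for the illustration.

```python
import re

text = "\n" * 7

# Assumption: sigspace is modelled as one ws call after the literal \n.
DEFAULT_WS = r"\s*"       # a ws that treats newlines as whitespace
OVERRIDE_WS = r"[ \t]*"   # a ws restricted to spaces and tabs


def emptyline_matches(ws):
    # ^ and $ with re.MULTILINE approximate ^^ and $$.
    pattern = re.compile(r"^$\n" + ws, re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]


default_result = emptyline_matches(DEFAULT_WS)
override_result = emptyline_matches(OVERRIDE_WS)
print(len(default_result), len(override_result))
```

With the newline-matching ws the first emptyline swallows all seven
newlines; with the restricted ws there are seven matches of one newline
each, which is the result wanted here.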

Is this a bug or a feature?

Thanks,
Allison