Re: PEG Patches

2011-03-31 Thread Noah Lavine
Hello again,

I was about to do this, and then I discovered that it wouldn't work,
because there are a few special case PEGs that don't make sense as
macros. Specifically, in the context of a PEG, we interpret strings as
matching themselves, and those can't be made into macros.

So I went ahead and implemented a simple way to extend
peg-sexp-compile. It turned out to be much less difficult than I was
afraid of.

The first attached patch adds the interface to (ice-9 peg codegen) and
changes most of the functions there to use it, and also adds some
documentation in the PEG Internals section. The second one updates
(ice-9 peg string-peg) to use it as well, and gets rid of
peg-extended-compile from peg.scm since it's no longer needed.

I wrote the patches on top of the last two that I sent, because those
included some cleanups that I wanted to keep.

Noah

On Tue, Mar 29, 2011 at 9:20 AM, Andy Wingo wi...@pobox.com wrote:
 On Tue 29 Mar 2011 14:47, Noah Lavine noah.b.lav...@gmail.com writes:

 (define-peg-matcher and cg-and)

 That's doable. But if we're going to choose what to do entirely based
 on the first element of the list, then we could also just not define
 peg-sexp-compile at all and make each of the code generation functions
 into macros.

 How does that sound?

 Good idea.  Sounds great to me!

 Andy
 --
 http://wingolog.org/



0001-Extensible-PEG-Syntax.patch
Description: Binary data


0002-Update-String-PEGs.patch
Description: Binary data


Re: PEG Patches

2011-03-29 Thread Andy Wingo
On Mon 28 Mar 2011 22:44, Noah Lavine noah.b.lav...@gmail.com writes:

  - say that string PEGs can only occur at the top level of a PEG
 expression. The peg module has never been released, so no one uses
 this feature now anyway.
  - instead of defining a new function peg-extended-compile, redefine
 peg-sexp-compile via set! once we have string pegs.
  - write peg-extended-compile as its own big case statement, basically
 duplicating peg-sexp-compile.
  - adopt some interface that allows people to extend the cases in
 peg-sexp-compile. We would start with just s-expression PEGs, then use
 this interface to add string PEGs later in the load sequence.

This last is the best.  What if we define a module that serves as a
registry of PEG match behaviors, like `(ice-9 peg matchers)'.  Then we
define `define-peg-matcher' or something, so that we can:

(define-peg-matcher and cg-and)

where define-peg-matcher is

(define-syntax define-peg-matcher
  (syntax-rules ()
((_ name binding)
 (module-define! (resolve-module '(ice-9 peg matchers))
  'name
  binding

Then instead of defining separate cases for ignore, range, etc the
peg-sexp-compile macro does:

  ((matcher arg ...) (identifier? #'matcher)
   ((module-ref (resolve-module '(ice-9 peg matchers))
(syntax-datum #'matcher))
#'(arg ...)
mode))

Then the peg-string module registers a matcher for `peg'.

Dunno.  WDYT?

Andy
-- 
http://wingolog.org/



Re: PEG Patches

2011-03-29 Thread Noah Lavine
 This last is the best.  What if we define a module that serves as a
 registry of PEG match behaviors, like `(ice-9 peg matchers)'.  Then we
 define `define-peg-matcher' or something, so that we can:

 (define-peg-matcher and cg-and)

 where define-peg-matcher is

 (define-syntax define-peg-matcher
  (syntax-rules ()
    ((_ name binding)
     (module-define! (resolve-module '(ice-9 peg matchers))
                      'name
                      binding

 Then instead of defining separate cases for ignore, range, etc the
 peg-sexp-compile macro does:

  ((matcher arg ...) (identifier? #'matcher)
   ((module-ref (resolve-module '(ice-9 peg matchers))
                (syntax-datum #'matcher))
    #'(arg ...)
    mode))

 Then the peg-string module registers a matcher for `peg'.

 Dunno.  WDYT?

That's doable. But if we're going to choose what to do entirely based
on the first element of the list, then we could also just not define
peg-sexp-compile at all and make each of the code generation functions
into macros.

How does that sound?



Re: PEG Patches

2011-03-29 Thread Andy Wingo
On Tue 29 Mar 2011 14:47, Noah Lavine noah.b.lav...@gmail.com writes:

 (define-peg-matcher and cg-and)

 That's doable. But if we're going to choose what to do entirely based
 on the first element of the list, then we could also just not define
 peg-sexp-compile at all and make each of the code generation functions
 into macros.

 How does that sound?

Good idea.  Sounds great to me!

Andy
-- 
http://wingolog.org/



Re: PEG Patches

2011-03-28 Thread Noah Lavine
Hi,

 I think the solution is to confront the circularity directly.  It exists
 because the PEG s-exp grammar also deals with the string grammar, which
 needs an already-build PEG parser.

 Let's break it instead into layers without cycles: removing the string
 grammar from the s-exp code generator.  If we want a layer with both, we
 build it on top of the two lower layers.

 What do you think?

I've been working on that. The attached two patches break the
circularity. The code still isn't organized brilliantly, but after
applying these I think we would only want pretty minor cleanups before
merging PEG into the main branch.

However, there's an interesting issue which I am not sure how to
confront. Here it is:

Currently, peg-sexp-compile is defined as a big case statement:

(define (peg-sexp-compile pattern accum)
  (syntax-case pattern ()
lots of cases here))

What these patches do is take out the case for embedded PEG strings,
so the case statement has one fewer case. Then they add a new function
peg-extended-compile, defined by

(define (peg-extended-compile pattern accum)
  (syntax-case pattern (peg)
((peg str)
 (string? (syntax-datum #'str))
 (peg-string-compile #'str (if (eq? accum 'all) 'body accum)))
(else (peg-sexp-compile pattern accum

peg-string-compile takes a string, parses it, and then calls
peg-sexp-compile on the result, so this is noncircular.

Unfortunately, this sacrifices a feature. The trouble is that the
cases in peg-sexp-compile call peg-sexp-compile on parts of
themselves, because PEG expressions are recursive. Those inner PEG
expressions can never contain embedded string PEGs with this
definition, because those calls never go through peg-extended-compile.

I see a few options:
 - say that string PEGs can only occur at the top level of a PEG
expression. The peg module has never been released, so no one uses
this feature now anyway.
 - instead of defining a new function peg-extended-compile, redefine
peg-sexp-compile via set! once we have string pegs.
 - write peg-extended-compile as its own big case statement, basically
duplicating peg-sexp-compile.
 - adopt some interface that allows people to extend the cases in
peg-sexp-compile. We would start with just s-expression PEGs, then use
this interface to add string PEGs later in the load sequence.

The second and third options seem hackish to me. The third option is
especially bad because I think some of the calls to peg-sexp-compile
are in helper functions that peg-sexp-compile calls, so we might have
to duplicate most of codegen.scm to make this work.

The fourth option seems elegant, but I'm not sure what a good
interface for that is. Is there anything in Guile now that can
idiomatically be used for an extensible list of cases? It seems almost
like something GOOPS would do, but not quite. I am also a bit
concerned about the fourth option because it could become an interface
that is only ever used once, and might just add unnecessary
complexity.

I think the first option is the best one for now, because it doesn't
require much work and it would allow a smooth transition if we ever
enable non-top-level PEG strings in the future. What do other people
think?

Noah


0001-Move-define-nonterm.patch
Description: Binary data


0002-Separate-PEG-Concerns.patch
Description: Binary data


Re: PEG Patches

2011-03-28 Thread Noah Lavine
 I've been working on that. The attached two patches break the
 circularity. The code still isn't organized brilliantly, but after
 applying these I think we would only want pretty minor cleanups before
 merging PEG into the main branch.

Actually, forget this bit. I wrote it before I remembered that the
s-expression language will probably need to be changed.



Re: PEG Patches

2011-03-28 Thread Michael Lucy
A variant on the second option would be first defining
peg-string-compile to just throw an error, then redefining it later to
actually compile the string.  That seems a little less hackish, at
least to me.

A fifth option would be to make peg-sexp-compile take an optional
argument FUN-RECUR that it will call instead of recursing into itself
(so in your example FUN-RECUR would be peg-extended-compile).  This
involves more rewriting than the other options to pass the optional
argument around, but it's pretty clean and would allow users to write
other parsing layers on top of peg-sexp-compile should they wish
(achieving similar results to the fourth option).

On Mon, Mar 28, 2011 at 3:44 PM, Noah Lavine noah.b.lav...@gmail.com wrote:
 Hi,

 I think the solution is to confront the circularity directly.  It exists
 because the PEG s-exp grammar also deals with the string grammar, which
 needs an already-build PEG parser.

 Let's break it instead into layers without cycles: removing the string
 grammar from the s-exp code generator.  If we want a layer with both, we
 build it on top of the two lower layers.

 What do you think?

 I've been working on that. The attached two patches break the
 circularity. The code still isn't organized brilliantly, but after
 applying these I think we would only want pretty minor cleanups before
 merging PEG into the main branch.

 However, there's an interesting issue which I am not sure how to
 confront. Here it is:

 Currently, peg-sexp-compile is defined as a big case statement:

 (define (peg-sexp-compile pattern accum)
  (syntax-case pattern ()
    lots of cases here))

 What these patches do is take out the case for embedded PEG strings,
 so the case statement has one fewer case. Then they add a new function
 peg-extended-compile, defined by

 (define (peg-extended-compile pattern accum)
  (syntax-case pattern (peg)
    ((peg str)
     (string? (syntax-datum #'str))
     (peg-string-compile #'str (if (eq? accum 'all) 'body accum)))
    (else (peg-sexp-compile pattern accum

 peg-string-compile takes a string, parses it, and then calls
 peg-sexp-compile on the result, so this is noncircular.

 Unfortunately, this sacrifices a feature. The trouble is that the
 cases in peg-sexp-compile call peg-sexp-compile on parts of
 themselves, because PEG expressions are recursive. Those inner PEG
 expressions can never contain embedded string PEGs with this
 definition, because those calls never go through peg-extended-compile.

 I see a few options:
  - say that string PEGs can only occur at the top level of a PEG
 expression. The peg module has never been released, so no one uses
 this feature now anyway.
  - instead of defining a new function peg-extended-compile, redefine
 peg-sexp-compile via set! once we have string pegs.
  - write peg-extended-compile as its own big case statement, basically
 duplicating peg-sexp-compile.
  - adopt some interface that allows people to extend the cases in
 peg-sexp-compile. We would start with just s-expression PEGs, then use
 this interface to add string PEGs later in the load sequence.

 The second and third options seem hackish to me. The third option is
 especially bad because I think some of the calls to peg-sexp-compile
 are in helper functions that peg-sexp-compile calls, so we might have
 to duplicate most of codegen.scm to make this work.

 The fourth option seems elegant, but I'm not sure what a good
 interface for that is. Is there anything in Guile now that can
 idiomatically be used for an extensible list of cases? It seems almost
 like something GOOPS would do, but not quite. I am also a bit
 concerned about the fourth option because it could become an interface
 that is only ever used once, and might just add unnecessary
 complexity.

 I think the first option is the best one for now, because it doesn't
 require much work and it would allow a smooth transition if we ever
 enable non-top-level PEG strings in the future. What do other people
 think?

 Noah




Re: PEG Patches

2011-03-25 Thread Andy Wingo
On Sun 06 Mar 2011 06:25, Noah Lavine noah.b.lav...@gmail.com writes:

 Attached is a series of patches I've made for the wip-mlucy branch. It
 splits the PEG code into several little modules which go in
 module/ice-9/peg/. The original peg source file becomes very little.
 At the end it finally loses its big eval-when wrapper.

Cool!  These cleanups are great.

However... when you added the files, you did not add them to
Makefile.am, so they don't get built.  I went back to add them, but they
don't compile, and it's because of the circularity we have discussed in
other threads.

I think the solution is to confront the circularity directly.  It exists
because the PEG s-exp grammar also deals with the string grammar, which
needs an already-build PEG parser.

Let's break it instead into layers without cycles: removing the string
grammar from the s-exp code generator.  If we want a layer with both, we
build it on top of the two lower layers.

What do you think?

Andy
-- 
http://wingolog.org/



Re: PEG Patches

2011-03-06 Thread Noah Lavine
Here's another patch, which in retrospect may be the most useful of
the series. It adds a section called PEG Internals to the manual,
and begins documenting how PEG actually works. This should make
hacking PEG a lot easier.

Noah

On Sun, Mar 6, 2011 at 12:25 AM, Noah Lavine noah.b.lav...@gmail.com wrote:
 Hello all,

 Attached is a series of patches I've made for the wip-mlucy branch. It
 splits the PEG code into several little modules which go in
 module/ice-9/peg/. The original peg source file becomes very little.
 At the end it finally loses its big eval-when wrapper.

 There's one part of this that I'm not satisfied with, which is that
 define-nonterminal goes into module/ice-9/peg/string-peg.scm, which is
 supposed to be solely for pegs-as-strings. The reason for this is that
 I got compiler errors if I didn't do this, and I couldn't figure out
 how to stop them. I would appreciate it if someone would take a look
 and try to find what I missed.

 Also, a note about future ideas - the current PEG code can only parse
 strings. However, there is almost nothing string-specific about the
 parsing code - just a few calls to string-ref and substring in
 codegen.scm. I'd like to see this extended to parse vectors filled
 with arbitrary objects. This would let you use a tokenizer with it,
 which is the easiest way to implement C correctly, and also probably
 the easiest way to store line number information with tokens, which is
 necessary for ultimately giving good error messages from PEG parsers.

 Thanks,
 Noah



0001-Document-PEG-Internals.patch
Description: Binary data