Re: PEG Patches
Hello again, I was about to do this, and then I discovered that it wouldn't work, because there are a few special case PEGs that don't make sense as macros. Specifically, in the context of a PEG, we interpret strings as matching themselves, and those can't be made into macros. So I went ahead and implemented a simple way to extend peg-sexp-compile. It turned out to be much less difficult than I was afraid of. The first attached patch adds the interface to (ice-9 peg codegen) and changes most of the functions there to use it, and also adds some documentation in the PEG Internals section. The second one updates (ice-9 peg string-peg) to use it as well, and gets rid of peg-extended-compile from peg.scm since it's no longer needed. I wrote the patches on top of the last two that I sent, because those included some cleanups that I wanted to keep. Noah On Tue, Mar 29, 2011 at 9:20 AM, Andy Wingo wi...@pobox.com wrote: On Tue 29 Mar 2011 14:47, Noah Lavine noah.b.lav...@gmail.com writes: (define-peg-matcher and cg-and) That's doable. But if we're going to choose what to do entirely based on the first element of the list, then we could also just not define peg-sexp-compile at all and make each of the code generation functions into macros. How does that sound? Good idea. Sounds great to me! Andy -- http://wingolog.org/ 0001-Extensible-PEG-Syntax.patch Description: Binary data 0002-Update-String-PEGs.patch Description: Binary data
Re: PEG Patches
On Mon 28 Mar 2011 22:44, Noah Lavine noah.b.lav...@gmail.com writes: - say that string PEGs can only occur at the top level of a PEG expression. The peg module has never been released, so no one uses this feature now anyway. - instead of defining a new function peg-extended-compile, redefine peg-sexp-compile via set! once we have string pegs. - write peg-extended-compile as its own big case statement, basically duplicating peg-sexp-compile. - adopt some interface that allows people to extend the cases in peg-sexp-compile. We would start with just s-expression PEGs, then use this interface to add string PEGs later in the load sequence. This last is the best. What if we define a module that serves as a registry of PEG match behaviors, like `(ice-9 peg matchers)'. Then we define `define-peg-matcher' or something, so that we can: (define-peg-matcher and cg-and) where define-peg-matcher is (define-syntax define-peg-matcher (syntax-rules () ((_ name binding) (module-define! (resolve-module '(ice-9 peg matchers)) 'name binding Then instead of defining separate cases for ignore, range, etc the peg-sexp-compile macro does: ((matcher arg ...) (identifier? #'matcher) ((module-ref (resolve-module '(ice-9 peg matchers)) (syntax-datum #'matcher)) #'(arg ...) mode)) Then the peg-string module registers a matcher for `peg'. Dunno. WDYT? Andy -- http://wingolog.org/
Re: PEG Patches
This last is the best. What if we define a module that serves as a registry of PEG match behaviors, like `(ice-9 peg matchers)'. Then we define `define-peg-matcher' or something, so that we can: (define-peg-matcher and cg-and) where define-peg-matcher is (define-syntax define-peg-matcher (syntax-rules () ((_ name binding) (module-define! (resolve-module '(ice-9 peg matchers)) 'name binding Then instead of defining separate cases for ignore, range, etc the peg-sexp-compile macro does: ((matcher arg ...) (identifier? #'matcher) ((module-ref (resolve-module '(ice-9 peg matchers)) (syntax-datum #'matcher)) #'(arg ...) mode)) Then the peg-string module registers a matcher for `peg'. Dunno. WDYT? That's doable. But if we're going to choose what to do entirely based on the first element of the list, then we could also just not define peg-sexp-compile at all and make each of the code generation functions into macros. How does that sound?
Re: PEG Patches
On Tue 29 Mar 2011 14:47, Noah Lavine noah.b.lav...@gmail.com writes: (define-peg-matcher and cg-and) That's doable. But if we're going to choose what to do entirely based on the first element of the list, then we could also just not define peg-sexp-compile at all and make each of the code generation functions into macros. How does that sound? Good idea. Sounds great to me! Andy -- http://wingolog.org/
Re: PEG Patches
Hi, I think the solution is to confront the circularity directly. It exists because the PEG s-exp grammar also deals with the string grammar, which needs an already-build PEG parser. Let's break it instead into layers without cycles: removing the string grammar from the s-exp code generator. If we want a layer with both, we build it on top of the two lower layers. What do you think? I've been working on that. The attached two patches break the circularity. The code still isn't organized brilliantly, but after applying these I think we would only want pretty minor cleanups before merging PEG into the main branch. However, there's an interesting issue which I am not sure how to confront. Here it is: Currently, peg-sexp-compile is defined as a big case statement: (define (peg-sexp-compile pattern accum) (syntax-case pattern () lots of cases here)) What these patches do is take out the case for embedded PEG strings, so the case statement has one fewer case. Then they add a new function peg-extended-compile, defined by (define (peg-extended-compile pattern accum) (syntax-case pattern (peg) ((peg str) (string? (syntax-datum #'str)) (peg-string-compile #'str (if (eq? accum 'all) 'body accum))) (else (peg-sexp-compile pattern accum peg-string-compile takes a string, parses it, and then calls peg-sexp-compile on the result, so this is noncircular. Unfortunately, this sacrifices a feature. The trouble is that the cases in peg-sexp-compile call peg-sexp-compile on parts of themselves, because PEG expressions are recursive. Those inner PEG expressions can never contain embedded string PEGs with this definition, because those calls never go through peg-extended-compile. I see a few options: - say that string PEGs can only occur at the top level of a PEG expression. The peg module has never been released, so no one uses this feature now anyway. - instead of defining a new function peg-extended-compile, redefine peg-sexp-compile via set! once we have string pegs. - write peg-extended-compile as its own big case statement, basically duplicating peg-sexp-compile. - adopt some interface that allows people to extend the cases in peg-sexp-compile. We would start with just s-expression PEGs, then use this interface to add string PEGs later in the load sequence. The second and third options seem hackish to me. The third option is especially bad because I think some of the calls to peg-sexp-compile are in helper functions that peg-sexp-compile calls, so we might have to duplicate most of codegen.scm to make this work. The fourth option seems elegant, but I'm not sure what a good interface for that is. Is there anything in Guile now that can idiomatically be used for an extensible list of cases? It seems almost like something GOOPS would do, but not quite. I am also a bit concerned about the fourth option because it could become an interface that is only ever used once, and might just add unnecessary complexity. I think the first option is the best one for now, because it doesn't require much work and it would allow a smooth transition if we ever enable non-top-level PEG strings in the future. What do other people think? Noah 0001-Move-define-nonterm.patch Description: Binary data 0002-Separate-PEG-Concerns.patch Description: Binary data
Re: PEG Patches
I've been working on that. The attached two patches break the circularity. The code still isn't organized brilliantly, but after applying these I think we would only want pretty minor cleanups before merging PEG into the main branch. Actually, forget this bit. I wrote it before I remembered that the s-expression language will probably need to be changed.
Re: PEG Patches
A variant on the second option would be first defining peg-string-compile to just throw an error, then redefining it later to actually compile the string. That seems a little less hackish, at least to me. A fifth option would be to make peg-sexp-compile take an optional argument FUN-RECUR that it will call instead of recursing into itself (so in your example FUN-RECUR would be peg-extended-compile). This involves more rewriting than the other options to pass the optional argument around, but it's pretty clean and would allow users to write other parsing layers on top of peg-sexp-compile should they wish (achieving similar results to the fourth option). On Mon, Mar 28, 2011 at 3:44 PM, Noah Lavine noah.b.lav...@gmail.com wrote: Hi, I think the solution is to confront the circularity directly. It exists because the PEG s-exp grammar also deals with the string grammar, which needs an already-build PEG parser. Let's break it instead into layers without cycles: removing the string grammar from the s-exp code generator. If we want a layer with both, we build it on top of the two lower layers. What do you think? I've been working on that. The attached two patches break the circularity. The code still isn't organized brilliantly, but after applying these I think we would only want pretty minor cleanups before merging PEG into the main branch. However, there's an interesting issue which I am not sure how to confront. Here it is: Currently, peg-sexp-compile is defined as a big case statement: (define (peg-sexp-compile pattern accum) (syntax-case pattern () lots of cases here)) What these patches do is take out the case for embedded PEG strings, so the case statement has one fewer case. Then they add a new function peg-extended-compile, defined by (define (peg-extended-compile pattern accum) (syntax-case pattern (peg) ((peg str) (string? (syntax-datum #'str)) (peg-string-compile #'str (if (eq? accum 'all) 'body accum))) (else (peg-sexp-compile pattern accum peg-string-compile takes a string, parses it, and then calls peg-sexp-compile on the result, so this is noncircular. Unfortunately, this sacrifices a feature. The trouble is that the cases in peg-sexp-compile call peg-sexp-compile on parts of themselves, because PEG expressions are recursive. Those inner PEG expressions can never contain embedded string PEGs with this definition, because those calls never go through peg-extended-compile. I see a few options: - say that string PEGs can only occur at the top level of a PEG expression. The peg module has never been released, so no one uses this feature now anyway. - instead of defining a new function peg-extended-compile, redefine peg-sexp-compile via set! once we have string pegs. - write peg-extended-compile as its own big case statement, basically duplicating peg-sexp-compile. - adopt some interface that allows people to extend the cases in peg-sexp-compile. We would start with just s-expression PEGs, then use this interface to add string PEGs later in the load sequence. The second and third options seem hackish to me. The third option is especially bad because I think some of the calls to peg-sexp-compile are in helper functions that peg-sexp-compile calls, so we might have to duplicate most of codegen.scm to make this work. The fourth option seems elegant, but I'm not sure what a good interface for that is. Is there anything in Guile now that can idiomatically be used for an extensible list of cases? It seems almost like something GOOPS would do, but not quite. I am also a bit concerned about the fourth option because it could become an interface that is only ever used once, and might just add unnecessary complexity. I think the first option is the best one for now, because it doesn't require much work and it would allow a smooth transition if we ever enable non-top-level PEG strings in the future. What do other people think? Noah
Re: PEG Patches
On Sun 06 Mar 2011 06:25, Noah Lavine noah.b.lav...@gmail.com writes: Attached is a series of patches I've made for the wip-mlucy branch. It splits the PEG code into several little modules which go in module/ice-9/peg/. The original peg source file becomes very little. At the end it finally loses its big eval-when wrapper. Cool! These cleanups are great. However... when you added the files, you did not add them to Makefile.am, so they don't get built. I went back to add them, but they don't compile, and it's because of the circularity we have discussed in other threads. I think the solution is to confront the circularity directly. It exists because the PEG s-exp grammar also deals with the string grammar, which needs an already-build PEG parser. Let's break it instead into layers without cycles: removing the string grammar from the s-exp code generator. If we want a layer with both, we build it on top of the two lower layers. What do you think? Andy -- http://wingolog.org/
Re: PEG Patches
Here's another patch, which in retrospect may be the most useful of the series. It adds a section called PEG Internals to the manual, and begins documenting how PEG actually works. This should make hacking PEG a lot easier. Noah On Sun, Mar 6, 2011 at 12:25 AM, Noah Lavine noah.b.lav...@gmail.com wrote: Hello all, Attached is a series of patches I've made for the wip-mlucy branch. It splits the PEG code into several little modules which go in module/ice-9/peg/. The original peg source file becomes very little. At the end it finally loses its big eval-when wrapper. There's one part of this that I'm not satisfied with, which is that define-nonterminal goes into module/ice-9/peg/string-peg.scm, which is supposed to be solely for pegs-as-strings. The reason for this is that I got compiler errors if I didn't do this, and I couldn't figure out how to stop them. I would appreciate it if someone would take a look and try to find what I missed. Also, a note about future ideas - the current PEG code can only parse strings. However, there is almost nothing string-specific about the parsing code - just a few calls to string-ref and substring in codegen.scm. I'd like to see this extended to parse vectors filled with arbitrary objects. This would let you use a tokenizer with it, which is the easiest way to implement C correctly, and also probably the easiest way to store line number information with tokens, which is necessary for ultimately giving good error messages from PEG parsers. Thanks, Noah 0001-Document-PEG-Internals.patch Description: Binary data