On Thu, Jun 11, 2015 at 9:21 AM, Jonathan S. Shapiro <[email protected]> wrote: > So now that we contemplate a typed AST, we run into a new problem: how to > type the code emitted by the parser generator. > > If the parse algorithm is any of the "top down" family of algorithms (LL, > PEG, GLL, others), there's no problem. Each production can be implemented as > a procedure, and each procedure returns its result AST as a return value. > The reason this can be typed using a conventional static type system is that > we are't using a separate parse stack.
This makes sense. > But if the parse algorithm is one of the "bottom up" family of algorithms > (LR, GLR, others), the typing seems more complicated, because all of the > shift-reduce algorithms use a separate parse stack. Interesting! > The reason a separate parse stack is a bother is that a given "position" in > the parse stack can have distinct types over time as the parse tree is > constructed; we cannot assign any simple static type to a position. > Off-hand, I can't even see how to type the parse stack using dependent > types. I'm sure a way exists, and I'm pretty sure one of the type system > wizards on the list will promptly trot it out and say "it's obvious", but > I'm a simple-minded old-school hacker at heart, and it's not jumping out at > me. Sure, you can use a union type as your stack slot, but it seems > unfortunate to do that when the parse state already tells you what you need > to know. Yeah, so I see two ways to think about it. One is that the shift stack is essentially a heterogeneous list. There's a (type-level) list of types of the elements, and the actual list is a tuple, whose component types are given by the list. The other is that the stack's element type is an (unsafe) untagged union, and we write the code which knows which union leg to use, and separately prove that it's safe. The former is dependent types, and the latter is refinement types (over an unsafe type system). In either case though, you essentially have to do the same thing, which is to know that the parser transitions are expecting certain types of AST at the top of the stack, it it will be right. You need to effectively assign types to the parser state transitions, saying how they affect types near the top of the stack. You get something like progress and preservation theorems, for typed parser states. The details are definitely not obvious, so don't beat yourself up. This is getting to the level of sophistication where I worry that "regular" dependent types will incur awkward restrictions on how the code is written, and performance could suffer. But it could work. And refinement types always work, with no overhead. The reason why I'm convinced it would work, without knowing the details, is that the typed AST representation I assume you have in mind is only enforcing CFG structure. So both the types of ASTs and parser transitions are basically talking about stuff from the original grammar. If, on the other hand, you tried to enforce well-typedness of ASTs, you probably could _not_ have a type-safe parser, because the parser is not going to reject ill-typed inputs, yet they are not representable. In other words, with such a strongly-typed AST, you'd actually need to parse and type check in the same pass. (I'm actually interested in trying that some day.) Come to think of it, with refinement types, either untagged union of typed ASTs or one AST type, then subunions, would probably get you the same thing, once you erase the refinements. So I guess the reason why top-down parsers can get type-safe parsers for free is that by reusing the host language's stack, they also get to reuse the host language's type soundness. > You could, of course, make the parse stack element a union type, but then > you'd end up doing a lot of run-time type checking that you don't need to do > (because the parse state already encodes that). Yes, you can also use a tagged union and runtime checks. But if you'd need to resort to this, it'd probably be better to instead resort to your original idea of a single AST type, and have more control over when you do checks. You could also try to find out what tools like ML-yacc do. http://www.smlnj.org/doc/ML-Yacc/ http://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html > It seems curious that when relative strengths of parsing strategies are > discussed, nobody ever talks about static typing. Or at least I've never > *seen* anyone do so. Maybe just too many people talk about parsing or typing in isolation, so people talking about the interaction get drowned out. Also, type-safe parsers do seem to want dependent types in general, but as a test case for dependent types, parsers are nontrivial, but don't really seem special. _______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
