On Thu, Jun 11, 2015 at 9:21 AM, Jonathan S. Shapiro <[email protected]> wrote:
> So now that we contemplate a typed AST, we run into a new problem: how to
> type the code emitted by the parser generator.
>
> If the parse algorithm is any of the "top down" family of algorithms (LL,
> PEG, GLL, others), there's no problem. Each production can be implemented as
> a procedure, and each procedure returns its result AST as a return value.
> The reason this can be typed using a conventional static type system is that
> we are't using a separate parse stack.

This makes sense.

> But if the parse algorithm is one of the "bottom up" family of algorithms
> (LR, GLR, others), the typing seems more complicated, because all of the
> shift-reduce algorithms use a separate parse stack.

Interesting!

> The reason a separate parse stack is a bother is that a given "position" in
> the parse stack can have distinct types over time as the parse tree is
> constructed; we cannot assign any simple static type to a position.
> Off-hand, I can't even see how to type the parse stack using dependent
> types. I'm sure a way exists, and I'm pretty sure one of the type system
> wizards on the list will promptly trot it out and say "it's obvious", but
> I'm a simple-minded old-school hacker at heart, and it's not jumping out at
> me. Sure, you can use a union type as your stack slot, but it seems
> unfortunate to do that when the parse state already tells you what you need
> to know.

Yeah, so I see two ways to think about it. One is that the shift stack
is essentially a heterogeneous list. There's a (type-level) list of
types of the elements, and the actual list is a tuple, whose component
types are given by the list. The other is that the stack's element
type is an (unsafe) untagged union, and we write the code which knows
which union leg to use, and separately prove that it's safe. The
former is dependent types, and the latter is refinement types (over an
unsafe type system). In either case though, you essentially have to do
the same thing, which is to know that the parser transitions are
expecting certain types of AST at the top of the stack, it it will be
right. You need to effectively assign types to the parser state
transitions, saying how they affect types near the top of the stack.
You get something like progress and preservation theorems, for typed
parser states.

The details are definitely not obvious, so don't beat yourself up.
This is getting to the level of sophistication where I worry that
"regular" dependent types will incur awkward restrictions on how the
code is written, and performance could suffer. But it could work. And
refinement types always work, with no overhead.

The reason why I'm convinced it would work, without knowing the
details, is that the typed AST representation I assume you have in
mind is only enforcing CFG structure. So both the types of ASTs and
parser transitions are basically talking about stuff from the original
grammar. If, on the other hand, you tried to enforce well-typedness of
ASTs, you probably could _not_ have a type-safe parser, because the
parser is not going to reject ill-typed inputs, yet they are not
representable. In other words, with such a strongly-typed AST, you'd
actually need to parse and type check in the same pass. (I'm actually
interested in trying that some day.)

Come to think of it, with refinement types, either untagged union of
typed ASTs or one AST type, then subunions, would probably get you the
same thing, once you erase the refinements.

So I guess the reason why top-down parsers can get type-safe parsers
for free is that by reusing the host language's stack, they also get
to reuse the host language's type soundness.

> You could, of course, make the parse stack element a union type, but then
> you'd end up doing a lot of run-time type checking that you don't need to do
> (because the parse state already encodes that).

Yes, you can also use a tagged union and runtime checks. But if you'd
need to resort to this, it'd probably be better to instead resort to
your original idea of a single AST type, and have more control over
when you do checks.

You could also try to find out what tools like ML-yacc do.
http://www.smlnj.org/doc/ML-Yacc/
http://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html

> It seems curious that when relative strengths of parsing strategies are
> discussed, nobody ever talks about static typing. Or at least I've never
> *seen* anyone do so.

Maybe just too many people talk about parsing or typing in isolation,
so people talking about the interaction get drowned out.

Also, type-safe parsers do seem to want dependent types in general,
but as a test case for dependent types, parsers are nontrivial, but
don't really seem special.
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to