13-May-2013 16:01, Andrei Alexandrescu пишет:
Watch, discuss, vote up!

http://reddit.com/r/programming/comments/1e8mwq/dconf_2013_day_1_talk_3_distributed_caching/



Nice one! Reminds me the gory days of our "compiler technologies" course :)

Our professor seemed obsessed with tabulated methods as well (Dragon book etc.) And above all - reverse polish notation as the one and holy AST representation. Don't you try to say recursive descent parser in the class...

Speaking of flat AST point in the talk - reverse-polish notation is kind of neat in that it does represent flat AST and is easy to walk.
So one need not to have 2-array AST (childs + map to where are these).

Another advantage - you can slice off a sub-tree as simple as a chunk of an array! It should be good for CPU caches (all related stuff is in one block, no pointer chasing), but I never was close enough to a real compiler to run any meaningful benchmarks.

Let's see the idea of AST as polish-notation array applied to the tree in the talk. I show the forward one as it easier to read and less tricky to walk.

Using the following notation for each node:
<arity>, array of offsets * arity, type code+contents |  subtree(s)

Data is laid out byte-by-byte in raw-binary.
Offsets are in bytes starting right after type+contents.

So e.g. leaf node of int would be:
0, int
No offsets nor sub-trees, content/type tag etc. is marked as int

Then whole tree (tried to align for reading):

1, 0, S | 1, 0, DeclDefs |
  3, 0, <x>, <y>, Declarator |
    1, 0, BasicType | 0, int
    1, 0, Identifier | 0, main
    1, 0, DeclDefs | 1, 0, DeclDef | 1, 0 ReturnStatement | 0, "1"


<x> and <y> are sizes of sub-trees and can easily be calculated at construction. Insertion is simple array insertion + fix ups of offsets up the tree (you have to know your path in nodes anyway to back up).

Giving up on random access to child nodes would allow us to drop offset table completely. I have limited experience on semantic analysis to tell if it makes sense to require or drop it.


Will comment on the DFA & Unicode some time later, it's a neat topic in its own right.

--
Dmitry Olshansky

Reply via email to