On 2/27/2012 10:30 AM, Steve Wart wrote:
Just to zero in on one idea here


    Anyway I digress... have you had a look at this file?:

    http://piumarta.com/software/maru/maru-2.1/test-pepsi.l

    Just read the whole thing - I found it fairly interesting :) He's
    built pepsi on maru there... that's pretty fascinating, right?
    Built a micro smalltalk on top of the S-expression language...
    and then does a Fast Fourier Transform test using it...

    in my case: I looked at it some, but am not entirely sure how it works.


See the comment at the top:
./eval repl.l test-pepsi.l
eval.c is written in C; it's pretty clean code and very cool. Then eval.l does the same thing in a lisp-like language.

Was playing with the Little Schemer with my son this weekend - if you fire up the repl, cons, car, cdr stuff all work as expected.


realized I could strip the filename off the end of the URL to get the directory, and got the C file from there.

initial/quick observations:
apparently uses Boehm;
type system works a bit differently than my stuff, but seems to expose a vaguely similar interface (except I tend to put 'dy' on the front of everything here, so "dycar()", "dycdr()", "dycaddr()", and most predicates have names like "dyconsp()" and similar, and often I type-check using strings rather than an enum, ...);
the parser works a bit differently than my S-Expression parser (mine tend to be more if/else driven, and typically read characters either from strings or from "stream objects");
ANSI codes with raw escape characters (text editor not entirely happy);
mixed tabs and spaces not leading to very good formatting;
simplistic interpreter, although it is not entirely clear how the built-in functions get dispatched;
...
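
to make the "type-check using strings rather than an enum" point a bit more concrete, here is a rough sketch in Python (not my actual C code). the names "dycar", "dycdr", "dycaddr" and "dyconsp" are the ones mentioned above; "DyObject", "dycons", "dylist" and the "_cons_t" tag string are made up purely for illustration.

```python
# hypothetical sketch: every value carries its type as a string tag,
# and predicates compare tag strings instead of enum values.

class DyObject:
    def __init__(self, type_name, payload):
        self.type_name = type_name  # e.g. "_cons_t" (made-up tag name)
        self.payload = payload


def dycons(a, d):
    """Build a cons cell (hypothetical helper, not named in the post)."""
    return DyObject("_cons_t", (a, d))


def dyconsp(obj):
    """Predicate: is this a cons cell? Checked by comparing the string tag."""
    return isinstance(obj, DyObject) and obj.type_name == "_cons_t"


def dycar(obj):
    if not dyconsp(obj):
        raise TypeError("dycar: not a cons")
    return obj.payload[0]


def dycdr(obj):
    if not dyconsp(obj):
        raise TypeError("dycdr: not a cons")
    return obj.payload[1]


def dycaddr(obj):
    """Third element of a list: car of cdr of cdr."""
    return dycar(dycdr(dycdr(obj)))


def dylist(*items):
    """Build a proper list terminated by None (hypothetical helper)."""
    result = None
    for item in reversed(items):
        result = dycons(item, result)
    return result
```

the string tag makes it easy to add new types without touching a central enum, at the cost of a string compare per check.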

a much more significant difference:
in my code, this sort of functionality is spread over many different areas (over several different DLLs), so one wouldn't find all of it in the same place.

will likely require more looking to figure out how the parser or the syntax changing works (none of my parsers do this; most are fixed-form and typically shun context-dependent parsing).


some of my earlier/simpler interpreters were like this though, vs newer ones which tend to have a longer multiple-stage translation pipeline, and which make use of bytecode.


Optionally check out the wikipedia article on PEGs and look at the COLA paper if you can find it.


PEGs, apparently I may have been using them informally already (thinking they were EBNF), although I haven't used them for directly driving a parser. typically, I have used them occasionally for describing elements of the syntax (in documentation and similar), at least when not using the lazier system of "syntax via tables of examples".

may require more looking to try to better clarify the difference between a PEG and EBNF... (the only difference I saw listed was that PEGs are ordered, but I would have assumed that a parser based on EBNF would have been implicitly ordered anyways, hmm...).
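
a small sketch of where the ordering shows up (Python, toy functions with made-up names, not taken from any real PEG library): a PEG choice commits to the first alternative that matches a prefix, whereas the alternation in a grammar read as a CFG/EBNF has no order, so any alternative that derives the whole input is fine.

```python
def peg_choice(alternatives, text):
    """PEG-style ordered choice over literal strings: return the first
    alternative that matches a prefix of text, or None if none does."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None


def peg_accepts(alternatives, text):
    """Accept only if the alternative picked by ordered choice consumes
    the whole input; later alternatives are never reconsidered."""
    return peg_choice(alternatives, text) == text


def cfg_accepts(alternatives, text):
    """Unordered alternation: accept if any alternative derives the
    whole input, regardless of where it appears in the list."""
    return any(alt == text for alt in alternatives)


print(peg_accepts(["a", "ab"], "ab"))  # False: "a" wins and leaves "b" unconsumed
print(cfg_accepts(["a", "ab"], "ab"))  # True: the second alternative derives "ab"
print(peg_accepts(["ab", "a"], "ab"))  # True: reordering changes the PEG result
```

so a hand-written parser that tries rules top to bottom and takes the first hit is already behaving like the PEG case, which would fit the "implicitly ordered" assumption.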


Anyhow, it's all self-contained, so if you can read C code and understand a bit of Lisp, you can watch how the syntax morphs into Smalltalk. Or any other language you feel like writing a parser for.

This is fantastic stuff.


following the skim and some more looking, I think I have a better idea how it works.


I will infer:
the Lisp-like code at the top defines the behavior;
the grammar section in the middle defines the syntax (as the comment says);
(somehow) the parser invokes the new syntax, internally converting it into the Lisp-like form, which is what gets executed.
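
a toy version of that last step, as I understand it (Python sketch, not how Maru actually does it; "parse_infix" and "evaluate" are made-up names): a tiny front-end turns a surface syntax into nested lists, and only the list form is ever evaluated. binary operators are applied strictly left to right here, as Smalltalk does with binary messages.

```python
import operator

# operators known to the evaluator (illustrative subset)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def parse_infix(source):
    """Translate space-separated infix like "3 + 4 * 2" into a nested
    prefix form, folding left to right with no operator precedence."""
    tokens = source.split()
    tree = int(tokens[0])
    for i in range(1, len(tokens), 2):
        tree = [tokens[i], tree, int(tokens[i + 1])]
    return tree


def evaluate(form):
    """Evaluate the Lisp-like form; the infix text is never seen here."""
    if isinstance(form, int):
        return form
    op, left, right = form
    return OPS[op](evaluate(left), evaluate(right))


form = parse_infix("3 + 4 * 2")
print(form)            # ['*', ['+', 3, 4], 2]
print(evaluate(form))  # 14
```

swapping in a different front-end changes the language being accepted without touching the evaluator, which seems to be the point of the exercise.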


so, seems interesting enough...


if so, my VM is vaguely similar, albeit without the syntax definition or variable parser (the parser for my script language is fixed-form and written in C, but does parse to a Scheme-like AST system).

the assumption would have been that if someone wanted a parser for a new language, they would write one, assuming the semantics mapped tolerably to the underlying VM (exactly matching the semantics of each language would be a little harder though).

theoretically, nothing would really prevent writing a parser in the scripting language, just I had never really considered doing so (or, for that matter, even supporting user-defined syntax elements in the main parser).


the most notable difference between my ASTs and Lisp or Scheme:
all forms are special forms, and function calls need to be made via a special form (this was mostly to help better detect problems);
operators were also moved to special forms, for similar reasons;
there are lots more special forms, most mapping to HLL constructs (for, while, break, continue, ...);
...
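
a rough sketch of the "everything is a special form" idea (Python; the form names "funcall", "binary", "assign", "while" and "begin" are invented for the example and are not the names my AST uses): the evaluator only dispatches on known form names, so a list whose head is not a known form is an error rather than an implicit call.

```python
import operator

BINARY_OPS = {"+": operator.add, "<": operator.lt}


def eval_form(form, env):
    if isinstance(form, (int, float)):
        return form                      # literal
    if isinstance(form, str):
        return env[form]                 # variable reference
    handler = SPECIAL_FORMS.get(form[0])
    if handler is None:
        # in Lisp/Scheme this would be treated as a function call
        raise SyntaxError("unknown special form: %r" % (form[0],))
    return handler(form, env)


def form_funcall(form, env):             # ("funcall", fn, args...)
    fn = eval_form(form[1], env)
    return fn(*[eval_form(arg, env) for arg in form[2:]])


def form_binary(form, env):              # ("binary", op, lhs, rhs)
    return BINARY_OPS[form[1]](eval_form(form[2], env), eval_form(form[3], env))


def form_assign(form, env):              # ("assign", name, value)
    env[form[1]] = eval_form(form[2], env)
    return env[form[1]]


def form_while(form, env):               # ("while", cond, body)
    while eval_form(form[1], env):
        eval_form(form[2], env)
    return None


def form_begin(form, env):               # ("begin", forms...)
    result = None
    for sub in form[1:]:
        result = eval_form(sub, env)
    return result


SPECIAL_FORMS = {
    "funcall": form_funcall, "binary": form_binary,
    "assign": form_assign, "while": form_while, "begin": form_begin,
}

# x = 0; while (x < 5) x = x + 1; double(x)
program = ("begin",
           ("assign", "x", 0),
           ("while", ("binary", "<", "x", 5),
            ("assign", "x", ("binary", "+", "x", 1))),
           ("funcall", "double", "x"))
```

with this layout, something like ("double", "x") fails immediately with "unknown special form" instead of being silently treated as a call, which is the problem-detection benefit mentioned above.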

as-is, there are also a large number of bytecode operations, many related to common special cases.

for example, a recent addition called "jmp_cond_sweq" reduces several instructions related to "switch" into a single operation, partly intended for micro-optimizing (why 3 opcodes when one only needs 1?), and also partly intended to be used as a VM hint that it is dealing with a switch (IOW: it is a jump-table hint).
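
a toy illustration of the fusion (Python sketch of a made-up stack VM). only the name "jmp_cond_sweq" is real here; the semantics shown, and the "dup" / "push" / "jmp_eq" sequence it replaces, are assumptions for the sake of the example and not a description of the actual instruction set.

```python
def run(program, switch_value):
    """Run a tiny stack program with the switch value already on the
    stack. Returns (result, number_of_instructions_executed)."""
    stack = [switch_value]
    pc = 0
    executed = 0
    while True:
        op, *args = program[pc]
        pc += 1
        executed += 1
        if op == "dup":
            stack.append(stack[-1])
        elif op == "push":
            stack.append(args[0])
        elif op == "jmp_eq":              # pop two values, jump if equal
            b = stack.pop()
            a = stack.pop()
            if a == b:
                pc = args[0]
        elif op == "jmp_cond_sweq":       # assumed: peek, compare, jump
            if stack[-1] == args[0]:
                pc = args[1]
        elif op == "ret":
            return args[0], executed
        else:
            raise ValueError("unknown opcode: %r" % (op,))


# switch (v) { case 1: "one"; case 2: "two"; default: "other"; }
UNFUSED = [
    ("dup",), ("push", 1), ("jmp_eq", 7),
    ("dup",), ("push", 2), ("jmp_eq", 8),
    ("ret", "other"), ("ret", "one"), ("ret", "two"),
]

FUSED = [
    ("jmp_cond_sweq", 1, 3),
    ("jmp_cond_sweq", 2, 4),
    ("ret", "other"), ("ret", "one"), ("ret", "two"),
]
```

both programs return the same result for any input, but the fused one dispatches fewer instructions per case, and a run of consecutive "jmp_cond_sweq" ops with constant operands is easy for the VM to recognize as a candidate for a jump table.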

granted, there are currently over 500 opcodes in total (highest numbered opcode is 540, but there are still a few gaps in the opcode map), but it is no real issue at present (there is still plenty of space left in the 2-byte range, despite the single-byte range being pretty much full).
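
for illustration, one way a mixed 1-byte / 2-byte opcode space can be laid out (Python sketch). the split point of 192 and the exact byte layout are assumptions picked for the example, not the encoding my VM actually uses; "encode_opcode" and "decode_opcode" are made-up names.

```python
SINGLE_BYTE_LIMIT = 192  # assumed split point, not from the real VM


def encode_opcode(op):
    """Opcodes below the limit take one byte; the rest take two bytes,
    with the first byte at or above the limit acting as a prefix."""
    if op < 0:
        raise ValueError("opcode out of range")
    if op < SINGLE_BYTE_LIMIT:
        return bytes([op])
    value = op - SINGLE_BYTE_LIMIT
    if value >= (256 - SINGLE_BYTE_LIMIT) * 256:
        raise ValueError("opcode out of range")
    return bytes([SINGLE_BYTE_LIMIT + (value >> 8), value & 0xFF])


def decode_opcode(data, pos=0):
    """Return (opcode, next_position) for the opcode starting at pos."""
    first = data[pos]
    if first < SINGLE_BYTE_LIMIT:
        return first, pos + 1
    value = ((first - SINGLE_BYTE_LIMIT) << 8) | data[pos + 1]
    return value + SINGLE_BYTE_LIMIT, pos + 2


print(len(encode_opcode(100)))  # 1
print(len(encode_opcode(540)))  # 2
```

under these assumed numbers the two-byte range alone holds 64 * 256 = 16384 opcodes, so a few hundred opcodes barely dent it.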


so, I guess the philosophy is a bit different...

maybe the gap in philosophy is greater than that in the technology?...


_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc
