On 2/27/2012 10:30 AM, Steve Wart wrote:
Just to zero in on one idea here
Anyway I digress... have you had a look at this file?:
http://piumarta.com/software/maru/maru-2.1/test-pepsi.l
Just read the whole thing - I found it fairly interesting :) He's
built pepsi on maru there... that's pretty fascinating, right?
Built a micro smalltalk on top of the S-expression language...
and then does a Fast Fourier Transform test using it...
my case: looked at it some, but not entirely sure how it works.
See the comment at the top:
./eval repl.l test-pepsi.l
eval.c is written in C, it's pretty clean code and very cool. Then
eval.l does the same thing in a lisp-like language.
Was playing with the Little Schemer with my son this weekend - if you
fire up the repl, cons, car, cdr stuff all work as expected.
realized I could rip the filename off the end of the URL to get the
directory, got the C file.
initial/quick observations:
apparently uses Boehm;
type system works a bit differently than my stuff, but seems to expose a
vaguely similar interface (except I tend to put 'dy' on the front of
everything here, so "dycar()", "dycdr()", "dycaddr()", and most
predicates have names like "dyconsp()" and similar, and often I
type-check using strings rather than an enum, ...);
the parser works a bit differently than my S-Expression parsers (mine
tend to be more if/else driven, and typically read characters either
from strings or from "stream objects");
ANSI codes with raw escape characters (text editor not entirely happy);
mixed tabs and spaces not leading to very good formatting;
simplistic interpreter, albeit it is not entirely clear how the built-in
functions get dispatched;
...
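for reference, a minimal sketch of the "if/else, read characters from a string" style of S-Expression reader mentioned above. this is an illustration only: it is written in Python for brevity, it is not code from maru or from my own libraries, and the names (skip_ws, read_expr) are made up for the example.

```python
# hypothetical sketch: a hand-written if/else S-expression reader over a string.
# none of these names come from maru or from the "dy" API described above.

def skip_ws(s, i):
    """Advance i past any whitespace in s."""
    while i < len(s) and s[i].isspace():
        i += 1
    return i

def read_expr(s, i=0):
    """Read one expression starting at s[i]; return (value, next_index)."""
    i = skip_ws(s, i)
    if i >= len(s):
        raise SyntaxError("unexpected end of input")
    c = s[i]
    if c == '(':
        # list: read items until the matching ')'
        items = []
        i = skip_ws(s, i + 1)
        while i < len(s) and s[i] != ')':
            item, i = read_expr(s, i)
            items.append(item)
            i = skip_ws(s, i)
        if i >= len(s):
            raise SyntaxError("missing ')'")
        return items, i + 1
    elif c == ')':
        raise SyntaxError("unexpected ')'")
    else:
        # atom: runs until whitespace or a paren
        j = i
        while j < len(s) and not s[j].isspace() and s[j] not in '()':
            j += 1
        tok = s[i:j]
        try:
            return int(tok), j      # integer literal
        except ValueError:
            return tok, j           # anything else is treated as a symbol

tree, _ = read_expr("(car (cdr (1 2 3)))")
print(tree)
```

lists come back as Python lists, integers as ints, and symbols as plain strings, which is enough to show the control flow (one branch per leading character).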
a much more significant difference:
in my code, this sort of functionality is spread over many different
areas (over several different DLLs), so one wouldn't find all of it in
the same place.
will likely require more looking to figure out how the parser or syntax
changing works (none of my parsers do this, most are fixed-form and
typically shun context dependent parsing).
some of my earlier/simpler interpreters were like this though, vs newer
ones which tend to have a longer multiple-stage translation pipeline,
and which make use of bytecode.
Optionally check out the wikipedia article on PEGs and look at the
COLA paper if you can find it.
PEGs, apparently I may have been using them informally already (thinking
they were EBNF), although I haven't used them for directly driving a
parser. mostly, I have used them now and then for describing
elements of the syntax (in documentation and similar), at least when not
using the lazier system of "syntax via tables of examples".
may require more looking to try to better clarify the difference between
a PEG and EBNF...
(the only difference I saw listed was that PEGs are ordered, but I would
have assumed that a parser based on EBNF would have been implicitly
ordered anyways, hmm...).
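a small sketch of where the ordering matters (Python, hypothetical helper names, not taken from maru or the PEG literature). in a PEG, choice is prioritized: once an alternative matches, later alternatives are not retried even if the rest of the rule then fails. in a CFG (which is what EBNF describes), alternation is unordered, so any alternative that leads to a full match counts. a hand-written parser usually behaves like the PEG case, which may be why the two look the same in practice.

```python
# grammar used in both cases:  S <- ("a" / "ab") "c"

# --- PEG-style: each parser returns one end position, or None on failure ---
def lit(text):
    return lambda s, i: i + len(text) if s.startswith(text, i) else None

def seq(*parts):
    def parse(s, i):
        for p in parts:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

def choice(*alts):
    # ordered choice: commit to the first alternative that matches
    def parse(s, i):
        for a in alts:
            j = a(s, i)
            if j is not None:
                return j
        return None
    return parse

# --- CFG-style: each parser returns the set of all possible end positions ---
def lit_all(text):
    return lambda s, i: {i + len(text)} if s.startswith(text, i) else set()

def seq_all(*parts):
    def parse(s, i):
        positions = {i}
        for p in parts:
            positions = {k for j in positions for k in p(s, j)}
        return positions
    return parse

def alt_all(*alts):
    # unordered alternation: keep every alternative that matches
    return lambda s, i: set().union(*(a(s, i) for a in alts))

peg = seq(choice(lit("a"), lit("ab")), lit("c"))
cfg = seq_all(alt_all(lit_all("a"), lit_all("ab")), lit_all("c"))

print(peg("abc", 0))   # None: "a" matched first, then "c" failed at "b"
print(cfg("abc", 0))   # {3}: the "ab" alternative is also considered
```

swapping the alternatives to ("ab" / "a") makes the PEG version accept "abc" as well, so in a PEG the order of alternatives is part of the grammar's meaning.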
Anyhow, it's all self-contained, so if you can read C code and
understand a bit of Lisp, you can watch how the syntax morphs into
Smalltalk. Or any other language you feel like writing a parser for.
This is fantastic stuff.
following the skim and some more looking, I think I have a better idea
how it works.
I will infer:
top Lisp-like code defines behavior;
syntax in middle defines syntax (as comment says);
(somehow) the parser invokes the new syntax, internally converting it
into the Lisp-like form, which is what gets executed.
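the inferred pipeline (surface syntax -> Lisp-like form -> execution) can be sketched roughly as below. this is only an illustration in Python under my own assumptions: maru defines its syntax with a grammar rather than a hand-written function, and the names here (to_sexpr, evaluate, OPS) are hypothetical. the toy syntax is Smalltalk-like binary messages, which associate left to right with no operator precedence.

```python
import operator

# hypothetical operator table for the toy language
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def to_sexpr(src):
    """Translate e.g. '3 + 4 * 2' into a nested prefix (Lisp-like) form."""
    tokens = src.split()
    if not tokens:
        raise SyntaxError("empty input")
    tree = int(tokens[0])
    rest = tokens[1:]
    if len(rest) % 2:
        raise SyntaxError("dangling operator")
    for k in range(0, len(rest), 2):
        op, arg = rest[k], rest[k + 1]
        if op not in OPS:
            raise SyntaxError("unknown operator: " + op)
        # left-to-right: the tree built so far becomes the receiver
        tree = [op, tree, int(arg)]
    return tree

def evaluate(form):
    """Evaluate the prefix form produced by to_sexpr."""
    if isinstance(form, int):
        return form
    op, lhs, rhs = form
    return OPS[op](evaluate(lhs), evaluate(rhs))

form = to_sexpr("3 + 4 * 2")
print(form)            # ['*', ['+', 3, 4], 2]
print(evaluate(form))  # 14
```

the evaluator never sees the surface syntax, only the prefix form, so a different front-end could target the same back-end.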
so, seems interesting enough...
if so, my VM is vaguely similar, albeit without the syntax definition or
variable parser (the parser for my script language is fixed-form and
written in C, but does parse to a Scheme-like AST system).
the assumption would have been that if someone wanted a parser for a new
language, they would write one, assuming the semantics mapped tolerably
to the underlying VM (exactly matching the semantics of each language
would be a little harder though).
theoretically, nothing would really prevent writing a parser in the
scripting language, just I had never really considered doing so (or, for
that matter, even supporting user-defined syntax elements in the main
parser).
the most notable difference between my ASTs and Lisp or Scheme:
all forms are special forms, and function calls need to be made via a
special form (this was mostly to help better detect problems);
operators were also moved to special forms, for similar reasons;
there are lots more special forms, most mapping to HLL constructs (for,
while, break, continue, ...);
...
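a rough sketch of the "all forms are special forms" idea (Python, for illustration only; the form names "call", "binary", "assign", and "while" are made up here and are not the actual names in my AST). the point is that a list whose head is not a known form is rejected outright, rather than being silently treated as a function call.

```python
import operator

BINOPS = {'+': operator.add, '-': operator.sub, '<': operator.lt}
SPECIAL_FORMS = {}

def special(name):
    """Register a handler for one special form (hypothetical helper)."""
    def register(fn):
        SPECIAL_FORMS[name] = fn
        return fn
    return register

def evaluate(form, env):
    if isinstance(form, str):
        return env[form]                 # variable reference
    if not isinstance(form, list):
        return form                      # literal
    head = form[0] if form else None
    if not isinstance(head, str) or head not in SPECIAL_FORMS:
        # unlike Lisp/Scheme, an unknown head is an error, not a call
        raise SyntaxError("unknown form: %r" % (head,))
    return SPECIAL_FORMS[head](form[1:], env)

@special("call")
def _call(args, env):
    fn = evaluate(args[0], env)
    return fn(*[evaluate(a, env) for a in args[1:]])

@special("binary")
def _binary(args, env):
    op, lhs, rhs = args
    return BINOPS[op](evaluate(lhs, env), evaluate(rhs, env))

@special("assign")
def _assign(args, env):
    name, value = args
    env[name] = evaluate(value, env)
    return env[name]

@special("while")
def _while(args, env):
    cond, body = args
    while evaluate(cond, env):
        evaluate(body, env)

env = {"i": 0, "f": abs}
evaluate(["while", ["binary", "<", "i", 3],
                   ["assign", "i", ["binary", "+", "i", 1]]], env)
print(env["i"])                          # 3
print(evaluate(["call", "f", -2], env))  # 2
```

with this layout, a mistyped or malformed form such as ["f", -2] fails immediately instead of being interpreted as a call to f.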
as-is, there are also a large number of bytecode operations, many
related to common special cases.
for example, a recent addition called "jmp_cond_sweq" reduces several
instructions related to "switch" into a single operation, partly
intended for micro-optimizing (why 3 opcodes when one only needs 1?),
and also partly intended to be used as a VM hint that it is dealing with
a switch (IOW: it is a jump-table hint).
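to make the idea concrete, here is a toy stack VM in Python. everything in it is an assumption made for illustration: the exact semantics of "jmp_cond_sweq" are not spelled out above, so the sketch guesses that it compares the top of stack against a constant and jumps on equality without popping, replacing a dup / push / jump-if-equal sequence. the other opcode names are invented.

```python
def run(code, value):
    """Run a tiny stack program; return (result, number_of_dispatches)."""
    stack = [value]
    pc = 0
    steps = 0
    while True:
        op, *args = code[pc]
        pc += 1
        steps += 1
        if op == "dup":
            stack.append(stack[-1])
        elif op == "push":
            stack.append(args[0])
        elif op == "jmp_eq":
            b = stack.pop()
            a = stack.pop()
            if a == b:
                pc = args[0]
        elif op == "jmp_cond_sweq":
            # assumed fused form: compare top-of-stack with a constant,
            # jump if equal, leave the switch value on the stack
            if stack[-1] == args[0]:
                pc = args[1]
        elif op == "ret":
            return args[0], steps
        else:
            raise ValueError("bad opcode: " + op)

# switch (x) { case 1: "one"; case 2: "two"; default: "other" }
unfused = [
    ("dup",), ("push", 1), ("jmp_eq", 7),
    ("dup",), ("push", 2), ("jmp_eq", 8),
    ("ret", "other"),
    ("ret", "one"),
    ("ret", "two"),
]
fused = [
    ("jmp_cond_sweq", 1, 3),
    ("jmp_cond_sweq", 2, 4),
    ("ret", "other"),
    ("ret", "one"),
    ("ret", "two"),
]

print(run(unfused, 2))  # ('two', 7)
print(run(fused, 2))    # ('two', 3)
```

both programs compute the same result, but the fused one dispatches fewer instructions, and a run of consecutive fused compares is easy for a VM to recognize and turn into a jump table.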
granted, there are currently over 500 opcodes in total (highest numbered
opcode is 540, but there are still a few gaps in the opcode map), but it
is no real issue at present (there is still plenty of space left in the
2-byte range, despite the single-byte range being pretty much full).
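one way such a mixed 1-byte / 2-byte opcode space could be laid out is sketched below. this is purely hypothetical: the actual encoding is not described here, and the 0xC0 split point is an assumption chosen for the example. opcodes below 0xC0 take one byte; anything larger uses a prefix byte in 0xC0..0xFF plus a second byte, giving a 14-bit opcode space.

```python
ONE_BYTE_LIMIT = 0xC0     # assumed split point, not the real one
TWO_BYTE_LIMIT = 0x4000   # 6 bits from the prefix byte + 8 bits

def encode_opcode(op):
    """Encode an opcode number as 1 or 2 bytes."""
    if 0 <= op < ONE_BYTE_LIMIT:
        return bytes([op])
    if op < TWO_BYTE_LIMIT:
        return bytes([0xC0 | (op >> 8), op & 0xFF])
    raise ValueError("opcode out of range: %d" % op)

def decode_opcode(buf, i):
    """Decode the opcode at buf[i]; return (opcode, next_index)."""
    b = buf[i]
    if b < ONE_BYTE_LIMIT:
        return b, i + 1
    return ((b & 0x3F) << 8) | buf[i + 1], i + 2

stream = encode_opcode(5) + encode_opcode(540)
print(stream.hex())              # 05c21c
print(decode_opcode(stream, 0))  # (5, 1)
print(decode_opcode(stream, 1))  # (540, 3)
```

under this assumed layout, opcode 540 uses only a small fraction of the 16384 available numbers, while the frequently used opcodes keep the compact single-byte form.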
so, I guess the philosophy is a bit different...
maybe the gap in philosophy is greater than that in the technology?...