Re: [Readable-discuss] Proposal: a concrete "pre-processor" [Implementation detail]

David A. Wheeler Sat, 26 Jan 2013 08:16:54 -0800

Alan Manuel Gloria:
> Actually, I think a tokenizer process will allow us to use a simple
> parser combinator library, which means we don't need to worry about
> calling protocols of productions on Scheme:


As an approach to clarifying the *specification*, particularly for indentation
and skipping ;-only lines, that might be sensible.

But I also want to make it obvious that an *implementation* does not have to
have a separate tokenizer process.  The "read" procedure is very basic to any
Lisp, and implementors may shy away from a notation that seems to *depend*
on a separate tokenizer for its implementation.  Thus, I think we need an
implementation that does *not* need one.

> everything uses the same
> parser calling protocol, which of course can be based on Monads (^^);

Hmm, I'm concerned that discussing Monads will send 1/4 of our audience
running to the hills :-).
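For readers who would rather not run for the hills: the "same parser calling protocol" idea doesn't need the word "monad" at all. Below is a minimal Python sketch, assuming tokens arrive as a list of strings and using INDENT/DEDENT/EOL as hypothetical structural token names (not taken from any spec). Every parser is a function from (tokens, position) to (value, new position), or None on failure; the combinators tok, any_atom, seq, alt and many are names made up for this illustration.

```python
from typing import Any, Callable, List, Optional, Tuple

# Shared protocol: (tokens, pos) -> (value, new_pos) on success, None on failure.
Parser = Callable[[List[str], int], Optional[Tuple[Any, int]]]

# Hypothetical structural tokens a tokenizer might emit.
STRUCTURAL = ("INDENT", "DEDENT", "EOL")


def tok(expected: str) -> Parser:
    """Match exactly one token equal to `expected`."""
    def parse(tokens: List[str], pos: int):
        if pos < len(tokens) and tokens[pos] == expected:
            return expected, pos + 1
        return None
    return parse


def any_atom() -> Parser:
    """Match any single token that is not a structural token."""
    def parse(tokens: List[str], pos: int):
        if pos < len(tokens) and tokens[pos] not in STRUCTURAL:
            return tokens[pos], pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers in order, collecting their values; fail if any fails."""
    def parse(tokens: List[str], pos: int):
        values = []
        for p in parsers:
            result = p(tokens, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser in turn and return the first success."""
    def parse(tokens: List[str], pos: int):
        for p in parsers:
            result = p(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


def many(p: Parser) -> Parser:
    """Zero or more repetitions of p (stops if p succeeds without consuming)."""
    def parse(tokens: List[str], pos: int):
        values = []
        while True:
            result = p(tokens, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


# Hypothetical grammar fragment: a line is one or more atoms followed by EOL.
line = seq(any_atom(), many(any_atom()), tok("EOL"))
print(line(["define", "x", "5", "EOL"], 0))  # (['define', ['x', '5'], 'EOL'], 4)
```

Whether this is clearer than a hand-written reader is exactly the open question in this thread; the sketch only shows that the shared protocol is a plain function signature.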


> So using a separate tokenizer is clearer IMO.
> 
> The only drawback is that we now need to use SAME.

I don't think that's true.  The ANTLR implementation simply
consumes same-indents and doesn't generate any tokens.
As long as a tokenizer can just consume a character
sequence, generate nothing, and then consume more characters
to finally *get* a token, it seems to me it should be fine.
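To make the "consume characters, generate nothing" point concrete, here's a rough Python sketch of an indentation tokenizer. It is a simplified model, not the ANTLR grammar: indentation is spaces only, line contents are just split on whitespace, and the token names INDENT, DEDENT and EOL are hypothetical. A line at the same indentation as the previous one emits no indentation token at all, and blank or ;-only lines are consumed without producing anything, so no SAME token is needed.

```python
from typing import Iterable, Iterator, List


def indent_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield INDENT/DEDENT/EOL markers plus whitespace-separated words.

    Simplified model for illustration: spaces-only indentation, and line
    content split on whitespace (no strings, no parentheses handling).
    """
    stack: List[int] = [0]  # indentation widths of the currently open levels
    for raw in lines:
        line = raw.rstrip("\n")
        body = line.lstrip(" ")
        # Blank lines and ;-comment-only lines are consumed with no tokens.
        if body.strip() == "" or body.startswith(";"):
            continue
        width = len(line) - len(body)
        if width > stack[-1]:
            stack.append(width)
            yield "INDENT"
        else:
            # Pop levels on dedent; at equal width nothing is emitted at all.
            while width < stack[-1]:
                stack.pop()
                yield "DEDENT"
            if width != stack[-1]:
                raise SyntaxError(f"inconsistent dedent to column {width}")
        # Drop any trailing ;-comment, then emit the words and an end marker.
        yield from body.split(";", 1)[0].split()
        yield "EOL"
    # Close any levels still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield "DEDENT"


src = [
    "define f x\n",
    "  ; a comment-only line\n",
    "  g x\n",
    "  h x\n",
    "f 1\n",
]
print(list(indent_tokens(src)))
# ['define', 'f', 'x', 'EOL', 'INDENT', 'g', 'x', 'EOL',
#  'h', 'x', 'EOL', 'DEDENT', 'f', '1', 'EOL']
```

Because it's a generator, the parser only pulls tokens as it needs them; the "same indent" case simply falls through to the next line's content.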

Of course, I've been wrong before; if I *am* wrong, I'm curious as to why.

> A precis: the tokenizer is not a separate pass, but rather implemented
> as a stateful procedure that will consume exactly one token on the
> input stream.  This allows laziness, which allows us to leave as many
> characters as possible on the port at any one time.  The ANTLR
> architecture of having the tokenizer call a stateful parser procedure
> is also possible, but I think it's easier to have the (more complex)
> parser call the tokenizer than the reverse.

(Nitpick: ANTLR calls a stateful lexer, not a parser.)

Maybe.  Hard to know without comparing.
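Here's a rough Python sketch of the pull arrangement Alan describes, where the parser calls the tokenizer and each call consumes exactly one token. It's cut down to plain parenthesized s-expressions, and the names Tokenizer, next_token and parse_datum are invented for illustration. Python's io.StringIO has no peek-char, so tell()/seek() stand in for it; the point is that anything after the datum just read stays unread on the port. In this sketch the only state is the port position; a real sweet-expression tokenizer would also have to carry the indentation stack.

```python
import io
from typing import List, Optional, Union

Datum = Union[str, List["Datum"]]


class Tokenizer:
    """Hypothetical pull-style tokenizer: each next_token() consumes one token.

    Only '(', ')' and whitespace-delimited atoms are recognized.
    """

    def __init__(self, port: io.StringIO):
        self.port = port

    def _peek_char(self) -> str:
        # Emulate Scheme's peek-char: read one character, then rewind.
        pos = self.port.tell()
        c = self.port.read(1)
        self.port.seek(pos)
        return c

    def next_token(self) -> Optional[str]:
        # Skip whitespace between tokens ("".isspace() is False at EOF).
        while self._peek_char().isspace():
            self.port.read(1)
        c = self.port.read(1)
        if c == "":
            return None  # end of input
        if c in "()":
            return c
        chars = [c]
        # Read the rest of the atom, leaving its delimiter on the port.
        while True:
            nxt = self._peek_char()
            if nxt == "" or nxt.isspace() or nxt in "()":
                break
            chars.append(self.port.read(1))
        return "".join(chars)


def parse_datum(tok: Tokenizer, first: Optional[str] = None) -> Datum:
    """Recursive parser that pulls tokens only when it needs them."""
    t = tok.next_token() if first is None else first
    if t is None:
        raise EOFError("unexpected end of input")
    if t == ")":
        raise SyntaxError("unexpected ')'")
    if t != "(":
        return t  # an atom
    items: List[Datum] = []
    while True:
        t = tok.next_token()
        if t is None:
            raise EOFError("unterminated list")
        if t == ")":
            return items
        items.append(parse_datum(tok, t))


port = io.StringIO("(a (b c)) rest of input")
print(parse_datum(Tokenizer(port)))  # ['a', ['b', 'c']]
print(repr(port.read()))             # ' rest of input'
```

With the parser in control, "read exactly one datum and stop" falls out naturally; the reverse arrangement (lexer driving a stateful parser) would need the parser to signal back when it's done.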

> And I think we should also formalize the tokenizer, since behavior
> like "comment-only lines are skipped" is NOT explicitly shown in the BNF.

Okay!!   This is certainly sensible.  What worries me is that this is
the kind of thing that was easy to fully describe in English, yet can be
tricky to correctly formalize.  I wasn't trying to be snarky about my comment
"look at the ANTLR process"; it turned out to take several tries before
I got a clean and at-least-appears-to-be-correct implementation.

Granted, the inability of the parser to influence the lexer in ANTLR made it
a little more work; a different approach avoids that issue completely.
E.g., a traditional recursive-descent parser doesn't have that limitation at all.

> Formalizing the tokenizer also allows us to strip away the hspace's in
> the t-expression parsing spec.

I'm very leery of removing the hspace's from the parsing spec.
SRFI-49 did that; the resulting BNF was certainly simpler, but it
made it *much* more difficult to *correctly* implement the spec.
If that all moves into the tokenizer, I'm concerned that
it may not be obvious where it happens,
especially for people implementing it using traditional
recursive descent parsing approaches.
I want people to be able to implement code that is "obviously correct";
if the spec is structured so that an implementation maps onto it mostly
1-to-1, it'll be easier to accept.
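As an illustration of what "obviously correct" could look like, here is a rough Python recursive-descent sketch that reads directly from the characters, with no separate tokenizer, and consumes hspace explicitly inside the productions. The grammar in the docstring is a made-up simplification, not the actual sweet-expression BNF, and LineReader and its method names are hypothetical. Each method corresponds to one production, so a reviewer can check the code against the grammar line by line.

```python
from typing import List, Tuple

HSPACE = (" ", "\t")


class LineReader:
    """Hypothetical recursive-descent reader: one method per production.

    Made-up, simplified grammar used only for this sketch:
        line    := indent item (hspace+ item)* hspace* comment? eol
        indent  := ' '*
        hspace  := ' ' | TAB
        item    := any run of characters other than hspace, ';' or newline
        comment := ';' followed by anything up to the newline
        eol     := newline | end of input
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def indent(self) -> int:
        start = self.pos
        while self.peek() == " ":
            self.pos += 1
        return self.pos - start

    def hspace_plus(self) -> None:
        if self.peek() not in HSPACE:
            raise SyntaxError(f"expected hspace at offset {self.pos}")
        while self.peek() in HSPACE:
            self.pos += 1

    def item(self) -> str:
        start = self.pos
        while self.peek() not in ("", ";", "\n") + HSPACE:
            self.pos += 1
        if self.pos == start:
            raise SyntaxError(f"expected an item at offset {self.pos}")
        return self.text[start:self.pos]

    def comment(self) -> None:
        if self.peek() == ";":
            while self.peek() not in ("", "\n"):
                self.pos += 1

    def eol(self) -> None:
        if self.peek() == "\n":
            self.pos += 1
        elif self.peek() != "":
            raise SyntaxError(f"expected end of line at offset {self.pos}")

    def line(self) -> Tuple[int, List[str]]:
        width = self.indent()
        items = [self.item()]
        while self.peek() in HSPACE:
            self.hspace_plus()
            # If no item follows, that hspace was the trailing hspace*.
            if self.peek() in ("", ";", "\n"):
                break
            items.append(self.item())
        self.comment()
        self.eol()
        return width, items


r = LineReader("  define x 5  ; set x\nnext\n")
print(r.line())  # (2, ['define', 'x', '5'])
print(r.line())  # (0, ['next'])
```

Note the one-character lookahead in line(): after consuming hspace, the reader checks whether an item follows; if not, that hspace was the trailing hspace* before the comment or end of line. Keeping hspace in the grammar makes that decision visible instead of hiding it in a tokenizer.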

> The overall sweet-reader specification is split into
> three components:...

Gotta run, family commitments, I'll take a real look later.

In the end, though, I suspect having several implementation trials
is a good thing.  If nothing else, it'll prove that the specification is
easy-enough to implement several ways.

--- David A. Wheeler

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss