On Sun, Jan 27, 2013 at 10:57 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> Monads: keeping readable Lisp obscure since the late 90's!
>
> :-) :-) :-) :-)
>
> Probably more than a little true...!
>
>> So using a separate tokenizer is clearer IMO.
>>
>> The only drawback is that we now need to use SAME.
> ...
>> Mostly, it has to do with the reader having to consume only the data
>> it needs, and no more.
>
> I solved that problem in a different way, though either would work.
>
> I vaguely remember there being an annoying problem when I had a "SAME"
> token, but I can't remember now what that was.  I then realized I didn't
> need it anyway, and never looked back :-).

We could change it so that it's EOL that is unneeded, which may be more
consistent: we never emit EOL (it's subsumed by "space") and have three
indentation markers, INDENT, SAME, and DEDENT.
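To make that concrete, a t-expression like

  define (f x)
    foo x
    bar x

would come out of the tokenizer as something like

  n-expr("define") n-expr("(f x)")
  INDENT n-expr("foo") n-expr("x")
  SAME   n-expr("bar") n-expr("x")
  DEDENT

with no EOL tokens at all, only indentation markers between lines.  (That's
just how I picture it; the exact token spellings above are made up.)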
>
>> Hmm, the code structure looks like the lexer calls "emit()", so it
>> looks as if the parser is CPS'ed inside a stateful emit() function.
>> Compare to bison where the lexer returns a token type and token value,
>> so that the lexer is the stateful function.  Or are you using a
>> separate thread for the Java ANTLR lexer, with emit() being a channel
>> to the parser thread that calls a lexer function that just fetches
>> from the channel?
>
> Nothing that sophisticated.  ANTLR runs the lexer to completion and saves
> every token in a CommonTokenStream (basically a buffer), and then calls the
> parser.  That design makes it generally useless for interactive use, but I
> was using the tool to prototype and rigorously check the grammar, so it was
> adequate for its purpose.

Ah, I see.

>
>> > I'm very leery of removing the hspace's from the parsing spec.
>> > SRFI-49 did that; the resulting BNF was certainly simpler, but it
>> > made it *much* more difficult to *correctly* implement the spec.
>> > If that all moves into the tokenizer, I'm concerned that
>> > it may not be obvious where it happens,
>> > especially for people implementing it using traditional
>> > recursive descent parsing approaches.
>> > I want people to be able to implement code that is "obviously correct";
>> > if the spec is rigged so that implementation is mostly 1-to-1 it'll
>> > be easier to accept.
>>
>> hspace is significant in these cases:
>>
>> 1. Indentation.
>> 2. After abbreviation sequences ' ` , ,@ and their Scheme-only syntax
>>    variants.
>> 3. Must specifically be ignored after GROUP_SPLICE in order to handle
>>    a top-level "foo bar \\ nitz kuu" sequence correctly.
>> 4. Must specifically be ignored after RESTART_BEGIN in order to
>>    handle "let <* x v \\ y v2 *>" style.
>
> There's at least one other case: it's an error to NOT have hspace between
> n-expressions on the same line.  Thus:
>   (x)a
> is not legal.
>

Ah, I see.  We could spec the tokenizer as raising this error, or spec the
parser as raising it (in which case the tokenizer has to hand over HSPACE
tokens).

I propose that hspace's outside of n-expressions should be handled by the
tokenizer (in my formulation, the tokenizer's "basic tokens" are entire
n-expressions), and the parser only cares about INDENT / DEDENT / SAME.

Alternatively, we can propose that only the parser actually has the right
to throw errors, and the tokenizer should instead focus on getting the
information to the parser.

>
>> The tokenizer can also be specced (and implemented!!) as a
>> recursive-descent tokenizer that calls an (emit x) function; this
>> gives significant clarity, since we don't have to mention an
>> indent-stack - the Scheme stack serves that purpose.  In the
>> implementation, we just use call/cc in the emit function to suspend
>> execution.  I don't want to spec it that way, since the readable
>> project wants implementability across multiple Lisps, and most Lisps
>> don't have call/cc.  Not even all "Scheme" implementations have a
>> call/cc, and many have an inefficient implementation (old Guile
>> versions, for example).  But that style can be done, and be equivalent
>> to the current specifications.
>
> I *really* don't want call/cc in the implementation (with perhaps an
> exception in the error handler).  I hope to get people to use this in
> non-Scheme "Schemes" and even in completely different Lisps, which don't
> have such a thing.

*shrug*  Well, it's just a spec, and call/cc can be just an implementation
detail.  It can even be implemented with Erlang processes and message
passing, so that you need Erlang to parse Scheme.  LOL.
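Just to show the trick concretely (a rough, untested sketch; make-token-source
and the other names here are made up, not from the current code):

  (define (make-token-source tokenize port)
    ;; TOKENIZE is ordinary direct-style code: (tokenize port emit) calls
    ;; (emit tok) once per token.  call/cc turns that into a pull-style
    ;; source: each call of the returned thunk yields the next token,
    ;; and finally 'eof.
    (define return-k #f)      ; continuation that receives the next token
    (define resume-k          ; continuation of the suspended tokenizer
      (lambda (ignored)
        (tokenize port
                  (lambda (tok)                ; this is "emit"
                    (call/cc
                      (lambda (k)
                        (set! resume-k k)      ; remember where to resume
                        (return-k tok)))))     ; hand the token back
        (return-k 'eof)))                      ; tokenizer finished
    (lambda ()
      (call/cc
        (lambda (k)
          (set! return-k k)
          (resume-k #f)))))

The parser then just calls the thunk whenever it wants the next token
(INDENT, SAME, DEDENT, or an n-expression), and the tokenizer side is plain
recursive code whose indentation context lives on the ordinary Scheme stack.
(No guard here against calling the thunk again after 'eof.)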
Honestly, I think having an explicitly separate tokenizer is better, because
it's easier to carve up the reader into smaller parts that can be
individually debugged.  With my proposed formulation, we have three parts,
each of which can be debugged (and/or re-specced) individually: the
n-expression reader is pretty much a standard Scheme reader, except that it
has to report datum comments and block comments; the tokenizer is a separate
piece of code (which can temporarily be hooked up to a standard,
already-debugged Scheme reader so that it can be debugged separately from
the other parts); and the parser is a vastly simplified BNF.  The current
formulation has the n-expr and t-expr parsers melded into a single large
spec, and the tokenizer still needs to be specced, but interacts with the
parser's state.

>
>> Okay, I'll try that later.  For now, I want to try my approach first,
>> which needs to simplify the BNF to remove some things I don't need and
>> replace comment_eol that isn't followed by an INDENT with SAME.
>
> Okay.  Be sure it's separate; I'm already writing Scheme code to match the
> BNF!!

Yes, I put an amkg-work/ directory specifically for that in (develop).

>
>> Incidentally, it might be more useful, in the error case, to have the
>> parser consume lines until it finds a completely empty line, or a line
>> without indentation.
>
> I agree, that sounds like a good approach.  I think we should give people
> flexibility on how to handle errors, but we should at least strive to deal
> with errors in a reasonable-enough way in our sample implementation.

Sincerely,
AmkG
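P.S.  For that "consume lines until a completely empty line, or a line
without indentation" recovery, what I have in mind is roughly the following
untested sketch (skip-bad-lines is just a name I made up):

  (define (skip-bad-lines port)
    ;; Discard the rest of the offending line.
    (define (discard-rest-of-line)
      (let ((c (read-char port)))
        (if (not (or (eof-object? c) (char=? c #\newline)))
            (discard-rest-of-line))))
    (discard-rest-of-line)
    ;; Then discard whole lines while they still start with indentation.
    ;; A line of only spaces/tabs counts as indented here and is skipped.
    (let loop ()
      (let ((c (peek-char port)))
        (cond ((eof-object? c)      'at-eof)
              ((char=? c #\newline) 'at-empty-line)
              ((or (char=? c #\space) (char=? c #\tab))
               (discard-rest-of-line)
               (loop))
              (else 'at-unindented-line)))))

Stopping with peek-char rather than read-char means the empty or unindented
line itself is left in the port, so the parser can restart cleanly at it.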