On Sun, Jan 27, 2013 at 10:57 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> Monads: keeping readable Lisp obscure since the late 90's!
>
> :-) :-) :-) :-)
>
> Probably more than a little true...!
>
>> So using a separate tokenizer is clearer IMO.
>>
>> The only drawback is that we now need to use SAME.
> ...
>> Mostly, it has to do with the reader having to consume only the data
>> it needs, and no more.
>
> I solved that problem in a different way, though either would work.
>
> I vaguely remember there being an annoying problem when I had a "SAME"
> token, but I can't remember now what that was.  I then realized I didn't
> need it anyway, and never looked back :-).

We could change it so that it's EOL that is unneeded, which may be more
consistent: we never emit EOL (it's subsumed by "space") and have three
indentation markers, INDENT, SAME, and DEDENT.
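To make that concrete, a t-expression like

  define (f x)
    foo x
    bar x

would come out of the tokenizer as something like

  n-expr("define") n-expr("(f x)")
  INDENT n-expr("foo") n-expr("x")
  SAME   n-expr("bar") n-expr("x")
  DEDENT

with no EOL tokens at all, only indentation markers between lines.  (That's
just how I picture it; the exact token spellings above are made up.)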
>
>> Hmm, the code structure looks like the lexer calls "emit()", so it
>> looks as if the parser is CPS'ed inside a stateful emit() function.
>> Compare to bison where the lexer returns a token type and token value,
>> so that the lexer is the stateful function.  Or are you using a
>> separate thread for the Java ANTLR lexer, with emit() being a channel
>> to the parser thread that calls a lexer function that just fetches
>> from the channel?
>
> Nothing that sophisticated.  ANTLR runs the lexer to completion and saves
> every token in a CommonTokenStream (basically a buffer), and then calls the
> parser.  That design makes it generally useless for interactive use, but I
> was using the tool to prototype and rigorously check the grammar, so it was
> adequate for its purpose.

Ah, I see.

>
>> > I'm very leery of removing the hspace's from the parsing spec.
>> > SRFI-49 did that; the resulting BNF was certainly simpler, but it
>> > made it *much* more difficult to *correctly* implement the spec.
>> > If that all moves into the tokenizer, I'm concerned that
>> > it may not be obvious where it happens,
>> > especially for people implementing it using traditional
>> > recursive descent parsing approaches.
>> > I want people to be able to implement code that is "obviously correct";
>> > if the spec is rigged so that implementation is mostly 1-to-1 it'll
>> > be easier to accept.
>>
>> hspace is significant in these cases:
>>
>> 1. Indentation.
>> 2. After abbreviation sequences ' ` , ,@ and their Scheme-only syntax
>>    variants.
>> 3. Must specifically be ignored after GROUP_SPLICE in order to handle
>>    a top-level "foo bar \\ nitz kuu" sequence correctly.
>> 4. Must specifically be ignored after RESTART_BEGIN in order to
>>    handle "let <* x v \\ y v2 *>" style.
>
> There's at least one other case: it's an error to NOT have hspace between
> n-expressions on the same line.  Thus:
>   (x)a
> is not legal.
>

Ah, I see.  We could spec the tokenizer as raising this error, or spec the
parser as raising it (in which case the tokenizer has to hand over HSPACE
tokens).

I propose that hspace's outside of n-expressions should be handled by the
tokenizer (in my formulation, the tokenizer's "basic tokens" are entire
n-expressions), and the parser only cares about INDENT / DEDENT / SAME.

Alternatively, we can propose that only the parser actually has the right
to throw errors, and the tokenizer should instead focus on getting the
information to the parser.

>
>> The tokenizer can also be specced (and implemented!!) as a
>> recursive-descent tokenizer that calls an (emit x) function; this
>> gives significant clarity, since we don't have to mention an
>> indent-stack - the Scheme stack serves that purpose.  In the
>> implementation, we just use call/cc in the emit function to suspend
>> execution.  I don't want to spec it that way, since the readable
>> project wants implementability across multiple Lisps, and most Lisps
>> don't have call/cc.  Not even all "Scheme" implementations have a
>> call/cc, and many have an inefficient implementation (old Guile
>> versions, for example).  But that style can be done, and be equivalent
>> to the current specifications.
>
> I *really* don't want call/cc in the implementation (with perhaps an
> exception in the error handler).  I hope to get people to use this in
> non-Scheme "Schemes" and even in completely different Lisps, which don't
> have such a thing.

*shrug*  Well, it's just a spec, and call/cc can be just an implementation
detail.  It can even be implemented with Erlang processes and message
passing, so that you need Erlang to parse Scheme.  LOL.
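Just to show the trick concretely (a rough, untested sketch; make-token-source
and the other names here are made up, not from the current code):

  (define (make-token-source tokenize port)
    ;; TOKENIZE is ordinary direct-style code: (tokenize port emit) calls
    ;; (emit tok) once per token.  call/cc turns that into a pull-style
    ;; source: each call of the returned thunk yields the next token,
    ;; and finally 'eof.
    (define return-k #f)      ; continuation that receives the next token
    (define resume-k          ; continuation of the suspended tokenizer
      (lambda (ignored)
        (tokenize port
                  (lambda (tok)                ; this is "emit"
                    (call/cc
                      (lambda (k)
                        (set! resume-k k)      ; remember where to resume
                        (return-k tok)))))     ; hand the token back
        (return-k 'eof)))                      ; tokenizer finished
    (lambda ()
      (call/cc
        (lambda (k)
          (set! return-k k)
          (resume-k #f)))))

The parser then just calls the thunk whenever it wants the next token
(INDENT, SAME, DEDENT, or an n-expression), and the tokenizer side is plain
recursive code whose indentation context lives on the ordinary Scheme stack.
(No guard here against calling the thunk again after 'eof.)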
Honestly, I think having an explicitly separate tokenizer is better, because
it's easier to carve up the reader into smaller parts that can be
individually debugged.  With my proposed formulation, we have three parts,
each of which can be debugged (and/or re-specced) individually: the
n-expression reader is pretty much a standard Scheme reader, except that it
has to report datum comments and block comments; the tokenizer is a separate
piece of code (which can temporarily be hooked up to a standard,
already-debugged Scheme reader so that it can be debugged separately from
the other parts); and the parser is a vastly simplified BNF.  The current
formulation has the n-expr and t-expr parsers melded into a single large
spec, and the tokenizer still needs to be specced, but interacts with the
parser's state.

>
>> Okay, I'll try that later.  For now, I want to try my approach first,
>> which needs to simplify the BNF to remove some things I don't need and
>> replace comment_eol that isn't followed by an INDENT with SAME.
>
> Okay.  Be sure it's separate; I'm already writing Scheme code to match the
> BNF!!

Yes, I put an amkg-work/ directory specifically for that in (develop).

>
>> Incidentally, it might be more useful, in the error case, to have the
>> parser consume lines until it finds a completely empty line, or a line
>> without indentation.
>
> I agree, that sounds like a good approach.  I think we should give people
> flexibility on how to handle errors, but we should at least strive to deal
> with errors in a reasonable-enough way in our sample implementation.

Sincerely,
AmkG
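P.S.  For that "consume lines until a completely empty line, or a line
without indentation" recovery, what I have in mind is roughly the following
untested sketch (skip-bad-lines is just a name I made up):

  (define (skip-bad-lines port)
    ;; Discard the rest of the offending line.
    (define (discard-rest-of-line)
      (let ((c (read-char port)))
        (if (not (or (eof-object? c) (char=? c #\newline)))
            (discard-rest-of-line))))
    (discard-rest-of-line)
    ;; Then discard whole lines while they still start with indentation.
    ;; A line of only spaces/tabs counts as indented here and is skipped.
    (let loop ()
      (let ((c (peek-char port)))
        (cond ((eof-object? c)      'at-eof)
              ((char=? c #\newline) 'at-empty-line)
              ((or (char=? c #\space) (char=? c #\tab))
               (discard-rest-of-line)
               (loop))
              (else 'at-unindented-line)))))

Stopping with peek-char rather than read-char means the empty or unindented
line itself is left in the port, so the parser can restart cleanly at it.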