On 1/25/13, Alan Manuel Gloria <almkg...@gmail.com> wrote:
> Currently the spec, as described in sweet.g, expects some form of
> "preprocessor".
>
> Perhaps we can actually concretely define such a preprocessor for the
> core parser?
>
> Here's my proposal:
>
> (preprocessor
>   port
>   neoteric-read ; so we can use Scheme read while experimenting with it
>   (lambda (get-token)
>     (let ((token (get-token)))
>       (do-whatever-with token))))
>
> The token returned by (get-token) is one of the following forms:
>
> INITIAL_INDENT_WITH_BANG
> INITIAL_INDENT_NO_BANG
> INDENT
> DEDENT
> BADDENT
> SUBLIST
> GROUP_SPLICE
> RESTART_BEGIN
> RESTART_END
> EOF
> hspace
> comment_eol
> scomment
> (n-expr ,<datum>)
>
> The logic of preprocessor's get-token function is this:
>
> We keep track of a stack of indentations (indent-stack).  We also keep
> track of whether we have just consumed a newline (line-start), and
> a count of pending dedents (pending-dedents).  Initially,
> indent-stack is '(), line-start is #t, and pending-dedents is 0.
>
> get-token promises to only use peek-char and read-char (i.e. one
> character lookahead).
>
> If pending-dedents is non-zero, decrement it and return 'DEDENT.
>
> If the next character is an eof-object, check whether indent-stack is
> '().  If it is, return 'EOF.  Otherwise, count the items in
> indent-stack, set pending-dedents to that count minus 1, set
> indent-stack to '(), and return 'DEDENT.
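The first two rules (pending dedents, then EOF) might look like this in Python. The helper name eof_or_pending is hypothetical and None stands in for eof-object. This sketch also empties indent-stack after queuing the dedents. Without that step, the next EOF check would find a non-empty stack and start the dedent sequence again, so 'EOF would never be returned.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    indent_stack: list = field(default_factory=list)  # top is the last element
    pending_dedents: int = 0


def eof_or_pending(state, next_char):
    """Hypothetical helper covering only the first two get-token rules.

    next_char is the peeked character (None means eof-object).
    Returns a token, or None if neither rule applies."""
    if state.pending_dedents > 0:
        state.pending_dedents -= 1
        return "DEDENT"
    if next_char is None:
        if not state.indent_stack:
            return "EOF"
        # One DEDENT per open indentation level: return one now, queue the rest.
        state.pending_dedents = len(state.indent_stack) - 1
        # Empty the stack so a later EOF check returns EOF instead of looping.
        state.indent_stack = []
        return "DEDENT"
    return None
```

With two open levels at end of input, successive calls yield DEDENT, DEDENT, EOF, and then EOF again.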
>
> When a ; or newline is found, consume until newline, set line-start to
> #t, and return 'comment_eol.
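A minimal Python sketch of the comment_eol rule follows. It assumes that "consume until newline" includes the newline itself, since line-start is then set to #t. The names CharPort and comment_eol are hypothetical, and state is modelled as a plain dict.

```python
class CharPort:
    """Hypothetical string-backed port with one character of lookahead."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek_char(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read_char(self):
        ch = self.peek_char()
        if ch is not None:
            self.pos += 1
        return ch


def comment_eol(port, state):
    """Handle a ';' or newline; state is a dict with a 'line_start' key."""
    # Consume any comment text up to the newline...
    while port.peek_char() not in (None, "\n"):
        port.read_char()
    # ...and the newline itself (assumption); this is a no-op at EOF.
    port.read_char()
    state["line_start"] = True
    return "comment_eol"
```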
>
> If at line-start, and (not (null? indent-stack)), clear line-start to
> #f, consume indent characters (space, tab, !) and then:
> - if first non-indent character is ";" or newline, consume until
> newline, set line-start to #t, and return 'comment_eol.

Uh, wrong.  Need to do this instead:

- if the first non-indent character is ";", set line-start to #t,
consume until newline, and *recurse into (get-token)*
- if the first non-indent character is a newline, proceed as if the
current indent is "" - *do not consume the newline*
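The corrected rule could be sketched in Python as follows. This is for illustration only, and the names CharPort and line_indent are hypothetical. Two behaviours are assumptions rather than part of the rule above:

- The newline ending a comment-only line is consumed along with the comment. Otherwise the recursive get-token call would see that newline as a blank line with indent "".
- EOF directly after indentation is treated like a blank line.

```python
INDENT_CHARS = " \t!"


class CharPort:
    """Hypothetical string-backed port with one character of lookahead."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek_char(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read_char(self):
        ch = self.peek_char()
        if ch is not None:
            self.pos += 1
        return ch


def line_indent(port):
    """Return the effective indent of a new line, or None if the line held only
    a ;-comment (the caller then sets line-start to #t and recurses into get-token)."""
    chars = []
    while port.peek_char() is not None and port.peek_char() in INDENT_CHARS:
        chars.append(port.read_char())
    ch = port.peek_char()
    if ch == ";":
        while port.peek_char() not in (None, "\n"):
            port.read_char()
        port.read_char()  # assumption: the newline is consumed too
        return None
    if ch == "\n" or ch is None:
        # Blank line (EOF handled the same way here, an assumption):
        # the indent counts as "", and the newline is NOT consumed.
        return ""
    return "".join(chars)
```

A comment-only line is therefore invisible to the indentation logic. A blank line acts as indent "", and its newline is left for the ';'/newline rule to consume.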

> - otherwise, update the indent-stack as needed:
> - - If the current indent is incompatible with the top-most indent,
> return BADDENT.
> - - If the current indent is greater than the top-most indent, push it
> on the indent-stack and return 'INDENT.
> - - If the current indent is the same as the top-most indent, recurse
> into (get-token) [or, if the BNF uses SAME, return 'SAME].
> - - If the current indent is less than the top-most indent, pop
> indent-stack items (counting the pops) until the stack top is equal
> to or less than the current indent.  If the stack top is less, the
> indent is bad: return BADDENT.  If it is equal, store the number of
> pops minus 1 in pending-dedents and return DEDENT.  An empty
> indent-stack is equivalent to "" for this handling.
>
> (the expectation is that BADDENT will always be an error)
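A Python sketch of the indent comparison follows. The text above does not define "incompatible", so this sketch assumes that two indents are compatible when one is a prefix of the other, and that "greater" and "less" compare lengths. It returns 'SAME' where the text allows either SAME or a recursive get-token call. The function names are hypothetical.

```python
def compatible(a, b):
    # Assumption: two indents are compatible when one is a prefix of the other.
    return a.startswith(b) or b.startswith(a)


def process_indent(stack, current):
    """Compare the current line's indent with indent-stack (top = stack[-1]).

    Returns (token, pending_dedents) and mutates stack."""
    top = stack[-1] if stack else ""  # an empty stack behaves like ""
    if not compatible(current, top):
        return "BADDENT", 0
    if len(current) > len(top):
        stack.append(current)
        return "INDENT", 0
    if current == top:
        return "SAME", 0
    # current is a proper prefix of top: pop until the top is no longer than current.
    pops = 0
    while stack and len(stack[-1]) > len(current):
        stack.pop()
        pops += 1
    top = stack[-1] if stack else ""
    if top != current:
        return "BADDENT", 0  # landed between two recorded levels
    return "DEDENT", pops - 1
```

For example, with indent-stack ("  " "    ") a line at column 0 pops twice. The sketch returns one DEDENT and records 1 in pending-dedents. A line indented by a single space lands between recorded levels and gets BADDENT.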
>
> If at line-start, and the first character is an indent character
> (space, tab, !), clear line-start to #f and consume indent characters.
> This is the "initial-indent" case - there is no indent-stack yet -
> so return 'INITIAL_INDENT_WITH_BANG or 'INITIAL_INDENT_NO_BANG as
> appropriate.
>
> (the expectation is that INITIAL_INDENT_* will stop token processing,
> i.e. get-token will not be called any more; in the
> INITIAL_INDENT_NO_BANG case, the caller is expected to use the
> ordinary Scheme read on the port)
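Here is a Python sketch of the initial-indent case. It assumes that "as appropriate" means WITH_BANG is returned when at least one "!" occurs among the consumed indent characters; the text above does not say this explicitly. The names CharPort and initial_indent are hypothetical, and state is a plain dict.

```python
INDENT_CHARS = " \t!"


class CharPort:
    """Hypothetical string-backed port with one character of lookahead."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek_char(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read_char(self):
        ch = self.peek_char()
        if ch is not None:
            self.pos += 1
        return ch


def initial_indent(port, state):
    """Initial-indent case: indent-stack is still empty and the line starts indented."""
    state["line_start"] = False
    seen_bang = False
    while port.peek_char() is not None and port.peek_char() in INDENT_CHARS:
        if port.read_char() == "!":
            seen_bang = True  # assumption: any '!' selects the WITH_BANG token
    return "INITIAL_INDENT_WITH_BANG" if seen_bang else "INITIAL_INDENT_NO_BANG"
```

The port is left at the first non-indent character. In the NO_BANG case, the caller's ordinary read would start from there.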
>
> If the character is a horizontal space, consume it and return 'hspace.
>
> If the character is a "{" or "(" or "[", then return `(n-expr
> ,(neoteric-read port))
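The hspace and bracket rules amount to a one-character dispatch. In this Python sketch, "horizontal space" is assumed to mean space or tab. The neoteric_read argument is a caller-supplied stand-in for the real reader, not an existing API. The other names are hypothetical.

```python
HSPACE = " \t"
OPENERS = "{(["


class CharPort:
    """Hypothetical string-backed port with one character of lookahead."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek_char(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read_char(self):
        ch = self.peek_char()
        if ch is not None:
            self.pos += 1
        return ch


def dispatch(port, neoteric_read):
    """Handle the hspace and bracket rules; return None if neither applies."""
    ch = port.peek_char()
    if ch is None:
        return None
    if ch in HSPACE:
        port.read_char()  # consume exactly one horizontal space
        return "hspace"
    if ch in OPENERS:
        # Leave the bracket in place: the reader consumes the whole form.
        return ("n-expr", neoteric_read(port))
    return None
```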
>
> [TODO: #-handling.]
>
> Otherwise, call neoteric-read.  If it returns $, return 'SUBLIST; if
> it returns \\, return 'GROUP_SPLICE.  For <* and *>, we may need an
> indent-stack-stack, plus additional state for the extra tokens that
> RESTART_END requires.  If the datum is not one of these special
> symbols, return `(n-expr ,<datum>).
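The special-symbol check can be sketched as a table lookup on the datum returned by neoteric-read. This Python sketch assumes that symbols are represented as plain strings. It also maps <* and *> straight to RESTART_BEGIN and RESTART_END without modelling the extra indent-stack-stack state mentioned above. The names SPECIAL and classify are hypothetical.

```python
# Symbol -> token; "\\\\" in Python source is the two-character symbol \\ .
SPECIAL = {
    "$": "SUBLIST",
    "\\\\": "GROUP_SPLICE",
    "<*": "RESTART_BEGIN",
    "*>": "RESTART_END",
}


def classify(datum):
    """Map a datum read by neoteric-read to a token."""
    if isinstance(datum, str) and datum in SPECIAL:
        return SPECIAL[datum]
    return ("n-expr", datum)  # anything else is passed through as an n-expr
```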
>
> --
>
> Assumptions:
>
> 1.  neoteric-read will not consume any whitespace or newlines after
> the datum it reads.  In particular, if neoteric-read is given "foo
> bar", it will return 'foo and leave the port at " bar", including the
> space before bar.
>
> 2.  BADDENT and INITIAL_INDENT_* will not cause get-token to get called
> again.
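Assumption 1 is what one character of lookahead provides: a reader can peek at the delimiter and stop without consuming it. The toy Python reader below handles bare atoms only and ignores neoteric suffixes such as f(x). It is a hypothetical stand-in for neoteric-read, not its implementation, and its delimiter set is also an assumption.

```python
DELIMITERS = " \t\n()[]{}\";"


class CharPort:
    """Hypothetical string-backed port with one character of lookahead."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek_char(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read_char(self):
        ch = self.peek_char()
        if ch is not None:
            self.pos += 1
        return ch


def read_atom(port):
    """Consume only the atom's own characters; stop *before* the delimiter."""
    chars = []
    while port.peek_char() is not None and port.peek_char() not in DELIMITERS:
        chars.append(port.read_char())
    return "".join(chars)
```

Given "foo bar", read_atom returns "foo" and leaves " bar" on the port, so the next get-token call sees the space and returns 'hspace.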
>
> --
>
> What do you think?
>
> Sincerely,
> AmkG
>

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss
