On 1/25/13, Alan Manuel Gloria <almkg...@gmail.com> wrote:
> Currently the spec, as described in sweet.g, expects some form of
> "preprocessor".
>
> Perhaps we can actually concretely define such a preprocessor for the
> core parser?
>
> Here's my proposal:
>
> (preprocessor
>   port
>   neoteric-read ; so we can use Scheme read while experimenting with it
>   (lambda (get-token)
>     (let ((token (get-token)))
>       (do-whatever-with token))))
>
> The token returned by (get-token) is one of the following forms:
>
> INITIAL_INDENT_WITH_BANG
> INITIAL_INDENT_NO_BANG
> INDENT
> DEDENT
> BADDENT
> SUBLIST
> GROUP_SPLICE
> RESTART_BEGIN
> RESTART_END
> EOF
> hspace
> comment_eol
> scomment
> (n-expr ,<datum>)
>
> The logic of preprocessor's get-token function is this:
>
> We keep track of a stack of indentations (indent-stack). We also keep
> track of whether we just recently consumed a newline (line-start), and
> a numerical number of pending dedents (pending-dedents). Initially,
> indent-stack is '(), line-start is #t, and pending-dedents is 0.
>
> get-token promises to only use peek-char and read-char (i.e. one
> character lookahead).
>
> If pending-dedents is non-zero, decrement it and return 'DEDENT.
>
> If eof-object?, check if indent-stack is '(). If it is, return 'EOF.
> Otherwise, count the number of items in the indent-stack, set
> pending-dedents to the length minus 1, and return 'DEDENT.
>
> When a ; or newline is found, consume until newline, set line-start to
> #t, and return 'comment_eol.
>
> If at line-start, and (not (null? indent-stack)), clear line-start to
> #f, consume indent characters (space, tab, !) and then:
> - if first non-indent character is ";" or newline, consume until
> newline, set line-start to #t, and return 'comment_eol.

Uh, wrong. Need to do this instead:

- if the first non-indent character is ";", set line-start to #t,
consume until newline, and *recurse into (get-token)*
- if the first non-indent character is a newline, proceed as if the
current indent is "" - *do not consume the newline*

> - otherwise, update the indent-stack as needed:
> - - If the current indent is incompatible with the top-most indent,
> return BADDENT.
> - - If the current indent is greater than the top-most indent, push it
> on the indent-stack and return 'INDENT.
> - - If the current indent is the same as the top-most indent, recurse
> into (get-token) [or, if the BNF uses SAME, return 'SAME].
> - - If the current indent is less than the top-most indent, pop off
> indent-stack items (counting the number of pop-offs) until the stack
> top is equal or less than the current indent - if stack-top is less,
> we got a bad indent and return BADDENT, if stack-top is equal, record
> the number of pop-offs - 1 into pending-dedents and return DEDENT; an
> empty indent-stack is equivalent to "" for this handling.
>
> (the expectation is that BADDENT will always be an error)
>
> If at line-start, and the first character is an indent character
> (space, tab, !), clear line-start to #f and consume indent characters.
> This is the "initial-indent" case - there is no indent-stack yet -
> so return 'INITIAL_INDENT_WITH_BANG or INITIAL_INDENT_NO_BANG as
> appropriate.
>
> (the expectation is that INITIAL_INDENT_* will stop token processing,
> i.e. get-token will not be called any more; in the
> INITIAL_INDENT_NO_BANG it's expected that the caller will use the
> ordinary Scheme read on the port)
>
> If the character is a horizontal space, consume it and return 'hspace.
>
> If the character is a "{" or "(" or "[", then return `(n-expr
> ,(neoteric-read port))
>
> [TODO: #-handling.]
>
> Otherwise, call neoteric-read. If it returns $, return 'SUBLIST, \\
> -> 'GROUP_SPLICE.
> For <* and *>, we may need to have an
> indent-stack-stack, and additional state for the extra tokens that
> RESTART_END requires. If it's not one of the special symbols, return
> `(n-expr ,<datum>).
>
> --
>
> Assumptions:
>
> 1. neoteric-read will not consume any whitespace or newlines after
> it. In particular, if neoteric-read is given "foo bar", it will
> return 'foo and leave the port at " bar", including the space before
> bar.
>
> 2. BADDENT and INITIAL_INDENT_* will not cause get-token to get called
> again.
>
> --
>
> What do you think?
>
> Sincerely,
> AmkG
>

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss
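
For what it's worth, the indent-comparison step quoted above (INDENT /
SAME / DEDENT / BADDENT, plus pending-dedents) can be sketched roughly
as follows. This is Python rather than Scheme, purely as an
illustration. The class IndentTracker, its method names, and the
prefix-based notion of "compatible" indents are assumptions made for
this sketch; none of them come from the proposal or from sweet.g. It
covers only the stack bookkeeping, not character reading, comments, or
the initial-indent case.

```python
class IndentTracker:
    """Hypothetical helper holding indent-stack and pending-dedents."""

    def __init__(self):
        self.indent_stack = []    # an empty stack behaves like indent ""
        self.pending_dedents = 0

    def top(self):
        """Top-most indent, treating an empty stack as ""."""
        return self.indent_stack[-1] if self.indent_stack else ""

    def take_pending(self):
        """Return 'DEDENT' if one is still queued, else None."""
        if self.pending_dedents > 0:
            self.pending_dedents -= 1
            return "DEDENT"
        return None

    def compare(self, current):
        """Classify a line's indent string against the stack top."""
        top = self.top()
        # Assumption: two indents are compatible when one is a prefix
        # of the other.
        if not (current.startswith(top) or top.startswith(current)):
            return "BADDENT"
        if current == top:
            return "SAME"
        if len(current) > len(top):
            self.indent_stack.append(current)
            return "INDENT"
        # current is shorter: pop until the top is equal or shorter.
        pops = 0
        while len(self.top()) > len(current):
            self.indent_stack.pop()
            pops += 1
        if self.top() != current:
            return "BADDENT"      # landed between two recorded levels
        # One DEDENT is returned now; the rest are queued.
        self.pending_dedents = pops - 1
        return "DEDENT"


tracker = IndentTracker()
tokens = [tracker.compare(indent) for indent in ["  ", "    ", "    ", ""]]
# tokens is ['INDENT', 'INDENT', 'SAME', 'DEDENT'], and one further
# DEDENT is queued for the next take_pending() call.
```

A caller would check take_pending() first on every get-token call and
only fall through to compare() when it returns None, which matches the
"if pending-dedents is non-zero" rule at the top of the quoted logic.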