Currently the spec, as described in sweet.g, expects some form of "preprocessor".
Perhaps we can actually concretely define such a preprocessor for the core parser? Here's my proposal:

    (preprocessor port neoteric-read ; so we can use Scheme read while experimenting with it
      (lambda (get-token)
        (let ((token (get-token)))
          (do-whatever-with token))))

The token returned by (get-token) is one of the following forms:

    INITIAL_INDENT_WITH_BANG
    INITIAL_INDENT_NO_BANG
    INDENT
    DEDENT
    BADDENT
    SUBLIST
    GROUP_SPLICE
    RESTART_BEGIN
    RESTART_END
    EOF
    hspace
    comment_eol
    scomment
    (n-expr ,<datum>)

The logic of the preprocessor's get-token function is this:

We keep track of a stack of indentations (indent-stack). We also keep track of whether we just recently consumed a newline (line-start), and a count of pending dedents (pending-dedents). Initially, indent-stack is '(), line-start is #t, and pending-dedents is 0.

get-token promises to only use peek-char and read-char (i.e. one character lookahead).

If pending-dedents is non-zero, decrement it and return 'DEDENT.

If eof-object?, check if indent-stack is '(). If it is, return 'EOF. Otherwise, count the number of items in the indent-stack, set pending-dedents to the length minus 1, and return 'DEDENT.

When a ; or newline is found, consume until newline, set line-start to #t, and return 'comment_eol.

If at line-start, and (not (null? indent-stack)), clear line-start to #f, consume indent characters (space, tab, !) and then:
- if the first non-indent character is ";" or newline, consume until newline, set line-start to #t, and return 'comment_eol.
- otherwise, update the indent-stack as needed:
- - If the current indent is incompatible with the top-most indent, return 'BADDENT.
- - If the current indent is greater than the top-most indent, push it on the indent-stack and return 'INDENT.
- - If the current indent is the same as the top-most indent, recurse into (get-token) [or, if the BNF uses SAME, return 'SAME].
- - If the current indent is less than the top-most indent, pop off indent-stack items (counting the number of pop-offs) until the stack top is equal to or less than the current indent. If the stack top is less, we got a bad indent and return 'BADDENT. If the stack top is equal, record the number of pop-offs minus 1 into pending-dedents and return 'DEDENT. An empty indent-stack is equivalent to "" for this handling.

(the expectation is that BADDENT will always be an error)

If at line-start, and the first character is an indent character (space, tab, !), clear line-start to #f and consume indent characters. This is the "initial-indent" case - there is no indent-stack yet - so return 'INITIAL_INDENT_WITH_BANG or 'INITIAL_INDENT_NO_BANG as appropriate.

(the expectation is that INITIAL_INDENT_* will stop token processing, i.e. get-token will not be called any more; in the INITIAL_INDENT_NO_BANG case it's expected that the caller will use the ordinary Scheme read on the port)

If the character is a horizontal space, consume it and return 'hspace.

If the character is a "{" or "(" or "[", then return `(n-expr ,(neoteric-read port)) [TODO: #-handling.]

Otherwise, call neoteric-read. If it returns $, return 'SUBLIST; if it returns \\, return 'GROUP_SPLICE. For <* and *>, we may need to have an indent-stack-stack, and additional state for the extra tokens that RESTART_END requires. If it's not one of the special symbols, return `(n-expr ,<datum>).

--

Assumptions:

1. neoteric-read will not consume any whitespace or newlines after it. In particular, if neoteric-read is given "foo bar", it will return 'foo and leave the port at " bar", including the space before bar.
2. BADDENT and INITIAL_INDENT_* will not cause get-token to get called again.

--

What do you think?

Sincerely,
AmkG
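The indent comparison described above (the INDENT / SAME / DEDENT / BADDENT cases) can be sketched as a pure function over the indent-stack. This is only a minimal sketch under stated assumptions, not the proposal's implementation: the helper names (string-prefix-of?, stack-top, classify-indent) are hypothetical, indents are modelled as strings with the stack top first, "incompatible" is assumed to mean that neither indent is a prefix of the other, and the equal-indent case returns 'SAME rather than recursing into get-token.

```scheme
;; Sketch only: string-prefix-of?, stack-top and classify-indent are
;; hypothetical helpers, not part of sweet.g or the proposal.

;; #t if string a is a prefix of string b.
(define (string-prefix-of? a b)
  (and (<= (string-length a) (string-length b))
       (string=? a (substring b 0 (string-length a)))))

;; Top of the indent stack; an empty stack is treated as "".
(define (stack-top stack)
  (if (null? stack) "" (car stack)))

;; Compare the current indent with the stack.
;; Returns a list: (token new-indent-stack pending-dedents).
(define (classify-indent stack indent)
  (let ((top (stack-top stack)))
    (cond
      ;; Assumption: incompatible = neither string is a prefix of the other.
      ((not (or (string-prefix-of? top indent)
                (string-prefix-of? indent top)))
       (list 'BADDENT stack 0))
      ;; Deeper than the top: push and report INDENT.
      ((> (string-length indent) (string-length top))
       (list 'INDENT (cons indent stack) 0))
      ;; Same as the top: report SAME (the proposal may recurse instead).
      ((= (string-length indent) (string-length top))
       (list 'SAME stack 0))
      ;; Shallower: pop until the top is no longer than indent, counting pops.
      (else
       (let loop ((s stack) (pops 0))
         (let ((t (stack-top s)))
           (cond
             ((> (string-length t) (string-length indent))
              (loop (cdr s) (+ pops 1)))
             ;; Exact match: one DEDENT now, the rest are pending.
             ((string=? t indent)
              (list 'DEDENT s (- pops 1)))
             ;; Stack top ended up shorter than indent: bad indent.
             (else (list 'BADDENT s 0)))))))))
```

For example, with two levels on the stack, (classify-indent '("    " "  ") "") evaluates to (DEDENT () 1): one DEDENT is returned immediately and one more is left in pending-dedents for the next get-token call. Returning the new stack and the pending count as values keeps the sketch free of mutable state; a real get-token would store them in indent-stack and pending-dedents.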