ISSUE: Leading whitespace at start of expression reading in I-expressions

I propose a new specific interpretation for leading whitespace in an indented I-expression, which I'll call the "most consistent" format. This is different than my previous proposal... I think this one's better. Below is an explanation of the problem, and my proposed resolution.
Thoughts?

After fiddling with the alternatives, I'm getting very worried that it'd be easy to type in text that would APPEAR to mean one thing, but would ACTUALLY mean something else. That's definitely something to avoid. My "most consistent" proposal completely avoids that, without being quite as strict as Python's "thou shalt start at the left edge".

First, the problem. What should be done if the start of an expression (I'll call that start-expr) begins with whitespace that is NOT followed by a comment, NL, or EOF? E.g.:

    start-expr -> hspace+ (not eol...)

An example should make it clear. Imagine you read this (three lines, all indented to the same level at the TOPMOST level):

    x
    y
    z

One interpretation is that there should be 3 different results: x, y, and z. But consider how this would be read. You'd read in the indentation before x, and note that as the "topmost" indentation. Then you'd read in the indentation before y, notice that it was the same as x's, and stop just before reading the "y" and return with just "x". But wait - if you did that, then on the next call, when you read "y", you would think that there was NO indentation (the previous read consumed it), and thus z would look further indented... returning (y z). Oops, that can't be right.

Since essentially the dawn of Lisp in the 1950s there has been a "read" function that reads an S-expression from the input and returns it. This is an extremely stable function interface, and one not easily changed in fundamental ways. In particular, no user of "read" expects it to ALSO return some state - such as the indentation that was read the LAST time read was called - and certainly they aren't going to provide that information back to "read" anyway. Not only is this difficult to change for backwards-compatibility reasons, it's not clear you should - simple interfaces are a good idea, if you can get them, and adding such "indentation state" as a required parameter would certainly complicate the interface.

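To make the failure concrete, here is a small Python model of a stateless reader (Python is used only because it is easy to run; it is not how any Scheme or Lisp reader is actually written). The names Port, skip_spaces, read_line_atoms and naive_read are hypothetical, the port offers only one character of lookahead, and the reader handles a single level of nesting. It is a sketch of the problem, not an I-expression implementation.

```python
class Port:
    """Character source with one-character lookahead and no unget (hypothetical)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def read(self):
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch


def skip_spaces(port):
    """Consume leading spaces and return how many were consumed."""
    count = 0
    while port.peek() == " ":
        port.read()
        count += 1
    return count


def read_line_atoms(port):
    """Read the atoms on the rest of the line, consuming the newline."""
    chars = []
    while port.peek() not in ("", "\n"):
        chars.append(port.read())
    port.read()  # consume the newline, if there is one
    return "".join(chars).split()


def naive_read(port):
    """Read one expression, keeping no state between calls."""
    top = skip_spaces(port)            # indentation of the first line
    if port.peek() == "":
        return None                    # end of input
    items = read_line_atoms(port)
    while port.peek() != "":
        indent = skip_spaces(port)     # consumed here; it cannot be put back
        if indent <= top:
            break                      # next expression starts, its indent is lost
        items.extend(read_line_atoms(port))
    return items[0] if len(items) == 1 else items


port = Port("  x\n  y\n  z\n")
print(naive_read(port))  # x
print(naive_read(port))  # ['y', 'z']
```

The second call sees "y" at what looks like column zero, because the first call already swallowed its two spaces, so z is taken as a child of y.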
In theory, you could "unget" all the indentation characters, so that the next read would work correctly. But the support for this is rare; for example, Scheme doesn't even HAVE a standard unget character function, and the Common Lisp standard only supports one character unget (not enough!).

You could store "hidden state" inside the read function. Problem is, character-reading is not the exclusive domain of the read function; many other functions read characters, and they are unlikely to look at this hidden state. These functions tend to be low-level functions and in some implementations are difficult to override. What's more, you would have to store hidden state for each possible input source, and this can become insane in the many implementations that support ports of non-files (such as from strings). "Hidden state" could allow for all this, but the hideous complications of IMPLEMENTING hidden state suggest that it'd be better to spec something that does NOT require hidden state.

Possible solutions:

1. Simplest approach: Forbid it. It's an error if an expression doesn't start at the left edge. Python does this. You could argue that the spec requires this, since there's no production that accepts an initial INDENT. The example above would then be illegal. But this is not very flexible; #2 appears to be a better option.

2. Most consistent: Allow indentation on the initial line (and consider that the indentation for that expression), as long as all later lines have a further indentation OR are on the left edge (including a blank line ending in EOL or EOF, or a comment at the left edge). This at least LETS you indent each expression if you like, with NO risk of misinterpretation of later lines. Typical use: if you want to indent everything, separate with blank lines. Then you initialize the indent preprocessor, and have it do this:

    start-expr -> INDENT expr DEDENT

The xyz example above would be illegal, and thus rejected.
However, if you inserted blank lines between x, y, and z, you'd be okay. This is the most consistent and most flexible, and has no risk of misinterpretation, so I propose this one.

3. Original implementation ignores hspace:

    start-expr -> hspace+ start-expr $2

But when this is given the xyz example above, it will misleadingly produce (x y z). That kind of surprise seems undesirable, esp. given that there is alternative #2.

4. Instead, could disable indent processing on initial hspace, to maximize backwards-compatibility and simplify some command line use:

    start-expr -> hspace+ s-expr $2

This would read in the "xyz" example as you would expect. It would also read in old text like this as it was originally intended:

    (define x 5)
    (define y 6)

However, other formats will be misinterpreted, e.g., an indented line such as

    fact 5

will be understood as two separate expressions:

    fact
    5

and not as (fact 5). This is risky; on printouts, it might not at ALL be obvious when expressions are indented like this - resulting in hard-to-debug code and hidden defects.

In general, I think it's much wiser to reject text that might be very easily misinterpreted by the reader. So I suggest #2.

--- David A. Wheeler
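The acceptance rule in option #2 is small enough to sketch. The Python below is only an illustration of that rule, not a reader and not part of any spec. The function name check_most_consistent is hypothetical, and the sketch assumes that only spaces count as indentation, that whitespace-only lines count as blank, that ";" starts a comment, and that indented comment-only lines are skipped.

```python
def check_most_consistent(text):
    """Return the 1-based numbers of lines that option #2 would reject."""
    start_indent = None   # indentation of the current expression's first line
    bad_lines = []
    for number, line in enumerate(text.split("\n"), start=1):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if stripped == "" or (indent == 0 and stripped.startswith(";")):
            start_indent = None        # blank line or left-edge comment ends it
            continue
        if stripped.startswith(";"):
            continue                   # assumption: skip indented comment lines
        if start_indent is None or indent == 0:
            start_indent = indent      # first line of a new expression
        elif indent <= start_indent:
            bad_lines.append(number)   # not deeper, and not at the left edge
    return bad_lines


print(check_most_consistent("  x\n  y\n  z\n"))      # [2, 3]
print(check_most_consistent("  x\n\n  y\n\n  z\n"))  # []
```

The xyz example is rejected at its second and third lines, while the same text with blank lines between the expressions passes. The check needs only the indentation of the current expression's first line, so nothing has to be carried over from one read to the next.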