On 17 Aug 2015 08:13, Barry Warsaw wrote:
On Aug 18, 2015, at 12:58 AM, Chris Angelico wrote:

The linters could tell you that you have no 'end' or 'start' just as
easily when it's in that form as when it's written out in full.
Certainly the mismatched brackets could easily be caught by any sort
of syntax highlighter. The rules for f-strings are much simpler than,
say, the PHP rules and the differences between ${...} and {$...},
which I've seen editors get wrong.

I'm really asking whether it's technically feasible and realistically possible
for them to do so.  I'd love to hear from the maintainers of pyflakes, pylint,
Emacs, vim, and other editors, linters, and other static analyzers on a rough
technical assessment of whether they can support this and how much work it
would be.

With the current format string proposals (allowing arbitrary expressions) I think I'd implement it in our parser with a FORMAT_STRING_TOKEN, a FORMAT_STRING_JOIN_OPERATOR and a FORMAT_STRING_FORMAT_OPERATOR.

A FORMAT_STRING_TOKEN would be started by f followed by any of the quote forms (', ", ''' or """) and ended either by the matching closing quote or just before an open brace that is not escaped.

A FORMAT_STRING_JOIN_OPERATOR would be emitted for the '{', which we'd colour either as part of the string or in the regular brace colour. This also opens a parsing context in which a colon becomes the FORMAT_STRING_FORMAT_OPERATOR and the right-hand side of that binary operator is another FORMAT_STRING_TOKEN. The final close brace becomes another FORMAT_STRING_JOIN_OPERATOR, and the rest of the string is a FORMAT_STRING_TOKEN.

So it'd translate something like this:

f"This {text} is my {string:>{length+3}}"

FORMAT_STRING_TOKEN[f"This ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[text]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[ is my ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[string]
FORMAT_STRING_FORMAT_OPERATOR[:]
FORMAT_STRING_TOKEN[>]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[length]
OPERATOR[+]
NUMBER[3]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN["]
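
To make the idea concrete, here's a rough sketch in plain Python (not our actual parser, and ignoring doubled braces, escapes and string literals inside the expressions) of a scan that produces that stream for a single-line f-string. The expression parts come out as one EXPRESSION token each; a real tool would feed them back through its normal expression tokenizer to get the IDENTIFIER/OPERATOR/NUMBER tokens shown above.

def tokenize_fstring(literal):
    """Produce the detailed token stream sketched above for a
    single-line f-string literal."""
    tokens, _ = _span(literal, 0, stop_on_close=False)
    return tokens

def _span(s, i, stop_on_close):
    # Literal text interleaved with {...} fields, ending at the end of
    # the string or (inside a format spec) at the matching '}'.
    tokens, text = [], ""
    while i < len(s):
        c = s[i]
        if c == "}" and stop_on_close:
            break
        if c == "{":
            tokens += [("FORMAT_STRING_TOKEN", text),
                       ("FORMAT_STRING_JOIN_OPERATOR", "{")]
            text = ""
            field, i = _field(s, i + 1)
            tokens += field
            tokens.append(("FORMAT_STRING_JOIN_OPERATOR", "}"))
            i += 1                      # step over the field's '}'
            continue
        text += c
        i += 1
    tokens.append(("FORMAT_STRING_TOKEN", text))
    return tokens, i

def _field(s, i):
    # One replacement field: an expression, optionally followed by ':'
    # and a format spec that may itself contain nested fields.
    # Returns the tokens and the index of the field's closing '}'.
    depth, expr = 0, ""
    while depth or s[i] not in ":}":
        depth += s[i] in "([{"
        depth -= s[i] in ")]}"
        expr += s[i]
        i += 1
    tokens = [("EXPRESSION", expr)]
    if s[i] == ":":
        tokens.append(("FORMAT_STRING_FORMAT_OPERATOR", ":"))
        spec, i = _span(s, i + 1, stop_on_close=True)
        tokens += spec
    return tokens, i

for kind, text in tokenize_fstring('f"This {text} is my {string:>{length+3}}"'):
    print("{}[{}]".format(kind, text))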

I *believe* (without having tried it) that this would let us produce a valid tokenisation (in our model) without too much difficulty, and highlight/analyse correctly, including validating matching braces. Getting the precedence correct on the operators might be more difficult, but we may also just produce an AST that looks like a function call, since that will give us "good enough" handling once we're past tokenisation.
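
For instance (purely to illustrate the shape, not necessarily what the compiler will emit), the example string above behaves like an ordinary str.format call, and a call-shaped AST node like that is easy for existing analysis to digest:

text, string, length = "example", "value", 7

# Roughly the call-shaped equivalent of the f-string above: the three
# expressions become the arguments and the literal segments the template.
print("This {} is my {:>{}}".format(text, string, length + 3))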

A simpler tokenisation that would probably be sufficient for many editors would be to treat the first and last segments ([f"This {] and [}"]) as groupings and each intervening section of text as a separator, giving this:

OPEN_GROUPING[f"This {]
EXPRESSION[text]
COMMA[} is my {]
EXPRESSION[string]
COMMA[:>{]
EXPRESSION[length+3]
COMMA[}}]
CLOSE_GROUPING["]

Initial parsing may be a little harder, but it should mean less trouble when expressions spread across multiple lines, since that is already handled for other types of groupings. And if any code analysis is occurring, it should be happening for dict/list/etc. contents already, and so format strings will get it too.
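
As a sketch of that simpler scheme (again rough, glossing over doubled braces, strings inside the expressions, and so on), a single pass with one "am I inside an expression?" flag is enough to reproduce the stream above, except that here the trailing '}}' and the closing quote fall into one CLOSE_GROUPING token:

def group_fstring(literal):
    # Split an f-string literal into OPEN_GROUPING / EXPRESSION / COMMA /
    # CLOSE_GROUPING tokens, treating every literal span (text, braces,
    # colons, format specs) as a separator between expressions.
    tokens, span, depth, in_expr = [], "", 0, False
    for c in literal:
        if in_expr and depth == 0 and c in ":}":
            # A top-level ':' or '}' ends the current expression.
            tokens.append(("EXPRESSION", span))
            span, in_expr = c, False
        elif not in_expr and c == "{":
            # Everything up to and including the '{' is a separator.
            kind = "OPEN_GROUPING" if not tokens else "COMMA"
            tokens.append((kind, span + c))
            span, in_expr = "", True
        else:
            if in_expr:
                depth += c in "([{"
                depth -= c in ")]}"
            span += c
    tokens.append(("CLOSE_GROUPING", span))
    return tokens

for kind, text in group_fstring('f"This {text} is my {string:>{length+3}}"'):
    print("{}[{}]".format(kind, text))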

So I'm confident we can support it, and I expect either of these two approaches will work for most tools without too much trouble. (There's also a middle ground where you create new tokens for format string components, but combine them like the second example.)

Cheers,
Steve
