Sergey Gromov wrote:
Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:

Sergey Gromov wrote:
Well I think it's hard to create a regular expression engine flexible
enough to allow arbitrary highlighting.
I can't see how it can be at all complicated to find the beginning and end of a C string or character literal.

This (Posix?) regexp

"(\\.|[^\\"])*"

works when I try it (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside: you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.
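To make the comparison concrete, here is a rough sketch of both approaches: the regexp above, and a plain loop that only knows "a backslash escapes the next character". The helper name `find_c_string_end` and the sample line are made up for illustration and are not taken from any editor's source.

```python
import re

# The regexp from the post: an opening quote, then any run of escape
# pairs or ordinary characters, then the closing quote.
C_STRING = re.compile(r'"(\\.|[^\\"])*"')

def find_c_string_end(text, start):
    """Return the index just past the closing quote, or -1 if unterminated.

    text[start] must be the opening quote. Hypothetical helper.
    """
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2          # skip the backslash and whatever it escapes
        elif ch == '"':
            return i + 1    # found the real closing quote
        else:
            i += 1
    return -1

sample = r'printf("say \"hi\"\n", x);'
m = C_STRING.search(sample)
print(m.group(0))                            # the literal found by the regexp
print(find_c_string_end(sample, m.start()))  # end index found by the loop
```

Both find the same end of the literal on this sample; the loop needs nothing beyond the notion of an escape character.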

I think the best such engine
I've seen was Colorer by Igor Russkih, and even there I wasn't able to
express D's WYSIWYG or delimited strings.  You need a real programming
language for that.
For WYSIWYG strings, all that's needed is a generic highlighter that supports:
- the aforementioned string escapes
- multiple types of string literals distinguished by whether they support string escapes, and not just delimiters
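As a sketch of what such a generic highlighter might look like, the table below describes each kind of literal by its opener, its closer, and a flag saying whether backslash escapes apply. The table contents and the name `scan_strings` are assumptions for illustration, not the configuration format of any real editor.

```python
# Each kind of literal: (opening text, closing text, supports escapes?).
# Illustrative only; a real highlighter would load this from its
# syntax definition file.
STRING_KINDS = [
    ('r"', '"', False),   # D WYSIWYG string
    ('`',  '`', False),   # D alternate WYSIWYG string
    ('"',  '"', True),    # ordinary string with escapes
]

def scan_strings(text):
    """Return (start, end) spans of the string literals found in text."""
    spans = []
    i = 0
    while i < len(text):
        for opener, closer, escapes in STRING_KINDS:
            if not text.startswith(opener, i):
                continue
            # An opener starting with a letter (the r prefix) must not be
            # the tail of a longer identifier such as bar"...".
            if opener[0].isalpha() and i > 0 and (
                    text[i - 1].isalnum() or text[i - 1] == '_'):
                continue
            j = i + len(opener)
            while j < len(text) and not text.startswith(closer, j):
                # Only escape-aware kinds skip the character after a backslash.
                j += 2 if escapes and text[j] == '\\' else 1
            end = min(j + len(closer), len(text))
            spans.append((i, end))
            i = end
            break
        else:
            i += 1
    return spans

src = r'a = r"c:\dir\"; b = "x\"y"; c = `\n`;'
literals = [src[s:e] for s, e in scan_strings(src)]
```

The only difference between the kinds is the boolean flag, which is the point being made: no regexps are involved.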

TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.

But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)

Well, you can write a regexp to handle a simple C string.  That is, if
your regexp is matched against the whole file, which is usually not the
case.  Otherwise you'll have trouble with a C string like:

"foo\
bar"

or a D string:

"foo
bar"

Then you want to highlight string escapes and probably format
specifiers.  Therefore you need not simple regexps but hierarchies of
them, and also you need to know where *internals* of the string start
and end.
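A minimal sketch of the line-at-a-time problem, assuming the highlighter is fed one line per call: it has to carry an "inside a string" flag from one line to the next, which a single per-line regexp cannot do. The function name `scan_line` is hypothetical, and the sketch follows the D rule that a string may simply continue on the next line. It does not attempt the nested highlighting of escapes or format specifiers.

```python
def scan_line(line, in_string):
    """Scan one line; return (spans, in_string_at_end).

    spans are (start, end) column ranges of string text on this line.
    in_string says whether the previous line ended inside a string.
    """
    spans = []
    i = 0
    start = 0 if in_string else None   # column where the open string began
    while i < len(line):
        ch = line[i]
        if start is None:
            if ch == '"':
                start = i              # a new string opens here
            i += 1
        elif ch == '\\':
            i += 2                     # skip the escaped character
        elif ch == '"':
            spans.append((start, i + 1))
            start = None               # the string closes here
            i += 1
        else:
            i += 1
    if start is not None:
        spans.append((start, len(line)))   # string runs to end of line
    return spans, start is not None

# The C example from above, fed line by line.
spans1, state = scan_line('x = "foo\\', False)
spans2, state2 = scan_line('bar";', state)
```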

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything.  That can still be
handled by extended regexps, even though they won't be regular
expressions in the scientific sense.
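For illustration, the "extended" feature in question is a backreference. A sketch using Python's `re` module, assuming a single-character delimiter that is neither a bracket nor an identifier character (identifier delimiters are not covered here):

```python
import re

# q"X...X" where X is one punctuation character other than a bracket.
# \1 must match exactly what the first group captured, which is what
# takes this outside regular languages in the formal sense.
DELIMITED = re.compile(r'q"([^\w\s()\[\]<>{}])(.*?)\1"', re.DOTALL)

m1 = DELIMITED.search('s = q"/foo "bar"/";')
m2 = DELIMITED.search('t = q"|a/b|";')
print(m1.group(1), m1.group(2))
print(m2.group(1), m2.group(2))
```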

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
Regexps cannot translate an opening bracket into its closing counterpart
while matching, so you must create a separate regexp for each kind of
bracket.
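A sketch of what "one regexp per bracket kind" could look like when the alternatives are generated rather than written by hand. The `PAIRS` table and the pattern name are assumptions for illustration, and the sketch makes no attempt to track nested brackets inside the string.

```python
import re

PAIRS = {'(': ')', '[': ']', '<': '>', '{': '}'}

# One alternative per bracket pair. A backreference cannot be used here,
# because the closing bracket differs from the opening one.
# Nested brackets inside the literal are NOT tracked by this sketch.
BRACKETED = re.compile(
    '|'.join(
        'q"{o}(.*?){c}"'.format(o=re.escape(o), c=re.escape(c))
        for o, c in PAIRS.items()
    ),
    re.DOTALL,
)

m_brace = BRACKETED.search('x = q"{foo}";')
m_paren = BRACKETED.search('y = q"(a]b)";')
m_bad = BRACKETED.search('z = q"(foo]";')   # mismatched pair
```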

Remember that the whole point of q{} strings was that they should NOT be highlighted as strings!
