Re: Notepad++

Stewart Gordon Fri, 14 Aug 2009 17:40:15 -0700

Sergey Gromov wrote:
<snip>

Well, you can write a regexp to handle a simple C string.  That is, if
your regexp is matched against the whole file, which is usually not the
case.  Otherwise you'll have troubles with C string:


"foo\
bar"

or D string:

"foo
bar"

So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)

Then you want to highlight string escapes and probably format
specifiers.  Therefore you need not simple regexps but hierarchies of
them, and also you need to know where *internals* of the string start
and end.

Let's just concentrate for the moment on the simple process of finding the beginning and end of a string. Here's a snippet of a TextPad syntax file:


StringsSpanLines = Yes
StringStart = "
StringEnd = "
StringEsc = \

A possible snippet of lexer code to handle this (which FAIK might be near enough how TP does it):


if (*c == StringStart) {
    beginHighlightString(c);
    for (++c; *c != StringEnd && *c != '\0'
          &&(StringsSpanLines || *c != '\n'); ++c) {
        if (*c == StringEsc) ++c;
    }
    endHighlightString(c+1);
}

It's simple and it should work. (OK, there are two assumptions made for simplicity: that line breaks are normalised to LF, and that the file is terminated by at least two null bytes in memory, but you get the idea.)
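For anyone who wants to try the idea outside an editor, here is a minimal self-contained sketch of the same loop. The StringRules struct and the scan_string name are made up for illustration (they just mirror the syntax-file keys), and it checks the buffer length instead of relying on the two trailing null bytes. It still assumes line breaks are normalised to LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical settings mirroring the TextPad syntax-file keys. */
typedef struct {
    char start;       /* StringStart */
    char end;         /* StringEnd */
    char esc;         /* StringEsc */
    bool span_lines;  /* StringsSpanLines */
} StringRules;

/* If text[pos] opens a string, returns the index one past its closing
   delimiter, or the index where scanning stopped if the string is
   unterminated. Returns pos unchanged if no string starts there. */
size_t scan_string(const char *text, size_t pos, const StringRules *r)
{
    assert(text != NULL && r != NULL);
    size_t len = strlen(text);

    if (pos >= len || text[pos] != r->start)
        return pos;

    size_t i = pos + 1;
    while (i < len) {
        char c = text[i];
        if (c == r->end)
            return i + 1;                 /* include the closing delimiter */
        if (!r->span_lines && c == '\n')
            return i;                     /* unterminated single-line string */
        if (c == r->esc && i + 1 < len)
            ++i;                          /* skip the escaped character */
        ++i;
    }
    return len;                           /* ran off the end of the buffer */
}
```

As in the snippet above, an escape character followed by a line break skips the line break, so the C-style "foo\ bar" continuation is still treated as one string even when StringsSpanLines is off.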

While it doesn't support highlighting of escapes, I can't see this fact as being the reason N++'s developers haven't implemented even this in the generic lexer module. I probably couldn't see it being the reason even if the C lexer did highlight escapes (which it doesn't).

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything.  Still can be handled
by extended regexps, even though they won't be regular expressions in
scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
Regexps cannot translate while substituting, so you must create regexps
for all possible parens.

Yes, these aspects are more complicated. Both TP and N++ (out of the box, anyway) are probably far from being able to lex D2 properly. But they certainly could do better in supporting D1. Still, once N++ gains access to Scintilla's D lexer, things will certainly be better.
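To illustrate why the q"{foo}" form is awkward for regexps but easy for a hand-written lexer, here is a rough sketch that looks up the closing delimiter from the opening one and counts nesting. The function names are invented for this example, and it deliberately ignores the q"BLAH ... BLAH" identifier form: any character after q" is treated as a single-character delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the closer paired with a bracket-style opener, or the same
   character for non-nesting delimiters such as '/'. */
static char closing_for(char open)
{
    switch (open) {
    case '(': return ')';
    case '[': return ']';
    case '<': return '>';
    case '{': return '}';
    default:  return open;
    }
}

/* If text[pos..] starts a q"X...X" literal, returns the index one past
   the final quote. Returns pos if there is no such literal there or if
   it is malformed or unterminated. */
size_t scan_delimited_string(const char *text, size_t pos)
{
    assert(text != NULL);
    size_t len = strlen(text);

    if (pos + 3 > len || text[pos] != 'q' || text[pos + 1] != '"')
        return pos;

    char open = text[pos + 2];
    char close = closing_for(open);
    int nesting = (open != close);
    int depth = 1;

    for (size_t i = pos + 3; i < len; ++i) {
        char c = text[i];
        if (nesting && c == open) {
            ++depth;                      /* nested opener */
        } else if (c == close) {
            if (nesting && --depth > 0)
                continue;                 /* closed an inner pair only */
            if (text[i + 1] == '"')       /* text[len] is the NUL, so safe */
                return i + 2;
            if (nesting)
                return pos;               /* outer closer not followed by " */
        }
    }
    return pos;
}
```

The point is only that one lookup table replaces the separate regexp per bracket pair; a real D lexer would also need the identifier-delimited form and proper error reporting.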

And of course q"BLAH
whatever BLAH here
BLAH", well, probably nice for help texts.

And these are only strings.  Try to write regexp which treats .__15 as
number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
number(2), operator(..), number(3).

<snip>

We'd need many regexps to handle all possible cases, but a possible set to cover these cases and a few others (listed in a possible order of priority) is:


\._*[0-9][0-9_]*
([1-9][0-9]*)(\.\.)
[0-9]+\.[0-9]*
[1-9][0-9]*
\.\.
\.
[a-zA-Z_][a-zA-Z0-9_]*

Note the use of capturing groups to handle the 2..3 case. Each capturing group would match a token, while in the other cases the whole regexp matches a token.
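A quick way to check the list is to try the patterns in priority order at the current position and take the first that matches. The sketch below does that with POSIX regex.h (so it assumes a POSIX system); tokenize, emit and the kind labels are names invented for this example, and each pattern is anchored with ^ so that it only matches at the current position.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

enum { NPAT = 7, PAIR = 1 };  /* PAIR: index of the two-group pattern */

/* The patterns from the list, in the same priority order. */
static const char *const PATTERNS[NPAT] = {
    "^\\._*[0-9][0-9_]*",
    "^([1-9][0-9]*)(\\.\\.)",
    "^[0-9]+\\.[0-9]*",
    "^[1-9][0-9]*",
    "^\\.\\.",
    "^\\.",
    "^[a-zA-Z_][a-zA-Z0-9_]*",
};
static const char *const KINDS[NPAT] = {
    "number", "", "number", "number", "operator", "operator", "ident"
};

/* Appends "kind(text)" to out, space-separated. Returns -1 if out is full. */
static int emit(char *out, size_t size, const char *kind,
                const char *text, size_t len)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, size - used, "%s%s(%.*s)",
                     used ? " " : "", kind, (int)len, text);
    return (n < 0 || (size_t)n >= size - used) ? -1 : 0;
}

/* Tokenizes src into out. Returns 0 on success, -1 on a regexp compile
   failure, a character no pattern matches, or a full output buffer. */
int tokenize(const char *src, char *out, size_t size)
{
    regex_t re[NPAT];
    int compiled = 0, rc = 0;

    assert(src != NULL && out != NULL && size > 0);
    out[0] = '\0';
    while (compiled < NPAT) {
        if (regcomp(&re[compiled], PATTERNS[compiled], REG_EXTENDED) != 0) {
            rc = -1;
            break;
        }
        ++compiled;
    }
    while (rc == 0 && *src != '\0') {
        regmatch_t m[3];
        int i = 0;

        if (*src == ' ') { ++src; continue; }
        while (i < NPAT && regexec(&re[i], src, 3, m, 0) != 0)
            ++i;                          /* first matching pattern wins */
        if (i == NPAT) { rc = -1; break; }
        if (i == PAIR) {                  /* each group is its own token */
            rc = emit(out, size, "number", src + m[1].rm_so,
                      (size_t)(m[1].rm_eo - m[1].rm_so));
            if (rc == 0)
                rc = emit(out, size, "operator", src + m[2].rm_so,
                          (size_t)(m[2].rm_eo - m[2].rm_so));
        } else {
            rc = emit(out, size, KINDS[i], src, (size_t)m[0].rm_eo);
        }
        src += m[0].rm_eo;
    }
    while (compiled > 0)
        regfree(&re[--compiled]);
    return rc;
}
```

Called on "2..3" this should fill the buffer with number(2) operator(..) number(3), and on ".__foo" with operator(.) ident(__foo). Recompiling the regexps on every call is wasteful; it just keeps the sketch short.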


Stewart.
