Re: Notepad++

2009-08-21 Thread Stewart Gordon

Sergey Gromov wrote:
> Here's a string which is valid in D but is invalid in C:
> 
> "foo
> bar"
> 
> Here's another string which is, on the contrary, valid in C but is
> invalid in D:
> 
> "foo\
> bar"
> 
> They both "span lines."


Doesn't quite relate to what I was querying ... but anyway, it's 
perfectly straightforward to add another rule like


LineSplice = \

among other possibilities.
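A minimal C sketch of how a SpanLines flag and a LineSplice rule like the one above might interact when scanning a double-quoted literal. The struct StringRules, its fields and scan_string() are hypothetical illustrations, not TextPad or Scintilla API. Following Sergey's two examples, the C rules accept backslash-newline but reject a raw newline, and the D rules do the reverse.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-language rules for a string literal (names are illustrative). */
struct StringRules {
    char start, end, esc;  /* opening/closing delimiter and escape character */
    bool spanLines;        /* a raw newline may appear inside the literal (D) */
    bool lineSplice;       /* backslash-newline continues the literal (C) */
};

static const struct StringRules C_RULES = { '"', '"', '\\', false, true };
static const struct StringRules D_RULES = { '"', '"', '\\', true, false };

/* Scans a literal that starts at s (s[0] must be r->start).
   Returns its length including both delimiters, or 0 if the literal is
   unterminated or breaks one of the line rules. */
static size_t scan_string(const char *s, const struct StringRules *r)
{
    size_t i = 1;                                 /* skip the opening delimiter */
    for (;;) {
        char c = s[i];
        if (c == '\0')
            return 0;                             /* hit end of buffer */
        if (c == r->end)
            return i + 1;                         /* include the closing delimiter */
        if (c == '\n') {
            if (!r->spanLines)
                return 0;                         /* raw EOL not allowed (C) */
            i++;
        } else if (c == r->esc) {
            char next = s[i + 1];
            if (next == '\0')
                return 0;
            if (next == '\n' && !r->lineSplice)
                return 0;                         /* no line splicing (D, per the thread) */
            i += 2;                               /* escape plus the escaped char */
        } else {
            i++;
        }
    }
}
```

Both of Sergey's literals "span lines", but each one is accepted by only one of the two rule sets, which is the distinction a single SpanLines switch cannot express.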

You could argue over whether it's worth going to all this effort, if you 
think the only point is to support C, C++ and D.  But really, there are 
many C-like languages out there with their own slightly different rules, 
and even the likes of Prolog, SQL and Unix shell scripts with their own 
variants of C string syntax.  I think the scheme I've come up with would 
be a good way to capture the subtle differences between these languages' 
string syntaxes, while at the same time being something that the average 
user wanting to add a new language to the system should be able to get 
their head around sooner or later.


Stewart.


Re: Notepad++

2009-08-20 Thread Sergey Gromov
Tue, 18 Aug 2009 20:40:37 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> Exactly.  There is a 32-bit "style" known for every character, plus
>> another 32-bit field associated with every line.  A lexer is free to use
>> these fields for any purpose, except the lower byte of a style defines
>> the characters' color.
> 
> Does it keep around in memory the style of every character, or only the 
> 32-bit field associated with the line so that the lexer can re-style the 
> characters on repaint/scroll?

It can tell you the style of any character.  This is so that unchanged
lines can be repainted without ever calling a lexer.
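To illustrate, a small C sketch of the layout described here: one 32-bit style per character whose low byte is the color index, with the remaining bits left to the lexer. The helper names are hypothetical and this is not necessarily Scintilla's actual internal representation; it only shows why painting can work from stored styles alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing: low byte = color index used when painting,
   upper 24 bits = free for the lexer (lexer_bits must fit in 24 bits). */
static uint32_t make_style(uint8_t color, uint32_t lexer_bits)
{
    return (lexer_bits << 8) | color;
}

static uint8_t style_color(uint32_t style) { return (uint8_t)(style & 0xFFu); }
static uint32_t style_lexer_bits(uint32_t style) { return style >> 8; }

/* Painting needs only the stored styles: count the runs of equal color in
   [from, to) without calling the lexer. Characters whose styles differ only
   in the lexer's bits are painted as one run. */
static size_t count_color_runs(const uint32_t *styles, size_t from, size_t to)
{
    size_t runs = 0;
    for (size_t i = from; i < to; i++)
        if (i == from || style_color(styles[i]) != style_color(styles[i - 1]))
            runs++;
    return runs;
}
```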

> 
>>> [DelimitedToken9]
>>> Start = '
>>> End = '
>>> Esc = \
>>> Type = Char
>>> SpanLines = No
>>> Nest = No
>>>
>>> There, we have all of D1 covered now, and not a regexp in sight.
>> 
>> Yes and no, because your ad-hoc format doesn't cover subtle differences
>> between C and D strings.  Like C strings don't support embedded EOLs.
> 
> I don't understand.  How does SpanLines not achieve this?
> 
> Then what _does_ SpanLines achieve according to whatever conclusion 
> you've come to?

Here's a string which is valid in D but is invalid in C:

"foo
bar"

Here's another string which is, on the contrary, valid in C but is
invalid in D:

"foo\
bar"

They both "span lines."


Re: Notepad++

2009-08-18 Thread Stewart Gordon

Sergey Gromov wrote:
> Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:
> 
>> Is this anything like how Scintilla works?
> 
> Exactly.  There is a 32-bit "style" known for every character, plus
> another 32-bit field associated with every line.  A lexer is free to use
> these fields for any purpose, except the lower byte of a style defines
> the characters' color.

Does it keep around in memory the style of every character, or only the 
32-bit field associated with the line so that the lexer can re-style the 
characters on repaint/scroll?

>> [DelimitedToken9]
>> Start = '
>> End = '
>> Esc = \
>> Type = Char
>> SpanLines = No
>> Nest = No
>>
>> There, we have all of D1 covered now, and not a regexp in sight.
> 
> Yes and no, because your ad-hoc format doesn't cover subtle differences
> between C and D strings.  Like C strings don't support embedded EOLs.

I don't understand.  How does SpanLines not achieve this?

Then what _does_ SpanLines achieve according to whatever conclusion 
you've come to?

> Though you may consider this minor.
> 
>>> Basically yes, but they're going to be much more complex.  3Lu...5 is
>>> also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
>>> still, regexps don't nest.  Don't you want to highlight DDoc sections
>>> and macros?
>> That would be nice as well, as would being able to do things with 
>> Doxygen comments.  But let's not try to run before we can walk.
> 
> This assumes that TextPad could run at some point. 

You're right - it turns out TP doesn't get all the D floating point 
notations right.  It appears that TP has hard-coded the syntax of C 
numeric literals.  I must've just not noticed since I had never before 
changed the number colour from the same as the default text colour.

Maybe we do want regexps for all these floating point notations after all.

> ;)  This is exactly where I'm sceptical.  I think that when it runs 
> it'll have so many weird rules and settings that it won't be fun 
> anymore.  And they won't be powerful enough for anything authors 
> didn't consider anyway.


Maybe someone can come up with something

Stewart.


Re: Notepad++

2009-08-17 Thread Sergey Gromov
Mon, 17 Aug 2009 10:37:47 +0200, Don wrote:

> Sergey Gromov wrote:
>> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
>> Regexps cannot translate while substituting, so you must create regexps
>> for all possible parens.
> 
> Remember that the whole point of q{} strings was that they should NOT be 
> highlighted as strings!

You confuse q{} and q"{}" here.  The former is a token string which may
contain only valid D tokens.  The latter is a delimited string with
nesting delimiters.  Like q"".


Re: Notepad++

2009-08-17 Thread Sergey Gromov
Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> Highlighting the whole file every time a character is typed is slow.
>> Scintilla doesn't do that.  It provides the lexer with a range of
>> changed lines.  The lexer is then free to choose a larger range if it
>> cannot deduce context from the initial range.  I tried to ignore this
>> range and re-highlight the whole file in my lexer.  The performance was
>> unacceptable.
> 
> Of course.  I suppose now that the right strategy is line-by-line with 
> some preservation of state between lines:
> 
> - Keep a note of the state at the beginning of each line
> - When something is changed, re-highlight those lines that have changed
> - Carry on re-highlighting until the state is back in sync with what was 
> there before.  If this means going way beyond the visible area of the 
> file, record the state of the next however many lines as unknown (so 
> that it will have another go when/if those lines are later scrolled into 
> view).
> - If a range of lines that has just come into view begins in unknown 
> state, it's up to the particular lexer module to start from the first 
> visible line or backtrack as far as it likes to get some context.
> 
> Is this anything like how Scintilla works?

Exactly.  There is a 32-bit "style" known for every character, plus
another 32-bit field associated with every line.  A lexer is free to use
these fields for any purpose, except the lower byte of a style defines
the characters' color.

> 
> 
>> It's actually trivial* to implement a lexer for Scintilla which would
>> work exactly as TextPad does, including use of the same configuration
>> files.
>> 
>> * That is, if you know exactly how TextPad works.
> 
> It would also be straightforward to improve TextPad's scheme to support 
> an arbitrary number of string/comment types.  How about this as an 
> all-in-one replacement for TP's comment and string syntax directives?
> 
> [...]
> 
> [DelimitedToken8]
> Start = "
> End = "
> Esc = \
> Type = String
> SpanLines = Yes
> Nest = No
> 
> [DelimitedToken9]
> Start = '
> End = '
> Esc = \
> Type = Char
> SpanLines = No
> Nest = No
> 
> There, we have all of D1 covered now, and not a regexp in sight.

Yes and no, because your ad-hoc format doesn't cover subtle differences
between C and D strings.  Like C strings don't support embedded EOLs.
Though you may consider this minor.

> 
>> Basically yes, but they're going to be much more complex.  3Lu...5 is
>> also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
>> still, regexps don't nest.  Don't you want to highlight DDoc sections
>> and macros?
> 
> That would be nice as well, as would being able to do things with 
> Doxygen comments.  But let's not try to run before we can walk.

This assumes that TextPad could run at some point.  ;)  This is exactly
where I'm sceptical.  I think that when it runs it'll have so many weird
rules and settings that it won't be fun anymore.  And they won't be
powerful enough for anything authors didn't consider anyway.


Re: Notepad++

2009-08-17 Thread Stewart Gordon

Sergey Gromov wrote:
> Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
> 
>> Sergey Gromov wrote:
>>> 
>>> "foo
>>> bar"
>> So there is a problem if the highlighter works by matching regexps on a 
>> line-by-line basis.  But matching regexps over a whole file is no harder 
>> in principle than matching line-by-line and, when the maximal munch 
>> principle is never called to action, it can't be much less efficient. 
>> (The only bit of C or D strings that relies on maximal munch is octal 
>> escapes.)
> 
> Highlighting the whole file every time a character is typed is slow.
> Scintilla doesn't do that.  It provides the lexer with a range of
> changed lines.  The lexer is then free to choose a larger range if it
> cannot deduce context from the initial range.  I tried to ignore this
> range and re-highlight the whole file in my lexer.  The performance was
> unacceptable.


Of course.  I suppose now that the right strategy is line-by-line with 
some preservation of state between lines:


- Keep a note of the state at the beginning of each line
- When something is changed, re-highlight those lines that have changed
- Carry on re-highlighting until the state is back in sync with what was 
there before.  If this means going way beyond the visible area of the 
file, record the state of the next however many lines as unknown (so 
that it will have another go when/if those lines are later scrolled into 
view).
- If a range of lines that has just come into view begins in unknown 
state, it's up to the particular lexer module to start from the first 
visible line or backtrack as far as it likes to get some context.
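The strategy above can be sketched in C roughly as follows. Everything here is hypothetical: the per-line state array, the ST_UNKNOWN marker, the line budget and the toy lex_line() (which only tracks whether a D-style multi-line "..." string is open) stand in for whatever a real lexer module would do.

```c
#include <stdint.h>

/* Hypothetical per-line lexer states; ST_UNKNOWN marks lines not yet re-lexed. */
enum { ST_DEFAULT = 0, ST_STRING = 1 };
#define ST_UNKNOWN UINT32_MAX

/* Toy lexer: returns the state at the end of a line, given the state at its
   start. It only tracks whether a "..." string (allowed to span lines, as in
   D) is open, honouring backslash escapes inside the string. */
static uint32_t lex_line(const char *line, uint32_t state)
{
    for (const char *c = line; *c != '\0'; c++) {
        if (state == ST_STRING && *c == '\\' && c[1] != '\0')
            c++;                               /* skip the escaped char */
        else if (*c == '"')
            state = (state == ST_STRING) ? ST_DEFAULT : ST_STRING;
    }
    return state;
}

/* start[i] holds the state at the beginning of line i (nlines + 1 entries,
   start[0] is always ST_DEFAULT). After an edit to line `first`, re-lex
   forward until the end state of a line matches what was recorded for the
   next line; after `budget` lines, give up and mark the rest unknown.
   Returns the number of lines re-lexed. */
static int relex_after_edit(const char *const *lines, int nlines,
                            uint32_t *start, int first, int budget)
{
    int n = 0;
    for (int i = first; i < nlines; i++) {
        uint32_t end = lex_line(lines[i], start[i]);
        n++;
        if (end == start[i + 1])
            return n;                          /* back in sync: later lines unchanged */
        start[i + 1] = end;
        if (n == budget) {                     /* too far beyond the visible area */
            for (int j = i + 2; j <= nlines; j++)
                start[j] = ST_UNKNOWN;
            return n;
        }
    }
    return n;
}

/* When line `line` scrolls into view with an unknown start state, backtrack
   to the nearest line with a known state and lex forward from there. */
static void ensure_known(const char *const *lines, uint32_t *start, int line)
{
    int k = line;
    while (start[k] == ST_UNKNOWN)
        k--;                                   /* terminates: start[0] is known */
    for (; k < line; k++)
        start[k + 1] = lex_line(lines[k], start[k]);
}
```

A real implementation would also have to keep re-lexing through the last edited line when an edit spans several lines; the sketch assumes a single-line edit.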


Is this anything like how Scintilla works?



> It's actually trivial* to implement a lexer for Scintilla which would
> work exactly as TextPad does, including use of the same configuration
> files.
> 
> * That is, if you know exactly how TextPad works.


It would also be straightforward to improve TextPad's scheme to support 
an arbitrary number of string/comment types.  How about this as an 
all-in-one replacement for TP's comment and string syntax directives?


[DelimitedToken1]
Start = /**
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken2]
Start = /*!
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken3]
Start = /*
End = */
Type = Comment
SpanLines = Yes
Nest = No

[DelimitedToken4]
Start = /+
End = +/
Type = Comment
SpanLines = Yes
Nest = Yes

[DelimitedToken5]
Start = //
Type = Comment
SpanLines = No
Nest = No

[DelimitedToken6]
Start = r"
End = "
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken7]
Start = `
End = `
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken8]
Start = "
End = "
Esc = \
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken9]
Start = '
End = '
Esc = \
Type = Char
SpanLines = No
Nest = No

There, we have all of D1 covered now, and not a regexp in sight.
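A minimal C sketch of how a lexer might interpret the [DelimitedTokenN] sections above: try the Start strings in section order, then scan for End while honouring Esc, SpanLines and Nest. The struct, the table and match_delimited() are hypothetical; a section without End is assumed to run to the end of the line (as for //), and a section without Esc is assumed to have no escapes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One [DelimitedTokenN] section; field names mirror the proposed keys. */
struct DelimitedToken {
    const char *start;
    const char *end;     /* NULL: the token runs to the end of the line */
    char esc;            /* '\0': no escape character */
    const char *type;
    bool spanLines;
    bool nest;
};

/* Same order as the sections above: the doc-comment openers must be tried
   before the plain block-comment opener, and r" before a bare double quote. */
static const struct DelimitedToken tokens[] = {
    { "/**", "*/", '\0', "DocComment", true,  false },
    { "/*!", "*/", '\0', "DocComment", true,  false },
    { "/*",  "*/", '\0', "Comment",    true,  false },
    { "/+",  "+/", '\0', "Comment",    true,  true  },
    { "//",  NULL, '\0', "Comment",    false, false },
    { "r\"", "\"", '\0', "String",     true,  false },
    { "`",   "`",  '\0', "String",     true,  false },
    { "\"",  "\"", '\\', "String",     true,  false },
    { "'",   "'",  '\\', "Char",       false, false },
};

/* If a delimited token starts at s, stores its table index in *which and
   returns its length; returns 0 if none starts here or it is malformed
   (unterminated, or broken by a line break where SpanLines = No). */
static size_t match_delimited(const char *s, int *which)
{
    for (size_t t = 0; t < sizeof tokens / sizeof tokens[0]; t++) {
        const struct DelimitedToken *d = &tokens[t];
        size_t sl = strlen(d->start);
        if (strncmp(s, d->start, sl) != 0)
            continue;
        *which = (int)t;
        size_t el = d->end ? strlen(d->end) : 0;
        size_t i = sl;
        int depth = 1;
        while (s[i] != '\0') {
            if (d->end == NULL) {                    /* line comment */
                if (s[i] == '\n')
                    return i;
                i++;
            } else if (d->esc != '\0' && s[i] == d->esc && s[i + 1] != '\0') {
                i += 2;                              /* escaped character */
            } else if (strncmp(s + i, d->end, el) == 0) {
                i += el;
                if (--depth == 0)
                    return i;                        /* outermost End */
            } else if (d->nest && strncmp(s + i, d->start, sl) == 0) {
                i += sl;                             /* nested Start */
                depth++;
            } else if (s[i] == '\n' && !d->spanLines) {
                return 0;
            } else {
                i++;
            }
        }
        return d->end == NULL ? i : 0;               /* only // may end at EOF */
    }
    return 0;
}
```

With no Esc, the backslash in r"C:\dir\" is taken literally and the following quote closes the string, which is the WYSIWYG behaviour; the same input with a bare " opener would treat \" as an escape.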



> Basically yes, but they're going to be much more complex.  3Lu...5 is
> also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
> still, regexps don't nest.  Don't you want to highlight DDoc sections
> and macros?


That would be nice as well, as would being able to do things with 
Doxygen comments.  But let's not try to run before we can walk.


Stewart.


Re: Notepad++

2009-08-17 Thread Don

Sergey Gromov wrote:
> Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
> 
>> Sergey Gromov wrote:
>>> Well I think it's hard to create a regular expression engine flexible
>>> enough to allow arbitrary highlighting.
>> I can't see how it can be at all complicated to find the beginning and 
>> end of a C string or character literal.
>>
>> This (Posix?) regexp
>>
>> "(\\.|[^\\"])*"
>>
>> works as I try (though not in the tiny subset of Posix regexps that N++ 
>> understands).  But that's an aside - you don't need regexps at all to 
>> get it working at this basic level, only a rudimentary concept of escape 
>> sequences.
>>
>>> I think the best such engine
>>> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
>>> express D's WYSIWYG or delimited strings.  You need a real programming
>>> language for that.
>> For WYSIWYG strings, all that's needed is a generic highlighter that 
>> supports:
>>
>> - the aforementioned string escapes
>> - multiple types of string literals distinguished by whether they 
>> support string escapes, and not just delimiters
>>
>> TextPad's syntax highlighting engine manages 2/3 of this without any 
>> regexps (or anything to that effect).  That said, I've just found that 
>> it can do a little bit of what remains: I can make it do `...` but not 
>> r"..." at the expense of distinguishing string and character literals.
>>
>> But token-delimited strings are indeed more complex to deal with.  (How 
>> many people do we have putting them to practical use at the moment, for 
>> that matter?)
> 
> Well, you can write a regexp to handle a simple C string.  That is, if
> your regexp is matched against the whole file, which is usually not the
> case.  Otherwise you'll have troubles with C string:
> 
> "foo\
> bar"
> 
> or D string:
> 
> "foo
> bar"
> 
> Then you want to highlight string escapes and probably format
> specifiers.  Therefore you need not simple regexps but hierarchies of
> them, and also you need to know where *internals* of the string start
> and end.
> 
> Then you have r"foo" which probably can be handled with regexps.
> 
> Then you have q"/foo/" where "/" can be anything.  Still can be handled
> by extended regexps, even though they won't be regular expressions in
> scientific sense.
> 
> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
> Regexps cannot translate while substituting, so you must create regexps
> for all possible parens.


Remember that the whole point of q{} strings was that they should NOT be 
highlighted as strings!


Re: Notepad++

2009-08-15 Thread Nick Sabalausky
"bearophile"  wrote in message 
news:h67s40$2aq...@digitalmars.com...
>
> Today the difference isn't very important because CPUs are fast.

Not when people are compensating for it with slow code in slow languages and 
running many such bloatwares all at once. "Fast modern CPUs" is just a 
rationalization for certain shitty development practices. 




Re: Notepad++

2009-08-15 Thread bearophile
Sergey Gromov:
> Sure, TextPad uses a dozen simple hacks specific to lexing
> programming languages.  They're ad-hoc and they're limited to exactly
> what TextPad authors thought were important.

Today the difference isn't very important because CPUs are fast. But on Windows 
with a Pentium3 Scintilla was very slow. TextPad was fast enough even for very 
quick fingers. (TextPad may even contain some parts coded in assembly). TextPad 
on Windows is very fast :-)

Bye,
bearophile


Re: Notepad++

2009-08-15 Thread Sergey Gromov
Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> 
>> "foo
>> bar"
> 
> So there is a problem if the highlighter works by matching regexps on a 
> line-by-line basis.  But matching regexps over a whole file is no harder 
> in principle than matching line-by-line and, when the maximal munch 
> principle is never called to action, it can't be much less efficient. 
> (The only bit of C or D strings that relies on maximal munch is octal 
> escapes.)

Highlighting the whole file every time a character is typed is slow.
Scintilla doesn't do that.  It provides the lexer with a range of
changed lines.  The lexer is then free to choose a larger range if it
cannot deduce context from the initial range.  I tried to ignore this
range and re-highlight the whole file in my lexer.  The performance was
unacceptable.

>> Then you want to highlight string escapes and probably format
>> specifiers.  Therefore you need not simple regexps but hierarchies of
>> them, and also you need to know where *internals* of the string start
>> and end.
> 
> Let's just concentrate for the moment on the simple process of finding 
> the beginning and end of a string.  Here's a snippet of a TextPad syntax 
> file:
> 
> StringsSpanLines = Yes
> StringStart = "
> StringEnd = "
> StringEsc = \
> 
> A possible snippet of lexer code to handle this (which FAIK might be 
> [...]

Sure, TextPad uses a dozen simple hacks specific to lexing
programming languages.  They're ad-hoc and they're limited to exactly
what the TextPad authors thought was important.

Regexps are a different approach.  They are more generic but are limited,
too, because they're slow and don't nest naturally.  Slow means they
must try to re-color as few lines as possible.  Not nestable means
you need to invent some framework around regexps which is another sort
of description language.  If you implement the former naively and ignore
the latter you'll get what presumably N++ has: not a very powerful
system.

It's actually trivial* to implement a lexer for Scintilla which would
work exactly as TextPad does, including use of the same configuration
files.

* That is, if you know exactly how TextPad works.

>> And these are only strings.  Try to write regexp which treats .__15 as
>> number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
>> number(2), operator(..), number(3).
> 
> 
> We'd need many regexps to handle all possible cases, but a possible set 
> to cover these cases and a few others (listed in a possible order of 
> priority) is:
> 
> \._*[0-9][0-9_]*
> ([1-9][0-9]*)(\.\.)
> [0-9]+\.[0-9]*
> [1-9][0-9]*
> \.\.
> \.
> [a-zA-Z_][a-zA-Z0-9_]*

Basically yes, but they're going to be much more complex.  3Lu...5 is
also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
still, regexps don't nest.  Don't you want to highlight DDoc sections
and macros?


Re: Notepad++

2009-08-14 Thread Stewart Gordon

Stewart Gordon wrote:
> TextPad's syntax highlighting engine manages 2/3 of this without any 
> regexps (or anything to that effect).  That said, I've just found that 
> it can do a little bit of what remains: I can make it do `...` but not 
> r"..." at the expense of distinguishing string and character literals.



For the record, what I'd done is

StringStart = "
StringEnd = "
StringAlt = '
StringEsc = \
CharStart = `
CharEnd = `
CharEsc =

however, I've just found a bigger problem: only string literals, not 
char literals, can span lines in TP.


Stewart.


Re: Notepad++

2009-08-14 Thread Stewart Gordon

Sergey Gromov wrote:
> Well, you can write a regexp to handle a simple C string.  That is, if
> your regexp is matched against the whole file, which is usually not the
> case.  Otherwise you'll have troubles with C string:
> 
> "foo\
> bar"
> 
> or D string:
> 
> "foo
> bar"


So there is a problem if the highlighter works by matching regexps on a 
line-by-line basis.  But matching regexps over a whole file is no harder 
in principle than matching line-by-line and, when the maximal munch 
principle is never called to action, it can't be much less efficient. 
(The only bit of C or D strings that relies on maximal munch is octal 
escapes.)



> Then you want to highlight string escapes and probably format
> specifiers.  Therefore you need not simple regexps but hierarchies of
> them, and also you need to know where *internals* of the string start
> and end.


Let's just concentrate for the moment on the simple process of finding 
the beginning and end of a string.  Here's a snippet of a TextPad syntax 
file:


StringsSpanLines = Yes
StringStart = "
StringEnd = "
StringEsc = \

A possible snippet of lexer code to handle this (which FAIK might be 
near enough how TP does it):


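if (*c == StringStart) {
    beginHighlightString(c);
    /* Scan to the closing delimiter, the end of the buffer, or (when
       strings may not span lines) the end of the line. */
    for (++c; *c != StringEnd && *c != '\0'
              && (StringsSpanLines || *c != '\n'); ++c) {
        /* An escape consumes the next character, so \" does not end it. */
        if (*c == StringEsc) ++c;
    }
    /* c points at whatever stopped the scan; include it in the span. */
    endHighlightString(c+1);
}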

It's simple and it should work.  (OK, there are two assumptions made for 
simplicity: that line breaks are normalised to LF, and that the file is 
terminated by at least two null bytes in memory, but you get the idea.)


While it doesn't support highlighting of escapes, I can't see this fact 
as being the reason N++'s developers haven't implemented even this in 
the generic lexer module.  I probably couldn't see it being the reason 
even if the C lexer did highlight escapes (which it doesn't).



> Then you have r"foo" which probably can be handled with regexps.
> 
> Then you have q"/foo/" where "/" can be anything.  Still can be handled
> by extended regexps, even though they won't be regular expressions in
> scientific sense.
> 
> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
> Regexps cannot translate while substituting, so you must create regexps
> for all possible parens.


Yes, these aspects are more complicated.  Both TP and N++ (out of the 
box, anyway) are probably far from being able to lex D2 properly.  But 
they certainly could do better in supporting D1.  Still, once N++ gains 
access to Scintilla's D lexer, things will certainly be better.



> And of course q"BLAH
> whatever BLAH here
> BLAH", well, probably nice for help texts.
> 
> And these are only strings.  Try to write regexp which treats .__15 as
> number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
> number(2), operator(..), number(3).



We'd need many regexps to handle all possible cases, but a possible set 
to cover these cases and a few others (listed in a possible order of 
priority) is:


\._*[0-9][0-9_]*
([1-9][0-9]*)(\.\.)
[0-9]+\.[0-9]*
[1-9][0-9]*
\.\.
\.
[a-zA-Z_][a-zA-Z0-9_]*

Note the use of capturing groups to handle the 2..3 case.  Each 
capturing group would match a token, while in the other cases the whole 
regexp matches a token.
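A minimal C sketch of trying that regexp set in priority order with POSIX regcomp/regexec (REG_EXTENDED). Each pattern is anchored with ^ at the current position, and when a pattern has capturing groups each group is emitted as its own token, as described above. The tokenize() name and the '|'-separated output are illustrative choices, and whitespace handling is omitted.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Stewart's patterns in priority order, each anchored at the current position. */
static const char *const patterns[] = {
    "^\\._*[0-9][0-9_]*",         /* .__15  -> number           */
    "^([1-9][0-9]*)(\\.\\.)",     /* 2..    -> number, operator */
    "^[0-9]+\\.[0-9]*",           /* 2.5    -> number           */
    "^[1-9][0-9]*",               /* 3      -> number           */
    "^\\.\\.",                    /* ..     -> operator         */
    "^\\.",                       /* .      -> operator         */
    "^[a-zA-Z_][a-zA-Z0-9_]*",    /* __foo  -> identifier       */
};
#define NPAT (sizeof patterns / sizeof patterns[0])

/* Appends one token to out, separating tokens with '|' (truncates if full). */
static void append_token(char *out, size_t outsz, const char *tok, size_t len)
{
    size_t used = strlen(out);
    if (used > 0 && used + 1 < outsz) {
        out[used++] = '|';
        out[used] = '\0';
    }
    if (used + len < outsz) {
        memcpy(out + used, tok, len);
        out[used + len] = '\0';
    }
}

/* Tokenizes src (no whitespace handling). Returns the token count, or -1 if
   a pattern fails to compile or no pattern matches at some position. */
static int tokenize(const char *src, char *out, size_t outsz)
{
    regex_t re[NPAT];
    regmatch_t m[3];
    size_t compiled = 0;
    int count = 0;

    out[0] = '\0';
    for (; compiled < NPAT; compiled++)
        if (regcomp(&re[compiled], patterns[compiled], REG_EXTENDED) != 0)
            break;
    if (compiled < NPAT)
        count = -1;
    while (count >= 0 && *src != '\0') {
        size_t p = 0;
        while (p < NPAT && regexec(&re[p], src, 3, m, 0) != 0)
            p++;                          /* first pattern that matches wins */
        if (p == NPAT) {
            count = -1;                   /* nothing matches here */
            break;
        }
        if (m[1].rm_so != -1) {
            /* Capturing groups: each group is a separate token. */
            for (int g = 1; g < 3 && m[g].rm_so != -1; g++, count++)
                append_token(out, outsz, src + m[g].rm_so,
                             (size_t)(m[g].rm_eo - m[g].rm_so));
        } else {
            append_token(out, outsz, src, (size_t)m[0].rm_eo);
            count++;
        }
        src += m[0].rm_eo;
    }
    for (size_t i = 0; i < compiled; i++)
        regfree(&re[i]);
    return count;
}
```

Putting ([1-9][0-9]*)(\.\.) ahead of [0-9]+\.[0-9]* is what stops "2..3" from being read as the float "2." followed by ".3".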


Stewart.


Re: Notepad++

2009-08-14 Thread Sergey Gromov
Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> Well I think it's hard to create a regular expression engine flexible
>> enough to allow arbitrary highlighting.
> 
> I can't see how it can be at all complicated to find the beginning and 
> end of a C string or character literal.
> 
> This (Posix?) regexp
> 
> "(\\.|[^\\"])*"
> 
> works as I try (though not in the tiny subset of Posix regexps that N++ 
> understands).  But that's an aside - you don't need regexps at all to 
> get it working at this basic level, only a rudimentary concept of escape 
> sequences.
> 
>> I think the best such engine
>> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
>> express D's WYSIWYG or delimited strings.  You need a real programming
>> language for that.
> 
> For WYSIWYG strings, all that's needed is a generic highlighter that 
> supports:
> - the aforementioned string escapes
> - multiple types of string literals distinguished by whether they 
> support string escapes, and not just delimiters
> 
> TextPad's syntax highlighting engine manages 2/3 of this without any 
> regexps (or anything to that effect).  That said, I've just found that 
> it can do a little bit of what remains: I can make it do `...` but not 
> r"..." at the expense of distinguishing string and character literals.
> 
> But token-delimited strings are indeed more complex to deal with.  (How 
> many people do we have putting them to practical use at the moment, for 
> that matter?)

Well, you can write a regexp to handle a simple C string.  That is, if
your regexp is matched against the whole file, which is usually not the
case.  Otherwise you'll have troubles with C string:

"foo\
bar"

or D string:

"foo
bar"

Then you want to highlight string escapes and probably format
specifiers.  Therefore you need not simple regexps but hierarchies of
them, and also you need to know where *internals* of the string start
and end.

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything.  Still can be handled
by extended regexps, even though they won't be regular expressions in
scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
Regexps cannot translate while substituting, so you must create regexps
for all possible parens.

And of course q"BLAH
whatever BLAH here
BLAH", well, probably nice for help texts.
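For comparison, a hand-written scanner has no trouble mapping an opening bracket to its partner. Below is a minimal C sketch covering the three delimited forms above (plain delimiter, nesting brackets, identifier heredoc). scan_q_string() is a hypothetical helper. It assumes a plain delimiter closes at its first occurrence that is immediately followed by ", and it does not handle q{...} token strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the closing partner of a nesting delimiter, or '\0' if c does not nest. */
static char closing_bracket(char c)
{
    switch (c) {
    case '(': return ')';
    case '[': return ']';
    case '<': return '>';
    case '{': return '}';
    default:  return '\0';
    }
}

/* Scans a D delimited string that starts with q" at s. Returns its length
   including the final ", or 0 if it is malformed or unterminated. */
static size_t scan_q_string(const char *s)
{
    if (s[0] != 'q' || s[1] != '"' || s[2] == '\0')
        return 0;
    const char *p = s + 2;
    char open = *p;

    if (isalpha((unsigned char)open) || open == '_') {
        /* Heredoc: identifier, newline, ..., a line starting with identifier". */
        size_t idlen = 0;
        while (isalnum((unsigned char)p[idlen]) || p[idlen] == '_')
            idlen++;
        if (p[idlen] != '\n')
            return 0;
        for (const char *line = p + idlen + 1; ; ) {
            if (strncmp(line, p, idlen) == 0 && line[idlen] == '"')
                return (size_t)(line + idlen + 1 - s);
            const char *nl = strchr(line, '\n');
            if (nl == NULL)
                return 0;
            line = nl + 1;
        }
    }

    char close = closing_bracket(open);
    int depth = 1;
    for (p++; *p != '\0'; p++) {
        if (close != '\0') {                      /* nesting brackets */
            if (*p == open)
                depth++;
            else if (*p == close && --depth == 0)
                return p[1] == '"' ? (size_t)(p + 2 - s) : 0;
        } else if (*p == open && p[1] == '"') {   /* plain delimiter */
            return (size_t)(p + 2 - s);
        }
    }
    return 0;
}
```

In the heredoc case, "BLAH" in the middle of a line does not close the string; only a line that begins with the identifier followed by " does.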

And these are only strings.  Try to write regexp which treats .__15 as
number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
number(2), operator(..), number(3).

> Scintilla's definition of a plugin is confusing - normally plugins are 
> things that can be dynamically loaded at runtime, rather than having to 
> compile them in.  If only

I'm not sure they call them "plugins".  They're lexer modules made so
that lexer is relatively easily extendable.


Re: Notepad++

2009-08-14 Thread Kagamin
Nick Sabalausky Wrote:

> > I don't see how the lexer is being chosen.
> > Programmer's Notepad does it correctly.
> 
> I use Programmer's Notepad. It's good, but it still has some problems:
> 
> http://code.google.com/p/pnotepad/issues/detail?id=480 (Proper Highlighting 
> for D's Wysiwyg Strings)
> http://code.google.com/p/pnotepad/issues/detail?id=481 (In D, strings with 
> embedded newlines are not highlighted correctly)
> http://code.google.com/p/pnotepad/issues/detail?id=482 (Support for D's 
> nested comments)

At least PN chooses the right lexer. That's what I meant.

These issues do not pertain to PN. They're RFEs for the Scintilla D lexer and, as I 
said, they were fixed in version 1.79. The PN developer plans to upgrade to the new 
Scintilla in PN 3; in fact, I compiled Scintilla 1.78 with the recent D lexer and it 
works fine. BTW bug 482 is invalid: support for nested comments was there from 
the start, so make sure you aren't using the C lexer.


Re: Notepad++

2009-08-13 Thread Nick Sabalausky
"Kagamin"  wrote in message 
news:h60euh$2sg...@digitalmars.com...
> Stewart Gordon Wrote:
>
>> Anyway, attached is the result.  Can anybody do better (other than by
>> telling it to treat D as C or some other language instead)?
>
> I don't see how the lexer is being chosen.
> Programmer's Notepad does it correctly.

I use Programmer's Notepad. It's good, but it still has some problems:

http://code.google.com/p/pnotepad/issues/detail?id=480 (Proper Highlighting 
for D's Wysiwyg Strings)
http://code.google.com/p/pnotepad/issues/detail?id=481 (In D, strings with 
embedded newlines are not highlighted correctly)
http://code.google.com/p/pnotepad/issues/detail?id=482 (Support for D's 
nested comments)




Re: Notepad++

2009-08-13 Thread Stewart Gordon

Sergey Gromov wrote:
> Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:
> 
>> It puzzles me that they didn't make this plugin powerful enough to 
>> highlight the language it (and indeed the whole of Notepad++) is written 
>> in.  Even more so considering the sheer number of C-like languages out 
>> there, which people are likely to want to use N++ to write.
> 
> Well I think it's hard to create a regular expression engine flexible
> enough to allow arbitrary highlighting.


I can't see how it can be at all complicated to find the beginning and 
end of a C string or character literal.


This (Posix?) regexp

"(\\.|[^\\"])*"

works as I try (though not in the tiny subset of Posix regexps that N++ 
understands).  But that's an aside - you don't need regexps at all to 
get it working at this basic level, only a rudimentary concept of escape 
sequences.
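A small C sketch of applying that regexp with POSIX regcomp/regexec (REG_EXTENDED). Inside a bracket expression a POSIX backslash is an ordinary character, so [^\\"] excludes both backslash and double quote. find_c_string() is a hypothetical helper. Run on a whole buffer it finds a string containing a newline, but run on the first line alone it finds nothing, which is the line-by-line objection raised in reply.

```c
#include <regex.h>
#include <stddef.h>

/* Finds the first C string literal in text using the ERE quoted above.
   On success stores the byte offsets [*begin, *end) and returns 1; returns 0
   if there is no match or the pattern fails to compile. */
static int find_c_string(const char *text, size_t *begin, size_t *end)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, "\"(\\\\.|[^\\\\\"])*\"", REG_EXTENDED) != 0)
        return 0;
    int found = regexec(&re, text, 1, &m, 0) == 0;
    if (found) {
        *begin = (size_t)m.rm_so;
        *end = (size_t)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```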



> I think the best such engine
> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
> express D's WYSIWYG or delimited strings.  You need a real programming
> language for that.


For WYSIWYG strings, all that's needed is a generic highlighter that 
supports:

- the aforementioned string escapes
- multiple types of string literals distinguished by whether they 
support string escapes, and not just delimiters


TextPad's syntax highlighting engine manages 2/3 of this without any 
regexps (or anything to that effect).  That said, I've just found that 
it can do a little bit of what remains: I can make it do `...` but not 
r"..." at the expense of distinguishing string and character literals.


But token-delimited strings are indeed more complex to deal with.  (How 
many people do we have putting them to practical use at the moment, for 
that matter?)



> ---
> 
> I've just had a look at Notepad++ sources.  The Scintilla they use
> contains Scintilla's built-in D lexer.  I think it's just not
> configured.


Sounds as though N++'s developers neglected to keep the configuration 
files up to date as new languages were added to Scintilla.



> SciTE uses *.properties files to configure stuff.
> Notepad++ uses XML files for the same purpose.  I think it's all in
> langs.model.xml.  My current idea is to take d.properties from the
> corresponding release of SciTE and try to translate it into the
> langs.model.xml format.  I'll probably try it later when I have time.
> 
> Of course it would be nice to replace the original D lexer with mine.
> Or, even better, to ask Scintilla developers to include my lexer into
> the official bundle.  May be worth a try.


You have two good plans there.

Scintilla's definition of a plugin is confusing - normally plugins are 
things that can be dynamically loaded at runtime, rather than having to 
compile them in.  If only


Stewart.


Re: Notepad++

2009-08-13 Thread Kagamin
Sergey Gromov Wrote:

> Or, even better, to ask Scintilla developers to include my lexer into
> the official bundle.  May be worth a try.

Uh... that's not an option.


Re: Notepad++

2009-08-13 Thread Kagamin
Stewart Gordon Wrote:

> For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
> d.properties to be found.  The SciLexer.dll reports itself as file 
> version 1.7.8.0, product version 1.78.  So maybe the question is of what 
> effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
> versions correspond directly to Scintilla versions?)

The wrong lexer is being used here. Scintilla's built-in D lexer has supported nested comments 
and escape sequences since version 1.72, but support for multiline strings was 
added in version 1.79.


Re: Notepad++

2009-08-13 Thread Kagamin
Stewart Gordon Wrote:

> Anyway, attached is the result.  Can anybody do better (other than by 
> telling it to treat D as C or some other language instead)?

I don't see how the lexer is being chosen.
Programmer's Notepad does it correctly.


Re: Notepad++

2009-08-12 Thread Sergey Gromov
Wed, 12 Aug 2009 21:35:02 -0500, Andrei Alexandrescu wrote:

> Sergey Gromov wrote:
>> 2.  Lexers are written in C++ and interface with the rest of Scintilla
>> via C++ classes.  Therefore if a field is added or removed anywhere, or
>> if you use a different compiler to build your DLL than that used to
>> build Scintilla, you'll get GPF, or worse.
> 
> If they use binary interfacing with virtual functions a la COM's
> binary standard, then field presence shouldn't matter.

They don't, unfortunately.  Every lexer defines a static instance of a
LexerModule class.  The coloring function receives a reference to an
Accessor class.  They're full-blown classes, with fields and stuff.

> Also, most compilers on Windows respect the basic ABI. No?

Even though they don't use inheritance, and therefore most compilers
will likely build identical data layouts for them, there is still zero
compatibility between different versions of those classes.


Re: Notepad++

2009-08-12 Thread Andrei Alexandrescu

Sergey Gromov wrote:
> 2.  Lexers are written in C++ and interface with the rest of Scintilla
> via C++ classes.  Therefore if a field is added or removed anywhere, or
> if you use a different compiler to build your DLL than that used to
> build Scintilla, you'll get GPF, or worse.


If they use binary interfacing with virtual functions a la COM's binary 
standard, then field presence shouldn't matter. Also, most compilers on 
Windows respect the basic ABI. No?
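A minimal C sketch of the kind of binary interface described here: the host hands the lexer a pointer to a table of function pointers, and the lexer never touches the host's own structs, so the host can add or remove fields without breaking a compiled plugin. New methods would still have to be appended to the end of the table, and both sides must agree on the calling convention. All names are hypothetical; this is not Scintilla's actual interface.

```c
#include <stddef.h>

/* Interface seen by the plugin: only a pointer to a function table. */
typedef struct IDocument IDocument;
typedef struct IDocumentVtbl {
    size_t (*Length)(IDocument *self);
    char (*CharAt)(IDocument *self, size_t pos);
    void (*SetStyle)(IDocument *self, size_t pos, unsigned style);
} IDocumentVtbl;
struct IDocument {
    const IDocumentVtbl *vtbl;
};

/* Plugin side: a toy lexer that gives digits style 1 and everything else 0. */
static void lex_digits(IDocument *doc)
{
    size_t n = doc->vtbl->Length(doc);
    for (size_t i = 0; i < n; i++) {
        char c = doc->vtbl->CharAt(doc, i);
        doc->vtbl->SetStyle(doc, i, (c >= '0' && c <= '9') ? 1u : 0u);
    }
}

/* Host side: IDocument is the first member, so a HostBuffer * converts to
   an IDocument * and back. Fields after it can change freely. */
typedef struct {
    IDocument base;
    const char *text;
    unsigned *styles;
    size_t length;
} HostBuffer;

static size_t host_length(IDocument *self)
{
    return ((HostBuffer *)self)->length;
}
static char host_char_at(IDocument *self, size_t pos)
{
    return ((HostBuffer *)self)->text[pos];
}
static void host_set_style(IDocument *self, size_t pos, unsigned style)
{
    ((HostBuffer *)self)->styles[pos] = style;
}
static const IDocumentVtbl host_vtbl = { host_length, host_char_at, host_set_style };
```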


Andrei


Re: Notepad++

2009-08-12 Thread Sergey Gromov
Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:
> 
>> Scintilla uses plugins to highlight source.  These plugins are written
>> in C++ and have almost full access to the buffer so the highlighter code
>> may be arbitrarily complex.  I actually wrote such a plugin to highlight
>> D a while back:
>> 
>> http://dsource.org/projects/scrapple/browser/trunk/scilexer
> 
> "1.  If you have SciTE 1.76 for Windows installed simply replace
> SciLexer.dll and d.properties with the supplied files.
> 
> 2.  If you wish to build Scintilla from source:"
> 
> Can it be used in Scintilla-based editors besides SciTE short of 
> acquiring the whole Scintilla source and rebuilding it?

There are two problems at least:

1.  SciLexer.dll contains *all* of the built-in lexer modules.
Replacing your DLL with another version will remove any extra lexers
which a third party put there, like the XML-configurable lexer in the
case of Notepad++.

2.  Lexers are written in C++ and interface with the rest of Scintilla
via C++ classes.  Therefore if a field is added or removed anywhere, or
if you use a different compiler to build your DLL than that used to
build Scintilla, you'll get GPF, or worse.

The good news is that Notepad++ is on SourceForge, so the "from source"
way is at least possible.

> For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
> d.properties to be found.  The SciLexer.dll reports itself as file 
> version 1.7.8.0, product version 1.78.  So maybe the question is of what 
> effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
> versions correspond directly to Scintilla versions?)

Yes, SciTE versions seem to be in sync with Scintilla versions.

>> It seems like Notepad++ developers added their own highlighter plugin
>> which takes userDefineLang.xml as its configuration.  Such a
>> configurable plugin is presumably much less flexible than a pure C++
>> implementation for a particular language.  It's very likely that the PHP
>> highlighter is written in C++ and comes bundled with Scintilla.
> 
> It puzzles me that they didn't make this plugin powerful enough to 
> highlight the language it (and indeed the whole of Notepad++) is written 
> in.  Even more so considering the sheer number of C-like languages out 
> there, which people are likely to want to use N++ to write.

Well, I think it's hard to create a regular expression engine flexible
enough to allow arbitrary highlighting.  I think the best such engine
I've seen was Colorer by Igor Russkih, and even there I wasn't able to
express D's WYSIWYG or delimited strings.  You need a real programming
language for that.
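As an illustration of why a hand-written scanner helps here, this is a simplified C++ sketch of scanning a D delimited string literal. It handles only the nesting bracket forms such as q"(a(b)c)" and the identifier form q"EOS ... EOS"; other delimiter characters are left out, and the function name is illustrative. A classical regular expression cannot count nesting depth, and matching a terminator chosen by the string itself needs backreferences at the very least.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the index one past the end of a D delimited string literal that
// starts at 'pos' (the 'q' of q"..."), or std::string::npos if the literal
// is unterminated or malformed. Simplified: only the nesting bracket forms
// and the identifier (heredoc-like) form are handled.
std::size_t scanDelimitedString(const std::string& s, std::size_t pos) {
    assert(pos + 1 < s.size() && s[pos] == 'q' && s[pos + 1] == '"');
    std::size_t i = pos + 2;
    if (i >= s.size()) return std::string::npos;

    const std::string opens = "([{<";
    const std::string closes = ")]}>";
    std::size_t kind = opens.find(s[i]);
    if (kind != std::string::npos) {
        // Nesting form, e.g. q"(a(b)c)": track the bracket depth.
        const char openCh = opens[kind];
        const char closeCh = closes[kind];
        int depth = 0;
        for (; i < s.size(); ++i) {
            if (s[i] == openCh) {
                ++depth;
            } else if (s[i] == closeCh && --depth == 0) {
                // The outermost closing bracket must be followed by '"'.
                if (i + 1 < s.size() && s[i + 1] == '"') return i + 2;
                return std::string::npos;
            }
        }
        return std::string::npos;
    }

    if (std::isalpha(static_cast<unsigned char>(s[i])) || s[i] == '_') {
        // Identifier form: q"EOS, newline, body, then a line starting EOS".
        const std::size_t idStart = i;
        while (i < s.size() &&
               (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_')) {
            ++i;
        }
        if (i >= s.size() || s[i] != '\n') return std::string::npos;
        const std::string terminator = "\n" + s.substr(idStart, i - idStart) + "\"";
        const std::size_t end = s.find(terminator, i);
        return end == std::string::npos ? std::string::npos : end + terminator.size();
    }

    return std::string::npos;  // other delimiter characters: not handled here
}
```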

---

I've just had a look at Notepad++ sources.  The Scintilla they use
contains Scintilla's built-in D lexer.  I think it's just not
configured.  SciTE uses *.properties files to configure stuff.
Notepad++ uses XML files for the same purpose.  I think it's all in
langs.model.xml.  My current idea is to take d.properties from the
corresponding release of SciTE and try to translate it into the
langs.model.xml format.  I'll probably try it later when I have time.

Of course it would be nice to replace the original D lexer with mine.
Or, even better, to ask the Scintilla developers to include my lexer in
the official bundle.  May be worth a try.


Re: Notepad++

2009-08-12 Thread Stewart Gordon

Sergey Gromov wrote:

> Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:
>
> Scintilla uses plugins to highlight source.  These plugins are written
> in C++ and have almost full access to the buffer so the highlighter code
> may be arbitrarily complex.  I actually wrote such a plugin to highlight
> D a while back:
>
> http://dsource.org/projects/scrapple/browser/trunk/scilexer

"1.  If you have SciTE 1.76 for Windows installed simply replace
SciLexer.dll and d.properties with the supplied files.

2.  If you wish to build Scintilla from source:"

Can it be used in Scintilla-based editors besides SciTE short of
acquiring the whole Scintilla source and rebuilding it?

For the record, there's a SciLexer.dll in my Notepad++ dir, but no
d.properties to be found.  The SciLexer.dll reports itself as file
version 1.7.8.0, product version 1.78.  So maybe the question is of what
effect replacing it with a fork of version 1.76 would have.  (Do SciTE
versions correspond directly to Scintilla versions?)

> It seems like Notepad++ developers added their own highlighter plugin
> which takes userDefineLang.xml as its configuration.  Such a
> configurable plugin is presumably much less flexible than a pure C++
> implementation for a particular language.  It's very likely that the PHP
> highlighter is written in C++ and comes bundled with Scintilla.

It puzzles me that they didn't make this plugin powerful enough to
highlight the language it (and indeed the whole of Notepad++) is written
in.  Even more so considering the sheer number of C-like languages out
there, which people are likely to want to use N++ to write.

Stewart.


Re: Notepad++

2009-08-12 Thread Jussi Jumppanen
Stewart Gordon Wrote:

> Or maybe I should just go back to TextPad (which isn't perfect 
> either) and put up with its not supporting Unicode.

FWIW Zeus is very similar to TextPad in feature set and the latest 
version also adds support for Unicode/UTF8.

   http://www.zeusedit.com/

It will do D syntax highlighting and code folding out of the box.

It also comes with a version of ctags.exe made with these 
changes specifically for the D language:

   http://www.zeusedit.com/z300/ctags_src.zip

meaning it can produce tag information for your D source files.

NOTE: Zeus, like TextPad, is shareware.

Jussi Jumppanen
Author: Zeus for Windows



Re: Notepad++

2009-08-12 Thread Sergey Gromov
Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

> What's the best anybody's managed to get Notepad++ to syntax-highlight 
> D?  (I'm on version 5.4.5, if that makes a difference.)
> 
> My userDefineLang.xml file is as given here
> http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
> (note that I've fixed a few errors; I've no idea how they got there).
> 
> Notepad++ does a good job of syntax-highlighting PHP files, whose 
> syntactic structure is more complex than that of D.  So clearly, 
> Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever).
> However, at the moment I can't even seem to get it up to C standard!
> (Can anybody find a full reference of the userDefineLang.xml format, for 
> that matter?)

Scintilla uses plugins to highlight source.  These plugins are written
in C++ and have almost full access to the buffer so the highlighter code
may be arbitrarily complex.  I actually wrote such a plugin to highlight
D a while back:

http://dsource.org/projects/scrapple/browser/trunk/scilexer

It seems like Notepad++ developers added their own highlighter plugin
which takes userDefineLang.xml as its configuration.  Such a
configurable plugin is presumably much less flexible than a pure C++
implementation for a particular language.  It's very likely that the PHP
highlighter is written in C++ and comes bundled with Scintilla.
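Because such a plugin is ordinary C++, it can also carry state from one line to the next. Below is a hedged sketch of a line-at-a-time lexer for D's nesting /+ +/ comments. It uses plain functions and made-up style values rather than Scintilla's actual LexerModule/Accessor API. It returns the nesting depth at the end of each line, which a host could store per line so that re-lexing can restart at any line. For brevity it ignores string literals and other comment kinds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Made-up style numbers for this sketch (not Scintilla's SCE_D_* values).
enum Style : unsigned char { Default = 0, NestedComment = 1 };

// Styles one line of D source. 'depth' is the /+ +/ nesting depth carried
// over from the previous line; the return value is the depth at the end of
// this line, i.e. the per-line state a host would store.
int styleLine(const std::string& line, int depth, std::vector<unsigned char>& styles) {
    assert(depth >= 0);
    styles.assign(line.size(), Default);
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line.compare(i, 2, "/+") == 0) {
            ++depth;                                   // opener, even inside a comment
            styles[i] = styles[i + 1] = NestedComment;
            ++i;                                       // skip the '+'
        } else if (depth > 0 && line.compare(i, 2, "+/") == 0) {
            --depth;                                   // closes one level only
            styles[i] = styles[i + 1] = NestedComment;
            ++i;                                       // skip the '/'
        } else if (depth > 0) {
            styles[i] = NestedComment;
        }
    }
    return depth;
}

// Styles a whole document, recording one integer of state per line.
std::vector<int> styleDocument(const std::vector<std::string>& lines,
                               std::vector<std::vector<unsigned char>>& styles) {
    std::vector<int> lineState(lines.size());
    styles.resize(lines.size());
    int depth = 0;
    for (std::size_t n = 0; n < lines.size(); ++n) {
        depth = styleLine(lines[n], depth, styles[n]);
        lineState[n] = depth;
    }
    return lineState;
}
```

To restyle after an edit on line n, a host would call styleLine starting from lineState[n - 1] and could stop once a line's new state matches the one it had before.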


Notepad++

2009-08-12 Thread Stewart Gordon
What's the best anybody's managed to get Notepad++ to syntax-highlight 
D?  (I'm on version 5.4.5, if that makes a difference.)


My userDefineLang.xml file is as given here
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
(note that I've fixed a few errors; I've no idea how they got there).

Notepad++ does a good job of syntax-highlighting PHP files, whose 
syntactic structure is more complex than that of D.  So clearly, 
Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever).
However, at the moment I can't even seem to get it up to C standard!
(Can anybody find a full reference of the userDefineLang.xml format, for 
that matter?)


Maybe it's just a case in point of some comments here:
http://d.puremagic.com/issues/show_bug.cgi?id=3193

Anyway, attached is the result.  Can anybody do better (other than by 
telling it to treat D as C or some other language instead)?


Or maybe I should just go back to TextPad (which isn't perfect either) 
and put up with its not supporting Unicode.


Stewart.