[scintilla] Recent updates to the Perl lexer

KHMan Sun, 29 Jul 2007 22:28:10 -0700

[my first attempt at posting this was lost, so I apologize if anyone received a 
duplicate; I checked the scintilla list archives and found no sign of the posting]


Hi all,

A number of updates have been made recently to the Perl lexer. The changes are 
in CVS.

(a) Because of the way file test operators are parsed, a minus-prefixed bareword 
can have the minus included as part of the bareword. The minus causes problems 
when disambiguating between barewords and quote-like delimiters, because LexPerl 
scans backward. The fix marks a minus-prefixed bareword as a single unit if the 
previous significant element is a keyword or an operator. This mostly fixes 
cases where something like '-x' or '-y' is used as a hash key; it can still be 
improved.
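A toy Python model of the rule in (a), assuming a simple forward tokenizer (the 
real LexPerl is C++ and scans backward). The token kinds, the keyword subset and 
the 'tokenize' helper are hypothetical; the sketch only shows that '-ident' is 
merged into one bareword when the previous significant token is a keyword or an 
operator.

```python
import re

# Illustrative keyword subset (assumption); LexPerl has its own keyword lists.
KEYWORDS = {"print", "return", "my", "if", "unless"}
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def classify(text):
    """Classify a single token as keyword, bareword or operator."""
    if text in KEYWORDS:
        return "keyword"
    if IDENT_RE.fullmatch(text):
        return "bareword"
    return "operator"  # any other non-space character, sigils included


def tokenize(src):
    """Return significant (kind, text) tokens, merging '-ident' into a single
    bareword when the previous significant token is a keyword or operator."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)  # always matches: \s+ or \S covers any char
        text, pos = m.group(), m.end()
        if text.isspace():
            continue  # whitespace is not significant
        ident = IDENT_RE.match(src, pos) if text == "-" else None
        if ident:
            # Start of input is treated like an operator (assumption).
            prev_kind = tokens[-1][0] if tokens else "operator"
            if prev_kind in ("keyword", "operator"):
                tokens.append(("bareword", "-" + ident.group()))
                pos = ident.end()
                continue
        tokens.append((classify(text), text))
    return tokens
```

With this rule '$h{-x}' yields the single bareword '-x' (previous token '{'), 
while 'foo -y' keeps '-' as an operator because 'foo' is a bareword.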

(b) Permissive underscoring is now allowed in number literals and vector/version 
strings. Underscores can be inserted into a number literal in a fairly generic 
way, and LexPerl now lexes such cases correctly, apart from some odd corner 
cases. Of course, most people would not write literals this way, but Perl's own 
lexer is quite liberal in this area.
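A minimal Python sketch of permissive underscore scanning for decimal literals 
only (hex, octal, binary and vector strings are left out). The rule encoded here, 
underscores anywhere after the first digit in the integer part, fraction and 
exponent, is an assumption for illustration rather than LexPerl's exact logic. 
The lookahead keeps '1..10' lexing as '1' followed by the range operator.

```python
import re

# Assumed permissive rule: after the first digit, underscores may appear
# anywhere among the digits of the integer part, fraction and exponent.
NUMBER_RE = re.compile(
    r"""
    \d[\d_]*                 # integer part, must start with a digit
    (?:\.(?!\.)[\d_]*)?      # optional fraction; '..' is the range operator
    (?:[eE][+-]?[\d_]+)?     # optional exponent
    """,
    re.VERBOSE,
)


def scan_number(src, pos=0):
    """Return the number literal text starting at pos, or None."""
    m = NUMBER_RE.match(src, pos)
    return m.group() if m else None
```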

(c) Added handling of ^D and ^Z as markers of the logical end of Perl code; 
they act like __END__ or __DATA__.
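perldata documents that ^D and ^Z, like __END__ and __DATA__, may mark the 
logical end of a script before the actual end of file. A simplified Python 
sketch, assuming the hypothetical helper 'split_logical_end' and ignoring 
strings, comments and heredocs (where these characters would not end the code):

```python
# ^D (EOT, 0x04) and ^Z (SUB, 0x1A) mark the logical end of Perl code.
EOF_CHARS = ("\x04", "\x1a")


def split_logical_end(src):
    """Split src into (code, trailing) at the first ^D or ^Z.
    Simplification: does not skip strings, comments or heredocs."""
    hits = [i for i in (src.find(c) for c in EOF_CHARS) if i != -1]
    cut = min(hits) if hits else len(src)
    return src[:cut], src[cut:]
```

A lexer would style 'code' normally and treat 'trailing' the way it treats the 
text after __END__.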

(d) Added support for subroutine prototypes in LexPerl, as style 40. SciTE's 
perl.properties has also been updated. The number of style bits is now 8.
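A rough Python sketch of recognizing a prototype such as 'sub mymax(\@;$)'. The 
accepted character set and the 'find_prototype' helper are assumptions for 
illustration, not LexPerl's code; if the parenthesised text contains anything 
else (e.g. '$x, $y'), it is not treated as a prototype. Scintilla reserves 
styles 32 to 39 for predefined styles, which is presumably why the new styles 
start at 40; style numbers above 31 no longer fit in 5 style bits.

```python
import re

# Characters assumed valid inside a Perl prototype, plus blanks.
PROTO_CHARS = set("$@%&*;\\[]+_ \t")

# 'sub NAME (' with an optional package-qualified name such as Foo::bar.
SUB_RE = re.compile(r"sub\s+([A-Za-z_]\w*(?:::\w+)*)\s*\(")


def find_prototype(line):
    """Return the prototype text (including parens) of a 'sub NAME (...)'
    declaration, or None if it is absent or contains non-prototype chars."""
    m = SUB_RE.match(line)
    if not m:
        return None
    start = m.end() - 1  # index of '('
    end = line.find(")", start)
    if end == -1:
        return None
    body = line[start + 1:end]
    if all(ch in PROTO_CHARS for ch in body):
        return line[start:end + 1]
    return None
```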

(e) Added basic support for formats (format blocks) as styles 41 and 42: one for 
the identifier up to the equal sign, and the other for the format body. SciTE's 
perl.properties has also been updated. No syntax is recognized within the format 
body, only the terminating '.', which must be alone on a line of its own. This 
implementation requires the "format <id> =" elements to be on one line, but I 
don't think this will be a problem.
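A minimal Python sketch of the block structure described in (e), assuming the 
hypothetical helper 'split_format' and a list of source lines. The header must 
be a single 'format NAME =' line (NAME may be omitted), and the body runs until 
a line consisting of a lone '.'; nothing inside the body is parsed.

```python
import re

# One-line header 'format NAME ='; NAME is optional (Perl defaults to STDOUT).
FORMAT_HEAD_RE = re.compile(r"\s*format\b\s*([A-Za-z_][\w:]*)?\s*=\s*$")


def split_format(lines, i):
    """If lines[i] is a format header, return (header, body_lines, next_index),
    where next_index points just past the terminating '.' line; else None."""
    if not FORMAT_HEAD_RE.match(lines[i]):
        return None
    body = []
    j = i + 1
    while j < len(lines) and lines[j].rstrip("\r\n") != ".":
        body.append(lines[j])  # body text is not lexed further
        j += 1
    next_index = j + 1 if j < len(lines) else j  # skip '.' if it was found
    return lines[i], body, next_index
```

In LexPerl terms, the header would get style 41 and the body lines style 42.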

(f) Improved disambiguation of a bareword or keyword followed by a '/', which 
may be either a division operator or the start of a regex. I've taken a close 
look at the behaviour of vim 7.0 and adopted its rules (or at least, rules 
inferred from observing its highlighting behaviour), which extend the existing 
LexPerl behaviour. This eliminates all '/' failure cases in my collection of 
real-world source code, leaving only artificial failure cases.
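To illustrate the kind of decision involved, here is a hypothetical Python 
heuristic. It is NOT the vim 7.0 rule set or LexPerl's logic; the keyword set 
and the spacing test are assumptions chosen only to show how context around 
'/' can tip the choice between division and regex.

```python
# Hypothetical heuristic, not the actual vim 7.0 or LexPerl rules.
# Keywords assumed to expect an expression next, so '/' starts a regex.
REGEX_AFTER = {"split", "grep", "map", "and", "or", "not", "if", "unless",
               "while", "until", "return", "push", "unshift"}


def slash_is_regex(prev_word, space_before, space_after):
    """Guess whether a '/' following prev_word starts a regex."""
    if prev_word in REGEX_AFTER:
        return True
    # 'foo /bar/' looks like a call taking a regex; 'foo / 2' and 'foo/2'
    # look like division.
    return space_before and not space_after
```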

While updating the Perl lexer, I also maintain a set of Perl test cases, gleaned 
from the documentation and from testing code snippets with recent versions of 
Cygwin Perl. The URLs of the latest test files I use are below, if anyone wants 
to check or verify anything. Please see the sections marked 20070713, 20070714 
and 20070715.

General tests: http://www.geocities.com/keinhong/scite/perl-test-cases.pl
UTF-8 tests: http://www.geocities.com/keinhong/scite/perl-test-cases-utf-8.pl

That's all from me for LexPerl for the time being. I don't think anything major 
is left unsupported; what remains are some spotty areas here and there, a few 
corner cases, and possible enhancements. I recently uncovered two minor failure 
cases, so some of the lexing can certainly be done better. If anyone wants to 
hack on it, be my guest.

Enjoy,
-- 
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
