[My first attempt at posting this was lost, so I apologize if anyone receives a duplicate; I checked the Scintilla list archives and found no sign of the posting.]
Hi all,

A number of updates have recently been made to the Perl lexer. The changes are in CVS.

(a) Because of the way file test operators are parsed, a minus-prefixed bareword can include the minus as part of the bareword. The minus causes problems when disambiguating between barewords and quote-like delimiters, because of the backward scanning LexPerl uses. The fix marks a minus-prefixed bareword as a single unit if the previous significant element is a keyword or an operator. This mostly fixes cases where something like '-x' or '-y' is used as a hash key; it can still be improved.

(b) Permissive underscoring is now allowed in number literals and vector/version strings. Underscores can be inserted into a number literal in a fairly generic way, and LexPerl now lexes such cases correctly, except for some weird corner cases. Of course, most people would not write literals this way, but actual Perl lexing seems to be quite liberal in this area.

(c) Added handling of ^D and ^Z as indicators of the logical end of Perl code; they act like __END__ or __DATA__.

(d) Support for subroutine prototypes in LexPerl, added as style 40. SciTE's perl.properties is also updated. Style bits are now 8.

(e) Basic support for formats (format blocks). Added styles 41 and 42: one for the identifier up to the equal sign, and the other for the format body. SciTE's perl.properties is also updated. No syntax is recognized within the format body; only the terminating '.' is detected, and it must be alone on a line of its own. This implementation requires the "format <id> =" elements to be on one line, but I don't think this will be a problem.

(f) Disambiguation of a bareword or keyword followed by a '/'. I've taken a close look at the behaviour of vim 7.0 and adopted its rules (or at least, rules based on observation of its highlighting behaviour), which extend the current LexPerl behaviour. This eliminates all '/' failure cases in my collection of actual source code, leaving only artificial failure cases.
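The rule in (a) can be illustrated with a minimal Python sketch. It is not LexPerl's C++ code; the function name and the token-kind labels ('keyword', 'operator', 'identifier') are hypothetical. The motivation is that 'y' and 's' are also quote-like operators (y/// and s///), so a lexer that sees '-' and 'y' separately in $h{-y} might take the following '}' as a delimiter. The sketch only groups '-' with the bareword when the previous significant element is a keyword or an operator.

```python
import re

# A '-' immediately followed by a bareword, e.g. -x, -y, -bareword.
MINUS_BAREWORD = re.compile(r"-[A-Za-z_]\w*")


def minus_bareword_at(text: str, pos: int, prev_kind: str):
    """Return the '-bareword' token starting at pos, or None.

    prev_kind is the (hypothetical) kind of the previous significant
    element: 'keyword', 'operator', 'identifier', 'number', ...
    """
    if prev_kind not in ("keyword", "operator"):
        # After a variable, number or other identifier, '-' is more
        # likely a binary minus, so it is not glued to the bareword.
        return None
    m = MINUS_BAREWORD.match(text, pos)
    return m.group(0) if m else None
```

For example, minus_bareword_at("$h{-y}", 3, "operator") returns "-y", so the 'y' is never considered as the start of a y/// construct.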
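For (b), here is a rough Python sketch of permissive underscore handling. The patterns are my own approximation, not LexPerl's code or Perl's exact grammar: underscores are accepted anywhere after the first digit, hex and binary literals are included, and a '.' is taken as a decimal point only when it is not followed by another '.', so the range operator in 1..10 is left alone. The function name scan_number is hypothetical, and the v-string pattern assumes the caller has already decided that a leading 'v' starts a version string rather than an identifier.

```python
import re

# Approximate, deliberately permissive patterns; tried in order, first match wins.
_NUMBER_PATTERNS = [
    re.compile(r"0[xX][0-9a-fA-F_]+"),            # hex, e.g. 0xff_ff
    re.compile(r"0[bB][01_]+"),                   # binary, e.g. 0b1010_0101
    re.compile(r"v\d[\d_]*(?:\.\d[\d_]*)*"),      # v-string, e.g. v5.8_1
    re.compile(r"\d[\d_]*(?:\.\d[\d_]*){2,}"),    # version string, e.g. 5.8.8
    # Decimal: '.' counts as a decimal point only if not followed by '.',
    # so the range operator in 1..10 is left for the operator lexer.
    re.compile(r"\d[\d_]*(?:\.(?!\.)[\d_]*)?(?:[eE][+-]?[\d_]+)?"),
]


def scan_number(text: str, pos: int) -> int:
    """Return the end offset of a numeric literal at pos, or pos if none."""
    for pattern in _NUMBER_PATTERNS:
        m = pattern.match(text, pos)
        if m:
            return m.end()
    return pos
```

For example, scan_number("1_000_000;", 0) returns 9, and scan_number("1..10", 0) returns 1, leaving '..' to be lexed as an operator.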
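The end-of-code rule in (c) can be sketched in Python as below. This is a simplification: it assumes every character is in code context (a real lexer would ignore ^D and ^Z inside strings, here-docs and POD), and it only recognizes __END__ and __DATA__ when they occupy a whole line. The function name logical_end is hypothetical.

```python
# ^D (Control-D) and ^Z (Control-Z) end the code like __END__/__DATA__.
END_CHARS = {"\x04", "\x1a"}
END_MARKERS = ("__END__", "__DATA__")


def logical_end(source: str) -> int:
    """Return the offset where the Perl code logically ends (simplified)."""
    offset = 0
    for line in source.split("\n"):
        # Whole-line __END__ / __DATA__ marker (a trailing '\r' is tolerated).
        if line.rstrip("\r") in END_MARKERS:
            return offset
        # First ^D or ^Z anywhere on the line.
        for i, ch in enumerate(line):
            if ch in END_CHARS:
                return offset + i
        offset += len(line) + 1  # +1 for the '\n' removed by split
    return len(source)
```

Everything from the returned offset onward would then be styled as data rather than code.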
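For (d), a small Python sketch of spotting a prototype after 'sub NAME' or an anonymous 'sub'. The allowed character set ($ @ % & * ; \ [ ] + _ and whitespace) is an assumption based on common prototype syntax, not taken from LexPerl, and find_prototype is a hypothetical name. A parenthesized list containing other characters, such as ($x, $y), is rejected, and a variable named $sub is not mistaken for the keyword.

```python
import re

# Characters assumed to be valid inside a prototype: $ @ % & * ; \ [ ] + _
# and whitespace.
_PROTO_CHARS = r"[\s$@%&*;\\\[\]+_]"

# 'sub' (not part of a longer word or a variable like $sub), an optional
# possibly package-qualified name, then a parenthesized prototype.
SUB_PROTOTYPE = re.compile(
    r"(?<![\w$@%&])sub(?:\s+[A-Za-z_]\w*(?:::\w+)*)?\s*(\("
    + _PROTO_CHARS
    + r"*\))"
)


def find_prototype(line: str):
    """Return the prototype text, e.g. '($$;@)', or None if there is none."""
    m = SUB_PROTOTYPE.search(line)
    return m.group(1) if m else None
```

The span of group 1 is what a lexer would paint with the prototype style.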
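Item (e) can be approximated with the Python sketch below, which splits source text into format names and bodies. As described in (e), it assumes "format <id> =" is on a single line and that the body ends at a line containing only '.'; nothing inside the body is parsed. The function name and its return shape are hypothetical.

```python
import re

# "format NAME =" on a single line; NAME may contain '::' and may be empty.
FORMAT_START = re.compile(r"^\s*format\b\s*([\w:]*)\s*=\s*$")


def find_formats(source: str):
    """Return a list of (name, body_lines) pairs for each format block."""
    lines = source.split("\n")
    results = []
    i = 0
    while i < len(lines):
        m = FORMAT_START.match(lines[i])
        if m:
            body = []
            i += 1
            # The body runs up to a line holding only '.'; it is not parsed.
            while i < len(lines) and lines[i].rstrip("\r") != ".":
                body.append(lines[i])
                i += 1
            results.append((m.group(1), body))
        i += 1
    return results
```

In lexer terms, the header line up to '=' corresponds to the identifier style and the collected body lines to the format-body style.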
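The rules adopted from vim 7.0 in (f) are not spelled out here, so the Python sketch below is only a hypothetical illustration of the general problem, not those rules. After a keyword that usually expects a term (the keyword set is an assumption), '/' is taken to start a regex. After any other bareword, a '/' followed by whitespace or end of input is taken as division, and anything else as a regex.

```python
# Hypothetical set of keywords after which a term (not an operator) is
# expected, so a following '/' most likely opens a regex.
TERM_EXPECTING = {"split", "grep", "map", "if", "unless", "while",
                  "until", "and", "or", "not", "return"}


def slash_starts_regex(prev_word: str, after_slash: str) -> bool:
    """Guess whether a '/' after the bareword prev_word starts a regex."""
    if prev_word in TERM_EXPECTING:
        return True
    # Any other bareword (a constant, a sub name, ...): a '/' followed by
    # whitespace or end of input is treated as division.
    nxt = after_slash[:1]
    return nxt != "" and not nxt.isspace()
```

A spacing heuristic like this is easy to fool, which is why observed editor behaviour is a useful guide for the remaining cases.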
While updating the Perl lexer, I also keep track of some Perl test cases, gleaned from the documentation and from testing code snippets with recent versions of Cygwin Perl. The URLs of the latest test files I use are below, if anyone wants to check or verify anything. Please see the sections marked 20070713, 20070714 and 20070715.

General tests: http://www.geocities.com/keinhong/scite/perl-test-cases.pl
UTF-8 tests: http://www.geocities.com/keinhong/scite/perl-test-cases-utf-8.pl

That's all from me for LexPerl for the time being. I don't think anything major is left unsupported, only some spotty areas here and there, some corner cases, and possible enhancements. I uncovered two minor failure cases recently, so I do think some of the lexing can be done better; if anyone wants to hack on it, be my guest.

Enjoy,

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

_______________________________________________
Scintilla-interest mailing list
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
