Rev 5334 is a first draft of a fix of bug 879338:
Global tables in leoApp.py should describe all languages known to the
colorizer
https://bugs.launchpad.net/leo-editor/+bug/879338
I believe the new code is safe, and all unit tests pass, but this is
one bug for which the phrase "you have to break some eggs to make an
omelet" applies. Please report any problems immediately.
The essence of the bug fix is that Leo's language-description tables
should contain entries for all .py files in the leo/modes folder.
These files control the colorizer. If Leo's colorizer knows about a
language, then Leo should know as much as possible about the language.
In concept, this is a fairly straightforward process, but there were
*many* details to handle.
If you aren't a Leo developer, you might want to stop reading now...
===== Tables
Fixing this bug required non-trivial changes to the following tables::
g.app.language_delims_dict
# Keys are languages, values are tuples of delims.
g.app.language_extension_dict
# Keys are language names, values are extensions.
g.app.extension_dict
# Keys are extensions, values are language names.
I used scripts to generate new entries for these tables, but these
scripts could not possibly deal with all the complications. There
is a unit test that tests the consistency of these tables, and this
test failed a few times. It now passes.
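For the curious, here is a rough sketch of the kind of consistency check
I mean. It is *not* the actual unit test: the function name, the sample
data and the exact rules are assumptions made up for illustration. It
assumes plain dicts shaped like the three tables above.

```python
def check_tables(delims, lang_to_ext, ext_to_lang):
    """Return a list of inconsistencies between the three tables.

    delims:      language -> tuple of comment delims
    lang_to_ext: language -> preferred extension
    ext_to_lang: extension -> language
    All names and rules here are illustrative, not Leo's real test.
    """
    problems = []
    for lang in sorted(lang_to_ext):
        ext = lang_to_ext[lang]
        if lang not in delims:
            problems.append(f"{lang}: extension but no delims")
        # The preferred extension should map back to the same language.
        if ext_to_lang.get(ext) != lang:
            problems.append(f"{lang}: extension {ext!r} does not map back")
    for lang in sorted(delims):
        if lang not in lang_to_ext:
            problems.append(f"{lang}: delims but no extension")
    for ext in sorted(ext_to_lang):
        if ext_to_lang[ext] not in lang_to_ext:
            problems.append(f"{ext}: maps to unknown language {ext_to_lang[ext]!r}")
    return problems
```

An empty result means the three dicts agree with each other. Several
extensions may map to one language (say .c and .h), which is why the
check only requires that each language's preferred extension maps back.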
Leo uses these tables as follows:
1. To generate the comment delimiters in sentinels for each language.
Happily, getting the comment delimiters correct was probably the
easiest part, so Leo should continue to write sentinels properly for
previously-known languages. However, I had to take care to preserve
the REM, CWEB, forth and perlpod hacks, so that comment delims would
include the necessary spaces.
2. To associate file extensions with importers.
Knowing about new file extensions doesn't actually allow Leo to import
any new languages. For all languages without an official importer Leo
will simply copy the entire text of the file into a single node, as it
always has.
3. To colorize code.
Leo's colorizer mostly doesn't use these tables: to colorize language
x, the colorizer looks for the file leo/modes/x.py. Thus, these
changes probably do not affect the colorizer at all.
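Since the colorizer finds a language by looking for leo/modes/x.py, the
set of languages the tables ought to cover can be read straight off that
folder. The following is only a sketch of that idea, not the script I
actually used: the helper names and the colorizer_only argument are
hypothetical.

```python
from pathlib import Path

def languages_from_modes(modes_dir):
    """Return the language names implied by the *.py files in modes_dir."""
    return {p.stem for p in Path(modes_dir).glob("*.py") if p.stem != "__init__"}

def missing_from_tables(modes_dir, known_languages, colorizer_only=()):
    """Return, sorted, the mode-file languages that have no table entry.

    known_languages: the keys of one of the language tables.
    colorizer_only:  mode names deliberately left out of the tables.
    """
    found = languages_from_modes(modes_dir)
    return sorted(found - set(known_languages) - set(colorizer_only))
```

Calling missing_from_tables on the leo/modes folder with the keys of
g.app.language_delims_dict would list the languages that still need
entries.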
===== Special cases
I did a lot of googling in order to determine the proper file
extensions to use for various languages. In the process, I learned
that *almost* all languages described in the leo/modes folder are
real, interesting and useful languages.
However, there are at least 5 categories of special cases that affect the
tables:
1. Languages that are really just colorizer modes:
These include embperl, pseudoplain and phpsection. We need leo/modes
files for these, but they aren't real languages and thus they should
not appear in the language-description tables.
2. Things that might be colorized but aren't real languages.
Afaik, the following are not real languages, and Leo would never have
to generate files in these languages::
cvs_commit
dsssl
relax_ng_compact: An xml schema.
rtf
svn_commit
In particular, the rtf colorizer is *not* a colorizer for the binary .rtf
file format; it is a colorizer for .rtf sources. It probably won't do
too much harm to retain the colorizer data for these languages, but I
wouldn't mind eliminating them either.
3. Unknown languages.
A few languages seem not really to exist::
freemarker
hex
jcl
moin
progress
props
sas
I'll consider retaining the mode files for these languages only if
somebody can explain what these languages are.
4. Languages without real comment delimiters.
Patch annotations are *not* real comment delimiters, so Leo could not
generate patch (.fix or .patch) files from an outline. Happily, there
is no need to do so.
5. Conflicting file extensions.
There are two separate kinds of problems:
A. Leo contains colorizers for several assembly languages. Typically,
assembly languages have .asm or .a file extensions. However, a
particular extension can only be associated with a single language
name. Thus, Leo has no way of knowing what language to associate
with .asm or .a files. So I just punted and didn't make any
association at all.
B. Both the rebol and r languages use the .r file extension. One of
Leo's users previously created an entry for rebol, so that's the
language that takes precedence.
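To make the two cases concrete, here is a small sketch of how an
extension-to-language table could be built while handling such
conflicts. This is an illustration under my own assumptions, not the
code in leoApp.py: the function name, the precedence argument and the
assembly mode names in the usage note are all hypothetical.

```python
from collections import defaultdict

def invert_extensions(candidates, precedence=None):
    """Build an extension -> language dict from (language, extension) pairs.

    precedence: optional dict, extension -> language that wins a conflict.
    Returns (ext_to_lang, conflicts). Extensions claimed by several
    languages with no usable precedence entry get no association at all
    and are reported in conflicts instead.
    """
    precedence = precedence or {}
    by_ext = defaultdict(list)
    for lang, ext in candidates:
        by_ext[ext].append(lang)
    ext_to_lang, conflicts = {}, {}
    for ext, langs in by_ext.items():
        if len(langs) == 1:
            ext_to_lang[ext] = langs[0]
        elif precedence.get(ext) in langs:
            ext_to_lang[ext] = precedence[ext]   # case B: an existing entry wins
        else:
            conflicts[ext] = sorted(langs)       # case A: punt, no association
    return ext_to_lang, conflicts
```

With the pairs (rebol, r), (r, r), (assembly_x86, asm) and
(assembly_mips, asm), and precedence {"r": "rebol"}, the .r extension
goes to rebol while .asm ends up in conflicts with no association.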
So that's it. If you know more about any of these special cases I'd
like to hear about it.
Edward