Re: Why is mc since 4.6 shipped with a stripped down html.syntax?

Peter A. Kerzum Sat, 26 May 2007 13:38:21 -0700

Hi!

Let me offer my humble opinion =)
I have used mc for several years, and over the last several months I got deep
into the jungle of modern HTML. What you said about HTML seems wrong to me, so
I felt an urge to join your discussion (please see below).

> On Wed, 2007-05-16 at 16:44 +0200, Michelle Konzack wrote:
> > while I have been using mc for 8 years and it was working fine for editing
> > HTML files, I would like to know WHY mc since 4.6 is shipped with a stripped
> > down html-syntax.

> The reason for the removal of the uppercase tags was because the new
> XHTML standard required lowercase tags.  

Well, XHTML brings order, yet plain old HTML is still widely used.

> Since mc still had no support 
> for case-insensitive match, I decided that mc should do the highlighting
> based on the syntax alone without checking the spelling of the tags.

Yes, simplicity is great! Fewer features are much better than incorrect
behaviour. But why not use the XML highlighter then? It renders nicely and it
has supported single-quoted attribute values for a long time.

> Restoring uppercase HTML tags today would be ridiculous.  

yes

> Maybe using 
> lowercase tags would be better, 

no

> but I would prefer a fix that would 
> allow case-insensitive match in the syntax rules.

Yes, I believe this is the only way to get back to matching names; both tag
and attribute names should be case-insensitive.
Well, let's postpone tag/attr names for a future mc. The simple way you
mentioned earlier is
> based on the syntax alone without checking the spelling of the tags.

I prefer to call it the 'lexical layer' (leaving 'syntax' for its basic
meaning).

So, real-life HTML is in fact very different from what the W3C specs say.
It is actually shaped by the major browser vendors, all of which now process
broken HTML constructs in much the same way. It would be consistent for us to
support the same model in mc.
Well, I have never seen a 100% correct HTML highlighter; that's impossible
within mc's highlighting framework. But we can cover a number of frequent
cases with little effort.

Let me provide a rather raw html syntax file. It is based on yours, but
handles a few more cases:

- a tag starts with the sequence '<[[:alpha:]]', not just '<'
- strings can also be quoted with backticks
- a quoted string (inside a tag) can only start after whitespace or '=';
otherwise it's just a meaningless quote, not a string
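
To make the three rules above concrete outside of mc, here is a minimal Python
sketch. It is not how mc's syntax engine works; the names TAG_OPEN, QUOTED,
find_tags and quoted_strings are made up for this illustration, and letting a
quoted string hide a '>' from the end-of-tag search is my assumption about the
intended behaviour.

```python
import re

# Rule 1: a tag opens only on '<' followed by a letter; a bare '<' is text.
TAG_OPEN = re.compile(r"<[A-Za-z]")

# Rules 2 and 3: a string may be quoted with ', " or `, and it counts as a
# string only when the opening quote follows whitespace or '='.
QUOTED = re.compile(r"""(?<=[\s=])(['"`]).*?\1""", re.DOTALL)


def find_tags(text):
    """Return (start, end) spans of opening tags found in text."""
    spans = []
    pos = 0
    while True:
        m = TAG_OPEN.search(text, pos)
        if m is None:
            return spans
        i = m.end()
        # Walk to the closing '>', jumping over quoted strings so that a
        # '>' inside a string does not end the tag (assumed behaviour).
        while i < len(text) and text[i] != ">":
            q = QUOTED.match(text, i)
            i = q.end() if q else i + 1
        end = min(i + 1, len(text))
        spans.append((m.start(), end))
        pos = end


def quoted_strings(tag):
    """Return the quoted strings inside a single tag, quotes included."""
    return [m.group(0) for m in QUOTED.finditer(tag)]
```

For example, in 'a < b <a href="x>y" title=it\'s>link</a>' the bare '<' is
ignored, the tag runs up to the '>' after "it's", and the apostrophe in "it's"
is not treated as a string delimiter because it follows a letter. Closing tags
such as '</a>' are left to a separate rule, as in the syntax file.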

This highlighter is quite raw; I actually failed to highlight '>' and '='
correctly in some cases, and there are HTML issues it does not address at all.
I really don't understand this highlighting technology very well, I just like
mc and know a bit of HTML. I hope you will find this approach useful and hack
on this file, inspired by my ideas =)

--
Peter A. Kerzum

context default
        keyword &\[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\]; brightgreen
        keyword &#\{xX\}\[0123456789abcdefABCDEF\]; brightgreen
        keyword &#\[0123456789\]; brightgreen
        spellcheck

context <!-- --> brown
        spellcheck

context <! > brightred/orange
        spellcheck

context <\/ > brightcyan
        keyword /\[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\] white

context <\{abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\} > brightcyan
        keyword wholeleft <\[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\] white

        keyword \{\s\t\=\}'*' brightgreen
        keyword \{\s\t\=\}"*" green
        keyword \{\s\t\=\}`*` green

        wholechars [EMAIL PROTECTED]&*()+|\{}[]:;,.?    
        keyword wholeright \{\s\t\=\}\{abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\}\[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\] yellow
