Re: [whatwg] HTML5 named entity Gt; and Lt;

2012-02-13 Thread Ian Hickson
On Wed, 14 Dec 2011, Mike Samuel wrote:

 The table in section 12.5 (
 http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
 ) says
  GT;   U+0003E   >
  Gt;   U+0226B   ≫
  gt;   U+0003E   >
  GT    U+0003E   >
  gt    U+0003E   >
 
 which I believe means that GT;, gt;, GT, and gt all encode U+003E
  GREATER-THAN SIGN, but Gt; encodes U+226B MUCH GREATER-THAN.

Correct.
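The rows quoted above can be checked against Python's `html.entities.html5` dictionary, which maps HTML5 named character references to their replacement characters and is keyed with the trailing semicolon, listing legacy names both with and without it. A minimal sketch, assuming Python 3.4 or later:

```python
from html.entities import html5

# html5 is keyed by reference name; legacy names that the spec also accepts
# without a trailing ";" appear both with and without it.
for name in ("GT;", "gt;", "GT", "gt", "Gt;"):
    ch = html5[name]
    print(f"{name:<4} -> U+{ord(ch):04X} {ch}")

# "Gt" has no semicolon-less legacy form, unlike "GT" and "gt".
print("Gt" in html5)
```

The first four names print U+003E and `Gt;` prints U+226B, matching the table.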

 
 Similarly
 
  Lt;   U+0226A   ≪

Correct.


 This is a potential source of confusion for naive HTML entity decoders 
 that fall back to case-insensitive matching when there is no mapping for a 
 given entity name.

Such decoders are non-conforming.
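To make the failure mode concrete, here is a minimal sketch contrasting a hypothetical naive decoder (its `LEGACY` table and `naive_unescape` function are illustrative inventions, not any real library) with Python's `html.unescape`, which follows the HTML5 table case-sensitively. Assumes Python 3.4+:

```python
import html
import re

# Hypothetical HTML 4-style table with lowercase names only; no "Gt".
LEGACY = {"gt": ">", "lt": "<", "amp": "&", "quot": '"'}

def naive_unescape(text):
    """Non-conforming decoder: retries a failed lookup case-insensitively."""
    def repl(m):
        name = m.group(1)
        if name in LEGACY:
            return LEGACY[name]
        # The problematic fallback: "Gt" is silently treated as "gt".
        return LEGACY.get(name.lower(), m.group(0))
    return re.sub(r"&([A-Za-z]+);", repl, text)

sample = "a &Gt; b"
print(naive_unescape(sample))  # a > b
print(html.unescape(sample))   # a ≫ b
```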


 MathML already has other succinct mappings for U+226A (ll;) and U+226B 
 (gg;).  Could HTML5 avoid confusion by deprecating Lt; and Gt; in 
 favor of ll; and gg; or remove them entirely?

The mappings in the HTML standard are actually the MathML mappings. We 
literally use the same database they do to automatically generate the 
mapping in the spec.


On Wed, 14 Dec 2011, Ilhan Y. wrote:

 By the way, can we have Unicode names (HTML names) for Mercury, Sun, 
 Earth and other planets? They are used by many astronomers on the 
 internet.

The named character references used in HTML are just those provided to us 
by the MathML working group, so if you actually want a change here, I 
recommend contacting that group. In general, though, I doubt we will add
more names. It's gotten rather out of hand.


On Wed, 14 Dec 2011, Jukka K. Korpela wrote:
 
 After all, there is no rationale given for the inclusion of new “named 
 character references,” so people might see the idea as asking authors 
 to submit new proposals for every possible and impossible character.

The rationale is compatibility with deployed MathML content.

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

[whatwg] HTML5 named entity Gt; and Lt;

2011-12-14 Thread Mike Samuel
The table in section 12.5 (
http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
) says
 GT;   U+0003E   >
 Gt;   U+0226B   ≫
 gt;   U+0003E   >
 GT    U+0003E   >
 gt    U+0003E   >

which I believe means that GT;, gt;, GT, and gt all encode U+003E
GREATER-THAN SIGN, but Gt; encodes U+226B MUCH GREATER-THAN.

http://svn.whatwg.org/webapps/entities-unicode.inc includes these, but
entities-legacy.inc does not.

Similarly

 Lt;   U+0226A   ≪

This is a potential source of confusion for naive HTML entity decoders
that fall back to case-insensitive matching when there is no mapping for a
given entity name.

MathML already has other succinct mappings for U+226A (ll;) and
U+226B (gg;).  Could HTML5 avoid confusion by deprecating Lt; and
Gt; in favor of ll; and gg; or remove them entirely?
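A quick check, again assuming Python's `html.entities.html5` table, that the MathML alternatives `ll;` and `gg;` already expand to the same code points as `Lt;` and `Gt;`:

```python
from html.entities import html5

# Each pair names the same code point, so authors can already write the
# shorter MathML names and get identical characters.
for capitalized, alt in (("Lt;", "ll;"), ("Gt;", "gg;")):
    assert html5[capitalized] == html5[alt]
    print(capitalized, alt, f"U+{ord(html5[alt]):04X}")
```

This prints `Lt; ll; U+226A` and then `Gt; gg; U+226B`.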

http://www.google.com/codesearch#search/q=amp;Gt;%20file:.html$%20case:yestype=cs
shows four files using Gt;, two of which treat it as synonymous with gt;.


Re: [whatwg] HTML5 named entity Gt; and Lt;

2011-12-14 Thread Ilhan Y.
By the way, can we have Unicode names (HTML names) for Mercury, Sun,
Earth and other planets? They are used by many astronomers on the
internet.


On Wed, Dec 14, 2011 at 7:18 PM, Mike Samuel mikesam...@gmail.com wrote:
 [...]


Re: [whatwg] HTML5 named entity Gt; and Lt;

2011-12-14 Thread Jukka K. Korpela

2011-12-14 19:34, Ilhan Y. wrote:


By the way, can we have Unicode names (HTML names) for Mercury, Sun,
Earth and other planets? They are used by many astronomers on the
internet.


Nice parody! But maybe people won’t take it as parody.

After all, there is no rationale given for the inclusion of new “named 
character references,” so people might see the idea as asking authors to 
submit new proposals for every possible and impossible character.


The whole idea of extending the repertoire is wrong. We have lived with 
a certain set of entity references (now being renamed “named character 
references”), widely supported by browsers, except possibly in XHTML 
mode. Authors who need other characters can enter them as such, using 
UTF-8 (which is being favored, is it not?) or using numeric character 
references.
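The alternatives described here decode to the same character. A minimal sketch using Python's `html.unescape` (an assumption of this example; any conforming decoder should agree) showing that a named reference, decimal and hexadecimal numeric references, and the literal UTF-8 character all yield U+226B:

```python
import html

# Four spellings of U+226B MUCH GREATER-THAN: named reference,
# decimal reference, hex reference, and the literal character.
forms = ["&Gt;", "&#8811;", "&#x226B;", "\u226b"]
decoded = {html.unescape(f) for f in forms}
print(decoded)  # {'≫'}
```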


So nobody really needs any added pseudo-mnemonic “named references,” and 
they just cause incompatibility: pages fail on most browsers, when they 
would work perfectly if other methods of including characters had been used.


Allowing gt and GT and GT; as synonyms for gt; might be pragmatic, 
if there is sufficient evidence of their use on legacy pages, but code 
checkers should issue a warning (there is nothing to be gained by using 
such deviating forms). And adding things like Gt;, with a different 
meaning, is just asking for trouble.
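A checker along the lines suggested here could flag both the deviating spellings of `&gt;`/`&lt;` and the look-alike names. The sketch below is hypothetical: the rule sets, the `check` function, and its message format are illustrative only, and it deliberately ignores prefix matches such as `&gtx` that a real HTML parser would also decode.

```python
import re

# Illustrative rule sets (not taken from any real validator).
# Spellings that decode to ">" or "<" but deviate from "&gt;"/"&lt;":
DEVIANT = {"GT;", "GT", "gt", "LT;", "LT", "lt"}
# Names that look like "&gt;"/"&lt;" but decode to different characters:
LOOKALIKE = {"Gt;": "U+226B", "Lt;": "U+226A"}

# Simplified tokenizer: "&" + letters + optional ";".
REF = re.compile(r"&([A-Za-z]+;?)")

def check(text):
    """Return one warning string per questionable reference in text."""
    found = []
    for m in REF.finditer(text):
        name = m.group(1)
        if name in DEVIANT:
            canonical = name.lower().rstrip(";") + ";"
            found.append(f"{m.start()}: '&{name}' deviates from the usual '&{canonical}'")
        elif name in LOOKALIKE:
            found.append(f"{m.start()}: '&{name}' is {LOOKALIKE[name]}, not '&{name.lower()}'")
    return found

for warning in check("a &GT; b &gt; c &Gt; d"):
    print(warning)
```

Here the canonical `&gt;` passes silently, while `&GT;` and `&Gt;` each produce one warning.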


Yucca


Re: [whatwg] HTML5 named entity Gt; and Lt;

2011-12-14 Thread Anne van Kesteren
On Wed, 14 Dec 2011 19:40:04 +0100, Jukka K. Korpela jkorp...@cs.tut.fi  
wrote:
The whole idea of extending the repertoire is wrong. We have lived with  
a certain set of entity references (now being renamed “named character  
references”), widely supported by browsers, except possibly in XHTML  
mode. Authors who need other characters can enter them as such, using  
UTF-8 (which is being favored, is it not?) or using numeric character  
references.


Personally, I like named entities; I use them all the time to get the  
correct Unicode code point (e.g. data:text/html,&middot;). That is often  
faster than looking the character up somehow.
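Loading `data:text/html,&middot;` in a browser has an offline equivalent. A minimal sketch, assuming Python's `html.entities.html5` table; the `code_point` helper is hypothetical:

```python
from html.entities import html5

def code_point(name):
    """Return the U+XXXX code point(s) a named reference expands to.

    Some names expand to two code points, so all of them are joined.
    """
    key = name if name.endswith(";") else name + ";"
    return " ".join(f"U+{ord(c):04X}" for c in html5[key])

print(code_point("middot"))  # U+00B7
print(code_point("Gt"))      # U+226B
```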



--
Anne van Kesteren
http://annevankesteren.nl/