New submission from Daniel Lovell <lovell.danie...@gmail.com>:

html.entities.html5 keys should require a trailing semicolon. The Python 
docs say:

html.entities.html5
"A dictionary that maps HTML5 named character references [1] to the equivalent 
Unicode character(s), e.g. html5['gt;'] == '>'. Note that the trailing 
semicolon is included in the name (e.g. 'gt;'), however some of the names are 
accepted by the standard even without the semicolon: in this case the name is 
present with and without the ';'. See also html.unescape()."

https://docs.python.org/3/library/html.entities.html?highlight=html

However, it is not clear without looking at the source which keys require the 
semicolon and which do not. Looking at the source, the keys that require a 
trailing semicolon vastly outnumber those that do not.
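
To make the split concrete, here is a minimal sketch that partitions the keys 
by whether they end in ';'. The variable names are only for illustration, and 
I have left out the exact counts since they depend on the Python version:

```python
from html.entities import html5

# Partition the keys by whether they carry the trailing semicolon.
with_semicolon = [k for k in html5 if k.endswith(';')]
no_semicolon = sorted(k for k in html5 if not k.endswith(';'))

print(len(with_semicolon), "keys end in ';'")
print(len(no_semicolon), "keys are also accepted without ';'")
print(no_semicolon[:10])

# Per the docs, a name accepted without ';' is present in both forms,
# so each semicolon-less key should have a ';' twin with the same value.
twins_match = all(html5.get(k + ';') == html5[k] for k in no_semicolon)
print("every semicolon-less key has a matching ';' key:", twins_match)
```

Running this is currently the only way I found to see which names fall into 
the second group without reading the module source.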

For simplicity and consistency with the w3.org HTML5 Character Entity 
Reference Chart, I recommend that the trailing semicolon be required, as it is 
in HTML5: https://dev.w3.org/html5/html-author/charref
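
As a rough sketch of what the recommendation would look like from the user's 
side, the snippet below builds a filtered copy of the mapping that keeps only 
the keys ending in ';'. The name strict_html5 is hypothetical and not part of 
the standard library; this is an illustration, not a proposed patch:

```python
from html.entities import html5

# Hypothetical "strict" view: keep only names that include the trailing ';'.
strict_html5 = {name: chars for name, chars in html5.items()
                if name.endswith(';')}

print(strict_html5['gt;'])      # lookup with the semicolon still works
print('gt' in strict_html5)     # the semicolon-less form is no longer a key
print('gt' in html5)            # but it is a key in the current mapping
```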

My recommendation could then be extrapolated to say we should require the 
ampersand as HTML5 does, but I don't think this revision should be taken this 
far unless others agree.

----------
components: Library (Lib)
messages: 329105
nosy: daniellovell
priority: normal
severity: normal
status: open
title: html.entities.html5 should require a trailing semicolon
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35142>
_______________________________________