Feature Requests item #513840, was opened at 2002-02-06 12:55
Message generated for change (Comment added) made by fdrake
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=513840&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: entity unescape for sgml/htmllib

Initial Comment:
The parsers defined in htmllib and sgmllib do not 
provide any facilities for unescaping a tag attribute 
which has an embedded HTML entityref (i.e., they do
not provide a way to convert "a&amp;b" to "a&b").  The
parser in HTMLParser unescapes all tag attributes
automatically.  I'm not sure that's the right approach 
for sgmllib and htmllib (since it might break existing 
code), but it seems to me that one of the modules 
ought to provide a function or method which can do the 
unescaping if needed.  (I'm not familiar with either 
the SGML or the HTML specification, but I assume one 
of them mandates the escaping of '&' (e.g.) in tag 
attributes.  If so, then it seems appropriate for one 
of the modules to provide a function which undoes the 
mandated transformation.)
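
For illustration only, here is a minimal sketch of the kind of helper being
requested. It is not part of sgmllib or htmllib; the name unescape_attrs is
hypothetical, and it assumes a modern Python where html.unescape from the
standard library is available. It takes the (name, value) pairs a parser
hands to its start-tag handler and decodes the references in each value.

```python
from html import unescape  # standard library, Python 3.4+


def unescape_attrs(attrs):
    """Return a new list of (name, value) pairs with entity and
    character references in each value decoded.

    Hypothetical helper for illustration; valueless attributes
    (value is None) are passed through unchanged.
    """
    return [
        (name, unescape(value) if value is not None else None)
        for name, value in attrs
    ]


# Attribute pairs as a parser might report them, still escaped.
raw_attrs = [("href", "page?a=1&amp;b=2"), ("title", "a&amp;b"), ("checked", None)]
print(unescape_attrs(raw_attrs))
```

Keeping this as a separate function leaves existing handlers untouched:
code that wants the decoded values calls it explicitly.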


----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-21 23:57

Message:
This request is making me reconsider some other changes that
have already been made on the trunk (and are now in 2.5b1).

Reading this, I thought "Doesn't it already do that?"  Turns
out that in Python 2.4, it doesn't.  Both versions handle
this in parsed character data; the difference is confined to
attribute values.

I'd like to propose adding a Boolean configuration attribute
on the parser instance that, when set, causes the parser to
decode entity and character references in attribute values.
By default, it would be unset.  This would preserve backward
compatibility while making attribute value decoding easy to
enable.
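
A rough sketch of how such a flag might behave, under stated assumptions:
the class SketchParser, the attribute decode_attr_references, and the method
prepare_attrs are all hypothetical names made up for this example, not
anything in sgmllib, and html.unescape (modern Python) stands in for the
decoding step.

```python
from html import unescape  # standard library, Python 3.4+


class SketchParser:
    """Toy stand-in for a parser; only models the proposed flag."""

    # Hypothetical flag name. Unset by default, so existing code
    # keeps receiving attribute values exactly as written.
    decode_attr_references = False

    def prepare_attrs(self, attrs):
        """Return the (name, value) pairs to pass to a start-tag handler."""
        if not self.decode_attr_references:
            return list(attrs)
        return [
            (name, unescape(value) if value is not None else None)
            for name, value in attrs
        ]


parser = SketchParser()
raw_attrs = [("title", "a&amp;b")]

print(parser.prepare_attrs(raw_attrs))  # flag unset: values left as written

parser.decode_attr_references = True    # opt in on this instance only
print(parser.prepare_attrs(raw_attrs))  # flag set: references decoded
```

Because the flag is set per instance, one application could opt in for new
code while older parser subclasses continue to see the raw values.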

Another possibility would be to revert the new feature and
add a separate method to perform the decoding.


----------------------------------------------------------------------
