[issue26084] HTMLParser mishandles last attribute in self-closing tag

Tom Anderl Mon, 11 Jan 2016 12:49:05 -0800

New submission from Tom Anderl:

When HTMLParser encounters a start tag that has all of the following:
  1. an unquoted attribute value as the final attribute
  2. an optional '/' character marking the start tag as self-closing
  3. no space between the final attribute and the '/' character

the '/' character is treated as part of the attribute value, and the element
is not recognized as self-closing.  This can be illustrated with the following:

===============================================================================

import HTMLParser

# Begin Monkeypatch (uncomment to exclude '/' from unquoted attribute values)
#import re
#HTMLParser.attrfind = re.compile(
#    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
#    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^/>\s]*))?(?:\s|/(?!>))*')
# End Monkeypatch

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print('got starttag: {0} with attributes {1}'.format(tag, attrs))

    def handle_endtag(self, tag):
        print('got endtag: {0}'.format(tag))

MyHTMLParser().feed('<img height=1.0 width=2.0/>')

==============================================================================

Running the above code yields the output:

    got starttag: img with attributes [('height', '1.0'), ('width', '2.0/')]

Note the trailing '/' in the 'width' value, and that no endtag event was
generated.  If I uncomment the monkey patch, the script then yields:

    got starttag: img with attributes [('height', '1.0'), ('width', '2.0')]
    got endtag: img

Note that the trailing '/' is gone, and an endtag event was generated.
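
The difference can also be seen at the regex level, without going through the
parser.  The following is only a sketch: PATCHED is copied from the monkey
patch above, while UNPATCHED is my assumption of what the stock pattern looks
like (the same expression with '/' still allowed in unquoted values).  The
scan_attrs helper is hypothetical and only imitates the parser's attribute
loop in simplified form; it is not the library's code.

```python
import re

# Patched pattern, copied from the monkey patch above: the unquoted-value
# character class is [^/>\s]*, so '/' cannot be part of an unquoted value.
PATCHED = re.compile(
    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^/>\s]*))?(?:\s|/(?!>))*')

# ASSUMPTION: the stock pattern, taken to be identical except that the
# unquoted-value character class is [^>\s]*, which lets '/' through.
UNPATCHED = re.compile(
    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*')


def scan_attrs(pattern, tag_text, start):
    """Hypothetical helper: collect (name, value) pairs from `start` onward
    and return them together with whatever text is left unmatched."""
    attrs = []
    pos = start
    while True:
        m = pattern.match(tag_text, pos)
        if not m:
            break
        attrs.append((m.group(1), m.group(3)))
        pos = m.end()
    return attrs, tag_text[pos:]


TAG = '<img height=1.0 width=2.0/>'
START = len('<img ')  # index of the first attribute name

for label, pattern in (('unpatched', UNPATCHED), ('patched', PATCHED)):
    attrs, rest = scan_attrs(pattern, TAG, START)
    print('{0}: attrs={1} remainder={2!r}'.format(label, attrs, rest))
```

With the assumed stock pattern the text left after the attributes is '>', so
the tag looks like an ordinary start tag; with the patched pattern it is '/>',
which is what allows the tag to be treated as self-closing.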

----------
components: Library (Lib)
messages: 258013
nosy: Tom Anderl
priority: normal
severity: normal
status: open
title: HTMLParser mishandles last attribute in self-closing tag
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26084>
_______________________________________
