New submission from Alessandro Vesely:

SYMPTOM:
When used in a multithreaded program, instances of a class derived from 
HTMLParser may convert an entity or leave it alone, in an apparently random 
fashion.

CAUSE:
The class has a class-level attribute, entitydefs, which, on first use, is 
initialized from None to a dictionary of entity definitions.  Initialization is 
not atomic: the attribute is bound to the dictionary before all of its entries 
have been added.  Instances in concurrent threads therefore see that entitydefs 
is no longer None, assume that initialization is complete, and hit a KeyError 
if the entity at hand hasn't been set yet.  The KeyError is caught and the 
entity is left alone as if it were invalid.
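
The pattern described above can be sketched in isolation.  This is not the 
HTMLParser source: LazyTable, SOURCE, lookup_racy and lookup_safe are 
hypothetical names used only to illustrate the check-then-fill race and one 
possible way to avoid it (build the table completely, under a lock, before 
publishing it).

```python
import threading

# Hypothetical stand-in for the pattern described above; none of these
# names come from HTMLParser.
class LazyTable(object):
    SOURCE = {"amp": "&", "lt": "<", "gt": ">"}
    table = None                  # shared class attribute, filled on first use
    _lock = threading.Lock()

    @classmethod
    def lookup_racy(cls, name):
        if cls.table is None:
            table = cls.table = {}            # published while still empty
            for key, value in cls.SOURCE.items():
                table[key] = value            # other threads can read in between
        try:
            return cls.table[name]
        except KeyError:
            return "&" + name + ";"           # entity left alone

    @classmethod
    def lookup_safe(cls, name):
        if cls.table is None:
            with cls._lock:
                if cls.table is None:
                    # Build the full dictionary first, then publish it.
                    cls.table = dict(cls.SOURCE)
        try:
            return cls.table[name]
        except KeyError:
            return "&" + name + ";"


# Simulate the window in which one thread has published the table but not
# yet filled it: a lookup from a second thread falls through to KeyError.
LazyTable.table = {}
print(LazyTable.lookup_racy("amp"))   # &amp;

# Starting from the uninitialized state, the safe variant resolves the entity.
LazyTable.table = None
print(LazyTable.lookup_safe("amp"))   # &
```

The empty-dictionary assignment only imitates what a second thread could 
observe; in a real program the window is short, which would explain why the 
symptom looks random.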

WORKAROUND:
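from HTMLParser import HTMLParser  # Python 2.7 module name

class Dummy(HTMLParser):
    """Defined here only so that we can initialize its base class."""
    def __init__(self):
        HTMLParser.__init__(self)

# Initialize HTMLParser by loading htmlentitydefs.  Run this once in the
# main thread, before any thread that uses HTMLParser is started.
dummy = Dummy()
dummy.feed('<a href="&amp;">')
del dummy, Dummy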
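
As a rough illustration of where the warm-up has to sit relative to thread 
creation, here is a minimal sketch.  HrefCollector, warm_up and 
parse_in_threads are hypothetical names, and the import fallback is only 
there so the sketch also runs on Python 3, where the module is html.parser.

```python
import threading

try:
    from HTMLParser import HTMLParser       # Python 2.7
except ImportError:
    from html.parser import HTMLParser      # Python 3

class HrefCollector(HTMLParser):
    """Hypothetical parser that records href attribute values."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        self.hrefs.extend(value for name, value in attrs if name == "href")

def warm_up():
    # Same idea as the workaround: parse one entity in the main thread.
    HrefCollector().feed('<a href="&amp;">')

def parse_in_threads(documents):
    results = [None] * len(documents)

    def work(index, text):
        parser = HrefCollector()             # one parser instance per thread
        parser.feed(text)
        results[index] = parser.hrefs

    threads = [threading.Thread(target=work, args=(i, doc))
               for i, doc in enumerate(documents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

warm_up()   # must complete before any worker thread starts
hrefs = parse_in_threads(['<a href="a&amp;b">', '<a href="x&lt;y">'])
```

The only point of the sketch is the ordering: the warm-up call returns before 
the first Thread.start(), so every worker finds the entity table already 
filled.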

----------
components: Library (Lib)
messages: 291256
nosy: ale2017
priority: normal
severity: normal
status: open
title: HTMLParser class is not thread safe
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30011>
_______________________________________