On 16/09/2010 22:46, John Nagle wrote:
There's a tendency to use "dynamic attributes" in Python when
trying to encapsulate objects from other systems. It almost
works. But it's usually a headache in the end, and should be
discouraged. Here's why.
Some parsers, like BeautifulSoup, try to encapsulate HTML tag
fields as Python attributes. This gives trouble for several reasons.
First, the syntax for Python attributes and HTML tags is different.
Some strings won't convert to attributes. You can crash BeautifulSoup
(which is supposed to be robust against bad HTML) by using a non-ASCII
character in a tag in an HTML document it is parsing.
Then there's the reserved word problem. "class" is a valid field
name in HTML, and a reserved word in Python. So there has to be a
workaround for reserved words.
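A quick way to check which tag field names can be used with Python dot syntax: a name must be a lexically valid identifier and must not be a keyword. The helper below is a hypothetical sketch in Python 3, not BeautifulSoup code. In Python 3, non-ASCII letters are legal in identifiers, but hyphens and reserved words still are not.

```python
import keyword

def usable_as_attribute(name: str) -> bool:
    """Return True if `name` could be written as obj.name in source code."""
    # Must be lexically an identifier (no '-', no leading digit, ...)
    # and must not be a reserved word such as 'class' or 'for'.
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["href", "class", "data-id", "2col"]:
    print(name, usable_as_attribute(name))
```

Only "href" passes; "class", "data-id" and "2col" would each need some workaround.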
There's also the problem that user-created attributes go into the
same namespace as other object attributes. This creates a vulnerability
comparable to SQL injection. If an attacker controls the input
being parsed, they may be able to overwrite an attribute or method
they shouldn't be able to touch.
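To make the namespace clash concrete, here is a minimal sketch assuming a hypothetical wrapper, `NaiveTag`, that copies every parsed field onto the instance with `setattr()`. A field whose name matches a method silently replaces it for that instance:

```python
class NaiveTag:
    """Hypothetical wrapper that copies parsed fields onto the instance."""

    def __init__(self, fields):
        for key, value in fields.items():
            # Every field name becomes an attribute, so it shares the
            # namespace with the class's own methods.
            setattr(self, key, value)

    def render(self):
        return "<tag>"

# Harmless input behaves as expected.
ok = NaiveTag({"href": "/index.html"})
print(ok.render())            # <tag>

# Input containing a field called "render" shadows the method.
evil = NaiveTag({"render": "not a method any more"})
print(callable(evil.render))  # False
```

The instance attribute wins because plain functions on the class are non-data descriptors, so nothing stops the input from replacing them.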
This problem shows up again in "suds", the module for writing
SOAP RPC clients. This module tries to use attributes for
XML structures, and it almost works. It tends to founder when
the XML data model uses names that aren't valid Python identifiers.
("-" appears frequently in XML element names, but is not valid in a
Python attribute name.)
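The "-" problem in a few lines, assuming a hypothetical attribute-style wrapper class `XmlItem`: Python will store such a name, but dot syntax cannot read it back.

```python
class XmlItem:
    """Hypothetical stand-in for an attribute-style XML wrapper."""
    pass

item = XmlItem()
# setattr() accepts any string, so storing the field succeeds...
setattr(item, "content-type", "text/xml")

# ...but `item.content-type` parses as `item.content - type`, so the
# value can only be read back through getattr() or the instance __dict__.
print(getattr(item, "content-type"))   # text/xml
print(hasattr(item, "content"))        # False
```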
Using a dictionary, or inheriting an object from "dict", doesn't
create these problems. The data items live in their own dictionary,
and can't clash with anything else. Of course, you have to write
tag['a']
instead of
tag.a
but then, at least you know what to do when you need
tag['class']
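A minimal sketch of the dictionary approach, assuming a hypothetical `Tag` class that subclasses `dict`. Field names are plain keys, so "class", "data-id" or even "name" cannot collide with the object's own attributes:

```python
class Tag(dict):
    """Hypothetical tag type: HTML fields live in the dict, not in attributes."""

    def __init__(self, name, fields):
        super().__init__(fields)
        self.name = name  # ordinary attribute; cannot collide with a field

tag = Tag("div", {"class": "header", "data-id": "7", "name": "spoofed"})
print(tag["class"])        # header
print(tag["data-id"])      # 7
print(tag.name)            # div  (the field "name" does not overwrite it)
print(tag.get("title"))    # None
print("class" in tag)      # True
```

Because `Tag` is a real `dict`, `in`, `get()`, `keys()` and the rest come for free.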
"suds", incidentally, tries to do both. It accepts both
item.fieldname
and
item['fieldname']
But it is faking a dictionary, and it doesn't quite work right.
'fieldname' in item
works correctly, but the form to get None when the field is missing,
item.get('fieldname',None)
isn't implemented.
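If a class must imitate a dictionary rather than be one, subclassing `collections.abc.Mapping` is one way to avoid the missing-`get()` gap: implement three methods and the base class derives the rest. This is a sketch with a hypothetical `Item` class, not how suds is written:

```python
from collections.abc import Mapping

class Item(Mapping):
    """Hypothetical read-only record with dictionary-style access.

    Only __getitem__, __iter__ and __len__ are written by hand; the Mapping
    base class supplies __contains__, get(), keys(), items(), values(),
    __eq__ and __ne__ in terms of them.
    """

    def __init__(self, fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

item = Item({"fieldname": 42})
print("fieldname" in item)            # True
print(item.get("missing", None))      # None
print(item.get("fieldname"))          # 42
```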
Much of the code that uses objects as dictionaries either dates from
the days when you couldn't inherit from "dict" (before Python 2.2), or
was written by JavaScript programmers. (In JavaScript, an object and a
dictionary are the same thing. In Python, they're not.) In new code,
it's better to inherit from "dict". It eliminates the special cases.
For the work on updating the re module there was a discussion about
whether named capture groups should be available as attributes of the
match object or via subscripting (or both?). Subscripting seemed
preferable to me because:
1. Adding attributes looks too much like 'magic'.
2. What should happen if a group name conflicts with a normal attribute?
3. What should happen if a group name conflicts with a reserved word?
For those reasons the new regex module uses subscripting. It's more
Pythonic, IMHO.
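For comparison, the standard library `re` module has also supported subscripting on match objects since Python 3.6 (`m[name]` is equivalent to `m.group(name)`). The sketch below uses `re` rather than the new regex module, and shows subscripting sidestepping points 2 and 3: a group named "class" is reachable, and a group named "start" does not disturb the `start()` method.

```python
import re

# A group named "class" (a keyword) and one named "start" (which is also a
# method name on match objects) are both fine when accessed by subscript.
pattern = re.compile(r"(?P<start>\d+)-(?P<end>\d+) (?P<class>\w+)")
m = pattern.match("10-20 header")

print(m["class"])      # header  (m.class would be a SyntaxError)
print(m["start"])      # 10      (the captured text)
print(m.start())       # 0       (the match-object method is untouched)
print(m.groupdict())   # {'start': '10', 'end': '20', 'class': 'header'}
```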
--
http://mail.python.org/mailman/listinfo/python-list