[lxml] Re: python lxml.objectify gives no attribute access to gco:CharacterString node

Holger.Joukl Tue, 01 Mar 2022 08:19:12 -0800

Hi Volker,

> > The reason for this is that obviously 
> > {http://www.isotc211.org/2005/gco}CharacterString is not a valid Python 
> > identifier and it makes sense
> > to restrict unqualified lookup to children from the same namespace.


> I like to disagree on

> > and it makes sense
> > to restrict unqualified lookup to children from the same namespace

I beg to differ :-).

> What does the namespace of a node have in common with the namespace of one of 
> its subnodes? Nothing. It is quite common in XML that you borrow from other 
> namespaces.

I'd rather assume that the vast majority of XML documents do not mix many 
different namespaces, let alone heavily "oscillate" between namespaces from 
parent to child.
But I've no data to back up that claim.

For me, simply using the parent['{/other/ns}child'] or getattr(parent, 
'{/other/ns}child') syntax just works.
It's not as beautiful as parent.child, but not ugly either.
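A minimal sketch of both behaviours, assuming an invented ISO 19139-style fragment (the gmd:title wrapper and its text are made up for illustration; the two namespace URIs are the usual ISO TC 211 ones). Unqualified attribute access looks for a child in the parent's namespace, so title.CharacterString searches for {gmd}CharacterString and fails, while the Clark-notation forms find the gco child:

```python
from lxml import objectify

# Invented example: a gmd element whose only child lives in the gco namespace.
xml = b"""<gmd:title xmlns:gmd="http://www.isotc211.org/2005/gmd"
           xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:CharacterString>Some title</gco:CharacterString>
</gmd:title>"""

GCO = "{http://www.isotc211.org/2005/gco}"

title = objectify.fromstring(xml)

# Unqualified lookup uses the parent's (gmd) namespace, so this raises.
try:
    title.CharacterString
except AttributeError:
    print("no unqualified access to gco:CharacterString")

# Qualified lookup in Clark notation, via item access or getattr().
print(title[GCO + "CharacterString"])           # Some title
print(getattr(title, GCO + "CharacterString"))  # Some title
```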

> Other namespace based python libs like for instance RDFlib solve this problem 
> generically by adding the namespace to the python property.
> {http://www.isotc211.org/2005/gco}CharacterString   -> gco_CharacterString

Well, but they're not adding the namespace, they're adding the namespace 
prefix. lxml uses Clark notation and qualified tag names everywhere,
which in my experience is less error-prone and preferable 
(http://www.jclark.com/xml/xmlns.htm).
No problems with namespace <--> prefix indirection, multiple prefixes for the 
same namespace, etc.
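To illustrate the prefix point: the parser resolves prefixes away, so the same element spelled with two different prefixes, or with a default namespace, ends up with a single Clark-notation tag. This sketch uses the standard library's xml.etree.ElementTree, which follows the same convention as lxml.etree, so it runs without lxml installed; the "iso" prefix is arbitrary.

```python
import xml.etree.ElementTree as ET

GCO_NS = "http://www.isotc211.org/2005/gco"

# One element, three spellings: two different prefixes and a default namespace.
docs = [
    f'<gco:CharacterString xmlns:gco="{GCO_NS}">a</gco:CharacterString>',
    f'<iso:CharacterString xmlns:iso="{GCO_NS}">b</iso:CharacterString>',
    f'<CharacterString xmlns="{GCO_NS}">c</CharacterString>',
]

# Prefixes are gone after parsing; every tag comes back in Clark notation.
tags = {ET.fromstring(doc).tag for doc in docs}
print(tags)  # {'{http://www.isotc211.org/2005/gco}CharacterString'}
```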

Note that, to the best of my knowledge, not every allowed prefix is a valid 
Python identifier (I think all XML name chars except ":" are allowed).
So apart from tedious namespace <--> prefix indirection handling, you'd also 
need rules for replacing characters
(e.g. <my-nsprefix:root xmlns:my-nsprefix="/my/ns"/> --> my_nsprefix_root?). 
That is, if you wanted to look up children using a
<ns-prefix>_<unqualified name> attribute name syntax.
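A rough sketch of the character-replacement rule such a scheme would need. mangle() is a hypothetical helper, not part of lxml or RDFlib; it just joins prefix and local name with "_" and replaces anything outside Python's word characters with "_". Even this naive version shows the ambiguity: distinct prefixes collapse onto the same attribute name, and the boundary between prefix and local name gets lost.

```python
import re


def mangle(prefix: str, local_name: str) -> str:
    """Hypothetical prefix:local -> Python attribute name mapping.

    Joins prefix and local name with "_" and replaces every non-word
    character (roughly: anything not allowed in a Python identifier,
    such as "-" or ".", both legal in XML names) with "_".
    """
    return re.sub(r"\W", "_", f"{prefix}_{local_name}")


print(mangle("my-nsprefix", "root"))  # my_nsprefix_root

# Different prefixes (possibly bound to different namespaces) collide ...
print({mangle(p, "root") for p in ("my-ns", "my.ns", "my_ns")})  # {'my_ns_root'}

# ... and the boundary between prefix and local name is lost.
print(mangle("a_b", "c") == mangle("a", "b_c"))  # True
```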


> The problem lies deeply buried in the nature of the LXML objectify 
> implementation. Objectify does not really transform the XML into a real 
> Python instance hierarchy (as RDFlib does),
> but directs all attribute access via function calls to the C-libxml core. 
> On the one hand this is desired behavior, since one can change the XML on 
> the fly and some of the changes
> are visible both in the XML and in the objectified representation.
> But on the other hand the information about which namespace a node belongs 
> to is not persistent in the node and therefore cannot be used for lookup.

To me this is a feature rather than a problem, and it's been a design decision 
from day one 
(https://mail.python.org/archives/list/lxml@python.org/message/GOTPAWC4MHI5LVS5FBZRSBAKTHHOXLBE/).
Some others seem to agree: 
https://zato.io/blog/posts/saving-time-with-a-pythonic-api-of-lxmlobjectify.html.
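A small sketch of what that design means in practice, assuming a throwaway document (<root><amount>1</amount></root> is invented for illustration): objectify elements are live proxies for the libxml2 tree rather than copies, so changes made through attribute assignment are immediately visible through the plain lxml.etree API, and vice versa.

```python
from lxml import etree, objectify

root = objectify.fromstring(b"<root><amount>1</amount></root>")

# Changes made through objectify attribute assignment go straight into the tree ...
root.amount = 2        # rewrites the text of the existing <amount> child
root.note = "added"    # appends a new <note> child

# ... so the plain etree API on the same element sees them immediately.
print(root.find("amount").text)  # 2
print(root.find("note").text)    # added

# The reverse direction: removing a child via etree also removes the attribute.
root.remove(root.find("note"))
print(hasattr(root, "note"))     # False

# objectify adds py:pytype annotations on assignment; strip them for display.
objectify.deannotate(root, cleanup_namespaces=True)
print(etree.tostring(root))
```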

But I'm obviously and deeply biased ;-)

Best regards,
Holger







Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704
Amtsgericht Stuttgart
HRA 4356, HRA 104 440
Amtsgericht Mannheim
HRA 40687
Amtsgericht Mainz

Die LBBW verarbeitet gemaess Erfordernissen der DSGVO Ihre personenbezogenen 
Daten.
Informationen finden Sie unter https://www.lbbw.de/datenschutz.
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-leave@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/
Member address: [email protected]
