[lxml] Re: python lxml.objectify gives no attribute access to gco:CharacterString node

Holger.Joukl Fri, 04 Mar 2022 02:23:33 -0800

> Absolutely! To be 100% sure to access the child you need the {URI}Name 
> notation and there will never be a shortcut to that.
> If we can agree on this, it is obvious that it is not a good idea to base your
> code on python properties exposed by lxml.objectify, at all:
> * These properties do not represent all of the xml data.
> * There is a magical assumption, that a property is only visible if its 
> namespace matches that of its parent. Sorry, but where is this supported in 
> the RFCs of XML?


Where do the W3C XML Recommendations say anything about how an XML data
handling library's API should behave?

> * Semantic changes can change the accessibility/visibility of properties.
> * Property names do not even show their namespace or prefix.
> I would never ever dare to base code on these brittle lxml.objectify python 
> properties. Do you?

I don't consider them brittle at all. How attribute lookup works is
well-defined. A semantic change (e.g. an element changes its namespace, not
merely its prefix) means you are dealing with a different element altogether
(its qualified name has changed).
This requires you to rewrite child-accessing code anyway, unless you want to
access all elements regardless of their semantic meaning. But then you could
just as well use iterchildren() or iter().
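A minimal sketch of that difference (the document and both namespace URIs are
made up for illustration): attribute lookup finds only the child in the
parent's own namespace, while iterchildren() with lxml's "{*}" tag wildcard
matches a local name in any namespace.

```python
from lxml import objectify

# Invented document: two "color" children living in different namespaces.
XML = b"""<root xmlns="http://example.com/a" xmlns:b="http://example.com/b">
  <color>red</color>
  <b:color>loud</b:color>
</root>"""

root = objectify.fromstring(XML)

# Attribute lookup only considers children in root's own namespace
# ("http://example.com/a"), so this is the "red" element.
print(root.color.text)

# "{*}color" matches the local name "color" in any namespace,
# i.e. it deliberately ignores the semantic (namespace) difference.
for child in root.iterchildren("{*}color"):
    print(child.tag, child.text)
```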

> OK, let us assume for a moment that there are silly people out there basing
> their code on a magical design decision from 2006 in
> a particular python lib called lxml, that is based on no standard. So they are
> using lxml.objectify properties literally in their code, aka
> "process(node.image)".

There was no "magical design decision". It was a deliberate design choice that
elem.<attr> should look up the unqualified tag name attr in elem's own
namespace, not in other namespaces.

But you can just as easily look up qualified names using
getattr(elem, '{/other/ns}attr') or elem['{/other/ns}attr'].
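Applied to the case from the subject line, a minimal sketch (the snippet is
invented; gmd and gco are bound to the usual ISO 19139 namespace URIs):

```python
from lxml import objectify

GCO = "http://www.isotc211.org/2005/gco"

# Invented ISO 19139-style fragment: a gco child under a gmd parent.
XML = b"""<gmd:title xmlns:gmd="http://www.isotc211.org/2005/gmd"
                     xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:CharacterString>My dataset</gco:CharacterString>
</gmd:title>"""

title = objectify.fromstring(XML)

# Unqualified attribute lookup searches the parent's (gmd) namespace only.
try:
    title.CharacterString
except AttributeError:
    print("no gmd:CharacterString child")

# Qualified lookup in Clark notation, via getattr() or item access.
print(getattr(title, "{%s}CharacterString" % GCO).text)
print(title["{%s}CharacterString" % GCO].text)
```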

> Then we have the obligation to help these people. A week ago I was one of 
> them. I was not basing my code but my debugging on lxml.objectify - but all 
> the same.
> I like to make the debugging of lxml less harmful for people like me. With
> lxml.objectify2 the code of such poor people, only relying on a property name
> (prone to semantic changes), can be supported with namespace prefixes, 
> helping to gain more stability in closed contexts.

But what's the use case here: iterate over elem.__dict__ and then call
getattr(elem, key) for every key?
Simply use elem.iterchildren() instead.
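Roughly, with a made-up document: iterchildren() yields every child element
directly, whatever its namespace, and etree.QName() splits each tag into
namespace and local name, so there is no need to go through __dict__ and
getattr():

```python
from lxml import etree, objectify

# Invented document: one child in the default namespace, one in "s".
XML = b"""<doc xmlns="http://example.com/doc" xmlns:s="http://example.com/img">
  <title>Holiday</title>
  <s:image>beach.png</s:image>
</doc>"""

root = objectify.fromstring(XML)

# Every child, regardless of namespace, together with how it is named.
for child in root.iterchildren():
    qn = etree.QName(child)
    print(child.prefix, qn.namespace, qn.localname, child.text)
```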

I understand the debugging inconvenience (and proposed a possible mitigation
for it, though I'm unsure whether it should actually be implemented, because
it would litter __dict__ and dir() with names that are not valid Python
identifiers, which is uncommon).

> The mapping of ns-prefix <-> ns-URI is already present in lxml.objectify in 
> any node.
> So in case of a xml file with
> xmlns html : https://www.w3.org/1999/xhtml
> or an xml file with
> xmlns s : https://www.w3.org/1999/xhtml
> there is in any case a stable (but of course local/temporary) mapping between
> the prefix html|s and the html namespace URI.
> So the only thing that lxml.objectify2 does is mediating between different
> representations (Clark notation, <prefix>_<name>) of the same property.
>
> So if the user gets a property from an lxml.objectify2 entity that is
> "s_image", lxml.objectify2 can map this (for this particular xml-file)
> to {https://www.w3.org/1999/xhtml}image when talking to the etree api.
> If the user is of the overly optimistic kind they can use "s_image" literally
> in their code. This will fail in some cases (depending on the namespace
> mapping and context). But it will also fail if they use "image" (standard
> lxml) in cases of changed semantics.
> If we start a discussion here about whether changes in the namespace prefixes
> are more likely to happen than semantic changes,
> the whole world will laugh at us. So I think we can agree that lxml should
> be resilient against both types of change.

So then what's the point of using elem.s_image in my code when it may suddenly
stop working, if I could simply use
getattr(elem, '{https://www.w3.org/1999/xhtml}image') instead and be safe?
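To illustrate, a sketch with two invented documents that bind the same
(hypothetical) namespace URI to different prefixes: a prefix-based name would
differ per file, while the Clark-notation lookup is the same for both.

```python
from lxml import objectify

IMG = "http://example.com/img"  # hypothetical namespace URI

# The same content, serialized with two different prefixes.
DOC_A = b'<doc xmlns:html="http://example.com/img"><html:image>a.png</html:image></doc>'
DOC_B = b'<doc xmlns:s="http://example.com/img"><s:image>a.png</s:image></doc>'

for data in (DOC_A, DOC_B):
    root = objectify.fromstring(data)
    # The prefix is an accident of each file's serialization...
    print(root.nsmap)
    # ...but the qualified lookup does not depend on it.
    print(getattr(root, "{%s}image" % IMG).text)
```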

IMHO you can't be resilient against those changes: if the meaning of a
variable "color" (in XML terms: "{/my/colors}color") changes, you'll have to
change your code, because color now suddenly contains sounds, not colors :-).
And if its name changes to "my_shiny_color" (in XML terms:
"{/my/shiny/colors}color"), you'll have to change your code, because now the
name "color" is undefined.

> To make lxml.objectify2 perfect I can add the option for a user to add a
> prefix-namespace mapping to lxml.objectify2. With this mapping any code can 
> define stable prefixes to work with while being independent of the namespace 
> prefixes of a given file. This is the same notion
> as for instance for node.xpath(namespaces={}) in lxml.

Which you'd have to hand in when parsing? Because you can't hand it in for
elem.attr syntax or getattr(elem, 'attr'), unless you want to override
Python's built-in getattr.
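For comparison, a sketch of what a caller-defined prefix mapping looks like
when it is passed in explicitly at the call site, as with
xpath(namespaces=...). The mapping MY_NS and the clark() helper are
hypothetical, not part of lxml:

```python
from lxml import objectify

# Prefixes chosen by the calling code, independent of any file's prefixes.
MY_NS = {"img": "http://example.com/img"}  # hypothetical mapping


def clark(name, nsmap=MY_NS):
    """Hypothetical helper: 'img:image' -> '{http://example.com/img}image'."""
    prefix, sep, local = name.partition(":")
    if not sep:
        return name  # no prefix given: leave the name unqualified
    return "{%s}%s" % (nsmap[prefix], local)


# The file itself uses the prefix "s" for the same namespace.
root = objectify.fromstring(
    b'<doc xmlns:s="http://example.com/img"><s:image>a.png</s:image></doc>')

# Both lookups use the caller's "img" prefix, never the file's "s" prefix.
print(getattr(root, clark("img:image")).text)
print(root.xpath("img:image", namespaces=MY_NS)[0].text)
```

Note that the mapping still has to be handed to every lookup; plain
elem.image syntax offers no place to pass it.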

> To conclude my proposal:
> lxml.objectify2 is better:
> * since it is an addition that changes nothing in the current lxml/objectify
> * since it shows (__dict__) all sub_nodes (lxml.objectify does not)
> * since it also shows the namespace prefixes (lxml.objectify does not)
> * since it allows for more possibilities to access/display a node
>    unqualified property name -> 'image' [unstable]
>    prefix qualified property name -> 'html_image' [locally or in certain contexts stable]
>    fully qualified property name -> '{https://www.w3.org/1999/xhtml}image' [globally stable]

Do you propose another way to access an element by its fully qualified name,
apart from getattr() / indexed access?
As it stands, this is already available, and your other two methods are
unstable (and so unusable, in my book).
And I can't imagine something like elem.https_www_w3_org_1999_xhtml_image as
a desirable API.

So to me, the proposal doesn't add substantial gain apart from debugging
visibility, but rather adds ambiguity.
Maybe the debugging inconvenience could be addressed in lxml.objectify, as I
mentioned. Maybe you could simply implement it in pure Python on top of
objectify, as Stefan suggested.
Or maybe teach your IDE to better support your debugging needs.
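As a rough idea of the pure-Python route, a hypothetical debug_children()
helper (not an lxml API) that builds a dict of all children keyed by
"prefix:localname", falling back to Clark notation for unprefixed (default
namespace) elements. The document is invented:

```python
from lxml import etree, objectify


def debug_children(elem):
    """Hypothetical debugging helper: map a readable name for every child
    element, regardless of namespace, to the list of children carrying it."""
    view = {}
    for child in elem.iterchildren():
        if not isinstance(child.tag, str):
            continue  # skip comments and processing instructions
        qn = etree.QName(child)
        key = "%s:%s" % (child.prefix, qn.localname) if child.prefix else child.tag
        view.setdefault(key, []).append(child)
    return view


root = objectify.fromstring(
    b'<doc xmlns="http://example.com/doc" xmlns:s="http://example.com/img">'
    b'<title>Holiday</title><s:image>a.png</s:image><s:image>b.png</s:image></doc>')

for name, children in debug_children(root).items():
    print(name, [c.text for c in children])
```

The prefixes here are only used for display; the code that actually accesses
children can keep using Clark notation.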

One last thing I'd respectfully ask of you, regarding these statements:
> I would never ever dare to base code on these brittle lxml.objectify python 
> properties. Do you?
> [...] silly people out there basing their code on a magical design decision
> from 2006 in a particular python lib called lxml,
> that is based on no standard.
> Then we have the obligation to help these people
> I like to make the debugging of lxml less harmful for people like me. With
> lxml.objectify2 the code of such poor people, only relying on a property name
> If we start a discussion here about whether changes in the namespace prefixes
> are more likely to happen than semantic changes, the whole world will laugh at us

Please tone down your wording ("brittle lxml.objectify properties", "silly
people basing their code on a magical design decision from 2006",
"based on no standard", "obligation to help these people", "make the debugging
of lxml less harmful", "poor people",
"the whole world will laugh at us").
It doesn't help.

Holger







Landesbank Baden-Wuerttemberg
Institution under public law
Head offices: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704
Amtsgericht (Local Court) Stuttgart
HRA 4356, HRA 104 440
Amtsgericht (Local Court) Mannheim
HRA 40687
Amtsgericht (Local Court) Mainz

LBBW processes your personal data in accordance with the requirements of the
GDPR (DSGVO).
For information, see https://www.lbbw.de/datenschutz.