Dear Stefan!
On 03.03.22 at 20:05, Stefan Behnel wrote:
So … I think keeping prefixes generally out of the interface is a good
decision.
I share your concerns. That is why I never even thought of changing the
behavior of lxml, or even that of lxml.objectify. I will come up with
lxml.objectify2, which does not change anything in lxml (or objectify)
and is a pure addition. The user can decide to use lxml.objectify2 as
tree_class when parsing.
While I can see that it might be helpful for debugging purposes to see
that there are attributes like "html_image", no-one keeps them from
ending up as "s_image" or just "image" (with a default namespace and
no prefix), if the creator of the specific document at hand decides so.
Absolutely! To be 100% sure of reaching the right child you need the
{URI}Name notation, and there will never be a shortcut for that.
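A minimal sketch of this point, using the standard library's
xml.etree.ElementTree so that it runs without lxml (lxml.etree accepts the
same {URI}name syntax in find()). The three sample documents are made up
for illustration; they carry the same data but bind the XHTML namespace to
different prefixes, or to no prefix at all.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical documents: same content, different prefix choices.
doc_html = f'<root xmlns:html="{XHTML}"><html:image>a.png</html:image></root>'
doc_s = f'<root xmlns:s="{XHTML}"><s:image>a.png</s:image></root>'
doc_default = f'<root><image xmlns="{XHTML}">a.png</image></root>'


def image_text(xml_text):
    root = ET.fromstring(xml_text)
    # Clark notation names the child by namespace URI, never by prefix.
    return root.find(f"{{{XHTML}}}image").text


results = [image_text(d) for d in (doc_html, doc_s, doc_default)]
print(results)  # the same lookup succeeds for all three documents
```

The lookup never mentions "html" or "s", which is why it survives any
renaming of prefixes by the document's creator.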
If we can agree on this, it is obvious that it is not a good idea to base
your code on the Python properties exposed by lxml.objectify at all:
* These properties do not represent all of the XML data.
* There is a magical assumption that a property is only visible if its
namespace matches that of its parent. Sorry, but where is this supported
in the XML specifications?
* Semantic changes can change the visibility of properties.
* Property names do not even show their namespace or prefix.
I would never dare to base code on these brittle lxml.objectify
Python properties. Do you?
OK, let us assume for a moment that there are silly people out there
basing their code on a magical design decision from 2006 in a particular
Python library called lxml, a decision that is based on no standard. So
they are using lxml.objectify properties literally in their code, as in
"process(node.image)".
Then we have the obligation to help these people. A week ago I was one
of them. I was basing not my code but my debugging on lxml.objectify,
but it comes to the same thing.
I would like to make debugging with lxml less harmful for people like me.
With lxml.objectify2, the code of such poor people, which relies only on a
property name (prone to semantic changes), can be supported with
namespace prefixes, which helps to gain more stability in closed contexts.
The mapping of ns-prefix <-> ns-URI is already present in
lxml.objectify in any node.

So in the case of an XML file with

    xmlns:html="http://www.w3.org/1999/xhtml"

or an XML file with

    xmlns:s="http://www.w3.org/1999/xhtml"

there is in any case a stable (but certainly local/temporary) mapping
between the prefix (html or s) and the XHTML namespace URI.

So the only thing that lxml.objectify2 does is mediate between
different representations (Clark notation, <prefix>_<name>) of the same
property.

So if the user gets a property from an lxml.objectify2 entity that is
"s_image", lxml.objectify2 can map this (for this particular XML file)
to {http://www.w3.org/1999/xhtml}image when talking to the etree API.

If the user is of the overly optimistic kind, they can use "s_image"
literally in their code. This will fail in some cases (depending on the
namespace mapping and context). But it will also fail if they use "image"
(standard lxml) in cases of changed semantics.
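
The mediation described above could be sketched roughly as follows. This
is not the lxml.objectify2 implementation; prefix_map, to_clark and
to_prefixed are hypothetical helper names, and the standard library's
ElementTree stands in for lxml (where, as far as I know, each element
already carries such a mapping as its nsmap).

```python
import io
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def prefix_map(xml_text):
    """Collect the prefix -> URI declarations of one document."""
    mapping = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        mapping.setdefault(prefix, uri)  # first declaration wins in this sketch
    return mapping


def to_clark(name, nsmap):
    """'s_image' -> '{uri}image' if 's' is a declared prefix, else unchanged."""
    prefix, sep, local = name.partition("_")
    if sep and prefix in nsmap:
        return f"{{{nsmap[prefix]}}}{local}"
    return name


def to_prefixed(tag, nsmap):
    """'{uri}image' -> 's_image', using the document's own prefix."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    for prefix, mapped_uri in nsmap.items():
        if prefix and mapped_uri == uri:
            return f"{prefix}_{local}"
    return tag  # only a default namespace is bound: keep the Clark form


doc = f'<root xmlns:s="{XHTML}"><s:image>a.png</s:image></root>'
nsmap = prefix_map(doc)
root = ET.fromstring(doc)

clark = to_clark("s_image", nsmap)
found_text = root.find(clark).text
print(clark, found_text, to_prefixed(root[0].tag, nsmap))
```

One limitation of this sketch: a local name that itself contains an
underscore and happens to start with a declared prefix would be
misread as prefix-qualified, so a real implementation would need a
rule for that case.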
If we start a discussion here about whether changes in the namespace
prefixes are more likely to happen than semantic changes, the whole world
will laugh at us. So I think we can agree that lxml should be resilient
against both types of change.
To make lxml.objectify2 complete, I can add an option for the user to
supply a prefix-to-namespace mapping to lxml.objectify2. With this mapping,
any code can define stable prefixes to work with while being independent of
the namespace prefixes of a given file. This is the same notion as, for
instance, node.xpath(namespaces={}) in lxml.
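To illustrate the notion of a caller-supplied mapping, here is a small
sketch using find() with a namespaces dictionary. It uses the standard
library's ElementTree so that it runs without lxml; lxml's find() and
xpath() take a namespaces mapping in the same spirit. The prefix "h" and
the sample documents are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Prefixes chosen by the calling code, independent of any file's prefixes.
MY_PREFIXES = {"h": XHTML}

docs = [
    f'<root xmlns:html="{XHTML}"><html:image>a.png</html:image></root>',
    f'<root xmlns:s="{XHTML}"><s:image>b.png</s:image></root>',
]

# "h:image" is resolved through MY_PREFIXES, not through the document.
found = [
    ET.fromstring(d).find("h:image", namespaces=MY_PREFIXES).text
    for d in docs
]
print(found)
```

The code only ever says "h:image"; whether a file calls the namespace
"html" or "s" no longer matters.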
To conclude my proposal:

lxml.objectify2 is better:
* since it is an addition that changes nothing in the current
lxml/objectify
* since it shows (__dict__) all sub-nodes (lxml.objectify does not)
* since it also shows the namespace prefixes (lxml.objectify does not)
* since it allows more ways to access/display a node:
    unqualified property name -> 'image' [unstable]
    prefix-qualified property name -> 'html_image' [stable locally or in certain contexts]
    fully qualified property name -> '{http://www.w3.org/1999/xhtml}image' [globally stable]

lxml.objectify2 is worse:
<your comment>

Cheers,
Volker

--
=========================================================
inqbus Scientific Computing Dr. Volker Jaenisch
Hungerbichlweg 3 +49 (8860) 9222 7 92
86977 Burggen https://inqbus.de
=========================================================