[caiman-discuss] New XML DOM tree traversal module

Jack Schwartz Fri, 04 Apr 2008 15:41:38 -0700

Hi everyone.

Making effective use of an XML manifest in the Distro-Constructor, the
Automated Installer and perhaps other utilities requires an easy way to
get at the data.  I have written a module that layers on top of the XML
DOM API and makes it easy to search, add to, change and write out a DOM
tree.

I wrote this module as part of my effort to infuse the DOM tree with
default values and to aid in content checking, but I also plan to use it
to extract data, to layer data and to save the data tree.

Tree nodes are specified with a Unix-path-like syntax.  For example,
given this:

<distribution name="Wizbang_Solaris_1.0" supports_install="true">
        <distro_constr_params>
             ....
        </distro_constr_params>
        <live_img_params>
                ....
                <root_user directlogin="true">
                        <shell>/bin/bash</shell>
                     ....
                </root_user>

I could specify the directlogin attribute of root_user as
"live_img_params/root_user/directlogin" and the shell element as
"live_img_params/root_user/shell".

In the case of multiple nodes with the same path specification, the API 
returns a list of all matching nodes. 

For example,

<distribution name="Wizbang_Solaris_1.0" supports_install="true">
        <live_img_params>
                <user username="joeblow" UID="1000" GID="100" directlogin="true">
                        <shell>/bin/ksh</shell>
                </user>
                <user username="diags" UID="1001" GID="1" directlogin="false">
                        <shell>/bin/rsh</shell>
                </user>

Searching for "live_img_params/user/username" would return both matches.

If a unique node is sought, the caller can narrow the search by starting
it from a node in the middle of the tree rather than from the root.  The
tree can also be traversed upward through parent nodes using the Unix
path ".." convention.
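To make the path rules above concrete, here is a rough sketch of how such a lookup could work.  It assumes Python's standard xml.dom.minidom as the DOM implementation, and find_nodes() is a hypothetical name, not necessarily what TreeAcc.py provides.  It shows the implied root element, an attribute or element as the last path component, multiple matches returned as a list, a search started mid-tree, and "..":

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def find_nodes(start, path):
    """Return a list of every node matching a Unix-like path relative to start.

    Hypothetical helper (not TreeAcc.py's actual API).  Each path component
    names a child element, except that the last one may also name an
    attribute; ".." moves up to the parent element.  When start is the
    Document node, the search begins at the root element, so the root is
    never named in the path.
    """
    if start.nodeType == Node.DOCUMENT_NODE:
        start = start.documentElement
    current = [start]
    parts = [p for p in path.split("/") if p]
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        matches = []
        seen = set()  # ids of parents already added, so ".." yields no duplicates
        for node in current:
            if part == "..":
                parent = node.parentNode
                # Stop at the root element; never climb into the Document node.
                if (parent is not None and parent.nodeType == Node.ELEMENT_NODE
                        and id(parent) not in seen):
                    seen.add(id(parent))
                    matches.append(parent)
                continue
            # Only element children are searched; text, comments etc. are skipped.
            matches.extend(c for c in node.childNodes
                           if c.nodeType == Node.ELEMENT_NODE and c.tagName == part)
            # An attribute can only appear as the final component of a path.
            if is_last and node.hasAttribute(part):
                matches.append(node.getAttributeNode(part))
        current = matches
    return current


doc = minidom.parseString(
    '<distribution name="Wizbang_Solaris_1.0" supports_install="true">'
    '<live_img_params>'
    '<user username="joeblow" UID="1000" GID="100" directlogin="true">'
    '<shell>/bin/ksh</shell></user>'
    '<user username="diags" UID="1001" GID="1" directlogin="false">'
    '<shell>/bin/rsh</shell></user>'
    '</live_img_params></distribution>')

# Both username attributes match, so both come back in a list.
names = [a.value for a in find_nodes(doc, "live_img_params/user/username")]

# Narrow the search by starting at the second user element ...
diags = find_nodes(doc, "live_img_params/user")[1]
diags_shell = find_nodes(diags, "shell")[0].firstChild.data
# ... and climb back up with "..".
all_uids = [a.value for a in find_nodes(diags, "../user/UID")]
print(names, diags_shell, all_uids)
```

Always returning a list, even for a single match, keeps calling code uniform; a caller wanting a unique node just checks the length.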

Assumptions and design decisions:

1) There is only one active child text node per element node, and it is
the first child text node found.  (Multiple text nodes are more typical
of HTML, where, say, bolded parts of a string are in their own text
nodes.  That does not apply here.)

2) Except for the document node at the top, the tree has only elements
and attributes.  The top tree node (the document node) can contain other
kinds of nodes, but they are not searchable or changeable.

3) Deleting elements or attributes is not supported because there isn't 
a use for it.

4) The root element node does not need to be specified in the path; it
is implied.  Naming it would add nothing, since that is where the search
starts when no path is given, and the root element node has no sibling
element nodes.

5) Leaf status is noted for each element and attribute, and is defined
relative to the path.  All attributes are leaves, as nothing follows them
in their path.  Elements may or may not be leaves.

6)  The path back to the root is returned as part of a node's 
information.  This path does not include the root element itself, as the 
root is not specified in paths used to search.

Please review at http://cr.opensolaris.org/~schwartz/080404.1/review

TreeAcc.py is the module.  A few test programs and a test XML file are 
also there for demo/experimentation.

Please send comments by Friday 4/11 COB.

    Thanks,
    Jack
