Doug OLeary wrote:

> Hey;
>
> Reasonably new to python and incredibly new to xml much less trying to
> parse it. I need to identify cluster nodes from a series of weblogic xml
> configuration files. I've figured out how to get 75% of them; now, I'm
> going after the edge case and I'm unsure how to proceed.
>
> Weblogic xml config files start with namespace definitions then a number
> of child elements some of which have children of their own.
>
> The element that I'm interested in is <server> which will usually have a
> subelement called <listen-address> containing the hostname that I'm
> looking for.
>
> Following the paradigm of "we love standards, we got lots of them", this
> model doesn't work everywhere. Where it doesn't work, I need to look for a
> subelement of <server> called <machine>. That element contains an alias
> which is expanded in a different root child, at the same level as
> <server>.
>
> So, picture worth a 1000 words:
>
> <?xml version='1.0' encoding='UTF-8'?>
> < [[ heinous namespace xml snipped ]] >
>   <name>[[text]]</name>
>   ...
>   <server>
>     <name>EDIServices_MS1</name>
>     ...
>     <machine>EDIServices_MC1</machine>
>     ...
>   </server>
>   <server>
>     <name>EDIServices_MS2</name>
>     ...
>     <machine>EDIServices_MC2</machine>
>     ...
>   </server>
>   <machine xsi:type="unix-machineType">
>     <name>EDIServices_MC1</name>
>     <node-manager>
>       <name>EDIServices_MC1</name>
>       <nm-type>SSL</nm-type>
>       <listen-address>host001</listen-address>
>       <listen-port>7001</listen-port>
>     </node-manager>
>   </machine>
>   <machine xsi:type="unix-machineType">
>     <name>EDIServices_MC2</name>
>     <node-manager>
>       <name>EDIServices_MC2</name>
>       <listen-address>host002</listen-address>
>       <listen-port>7001</listen-port>
>     </node-manager>
>   </machine>
> </domain>
>
> So, running it on 'normal' config, I get:
>
> $ ./lxml configs/EntsvcSoa_Domain_config.xml
> EntsvcSoa_CS => host003.myco.com
> EntsvcSoa_CS => host004.myco.com
>
> Running it against the abi-normal config, I'm currently getting:
>
> $ ./lxml configs/EDIServices_Domain_config.xml
> EDIServices_CS => EDIServices_MC1
> EDIServices_CS => EDIServices_MC2
>
> Using the examples above, I would like to translate EDIServices_MC1 and
> EDIServices_MC2 to host001 and host002 respectively.
>
> The primary loop is:
>
> for server in root.findall('ns:server', namespaces):
>     cs = server.find('ns:cluster', namespaces)
>     if cs is None:
>         continue
>     # cluster_name = server.find('ns:cluster', namespaces).text
>     cluster_name = cs.text
>     listen_address = server.find('ns:listen-address', namespaces)
>     server_name = listen_address.text
>     if server_name is None:
>         machine = server.find('ns:machine', namespaces)
>         if machine is None:
>             continue
>         else:
>             server_name = machine.text
>
>     print("%-15s => %s" % (cluster_name, server_name))
>
> (it's taken me days to write 12 lines of code... good thing I don't do
> this for a living :) )

You tend to get more efficient when you read the tutorial before you start
writing code. Hard-won advice that I still don't always follow myself ;)

> Rephrased, I need to find the <listen-address> under the <machine> child
> who's name matches the name under the corresponding <server> child. From
> some of the examples on the web, I believe xpath might help but I've not
> been able to get even the simple examples working. Go figure, I just
> figured out what a namespace is...
>
> Any hints/tips/suggestions greatly appreciated especially with complete
> noob tutorials for xpath.

Use your favourite search engine. One advantage of XPath is that it's not
limited to Python.

I did not completely follow your question, so the example below is my
interpretation of what you are asking for. It may still help you get
started...

$ cat lxml_translate_host.py
from lxml import etree

s = """\
<?xml version='1.0' encoding='UTF-8'?>
<domain>
  <name>text</name>
  <server>
    <name>EDIServices_MS1</name>
    <machine>EDIServices_MC1</machine>
  </server>
  <server>
    <name>EDIServices_MS2</name>
    <machine>EDIServices_MC2</machine>
  </server>
  <machine type="unix-machineType">
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <nm-type>SSL</nm-type>
      <listen-address>host001</listen-address>
      <listen-port>7001</listen-port>
    </node-manager>
  </machine>
  <machine type="unix-machineType">
    <name>EDIServices_MC2</name>
    <node-manager>
      <name>EDIServices_MC2</name>
      <listen-address>host002</listen-address>
      <listen-port>7001</listen-port>
    </node-manager>
  </machine>
</domain>
""".encode()

root = etree.fromstring(s)
for server in root.xpath("./server"):
    servername = server.xpath("./name/text()")[0]
    print("server", servername)

    machine = server.xpath("./machine/text()")[0]
    print("machine", machine)
    # The machine name is pasted into the XPath expression below,
    # so reject anything that could break out of the quoted string.
    if not machine.isidentifier():
        raise ValueError("Kind regards to Bobby Tables' Mom")

    # Find the sibling <machine> whose <name> matches the alias.
    path = ("../machine[name='{}']/node-manager/"
            "listen-address/text()").format(machine)
    host = server.xpath(path)[0]
    print("host", host)
    print()
$ python3 lxml_translate_host.py
server EDIServices_MS1
machine EDIServices_MC1
host host001

server EDIServices_MS2
machine EDIServices_MC2
host host002

$
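
If you would rather stay with the findall()/find() style of your own loop, here is a rough sketch of the same translation using only the standard library's xml.etree.ElementTree. Treat it as one possible approach rather than the way to do it. The namespace URI, the sample document and the function name cluster_hosts() are made up for illustration, because the real namespace was snipped from your post; substitute the URI from your config files. The sketch also sidesteps a problem in the original loop, where find() returns None for a missing <listen-address> and the following .text access raises AttributeError.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one was snipped from the post.
NS = {"ns": "http://example.com/hypothetical-domain"}

# Made-up sample: one "normal" server, one that only names a machine
# alias, and one server that belongs to no cluster.
SAMPLE = """\
<domain xmlns="http://example.com/hypothetical-domain">
  <server>
    <name>Normal_MS1</name>
    <cluster>Demo_CS</cluster>
    <listen-address>host003.myco.com</listen-address>
  </server>
  <server>
    <name>EDIServices_MS1</name>
    <cluster>EDIServices_CS</cluster>
    <machine>EDIServices_MC1</machine>
  </server>
  <server>
    <name>AdminServer</name>
  </server>
  <machine>
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <listen-address>host001</listen-address>
    </node-manager>
  </machine>
</domain>
"""


def cluster_hosts(root, ns=NS):
    """Return (cluster, host) pairs, resolving <machine> aliases."""
    # First pass: map each machine alias to its node manager's address.
    alias_to_host = {}
    for machine in root.findall("ns:machine", ns):
        name = machine.findtext("ns:name", namespaces=ns)
        host = machine.findtext(
            "ns:node-manager/ns:listen-address", namespaces=ns)
        if name and host:
            alias_to_host[name] = host

    # Second pass: prefer the server's own address, else look up the alias.
    pairs = []
    for server in root.findall("ns:server", ns):
        cluster = server.findtext("ns:cluster", namespaces=ns)
        if not cluster:
            continue  # server is not a cluster member
        # findtext() gives None for a missing element, "" for an empty one.
        host = server.findtext("ns:listen-address", namespaces=ns)
        if not host:
            alias = server.findtext("ns:machine", namespaces=ns)
            # Fall back to the raw alias if it cannot be resolved.
            host = alias_to_host.get(alias, alias)
        if host:
            pairs.append((cluster, host))
    return pairs


root = ET.fromstring(SAMPLE)
for cluster, host in cluster_hosts(root):
    print("%-15s => %s" % (cluster, host))
```

Building the alias dictionary up front means no value from the file is ever pasted into a path expression, so the Bobby Tables check is not needed here.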