Jason Friedman wrote:

> I have XML which looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
>
> The "Property X" string appears twice and I want to output the "path"
> that leads to all such appearances. In this case the output would be:
>
> LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X",
> "VALUE": "2"}
> LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"},
> ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
>
> My actual XML file is 2000 lines and contains up to 8 levels of nesting.

That's still small, so you can parse the whole document into memory and
walk the tree recursively:

import xml.etree.ElementTree as etree

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

tree = etree.fromstring(xml_text)

def walk(elem, path, token):
    # Extend the path with the current element; tuples are immutable,
    # so each recursive call gets its own copy.
    path += (elem,)
    # Report the path if any attribute value of this element equals the token.
    if token in elem.attrib.values():
        yield path
    # Iterate over the element directly to visit its children
    # (Element.getchildren() was removed in Python 3.9).
    for child in elem:
        yield from walk(child, path, token)

for path in walk(tree, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))
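
One detail: printing elem.attrib directly gives Python's dict repr, with single quotes, while the requested output uses double quotes. Below is a minimal sketch of one way to match that format, assuming json.dumps() is an acceptable way to render the attribute dict. The names SAMPLE, find_paths and format_path are made up for this example and are not from the original post.

```python
import json
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

def find_paths(elem, token, path=()):
    """Yield a tuple of elements (root first) for every element that has
    an attribute value equal to token."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    for child in elem:  # iterating an Element yields its direct children
        yield from find_paths(child, token, path)

def format_path(path):
    # json.dumps() renders the attribute dict with double quotes,
    # and an empty dict as {}.
    return ", ".join(f"{elem.tag} {json.dumps(elem.attrib)}" for elem in path)

root = ET.fromstring(SAMPLE)
lines = [format_path(path) for path in find_paths(root, "Property X")]
print("\n".join(lines))
```

For the sample document this prints the two paths from the question, each on a single line.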