Re: Finding all instances of a string in an XML file

2013-06-23 Thread Jason Friedman
> xml = """<?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
> """
>
> import xml.etree.ElementTree as etree
>
> tree = etree.fromstring(xml)
>
> def walk(elem, path, token):
>     path += (elem,)
>     if token in elem.attrib.values():
>         yield path
>     for child in elem:
>         for match in walk(child, path, token):
>             yield match
>
> for path in walk(tree, (), "Property X"):
>     print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))

Peter, thank you, that exactly meets my need.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Finding all instances of a string in an XML file

2013-06-21 Thread Peter Otten
Jason Friedman wrote:

> I have XML which looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
>
> The "Property X" string appears twice and I want to output the path
> that leads to all such appearances.  In this case the output would be:
>
> LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "2"}
> LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
>
> My actual XML file is 2000 lines and contains up to 8 levels of nesting.

That's still small, so

xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

import xml.etree.ElementTree as etree

tree = etree.fromstring(xml)

def walk(elem, path, token):
    # Extend the path (a tuple of elements, root first) with this element.
    path += (elem,)
    # Report the path if any attribute value equals the token.
    if token in elem.attrib.values():
        yield path
    # Iterate over the element directly; getchildren() was removed in
    # Python 3.9.
    for child in elem:
        for match in walk(child, path, token):
            yield match

for path in walk(tree, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))
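
The walk() generator prints each attribute dictionary with Python's default repr, which uses single quotes. If the output should match the double-quoted form shown in the question, one possible tweak is to render the dictionaries with json.dumps. This is only a sketch: SAMPLE and render are names invented for this example, and the DOCTYPE line is left out of the sample data.

```python
import json
import xml.etree.ElementTree as etree

# Sample data from the thread (DOCTYPE line omitted in this sketch).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""


def walk(elem, path, token):
    """Yield the path (tuple of elements, root first) to every element
    that has an attribute value equal to token."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    for child in elem:
        for match in walk(child, path, token):
            yield match


def render(path):
    """Format a path as 'TAG {attributes}' items; json.dumps emits
    double-quoted keys and values (hypothetical helper)."""
    return ", ".join(
        "{} {}".format(elem.tag, json.dumps(elem.attrib)) for elem in path
    )


root = etree.fromstring(SAMPLE)
lines = [render(path) for path in walk(root, (), "Property X")]
for line in lines:
    print(line)
```

For a file on disk, etree.parse(filename).getroot() would take the place of etree.fromstring(SAMPLE).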




Re: Finding all instances of a string in an XML file

2013-06-21 Thread dieter
Jason Friedman <jsf80...@gmail.com> writes:

> I have XML which looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
>
> The "Property X" string appears twice and I want to output the path
> that leads to all such appearances.

You could use lxml and its XPath support.

This is a high-end approach: you would use a powerful (and big)
infrastructure, but one that could also be of use for other
XML applications. There are more elementary approaches as well
(e.g. parse the XML into a DOM and provide your own visitor
to find the elements you are interested in).
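
As a rough illustration of the more elementary DOM route, here is a minimal sketch that uses the standard library's xml.dom.minidom rather than lxml. The names SAMPLE, find_paths and describe are invented for this example, the DOCTYPE line is left out of the sample data, and this is only one way such a visitor might look.

```python
import xml.dom.minidom

# Sample data from the thread (DOCTYPE line omitted in this sketch).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""


def find_paths(node, token, path=()):
    """Visit the DOM depth-first and yield a tuple of element nodes
    (root first) for each element with an attribute value equal to token."""
    # Skip whitespace text nodes, comments and other non-element nodes.
    if node.nodeType != node.ELEMENT_NODE:
        return
    path = path + (node,)
    # attributes.items() returns (name, value) pairs.
    if any(value == token for _name, value in node.attributes.items()):
        yield path
    for child in node.childNodes:
        for match in find_paths(child, token, path):
            yield match


def describe(path):
    """Format a path as 'TAG {attributes}' items separated by commas."""
    return ", ".join(
        "{} {}".format(elem.tagName, dict(elem.attributes.items()))
        for elem in path
    )


document = xml.dom.minidom.parseString(SAMPLE)
for path in find_paths(document.documentElement, "Property X"):
    print(describe(path))
```

The visitor carries the path down as an immutable tuple, so sibling branches never see each other's elements and nothing has to be popped on the way back up.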



Re: Finding all instances of a string in an XML file

2013-06-21 Thread Jason Friedman
Thank you Peter and Dieter, will give those thoughts a try and report back.


Finding all instances of a string in an XML file

2013-06-20 Thread Jason Friedman
I have XML which looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>

The "Property X" string appears twice and I want to output the path
that leads to all such appearances.  In this case the output would be:

LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "2"}
LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}

My actual XML file is 2000 lines and contains up to 8 levels of nesting.

I have tried this so far (partial code, using the xml.etree.ElementTree
module):
def get_path(data_dictionary, val, path):
    for node in data_dictionary["CHILDREN"]:
        if node["CHILDREN"]:
            if not path or node["TAG"] != path[-1]:
                path.append(node["TAG"])
            print(CR + "recursing ...")
            get_path(node, val, path)
        else:
            for k, v in node["ATTRIB"].items():
                if v == val:
                    print("path- ", path)
                    print(" " + node["TAG"] + " " + str(node["ATTRIB"]))

I'm really not even close to getting the output I am looking for.
Python 3.2.2.
Thank you.