John and Stefan,

On Jan 23, 5:33 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Mike Driscoll wrote:
> > I got lxml to create a tree by doing the following:
> >
> > from lxml import etree
> > from StringIO import StringIO
> >
> > parser = etree.HTMLParser()
> > tree = etree.parse(filename, parser)
> > xml_string = etree.tostring(tree)
> > context = etree.iterparse(StringIO(xml_string))
>
> No idea why you need the two steps here. lxml 2.0 supports parsing HTML in
> iterparse() directly when you pass the boolean "html" keyword.
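
A minimal sketch of the one-step parse described above, assuming lxml 2.0 or later and Python 3 (so io.BytesIO rather than the StringIO module). The sample data and the iter_rows helper are hypothetical; the element names are taken from the code quoted in this thread. As far as I know the HTML parser lowercases tag names, so the sketch matches "row" rather than "Row".

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample standing in for the HTML/XML melange; the real file is not shown.
SAMPLE = b"""<html><body>
<Row><relationship>Owner</relationship><StartDate>1/1/2008</StartDate><Priority>2</Priority></Row>
<Row><relationship>Renter</relationship><StartDate>2/1/2008</StartDate><Priority>5</Priority></Row>
</body></html>"""


def iter_rows(source):
    """Yield one dict per row element, reading the HTML in a single pass."""
    # html=True switches iterparse() to the HTML parser; tag="row" filters the events.
    for _event, elem in etree.iterparse(source, events=("end",), html=True, tag="row"):
        # Build the dict before clearing, so the yielded data survives elem.clear().
        yield {child.tag: (child.text or "").strip() for child in elem}
        elem.clear()  # release the row's children once they have been read


rows = list(iter_rows(BytesIO(SAMPLE)))
for row in rows:
    print(row["relationship"], row["startdate"], row["priority"])
```

Handling the "end" event for the whole row, rather than for a single child such as relationship, means all of the row's children are already parsed when the loop body runs.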

I don't know why I have two steps either, now that I look at it. I
don't do enough XML parsing to get really familiar with the ins and
outs of Python parsing, so it's mainly just my inexperience. I also
got lost in the lxml tutorials...

> > However, when I iterate over the contents of "context", I can't figure
> > out how to nab the row's contents:
> >
> > for action, elem in context:
> >     if action == 'end' and elem.tag == 'relationship':
> >         # do something...but what!?
> >         # this if statement probably isn't even right
>
> I would really encourage you to use the normal parser here instead of
> iterparse().
>
> from lxml import etree
> parser = etree.HTMLParser()
>
> # parse the HTML/XML melange
> tree = etree.parse(filename, parser)
>
> # if you want, you can construct a pure XML document
> row_root = etree.Element("newroot")
> for row in tree.iterfind("//Row"):
>     row_root.append(row)
>
> In your specific case, I'd encourage using lxml.objectify:
>
> http://codespeak.net/lxml/dev/objectify.html
>
> It will allow you to do this (untested):
>
> from lxml import etree, objectify
> parser = etree.HTMLParser()
> lookup = objectify.ObjectifyElementClassLookup()
> parser.setElementClassLookup(lookup)
>
> tree = etree.parse(filename, parser)
>
> for row in tree.iterfind("//Row"):
>     print row.relationship, row.StartDate, row.Priority * 2.7
>
> Stefan

I'll give your ideas a go and also see if what the others posted will
be cleaner or faster.

Thank you all.

Mike
--
http://mail.python.org/mailman/listinfo/python-list
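
A minimal sketch of the whole-tree approach suggested above, written for Python 3 and using plain lxml.etree calls in place of lxml.objectify. The sample data is hypothetical. Two assumptions differ from the quoted code: the search uses the lowercase name "row", since as far as I know the HTML parser lowercases tag names, and it uses findall(), which returns a list, so moving rows to a new parent does not disturb the loop.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample data; the real file's contents are not shown in the thread.
SAMPLE = b"""<html><body>
<Row><relationship>Owner</relationship><StartDate>1/1/2008</StartDate><Priority>2</Priority></Row>
<Row><relationship>Renter</relationship><StartDate>2/1/2008</StartDate><Priority>5</Priority></Row>
</body></html>"""

# Parse the whole document with the HTML parser, which recovers from non-HTML tags.
tree = etree.parse(BytesIO(SAMPLE), etree.HTMLParser())

# Collect the rows first (findall returns a list), then move them under a new root.
row_root = etree.Element("newroot")
for row in tree.findall(".//row"):
    row_root.append(row)  # append() moves the element out of the original tree

# Read each row's fields; element text is a string, so convert before doing arithmetic.
for row in row_root:
    priority = float(row.findtext("priority"))
    print(row.findtext("relationship"), row.findtext("startdate"), priority * 2.7)

# Serialize the new document as plain XML.
xml_string = etree.tostring(row_root, pretty_print=True).decode("ascii")
```

With lxml.objectify the fields would be reachable as attributes, with numeric text converted automatically; the explicit float() call here stands in for that conversion.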