Stefan Behnel, 26.01.2011 10:29:
Johann Spies, 26.01.2011 10:07:
I an not a Python newbie but working with xml is new to me.
I get data through a soap connection, using suds, and want to convert that
to objects which I can use to populate a rather complex database.
Your problem description is pretty comprehensive in general. It would be
helpful to see an example snippet of the XML that you parse.
[example received in private e-mail]
I have been able to parse the xml using
tree = etree.iterparse(infile,events=("start","end")) but it seems like a
lot of work to get that to sql-objects.
I have seen references to lxml.objectify and have created a object
containing the contents of a whole file using
tree = objectify.parse(fileobject)
That object contains for example the data of 605 records and I do not know
how to use it. I could not figure out from the lxml.objectify documentation
how to do it.
In the end I want to use data from about 54 fields of each records. I would
like to have a list of dictionaries as a result of the parsing. From there
it should not be too difficult to create sql.
I think iterparse() is a good way to deal with this, as is objectify.
iterparse() has the advantage that you can dispose of handled records, thus
keeping memory usage low (if that's an issue here).
Using objectify, you would usually do something like this:
tree = objectify.parse(fileobject)
root = tree.getroot()
for record in root.xml_record_tag:
title_name = record.title.text
It really just depends on what your XML looks like. In the above, I assumed
that each record hangs directly below the root tag and is called
"xml_record_tag". I also assumed that each record has a "title" tag with
text content.
The example you sent me is almost perfect for lxml.objectify. Basically,
you'd do something like this:
for record in root.REC:
refs = [ ref.text for ref in record.item.refs.ref ]
publisher = record.item.copyright.publisher.text
for issue in record.issue:
units = [ unit.text for unit in issue.units.unit ]
and so on. The same in ET:
for record in root.findall('REC'):
refs = [ ref.text for ref in record.findall('item/refs/ref') ]
publisher = record.findtext('item/copyright/publisher')
for issue in record.findall('issue'):
units = [ unit.text for unit in issue.findall('units/unit') ]
Not much harder either.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list