I am using the lxml.etree library to validate an XML instance file against a specified schema that contains the data types of each element. These are some of the internals of a function that extracts the elements:
```python
schema_doc = etree.parse(schema_fn)
schema = etree.XMLSchema(schema_doc)
context = etree.iterparse(xml_fn, events=('start', 'end'), schema=schema)

# get root
event, root = next(context)

for event, elem in context:
    if event == 'end' and elem.tag == self.tag:
        yield elem
        root.clear()
```

I retrieve a list of elements from this and do further processing to represent them in different ways. I need to be able to capture the data type from the schema definition for each field in the element, e.g.:

```xml
<xsd:element name="concept">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="foo"/>
      <xsd:element name="concept_id" type="xsd:string"/>
      <xsd:element name="line" type="xsd:integer"/>
      <xsd:element name="concept_value" type="xsd:string"/>
      <xsd:element ref="some_date"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

My thought is to recursively traverse the schema definition, match the `name` attribute (since each name is unique to a `type`), and return that element. But I can't quite make it work. All the XML is valid, validation works, etc. This is what I have:

```python
def find_node(tree, name):
    for c in tree:
        if c.attrib.get('name') == name:
            return c
        if len(c) > 0:
            return find_node(c, name)
    return 0
```

I may have been staring at this too long, but when something is returned, it should be returned completely, no? This is what happens with `return find_node(c, name)`: it returns 0. `return c` is reached (I used pdb to verify that), but the recursion continues and ends up returning 0.

Thoughts and/or a different approach are welcome. Thanks
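A likely cause, offered as a sketch rather than a definitive diagnosis: `return find_node(c, name)` returns the result of the first child subtree unconditionally, so when that subtree contains no match, the 0 propagates straight up and any later siblings are never examined. The version below only returns from the recursive call when something was actually found, uses `None` instead of 0, and tests `is not None` (an element with no children is falsy, so `if found:` would silently miss leaf declarations such as `line`). It also shows an alternative that just walks every `xsd:element` in document order with `.iter()`. The sample schema, including the top-level `foo` and `some_date` declarations, and the `element_type` helper are hypothetical; the stdlib `xml.etree.ElementTree` fallback is only there so the sketch runs without lxml installed, since both support the calls used here.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # stdlib fallback; same API for the calls used below
    import xml.etree.ElementTree as etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def find_node(tree, name):
    """Depth-first search for the first descendant whose name attribute equals name.

    Returns None (not 0) when nothing matches.
    """
    for child in tree:
        if child.get("name") == name:
            return child
        # Recurse, but only stop if the subtree actually contained a match;
        # otherwise keep scanning the remaining siblings.
        found = find_node(child, name)
        if found is not None:  # not "if found:": childless elements are falsy
            return found
    return None


def element_type(schema_root, name):
    """Hypothetical alternative: scan every xsd:element and return its type attribute."""
    for el in schema_root.iter("{%s}element" % XSD_NS):
        if el.get("name") == name:
            return el.get("type")
    return None


# Hypothetical schema: some_date has children but does not contain "line",
# which is exactly the case where the original find_node gives up and returns 0.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo" type="xsd:string"/>
  <xsd:element name="some_date">
    <xsd:simpleType>
      <xsd:restriction base="xsd:date"/>
    </xsd:simpleType>
  </xsd:element>
  <xsd:element name="concept">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="foo"/>
        <xsd:element name="concept_id" type="xsd:string"/>
        <xsd:element name="line" type="xsd:integer"/>
        <xsd:element name="concept_value" type="xsd:string"/>
        <xsd:element ref="some_date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = etree.fromstring(SCHEMA)
print(find_node(root, "line").get("type"))  # xsd:integer
print(element_type(root, "concept_value"))  # xsd:string
```

Note that both functions expect the schema's root element (e.g. `schema_doc.getroot()`), not the tree object returned by `etree.parse`. XSD also does not require local element names to be unique across a schema, so both return the first match in document order. Declarations written with `ref=` have no `name` of their own; their type is on the referenced top-level declaration, which the whole-schema search finds.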