On 8 Jun 2023, at 9:10, Jamie Norrish wrote:

> I would approach this by first transforming each document into a
> simpler structure, using XSLT. If you do not care about anything other
> than tei:p, tei:w, and tei:sc elements, and for all of the latter two
> to be children of the former, then your transform can go find all tei:p
> (and any other containing elements you might have) and output them, and
> then all descendant tei:w and tei:sc, as children.
lxml also lets you pass a list of tags to iterparse through its tag argument, so you can do this directly while iterating. See https://lxml.de/parsing.html#iterparse-and-iterwalk

Charlie

-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
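
A minimal sketch of the iterparse approach, not a definitive implementation. It assumes the documents use the standard TEI namespace (http://www.tei-c.org/ns/1.0) and that tei:p elements are not nested. The sample document and the paragraphs() helper are made up for illustration; only the element names tei:p, tei:w, and tei:sc come from the thread.

```python
from io import BytesIO

from lxml import etree

# Assumption: the standard TEI namespace is in use.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample input; the tei:hi wrapper stands in for any
# intermediate markup we do not care about.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><w>Hello</w><hi><w>world</w></hi><sc>.</sc></p>
<p><w>Bye</w><sc>!</sc></p>
</body></text></TEI>"""


def paragraphs(source):
    """Yield one list of (localname, text) pairs per tei:p element.

    Assumes tei:p elements are not nested and that every tei:w / tei:sc
    of interest sits somewhere inside a tei:p.
    """
    p_tag = f"{{{TEI_NS}}}p"
    wanted = [f"{{{TEI_NS}}}{name}" for name in ("p", "w", "sc")]
    tokens = []
    # Only "end" events for the listed tags are reported; everything
    # else (tei:hi here) is skipped by the parser-side filter.
    for _event, elem in etree.iterparse(source, events=("end",), tag=wanted):
        if elem.tag == p_tag:
            # All descendants of this paragraph have already been seen.
            yield tokens
            tokens = []
            elem.clear()  # release the finished paragraph's subtree
        else:
            tokens.append((etree.QName(elem).localname, elem.text))


if __name__ == "__main__":
    for tokens in paragraphs(BytesIO(SAMPLE)):
        print(tokens)
```

Because "end" events fire for descendants before their parent, each tei:p arrives after all of its tei:w and tei:sc elements, which gives the flattened parent/children view without a separate XSLT pass.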
