Hi Hernán,

Did you try to parse GML?
Surely, there can be very concise and smart ways to do these things.

Regards,
David

On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis <variablestarli...@gmail.com> wrote:

> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
> already solved the problem, albeit with a not so efficient solution.
> Perhaps it is useful to present it here anyway in case some light can
> be added to this.
>
> My job is to parse a complicated XML (ISO metadata) and pick up values
> of certain fields in certain conditions. This goes for the most part
> well. I am working with xml.etree.ElementTree, which proved sufficient
> for the most part and the rest of the project. JSON is not an option
> within this project.
>
> The specific trouble was in this section, itself the child of a more
> complicated parent: (for simplicity tags are renamed and namespaces
> removed)
>
> <tagA>
>   <tagB>
>     <tagC>
>       <string>Something</string>
>     </tagC>
>     <tagC>
>       <string>Something else</string>
>     </tagC>
>     <tagC>
>       <note>
>         <title>
>           <string>value</string>
>         </title>
>         <date0>
>           <date1>
>             <date2>
>               <gco:Date>2020-11-06</gco:Date>
>             </date2>
>             <dateType>
>               <code blah lots of strange things blah />
>             </dateType>
>           </date1>
>         </date0>
>       </note>
>     </tagC>
>   </tagB>
> </tagA>
>
> Basically, I have to get what is in tagC/string but only if the value of
> tagC/note/title/string is "value". As you see, there are several tagC,
> all children of tagB, but tagC can have different meanings(!). And no, I
> have no control over how these XML fields are constructed.
>
> In principle it is easy to make a "findall" and get strings for tagC,
> using:
>
> elem.findall("./tagA/tagB/tagC/string")
>
> and then get the content and append in case there is more than one
> tagC/string like: "Something, Something else".
>
> However, the hard thing to do here is to get those only when
> tagC/note/title/string='value'.
> I was expecting to find a way of
> specifying a certain construction in square brackets, like
> [@string='value'] or [@/tagC/note/title/string='value'], as is usual in
> XML and possible in xml.etree. However this proved difficult (at least
> for me). So this is the "brute" solution I implemented:
>
> - find all tagA/tagB elements
> - check if /tagA/tagB/tagC/note/title/string has "value"
> - if yes find all tagA/tagB/tagC/string
>
> In quasi-Python:
>
> string = []
> element0 = elem.findall("./tagA/tagB")
> for element1 in element0:
>     # paths below are relative to the current tagB
>     element2 = element1.find("./tagC/note/title/string")
>     if element2 is not None and element2.text == 'value':
>         element3 = element1.findall("./tagC/string")
>         for element4 in element3:
>             string.append(element4.text)
>
> Crude, but works. As I wrote above, I was wishing that a bracketed
> clause of the type [@ ...] already in the first "findall" would do a
> more efficient job but alas my knowledge of XML is too rudimentary.
> Perhaps something to tinker on in the coming weeks.
>
> Have a nice weekend!
>
> On 2020-11-06 20:10, Terry Reedy wrote:
> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
> >> I am confronting some XML parsing challenges and would like to ask
> >> some questions to more knowledgeable Python users. Apparently there
> >> exists a group for such questions but that list (xml-sig) has
> >> apparently not received (or archived) posts since May 2018(!). I
> >> wonder if there is another list or forum for Python XML questions, or
> >> if this list would be fine for that.
> >
> > If you don't hear otherwise, try here. Or try stackoverflow.com and
> > tag questions with python and xml.
> >
> --
> https://mail.python.org/mailman/listinfo/python-list
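
On the bracketed clause asked about in the quoted message, here is a minimal sketch of one way to do the filtering inside a single findall() call. As far as I can tell, the predicates in xml.etree.ElementTree only test a direct child (as in title[string='value']), not a nested path, so the sketch matches the title element and then climbs back up to tagB with ".." steps. The <root> wrapper, the second tagB, and the helper name strings_for_title are made up for illustration, and the sample XML is a simplified stand-in for the real metadata.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the structure quoted above (namespaces dropped).
# The <root> wrapper and the second <tagB> are assumptions for this sketch.
XML = """
<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
      <tagC><note><title><string>value</string></title></note></tagC>
    </tagB>
    <tagB>
      <tagC><string>Ignored</string></tagC>
      <tagC><note><title><string>other</string></title></note></tagC>
    </tagB>
  </tagA>
</root>
"""


def strings_for_title(elem, wanted):
    """Return tagC/string texts of every tagB whose note title equals `wanted`.

    Hypothetical helper: `wanted` is pasted into the path as-is, so it must
    not contain a single quote.
    """
    path = (
        # Select each <title> that has a direct <string> child equal to `wanted`,
        f"./tagA/tagB/tagC/note/title[string='{wanted}']"
        # then go up title -> note -> tagC -> tagB and down to tagC/string.
        "/../../../tagC/string"
    )
    return [e.text for e in elem.findall(path)]


elem = ET.fromstring(XML)
result = strings_for_title(elem, "value")
print(", ".join(result))
```

With lxml, a full XPath 1.0 expression with a nested path inside the brackets should also be an option, but the sketch above stays within the standard library.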