No, it is XML metadata. I also believe there should be a better way using [@...] expressions in the path.
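For what it's worth, here is a minimal sketch of what ElementTree's limited XPath can and cannot do here. It assumes Python 3 and the simplified, namespace-free sample from the thread; the <root> wrapper and the function names are hypothetical. ElementTree has no `[@path='value']` form, because `[@attr='value']` only tests attributes. It does offer `[tag='text']`, which keeps an element if an immediate child named `tag` has exactly that text content, and `..`, which steps back up to the parent. Either one inside a loop over tagB, or both chained in a single path, yields "Something, Something else" without the brute-force search.

```python
import xml.etree.ElementTree as ET

# Simplified sample modeled on the thread: tags renamed, namespaces removed,
# the date/dateType part dropped. The <root> wrapper is a hypothetical
# addition so that paths can start with "./tagA".
SAMPLE = """<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
      <tagC>
        <note>
          <title><string>value</string></title>
        </note>
      </tagC>
    </tagB>
  </tagA>
</root>"""


def strings_if_titled(elem, title="value"):
    """Return the tagC/string texts of every tagB that contains a
    tagC/note/title/string equal to `title` (hypothetical helper)."""
    result = []
    for tag_b in elem.iterfind("./tagA/tagB"):
        # [string='...'] only inspects immediate children, so the path
        # first walks down to <title> and applies the predicate there.
        match = tag_b.find(f"./tagC/note/title[string='{title}']")
        if match is not None:
            result.extend(s.text for s in tag_b.iterfind("./tagC/string"))
    return result


def strings_if_titled_one_path(elem, title="value"):
    """Same query as one path: select the matching <title>, climb
    title -> note -> tagC -> tagB with '..', then go down to tagC/string."""
    path = f"./tagA/tagB/tagC/note/title[string='{title}']/../../../tagC/string"
    return [s.text for s in elem.findall(path)]


root = ET.fromstring(SAMPLE)
print(", ".join(strings_if_titled(root)))           # Something, Something else
print(", ".join(strings_if_titled_one_path(root)))  # Something, Something else
```

The single-path version depends on `..` climbing exactly three levels, so it is tied to this particular layout; the loop is easier to adapt. Both build the path with an f-string, which breaks if the title text contains a single quote. If full XPath 1.0 predicates such as `tagB[tagC/note/title/string='value']` are really needed, lxml supports them, whereas ElementTree does not.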
H.

On Sat, 7 Nov 2020 13:14, Shaozhong SHI <shishaozh...@gmail.com> wrote:

> Hi, Hernan,
>
> Did you try to parse GML?
>
> Surely, there can be very concise and smart ways to do these things.
>
> Regards,
>
> David
>
> On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis <
> variablestarli...@gmail.com> wrote:
>
>> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
>> already solved the problem, albeit with a not-so-efficient solution.
>> Perhaps it is useful to present it here anyway, in case someone can
>> shed some light on it.
>>
>> My job is to parse a complicated XML document (ISO metadata) and pick up
>> the values of certain fields under certain conditions. For the most part
>> this goes well. I am working with xml.etree.ElementTree, which has proved
>> sufficient for most of this task and for the rest of the project. JSON is
>> not an option within this project.
>>
>> The specific trouble was in this section, itself the child of a more
>> complicated parent (for simplicity, tags are renamed and namespaces
>> removed):
>>
>> <tagA>
>>   <tagB>
>>     <tagC>
>>       <string>Something</string>
>>     </tagC>
>>     <tagC>
>>       <string>Something else</string>
>>     </tagC>
>>     <tagC>
>>       <note>
>>         <title>
>>           <string>value</string>
>>         </title>
>>         <date0>
>>           <date1>
>>             <date2>
>>               <gco:Date>2020-11-06</gco:Date>
>>             </date2>
>>             <dateType>
>>               <code blah lots of strange things blah />
>>             </dateType>
>>           </date1>
>>         </date0>
>>       </note>
>>     </tagC>
>>   </tagB>
>> </tagA>
>>
>> Basically, I have to get what is in tagC/string, but only if the value of
>> tagC/note/title/string is "value". As you can see, there are several tagC
>> elements, all children of tagB, but tagC can have different meanings(!).
>> And no, I have no control over how these XML fields are constructed.
>>
>> In principle it is easy to do a "findall" and get the strings for tagC,
>> using:
>>
>> elem.findall("./tagA/tagB/tagC/string")
>>
>> and then get the content and join it in case there is more than one
>> tagC/string, like: "Something, Something else".
>>
>> However, the hard thing here is to get those only when
>> tagC/note/title/string='value'. I was expecting to find a way of
>> specifying such a condition in square brackets, like
>> [@string='value'] or [@/tagC/note/title/string='value'], as is usual in
>> XML and possible in xml.etree. However, this proved difficult (at least
>> for me). So this is the "brute" solution I implemented:
>>
>> - find all tagB children of tagA
>> - check whether tagC/note/title/string under each one is "value"
>> - if so, find all tagC/string under it
>>
>> In quasi-Python:
>>
>> string = []
>> element0 = elem.findall("./tagA/tagB")
>> for element1 in element0:
>>     # paths are relative to each tagB element
>>     element2 = element1.find("./tagC/note/title/string")
>>     if element2 is not None and element2.text == 'value':
>>         element3 = element1.findall("./tagC/string")
>>         for element4 in element3:
>>             string.append(element4.text)
>>
>> Crude, but it works. As I wrote above, I was hoping that a bracketed
>> clause of the type [@ ...] in the first "findall" would do a more
>> efficient job, but alas my knowledge of XML is too rudimentary.
>> Perhaps something to tinker with in the coming weeks.
>>
>> Have a nice weekend!
>>
>> On 2020-11-06 20:10, Terry Reedy wrote:
>> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
>> >> I am confronting some XML parsing challenges and would like to ask
>> >> some questions to more knowledgeable Python users. Apparently there
>> >> exists a group for such questions, but that list (xml-sig) has
>> >> apparently not received (or archived) posts since May 2018(!). I
>> >> wonder if there are other lists or forums for Python XML questions,
>> >> or if this list would be fine for that.
>> >
>> > If you don't hear otherwise, try here. Or try stackoverflow.com and
>> > tag questions with python and xml.
>> >
>> --
>> https://mail.python.org/mailman/listinfo/python-list
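The sample above had its namespaces removed for simplicity, but in a real ISO metadata file the same lookups have to deal with prefixes such as gco. A minimal sketch, assuming the gco prefix is bound to http://www.isotc211.org/2005/gco as is usual in ISO 19139 (check the xmlns declarations in your own files); the fragment is hypothetical and keeps the renamed tags unqualified. It shows three equivalent ways to reach gco:Date with ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the thread's renamed tags, with only the date kept
# in the ISO 19139 "gco" namespace. Real files qualify every element.
SAMPLE = """<tagC xmlns:gco="http://www.isotc211.org/2005/gco">
  <note>
    <title><string>value</string></title>
    <date0><date1><date2><gco:Date>2020-11-06</gco:Date></date2></date1></date0>
  </note>
</tagC>"""

# Prefix -> URI mapping for find()/findall(). The prefixes are local to this
# dict and do not have to match the ones used inside the document.
NS = {"gco": "http://www.isotc211.org/2005/gco"}

tag_c = ET.fromstring(SAMPLE)

# 1. Prefixed path plus the namespaces mapping.
date = tag_c.find("./note/date0/date1/date2/gco:Date", NS)

# 2. Fully qualified {uri}tag form, no mapping needed.
same = tag_c.find("./note/date0/date1/date2/{http://www.isotc211.org/2005/gco}Date")

# 3. Python 3.8+: '{*}' matches the local name in any namespace.
any_ns = tag_c.find(".//{*}Date")

print(date.text, same.text, any_ns.text)
```

The `{*}` wildcard is the least typing, but it also matches a Date element from any other namespace, so the explicit mapping is the safer choice when several vocabularies are mixed in one file.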