Hi, I have created a very, very simple parser for an XML file:
    from xml.sax.handler import ContentHandler

    class FindGoXML2(ContentHandler):
        def characters(self, content):
            # Called by the parser with text found between tags
            print(content)

I have kept it this simple because I want to debug. As I understand it, this prints out any content enclosed by tags (right?).

The XML is publicly available here:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml

Here are a few lines from that XML:

    <Gene-commentary_source>
      <Other-source>
        <Other-source_src>
          <Dbtag>
            <Dbtag_db>GO</Dbtag_db>
            <Dbtag_tag>
              <Object-id>
                <Object-id_id>3824</Object-id_id>
              </Object-id>
            </Dbtag_tag>
          </Dbtag>
        </Other-source_src>
        <Other-source_anchor>catalytic activity</Other-source_anchor>
        <Other-source_post-text>evidence: IEA</Other-source_post-text>
      </Other-source>
    </Gene-commentary_source>

Notice the third line from the end. I expect my handler to print "evidence: IEA". However, this is what I get:

    -------------------------
    catalytic activity    ==> this is the printout of the line before


    e
    vidence: IEA
    -------------------------

I don't understand why a few blank lines were printed after "catalytic activity", but that doesn't matter. What matters is that the string "evidence: IEA" is split across two printouts: first it prints only "e", then "vidence: IEA". I have parsed 825 such XML files without a problem; this happens on the 826th.

Any explanations?

Timothy
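A likely explanation, offered with some hedging: SAX does not promise one characters() call per run of text. The Python documentation for xml.sax.handler.ContentHandler.characters says a parser may return all contiguous character data in a single chunk or split it into several chunks. So "evidence: IEA" can arrive as "e" followed by "vidence: IEA", for instance when the text happens to straddle a boundary between the blocks of input the parser reads, which would fit the fact that only one file in 826 is affected. The blank lines are the newlines and indentation between tags, which are also character data. The sketch below, written for Python 3's standard xml.sax module, buffers the chunks and joins them in endElement(). The names LeafTextHandler and SAMPLE, and the two-piece feed() used to simulate a split, are illustrative assumptions rather than part of the original code; whether Expat actually reports the text in two pieces here depends on its version, but the buffered result is the same either way.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LeafTextHandler(ContentHandler):
    """Hypothetical handler that collects the full text of each leaf element.

    characters() only buffers; the text is assembled in endElement(),
    after every chunk has arrived. Assumes text appears only in leaf
    elements (true of the sample above).
    """

    def __init__(self):
        super().__init__()
        self._chunks = []  # pieces of text seen since the last tag
        self.texts = []    # (element name, text) pairs, in document order

    def startElement(self, name, attrs):
        # Drop inter-element whitespace (newlines + indentation).
        self._chunks = []

    def characters(self, content):
        # The parser may call this several times for one run of text.
        self._chunks.append(content)

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:  # skip whitespace-only runs, e.g. before </Other-source>
            self.texts.append((name, text))
        self._chunks = []


SAMPLE = """<Other-source>
  <Other-source_anchor>catalytic activity</Other-source_anchor>
  <Other-source_post-text>evidence: IEA</Other-source_post-text>
</Other-source>"""

handler = LeafTextHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed the document in two pieces, cut inside "evidence", to mimic
# text that straddles the boundary between two blocks of input.
cut = SAMPLE.index("vidence")
parser.feed(SAMPLE[:cut])
parser.feed(SAMPLE[cut:])
parser.close()

for name, text in handler.texts:
    print(name, "->", text)
```

Run as is, it prints "Other-source_anchor -> catalytic activity" and then "Other-source_post-text -> evidence: IEA". The strip() call also discards leading and trailing whitespace inside an element, which is harmless for this data but would need revisiting if such whitespace mattered.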
-- http://mail.python.org/mailman/listinfo/python-list