xml.sax problem

Timothy Wu Sun, 23 Mar 2008 22:35:37 -0700

Hi,

I have created a very, very simple parser for an XML file.


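from xml.sax.handler import ContentHandler

class FindGoXML2(ContentHandler):
    def characters(self, content):
        # Called by the parser with character data found in the document
        print(content)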

I kept it simple because I want to debug. As I understand it, this prints
out any text content enclosed by tags (right?).
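As far as I know, the Python documentation for ContentHandler.characters() says a SAX parser may return all contiguous character data in a single chunk or split it into several chunks, so printing inside characters() does not guarantee one line per text node. Below is a minimal sketch (Python 3 syntax) of the usual workaround, assuming the elements of interest contain only text and no child elements: buffer the chunks and join them in endElement(). TextCollector and its texts list are hypothetical names, not part of xml.sax.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: joins all character chunks of a leaf element."""

    def __init__(self):
        super().__init__()
        self._chunks = []  # character data seen since the last tag event
        self.texts = []    # (element name, stripped text) pairs

    def startElement(self, name, attrs):
        # A new element starts: drop whitespace collected before this tag
        self._chunks = []

    def characters(self, content):
        # May be called several times for one text node, so only buffer here
        self._chunks.append(content)

    def endElement(self, name):
        # The text node is complete only once the closing tag is reached
        text = "".join(self._chunks).strip()
        if text:
            self.texts.append((name, text))
        self._chunks = []


handler = TextCollector()
xml.sax.parseString(
    b"<Other-source>"
    b"<Other-source_anchor>catalytic activity</Other-source_anchor>"
    b"<Other-source_post-text>evidence: IEA</Other-source_post-text>"
    b"</Other-source>",
    handler,
)
for name, text in handler.texts:
    print(name, "->", text)
```

Because the join happens in endElement(), it no longer matters how many pieces the parser delivers the text in.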

The XML is publicly available here:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml

Here are a few lines from this XML:

              <Gene-commentary_source>
                <Other-source>
                  <Other-source_src>
                    <Dbtag>
                      <Dbtag_db>GO</Dbtag_db>
                      <Dbtag_tag>
                        <Object-id>
                          <Object-id_id>3824</Object-id_id>
                        </Object-id>
                      </Dbtag_tag>
                    </Dbtag>
                  </Other-source_src>
                  <Other-source_anchor>catalytic activity</Other-source_anchor>
                  <Other-source_post-text>evidence: IEA</Other-source_post-text>
                </Other-source>
              </Gene-commentary_source>
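The indentation and newlines between tags in a snippet like this are character data too. In my understanding, the default expat-based parser has no DTD telling it that this whitespace is ignorable, so it reports it through characters() rather than ignorableWhitespace(), which would show up as blank lines in a handler that prints everything. A small sketch (Python 3 syntax; WhitespaceSpotter is a hypothetical name) that counts whitespace-only chunks separately from real text:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class WhitespaceSpotter(ContentHandler):
    """Hypothetical handler separating whitespace-only chunks from text."""

    def __init__(self):
        super().__init__()
        self.blank = 0  # chunks made only of spaces/newlines (indentation)
        self.text = []  # chunks containing visible characters

    def characters(self, content):
        if content.strip():
            self.text.append(content)
        else:
            self.blank += 1


handler = WhitespaceSpotter()
xml.sax.parseString(
    b"<Other-source>\n"
    b"  <Other-source_post-text>evidence: IEA</Other-source_post-text>\n"
    b"</Other-source>\n",
    handler,
)
print(handler.blank, "whitespace-only chunks")
print(handler.text)
```

Skipping chunks where content.strip() is empty is one simple way to keep such indentation out of the printout.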

Notice the Other-source_post-text element near the end. I expect my handler
to print "evidence: IEA".
However, this is what I get:

-------------------------
catalytic activity  ==> this is the printout of the line before



e
vidence: IEA
-------------------------

I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts: first it prints only "e", then
"vidence: IEA". I parsed 825 such XMLs without a problem; this occurs on my
826th XML.
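A plausible explanation, stated as an assumption rather than a confirmed diagnosis: xml.sax.parse() reads its input in blocks (the xml.sax.xmlreader.IncrementalParser default buffer size is 2**16 bytes) and feeds each block to expat, so a text node that straddles a block boundary can reach characters() in more than one call. That would also explain why only one file in 826 is affected, since it depends on where the boundary happens to fall. The sketch below (Python 3 syntax; ChunkRecorder is a hypothetical name) imitates that situation by feeding a small document in two pieces cut right after the "e".

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Hypothetical handler that records every characters() call verbatim."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def characters(self, content):
        self.calls.append(content)


doc = b"<post-text>evidence: IEA</post-text>"
split_at = doc.index(b"evidence") + 1  # cut the input right after the "e"

handler = ChunkRecorder()
parser = xml.sax.make_parser()  # expat-based IncrementalParser
parser.setContentHandler(handler)
parser.feed(doc[:split_at])  # first block ends inside the text node
parser.feed(doc[split_at:])  # second block carries the rest
parser.close()

print(handler.calls)           # may show the text split into pieces
print("".join(handler.calls))  # evidence: IEA
```

Exactly how the pieces come out may vary between expat versions, but joining all chunks of a text node always recovers the full string.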

Any explanations??

Timothy

-- 
http://mail.python.org/mailman/listinfo/python-list
