Larry Trammell <ridge...@nwi.net> added the comment:

Sure...  I'll cut and paste some of the text I was organizing to go into a 
possible new issue page.

The only relevant documentation I could find is on the "xml.sax.handler" page 
of the Python 3.9.2 documentation for the Python Standard Library (the wording 
has been the same through many versions):

-----------
ContentHandler.characters(content) -- The Parser will call this method to 
report each chunk of character data.  SAX parsers may return all contiguous 
character data in a single chunk, or they may split it into several chunks...
-----------

As an example, here is a typical snippet taken from the Web page

     https://www.tutorialspoint.com/parsing-xml-with-sax-apis-in-python 

The application example records the tag name "type" in the "CurrentData" 
member, and shortly thereafter, the "type" tag's content is received:

   # Call when a character is read
   def characters(self, content):
      if self.CurrentData == "type":
         self.type = content

Suppose that the parser receives the following text line from the input file.  

<type>SciFi</type>

Though there seems to be no reason for it, the parser could decide to deliver 
the content text as "Sc" followed by "iFi".  In that case, the second 
invocation of the "characters" method would overwrite the characters received 
in the first invocation, and some of the content text seems "lost."  
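
To make the overwrite concrete, here is a minimal sketch, not taken from the 
tutorial or from the library documentation.  The class names "NaiveHandler" 
and "AccumulatingHandler" and the helper "deliver_split" are hypothetical, and 
the two chunks are delivered by hand, on the assumption that the parser splits 
"SciFi" into "Sc" and "iFi" (where a real parser splits, if at all, is up to 
the parser).  The second handler shows one possible workaround: collect the 
chunks and join them when the end tag arrives.

```python
import xml.sax


class NaiveHandler(xml.sax.ContentHandler):
    """Mirrors the tutorial snippet: each chunk overwrites the previous one."""

    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""

    def startElement(self, tag, attributes):
        self.CurrentData = tag

    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content  # earlier chunks are discarded here

    def endElement(self, tag):
        self.CurrentData = ""


class AccumulatingHandler(NaiveHandler):
    """Hypothetical workaround: collect chunks, join them at the end tag."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def startElement(self, tag, attributes):
        super().startElement(tag, attributes)
        if tag == "type":
            self._chunks = []  # start fresh for each <type> element

    def characters(self, content):
        if self.CurrentData == "type":
            self._chunks.append(content)

    def endElement(self, tag):
        if tag == "type":
            self.type = "".join(self._chunks)
        super().endElement(tag)


def deliver_split(handler):
    # Simulate a parser that reports "SciFi" as two separate chunks.
    handler.startElement("type", {})
    handler.characters("Sc")
    handler.characters("iFi")
    handler.endElement("type")
    return handler.type


naive_result = deliver_split(NaiveHandler())         # only the last chunk
fixed_result = deliver_split(AccumulatingHandler())  # the full text

# The accumulating handler also works with the real parser, whether or not
# the parser happens to split the text.
real = AccumulatingHandler()
xml.sax.parseString(b"<type>SciFi</type>", real)
parsed_result = real.type
```

With the simulated split, the naive handler ends up holding only "iFi", while 
the accumulating handler holds "SciFi" no matter how the text is chunked.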

Given how rarely it happens, I suspect that when internal processing reaches 
the end of a block of buffered text from the input file, the easiest thing to 
do is to report any fragments of text that happen to remain at the end, no 
matter how tiny, and start fresh with the next internal buffer. Easy for the 
implementer, but baffling to the application developer.  And rare enough to 
elude application testing.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43483>
_______________________________________