Larry Trammell <ridge...@nwi.net> added the comment:
Great minds think alike, I guess... I was thinking of a much smaller carryover size... maybe 1K. With individual text blocks longer than that, the user will almost certainly be dealing with collecting and aggregating content text anyway, and in that case the problem is solved before it happens (a sketch of that aggregation pattern is appended below). Here is a documentation change I was experimenting with...

-----------
ContentHandler.characters(content) -- The Parser will call this method to report chunks of character data. In general, character data may be reported as a single chunk or as a sequence of chunks; but character data sequences with fewer than xml.sax.handler.ContiguousChunkLength characters, when uninterrupted by any other xml.sax.handler.ContentHandler event, are guaranteed to be delivered as a single chunk...
-----------

That puts users on notice, "...wait, are my chunks of text smaller than that?" and they are less likely to be caught unaware. But of course, the implementation change would be helpful even without this extra warning.

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43483>
_______________________________________
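[Editor's note: a minimal sketch of the aggregation pattern mentioned above, not taken from the original message. It shows a ContentHandler subclass that buffers characters() chunks and joins them at the next structural event, so chunk boundaries never matter; the TextCollector name and the sample document are illustrative only.]

    import xml.sax

    class TextCollector(xml.sax.ContentHandler):
        """Buffer characters() chunks; join them at the next event."""

        def __init__(self):
            super().__init__()
            self._chunks = []   # pending character-data chunks
            self.texts = []     # completed text runs, one per element

        def characters(self, content):
            # The parser may report a single text run as several chunks;
            # never assume one call per run.
            self._chunks.append(content)

        def _flush(self):
            if self._chunks:
                self.texts.append("".join(self._chunks))
                self._chunks = []

        def startElement(self, name, attrs):
            self._flush()

        def endElement(self, name):
            self._flush()

    handler = TextCollector()
    xml.sax.parseString(b"<root><item>hello world</item></root>", handler)
    print(handler.texts)   # ['hello world']

With this pattern in place, the proposed ContiguousChunkLength guarantee is a convenience rather than a requirement: code written this way is correct regardless of how the parser splits character data.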