[EMAIL PROTECTED] wrote:
> Fredrik Lundh wrote:
> > [EMAIL PROTECTED] wrote:
> > > I think I ran into a bug in the XML SAX parser.
> > >
> > > Part of my program consists of reading a rather large XML file
> > > (about 10 MB) containing a few thousand elements.
> > > I have the following problem: sometimes the SAX parser misreads
> > > a line.
> >
> > it's not a bug; the parser is free to split up character runs (due to
> > buffering, entities or character references, etc.). it's up to you to
> > merge character runs into strings.
>
> But how do I detect that the parser has split up the characters? I guess
> I need to detect it in order to reconstruct the complete string.
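
You don't need to detect the splits at all. The usual approach is to collect whatever characters() hands you and join the pieces once the element ends. A minimal sketch, assuming Python's standard xml.sax module; the TextMergingHandler class and its texts attribute are hypothetical names used only for illustration:

```python
import xml.sax


class TextMergingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects each element's text as one string."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # pieces delivered by characters() so far
        self.texts = []     # (element name, merged text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer for every element.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text; just collect.
        self._chunks.append(content)

    def endElement(self, name):
        # The run is complete once the element closes: join the pieces.
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


handler = TextMergingHandler()
xml.sax.parseString(b"<doc><item>fish &amp; chips</item></doc>", handler)
```

However the parser chooses to split "fish & chips" around the entity, handler.texts holds it as one string for the item element. This sketch resets its buffer at every start tag, so it only suits elements without mixed content.
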
Here's a recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881

Using this filter, you can then write SAX code that assumes normalized
text events. Also, 4Suite's SAX implementation, Saxlette, automatically
does this text event merging for you at C speed:

http://4suite.org/docs/CoreManual.xml#saxlette

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/