On 2/7/2021 10:22 AM, Arrigo Marchiori wrote:
> Hello all,
>
> re-replying to Jim's message.
>
> On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
>
>> Funny that you bring this up... I've been tracking down some bugs and they
>> all seem to be XML related... fastsax->libwriterfilter with occasional cores
>> due to __cxa_call_unexpected.
>>
>> I feel that making AOO more fragile by trying to work around cases where
>> invalid and/or non-compliant XML is encountered is just wrong. We should
>> either ignore the error (catch it) or raise an exception. Invalid data
>> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
>> vector for vulnerabilities.
>
> For the record: the detection of duplicated attributes is made
> internally by the expat library. Our code just receives the error
> message and cannot do anything to recover from it.
>
> I don't believe it's worth patching expat to allow duplicated
> attributes. I don't know the library well and I fear the
> consequences of tinkering with it.
>
> But then my question becomes: do we want to offer any data recovery
> tools for corrupted documents? Like "dumb" XML parsers that just
> shave away XML errors?
>
> 1- it could be an external tool, written in a language that is easier
> to code in? (like Python, Perl, Java... whatever)
>
> 2- or an internal pre-parsing phase? It should not be based on the
> expat library though; do we have any other possibilities among the
> current modules?
>
> 3- or we leave it to hand-crafting by knowledgeable people on the
> forum, as it is happening now?
>
> I am looking forward to opinions ... and possibly reviews of PR 122
> please ;-)
>
> Best regards,
>

Purely from a user's point of view, I agree with Jim. It should not be
allowed to happen. Asking the user to run an external program, or to send
the document to the forum to be hand-edited, is a recipe for disaster, both
for our user base and from a marketing standpoint.

I could see an external program as a short-term, stopgap workaround.
However, it should only be that.

Regards
Keith