Hello all,
re-replying to Jim's message.
On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> Funny that you bring this up... I'm been tracking down some bugs and they
> all seem to be XML related... fastsax->libwriterfilter with occasional cores
> due to __cxa_call_unexpected.
>
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception. Invalid data
> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
> vector for vulnerabilities.
For the record: duplicated attributes are detected internally by the
expat library. Our code just receives the error and cannot do anything
to recover from it.
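To illustrate the point outside of AOO, here is a minimal sketch using
Python's standard expat binding rather than our C++ code. The helper name
check_well_formed is made up for this example. It only shows that expat
reports the problem as a fatal error, leaving the caller with an error
code and a position:

```python
import xml.parsers.expat


def check_well_formed(data: bytes):
    """Return None if expat accepts the data, else (message, line, column).

    Hypothetical helper for illustration; not part of the AOO code base.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # expat has already given up here: all the caller gets is an
        # error code and the position where parsing stopped.
        message = xml.parsers.expat.ErrorString(err.code)
        return message, err.lineno, err.offset
    return None


if __name__ == "__main__":
    print(check_well_formed(b'<p style="a" style="b"/>'))
    print(check_well_formed(b'<p style="a"/>'))
```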
I don't believe it's worth patching expat to allow duplicated
attributes. I don't know the library well and I am wary of the
consequences of tinkering with it.
But then my question becomes: do we want to offer any data recovery
tools for corrupted documents? Like ``dumb'' XML parsers that just
shave away XML errors?
1- it could be an external tool, written in a language that is easier
to code in (like Python, Perl, Java... whatever)?
2- or an internal pre-parsing phase? It should not be based on the
expat library though; do we have any other possibilities among the
current modules?
3- or we leave it to hand-crafting by knowledgeable people on the
forum, as it is happening now?
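To make option 1 a bit more concrete, here is a rough sketch of what such
an external tool could look like in Python. Everything in it is an
assumption for the sake of discussion: the function name, the simplified
regular expressions, and the policy of keeping the first occurrence of a
duplicated attribute. It does not know about comments, CDATA sections or
processing instructions, so it is nowhere near a real recovery tool:

```python
import re

# Simplified pattern for a start tag or empty-element tag whose attributes
# are all quoted. Hypothetical; it does not cover every case in the XML spec.
TAG_RE = re.compile(
    r'<([A-Za-z_][\w:.-]*)'                               # element name
    r'((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)'    # attribute list
    r'\s*(/?)>'                                           # optional "/" and ">"
)
ATTR_RE = re.compile(r'\s+([\w:.-]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def drop_duplicate_attributes(text: str) -> str:
    """Keep the first occurrence of each attribute name within a tag."""

    def fix_tag(match):
        name, attrs, slash = match.groups()
        seen = set()
        kept = []
        for attr in ATTR_RE.finditer(attrs):
            if attr.group(1) not in seen:
                seen.add(attr.group(1))
                kept.append(attr.group(0))  # includes the leading whitespace
        return "<" + name + "".join(kept) + slash + ">"

    return TAG_RE.sub(fix_tag, text)


if __name__ == "__main__":
    broken = '<text:p text:style-name="P1" text:style-name="P2">Hi</text:p>'
    print(drop_duplicate_attributes(broken))
```

Whether the first or the last value should win is a policy question the
tool cannot answer by itself, which is one reason to keep this kind of
repair outside the normal import path.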
I am looking forward to opinions ... and possibly reviews of PR 122
please ;-)
Best regards,
--
Arrigo
http://rigo.altervista.org