So Owl has a ZipLayer - for messages that contain the bytes of a zip file.

As part of exercising these new Daffodil Layer APIs, I have rewritten this
ZipLayer.

I have written a fuzz tester, the objective being to try to crash the
layers so they throw bad exceptions like NPE or other ill-behaved things.

Et voilà... I am able to get an SDE thrown by the InfosetWalker after the
parse is completed. It took 100,000 trials to get it to happen (only a few
seconds), but sure enough.
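
For anyone who wants to try the same idea, here is a rough sketch of that
kind of fuzz loop. It is not the actual tester: it is written in Python
rather than against the Daffodil APIs, and the names mutate, fuzz, and the
"expected" exception tuple are all made up for illustration. The point is
only the shape: mutate known-good bytes, run the parse, and record any
exception that is not one of the well-behaved kinds.

```python
import random


def mutate(data: bytes, rng: random.Random, max_flips: int = 4) -> bytes:
    """Return a copy of data with between 1 and max_flips bytes changed."""
    buf = bytearray(data)
    count = min(rng.randint(1, max_flips), len(buf))
    # Distinct positions, each XORed with a non-zero value, so the
    # result always differs from the input.
    for i in rng.sample(range(len(buf)), count):
        buf[i] ^= rng.randint(1, 255)
    return bytes(buf)


def fuzz(parse, seed: bytes, trials: int, expected: tuple, rng_seed: int = 0):
    """Run parse on mutated copies of seed.

    Exceptions listed in `expected` count as well-behaved failures.
    Anything else is recorded as (trial number, input bytes, exception).
    """
    rng = random.Random(rng_seed)  # fixed seed so failures can be replayed
    failures = []
    for trial in range(trials):
        data = mutate(seed, rng)
        try:
            parse(data)
        except expected:
            pass  # a clean, expected rejection of bad data
        except Exception as e:  # ill-behaved: NPE-style surprises land here
            failures.append((trial, data, e))
    return failures
```

Keeping the failing input bytes alongside the exception matters, since a
failure that shows up once in 100,000 trials is only useful if it can be
replayed.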

So fuzz testing is super helpful.

Turns out we throw a RemapPUACharDetected if data contains Unicode PUA
(Private Use Area) characters. Fuzz testing is causing the zip-layer to
read some data as Unicode PUA characters, and when the InfosetOutputter is
projecting that string to XML, it catches this exception and escalates it
to an SDE.

So bug 1 is that we shouldn't throw an exception when a PUA character is
encountered. That's easily remedied by changing a flag already in the code.
Pre-existing PUA characters can just be passed through. Changing this
breaks no tests. The fact that we remap some XML illegal characters into
the PUA means there's some chance that some kinds of data won't round-trip
parse/unparse or unparse/parse due to this, but that's just
par for the course when dealing with XML as a data format. I will create a
test for this, a separate ticket for it, and fix it separately from the
Layer changes.
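
To make the round-trip concern concrete, here is a small Python sketch. It
is a simplified model and not Daffodil's code: it assumes the remapping
just adds 0xE000 to the C0 control characters that XML 1.0 forbids, and
the names PUA_OFFSET, to_xml, and from_xml are invented for the example.
The real rules cover more cases.

```python
PUA_OFFSET = 0xE000  # assumed offset for this simplified model


def is_xml_illegal(cp: int) -> bool:
    # XML 1.0 forbids the C0 controls other than tab, LF, and CR.
    # (Surrogates, U+FFFE and U+FFFF are also illegal; not modeled here.)
    return cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)


def to_xml(s: str) -> str:
    """Parse direction: remap illegal characters into the PUA.

    Pre-existing PUA characters are passed through unchanged.
    """
    return "".join(
        chr(ord(c) + PUA_OFFSET) if is_xml_illegal(ord(c)) else c for c in s
    )


def from_xml(s: str) -> str:
    """Unparse direction: map remapped PUA characters back."""
    out = []
    for c in s:
        cp = ord(c) - PUA_OFFSET
        out.append(chr(cp) if cp >= 0 and is_xml_illegal(cp) else c)
    return "".join(out)


# An illegal control character round-trips fine.
assert from_xml(to_xml("a\x00b")) == "a\x00b"

# A PUA character that was already in the data does not: it passes
# through on parse, then is mistaken for a remapped U+0001 on unparse.
original = "\ue001"
assert to_xml(original) == original
assert from_xml(to_xml(original)) == "\x01"
```

So passing PUA characters through trades an exception for a possible
collision, which only affects data that already contains characters in
the remapped range.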

But bug 2 is how the String-as-XML feature works. Turns out the catch of
Exceptions in the InfosetOutputter gets exercised by the Daffodil
"String-as-XML" extension on negative tests. When the string of data, which
is supposed to be XML, turns out to be malformed, an exception gets thrown
such as WstxUnexpectedCharException. That gets escalated to an SDE by the
InfosetOutputter and a few tests check for this.

This needs to be a ParseError. It's a problem with the data, not the
schema, so an SDE is really incorrect here.

I think we have to do this conversion in the parser so that if the XML is
malformed we can make it a ParseError. How to make this work is a bit
tricky.
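
The general shape I have in mind looks something like the Python sketch
below. It is only an illustration of the idea, not a design for Daffodil:
DataParseError, parse_xml_string, and parse_choice are hypothetical names,
and Python's standard XML parser stands in for the real one. The
well-formedness check happens during the parse, and a failure becomes a
recoverable error that a choice can backtrack on, rather than something
discovered later when the infoset is output.

```python
import xml.etree.ElementTree as ET


class DataParseError(Exception):
    """Hypothetical stand-in for a recoverable parse error (bad data)."""


def parse_xml_string(text: str) -> ET.Element:
    """Check well-formedness at parse time.

    The XML library's exception is converted into a DataParseError so
    that the caller can treat it as a problem with the data.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError as e:
        raise DataParseError(f"string is not well-formed XML: {e}") from e


def parse_choice(text: str, branches):
    """Try each (name, parser) branch in order.

    A DataParseError in one branch backtracks to the next. Any other
    exception propagates, since it is not a data error.
    """
    errors = []
    for name, branch in branches:
        try:
            return name, branch(text)
        except DataParseError as e:
            errors.append(e)
    raise DataParseError(f"no branch matched: {errors}")
```

With branches such as [("xml", parse_xml_string), ("text", lambda t: t)],
a well-formed string takes the "xml" branch, and a malformed one like
"<a><b></a>" falls back to "text" instead of aborting the whole parse.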

This is well out of scope for the Layering changes. But there is a real
issue here with the String-as-XML feature being unable to backtrack on a
data error, such as the XML string not being well-formed. I'll report that
as a separate issue.
