It seems that many people are surprised when data that parses successfully
does not unparse successfully.

This happens when the parsed data is well-formed, but values do not obey
the XSD facets. That is, the XML result from parsing is created, but it is
invalid.

Depending on the DFDL schema, such data may not unparse successfully.

Users who test schemas with a simple parse -> unparse process, using test
data that is well-formed but invalid, may get the impression that there is
a problem with the schema. Really, it is just that validation errors coming
out of the parse are not being escalated into true errors.
This behavior holds regardless of whether Daffodil is configured to do
validation or not, as validation errors are never parse errors. They are
effectively just warnings. I think this is unintuitive to many users, who
expect that a DFDL parse cannot produce invalid XML.

A test process that does parse -> XSD validate -> unparse is the correct
one. The XSD validation step in the middle would reject such messages as
invalid, so they would never reach the unparser and could not fail there.
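
To make that test flow concrete, here is a minimal sketch in Python. It is
not Daffodil code: parse, validate, and unparse are hypothetical callables
supplied by the test harness, and the toy_* functions are made-up stand-ins
that model a one-byte field with a maxInclusive-style facet of 99.

```python
from typing import Callable, List, Optional, Tuple


def round_trip(
    data: bytes,
    parse: Callable[[bytes], Optional[dict]],
    validate: Callable[[dict], List[str]],
    unparse: Callable[[dict], bytes],
) -> Tuple[str, Optional[bytes]]:
    """Run parse -> validate -> unparse and stop at the first failing stage.

    Returns (stage, output), where stage is 'parse-failed', 'invalid', or 'ok'.
    """
    infoset = parse(data)
    if infoset is None:
        return ("parse-failed", None)
    if validate(infoset):
        # Well-formed but invalid: never hand this to the unparser.
        return ("invalid", None)
    return ("ok", unparse(infoset))


# Hypothetical stand-ins, for illustration only.
def toy_parse(data: bytes) -> Optional[dict]:
    # Checks well-formedness only: exactly one byte, any value.
    if len(data) != 1:
        return None
    return {"value": data[0]}


def toy_validate(infoset: dict) -> List[str]:
    # Stand-in for an XSD facet check (maxInclusive = 99).
    if infoset["value"] > 99:
        return ["value exceeds maxInclusive 99"]
    return []


def toy_unparse(infoset: dict) -> bytes:
    # Models an unparser that cannot represent out-of-range values.
    if infoset["value"] > 99:
        raise ValueError("cannot unparse out-of-range value")
    return bytes([infoset["value"]])
```

With these stand-ins, the byte 0xC8 (200) parses but is reported as invalid
by round_trip, whereas calling toy_unparse(toy_parse(b"\xc8")) directly
raises an error, which is the surprise described above.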

With that background, should we have an option where at the end of a
Daffodil parse, if there are validation errors we can cause the entire
parse to be considered a failure? This is not the same as escalating
individual validation errors into parse errors as that would affect
backtracking behavior. This is a separate final check once the DFDL Infoset
has been created.
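
A rough sketch of what such a final check could look like, in Python. The
ToyParseResult type, its fields, and the fail_on_validation_errors flag are
all hypothetical names used for illustration; none of them are existing
Daffodil APIs or options.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyParseResult:
    # Hypothetical model of a finished parse, not a Daffodil class.
    infoset: dict
    processing_errors: List[str] = field(default_factory=list)
    validation_errors: List[str] = field(default_factory=list)


def parse_failed(result: ToyParseResult,
                 fail_on_validation_errors: bool = False) -> bool:
    """Decide the overall outcome after the parse has completed.

    The check only inspects the finished result, so it cannot influence
    any backtracking decisions made while parsing.
    """
    if result.processing_errors:
        return True
    return fail_on_validation_errors and bool(result.validation_errors)
```

With the flag off, a result that carries only validation errors still
counts as a success, which matches the behavior described above. With the
flag on, the same result is treated as a failed parse.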

API users of Daffodil can of course inspect output for validation errors
and do this themselves. I just think they are not aware that this is needed.

Thoughts?



Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
